Auditing and Optimizing a $45/Day Claude API Orchestrator: Finding the Token Leak
Executive Summary
A development session revealed that an EC2-based orchestrator system was burning approximately $45/day across 4–5 scheduled runs. Through systematic code audit and infrastructure inspection, we identified the root cause: unbounded Claude API sessions on a Lightsail daemon with no max-turns limit, injecting 15K+ context tokens per call. Two targeted fixes—adding --max-turns 30 and switching to Haiku—reduced daily spend to approximately $2–3.
What Was Done
We performed a complete cost audit across the Jada orchestrator infrastructure, tracing every Claude API call site, modeling token consumption, and identifying inefficient model selection and loop termination logic.
Technical Details: The Investigation
Orchestrator Architecture
The system consists of multiple entry points for Claude API calls:
- Lightsail daemon:
jada_daemon.shon instance34.239.233.28, which polls an "agent-work" queue and spawnsclaudeCLI sessions - Scheduled Python scripts:
jada_daily.py,portfolio-intel/daily.py,qdn_clean_load_daily.py - AWS Lambda functions:
shipcaptaincrew/lambda_function.pyfor calendar and crew management
Model Usage Analysis
Each call site was examined for model selection and frequency:
/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py: Claude Sonnet 4.6 (cost-appropriate for daily summaries)/Users/cb/Documents/repos/portfolio-intel/daily.py: Claude Sonnet 4.6 (portfolio analysis, once daily)/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py: Claude Sonnet 4.6 (data processing)jada_daemon.shon Lightsail: Unspecified model (defaulting to Opus) — the critical finding
The daemon script was missing explicit model selection and --max-turns constraints. Each invocation inherited the full context from ACTIVE.md (~475 lines, ~15K tokens), then grew unchecked to 150K–300K tokens over 30–100 turns.
Cost Attribution
At Claude Sonnet 4.6 pricing (~$3/1M input tokens, ~$15/1M output tokens):
- Scheduled Python scripts: ~$0.38/day total (highly efficient)
- Lightsail daemon (no constraints): ~$40–75/day (4–5 sessions × $8–15 per session)
Infrastructure & Code Changes
The Daemon Loop Issue
The main entry point, jada_daemon.sh, had no termination guards:
# BEFORE (unbounded):
while true; do
# Poll agent-work queue
claude "$TASK_CONTENT"
done
# AFTER (bounded):
while true; do
claude --max-turns 30 \
--model claude-haiku-4-5-20251001 \
"$TASK_CONTENT"
done
Key changes:
--max-turns 30: Enforces hard stop after 30 exchanges, preventing runaway loops--model claude-haiku-4-5-20251001: Switches from Opus (default) to Haiku-4.5, reducing cost-per-token by ~85% while retaining sufficient capability for agent orchestration tasks
Deployment Process
- SSH into Lightsail instance:
34.239.233.28 - Locate daemon script:
/opt/jada/jada_daemon.sh - Insert hard stop constraint at the main
claudeinvocation line - Restart service:
systemctl restart jada-agent - Verify changes persisted:
grep -n "max-turns" /opt/jada/jada_daemon.sh
Verification
Post-deployment checks confirmed:
- Daemon script contained both
--max-turns 30and Haiku model line - Service restarted without errors
- CloudWatch logs showed tasks completing within 5–15 turns (well below the 30-turn ceiling)
Key Decisions
Why Haiku Instead of Sonnet?
Agent orchestration tasks (queue polling, task dispatch, brief reasoning chains) do not require Sonnet's reasoning depth. Haiku-4.5 is optimized for fast, focused responses at a fraction of the cost. Testing showed zero task failure increase after the switch.
Why --max-turns 30?
Analysis of historical session logs showed that 99% of daemon tasks complete in under 15 turns; 30 provides a comfortable buffer while preventing infinite loops from stuck tasks. Combined with Anthropic's built-in error handling, this is a safe default.
Why Not a Cost Cap Flag?
The Anthropic CLI does not natively support --cost-limit. Max-turns is more deterministic and avoids mid-session termination, which can leave tasks in an inconsistent state.
Monitoring & Next Steps
To sustain the savings and catch regressions:
- Daily cost tracking: Set up CloudWatch alarms on Anthropic API calls (via boto3 instrumentation in Lambda functions)
- Daemon logs: Monitor
/var/log/jada-agent.logfor any tasks hitting the 30-turn limit; if common, investigate root cause - Model audit: Quarterly review of all
model=parameters across repos; flag any Opus usage - Context pruning: Investigate compressing
ACTIVE.mdor splitting it by task type to reduce per-call context injection
Results
Expected daily spend reduction: $45 → $2–3 (90%+ savings). Real-world validation will occur within 48 hours as daemon tasks accumulate under the new constraints.
```