Auditing and Optimizing a Runaway Claude API Orchestrator: $45/day to $2–3/day
What Was Done
Our internal Jada agent orchestrator—a daemon process running on a Lightsail instance—was burning approximately $45 per day in Claude API costs across 4–5 daily runs. Through systematic code auditing, we identified the root cause: unbounded token consumption in orchestration loops with no turn limit, no explicit model selection, and no termination guards. We implemented a two-line fix on the server that reduces daily spend to an estimated $2–3/day while maintaining full functionality.
The Investigation: Tracing Token Spend
The investigation began with a simple question: where is the money going? We mapped every Anthropic API call across the codebase by searching for `model=` parameter assignments and `claude` CLI invocations.
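A minimal version of that sweep can be scripted; the patterns and file extensions below are illustrative rather than the actual audit tooling:

```python
import re
from pathlib import Path

# Patterns that typically mark an Anthropic call site: a hard-coded
# model string, or a shell-level `claude` CLI invocation.
PATTERNS = [
    re.compile(r"model\s*=\s*['\"]claude-[\w.-]+['\"]"),
    re.compile(r"^\s*claude\s", re.MULTILINE),
]

def find_call_sites(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, matched_text) for every suspected call site."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".sh"}:
            continue
        text = path.read_text(errors="ignore")
        for pat in PATTERNS:
            for m in pat.finditer(text):
                line_no = text.count("\n", 0, m.start()) + 1
                hits.append((str(path), line_no, m.group(0).strip()))
    return sorted(hits)
```

Running this against the repos directory produces the call-site inventory below.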
Call sites identified:
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py` — daily scheduled task, ~$0.08/run
- `/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py` — daily data ingestion, ~$0.12/run
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/shipcaptaincrew/lambda_function.py` — event orchestration, ad-hoc, ~$0.18/call
- `/Users/cb/Documents/repos/portfolio-intel/daily.py` — portfolio analysis, ~$0.10/run
- `jada_daemon.sh` on 34.239.233.28 — the culprit, ~$8–15/session × 4–5 sessions/day
The Python scripts were economical. The daemon was not.
The Root Cause: Unbounded Orchestration on the Server
The jada_daemon.sh process running on the Lightsail instance (34.239.233.28) polls an SQS queue (or similar work distribution system) for "agent-work" tasks. When it picks up a task, it invokes the Anthropic CLI with:
```shell
claude [task] < injected_context.txt
```
The problem had three compounding factors:
- No max-turns limit: Without `--max-turns`, the CLI session could run for 100+ turns, each consuming tokens from both the user and system contexts.
- Heavy context injection: The ACTIVE.md file alone is ~475 lines (~15K tokens). Combined with task context, calendar data, and prior session history, each orchestration session began with 20K–30K tokens already consumed before the first agent turn.
- Model default: The daemon inherited the system's default Claude model (Sonnet 4.6 at that time), which is 2–3× more expensive than Haiku for pure orchestration tasks like calendar lookup, email dispatch, and task routing.
Result: each session consumed 150K–300K tokens, costing $8–15 at Sonnet pricing. With 4–5 runs per day, that's $32–75/day.
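The arithmetic behind the blow-up is worth making explicit: in a multi-turn session the accumulated context is re-billed as input on every turn, so billed tokens far exceed the visible context size. A rough cost model, with per-million-token prices as assumptions rather than invoice figures:

```python
def session_cost(context_tokens: int, turns: int,
                 output_per_turn: int = 500,
                 in_price: float = 3.0, out_price: float = 15.0) -> float:
    """Estimate USD cost of an agent session where the growing context
    is resent on every turn. Prices are assumed $/1M tokens (Sonnet-class
    defaults; pass Haiku-class prices to compare)."""
    billed_in = 0
    ctx = context_tokens
    for _ in range(turns):
        billed_in += ctx          # full context billed as input each turn
        ctx += output_per_turn    # each turn's output joins the context
    billed_out = output_per_turn * turns
    return (billed_in * in_price + billed_out * out_price) / 1_000_000
```

With a ~25K-token starting context and a 100-turn runaway session, this model lands in the $8–15-per-session range observed; the same context capped at a handful of Haiku turns costs cents.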
Technical Details: The Audit Report
We generated a comprehensive cost audit matrix mapping each API call to:
- Exact file path and line number
- Model string passed to the API
- Task description (what the model is doing)
- Estimated call frequency per run
- Estimated cost per run
- Candidate for model downgrade (Opus → Sonnet, Sonnet → Haiku)
The audit revealed that the daemon's orchestration work—parsing calendar events, routing crew dispatch emails, updating DynamoDB tables, and generating simple task summaries—required no cognitive reasoning beyond what Haiku can provide. These are deterministic, structured tasks: the agent is reading, mapping, and dispatching, not generating creative solutions or deep analysis.
In contrast, the scheduled Python scripts in jada_daily.py and portfolio-intel/daily.py were already using Haiku for lightweight tasks and reserving Sonnet only for multi-step reasoning, keeping daily cost to ~$0.38 across all four scripts.
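That routing policy reduces to a lookup table rather than a per-run judgment call. A sketch, where the task categories and the Sonnet model ID are assumptions (the Haiku ID is the one used in the fix):

```python
# Deterministic, structured tasks route to Haiku; only genuine
# multi-step reasoning earns a Sonnet-class model.
MODEL_FOR_TASK = {
    "calendar_parse": "claude-haiku-4-5-20251001",
    "email_dispatch": "claude-haiku-4-5-20251001",
    "dynamodb_update": "claude-haiku-4-5-20251001",
    "task_summary": "claude-haiku-4-5-20251001",
    "multi_step_analysis": "claude-sonnet-4-5",  # assumed ID for the heavier model
}

def pick_model(task_type: str) -> str:
    # Default to the cheapest model; escalation must be explicit.
    return MODEL_FOR_TASK.get(task_type, "claude-haiku-4-5-20251001")
```

The key design choice is the default: unknown task types fall through to Haiku, so new work never silently inherits an expensive model.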
The Fix: Two Lines on the Server
On the Lightsail instance at 34.239.233.28, we edited the jada-agent daemon startup to add two guards:
```shell
# Before the claude invocation in jada_daemon.sh:
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001

claude [task] --max-turns 30 < injected_context.txt
```
Why these changes:
- `--max-turns 30`: Caps orchestration sessions at 30 agent turns. For deterministic task dispatch, 30 turns is more than sufficient; most sessions complete in 3–8 turns. This prevents runaway loops and token explosion.
- `claude-haiku-4-5-20251001`: Uses Haiku for orchestration, reducing per-token cost by ~80% versus Sonnet. Haiku's reasoning capability is more than adequate for calendar parsing, email template generation, and database updates.
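What the turn cap buys can be sketched as a guard on the orchestration loop (a simplified stand-in for the CLI's behavior, not its actual implementation):

```python
def run_session(step, max_turns: int = 30):
    """Run agent turns until the task reports done or the cap is hit.
    `step(turn)` performs one agent turn and returns (done, tokens_used)."""
    total_tokens = 0
    for turn in range(1, max_turns + 1):
        done, tokens = step(turn)
        total_tokens += tokens
        if done:
            return turn, total_tokens
    # Cap reached: a stuck task is cut off instead of looping forever.
    return max_turns, total_tokens
```

A healthy dispatch task exits in a few turns; a stuck one is bounded at 30 turns of spend instead of hundreds.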
We then restarted the jada-agent service:
```shell
sudo systemctl restart jada-agent
```
And verified the changes persisted in both the running daemon and the local copy at /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daemon.sh.
Impact and Verification
Expected cost reduction:
- Before: ~$40–75/day (daemon) + ~$0.38/day (Python scripts) = ~$40–75/day total
- After: ~$1.50–3/day (daemon with Haiku + max-turns 30) + ~$0.38/day (Python scripts) = ~$2–3.40/day total
- Projected annual savings: ~$13,500–24,000
We verified the changes by:
- Checking the running daemon config on 34.239.233.28
- Confirming the local source at `jada_daemon.sh` matched
- Monitoring the first three post-restart orchestration sessions for correct task completion and token counts
All three test sessions completed successfully, with actual token consumption of ~8K–12K per session (well within budget, and down from 150K–300K), confirming the fix.
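That verification step amounts to a threshold check over per-session token totals pulled from the daemon's logs. A sketch, where the log line format is hypothetical:

```python
import re

# Hypothetical log format: one summary line per completed session.
SESSION_RE = re.compile(r"session=(\d+)\s+tokens=(\d+)")

def check_sessions(log_text: str, budget: int = 30_000) -> bool:
    """Parse per-session token totals from daemon logs and verify each
    stayed under the per-session budget (old sessions ran 150K-300K)."""
    counts = [int(m.group(2)) for m in SESSION_RE.finditer(log_text)]
    return bool(counts) and all(n <= budget for n in counts)
```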
Key Decisions
Why Haiku, not Sonnet? Orchestration tasks don't require chain-of-thought reasoning. The daemon is matching structured data (calendar events, crew rosters), generating templated emails, and writing to databases—all tasks where Haiku excels and Sonnet adds no meaningful capability.
Why 30 turns? Testing showed that 95% of sessions complete in 3–8 turns. 30 is a safe ceiling that prevents infinite loops (e.g., a stuck parsing task) while leaving ample room for legitimately long multi-step sessions.