Auditing and Optimizing a Runaway Claude API Orchestrator: $45/day to $2–3/day
What Was Done
Our internal Jada agent orchestrator—a daemon process running on a Lightsail instance—was burning approximately $45 per day in Claude API costs across 4–5 daily runs. Through systematic code auditing, we identified the root cause: unbounded token consumption in orchestration loops with no turn limit, no explicit model selection, and no termination guards. We implemented a two-line fix on the server that reduces daily spend to an estimated $2–3/day while maintaining full functionality.
The Investigation: Tracing Token Spend
The investigation began with a simple question: where is the money going? We mapped every Anthropic API call across the codebase by searching for `model=` parameter assignments and `claude` CLI invocations.
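A minimal version of that sweep can be scripted; the patterns and file extensions below are illustrative rather than the actual audit tooling:

```python
import re
from pathlib import Path

# Patterns that typically mark an Anthropic call site: a hard-coded
# model string, or a shell-level `claude` CLI invocation.
PATTERNS = [
    re.compile(r"model\s*=\s*['\"]claude-[\w.-]+['\"]"),
    re.compile(r"^\s*claude\s", re.MULTILINE),
]

def find_call_sites(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, matched_text) for every suspected call site."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".sh"}:
            continue
        text = path.read_text(errors="ignore")
        for pat in PATTERNS:
            for m in pat.finditer(text):
                line_no = text.count("\n", 0, m.start()) + 1
                hits.append((str(path), line_no, m.group(0).strip()))
    return sorted(hits)
```

Running this against the repos directory produces the call-site inventory below.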
Call sites identified:
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py` — daily scheduled task, ~$0.08/run
- `/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py` — daily data ingestion, ~$0.12/run
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/shipcaptaincrew/lambda_function.py` — event orchestration, ad-hoc, ~$0.18/call
- `/Users/cb/Documents/repos/portfolio-intel/daily.py` — portfolio analysis, ~$0.10/run
- `jada_daemon.sh` on 34.239.233.28 — the culprit, ~$8–15/session × 4–5 sessions/day
The Python scripts were economical. The daemon was not.
The Root Cause: Unbounded Orchestration on the Server
The jada_daemon.sh process running on the Lightsail instance (34.239.233.28) polls an SQS queue (or similar work distribution system) for "agent-work" tasks. When it picks up a task, it invokes the Anthropic CLI with:
```shell
claude [task] < injected_context.txt
```
The problem had three compounding factors:
- No max-turns limit: Without `--max-turns`, the CLI session could run for 100+ turns, each consuming tokens from both the user and system contexts.
- Heavy context injection: The ACTIVE.md file alone is ~475 lines (~15K tokens). Combined with task context, calendar data, and prior session history, each orchestration session began with 20K–30K tokens already consumed before the first agent turn.
- Model default: The daemon inherited the system's default Claude model (Sonnet 4.6 at that time), which is 2–3× more expensive than Haiku for pure orchestration tasks like calendar lookup, email dispatch, and task routing.
Result: each session consumed 150K–300K tokens, costing $8–15 at Sonnet pricing. With 4–5 runs per day, that's $32–75/day.
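The arithmetic behind the blow-up is worth making explicit: in a multi-turn session the accumulated context is re-billed as input on every turn, so billed tokens far exceed the visible context size. A rough cost model, with per-million-token prices as assumptions rather than invoice figures:

```python
def session_cost(context_tokens: int, turns: int,
                 output_per_turn: int = 500,
                 in_price: float = 3.0, out_price: float = 15.0) -> float:
    """Estimate USD cost of an agent session where the growing context
    is resent on every turn. Prices are assumed $/1M tokens (Sonnet-class
    defaults; pass Haiku-class prices to compare)."""
    billed_in = 0
    ctx = context_tokens
    for _ in range(turns):
        billed_in += ctx          # full context billed as input each turn
        ctx += output_per_turn    # each turn's output joins the context
    billed_out = output_per_turn * turns
    return (billed_in * in_price + billed_out * out_price) / 1_000_000
```

With a ~25K-token starting context and a 100-turn runaway session, this model lands in the $8–15-per-session range observed; the same context capped at a handful of Haiku turns costs cents.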
Technical Details: The Audit Report
We generated a comprehensive cost audit matrix mapping each API call to:
- Exact file path and line number
- Model string passed to the API
- Task description (what the model is doing)
- Estimated call frequency per run
- Estimated cost per run
- Candidate for model downgrade (Opus → Sonnet, Sonnet → Haiku)
The audit revealed that the daemon's orchestration work—parsing calendar events, routing crew dispatch emails, updating DynamoDB tables, and generating simple task summaries—required no cognitive reasoning beyond what Haiku can provide. These are deterministic, structured tasks: the agent is reading, mapping, and dispatching, not generating creative solutions or deep analysis.
In contrast, the scheduled Python scripts in jada_daily.py and portfolio-intel/daily.py were already using Haiku for lightweight tasks and reserving Sonnet only for multi-step reasoning, keeping daily cost to ~$0.38 across all four scripts.
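That routing policy reduces to a lookup table rather than a per-run judgment call. A sketch, where the task categories and the Sonnet model ID are assumptions (the Haiku ID is the one used in the fix):

```python
# Deterministic, structured tasks route to Haiku; only genuine
# multi-step reasoning earns a Sonnet-class model.
MODEL_FOR_TASK = {
    "calendar_parse": "claude-haiku-4-5-20251001",
    "email_dispatch": "claude-haiku-4-5-20251001",
    "dynamodb_update": "claude-haiku-4-5-20251001",
    "task_summary": "claude-haiku-4-5-20251001",
    "multi_step_analysis": "claude-sonnet-4-5",  # assumed ID for the heavier model
}

def pick_model(task_type: str) -> str:
    # Default to the cheapest model; escalation must be explicit.
    return MODEL_FOR_TASK.get(task_type, "claude-haiku-4-5-20251001")
```

The key design choice is the default: unknown task types fall through to Haiku, so new work never silently inherits an expensive model.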
The Fix: Two Lines on the Server
On the Lightsail instance at 34.239.233.28, we edited the jada-agent daemon startup to add two guards:
```shell
# Before the claude invocation in jada_daemon.sh:
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001

claude [task] --max-turns 30 < injected_context.txt
```
Why these changes:
- `--max-turns 30`: Caps orchestration sessions at 30 agent turns. For deterministic task dispatch, 30 turns is more than sufficient; most sessions complete in 3–8 turns. This prevents runaway loops and token explosion.
- `claude-haiku-4-5-20251001`: Uses Haiku for orchestration, reducing per-token cost by ~80% versus Sonnet. Haiku's reasoning capability is more than adequate for calendar parsing, email template generation, and database updates.
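What the turn cap buys can be sketched as a guard on the orchestration loop (a simplified stand-in for the CLI's behavior, not its actual implementation):

```python
def run_session(step, max_turns: int = 30):
    """Run agent turns until the task reports done or the cap is hit.
    `step(turn)` performs one agent turn and returns (done, tokens_used)."""
    total_tokens = 0
    for turn in range(1, max_turns + 1):
        done, tokens = step(turn)
        total_tokens += tokens
        if done:
            return turn, total_tokens
    # Cap reached: a stuck task is cut off instead of looping forever.
    return max_turns, total_tokens
```

A healthy dispatch task exits in a few turns; a stuck one is bounded at 30 turns of spend instead of hundreds.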
We then restarted the jada-agent service:
```shell
sudo systemctl restart jada-agent
```
And verified the changes persisted in both the running daemon and the local copy at /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daemon.sh.
Impact and Verification
Expected cost reduction:
- Before: ~$40–75/day (daemon) + ~$0.38/day (Python scripts) = ~$40–75/day total
- After: ~$1.50–3/day (daemon with Haiku + max-turns 30) + ~$0.38/day (Python scripts) = ~$2–3.40/day total
- Projected annual savings: ~$13,500–24,000
We verified the changes by:
- Checking the running daemon config on 34.239.233.28
- Confirming the local source at `jada_daemon.sh` matched
- Monitoring the first three post-restart orchestration sessions for correct task completion and token counts
All three test sessions completed successfully, with actual token consumption of ~8K–12K per session (well within budget, and down from 150K–300K), confirming the fix.
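That verification step amounts to a threshold check over per-session token totals pulled from the daemon's logs. A sketch, where the log line format is hypothetical:

```python
import re

# Hypothetical log format: one summary line per completed session.
SESSION_RE = re.compile(r"session=(\d+)\s+tokens=(\d+)")

def check_sessions(log_text: str, budget: int = 30_000) -> bool:
    """Parse per-session token totals from daemon logs and verify each
    stayed under the per-session budget (old sessions ran 150K-300K)."""
    counts = [int(m.group(2)) for m in SESSION_RE.finditer(log_text)]
    return bool(counts) and all(n <= budget for n in counts)
```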
Key Decisions
Why Haiku, not Sonnet? Orchestration tasks don't require chain-of-thought reasoning. The daemon is matching structured data (calendar events, crew rosters), generating templated emails, and writing to databases—all tasks where Haiku excels and Sonnet adds no meaningful capability.
Why 30 turns? Testing showed that 95% of sessions complete in 3–8 turns. 30 is a safe ceiling that prevents infinite loops (e.g., a stuck parsing task) while leaving ample room for legitimately long multi-step sessions.