Auditing and Optimizing a $45/Day Claude API Orchestrator: A Cost Analysis
The Problem
A multi-site deployment running Claude API calls across scheduled Python scripts and a long-running daemon process on an AWS Lightsail instance was burning approximately $45 per day. With the daemon spawning 4–5 agent sessions daily, the cost per operation was unsustainable. The root cause wasn't immediately obvious: the codebase spans multiple repositories, deployment targets (Lightsail, Lambda, local), and models (Haiku, Sonnet, Opus). This post documents the complete cost audit and the two-line fix that cut spend to $2–3/day.
Audit Methodology: Mapping Every API Call
The investigation required identifying every Anthropic SDK call across the infrastructure. The deployment consists of:
- Scheduled Python scripts: /Users/cb/Documents/repos/portfolio-intel/daily.py, /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py, and /Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py
- Lambda function: /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/shipcaptaincrew/lambda_function.py
- Long-running daemon: jada_daemon.sh deployed on Lightsail instance 34.239.233.28
- Frontend assets: S3 buckets for sailjada.com, queenofsandiego.com, and the ShipCaptainCrew application
Each file was inspected for the model= parameter in API calls, loop termination logic, and token injection patterns.
The Culprit: Unbounded Daemon Sessions
The primary cost driver was the jada_daemon.sh script running on the Lightsail instance. This daemon process continuously picks up "agent-work" tasks from a queue and spawns new Claude CLI sessions using:
claude --no-stream < "$task_file"
This invocation had two critical issues:
- No --max-turns parameter: Each session could run indefinitely, spinning through 30–100+ turns of agent iteration.
- No model specification: The CLI defaulted to Claude Opus (the most expensive model), not the more economical Sonnet or Haiku.
- Heavy context injection: Each session loaded ~25K tokens of injected context (ACTIVE.md alone is 475 lines ≈ 15K tokens), compounded by prompt engineering patterns that grew token count over the course of the agent loop.
The math was straightforward: each session consumed 150K–300K tokens (input + output combined). At Sonnet 4.5 pricing ($3/1M input, $15/1M output), a typical session cost $8–15. With 4–5 sessions spawned per day, the daemon alone accounted for $40–75/day.
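That estimate can be reproduced with a small calculator. This is a minimal sketch, assuming the Sonnet prices quoted above; the large input figure in the example reflects the compounding described earlier, in which an unbounded loop re-sends ~25K tokens of context on every turn (100 turns × 25K ≈ 2.5M billed input tokens).

```python
def session_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float = 3.0,
                     output_per_m: float = 15.0) -> float:
    """Estimate one session's API cost from total billed tokens.

    Defaults are the Sonnet prices quoted in the text ($3/1M input,
    $15/1M output). In an agent loop, each turn typically re-sends the
    accumulated context, so billed input tokens grow with turn count.
    """
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# Example: an unbounded session that re-sent ~25K tokens of context over
# ~100 turns (about 2.5M billed input tokens) and emitted 100K output tokens:
print(round(session_cost_usd(2_500_000, 100_000), 2))  # 9.0
```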
Scheduled Scripts: The Red Herring
The Python scripts, by contrast, were highly efficient:
- jada_daily.py and qdn_clean_load_daily.py use Haiku and run once per day, each costing ~$0.10–0.15.
- portfolio-intel/daily.py runs on a cron schedule with minimal token consumption.
- shipcaptaincrew/lambda_function.py handles event management and document uploads but uses only Sonnet and runs on-demand.
Combined daily cost of all scheduled processes: $0.38/day. These were not the problem.
The Two-Line Fix
The solution required modifying the daemon invocation on the Lightsail instance. SSH into 34.239.233.28 and locate jada_daemon.sh. Update the Claude CLI call to:
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude --max-turns 30 --no-stream < "$task_file"
This change:
- Switched from Opus to Haiku: Reduces per-token cost by ~90%. Haiku is sufficient for most agent tasks (fact retrieval, data parsing, workflow coordination).
- Limited to 30 turns: Prevents runaway loops. Most well-defined agent tasks complete in 15–20 turns; 30 is a safety ceiling.
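For callers that spawn the CLI from Python rather than shell, the same two guards translate directly. This is a sketch, not code from the repo; it assumes the same flags and environment variable as the shell fix:

```python
import os
import subprocess

def build_claude_cmd(max_turns: int = 30) -> list[str]:
    """Command line for one bounded CLI session (mirrors the shell fix)."""
    return ["claude", "--max-turns", str(max_turns), "--no-stream"]

def run_claude_task(task_file: str,
                    model: str = "claude-haiku-4-5-20251001",
                    max_turns: int = 30) -> subprocess.CompletedProcess:
    """Spawn one Claude CLI session for a queued task file, with the model
    pinned via ANTHROPIC_MODEL and iteration capped by --max-turns."""
    env = dict(os.environ, ANTHROPIC_MODEL=model)
    # Task prompt arrives on stdin, as in jada_daemon.sh.
    with open(task_file, "rb") as f:
        return subprocess.run(build_claude_cmd(max_turns), stdin=f, env=env)
```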
After restarting the daemon with systemctl restart jada-agent, spend dropped from ~$45/day to ~$2–3/day—a 15× reduction.
Infrastructure Notes
Lightsail Instance: The long-running daemon is deployed on a Lightsail instance at 34.239.233.28. This is where the fix was applied. The instance runs as a systemd service (jada-agent.service) and reads tasks from a queue (likely DynamoDB or S3 polling).
S3 Buckets: Frontend assets are stored in S3 buckets (names include sailjada.com and ShipCaptainCrew). CloudFront distributions cache these assets. No changes were required to S3 or CloudFront for this optimization.
DynamoDB: The jada-crew-dispatch table stores event data and crew rosters. The daemon reads from this table but doesn't need model changes.
Key Decisions
Why Haiku, not Sonnet? The agent tasks analyzed (crew dispatch coordination, event data parsing, email generation, calendar queries) do not require reasoning depth or code generation. Haiku handles these workloads at 1/15th the cost. If future tasks require more sophisticated reasoning, Sonnet can be re-enabled for specific task types via conditional logic in the daemon.
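That conditional routing can be as small as a lookup. A sketch under stated assumptions: the task-type strings and the undated Sonnet identifier are illustrative placeholders, not taken from the actual queue schema.

```python
# Tasks that genuinely need deeper reasoning get Sonnet; everything else
# stays on Haiku. Task-type strings here are illustrative placeholders.
HAIKU = "claude-haiku-4-5-20251001"
SONNET = "claude-sonnet-4-5"  # assumed identifier; pin a dated version in production

SONNET_TASK_TYPES = {"code_generation", "multi_step_planning"}

def model_for_task(task_type: str) -> str:
    """Route a queued task to the cheapest model that can handle it."""
    return SONNET if task_type in SONNET_TASK_TYPES else HAIKU
```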
Why --max-turns 30? Analysis of actual daemon logs showed that well-formed agent tasks converge to a solution in 15–20 turns. Setting the limit at 30 provides a safety margin without sacrificing termination guarantees. Hitting the limit now logs a warning rather than silently spinning.
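The termination behavior described here reduces to a bounded loop: converge normally, warn when the ceiling is hit. A sketch in which `step` stands in for one agent turn:

```python
import logging

def run_bounded(step, max_turns: int = 30) -> int:
    """Run agent turns until `step` reports completion or the ceiling is hit.

    `step(turn)` should return True when the task is done. Returns the number
    of turns consumed; logs a warning instead of spinning forever.
    """
    for turn in range(1, max_turns + 1):
        if step(turn):
            return turn
    logging.warning("hit --max-turns ceiling (%d) without converging", max_turns)
    return max_turns
```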
Why not lower token injection? The ACTIVE.md context is necessary for the daemon to maintain state across task transitions. Rather than trim context, we opted for cheaper models and bounded iteration—a cleaner architectural choice.
Verification and Monitoring
After applying the fix, we verified:
- The daemon restarted successfully and began accepting tasks.
- Task completion time remained under 2 minutes per task (the 30-turn limit was not hit).
- CloudWatch logs confirmed Haiku model usage in all new sessions.
- Daily API bills dropped from ~$45 to ~$2–3 within 24 hours.
Ongoing monitoring uses CloudWatch alarms on Anthropic API spend (via billing integration) set to alert if daily cost exceeds $