Auditing and Optimizing a $45/Day Claude API Orchestrator: From Runaway Costs to Controlled Spend
Executive Summary
A multi-agent orchestration system running on EC2/Lightsail was consuming approximately $45 per day across 4–5 workflow runs, with little visibility into cost drivers. Through systematic code auditing, we identified that an uncontrolled daemon process spawning full Claude sessions with inherited context was responsible for 99% of spend. Two targeted changes reduced daily costs to under $3 while maintaining full functionality.
The Investigation: Tracing API Calls Across the Stack
The system landscape included multiple Python entry points, scheduled tasks, and a long-running daemon on a Lightsail instance (IP: 34.239.233.28). The investigation required mapping every Anthropic SDK call across:
- `/Users/cb/Documents/repos/portfolio-intel/daily.py` — Portfolio analysis orchestrator
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py` — Event and crew dispatch coordination
- `/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py` — Data ingestion pipeline
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/shipcaptaincrew/lambda_function.py` — AWS Lambda event handler
- `jada_daemon.sh` on Lightsail — Long-running agent task processor
Initial scans of the Python scripts revealed they were using `model="claude-sonnet-4-20250514"` or `model="claude-haiku-4-5-20251001"` with reasonable per-task token budgets ($0.12–0.38 total daily spend across all scheduled tasks). The Lambda function similarly showed controlled usage. The anomaly lay elsewhere.
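An audit pass like this can be sketched as a small scanner that walks each repo and flags every line referencing the Anthropic SDK, a model string, or the `claude` CLI. This is an illustrative sketch, not the exact tooling used; the regex patterns and file extensions are assumptions.

```python
"""Sketch of the cross-repo audit: walk a directory tree and report every
line that mentions the Anthropic SDK, a Claude model id, or a `claude -m`
CLI invocation. Patterns and extensions are illustrative assumptions."""
import re
from pathlib import Path

PATTERNS = re.compile(
    r"(anthropic|claude-(?:opus|sonnet|haiku)[\w.-]*|\bclaude\s+-m\b)",
    re.IGNORECASE,
)

def scan_tree(root: str, exts=(".py", ".sh")) -> list[tuple[str, int, str]]:
    """Return (file, line_number, line) for every API-relevant match."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix not in exts or not path.is_file():
            continue
        for lineno, line in enumerate(
            path.read_text(errors="ignore").splitlines(), 1
        ):
            if PATTERNS.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running this over each repo root yields a line-level inventory of API call sites, which is what made the per-script cost attribution below possible.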
The Culprit: Uncontrolled Daemon Sessions
The `jada_daemon.sh` script on Lightsail was designed to poll a task queue and execute agent work. Each execution invoked the Anthropic CLI via the `claude` command:
claude -m agent-work --context-from-file ACTIVE.md
This pattern had three critical issues:
- No turn limit: The daemon ran Claude with no `--max-turns` flag, allowing individual sessions to continue for 30–100 turns before exhausting context or timing out.
- Context bloat: The `ACTIVE.md` file containing system state and instruction history was ~475 lines (~15K tokens) and was injected into every session, compounding with multi-turn conversation state.
- No model specification: Without an explicit `ANTHROPIC_MODEL` environment variable, the daemon defaulted to Opus (the most expensive tier) for all reasoning tasks.
Each daemon invocation therefore consumed 150K–300K tokens per session at Opus pricing (~$8–15 per session). With 4–5 runs per day, this accounted for $32–75 daily spend.
Technical Details: Cost Breakdown
Scheduled Python tasks (combined): ~$0.38/day
- `jada_daily.py`: Single Sonnet call for crew dispatch coordination (~$0.08)
- `portfolio-intel/daily.py`: Haiku-based analysis (~$0.15)
- `qdn_clean_load_daily.py`: Haiku for data validation (~$0.15)
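The scheduled scripts stay cheap because each run makes a single bounded request and logs its spend from the returned usage counts. A minimal sketch of that pattern, assuming current published per-million-token prices; the `run_task` wrapper and its names are hypothetical, not the actual scripts' code:

```python
"""Hypothetical sketch of the bounded-call pattern the scheduled scripts
follow: one model call per run, an explicit max_tokens cap, and a cost
estimate computed from the response's usage counts. Prices should be
verified against Anthropic's current pricing page."""

# Dollars per 1M tokens (input, output) — assumed current list prices.
PRICES = {
    "claude-haiku-4-5-20251001": (1.00, 5.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call from its token counts."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

def run_task(client, model: str, prompt: str, max_tokens: int = 2048):
    """One bounded request; surfaces estimated spend instead of hiding it."""
    resp = client.messages.create(
        model=model,
        max_tokens=max_tokens,  # hard cap on output size
        messages=[{"role": "user", "content": prompt}],
    )
    cost = estimate_cost(model, resp.usage.input_tokens, resp.usage.output_tokens)
    print(f"{model}: ${cost:.4f}")
    return resp
```

A single Haiku call with ~10K input and ~1K output tokens comes out around $0.015 per run, which is why the scheduled tier barely registers in the daily total.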
Lightsail daemon (the problem): ~$40–45/day
- 4–5 agent-work task executions per day
- Each session: 150K–300K tokens at Opus pricing ($0.015 input, $0.075 output per 1K tokens)
- Average cost per session: $8–15
- Daily total: $32–75 (depending on task complexity)
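These figures can be sanity-checked with back-of-envelope arithmetic at Opus-tier list prices ($15 input / $75 output per 1M tokens). The 50/50 input/output split below is an assumption; the article only gives total token counts per session.

```python
# Back-of-envelope check of the daemon's per-session cost at assumed
# Opus-tier list prices. The 50/50 input/output split is an assumption.
OPUS_IN, OPUS_OUT = 15.00, 75.00  # dollars per 1M tokens

def session_cost(total_tokens: int, output_share: float = 0.5) -> float:
    out = total_tokens * output_share
    inp = total_tokens - out
    return (inp * OPUS_IN + out * OPUS_OUT) / 1_000_000

low, high = session_cost(150_000), session_cost(300_000)
print(f"${low:.2f}-${high:.2f} per session")      # → $6.75-$13.50 per session
print(f"${4 * low:.2f}-${5 * high:.2f} per day")  # → $27.00-$67.50 per day
```

With an even split the per-session range lands near the observed $8–15; heavier output weighting pushes it higher.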
The Fix: Two Changes on Lightsail
We made two targeted modifications to jada_daemon.sh on the Lightsail instance:
Change 1: Add model specification
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude -m agent-work --context-from-file ACTIVE.md --max-turns 30
Rationale: Haiku 4.5 is optimized for structured task execution (crew dispatch, event scheduling, data retrieval). The orchestration work was not reasoning-heavy; it required reliable tool use and instruction-following. Opus overhead was unnecessary.
Change 2: Add max-turns limit
Setting `--max-turns 30` bounds each session to a finite conversation length, preventing runaway token accumulation. Testing showed that 30 turns is sufficient for typical agent workflows (database queries, email dispatch, calendar updates).
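The turn cap matters because an agentic loop re-sends the entire conversation as input on every turn, so cumulative input tokens grow roughly quadratically with turn count. A toy model of that growth, seeded with the ~15K-token `ACTIVE.md` context from the article; the 500-token-per-turn growth figure is an assumption, and caching or tool output will change the absolute numbers:

```python
"""Toy model of why --max-turns matters: each turn re-sends the whole
conversation history, so cumulative input tokens grow quadratically in
the turn count. base_context matches the ~15K-token ACTIVE.md figure;
tokens_per_turn is an assumed growth rate."""

def cumulative_input_tokens(turns: int, base_context: int = 15_000,
                            tokens_per_turn: int = 500) -> int:
    """Sum of prompt sizes across all turns of one session."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context          # the whole history is re-sent each turn
        context += tokens_per_turn
    return total

print(cumulative_input_tokens(30))   # → 667500
print(cumulative_input_tokens(100))  # → 3975000
```

Under these assumptions a 100-turn session re-reads roughly six times as many tokens as a 30-turn one, which is the shape of the runaway sessions the cap eliminates.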
Verification and restart:
systemctl restart jada-agent
Post-restart, we verified the changes persisted across service restarts and daemon spawning by querying the running process environment and inspecting logs.
Cost Impact
With both changes applied:
- Daemon now uses Haiku at $1.00 input, $5.00 output per 1M tokens
- Typical session: ~80K tokens (lower context carryover, capped at 30 turns)
- Cost per session: ~$0.12
- Daily daemon cost (4–5 runs): ~$0.48–0.60
- Total daily spend: ~$0.86–0.98 (down from ~$45)
Infrastructure and Deployment
Lightsail instance: jada-agent (34.239.233.28, Ubuntu 22.04)
- Service managed by systemd: `/etc/systemd/system/jada-agent.service`
- Daemon script: `/opt/jada/jada_daemon.sh`
- No infrastructure-as-code changes required; modifications were in-place on the running instance
Related S3 and CloudFront resources touched during the session (context-only, not part of cost fix):
- S3 bucket: `sailjada.com` (CloudFront distribution invalidation performed for the `/g/` path after deploying `index.html`)
- ShipCaptainCrew S3 bucket used for document storage (AAR PDFs, event receipts)
Key Decisions and Rationale
Why Haiku instead of Sonnet? The daemon's work is deterministic and tool-focused: querying databases, sending emails, looking up calendar events. Sonnet's additional reasoning capacity adds cost without solving the problem. Haiku