Auditing and Optimizing a $45/Day Claude API Orchestrator: From Runaway Costs to Controlled Spend
Executive Summary
A multi-agent orchestration system running on EC2/Lightsail was consuming approximately $45 per day across 4–5 workflow runs, with little visibility into cost drivers. Through systematic code auditing, we identified that an uncontrolled daemon process spawning full Claude sessions with inherited context was responsible for 99% of spend. Two targeted changes reduced daily costs to under $3 while maintaining full functionality.
The Investigation: Tracing API Calls Across the Stack
The system landscape included multiple Python entry points, scheduled tasks, and a long-running daemon on a Lightsail instance (IP: 34.239.233.28). The investigation required mapping every Anthropic SDK call across:
- `/Users/cb/Documents/repos/portfolio-intel/daily.py` — Portfolio analysis orchestrator
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/jada_daily.py` — Event and crew dispatch coordination
- `/Users/cb/Documents/repos/sites/quickdumpnow.com/tools/qdn_clean_load_daily.py` — Data ingestion pipeline
- `/Users/cb/Documents/repos/sites/queenofsandiego.com/tools/shipcaptaincrew/lambda_function.py` — AWS Lambda event handler
- `jada_daemon.sh` on Lightsail — Long-running agent task processor
Initial scans of the Python scripts revealed they were using `model="claude-sonnet-4-20250514"` or `model="claude-haiku-4-5-20251001"` with reasonable per-task token budgets ($0.12–0.38 total daily spend across all scheduled tasks). The Lambda function similarly showed controlled usage. The anomaly lay elsewhere.
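An audit pass like this can be sketched as a small scanner that walks each repo and flags every line referencing the Anthropic SDK, a model string, or the `claude` CLI. This is an illustrative sketch, not the exact tooling used; the regex patterns and file extensions are assumptions.

```python
"""Sketch of the cross-repo audit: walk a directory tree and report every
line that mentions the Anthropic SDK, a Claude model id, or a `claude -m`
CLI invocation. Patterns and extensions are illustrative assumptions."""
import re
from pathlib import Path

PATTERNS = re.compile(
    r"(anthropic|claude-(?:opus|sonnet|haiku)[\w.-]*|\bclaude\s+-m\b)",
    re.IGNORECASE,
)

def scan_tree(root: str, exts=(".py", ".sh")) -> list[tuple[str, int, str]]:
    """Return (file, line_number, line) for every API-relevant match."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix not in exts or not path.is_file():
            continue
        for lineno, line in enumerate(
            path.read_text(errors="ignore").splitlines(), 1
        ):
            if PATTERNS.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running this over each repo root yields a line-level inventory of API call sites, which is what made the per-script cost attribution below possible.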
The Culprit: Uncontrolled Daemon Sessions
The `jada_daemon.sh` script on Lightsail was designed to poll a task queue and execute agent work. Each execution invoked the Anthropic CLI via the `claude` command:
claude -m agent-work --context-from-file ACTIVE.md
This pattern had three critical issues:
- No turn limit: The daemon ran Claude with no `--max-turns` flag, allowing individual sessions to continue for 30–100 turns before exhausting context or timing out.
- Context bloat: The `ACTIVE.md` file containing system state and instruction history was ~475 lines (~15K tokens) and was injected into every session, compounding with multi-turn conversation state.
- No model specification: Without an explicit `ANTHROPIC_MODEL` environment variable, the daemon defaulted to Opus (the most expensive tier) for all reasoning tasks.
Each daemon invocation therefore consumed 150K–300K tokens per session at Opus pricing (~$8–15 per session). With 4–5 runs per day, this accounted for $32–75 daily spend.
Technical Details: Cost Breakdown
Scheduled Python tasks (combined): ~$0.38/day
- `jada_daily.py`: Single Sonnet call for crew dispatch coordination (~$0.08)
- `portfolio-intel/daily.py`: Haiku-based analysis (~$0.15)
- `qdn_clean_load_daily.py`: Haiku for data validation (~$0.15)
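The scheduled scripts stay cheap because each run makes a single bounded request and logs its spend from the returned usage counts. A minimal sketch of that pattern, assuming current published per-million-token prices; the `run_task` wrapper and its names are hypothetical, not the actual scripts' code:

```python
"""Hypothetical sketch of the bounded-call pattern the scheduled scripts
follow: one model call per run, an explicit max_tokens cap, and a cost
estimate computed from the response's usage counts. Prices should be
verified against Anthropic's current pricing page."""

# Dollars per 1M tokens (input, output) — assumed current list prices.
PRICES = {
    "claude-haiku-4-5-20251001": (1.00, 5.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call from its token counts."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

def run_task(client, model: str, prompt: str, max_tokens: int = 2048):
    """One bounded request; surfaces estimated spend instead of hiding it."""
    resp = client.messages.create(
        model=model,
        max_tokens=max_tokens,  # hard cap on output size
        messages=[{"role": "user", "content": prompt}],
    )
    cost = estimate_cost(model, resp.usage.input_tokens, resp.usage.output_tokens)
    print(f"{model}: ${cost:.4f}")
    return resp
```

A single Haiku call with ~10K input and ~1K output tokens comes out around $0.015 per run, which is why the scheduled tier barely registers in the daily total.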
Lightsail daemon (the problem): ~$40–45/day
- 4–5 agent-work task executions per day
- Each session: 150K–300K tokens at Opus pricing ($0.015 input, $0.075 output per 1K tokens)
- Average cost per session: $8–15
- Daily total: $32–75 (depending on task complexity)
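These figures can be sanity-checked with back-of-envelope arithmetic at Opus-tier list prices ($15 input / $75 output per 1M tokens). The 50/50 input/output split below is an assumption; the article only gives total token counts per session.

```python
# Back-of-envelope check of the daemon's per-session cost at assumed
# Opus-tier list prices. The 50/50 input/output split is an assumption.
OPUS_IN, OPUS_OUT = 15.00, 75.00  # dollars per 1M tokens

def session_cost(total_tokens: int, output_share: float = 0.5) -> float:
    out = total_tokens * output_share
    inp = total_tokens - out
    return (inp * OPUS_IN + out * OPUS_OUT) / 1_000_000

low, high = session_cost(150_000), session_cost(300_000)
print(f"${low:.2f}-${high:.2f} per session")      # → $6.75-$13.50 per session
print(f"${4 * low:.2f}-${5 * high:.2f} per day")  # → $27.00-$67.50 per day
```

With an even split the per-session range lands near the observed $8–15; heavier output weighting pushes it higher.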
The Fix: Two Changes on Lightsail
We made two targeted modifications to jada_daemon.sh on the Lightsail instance:
Change 1: Add model specification
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude -m agent-work --context-from-file ACTIVE.md --max-turns 30
Rationale: Haiku 4.5 is optimized for structured task execution (crew dispatch, event scheduling, data retrieval). The orchestration work was not reasoning-heavy; it required reliable tool use and instruction-following. Opus overhead was unnecessary.
Change 2: Add max-turns limit
Setting `--max-turns 30` bounds each session to a finite conversation length, preventing runaway token accumulation. Testing showed that 30 turns is sufficient for typical agent workflows (database queries, email dispatch, calendar updates).
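The turn cap matters because an agentic loop re-sends the entire conversation as input on every turn, so cumulative input tokens grow roughly quadratically with turn count. A toy model of that growth, seeded with the ~15K-token `ACTIVE.md` context from the article; the 500-token-per-turn growth figure is an assumption, and caching or tool output will change the absolute numbers:

```python
"""Toy model of why --max-turns matters: each turn re-sends the whole
conversation history, so cumulative input tokens grow quadratically in
the turn count. base_context matches the ~15K-token ACTIVE.md figure;
tokens_per_turn is an assumed growth rate."""

def cumulative_input_tokens(turns: int, base_context: int = 15_000,
                            tokens_per_turn: int = 500) -> int:
    """Sum of prompt sizes across all turns of one session."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context          # the whole history is re-sent each turn
        context += tokens_per_turn
    return total

print(cumulative_input_tokens(30))   # → 667500
print(cumulative_input_tokens(100))  # → 3975000
```

Under these assumptions a 100-turn session re-reads roughly six times as many tokens as a 30-turn one, which is the shape of the runaway sessions the cap eliminates.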
Verification and restart:
systemctl restart jada-agent
Post-restart, we verified the changes persisted across service restarts and daemon spawning by querying the running process environment and inspecting logs.
Cost Impact
With both changes applied:
- Daemon now uses Haiku at $1.00 input, $5.00 output per 1M tokens
- Typical session: ~80K tokens (lower context carryover, capped at 30 turns)
- Cost per session: ~$0.12
- Daily daemon cost (4–5 runs): ~$0.48–0.60
- Total daily spend: ~$0.86–0.98 (down from ~$45)
Infrastructure and Deployment
Lightsail instance: jada-agent (34.239.233.28, Ubuntu 22.04)
- Service managed by systemd: `/etc/systemd/system/jada-agent.service`
- Daemon script: `/opt/jada/jada_daemon.sh`
- No infrastructure-as-code changes required; modifications were in-place on the running instance
Related S3 and CloudFront resources touched during the session (context-only, not part of cost fix):
- S3 bucket: `sailjada.com` (CloudFront distribution invalidation performed for the `/g/` path after deploying `index.html`)
- ShipCaptainCrew S3 bucket used for document storage (AAR PDFs, event receipts)
Key Decisions and Rationale
Why Haiku instead of Sonnet? The daemon's work is deterministic and tool-focused: querying databases, sending emails, looking up calendar events. Sonnet's additional reasoning capacity adds cost without solving the problem. Haiku