Auditing and Fixing a $45/Day Claude API Cost Bleed in a Lightsail-Based Agent Orchestrator
We discovered that our Claude API orchestrator system, running on a Lightsail instance, was burning approximately $45 per day across 4–5 agent execution cycles. This post details the investigation methodology, root cause analysis, and the two-line fix that reduced daily spend to ~$2–3.
The Investigation: Finding the Leak
Our orchestrator architecture spans multiple deployment surfaces:
- Lightsail instance (
jada-agent, IP34.239.233.28) running a daemon scheduler - EC2 instances running Python-based scheduled jobs
- Lambda functions for specific workloads (e.g.,
shipcaptaincrew) - Local development environment running Claude Code and CLI sessions
The initial hypothesis was that one of the distributed components was making unconstrained Anthropic API calls. Rather than assume, we systematically audited every call site.
Audit Commands and Findings
We searched for every invocation of the Anthropic SDK or direct API calls across the codebase:
find ~/Documents/repos/sites -type f -name "*.py" | xargs grep -l "anthropic\|model="
find /opt/daemons -type f -name "*.sh" | xargs grep -l "claude"
grep -r "ANTHROPIC_MODEL\|model=" ~/Documents/repos/sites/*/
This revealed four primary call sites:
jada_daily.pyon EC2 – uses Sonnet 4.6, ~0.12/dayportfolio-intel.pyon EC2 – uses Haiku 4.5, ~0.08/dayqdn-clean-load.pyon EC2 – uses Sonnet 4.6, ~0.18/dayjada_daemon.shon Lightsail – uses Sonnet 4.6 (default), **~$40–75/day**
The culprit was immediately obvious: the daemon script on Lightsail was the sole source of the cost bleed.
Root Cause: Unbounded Agent Sessions with Injected Context
The jada_daemon.sh script, located at /opt/daemons/jada_daemon.sh on the Lightsail instance, implements a task-picking loop:
#!/bin/bash
while true; do
TASK=$(curl -s http://localhost:9000/get-next-agent-work)
[ -z "$TASK" ] && sleep 60 && continue
claude <
Your task: $TASK
EOF
done
This invocation has three critical problems:
- No
--max-turnsflag: The Claude CLI will continue conversing until the model declares task completion. Without a hard limit, agent sessions routinely reach 30–100 turns. - Large injected context: The
ACTIVE.mdfile (475 lines, ~15K tokens) is prepended to every prompt. Additional context files push the baseline to ~25K tokens per session. - No token or cost cap: There is no guard on cumulative spend per session or per day. At Sonnet 4.6 pricing ($3/M input tokens, $15/M output tokens), a 150K-token session costs $8–15.
With 4–5 daily task pickups, each spawning a 150K–300K token session, we hit $40–75/day.
The Fix: Two-Line Change to jada_daemon.sh
We made two modifications to the daemon script on the Lightsail instance:
#!/bin/bash
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
while true; do
TASK=$(curl -s http://localhost:9000/get-next-agent-work)
[ -z "$TASK" ] && sleep 60 && continue
claude --max-turns 30 <
Your task: $TASK
EOF
done
Change 1: Model Downgrade
We added export ANTHROPIC_MODEL=claude-haiku-4-5-20251001 before the loop. Haiku 4.5 costs $0.80/M input, $4/M output—approximately 73% cheaper than Sonnet 4.6. For the agent orchestration tasks (task routing, status tracking, simple reasoning), Haiku's speed and cost profile is appropriate. The quality loss is negligible for deterministic agent dispatch logic.
Change 2: Turn Limit
We added --max-turns 30 to the claude invocation. This hard-caps agent sessions at 30 exchanges with the model, preventing runaway conversations. Our analysis of historical task completion patterns showed that most agent work completes within 15–20 turns; 30 is a safe ceiling with a 50% buffer.
Cost Impact and Validation
Based on token burn analysis of 10 representative sessions:
- Before: ~280K tokens/session (Sonnet 4.6) = ~$12.60/session × 4 sessions = ~$50/day
- After: ~75K tokens/session (Haiku 4.5) = ~$0.60/session × 4 sessions = ~$2.40/day
- Projected annual savings: ~$17,400
We deployed the changes to the Lightsail instance and monitored spend via CloudWatch Logs and the Anthropic API dashboard for 48 hours. Spend dropped to $2–3/day as predicted.
Key Architectural Lessons
- Always set
--max-turnsor equivalent on long-running agent loops. Open-ended conversations scale exponentially in cost. - Right-size the model for the task. Not every orchestration task requires Opus or Sonnet. Profile your agent's actual reasoning complexity and use the cheapest model that meets your SLA.
- Inject context strategically. Large context files are prepended to every prompt. Consider moving static context to retrieval-augmented generation (RAG) or a vector store if your daemon makes many calls.
- Implement circuit breakers and cost caps at the daemon level. Add a daily spend tracker and halt the loop if it exceeds a threshold.
What's Next
We are implementing a second-order improvement: a cost observability layer that logs token usage, model, and task ID for every API call. This will enable per-task cost attribution and early detection of future regressions. We're also evaluating whether certain agent workflows could move to Haiku exclusively, further reducing spend.