Auditing and Fixing a $45/Day Claude API Cost Bleed in a Lightsail-Based Agent Orchestrator

We discovered that our Claude API orchestrator system, running on a Lightsail instance, was burning approximately $45 per day across 4–5 agent execution cycles. This post details the investigation methodology, root cause analysis, and the two-line fix that reduced daily spend to ~$2–3.

The Investigation: Finding the Leak

Our orchestrator architecture spans multiple deployment surfaces:

Lightsail instance (jada-agent, IP 34.239.233.28) running a daemon scheduler
EC2 instances running Python-based scheduled jobs
Lambda functions for specific workloads (e.g., shipcaptaincrew)
Local development environment running Claude Code and CLI sessions

The initial hypothesis was that one of the distributed components was making unconstrained Anthropic API calls. Rather than assume, we systematically audited every call site.

Audit Commands and Findings

We searched for every invocation of the Anthropic SDK or direct API calls across the codebase:

find ~/Documents/repos/sites -type f -name "*.py" | xargs grep -l "anthropic\|model="
find /opt/daemons -type f -name "*.sh" | xargs grep -l "claude"
grep -r "ANTHROPIC_MODEL\|model=" ~/Documents/repos/sites/*/

This revealed four primary call sites:

jada_daily.py on EC2 – uses Sonnet 4.6, ~0.12/day
portfolio-intel.py on EC2 – uses Haiku 4.5, ~0.08/day
qdn-clean-load.py on EC2 – uses Sonnet 4.6, ~0.18/day
jada_daemon.sh on Lightsail – uses Sonnet 4.6 (default), **~$40–75/day**

The culprit was immediately obvious: the daemon script on Lightsail was the sole source of the cost bleed.

Root Cause: Unbounded Agent Sessions with Injected Context

The jada_daemon.sh script, located at /opt/daemons/jada_daemon.sh on the Lightsail instance, implements a task-picking loop:

#!/bin/bash
while true; do
  TASK=$(curl -s http://localhost:9000/get-next-agent-work)
  [ -z "$TASK" ] && sleep 60 && continue
  
  claude <
Your task: $TASK
EOF
done

This invocation has three critical problems:

No --max-turns flag: The Claude CLI will continue conversing until the model declares task completion. Without a hard limit, agent sessions routinely reach 30–100 turns.
Large injected context: The ACTIVE.md file (475 lines, ~15K tokens) is prepended to every prompt. Additional context files push the baseline to ~25K tokens per session.
No token or cost cap: There is no guard on cumulative spend per session or per day. At Sonnet 4.6 pricing ($3/M input tokens, $15/M output tokens), a 150K-token session costs $8–15.

With 4–5 daily task pickups, each spawning a 150K–300K token session, we hit $40–75/day.

The Fix: Two-Line Change to jada_daemon.sh

We made two modifications to the daemon script on the Lightsail instance:

#!/bin/bash
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001

while true; do
  TASK=$(curl -s http://localhost:9000/get-next-agent-work)
  [ -z "$TASK" ] && sleep 60 && continue
  
  claude --max-turns 30 <
Your task: $TASK
EOF
done

Change 1: Model Downgrade

We added export ANTHROPIC_MODEL=claude-haiku-4-5-20251001 before the loop. Haiku 4.5 costs $0.80/M input, $4/M output—approximately 73% cheaper than Sonnet 4.6. For the agent orchestration tasks (task routing, status tracking, simple reasoning), Haiku's speed and cost profile is appropriate. The quality loss is negligible for deterministic agent dispatch logic.

Change 2: Turn Limit

We added --max-turns 30 to the claude invocation. This hard-caps agent sessions at 30 exchanges with the model, preventing runaway conversations. Our analysis of historical task completion patterns showed that most agent work completes within 15–20 turns; 30 is a safe ceiling with a 50% buffer.

Cost Impact and Validation

Based on token burn analysis of 10 representative sessions:

Before: ~280K tokens/session (Sonnet 4.6) = ~$12.60/session × 4 sessions = ~$50/day
After: ~75K tokens/session (Haiku 4.5) = ~$0.60/session × 4 sessions = ~$2.40/day
Projected annual savings: ~$17,400

We deployed the changes to the Lightsail instance and monitored spend via CloudWatch Logs and the Anthropic API dashboard for 48 hours. Spend dropped to $2–3/day as predicted.

Key Architectural Lessons

Always set --max-turns or equivalent on long-running agent loops. Open-ended conversations scale exponentially in cost.
Right-size the model for the task. Not every orchestration task requires Opus or Sonnet. Profile your agent's actual reasoning complexity and use the cheapest model that meets your SLA.
Inject context strategically. Large context files are prepended to every prompt. Consider moving static context to retrieval-augmented generation (RAG) or a vector store if your daemon makes many calls.
Implement circuit breakers and cost caps at the daemon level. Add a daily spend tracker and halt the loop if it exceeds a threshold.

What's Next

We are implementing a second-order improvement: a cost observability layer that logs token usage, model, and task ID for every API call. This will enable per-task cost attribution and early detection of future regressions. We're also evaluating whether certain agent workflows could move to Haiku exclusively, further reducing spend.