Cutting Claude API Costs 95% on the Jada Agent Daemon: A Token Audit and Fix
We discovered our orchestrator daemon was burning $45/day on Claude API calls—roughly 15x more than intended. The root cause: unbounded context injection and no model targeting strategy. Here's how we diagnosed the bleed and cut it to $2–3/day.
The Problem: Runaway Token Consumption
The jada-agent service running on our Lightsail instance (34.239.233.28) was spawning 4–5 Claude CLI sessions per day, each consuming 150K–300K tokens over 30–100 turns. At Claude 3.5 Sonnet pricing (~$3 per million input tokens, ~$15 per million output), a single 250K-token session cost $8–15.
The culprit: the daemon shell script /opt/jada/jada_daemon.sh was invoking the Anthropic CLI with:
claude -c /opt/jada/ACTIVE.md < task.txt
No model specification. No turn limit. No context pruning. The system context file alone (ACTIVE.md) contained ~475 lines of structured data—roughly 15K tokens—injected into every request.
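A quick way to ballpark a context file's token weight is a word count times ~1.3 tokens per word. That ratio is a heuristic, not a tokenizer (the authoritative number comes from the API's usage report), and the stand-in file below exists only so the sketch runs anywhere:

```shell
# Rough token estimate for a context file. The 1.3 tokens-per-word ratio
# is a heuristic; ACTIVE_MD defaults to a small stand-in file here.
ACTIVE_MD="${ACTIVE_MD:-/tmp/ACTIVE.md}"
printf 'charter: Jennifer Sanderson\ndate: May 12\n' > "$ACTIVE_MD"
words=$(wc -w < "$ACTIVE_MD")
approx_tokens=$(( words * 13 / 10 ))   # integer arithmetic: words * 1.3
echo "approx tokens: $approx_tokens"
```

Scaled to ACTIVE.md's ~475 lines of structured data, the same heuristic lands in the ~15K-token range reported above.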
Technical Diagnosis
We ran a comprehensive audit across all Claude-calling code paths:
- Lightsail daemon: `/opt/jada/jada_daemon.sh` — invoking the bare `claude` CLI with no model flag, defaulting to Claude 3.5 Sonnet (the most expensive non-Opus option)
- Lambda orchestrator: `/opt/jada/shipcaptaincrew/lambda_function.py` — using `model="claude-3-5-sonnet-20241022"` for calendar queries and document processing
- Daily scripts: `jada_daily.py` and `qdn_clean_load_daily.py` — using Sonnet, running once/day, costing ~$0.38/day combined (negligible)
- Portfolio intelligence: `portfolio-intel/daily.py` — Haiku model, minimal cost (~$0.02/day)
The scheduled Python jobs were fine. The daemon was the problem.
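An audit like this can start with a recursive grep for client call sites across shell and Python sources. A sketch of that sweep — `ROOT` defaults to a stand-in tree with one planted file so the example runs outside the instance; on the box it would point at `/opt/jada`:

```shell
# Sweep *.sh and *.py sources for Claude/Anthropic invocations.
ROOT="${ROOT:-/tmp/jada-audit}"
mkdir -p "$ROOT"
printf 'claude -c /opt/jada/ACTIVE.md < task.txt\n' > "$ROOT/jada_daemon.sh"
# -r recurse, -l list matching files, -i case-insensitive
hits=$(grep -rli --include='*.sh' --include='*.py' -e 'claude' -e 'anthropic' "$ROOT")
echo "$hits"
```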
Root Cause: Unbounded Turns + Heavy Context
The CLI session had no --max-turns parameter, so a single daemon task could spiral into 50, 80, even 100 back-and-forth turns before exhausting itself or hitting a soft failure. Each turn carried the full 15K-token context injection, plus accumulated conversation history.
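To see why turns, not tasks, drive the bill: each turn re-sends the static context plus all accumulated history, so billed input tokens grow roughly quadratically with turn count. A back-of-envelope sketch — the 2K-tokens-per-turn history growth is our illustrative assumption, not a measured figure:

```shell
# Billed input tokens for an N-turn session: every turn resends the 15K
# static context plus all conversation history accumulated so far.
CONTEXT=15000   # static context injected each turn (from the audit)
PER_TURN=2000   # assumed average new tokens added per turn (illustrative)
TURNS=50
billed=0; history=0
for ((t = 1; t <= TURNS; t++)); do
  billed=$(( billed + CONTEXT + history ))
  history=$(( history + PER_TURN ))
done
echo "billed input tokens: $billed"
# Input-side cost at Sonnet's ~$3 per million input tokens, integer dollars
echo "approx input cost: \$$(( billed * 3 / 1000000 ))"
```

At 50 turns this lands around 3.2M billed input tokens — roughly $9–10 of input cost before any output tokens, consistent with the $8–15 per-session figure above.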
Because no model was specified, the CLI defaulted to Claude 3.5 Sonnet—appropriate for complex reasoning but overkill for the daemon's primary tasks (calendar lookups, crew dispatch coordination, document fetching).
The Fix: Two-Line Change
We modified /opt/jada/jada_daemon.sh:
#!/bin/bash
# Before changes:
claude -c /opt/jada/ACTIVE.md < "$task_file"
# After changes:
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude -c /opt/jada/ACTIVE.md --max-turns 30 < "$task_file"
Why Claude Haiku 4.5? The daemon handles three categories of work:
- Calendar queries: "Find Jennifer Sanderson charter on May 12" — information retrieval, no reasoning
- Email generation: Crew dispatch notifications — template-driven, high volume, low complexity
- Database lookups: Fetch event details from DynamoDB — structured data retrieval and formatting
None of these require Sonnet's reasoning capability. At ~$0.80 per million input tokens versus ~$3 for Sonnet, Haiku cuts input cost by roughly 73%, and it's more than fast enough for daemon work that isn't latency-sensitive anyway.
Why --max-turns 30? Most daemon tasks resolve in 3–8 turns. A hard cap at 30 prevents pathological loops (e.g., malformed API responses causing retry spirals) while leaving headroom for legitimate multi-step orchestration.
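A wall-clock bound pairs well with the turn cap. The wrapper below is our suggestion, not part of the deployed fix (`run_task` and `TASK_TIMEOUT` are illustrative names); `timeout` exits 124 when it kills the command, which makes hung tasks easy to spot in the journal:

```shell
# Optional second guard: bound wall-clock time as well as turns.
run_task() {
  timeout "${TASK_TIMEOUT:-15m}" "$@"
  local rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "task killed after ${TASK_TIMEOUT:-15m}" >&2
  fi
  return "$rc"
}

# On the instance this would wrap the claude invocation, e.g.:
#   run_task claude -c /opt/jada/ACTIVE.md --max-turns 30 < "$task_file"
# Here a no-op stands in so the sketch is runnable anywhere.
run_task true
echo "exit: $?"
```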
Deployment and Verification
We deployed the changes directly on the Lightsail instance:
# SSH into 34.239.233.28
ssh ec2-user@34.239.233.28
# Edit the daemon script
sudo vi /opt/jada/jada_daemon.sh
# Apply the two changes (model export + --max-turns flag)
# Then restart the service:
sudo systemctl restart jada-agent
# Verify the daemon is running:
sudo systemctl status jada-agent
sudo journalctl -u jada-agent -f
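Before trusting the restart, it's worth confirming both edits actually landed in the script. A minimal check — `SCRIPT` defaults to a stand-in copy so the sketch runs anywhere; on the instance it would point at `/opt/jada/jada_daemon.sh` (and the here-doc would be omitted):

```shell
# Post-deploy sanity check: confirm both changes are present in the script.
SCRIPT="${SCRIPT:-/tmp/jada_daemon.sh}"
cat > "$SCRIPT" <<'EOF'
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude -c /opt/jada/ACTIVE.md --max-turns 30 < "$task_file"
EOF
# `--` stops option parsing so the leading-dash pattern isn't read as a flag
if grep -q 'ANTHROPIC_MODEL' "$SCRIPT" && grep -q -- '--max-turns' "$SCRIPT"; then
  status="both changes present"
else
  status="missing a change"
fi
echo "$status"
```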
We then monitored three complete daily cycles to confirm:
- Daemon processes completed normally
- No tasks hit the turn limit (max observed: 18 turns)
- Crew dispatch emails were generated and sent correctly
- Calendar queries returned accurate results
Cost Impact: Before and After
| Component | Before | After | Savings |
|---|---|---|---|
| Daemon (Lightsail) | $40–45/day | $1.50–2/day | 95% |
| Scheduled scripts | $0.38/day | $0.38/day | — |
| Total | ~$45/day | ~$2–3/day | 94% |
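The table's savings figure is straightforward to reproduce. Integer cents avoid shell floating point; the $2.50 midpoint is our choice within the $2–3/day range:

```shell
# Percentage savings from the before/after daily costs, in integer cents.
before=4500   # ~$45/day
after=250     # midpoint of the ~$2–3/day range
savings=$(( (before - after) * 100 / before ))
echo "savings: ${savings}%"
```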
Architecture Lessons
This incident highlights three patterns to watch:
- Context bloat: Injecting 15K tokens of system context into every request means every additional turn re-pays that cost; prune or summarize static context before it enters a conversational loop