Cutting Claude API Costs 95% on the Jada Agent Daemon: A Token Audit and Fix
We discovered our orchestrator daemon was burning $45/day on Claude API calls—roughly 15x more than intended. The root cause: unbounded context injection and no model targeting strategy. Here's how we diagnosed the bleed and cut it to $2–3/day.
The Problem: Runaway Token Consumption
The jada-agent service running on our Lightsail instance (34.239.233.28) was spawning 4–5 Claude CLI sessions per day, each consuming 150K–300K tokens over 30–100 turns. At Claude 3.5 Sonnet pricing (~$3 per million input tokens, ~$15 per million output), a single 250K-token session cost $8–15.
The culprit: the daemon shell script /opt/jada/jada_daemon.sh was invoking the Anthropic CLI with:
claude -c /opt/jada/ACTIVE.md < task.txt
No model specification. No turn limit. No context pruning. The system context file alone (ACTIVE.md) contained ~475 lines of structured data—roughly 15K tokens—injected into every request.
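A quick way to ballpark a context file's token weight is a word count times ~1.3 tokens per word. That ratio is a heuristic, not a tokenizer (the authoritative number comes from the API's usage report), and the stand-in file below exists only so the sketch runs anywhere:

```shell
# Rough token estimate for a context file. The 1.3 tokens-per-word ratio
# is a heuristic; ACTIVE_MD defaults to a small stand-in file here.
ACTIVE_MD="${ACTIVE_MD:-/tmp/ACTIVE.md}"
printf 'charter: Jennifer Sanderson\ndate: May 12\n' > "$ACTIVE_MD"
words=$(wc -w < "$ACTIVE_MD")
approx_tokens=$(( words * 13 / 10 ))   # integer arithmetic: words * 1.3
echo "approx tokens: $approx_tokens"
```

Scaled to ACTIVE.md's ~475 lines of structured data, the same heuristic lands in the ~15K-token range reported above.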
Technical Diagnosis
We ran a comprehensive audit across all Claude-calling code paths:
- Lightsail daemon: `/opt/jada/jada_daemon.sh` — invoking the bare `claude` CLI with no model flag, defaulting to Claude 3.5 Sonnet (the most expensive non-Opus option)
- Lambda orchestrator: `/opt/jada/shipcaptaincrew/lambda_function.py` — using `model="claude-3-5-sonnet-20241022"` for calendar queries and document processing
- Daily scripts: `jada_daily.py` and `qdn_clean_load_daily.py` — using Sonnet, running once/day, costing ~$0.38/day combined (negligible)
- Portfolio intelligence: `portfolio-intel/daily.py` — Haiku model, minimal cost (~$0.02/day)
The scheduled Python jobs were fine. The daemon was the problem.
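An audit like this can start with a recursive grep for client call sites across shell and Python sources. A sketch of that sweep — `ROOT` defaults to a stand-in tree with one planted file so the example runs outside the instance; on the box it would point at `/opt/jada`:

```shell
# Sweep *.sh and *.py sources for Claude/Anthropic invocations.
ROOT="${ROOT:-/tmp/jada-audit}"
mkdir -p "$ROOT"
printf 'claude -c /opt/jada/ACTIVE.md < task.txt\n' > "$ROOT/jada_daemon.sh"
# -r recurse, -l list matching files, -i case-insensitive
hits=$(grep -rli --include='*.sh' --include='*.py' -e 'claude' -e 'anthropic' "$ROOT")
echo "$hits"
```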
Root Cause: Unbounded Turns + Heavy Context
The CLI session had no --max-turns parameter, so a single daemon task could spiral into 50, 80, even 100 back-and-forth turns before exhausting itself or hitting a soft failure. Each turn carried the full 15K-token context injection, plus accumulated conversation history.
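To see why turns, not tasks, drive the bill: each turn re-sends the static context plus all accumulated history, so billed input tokens grow roughly quadratically with turn count. A back-of-envelope sketch — the 2K-tokens-per-turn history growth is our illustrative assumption, not a measured figure:

```shell
# Billed input tokens for an N-turn session: every turn resends the 15K
# static context plus all conversation history accumulated so far.
CONTEXT=15000   # static context injected each turn (from the audit)
PER_TURN=2000   # assumed average new tokens added per turn (illustrative)
TURNS=50
billed=0; history=0
for ((t = 1; t <= TURNS; t++)); do
  billed=$(( billed + CONTEXT + history ))
  history=$(( history + PER_TURN ))
done
echo "billed input tokens: $billed"
# Input-side cost at Sonnet's ~$3 per million input tokens, integer dollars
echo "approx input cost: \$$(( billed * 3 / 1000000 ))"
```

At 50 turns this lands around 3.2M billed input tokens — roughly $9–10 of input cost before any output tokens, consistent with the $8–15 per-session figure above.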
Because no model was specified, the CLI defaulted to Claude 3.5 Sonnet—appropriate for complex reasoning but overkill for the daemon's primary tasks (calendar lookups, crew dispatch coordination, document fetching).
The Fix: Two-Line Change
We modified /opt/jada/jada_daemon.sh:
#!/bin/bash
# Before changes:
claude -c /opt/jada/ACTIVE.md < "$task_file"
# After changes:
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude -c /opt/jada/ACTIVE.md --max-turns 30 < "$task_file"
Why Claude Haiku 4.5? The daemon handles three categories of work:
- Calendar queries: "Find Jennifer Sanderson charter on May 12" — information retrieval, no reasoning
- Email generation: Crew dispatch notifications — template-driven, high volume, low complexity
- Database lookups: Fetch event details from DynamoDB — structured data retrieval and formatting
None of these require Sonnet's reasoning capability. At ~$0.80 per million input tokens versus ~$3 for Sonnet, Haiku cuts input cost by roughly 73%, and it's more than fast enough for daemon work that isn't latency-sensitive anyway.
Why --max-turns 30? Most daemon tasks resolve in 3–8 turns. A hard cap at 30 prevents pathological loops (e.g., malformed API responses causing retry spirals) while leaving headroom for legitimate multi-step orchestration.
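A wall-clock bound pairs well with the turn cap. The wrapper below is our suggestion, not part of the deployed fix (`run_task` and `TASK_TIMEOUT` are illustrative names); `timeout` exits 124 when it kills the command, which makes hung tasks easy to spot in the journal:

```shell
# Optional second guard: bound wall-clock time as well as turns.
run_task() {
  timeout "${TASK_TIMEOUT:-15m}" "$@"
  local rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "task killed after ${TASK_TIMEOUT:-15m}" >&2
  fi
  return "$rc"
}

# On the instance this would wrap the claude invocation, e.g.:
#   run_task claude -c /opt/jada/ACTIVE.md --max-turns 30 < "$task_file"
# Here a no-op stands in so the sketch is runnable anywhere.
run_task true
echo "exit: $?"
```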
Deployment and Verification
We deployed the changes directly on the Lightsail instance:
# SSH into 34.239.233.28
ssh ec2-user@34.239.233.28
# Edit the daemon script
sudo vi /opt/jada/jada_daemon.sh
# Apply the two changes (model export + --max-turns flag)
# Then restart the service:
sudo systemctl restart jada-agent
# Verify the daemon is running:
sudo systemctl status jada-agent
sudo journalctl -u jada-agent -f
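Before trusting the restart, it's worth confirming both edits actually landed in the script. A minimal check — `SCRIPT` defaults to a stand-in copy so the sketch runs anywhere; on the instance it would point at `/opt/jada/jada_daemon.sh` (and the here-doc would be omitted):

```shell
# Post-deploy sanity check: confirm both changes are present in the script.
SCRIPT="${SCRIPT:-/tmp/jada_daemon.sh}"
cat > "$SCRIPT" <<'EOF'
export ANTHROPIC_MODEL=claude-haiku-4-5-20251001
claude -c /opt/jada/ACTIVE.md --max-turns 30 < "$task_file"
EOF
# `--` stops option parsing so the leading-dash pattern isn't read as a flag
if grep -q 'ANTHROPIC_MODEL' "$SCRIPT" && grep -q -- '--max-turns' "$SCRIPT"; then
  status="both changes present"
else
  status="missing a change"
fi
echo "$status"
```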
We then monitored three complete daily cycles to confirm:
- Daemon processes completed normally
- No tasks hit the turn limit (max observed: 18 turns)
- Crew dispatch emails were generated and sent correctly
- Calendar queries returned accurate results
Cost Impact: Before and After
| Component | Before | After | Savings |
|---|---|---|---|
| Daemon (Lightsail) | $40–45/day | $1.50–2/day | 95% |
| Scheduled scripts | $0.38/day | $0.38/day | — |
| Total | ~$45/day | ~$2–3/day | 94% |
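The table's savings figure is straightforward to reproduce. Integer cents avoid shell floating point; the $2.50 midpoint is our choice within the $2–3/day range:

```shell
# Percentage savings from the before/after daily costs, in integer cents.
before=4500   # ~$45/day
after=250     # midpoint of the ~$2–3/day range
savings=$(( (before - after) * 100 / before ))
echo "savings: ${savings}%"
```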
Architecture Lessons
This incident highlights three patterns to watch:
- Context bloat: Injecting 15K tokens of system context into every request means every additional turn re-pays that cost; prune or summarize static context before it enters a conversational loop