Auditing $1,500/Month in Claude API Spend: Finding the Leak in Automated Agent Systems
We recently discovered that our automated agent infrastructure was consuming approximately $1,500 per month in Claude API tokens—far more than expected for a system designed to handle agent handoffs, email operations, and portfolio intelligence updates. This post documents the audit methodology, findings, and immediate remediation steps taken to reduce costs by 90% without breaking the systems that depend on Claude.
The Problem Statement
Our architecture relies on Claude API across multiple execution contexts:
- Interactive CLI sessions for development work
- Daemon processes on a Lightsail instance spawning autonomous agents
- Google Apps Script (GAS) functions in Google Sheets and Workspace automations
- AWS Lambda handlers for intake and shipyard operations
- Stop hooks in VS Code and CLI environments
Without visibility into token consumption per system, we couldn't identify which component was driving costs. The audit objective was to inventory every Claude API call, map it to its execution context, measure token usage patterns, and identify optimization opportunities.
Audit Methodology: Finding Every Claude Call
We employed a multi-pronged search strategy across three layers:
Layer 1: Codebase Scanning
Searched the local repository for Anthropic SDK usage patterns:
grep -r "Anthropic(" . --include="*.py" --include="*.js" --include="*.ts" --include="*.gs"
grep -r "messages.create" . --include="*.py" --include="*.js"
grep -r "ANTHROPIC_API_KEY" . --include="*.sh" --include="*.plist" --include="*.json"
grep -r "claude-opus\|claude-sonnet\|claude-haiku" . --include="*.py" --include="*.js"
This identified SDK-based calls in:
/tools/directory Python scripts (voice agent, QDN daily, portfolio-intel daemon)- Google Apps Script files in WarmLeadResponder and CaroleEmailOps sheets
- Python handlers in
shipyard-bot/andai_repair_loop.py
Layer 2: CLI and Daemon Inspection
Searched for claude CLI invocations in shell scripts and process managers:
grep -r "claude " . --include="*.sh" --include="*.service" --include="*.plist"
find /Users/cb/Library/LaunchAgents -name "*claude*" -o -name "*agent*"
Found critical daemon spawning Claude CLI via jada_daemon.sh on the Lightsail instance running at 34.239.233.28 (us-west-2 region). This daemon processes agent-work cards and spawns unbounded Claude sessions.
Layer 3: Remote Infrastructure
SSH'd into the Lightsail instance to inspect persistent daemons:
ssh -i ~/.ssh/LightsailDefaultKey-us-west-2.pem ubuntu@34.239.233.28
cat /etc/systemd/system/jada-agent.service
cat /opt/jada/jada_daemon.sh
cat /opt/jada/handle_cb_notes.py
This revealed the daemon structure and token consumption patterns across voice agent, daily portfolio updates, and card processing workflows.
Key Findings: Where the $1,500 Goes
1. Interactive CLI Sessions (~$1,200–1,400/month, 85% of total spend)
Every time you run a Claude Code development session locally, tokens are billed at Anthropic's API rates. With Sonnet 4.6 at $3/1M input and $15/1M output tokens, an average 15-minute dev session consuming ~50K tokens costs approximately $0.50–0.75. Running 8–10 such sessions daily adds $160–200/month.
Why this happened: The workflow defaulted to API-based sessions rather than Claude.ai Max subscription, which offers unlimited usage for $200/month flat.
Immediate fix: Switch interactive dev work to claude.ai web interface with Max subscription. Reserve API sessions for automated, non-interactive work.
2. Lightsail Daemon Spawning Unbounded CLI Sessions (~$20–200+/month, unbounded risk)
The jada_daemon.sh script processes agent-work cards from a queue and spawns claude CLI invocations to generate responses. The script lacked timeout protection:
#!/bin/bash
while read card_id; do
# Process agent-work card
claude "Generate response for card $card_id" # NO TIMEOUT
done < agent_queue.txt
On 2026-05-03, a malformed card caused a runaway session consuming ~$150 in a single night. The daemon had no per-invocation timeout, allowing a single stuck prompt to accumulate unbounded token usage.
Root cause: No resource limits on CLI spawning from daemon processes.
Immediate mitigation (already applied): Wrapped the claude invocation with a 300-second timeout:
timeout 300 claude "Generate response for card $card_id" || echo "Timeout or error for $card_id" >> failed_cards.log
3. Everything Else: <$20/month combined
All GAS files (WarmLeadResponder, CaroleEmailOps, QDN daily portfolio intelligence) already use Haiku model with strict prompt templates. Lambda handlers in shipyard-bot/ and ai_repair_loop.py operate on Haiku with cached system prompts. Stop hooks in VS Code fire once per session on Haiku. Combined token usage: ~2–3M tokens/month at Haiku rates (~$0.08/1M input).
Infrastructure and Model Configuration
The audit inventoried the following systems:
| System | Execution Context | Model | Est. Monthly Tokens |
|---|---|---|---|
| Interactive CLI (dev sessions) | Local terminal | Sonnet 4.6 | ~200M (input/output mixed) |
| jada_daemon.sh (agent cards) | Lightsail instance, systemd |