Auditing $1,500/Month Claude API Spend: Finding the Leak in Automated Agent Systems
We were spending approximately $1,500 per month on Claude API tokens across a distributed system of automated agents, development workflows, and scheduled jobs. The goal: identify where the money was going and cut it to 1/10–1/20 of current spend without breaking production systems.
This post documents the audit methodology, findings, and immediate fixes we deployed.
The Audit Strategy: Following the Token Trail
Rather than guessing, we systematically inventoried every system making Claude API calls. The approach:
- Grep the entire codebase for Anthropic SDK imports, API key references, and model identifiers
- Trace shell scripts and daemons that invoke the `claude` CLI
- SSH into production boxes and inspect running processes, daemon logs, and systemd service configurations
- Read Google Apps Script (GAS) files that call the Anthropic API via
UrlFetchApp - Document each system with: name, purpose, location, model used, frequency, and estimated monthly tokens
This was a read-only audit—no changes to production during discovery.
Technical Details: What We Found
1. Interactive Claude Code CLI Sessions (~85% of spend)
The largest leak: billing at Claude API rates for every local development session using the claude CLI.
- Where:
/Users/cb/Documents/repos/notes/and local development environments - Model: Claude Sonnet 4.6
- How: Each
claudeinvocation during dev work hits the Anthropic API directly, charged per token - Cost: Estimated $1,200–1,400/month
- Why: The Claude Code editor integrates with the Anthropic API and bills usage at commercial API rates rather than subscription rates
The fix: Switch development workflows to claude.ai with a Max subscription (~$100–200/month flat). This moves billing from per-token to a fixed monthly fee, cutting this category by ~90%. For local automation that requires API access, use Haiku (our low-cost model) instead.
2. Lightsail Daemon Spawning Claude CLI (~$20–200+, unbounded)
The jada-agent daemon on Lightsail (IP 34.239.233.28, us-west-2 region) spawns claude CLI processes to handle agent-work cards. Location: jada_daemon.sh and systemd service jada-agent.service.
- What it does: Reads work cards from a queue, invokes
claudeto generate responses, writes results to SES/email - Model: Defaults to Sonnet (no explicit
--model haikuflag observed in early runs) - Risk: No timeout on the subprocess. On 2026-05-03, a runaway process generated multiple pages of output before being killed manually
- Estimated monthly cost: Normally $20–50 per month, but unbounded due to missing timeout protection
The critical fix: Add a timeout wrapper to prevent runaway processes. Before the claude invocation in jada_daemon.sh, prepend:
timeout 300 claude --model haiku message "..."
This caps any single invocation to 5 minutes and ensures the process terminates even if Claude generates excessive output or the daemon hangs.
3. Everything Else (Stop Hooks, GAS, Lambda, Scheduled Jobs) — Minimal Spend
- Claude Code Stop hooks in
settings.json: Already using Haiku. Cost: <$5/month. - Google Apps Script files (WarmLeadResponder, CaroleEmailOps, QDN daily, portfolio-intel): All using Haiku or GPT-4 (non-Claude). Cost: <$10/month combined.
- tech_blog Stop hook script (
/Users/cb/.cursor/rules/tech_blog): Haiku. Cost: <$2/month. - ai_repair_loop, shipyard-bot, intake handlers: All Haiku or batch-processed. Cost: <$5/month combined.
These systems are already optimized. Minor further savings via prompt caching or the Batch API are possible but immaterial at this scale.
Audit Methodology & Commands
To replicate this audit:
Find Python files using Anthropic SDK:
grep -r "from anthropic import" --include="*.py" /Users/cb/Documents/repos
grep -r "Anthropic(" --include="*.py" /Users/cb/Documents/repos
grep -r "messages.create" --include="*.py" /Users/cb/Documents/repos
Find JavaScript/TypeScript using Anthropic:
grep -r "require.*anthropic\|import.*anthropic" --include="*.js" --include="*.gs" --include="*.ts" /Users/cb/Documents/repos
Find shell and config files with Anthropic references:
grep -r "ANTHROPIC_API_KEY\|claude\|anthropic" --include="*.sh" --include="*.plist" --include="*.service" /Users/cb/Documents/repos
find /Library/LaunchAgents -name "*claude*" -o -name "*anthropic*"
SSH into Lightsail and inspect the daemon:
ssh -i ~/.ssh/LightsailDefaultKey-us-west-2.pem ubuntu@34.239.233.28
cat /path/to/jada_daemon.sh
systemctl status jada-agent
journalctl -u jada-agent -n 100
Key Decisions & Trade-offs
- Keep Haiku for low-stakes automation: Haiku's input cost is ~90% cheaper than Sonnet. All recurring scheduled jobs should default to Haiku unless accuracy is critical.
- Switch dev workflows to subscription model: Paying $100–200/month for unlimited development is cheaper than $1,200+/month for per-token billing at development volumes.
- Add timeout protection immediately: The runaway daemon incident proves that unbounded CLI invocations are a financial and operational risk. Timeouts are free risk mitigation.
- Use Batch API for non-urgent work: Jobs that can tolerate 24-hour latency (like nightly reports) should use the Batch API, which offers a 50% discount. We haven't deployed this yet but should.
What's Next
Immediate (this week):
- Deploy timeout wrapper to
jada_daemon.sh