Auditing $1,500/Month Claude API Spend: Finding the Leak in Automated Agent Systems

```html

We were spending approximately $1,500 per month on Claude API tokens across a distributed system of automated agents, development workflows, and scheduled jobs. The goal: identify where the money was going and cut it to 1/10–1/20 of current spend without breaking production systems.

This post documents the audit methodology, findings, and immediate fixes we deployed.

The Audit Strategy: Following the Token Trail

Rather than guessing, we systematically inventoried every system making Claude API calls. The approach:

Grep the entire codebase for Anthropic SDK imports, API key references, and model identifiers
Trace shell scripts and daemons that invoke the `claude` CLI
SSH into production boxes and inspect running processes, daemon logs, and systemd service configurations
Read Google Apps Script (GAS) files that call the Anthropic API via UrlFetchApp
Document each system with: name, purpose, location, model used, frequency, and estimated monthly tokens
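One way to keep that inventory consistent is a small shell helper that appends one CSV row per audited system. This is a minimal sketch: the function name `record_system`, the `INVENTORY_FILE` variable, and the column layout are hypothetical, and the example row uses a placeholder token estimate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one audited system to a CSV inventory.
# INVENTORY_FILE and the column layout are illustrative assumptions.
INVENTORY_FILE="${INVENTORY_FILE:-claude_audit_inventory.csv}"

record_system() {
  # Expect exactly six fields, in the order listed in the header below.
  if [ "$#" -ne 6 ]; then
    echo "usage: record_system NAME PURPOSE LOCATION MODEL FREQUENCY EST_MONTHLY_TOKENS" >&2
    return 2
  fi
  # Write the header once, when the file is missing or empty.
  if [ ! -s "$INVENTORY_FILE" ]; then
    echo 'name,purpose,location,model,frequency,est_monthly_tokens' > "$INVENTORY_FILE"
  fi
  # Join the arguments with commas. Fields must not contain commas themselves.
  local IFS=,
  printf '%s\n' "$*" >> "$INVENTORY_FILE"
}

# Example (placeholder token estimate):
# record_system jada-agent "agent-work cards" jada_daemon.sh sonnet "per card" 2000000
```

Writing only to a local file keeps the helper compatible with a read-only audit of production.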

This was a read-only audit: we made no changes to production during discovery.

Technical Details: What We Found

1. Interactive Claude Code CLI Sessions (~85% of spend)

The largest leak: billing at Claude API rates for every local development session using the claude CLI.

Where: /Users/cb/Documents/repos/notes/ and local development environments
Model: Claude Sonnet 4.6
How: Each claude invocation during dev work hits the Anthropic API directly, charged per token
Cost: Estimated $1,200–1,400/month
Why: Claude Code is a CLI that calls the Anthropic API. When it is authenticated with an API key, usage is billed at per-token API rates rather than covered by a subscription

The fix: Move development workflows to claude.ai under a Max subscription (~$100–200/month flat). This replaces per-token billing with a fixed monthly fee and cuts this category by roughly 85–90%. For local automation that still requires API access, use Haiku, the low-cost model tier, instead of Sonnet.

2. Lightsail Daemon Spawning Claude CLI (~$20–200+, unbounded)

The jada-agent daemon on Lightsail (IP 34.239.233.28, us-west-2 region) spawns claude CLI processes to handle agent-work cards. Location: jada_daemon.sh and systemd service jada-agent.service.

What it does: Reads work cards from a queue, invokes claude to generate responses, writes results to SES/email
Model: Defaults to Sonnet (no explicit --model haiku flag observed in early runs)
Risk: No timeout on the subprocess. On 2026-05-03, a runaway process generated multiple pages of output before being killed manually
Estimated monthly cost: Normally $20–50 per month, but unbounded due to missing timeout protection

The critical fix: Add a timeout wrapper to prevent runaway processes. In jada_daemon.sh, prefix the claude invocation with timeout, run it non-interactively, and pin the model explicitly:

timeout 300 claude -p --model haiku "..."

This caps any single invocation at 5 minutes. Once 300 seconds elapse, timeout sends SIGTERM to the claude process, so a run that generates excessive output or hangs can no longer bill indefinitely or block the daemon.
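A slightly more defensive variant of the wrapper can also cap how much output is kept and report whether the timeout fired. The sketch below assumes GNU coreutils `timeout` (which exits with status 124 when the limit is hit) and bash; the function name `run_capped` and the 10-second kill grace period are illustrative choices, not part of the existing daemon.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with a wall-clock limit and an output cap.
# Usage: run_capped SECONDS MAX_BYTES CMD [ARGS...]
run_capped() {
  local seconds="$1" max_bytes="$2"
  shift 2
  # -k 10: if the command ignores SIGTERM, send SIGKILL 10 seconds later.
  # head -c truncates stdout so a runaway response cannot grow without bound.
  timeout -k 10 "$seconds" "$@" | head -c "$max_bytes"
  # Return the wrapped command's status (124 means the timeout fired),
  # not the status of head.
  return "${PIPESTATUS[0]}"
}

# Example (keep whatever arguments the daemon already passes to claude):
# run_capped 300 65536 claude ...
# if [ "$?" -eq 124 ]; then echo "claude timed out" >&2; fi
```

Checking for status 124 lets the daemon log a timeout separately from an ordinary failure, which makes a repeat of the runaway incident easier to spot in journalctl.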

3. Everything Else (Stop Hooks, GAS, Lambda, Scheduled Jobs) — Minimal Spend

Claude Code Stop hooks in settings.json: Already using Haiku. Cost: <$5/month.
Google Apps Script files (WarmLeadResponder, CaroleEmailOps, QDN daily, portfolio-intel): All using Haiku or GPT-4 (non-Claude). Cost: <$10/month combined.
tech_blog Stop hook script (/Users/cb/.cursor/rules/tech_blog): Haiku. Cost: <$2/month.
ai_repair_loop, shipyard-bot, intake handlers: All Haiku or batch-processed. Cost: <$5/month combined.

These systems are already optimized. Minor further savings via prompt caching or the Batch API are possible but immaterial at this scale.

Audit Methodology & Commands

To replicate this audit:

Find Python files using Anthropic SDK:

grep -rE "from anthropic import|import anthropic" --include="*.py" /Users/cb/Documents/repos
grep -r "Anthropic(" --include="*.py" /Users/cb/Documents/repos
grep -r "messages.create" --include="*.py" /Users/cb/Documents/repos

Find JavaScript/TypeScript using Anthropic:

grep -r "require.*anthropic\|import.*anthropic" --include="*.js" --include="*.gs" --include="*.ts" /Users/cb/Documents/repos

Find shell and config files with Anthropic references:

grep -r "ANTHROPIC_API_KEY\|claude\|anthropic" --include="*.sh" --include="*.plist" --include="*.service" /Users/cb/Documents/repos
find /Library/LaunchAgents ~/Library/LaunchAgents \( -name "*claude*" -o -name "*anthropic*" \)
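The separate searches above can also be folded into a single pass that prints each matching file once. This is a rough sketch; the function name `scan_anthropic_refs` is hypothetical, and the broad `claude` pattern will produce false positives that need manual review.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory that mention the
# Anthropic SDK, an API key variable, or the claude CLI. Read-only.
# Usage: scan_anthropic_refs [DIR]   (defaults to the current directory)
scan_anthropic_refs() {
  local root="${1:-.}"
  # -r recurse, -l print file names only, -E extended regex.
  # --include limits the search to the file types audited in this post.
  grep -rlE \
    --include='*.py' --include='*.js' --include='*.ts' --include='*.gs' \
    --include='*.sh' --include='*.plist' --include='*.service' \
    -e 'ANTHROPIC_API_KEY|anthropic|claude' \
    "$root" 2>/dev/null | sort -u
}

# Example:
# scan_anthropic_refs /Users/cb/Documents/repos
```

The resulting file list is a starting point for the per-system inventory; each hit still has to be opened to confirm the model and call frequency.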

SSH into Lightsail and inspect the daemon:

ssh -i ~/.ssh/LightsailDefaultKey-us-west-2.pem ubuntu@34.239.233.28
cat /path/to/jada_daemon.sh
systemctl status jada-agent
journalctl -u jada-agent -n 100

Key Decisions & Trade-offs

Keep Haiku for low-stakes automation: Haiku's input cost is well below Sonnet's (up to ~90% cheaper, depending on the model versions compared). All recurring scheduled jobs should default to Haiku unless accuracy is critical.
Switch dev workflows to subscription model: Paying a flat $100–200/month for development, within the plan's usage limits, is cheaper than $1,200+/month in per-token billing at our development volumes.
Add timeout protection immediately: The runaway daemon incident proves that unbounded CLI invocations are a financial and operational risk. Timeouts are free risk mitigation.
Use Batch API for non-urgent work: Jobs that can tolerate 24-hour latency (like nightly reports) should use the Batch API, which offers a 50% discount. We haven't deployed this yet but should.
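These trade-offs come down to simple arithmetic: token volume times the per-million-token price, optionally reduced by the 50% Batch API discount, compared against a flat subscription fee. The sketch below is a back-of-the-envelope estimator; the function name `monthly_cost` is hypothetical, and the volumes and prices in the example comments are placeholders rather than measured figures or quoted rates.

```bash
#!/usr/bin/env bash
# Hypothetical estimator: monthly cost in USD from token volumes.
# Prices are USD per million tokens and must be supplied by the caller.
# Usage: monthly_cost INPUT_TOKENS OUTPUT_TOKENS IN_PRICE OUT_PRICE [DISCOUNT]
# DISCOUNT is a fraction between 0 and 1 (for example 0.5 for a 50% discount).
monthly_cost() {
  awk -v i="$1" -v o="$2" -v ip="$3" -v op="$4" -v d="${5:-0}" 'BEGIN {
    cost = (i / 1000000 * ip) + (o / 1000000 * op)
    printf "%.2f\n", cost * (1 - d)
  }'
}

# Example with made-up volumes and placeholder prices:
# monthly_cost 200000000 20000000 3 15        # per-token billing
# monthly_cost 200000000 20000000 3 15 0.5    # same volume with a 50% batch discount
```

Running the estimate per system, using the token counts from the inventory, shows which workloads are worth moving to a subscription, a cheaper model, or batch processing.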

What's Next

Immediate (this week):

Deploy timeout wrapper to jada_daemon.sh