```html

Auditing $1,500/Month Claude API Spend: Finding the Leak in Automated Agent Systems

We were spending approximately $1,500 per month on Claude API tokens across a distributed system of automated agents, development workflows, and scheduled jobs. The goal: identify where the money was going and cut spend to 1/10–1/20 of the current level (roughly $75–150 per month) without breaking production systems.

This post documents the audit methodology, findings, and immediate fixes we deployed.

The Audit Strategy: Following the Token Trail

Rather than guessing, we systematically inventoried every system making Claude API calls. The approach:

  • Grep the entire codebase for Anthropic SDK imports, API key references, and model identifiers
  • Trace shell scripts and daemons that invoke the `claude` CLI
  • SSH into production boxes and inspect running processes, daemon logs, and systemd service configurations
  • Read Google Apps Script (GAS) files that call the Anthropic API via UrlFetchApp
  • Document each system with: name, purpose, location, model used, frequency, and estimated monthly tokens

This was a read-only audit—no changes to production during discovery.
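One way to keep the per-system record described above is a tab-separated file with one row per system, which makes it easy to total estimated tokens per model. The following is a minimal sketch, not the tooling we used: the function name `tokens_by_model`, the column order, and the sample rows are all illustrative assumptions, and the rows are made-up placeholders rather than figures from the audit.

```shell
# Sum estimated monthly tokens per model from a tab-separated inventory.
# Assumed columns (with a header row):
#   name, purpose, location, model, frequency, est_monthly_tokens
tokens_by_model() {
  awk -F'\t' '
    NR > 1 { total[$4] += $6 }                        # skip header, group by model
    END { for (m in total) printf "%s\t%.0f\n", m, total[m] }
  ' "$1" | sort
}

# Demo with made-up placeholder rows (NOT numbers from the audit).
inventory=$(mktemp)
printf 'name\tpurpose\tlocation\tmodel\tfrequency\test_monthly_tokens\n' > "$inventory"
printf 'example-job-a\tplaceholder\t/path/to/a\thaiku\tdaily\t1000\n'  >> "$inventory"
printf 'example-job-b\tplaceholder\t/path/to/b\thaiku\thourly\t500\n'  >> "$inventory"
printf 'example-daemon\tplaceholder\t/path/to/c\tsonnet\ton-demand\t2000\n' >> "$inventory"

tokens_by_model "$inventory"
```

Sorting the totals by model makes it quick to see which model tier dominates before attaching any prices to the token counts.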

Technical Details: What We Found

1. Interactive Claude Code CLI Sessions (~85% of spend)

The largest leak: billing at Claude API rates for every local development session using the claude CLI.

  • Where: /Users/cb/Documents/repos/notes/ and local development environments
  • Model: Claude Sonnet 4.6
  • How: Each claude invocation during dev work hits the Anthropic API directly, charged per token
  • Cost: Estimated $1,200–1,400/month
  • Why: Claude Code is a CLI that calls the Anthropic API, and in this setup its usage is billed at commercial per-token API rates rather than against a flat subscription

The fix: Switch development workflows to claude.ai with a Max subscription (~$100–200/month flat). This moves billing from per-token to a fixed monthly fee, cutting this category by ~90%. For local automation that requires API access, use Haiku (the low-cost model tier) instead.
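As a sanity check on the ~90% figure, the saving can be worked out from the two ranges quoted above ($1,200–1,400/month per-token versus $100–200/month flat). This is a small sketch; the helper name `pct_saved` is our own invention for illustration.

```shell
# Percentage saved when moving from an old monthly cost to a new one.
# Usage: pct_saved OLD NEW
pct_saved() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

pct_saved 1200 200   # worst case: low per-token estimate, high subscription price -> 83.3
pct_saved 1400 100   # best case: high per-token estimate, low subscription price -> 92.9
```

The two bounds bracket the ~90% estimate, so the claim holds across the quoted ranges.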

2. Lightsail Daemon Spawning Claude CLI (~$20–200+, unbounded)

The jada-agent daemon on Lightsail (IP 34.239.233.28, us-west-2 region) spawns claude CLI processes to handle agent-work cards. Location: jada_daemon.sh and systemd service jada-agent.service.

  • What it does: Reads work cards from a queue, invokes claude to generate responses, writes results to SES/email
  • Model: Defaults to Sonnet (no explicit --model haiku flag observed in early runs)
  • Risk: No timeout on the subprocess. On 2026-05-03, a runaway process generated multiple pages of output before being killed manually
  • Estimated monthly cost: Normally $20–50 per month, but unbounded due to missing timeout protection

The critical fix: Add a timeout so a runaway process cannot run indefinitely. In jada_daemon.sh, prefix the claude invocation with timeout:

timeout 300 claude --model haiku message "..."

This caps any single invocation at 5 minutes (300 seconds): when the limit is reached, timeout sends SIGTERM to the process, so a run that generates excessive output or hangs is stopped instead of continuing indefinitely.
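If the daemon should also log when a cap is hit, the timeout call can be wrapped in a small function. The sketch below assumes GNU coreutils `timeout` (present on the Ubuntu box); the function name `run_capped` and the 10-second kill grace period are illustrative choices of ours, not something taken from jada_daemon.sh.

```shell
# Run a command under a hard time limit and report timeouts distinctly.
# Usage: run_capped SECONDS COMMAND [ARGS...]
run_capped() {
  local limit="$1"; shift
  local status=0
  # SIGTERM at the limit; SIGKILL 10 seconds later if the process ignores it.
  timeout --kill-after=10 "$limit" "$@" || status=$?
  # GNU timeout exits 124 when the limit was hit
  # (137 if it had to escalate to SIGKILL).
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}
```

In the daemon, the existing call would then become `run_capped 300 claude --model haiku message "..."`, and the non-zero exit status gives the caller a hook for marking the work card as failed rather than silently retrying.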

3. Everything Else (Stop Hooks, GAS, Lambda, Scheduled Jobs) — Minimal Spend

  • Claude Code Stop hooks in settings.json: Already using Haiku. Cost: <$5/month.
  • Google Apps Script files (WarmLeadResponder, CaroleEmailOps, QDN daily, portfolio-intel): All using Haiku or GPT-4 (non-Claude). Cost: <$10/month combined.
  • tech_blog Stop hook script (/Users/cb/.cursor/rules/tech_blog): Haiku. Cost: <$2/month.
  • ai_repair_loop, shipyard-bot, intake handlers: All Haiku or batch-processed. Cost: <$5/month combined.

These systems are already optimized. Minor further savings via prompt caching or the Batch API are possible but immaterial at this scale.

Audit Methodology & Commands

To replicate this audit:

Find Python files using Anthropic SDK:

grep -rE "from anthropic import|import anthropic" --include="*.py" /Users/cb/Documents/repos
grep -r "Anthropic(" --include="*.py" /Users/cb/Documents/repos
grep -r "messages.create" --include="*.py" /Users/cb/Documents/repos

Find JavaScript/TypeScript using Anthropic:

grep -r "require.*anthropic\|import.*anthropic" --include="*.js" --include="*.gs" --include="*.ts" /Users/cb/Documents/repos

Find shell and config files with Anthropic references:

grep -r "ANTHROPIC_API_KEY\|claude\|anthropic" --include="*.sh" --include="*.plist" --include="*.service" /Users/cb/Documents/repos
find /Library/LaunchAgents ~/Library/LaunchAgents \( -name "*claude*" -o -name "*anthropic*" \)
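Because the daemon finding came down to a claude call with no explicit model, a follow-up pass can flag such calls in shell scripts. This is a rough heuristic sketch (the function name `find_unpinned_claude_calls` is hypothetical): it checks one line at a time, so comments that mention claude or commands split across continuation lines will show up as false positives.

```shell
# List lines in *.sh files that invoke `claude` without a --model flag
# on the same line. Heuristic only: it inspects single lines.
# Usage: find_unpinned_claude_calls ROOT_DIR
find_unpinned_claude_calls() {
  local root="$1"
  # First grep: `claude` used as a command word (start of line, or after
  # whitespace or a shell separator). Second grep: drop lines pinning a model.
  grep -rnE '(^|[[:space:];|&(])claude[[:space:]]' --include='*.sh' "$root" \
    | grep -v -e '--model' || true
}
```

Running `find_unpinned_claude_calls /Users/cb/Documents/repos` prints `file:line:content` for each hit, which is a short list to review by hand.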

SSH into Lightsail and inspect the daemon:

ssh -i ~/.ssh/LightsailDefaultKey-us-west-2.pem ubuntu@34.239.233.28
cat /path/to/jada_daemon.sh
systemctl status jada-agent
journalctl -u jada-agent -n 100

Key Decisions & Trade-offs

  • Keep Haiku for low-stakes automation: By our estimate, Haiku's input cost is ~90% lower than Sonnet's, though the exact gap depends on which model versions are compared. All recurring scheduled jobs should default to Haiku unless accuracy is critical.
  • Switch dev workflows to subscription model: Paying a flat $100–200/month for development (within the plan's usage limits) is cheaper than $1,200+/month in per-token billing at our development volumes.
  • Add timeout protection immediately: The runaway daemon incident proves that unbounded CLI invocations are a financial and operational risk. Timeouts are free risk mitigation.
  • Use Batch API for non-urgent work: Jobs that can tolerate 24-hour latency (like nightly reports) should use the Batch API, which offers a 50% discount. We haven't deployed this yet but should.

What's Next

Immediate (this week):

  • Deploy timeout wrapper to jada_daemon.sh