Auditing $1,500/Month Claude API Spend: Finding the 85% Cost Driver
We discovered our Claude API bill had crept to ~$1,500/month with no clear visibility into which systems were responsible. This post documents the systematic audit process we used to inventory every Claude integration across our stack, identify the primary cost drivers, and propose targeted fixes to cut spend by 90% without breaking automation.
The Audit Approach: Read-Only Inventory
Our goal was discovery only—no changes during the audit phase. We needed to answer: which systems call Claude, how often, on what models, and at what token volumes?
We structured the search in three phases:
- Grep-based discovery: Find all Anthropic SDK imports, API key references, and model string literals across the codebase.
- Code inspection: Read the actual invocation patterns to determine model, token limits, and call frequency.
- Runtime inspection: SSH into production daemons and read config files, service logs, and token usage metrics.
Phase 1: Grepping the Repository
We scanned the monorepo at /Users/cb/Documents/repos/notes/ for several patterns:
find . -type f \( -name "*.py" -o -name "*.js" -o -name "*.ts" -o -name "*.gs" \) | xargs grep -l "anthropic\|Anthropic("
find . -type f \( -name "*.py" -o -name "*.js" \) | xargs grep -l "messages\.create\|claude-"
find . -type f \( -name "*.sh" -o -name "*.plist" \) | xargs grep -l "ANTHROPIC_API_KEY\|claude"
This identified Claude integrations in:
- Python SDK code: tools/ directory scripts using anthropic.Anthropic()
- Google Apps Script (GAS): WarmLeadResponder.gs, PortfolioIntel.gs, jada_daily.gs
- Shell daemons: LaunchAgent plists and Lightsail systemd units
- CLI invocations: Direct claude command calls in daemon scripts
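The three searches above can also be folded into a single pass. The sketch below is one way to do that, not the exact command we ran: find_claude_refs is a hypothetical helper name, and the merged pattern is slightly broader than the originals because it applies every pattern to every file type. It uses find's -exec ... + form so that paths containing spaces are handled safely.

```shell
# Sketch: list files that reference the Anthropic SDK, an API key, or a Claude
# model/CLI string. find_claude_refs is a hypothetical helper, not part of the audit.
find_claude_refs() {
  local root="${1:-.}"
  find "$root" -type f \
    \( -name '*.py' -o -name '*.js' -o -name '*.ts' -o -name '*.gs' \
       -o -name '*.sh' -o -name '*.plist' \) \
    -exec grep -lE 'anthropic|Anthropic\(|messages\.create|claude|ANTHROPIC_API_KEY' {} + |
    sort -u
}
```

Calling find_claude_refs with the repository root as its argument prints one matching path per line, which can then be read file by file in the next phase.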
Phase 2: Code Inspection—Model and Token Configuration
For each file found, we read the actual model selection and token limits. Key findings:
- CaroleEmailOps.py: Uses claude-sonnet-4-20250514 with max_tokens=2000 per invocation. This script handles email classification and runs on demand; nothing in the code suggests high volume.
- WarmLeadResponder.gs: Calls the Anthropic API from Google Apps Script with the default model (checked in code: claude-opus-4-1-20250805) but with reasonable bounds.
- PortfolioIntel.gs and jada_daily.gs: Both run daily, configured with claude-haiku-3-5-sonnet at max_tokens=1500. The cost per call is low, though daily runs add up over a month.
- Lightsail daemon scripts: jada_daemon.sh spawns the claude CLI with no timeout or token limits on agent-work card processing. This is the critical finding.
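Reading each file by hand works, but a quick grep can surface the model strings and token limits first. The following is a minimal sketch under our own assumptions: show_model_config is a hypothetical helper name, and the regular expression only catches literal claude-... strings and max_tokens=N or max_tokens: N settings, so values built dynamically would still need manual inspection.

```shell
# Sketch: print every Claude model string and max_tokens setting as file:line:match.
# show_model_config is a hypothetical helper name.
show_model_config() {
  grep -nHoE 'claude-[a-z0-9.-]+|max_tokens[[:space:]]*[=:][[:space:]]*[0-9]+' "$@"
}
```

Passing the files found in Phase 1 as arguments gives a compact list of which model and limit each call site uses.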
Phase 3: Runtime Inspection on Lightsail
We SSH'd into the Lightsail instance at 34.239.233.28 (us-west-2 region) using the read-only audit principal and inspected the active daemon:
ssh -i ~/.ssh/LightsailDefaultKey-us-west-2.pem ubuntu@34.239.233.28
cat /opt/jada/jada_daemon.sh
systemctl status jada-agent
The daemon invokes the claude CLI for each card in a queue without bounds:
while read card; do
claude "$JADA_PROMPT" < "$card"
done
No timeout. No token limit. If a prompt generates 50k tokens and 100 cards arrive in a day, that's 5M tokens—easily explaining runaway costs. A historical event from 2026-05-03 showed exactly this pattern: a single daemon run consumed $80+ from a stuck loop.
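To put a dollar figure on that hypothetical, here is a rough calculation. It is a sketch under stated assumptions: estimate_daily_cost is a hypothetical helper, and the rates are the approximate Sonnet rates quoted in the next section (~$3 per M input tokens, ~$15 per M output tokens). The real figure depends on which model the daemon uses and on the input/output split, neither of which we measured here.

```shell
# Sketch: rough daily cost of the unbounded daemon.
# Arguments: cards per day, tokens per card, USD per million tokens.
estimate_daily_cost() {
  awk -v cards="$1" -v tokens="$2" -v rate="$3" \
    'BEGIN { printf "%.2f\n", cards * tokens / 1000000 * rate }'
}

# 100 cards x 50k tokens, all billed at the ~$15/M output rate (upper bound)
estimate_daily_cost 100 50000 15   # prints 75.00
# Same volume, all billed at the ~$3/M input rate (lower bound)
estimate_daily_cost 100 50000 3    # prints 15.00
```

Under these assumptions a single busy day lands between $15 and $75, which is the same order of magnitude as the $80+ incident.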
Where the Money Actually Goes
~85% of spend ($1,200–1,400/month): Our interactive Claude Code sessions in the CLI. Each session bills at Anthropic API rates (~$3 per M input tokens, ~$15 per M output tokens for Sonnet 4.6). A typical engineering session might consume 200k–1M tokens across multiple conversation turns. At 1–2 sessions per day, this alone accounts for most of the monthly bill.
~10–15% ($150–225/month): The Lightsail daemon. Token usage is unbounded and depends entirely on queue depth and prompt length. The 2026-05-03 incident proved this is dangerous.
<5% (<$75/month): Everything else—Stop hooks, daily GAS scripts, Lambda handlers. Already configured to use Haiku or low-token-limit Sonnet.
The Critical Finding: Unbounded Daemon
The jada_daemon.sh file at /opt/jada/jada_daemon.sh on Lightsail is the only system without guardrails. It processes agent-work cards sequentially, calling claude for each. No timeout. No max_tokens override. No circuit breaker.
A single stuck prompt or a queue backlog can burn hundreds of dollars in minutes. The fix is straightforward:
# Replace the bare claude invocation inside the loop with:
timeout 300 claude "$JADA_PROMPT" < "$card" || echo "Card $card failed or timed out (300s limit)" >&2
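This prevents any single card from running longer than 5 minutes. A timeout limits wall-clock time rather than tokens directly, but it bounds how much a stuck card can consume and should cap the damage at a few dollars per incident.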
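The timeout covers a single stuck card but not a backlog of cards that all fail. A circuit breaker on consecutive failures is one way to cover that case. The loop below is a sketch only, not the contents of the real jada_daemon.sh: AGENT_CMD, CARD_TIMEOUT, MAX_FAILURES, and process_cards are hypothetical names, and it assumes a GNU-style timeout command that exits with status 124 when the limit is hit.

```shell
# Sketch of a guarded card loop. All names below are hypothetical.
AGENT_CMD="${AGENT_CMD:-claude}"      # command that processes one card from stdin
CARD_TIMEOUT="${CARD_TIMEOUT:-300}"   # seconds allowed per card
MAX_FAILURES="${MAX_FAILURES:-3}"     # consecutive failures before the loop stops

# Reads card paths from stdin, one per line.
process_cards() {
  local failures=0 status
  while IFS= read -r card; do
    if timeout "$CARD_TIMEOUT" "$AGENT_CMD" "$JADA_PROMPT" < "$card"; then
      failures=0                      # a success resets the breaker
    else
      status=$?
      failures=$((failures + 1))
      if [ "$status" -eq 124 ]; then
        echo "Card $card timed out after ${CARD_TIMEOUT}s" >&2
      else
        echo "Card $card failed with exit status $status" >&2
      fi
    fi
    if [ "$failures" -ge "$MAX_FAILURES" ]; then
      echo "Circuit breaker: $failures consecutive failures, stopping" >&2
      return 1
    fi
  done
  return 0
}
```

Stopping after a few consecutive failures means a broken prompt or an API outage halts the run instead of retrying its way through the whole queue. It still does not cap tokens per call; that would need a limit enforced by whatever the agent command itself supports.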
Key Decisions Made During the Audit
- Read-only audit first: We didn't change anything until we understood the cost structure. This prevented premature optimization or breaking production.
- Focus on the 85%, not the 15%: Interactive Claude Code sessions dwarf everything else. Moving that spend from per-token billing to a flat $100/month Max subscription is the lever.
- Isolate the tail risk: The daemon is small but dangerous. A timeout is a one-line fix that should ship immediately, independent of the larger strategy.
- Keep Haiku where it's already deployed: Daily GAS scripts and portfolio-intel are already on Haiku and produce solid results. No value in changing them.
What's Next
The full audit has been compiled and sent to stakeholders. The next phase is decision-making:
- Immediate (24 hours): Add timeout 300 to