Auditing $1,500/Month in Claude API Spend: Finding the Leak in Automated Agent Systems

```html

We recently discovered that our automated agent infrastructure was consuming approximately $1,500 per month in Claude API tokens, far more than expected for a system designed to handle agent handoffs, email operations, and portfolio intelligence updates. This post documents the audit methodology, the findings, and the immediate remediation steps we took to reduce costs by 90% without breaking the systems that depend on Claude.

The Problem Statement

Our architecture relies on the Claude API across multiple execution contexts:

Interactive CLI sessions for development work
Daemon processes on a Lightsail instance spawning autonomous agents
Google Apps Script (GAS) functions in Google Sheets and Workspace automations
AWS Lambda handlers for intake and shipyard operations
Stop hooks in VS Code and CLI environments

Without visibility into token consumption per system, we couldn't identify which component was driving costs. The audit objective was to inventory every Claude API call, map it to its execution context, measure token usage patterns, and identify optimization opportunities.
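A minimal sketch of the kind of per-system tally this objective implies. The `UsageRecord` type, the system labels, and the sample numbers are hypothetical and only illustrate the shape of the data; the Messages API does report `input_tokens` and `output_tokens` in each response's `usage` field, which is where the counts would come from in practice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    system: str          # hypothetical label for the execution context
    model: str           # model name as sent in the request
    input_tokens: int    # from the API response's usage.input_tokens
    output_tokens: int   # from the API response's usage.output_tokens


def tally_by_system(records):
    """Sum input tokens, output tokens, and call counts per system label."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "calls": 0})
    for rec in records:
        bucket = totals[rec.system]
        bucket["input_tokens"] += rec.input_tokens
        bucket["output_tokens"] += rec.output_tokens
        bucket["calls"] += 1
    return dict(totals)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the audit.
    sample = [
        UsageRecord("cli_dev", "sonnet", 40_000, 10_000),
        UsageRecord("cli_dev", "sonnet", 35_000, 12_000),
        UsageRecord("jada_daemon", "sonnet", 8_000, 2_000),
    ]
    for system, totals in sorted(tally_by_system(sample).items()):
        print(system, totals)
```

Logging one such record per call, at the point where each system invokes Claude, is enough to attribute spend to a component after the fact.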

Audit Methodology: Finding Every Claude Call

We employed a multi-pronged search strategy across three layers:

Layer 1: Codebase Scanning

Searched the local repository for Anthropic SDK usage patterns:

grep -r "Anthropic(" . --include="*.py" --include="*.js" --include="*.ts" --include="*.gs"
grep -r "messages.create" . --include="*.py" --include="*.js" 
grep -r "ANTHROPIC_API_KEY" . --include="*.sh" --include="*.plist" --include="*.json"
grep -r "claude-opus\|claude-sonnet\|claude-haiku" . --include="*.py" --include="*.js"

This identified SDK-based calls in:

/tools/ directory Python scripts (voice agent, QDN daily, portfolio-intel daemon)
Google Apps Script files in WarmLeadResponder and CaroleEmailOps sheets
Python handlers in shipyard-bot/ and ai_repair_loop.py
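The same scan can be sketched in Python so that hits are grouped per file rather than per pattern. This is an illustration, not the script we ran: the pattern names are our own labels, and unlike the grep commands above it applies every pattern to every listed extension.

```python
import re
from pathlib import Path

# Patterns mirror the grep searches above; the dictionary keys are hypothetical labels.
PATTERNS = {
    "sdk_client": re.compile(r"Anthropic\("),
    "messages_create": re.compile(r"messages\.create"),
    "api_key": re.compile(r"ANTHROPIC_API_KEY"),
    "model_name": re.compile(r"claude-(opus|sonnet|haiku)"),
}
EXTENSIONS = {".py", ".js", ".ts", ".gs", ".sh", ".plist", ".json"}


def scan_repo(root):
    """Return {relative_path: sorted list of pattern labels found in that file}."""
    root = Path(root)
    hits = {}
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        found = sorted(name for name, rx in PATTERNS.items() if rx.search(text))
        if found:
            hits[str(path.relative_to(root))] = found
    return hits


if __name__ == "__main__":
    for file_path, labels in sorted(scan_repo(".").items()):
        print(f"{file_path}: {', '.join(labels)}")
```

Grouping by file makes it easier to see which scripts both construct a client and name a model, which is the combination that identifies a billable call site.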

Layer 2: CLI and Daemon Inspection

Searched for claude CLI invocations in shell scripts and process managers:

grep -r "claude " . --include="*.sh" --include="*.service" --include="*.plist"
find /Users/cb/Library/LaunchAgents -name "*claude*" -o -name "*agent*"

We found a critical daemon, jada_daemon.sh, spawning the claude CLI on the Lightsail instance at 34.239.233.28 (us-west-2 region). This daemon processes agent-work cards and spawns Claude sessions with no bound on their duration or token usage.

Layer 3: Remote Infrastructure

SSH'd into the Lightsail instance to inspect persistent daemons:

ssh -i ~/.ssh/LightsailDefaultKey-us-west-2.pem ubuntu@34.239.233.28
cat /etc/systemd/system/jada-agent.service
cat /opt/jada/jada_daemon.sh
cat /opt/jada/handle_cb_notes.py

This revealed the daemon structure and token consumption patterns across voice agent, daily portfolio updates, and card processing workflows.

Key Findings: Where the $1,500 Goes

1. Interactive CLI Sessions (~$1,200–1,400/month, 85% of total spend)

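Every Claude Code development session we run locally is billed at Anthropic's API rates. With Sonnet 4.6 at $3/1M input and $15/1M output tokens, an average 15-minute dev session consuming ~50K tokens costs approximately $0.50–0.75, depending on the input/output mix. Running 8–10 such sessions daily adds $160–200/month. That per-session estimate covers only a fraction of the $1,200–1,400 attributed to this category, which suggests that much of the spend comes from sessions that are longer or more token-heavy than this average.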
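The per-session arithmetic can be checked with a short sketch. The rates are the $3 and $15 per 1M tokens quoted above; the function names, the input/output splits, and the 30-day month are our assumptions for illustration.

```python
def session_cost_usd(input_tokens, output_tokens, input_rate=3.0, output_rate=15.0):
    """Cost of one session in USD; rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def monthly_cost_usd(cost_per_session, sessions_per_day, days_per_month=30):
    """Monthly cost, assuming the same number of sessions every day."""
    return cost_per_session * sessions_per_day * days_per_month


if __name__ == "__main__":
    # Bounds for a 50K-token session: all input vs. all output.
    print(session_cost_usd(50_000, 0))   # lower bound, about $0.15
    print(session_cost_usd(0, 50_000))   # upper bound, about $0.75
    # Monthly range for 8-10 sessions/day at $0.50-$0.75 per session.
    print(monthly_cost_usd(0.50, 8), monthly_cost_usd(0.75, 10))
```

A 50K-token session therefore costs between about $0.15 and $0.75 depending on how output-heavy it is, and 8–10 sessions a day at $0.50–0.75 each works out to roughly $120–225 over a 30-day month, the same order of magnitude as the estimate above.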

Why this happened: The workflow defaulted to API-billed sessions rather than a Claude.ai Max subscription, which offers substantially higher usage limits for a flat $200/month. Max usage is still capped rather than unlimited, but the cost is fixed.

Immediate fix: Switch interactive dev work to the claude.ai web interface with a Max subscription. Reserve API sessions for automated, non-interactive work.

2. Lightsail Daemon Spawning Unbounded CLI Sessions (~$20–200+/month, unbounded risk)

The jada_daemon.sh script processes agent-work cards from a queue and spawns claude CLI invocations to generate responses. The script lacked timeout protection:

#!/bin/bash
while read card_id; do
  # Process agent-work card
  claude "Generate response for card $card_id"  # NO TIMEOUT
done < agent_queue.txt

On 2026-05-03, a malformed card caused a runaway session consuming ~$150 in a single night. The daemon had no per-invocation timeout, allowing a single stuck prompt to accumulate unbounded token usage.

Root cause: No resource limits on CLI spawning from daemon processes.

Immediate mitigation (already applied): Wrapped the claude invocation with a 300-second timeout:

timeout 300 claude "Generate response for card $card_id" || echo "Timeout or error for $card_id" >> failed_cards.log
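For daemons written in Python rather than shell, the same guard can be sketched with `subprocess.run` and its `timeout` argument. This is an illustration under assumptions, not the code running on the instance: `run_with_timeout`, `process_cards`, and the `build_cmd` callback are hypothetical names, and the log format simply mirrors the shell one-liner above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=300):
    """Run cmd and return (status, output); status is 'ok', 'error', or 'timeout'."""
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,          # the child is killed when this expires
            stdin=subprocess.DEVNULL,   # keep the child from reading the caller's stdin
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except FileNotFoundError:
        return "error", ""              # executable not found on PATH
    if proc.returncode != 0:
        return "error", proc.stderr
    return "ok", proc.stdout


def process_cards(card_ids, build_cmd, timeout_s=300, failed_log="failed_cards.log"):
    """Run one bounded command per card; log and return the cards that failed."""
    failed = []
    for card_id in card_ids:
        status, _ = run_with_timeout(build_cmd(card_id), timeout_s)
        if status != "ok":
            failed.append((card_id, status))
            with open(failed_log, "a", encoding="utf-8") as fh:
                fh.write(f"{status} for {card_id}\n")
    return failed
```

A caller would pass something like `lambda card_id: ["claude", f"Generate response for card {card_id}"]` as `build_cmd`, mirroring the invocation in the shell script. Distinguishing timeouts from other errors in the log makes it easier to spot malformed cards like the one behind the runaway session.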

3. Everything Else: <$20/month combined

All GAS files (WarmLeadResponder, CaroleEmailOps, QDN daily portfolio intelligence) already use the Haiku model with strict prompt templates. The Lambda handlers in shipyard-bot/ and ai_repair_loop.py run on Haiku with cached system prompts. Stop hooks in VS Code fire once per session, also on Haiku. Combined token usage: ~2–3M tokens/month at Haiku rates (~$0.08/1M input).

Infrastructure and Model Configuration

The audit inventoried the following systems:

System	Execution Context	Model	Est. Monthly Tokens
Interactive CLI (dev sessions)	Local terminal	Sonnet 4.6	~200M (input/output mixed)
jada_daemon.sh (agent cards)	Lightsail instance, systemd