Bifurcating Claude API Consumption: Routing Subscription vs. Programmatic Workloads to Optimize Cost
The Problem: API Token Runaway on Heterogeneous Workloads
When you operate multiple systems that consume Claude via API—some requiring reliable, fast responses and others tolerant of longer latency or occasional errors—you face a hard economic choice: pay premium rates for all consumption, or risk degraded performance on critical paths by using cheaper models everywhere.
Our situation was acute. Interactive development and code-review work (where Claude Code runs in the IDE and I need high reliability) was sharing the same ANTHROPIC_API_KEY environment variable with background automation tasks (email digests, lead responders, batch processing). All traffic was routing to Claude API's standard endpoints at full-tier pricing. Monthly spend had climbed to levels that made the cost/benefit ratio untenable.
The insight: not all workloads have the same SLA requirements. A lead-responder email that takes 10 seconds instead of 2 is fine. A batch digest job that retries on transient failures is fine. An engineer waiting for a code suggestion in the IDE is not fine. We could separate these workloads, route interactive work to subscription Claude (via OAuth), and farm out batch/automation work to a cheaper remote Lightsail instance that calls Haiku or another smaller model.
Technical Architecture: Subscription vs. Farm-Out
The solution involves two authentication paths:
- Interactive (subscription): The claude command invokes Claude Code with OAuth credentials stored in ~/.claude/projects/Claude-credentials (managed by the Keychain integration). No API key in the environment.
- Programmatic (cheap/batch): Scripts and daemons source repos.env to load ANTHROPIC_API_KEY only when needed, and invoke a farm-out wrapper that routes requests to the Lightsail instance.
The farm-out infrastructure consists of:
- Lightsail instance: ubuntu@34.239.233.28 (internal IP ip-172-26-6-34, us-west-2 region)
- SSH access: ~/.ssh/LightsailDefaultKey-us-west-2.pem (passwordless, batch-mode verified)
- Claude daemon: jada-agent.service (systemd unit, active and listening)
- Configuration: API key and model selection injected per-invocation by the daemon, not stored in the login shell on the remote box
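The farm-out call over SSH can be sketched roughly as follows. This is a minimal illustration, not the actual wrapper: the host and key path come from the list above, while REMOTE_COMMAND, the JSON payload shape, and the function names are hypothetical placeholders. The sketch sends the API key on stdin so that it does not appear in the remote process list.

```python
import json
import os
import subprocess

# Host and key path are taken from the infrastructure list above.
HOST = "ubuntu@34.239.233.28"
KEY_PATH = "~/.ssh/LightsailDefaultKey-us-west-2.pem"
# Hypothetical remote entry point; the real daemon interface is not shown here.
REMOTE_COMMAND = "jada-agent-run"


def build_ssh_argv(host=HOST, key_path=KEY_PATH, remote_command=REMOTE_COMMAND):
    """Return the argv for a non-interactive SSH invocation."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),
        "-o", "BatchMode=yes",        # fail instead of prompting for input
        "-o", "ConnectTimeout=10",
        host,
        remote_command,
    ]


def farm_out(prompt, model, api_key, timeout=120):
    """Send one request; credentials travel on stdin, not on the command line."""
    payload = json.dumps({"prompt": prompt, "model": model, "api_key": api_key})
    result = subprocess.run(
        build_ssh_argv(),
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,                   # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

Keeping the argv construction separate from the call makes the SSH options easy to inspect or log without touching the network.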
Preventing Token Conflict in Shell Configuration
The core issue was in ~/.zshrc: the ANTHROPIC_API_KEY was exported globally, which caused Claude Code to detect it and prefer the API key over the OAuth token. This is by design (API keys take precedence), but it meant every interactive session was burning API credits instead of using the subscription plan.
The fix is simple but subtle:
- Remove the global export: Delete export ANTHROPIC_API_KEY=... from ~/.zshrc.
- Scope the key to scripts only: Scripts that need it (like gmb_lead_responder.py and carole_digest.py) explicitly source repos.env before execution, loading the key into their local environment only.
- Trust the OAuth fallback: When the API key is absent, Claude Code automatically uses the stored Keychain OAuth token. No settings.json manipulation needed; forceLoginMethod is enterprise-managed and silently ignored in user-level settings.json.
Why this works: Claude Code checks the environment in order (API key first, then OAuth). By removing the key from the interactive shell, we force it to the second path. For programmatic use, explicit sourcing gives us fine-grained control over when and where credentials are active.
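One way to confirm that the interactive shell configuration no longer sets the key is a small scan of the rc file. The sketch below is an assumption-laden helper, not part of the original setup: find_key_assignments is a hypothetical name, and it only catches plain assignments at the start of a line, not keys pulled in from other sourced files.

```python
import re

# Matches "export ANTHROPIC_API_KEY=" or a bare "ANTHROPIC_API_KEY=" at line start.
# Commented-out lines begin with "#" and therefore do not match.
KEY_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?ANTHROPIC_API_KEY\s*=")


def find_key_assignments(rc_text):
    """Return 1-based line numbers in rc_text that assign ANTHROPIC_API_KEY."""
    hits = []
    for lineno, line in enumerate(rc_text.splitlines(), start=1):
        if KEY_ASSIGNMENT.match(line):
            hits.append(lineno)
    return hits
```

Running this over the contents of ~/.zshrc and also checking that ANTHROPIC_API_KEY is absent from os.environ in a fresh interactive shell covers both the file and the live environment.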
Automation Scripts: Sourcing and Farm-Out Routing
Two new scripts handle different automation domains:
- gmb_lead_responder.py: Monitors Google My Business lead inboxes via IMAP, generates contextual responses using Claude, and threads replies back to the original inquiries. Sources repos.env at startup to load both the API key and Gmail credentials.
- carole_digest.py: Aggregates incoming email into a daily digest (AM/PM runs). Parses threads, summarizes content, and generates a formatted HTML digest for delivery. Also sources repos.env and runs with explicit cron scheduling on the Lightsail instance.
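A Python process cannot run the shell's source builtin itself, so when the key is not already in the environment, one option is to parse repos.env at startup. The following is a minimal sketch assuming the file holds simple KEY=VALUE lines, optionally prefixed with export; parse_env_file and load_env_file are hypothetical names, and shell features such as variable expansion are not handled.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path, environ=os.environ):
    """Load variables from path into environ for this process only."""
    with open(path, encoding="utf-8") as fh:
        values = parse_env_file(fh.read())
    environ.update(values)
    return values
```

Because the values are written only into the current process's environment, they disappear when the script exits and never reach the parent shell.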
Both scripts:
- Check for the API key presence before attempting Claude calls (fail-fast on missing creds).
- Use model="claude-3-5-haiku-20241022" or similar lower-cost models to minimize token burn.
- Implement retry logic with exponential backoff for transient API errors.
- Log all Claude invocations to ~/.claude/projects/memory/ for debugging and cost attribution.
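The fail-fast check and the backoff loop from the list above might look like the sketch below. It is illustrative only: TransientError stands in for whatever retryable exceptions the real client raises (rate limits, timeouts), and the function names and default delays are assumptions rather than the scripts' actual values.

```python
import os
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a rate limit or timeout."""


def require_api_key(environ=os.environ):
    """Exit immediately if the key is missing, before any Claude call is made."""
    key = environ.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise SystemExit("ANTHROPIC_API_KEY is not set; source repos.env first")
    return key


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so parallel jobs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

Passing sleep as a parameter keeps the loop testable without real waiting; in production the default time.sleep applies.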
Cron Deployment on Lightsail
The carole_digest.py script runs on a 12-hour cycle (6 AM and 6 PM UTC) via cron on the Lightsail instance. Deployment:
# On Lightsail:
# 1. scp the script to /home/ubuntu/tools/carole_digest.py
# 2. Append Gmail app password to /home/ubuntu/repos.env (if not present)
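# 3. Add cron entries. cron runs jobs with /bin/sh, which may lack the bash-only
#    "source" builtin, so use "."; "set -a" exports the loaded variables to Python.
0 6 * * * set -a && . /home/ubuntu/repos.env && /usr/bin/python3 /home/ubuntu/tools/carole_digest.py >> /var/log/carole_digest_am.log 2>&1
0 18 * * * set -a && . /home/ubuntu/repos.env && /usr/bin/python3 /home/ubuntu/tools/carole_digest.py >> /var/log/carole_digest_pm.log 2>&1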
Sourcing repos.env within the cron command ensures the API key and Gmail credentials are available only for that specific task, with no leakage into the system environment.
Key Decisions and Rationale
- Why not a reverse proxy or middleware? Overhead and complexity. Direct SSH + daemon injection is simpler and requires no extra infrastructure layer.
- Why Haiku, not Opus? Email digests and lead responses tolerate the extra latency of the farm-out path; Haiku costs a fraction of Opus per token and is sufficient for structured summarization and template-based responses.
- Why source repos.env explicitly? Explicit is better than implicit. It makes credential scope visible in code review and prevents accidental leakage to unrelated commands in the same shell session.
- Why keep the Lightsail daemon auth-agnostic? The daemon doesn't store credentials; it receives them per-invocation. This allows easy credential rotation without touching the remote box.
Monitoring and Cost Attribution
All Claude invocations (both subscription and API) are logged to ~/.claude/projects