Upgrading Claude Agent Models in Production: Balancing Capability, Cost, and Orchestration Complexity

```html

When running AI agents at scale, especially in orchestrator patterns where a primary agent spawns specialist agents, model selection becomes a critical infrastructure decision. This post documents the investigation and upgrade path we took when facing one question: is Haiku 4.5 sufficient for complex task decomposition, or do we need Sonnet 4.6 or Opus 4.7?

The Problem: Model Adequacy in Production Orchestration

Our JADA booking system uses an orchestrator pattern deployed on an AWS Lightsail instance (jada-agent, in us-east-1). The orchestrator receives complex user requests, decomposes them into subtasks, and dispatches work to specialist agents. Initially, we defaulted to Claude Haiku 4.5 for both the orchestrator and the specialist agents.

The concern: Does Haiku 4.5 have the reasoning capability to reliably decompose multi-step booking workflows into well-defined subtasks? And if we upgrade, how does the cost-per-inference change propagate through an agent cascade architecture?

Technical Investigation and Configuration

To understand the current state, we inspected the local Claude CLI configuration:

grep -n '"model"' ~/.claude/settings.json

This showed that the default model is set by the "model" key in ~/.claude/settings.json (on this machine, /Users/cb/.claude/settings.json). The Claude CLI reads this file when a session starts, so the setting determines which Claude model a new session uses when launched with:

cd ~/Documents/repos && claude --dangerously-skip-permissions

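The --dangerously-skip-permissions flag bypasses the CLI's permission prompts, so the agent can edit files and run commands without asking first. That is convenient in development but a red flag for production: in a real CI/CD pipeline it should only run inside a sandboxed environment wrapped in role-based access controls.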
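For scripts, a narrower version of the grep above prints only the configured value. This is a minimal sketch: read_model is a hypothetical helper, and it assumes the "model" key and its string value sit on one line, as in a pretty-printed settings.json. Where jq is installed, jq -r '.model' ~/.claude/settings.json is the more robust option.

```shell
# Print the value of the first "model" key in a Claude settings file.
# Sketch only: assumes key and value share one line (pretty-printed JSON).
read_model() {
  settings_file="${1:-$HOME/.claude/settings.json}"
  if [ ! -r "$settings_file" ]; then
    echo "settings file not readable: $settings_file" >&2
    return 1
  fi
  # Capture the quoted value after "model": and keep only the first match.
  model=$(sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$settings_file" | head -n 1)
  # An empty result means this file does not set a model.
  [ -n "$model" ] || return 1
  printf '%s\n' "$model"
}
```

Running read_model with no arguments checks ~/.claude/settings.json; passing a path checks another settings file instead.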

Resource Limit Configuration: ulimit and File Descriptor Scaling

Before tackling model selection, we addressed a foundational infrastructure concern. When orchestrator agents spawn multiple child processes or maintain concurrent connections, the default open-file-descriptor limit (commonly 1024 on Linux) becomes a bottleneck. We raised it with:

ulimit -n 2147483646

This sets the open-file-descriptor limit to 2,147,483,646, which is exactly 2^31 - 2, one below the maximum 32-bit signed integer (2^31 - 1 = 2,147,483,647). Why this matters:

Concurrent Connections: Each open socket (one per in-flight API call), pipe, or file handle consumes a descriptor. Orchestration with 10+ concurrent specialist agents can quickly exhaust the default limit.
Persistence Across Sessions: ulimit applies only to the current shell and the processes it starts. For production, set the limit in the jada-agent.service unit with LimitNOFILE=; /etc/security/limits.conf is applied by pam_limits to login sessions and does not affect systemd services.
Why Not Lower? We chose a near-maximum value because the orchestrator may spike to many concurrent agents during peak booking periods, and we'd rather over-provision file descriptor limits than discover a bottleneck in production. The kernel still imposes a ceiling: on Linux, requests above fs.nr_open (1,048,576 by default) are rejected, and an unprivileged user cannot raise the hard limit, so the value has to fit under both.
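The persistence and ceiling points can be sketched in shell: one helper clamps the requested value to what the kernel will accept, and another writes a systemd drop-in so the limit survives restarts. This is a minimal sketch under stated assumptions: effective_nofile and write_nofile_dropin are hypothetical helpers, the drop-in file name nofile.conf is arbitrary (systemd reads any *.conf file in the unit's .d directory), and the clamp models the Linux rules that an unprivileged process cannot exceed its hard limit and no process can exceed fs.nr_open.

```shell
# Smallest of: requested value, hard limit ("unlimited" allowed), fs.nr_open.
# Models the Linux rules for an unprivileged process raising RLIMIT_NOFILE.
effective_nofile() {
  requested=$1
  hard=$2
  nr_open=$3
  result=$requested
  if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$result" ]; then
    result=$hard
  fi
  if [ "$nr_open" -lt "$result" ]; then
    result=$nr_open
  fi
  printf '%s\n' "$result"
}

# Write a systemd drop-in that sets LimitNOFILE for one service.
# $1: drop-in directory, e.g. /etc/systemd/system/jada-agent.service.d
# $2: limit to apply (a single value sets both the soft and hard limit)
write_nofile_dropin() {
  dropin_dir=$1
  limit=$2
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dropin_dir/nofile.conf"
}
```

For the service, only fs.nr_open matters, because systemd sets the hard limit itself:

limit=$(effective_nofile 2147483646 unlimited "$(cat /proc/sys/fs/nr_open)")

After running write_nofile_dropin as root with /etc/systemd/system/jada-agent.service.d and that limit, reload and restart the unit, then confirm the value systemd applied:

sudo systemctl daemon-reload && sudo systemctl restart jada-agent.service
systemctl show -p LimitNOFILE jada-agent.service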

Model Upgrade Strategy: Configuration Persistence

We updated the default Claude model in the settings file to Claude Sonnet 4.6 using the configuration update path:

claude /config

This opened an interactive configuration menu that modified ~/.claude/settings.json to set the default model to claude-sonnet-4-6. The configuration now persists across new shell sessions.

Critical caveat: Configuration changes do not take effect in the current session. Any new invocation of claude --dangerously-skip-permissions in a new terminal tab will use Sonnet 4.6, but the running session continues with the old model.
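Where an interactive menu is not an option, such as when provisioning a host from a script, the same edit can be made directly. This is a minimal sketch, not a replacement for /config: set_model is a hypothetical helper, it assumes the "model" key already exists on a single line, and it assumes the new value contains no characters special to sed (/, &, or \). jq is the safer tool for arbitrary JSON.

```shell
# Rewrite the value of an existing "model" key in a settings file.
# $1: settings file, $2: new model ID (no /, & or \ characters).
set_model() {
  settings_file=$1
  new_model=$2
  # Refuse to guess where a missing key should go; that needs a JSON tool.
  grep -q '"model"[[:space:]]*:' "$settings_file" || return 1
  tmp_file="$settings_file.tmp.$$"
  # Keep everything up to the value's opening quote, then replace the value.
  sed 's/\("model"[[:space:]]*:[[:space:]]*"\)[^"]*"/\1'"$new_model"'"/' \
    "$settings_file" > "$tmp_file" && mv "$tmp_file" "$settings_file"
}
```

As with /config, a session that is already running keeps its model. To try Sonnet 4.6 for one session without touching the settings file, pass it explicitly:

claude --model claude-sonnet-4-6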

Verifying Agent Status in Production

To confirm the orchestrator is actually running and receiving requests, we checked the systemd service status on the Lightsail instance:

ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ubuntu@34.239.233.28 \
  "systemctl status jada-agent.service 2>&1 | head -20"

And queried the instance state via AWS CLI:

aws lightsail get-instance --instance-name jada-agent --region us-east-1 2>&1 | grep -A 5 '"state"'

This confirms the Lightsail instance is running and the jada-agent service is active. Requests are being routed to it, though without deeper logging integration we can't yet confirm what data is being passed or whether task decomposition is occurring correctly.
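The two checks above can be folded into one health probe that returns a clear pass or fail instead of relying on grep and head over raw output. This is a sketch with assumptions: agent_healthy, instance_state, and service_state are hypothetical helper names; it uses the AWS CLI's standard --query and --output options to read instance.state.name directly, and systemctl is-active, which prints a single state word. It also swaps StrictHostKeyChecking=no, which disables host-key verification, for accept-new (OpenSSH 7.6 and later), which trusts a host on first contact but rejects a changed key.

```shell
# Host and instance names come from the commands above.
AGENT_HOST="ubuntu@34.239.233.28"

# Lightsail instance state as a single word, e.g. "running".
instance_state() {
  aws lightsail get-instance --instance-name jada-agent --region us-east-1 \
    --query 'instance.state.name' --output text
}

# systemd unit state on the host, e.g. "active" or "failed".
service_state() {
  ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new "$AGENT_HOST" \
    'systemctl is-active jada-agent.service'
}

# Succeeds only if the instance is running and the unit is active.
agent_healthy() {
  state=$(instance_state)
  unit=$(service_state)
  if [ "$state" != "running" ]; then
    echo "instance not running (state: ${state:-unknown})" >&2
    return 1
  fi
  if [ "$unit" != "active" ]; then
    echo "jada-agent.service not active (state: ${unit:-unknown})" >&2
    return 1
  fi
}
```

Splitting the probes into functions keeps them replaceable, so the decision logic in agent_healthy can be exercised without AWS credentials or SSH access. Like the original checks, a pass only shows the service is up, not that task decomposition is correct.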

Cost and Performance Trade-offs

Here's the core engineering decision:

Haiku 4.5: ~2-3x cheaper per token and lower latency; adequate for simple tasks but may struggle with complex task decomposition and multi-step reasoning.
Sonnet 4.6: ~2-3x more expensive than Haiku, slower but substantially better at planning and task breakdown. Better cost-to-capability ratio for orchestration.
Opus 4.7: Most capable, but slowest and most expensive. Reserved for specialist agents that need deep reasoning.
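To see how a price difference propagates through the cascade, it helps to price one request as a single orchestrator call plus N specialist calls. A minimal sketch: call_cost and cascade_cost are hypothetical helpers, and the prices in the usage below are placeholders in dollars per million tokens (input/output) chosen to reflect a roughly 3x gap, not quoted rates; substitute current figures from Anthropic's pricing page and token counts from your own logs.

```shell
# Dollar cost of one model call.
# $1 input tokens, $2 output tokens, $3 input price, $4 output price
# (prices in dollars per million tokens).
call_cost() {
  awk -v tin="$1" -v tout="$2" -v pin="$3" -v pout="$4" \
    'BEGIN { printf "%.6f\n", (tin * pin + tout * pout) / 1000000 }'
}

# Dollar cost of one request: one orchestrator call plus N specialist calls.
# $1 orchestrator call cost, $2 cost per specialist call, $3 specialist count
cascade_cost() {
  awk -v orch="$1" -v spec="$2" -v n="$3" \
    'BEGIN { printf "%.6f\n", orch + n * spec }'
}
```

For example, with 2,000 input and 500 output tokens for the orchestrator, 1,500 input and 400 output tokens per specialist, and five specialists, a Sonnet-class orchestrator over Haiku-class specialists is:

cascade_cost "$(call_cost 2000 500 3 15)" "$(call_cost 1500 400 1 5)" 5

which prints 0.031000. Pricing the orchestrator at 1/5 instead gives 0.022000 (all Haiku-class), and pricing every call at 3/15 gives 0.066000 (all Sonnet-class). Under these assumptions, upgrading only the orchestrator adds about 41% per request, while upgrading every agent triples the cost, because specialist cost scales with N.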

Recommended allocation:

Orchestrator: Sonnet 4.6 (task decomposition is critical; speed is acceptable for this layer)
Specialist agents: Haiku 4.5 or Sonnet 4.6, depending on task complexity
Final review/verification agents: Opus 4.7 (if needed, only for high-stakes decisions)
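One way to keep this allocation in a single place is a role-to-model lookup that every agent launcher consults, so the gradual rollout only has to change one line. This is a sketch: the role names are hypothetical, claude-sonnet-4-6 is the ID from the settings change in this post, and claude-haiku-4-5 and claude-opus-4-7 are assumed IDs to verify against Anthropic's model list.

```shell
# Role-to-model lookup mirroring the recommended allocation.
# Role names are hypothetical; verify model IDs before use.
model_for_role() {
  case "$1" in
    orchestrator)       echo "claude-sonnet-4-6" ;;  # task decomposition
    specialist-simple)  echo "claude-haiku-4-5" ;;   # cheap, fast subtasks
    specialist-complex) echo "claude-sonnet-4-6" ;;  # harder subtasks
    reviewer)           echo "claude-opus-4-7" ;;    # high-stakes checks only
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
}
```

A launcher could then start an agent with claude --model "$(model_for_role orchestrator)".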

What's Next: Observability and Validation

To truly answer whether the orchestrator is working correctly with the new model, we need:

Structured logging: Emit JSON logs of task decomposition steps, model invocations, and token usage, and ship them to CloudWatch or S3.
Metrics collection: Track latency-per-request, token-cost-per-booking, and success rates per agent type.
Agent communication audit: Implement a trace log that captures what data flows from orchestrator → specialist agents → results aggregation.
Gradual rollout: Route 10% of production traffic to a Sonnet 4.6 orchestrator while keeping 90% on Haiku 4.5, then compare quality metrics.
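The rollout and logging items can be prototyped together: hash each booking ID into a bucket from 0 to 99, send buckets below the rollout percentage to the candidate orchestrator model, and emit one JSON line per decision. This is a minimal sketch: booking IDs, the helper names, and the JSON field names are hypothetical, cksum is used only because it is a POSIX-standard checksum (any stable hash works), and the model IDs are assumed. Hashing rather than random sampling keeps each booking on the same arm across retries, so quality comparisons are not muddied by a booking switching models mid-workflow.

```shell
# Bucket 0-99 derived from the booking ID's CRC (first field of cksum output).
rollout_bucket() {
  printf '%s' "$1" | cksum | awk '{ print $1 % 100 }'
}

# $1: booking ID, $2: percentage of bookings sent to the candidate model.
route_model() {
  bucket=$(rollout_bucket "$1")
  if [ "$bucket" -lt "$2" ]; then
    echo "claude-sonnet-4-6"   # candidate orchestrator model
  else
    echo "claude-haiku-4-5"    # current orchestrator model (assumed ID)
  fi
}

# One JSON line per routing decision. No escaping is done, so this assumes
# booking IDs contain no quotes, backslashes, or control characters.
log_route() {
  model=$(route_model "$1" "$2")
  printf '{"booking_id":"%s","rollout_percent":%s,"model":"%s"}\n' \
    "$1" "$2" "$model"
}
```

Sending log_route output to the same CloudWatch or S3 sink as the other structured logs would let routing decisions be joined with per-booking quality metrics.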

The configuration change is now staged. The next production deployment will automatically use Sonnet 4.6 for the orchestrator, but task quality should be validated, ideally through the gradual rollout described above, before all production traffic moves to it.

```