Upgrading Claude Agent Models in Production: Haiku 4.5 → Sonnet 4.6 for Complex Orchestration Workflows

```html

What Was Done

We upgraded the default Claude model for our agent orchestration system from Claude Haiku 4.5 to Claude Sonnet 4.6, targeting improved task decomposition and reasoning capabilities for complex booking workflows running on AWS Lightsail and EC2 infrastructure. The change was made via configuration management without disrupting the existing agent infrastructure.

Technical Details: Model Configuration Management

The core change involved updating the Claude CLI configuration file to specify the model used by our custom agent command:

~/.claude/settings.json

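The update was performed programmatically by changing the "model" field from "claude-haiku-4.5" to "claude-sonnet-4.6". Confirm the exact identifier string against the model names your CLI version accepts; Anthropic's published model IDs are hyphenated (for example, claude-haiku-4-5). Keeping the model in this one file gives every agent invocation a single source of truth, including sessions started with the standard CLI command: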

cd ~/Documents/repos && claude --dangerously-skip-permissions

Why Sonnet over Haiku for orchestration: Haiku 4.5 is optimized for speed and cost, which makes it a good fit for straightforward tasks. Our JADA booking system, however, requires multi-step task decomposition: parsing complex user requests, routing to specialist agents, handling conditional logic, and aggregating results. Sonnet 4.6 offers stronger reasoning on this kind of multi-step work (we have not yet quantified the gain for our workload) while keeping latency reasonable for the orchestration layer, where the bottleneck is typically I/O (API calls to booking systems) rather than model inference time.

Architecture: Agent Orchestration Pattern

Our system follows a hierarchical agent pattern:

Orchestrator Agent (Claude Sonnet 4.6): Runs on the jada-agent Lightsail instance, decomposes user intent into sub-tasks, routes them to specialist agents, and aggregates responses
Specialist Agents (potentially mixed models): Handle domain-specific operations (booking validation, payment processing, itinerary generation)
JADA Service Layer: Interacts with external booking APIs and databases

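The orchestrator is deployed on an AWS Lightsail instance (jada-agent, in the us-east-1 region) and runs as jada-agent.service under systemd. Model selection does not depend on the deployment environment: the service reads it from ~/.claude/settings.json at startup. Because ~ expands to the home directory of whichever user runs the process, the service account's copy of the file is the one that matters.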

Infrastructure Verification

Before deploying model changes, we verified the orchestrator instance health:

aws lightsail get-instance --instance-name jada-agent --region us-east-1

This command confirmed instance state and networking configuration. The orchestrator is reachable via SSH for debugging and can execute systemd commands to manage the agent service:

systemctl status jada-agent.service

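File descriptor limits were also reviewed and raised. We ran ulimit -n 2147483646, which requests a limit of about 2.1 billion open file descriptors (one below 2^31 - 1, the maximum 32-bit signed integer). Two caveats apply. First, ulimit only affects the current shell and the processes it launches; a systemd-managed service such as jada-agent.service takes its limit from the LimitNOFILE= directive in its unit file. Second, the kernel rejects any value above the fs.nr_open sysctl, so a request this large succeeds only where that ceiling has been raised. Adequate descriptor headroom matters for the orchestrator when it spawns many concurrent specialist agents or holds long-lived connections to booking systems.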

Key Decision: Session Persistence vs. Immediate Effect

An important technical note: an update to ~/.claude/settings.json applies only to Claude sessions started after the change. A session that is already running keeps the model it was initialized with. One useful consequence is that a task in progress does not switch models partway through, which could otherwise produce inconsistent reasoning.

For production workflows, the orchestrator service reads the configuration file at startup, so restarting jada-agent.service ensures the new model is used:

sudo systemctl restart jada-agent.service

If using the CLI directly in ad-hoc workflows, engineers should exit any running claude session and start a new one after updating settings (opening a fresh terminal works) so that the new model is active.

Cost and Performance Implications

Sonnet 4.6 is approximately 2–3x more expensive per token than Haiku 4.5, but orchestration-layer costs are often acceptable because:

Orchestration is sparse: The orchestrator runs once per user request to decompose intent, not once per sub-task, so its token consumption is small relative to that of the downstream specialist agents.
Latency is I/O-bound: The actual bottleneck in booking workflows is waiting for external APIs (payment gateways, hotel systems) to respond, not Claude's inference time. Sonnet's slightly higher latency (vs. Haiku) is negligible in comparison.
Quality ROI: Better task decomposition reduces rework and failed bookings, offsetting the per-token cost increase.

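However, if the specialist agents are also upgraded to Sonnet across the board, the higher per-token rate applies to every specialist call and total cost rises in proportion to overall token volume. A tiered approach (Sonnet for orchestration, Haiku for simple specialist tasks) gives a better cost-quality tradeoff.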

Data Flow: Ensuring Tasks Reach the Orchestrator

The upgrade preserves the existing data flow:

User request arrives at the API endpoint (hosted on Lightsail or behind CloudFront)
Request is queued or directly passed to the orchestrator service via IPC or HTTP
Orchestrator (now using Sonnet 4.6) processes the request and spawns specialist agents
Results are aggregated and returned to the user

No changes were made to networking, routing, or service discovery. The orchestrator remains at the same network address and continues receiving requests through existing channels. The model upgrade is transparent to upstream and downstream services.

Verification and Testing

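To confirm that the configuration file specifies the new model:

grep -n '"model"' ~/.claude/settings.json

The output should be the matching line, prefixed by its line number (because of -n), containing: "model": "claude-sonnet-4.6"

This confirms only the file contents; the running service uses the new model once it has been restarted. For running workflows, check that specialist agents are receiving properly decomposed tasks. If task quality degrades (incomplete decomposition, hallucinated sub-tasks), compare against pre-upgrade behavior and rule out downstream agent configuration before attributing the regression to the orchestrator upgrade.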

What's Next

Future improvements include:

A/B testing: Compare booking success rates and latency before/after the upgrade to quantify the quality improvement.
Model stratification: Evaluate whether specialist agents should also upgrade or remain on Haiku to balance cost and capability.
Observability: Add structured logging to track model version in production logs, ensuring visibility into which model processed each request.
Fallback strategy: Implement a graceful downgrade path if Sonnet's latency or cost becomes prohibitive (reverting to Haiku with a circuit breaker).

The upgrade is low-risk because it is configuration-driven: it requires a service restart but no code changes or infrastructure redeployment. Monitor system metrics (latency, error rates, CPU) through the next deployment cycle to validate the decision.

```
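
The programmatic settings update described under "Technical Details: Model Configuration Management" could be scripted along the following lines. This is a minimal sketch, not the script that was actually used: the helper names set_model and get_model are hypothetical, and it assumes the file is a plain JSON object with a top-level "model" key, as the write-up describes. It writes through a temporary file and renames it into place so that a process reading the file never sees a half-written version.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def set_model(settings_path: Path, model: str) -> Optional[str]:
    """Set the top-level "model" key; return the previous value, if any.

    Hypothetical helper: assumes the settings file is a JSON object.
    All other keys in the file are preserved.
    """
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    previous = settings.get("model")
    settings["model"] = model

    # Write to a temp file in the same directory, then rename it over the
    # original. os.replace is atomic when both paths share a filesystem.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=settings_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(settings, tmp, indent=2)
            tmp.write("\n")
        os.replace(tmp_name, settings_path)
    except BaseException:
        # Clean up the temp file if the write or rename failed.
        os.unlink(tmp_name)
        raise
    return previous


def get_model(settings_path: Path) -> Optional[str]:
    """Return the configured model, or None if the key is absent."""
    data = json.loads(settings_path.read_text(encoding="utf-8"))
    return data.get("model")
```

Calling set_model(Path.home() / ".claude" / "settings.json", "claude-sonnet-4.6") would apply the change for the current user, and the returned previous value can be logged to make a rollback straightforward. The service still needs a restart afterwards, as noted in the write-up.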
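
The tiering argument under "Cost and Performance Implications" can be made concrete with a small calculation. The sketch below is illustrative only: the token counts are invented for the example, the 3.0 multiplier is the upper end of the 2-3x range quoted in the write-up, and the model ignores the difference between input and output token pricing. It expresses cost relative to an all-Haiku baseline for a single request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestProfile:
    """Hypothetical token usage for one user request."""
    orchestrator_tokens: int
    specialist_tokens: int


def cost_ratio(profile: RequestProfile,
               sonnet_multiplier: float,
               specialists_on_sonnet: bool) -> float:
    """Cost relative to running every agent on Haiku (baseline = 1.0).

    sonnet_multiplier is the assumed Sonnet-to-Haiku per-token price ratio.
    The orchestrator is always priced at the Sonnet rate.
    """
    baseline = profile.orchestrator_tokens + profile.specialist_tokens
    orchestrator = profile.orchestrator_tokens * sonnet_multiplier
    specialist_rate = sonnet_multiplier if specialists_on_sonnet else 1.0
    specialists = profile.specialist_tokens * specialist_rate
    return (orchestrator + specialists) / baseline


# Invented example: the orchestrator uses 10% of the tokens in a request.
profile = RequestProfile(orchestrator_tokens=2_000, specialist_tokens=18_000)

tiered = cost_ratio(profile, sonnet_multiplier=3.0, specialists_on_sonnet=False)
all_sonnet = cost_ratio(profile, sonnet_multiplier=3.0, specialists_on_sonnet=True)

print(f"tiered: {tiered:.2f}x baseline")          # tiered: 1.20x baseline
print(f"all Sonnet: {all_sonnet:.2f}x baseline")  # all Sonnet: 3.00x baseline
```

Under these assumed numbers, upgrading only the orchestrator raises per-request cost by 20%, while upgrading everything triples it. Real ratios depend on the measured token split between the orchestrator and the specialists, which is worth capturing before deciding on model stratification.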