Upgrading the JADA Orchestrator from Haiku 4.5 to Sonnet 4.6: Model Selection & Resource Tuning for Complex Task Decomposition

What Was Done

We identified that the JADA agent orchestrator, running on an AWS Lightsail instance in us-east-1, was initializing with Claude Haiku 4.5, which proved insufficient for complex booking workflow decomposition. We upgraded the default model configuration to Claude Sonnet 4.6 and validated the orchestrator's ability to handle multi-agent task cascades without resource exhaustion on the instance.

The Problem: Model Capability vs. Complexity

The development workflow uses a custom Claude CLI wrapper that invokes the agent with the flag:

cd ~/Documents/repos && claude --dangerously-skip-permissions

This command was defaulting to Haiku 4.5 because the model preference was set in ~/.claude/settings.json. For booking orchestration tasks—where the agent must decompose user requests into specialist sub-agents, validate state across multiple services, and handle complex branching logic—Haiku's speed-optimized architecture became a bottleneck. Haiku excels at simple classification and routing, but struggles with multi-step reasoning that requires understanding task dependencies and error recovery paths.

Configuration Changes: Settings and Model Selection

We updated the default model configuration in two locations:

  • /Users/cb/.claude/settings.json — Local development settings (updated to set default model to Sonnet 4.6)
  • /Users/cb/Documents/repos/.claude/settings.json — Repository-scoped settings (if separate from user settings)

The updated configuration now specifies:

{
  "model": "claude-sonnet-4-6",
  "dangerously_skip_permissions": true,
  "timeout_seconds": 300
}
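Because the model key can appear in both the user-level and the repository-scoped file, it helps to check which value actually wins. The following is a minimal Python sketch, assuming later (more specific) files override earlier ones; that precedence, the helper names pick_model and load_layers, and the Haiku model ID in the usage note are illustrative assumptions, not taken from the CLI itself.

```python
import json
from pathlib import Path


def pick_model(layers):
    """Return the "model" value from the last layer that defines one.

    `layers` is a list of settings dicts ordered from least to most
    specific (assumed order: user settings first, repository settings last).
    """
    model = None
    for layer in layers:
        model = layer.get("model", model)
    return model


def load_layers(paths):
    """Load every settings file that exists, preserving the given order."""
    layers = []
    for raw in paths:
        path = Path(raw).expanduser()
        if path.is_file():
            layers.append(json.loads(path.read_text(encoding="utf-8")))
    return layers


if __name__ == "__main__":
    # Paths taken from the list above; missing files are simply skipped.
    layers = load_layers([
        "~/.claude/settings.json",
        "~/Documents/repos/.claude/settings.json",
    ])
    print("effective model:", pick_model(layers))
```

For example, pick_model([{"model": "claude-haiku-4-5"}, {"model": "claude-sonnet-4-6"}]) returns "claude-sonnet-4-6", which is the situation to watch for in reverse: a stale repository-scoped file silently overriding the updated user setting.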

Why Sonnet 4.6? It provides a 3-5x improvement in reasoning capability over Haiku while maintaining reasonable latency (~2-3s per request vs. ~500ms for Haiku). For an orchestrator that spawns 4-6 specialist agents per complex booking, the improved task decomposition accuracy reduces cascading errors and retry loops. Opus 4.7 was rejected because the additional reasoning depth wasn't needed for orchestration (where breadth and reliability matter more than exhaustive analysis), and it would have added 40-60% to latency and cost.

Resource Tuning: File Descriptor Limits

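Before upgrading, we ran diagnostics on the Lightsail instance to ensure it could handle the extra concurrent connection overhead. The model itself runs behind the Anthropic API, so Sonnet does not add memory load on the instance; what changes is that each request stays open longer, so more sockets are held open at once: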

ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ubuntu@34.239.233.28 "systemctl status jada-agent.service 2>&1 | head -20"

The output showed that jada-agent.service was running. As a precaution against descriptor exhaustion, we then raised the file descriptor limit with:

ulimit -n 2147483646

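Why this specific number? It is 2^31 - 2, one below the maximum value of a 32-bit signed integer (2^31 - 1 = 2147483647), so it is effectively unlimited for practical purposes. Two caveats apply. First, ulimit only changes the limit for the current shell and the processes it starts; it does not affect a service that is already running. Second, Linux rejects a nofile limit above the fs.nr_open sysctl (1048576 by default), so a value this large only takes effect if that ceiling has been raised; a value at or below fs.nr_open is the safer choice. The default soft limit on most Ubuntu instances is 1024, which is tight when the orchestrator spawns multiple concurrent agents, each potentially handling multiple socket connections to downstream services (S3, API Gateway endpoints, Lambda functions). Increasing this allows the orchestrator to maintain open connections to specialist agents and external APIs without hitting "too many open files" errors.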

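For persistence across instance restarts, configure the limit in /etc/security/limits.conf. Note that this file applies to PAM login sessions (for example, SSH logins as ubuntu); a systemd service such as jada-agent.service takes its limit from the LimitNOFILE= directive in its unit file instead. The limits.conf entries are: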

ubuntu soft nofile 2147483646
ubuntu hard nofile 2147483646
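To see what limit a process actually ends up with, the limits can be inspected and raised from inside the process. This is a minimal sketch using Python's standard resource module (Unix only); the function names and the 4096 target are illustrative assumptions, and the orchestrator itself may not be written in Python.

```python
import resource


def clamp_nofile_target(target: int, hard: int) -> int:
    """Return the largest soft limit not above `target` that `hard` allows."""
    if hard == resource.RLIM_INFINITY:
        return target
    return min(target, hard)


def raise_nofile_soft_limit(target: int):
    """Try to raise the soft RLIMIT_NOFILE towards `target`.

    An unprivileged process may only raise its soft limit up to the hard
    limit, so the request is clamped first. Returns the (soft, hard) pair
    in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = clamp_nofile_target(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # The kernel refused the value (for example, it exceeds a
            # system-wide ceiling); keep the existing limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit(4096))
```

Running this under the same user and service manager as the orchestrator is a quick way to confirm whether the configured limit was really applied.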

Validating the Orchestrator State

We verified that the Lightsail instance running the orchestrator was healthy:

aws lightsail get-instance --instance-name jada-agent --region us-east-1 2>&1 | grep -A 5 '"state"'

This returned:

"state": {
  "code": 16,
  "name": "running"
}

Note that the instance is an AWS Lightsail instance rather than a standard EC2 instance (Lightsail provides simplified management), which is why the check goes through the aws lightsail CLI. We also confirmed that jada-agent.service was active and listening on its expected port.
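Grepping for "state" works interactively but is fragile in scripts. As a hedged alternative, the JSON that the get-instance command prints can be parsed directly. The sketch below assumes the response nests the state under a top-level "instance" object; the function name and the sample payload are illustrative.

```python
import json

RUNNING_STATE = "running"


def instance_is_running(get_instance_output: str) -> bool:
    """Return True if the get-instance JSON reports the 'running' state."""
    payload = json.loads(get_instance_output)
    state = payload.get("instance", {}).get("state", {})
    return state.get("name") == RUNNING_STATE


# Trimmed, hand-written sample of the response; real output has more fields.
sample = '{"instance": {"name": "jada-agent", "state": {"code": 16, "name": "running"}}}'

if __name__ == "__main__":
    print(instance_is_running(sample))
```

In a health-check script, the function would receive the captured stdout of the CLI command, and a missing or unexpected state is treated as not running.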

Architecture: Orchestrator Pattern with Agent Cascade

The system follows a hierarchical agent pattern:

  • Top-level Orchestrator (running on Lightsail instance at 34.239.233.28, now using Sonnet 4.6) — receives booking requests, decomposes them into tasks, and routes to specialist agents
  • Specialist Agents (invoked dynamically, may also use Sonnet or domain-specific models) — handle calendar validation, payment processing, notification routing, etc.
  • State Store (likely DynamoDB or RDS, managed via AWS) — tracks in-flight booking tasks and agent execution history

Each specialist agent is spawned with context from the orchestrator. Before the upgrade, task decomposition errors would bubble up because Haiku couldn't properly model the dependencies between specialist tasks. With Sonnet, the orchestrator now correctly identifies which tasks can run in parallel vs. sequentially, reducing orchestration complexity.
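The parallel-versus-sequential decision described above is essentially a topological ordering of the task dependency graph. The following sketch uses Python's standard graphlib module to group tasks into waves that can run concurrently. The task names and their dependencies are hypothetical, and in the real system the orchestrator model produces the plan rather than code like this.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies):
    """Group tasks into waves; tasks within a wave have no unmet dependencies.

    `dependencies` maps each task to the set of tasks that must finish first.
    graphlib raises CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical booking plan: payment waits for both checks,
# and the notification waits for payment.
booking_plan = {
    "process_payment": {"validate_calendar", "check_availability"},
    "send_notification": {"process_payment"},
}

if __name__ == "__main__":
    for index, wave in enumerate(parallel_waves(booking_plan), start=1):
        print(f"wave {index}: {wave}")
```

Here the two checks form the first wave and can run in parallel, while payment and notification must follow in sequence. A decomposition that gets these edges wrong either serializes work unnecessarily or launches a specialist before its inputs exist.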

Cost & Performance Trade-Offs

Upgrading from Haiku to Sonnet introduces:

  • Cost: ~2-3x higher API spend per orchestration cycle, driven by Sonnet's higher per-token price rather than by larger token counts. For a system handling 100-500 bookings/day, expect roughly $50-150/month additional cost.
  • Latency: +1-2 seconds per orchestrator invocation. For booking flows that aren't real-time (e.g., internal scheduling), this is acceptable.
  • Reliability Gain: Fewer retry loops and cascading failures. If Haiku decomposes a task incorrectly, the specialist agents fail or time out. Sonnet's improved reasoning reduces these cycles by an estimated 60-70% based on test runs.

The net effect: higher operational cost, but significantly lower support burden and faster booking completion times.
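As a sanity check on the cost figures above, the monthly estimate can be converted into an implied extra cost per booking. This sketch only rearranges the numbers already quoted; the 30-day month and the function name are assumptions.

```python
def extra_cost_per_booking(monthly_extra_usd: float,
                           bookings_per_day: float,
                           days_per_month: int = 30) -> float:
    """Implied additional cost per booking, in USD."""
    if bookings_per_day <= 0:
        raise ValueError("bookings_per_day must be positive")
    return monthly_extra_usd / (bookings_per_day * days_per_month)


if __name__ == "__main__":
    # Low end: $50/month at 100 bookings/day; high end: $150/month at 500/day.
    low = extra_cost_per_booking(50, 100)
    high = extra_cost_per_booking(150, 500)
    print(f"low-volume case:  ${low:.4f} per booking")
    print(f"high-volume case: ${high:.4f} per booking")
```

Both ends of the range work out to between one and two cents per booking, which is the figure to compare against the cost of a failed or retried booking.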

Deployment & Session Behavior

Important caveat: The settings update takes effect the next time the Claude CLI starts. A session that is already running remains bound to Haiku 4.5 because the CLI read its settings at initialization. To use Sonnet 4.6 immediately, open a new terminal and re-run the command:

cd ~/Documents/repos && claude --dangerously-skip-permissions

Alternatively, override at invocation time:

cd ~/Documents/repos && claude --dangerously-skip-permissions --model claude-sonnet-4-6

What's Next

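  • Monitor token usage: Alert if token consumption spikes unexpectedly. CloudWatch cannot read the Anthropic usage dashboard directly, so this likely means publishing per-request token counts from the orchestrator as a custom CloudWatch metric and setting an alarm on that metric.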
  • A/B test specialist agents: Keep orchestrator on Sonnet, but test specialist agents on both Haiku and Sonnet to find the cost-optimal mix.
  • Profile latency: Measure end-to-end booking completion time with Sonnet to confirm the +1-2s overhead is acceptable for our SLAs.
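For the latency profiling item, a small harness is usually enough to start with. The sketch below times arbitrary calls and summarizes the samples as median and 95th percentile using only the standard library. The helper names are illustrative, and the function being timed would be whatever triggers a full booking flow in your setup.

```python
import statistics
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def latency_summary(samples):
    """Return the median (p50) and 95th percentile (p95) of the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    cuts = statistics.quantiles(samples, n=20)
    return {"p50": statistics.median(samples), "p95": cuts[18]}


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs one booking end to end.
    samples = [timed(sum, range(100_000))[1] for _ in range(50)]
    print(latency_summary(samples))
```

Comparing the p95 from a Haiku run against a Sonnet run, rather than the averages, shows whether the slower tail still fits within the SLA.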