Optimizing Claude Agent Orchestration: Model Selection, File Descriptor Limits, and EC2 Resource Management

During a recent development session, we encountered a critical decision point when scaling our JADA orchestrator system: should we upgrade from Claude Haiku 4.5 to Sonnet 4.6 or Opus 4.7, and what infrastructure changes are required to support more complex agentic workflows? This post walks through the technical decisions, configuration changes, and system health checks we implemented.

The Problem: Model Adequacy for Complex Task Decomposition

Our orchestrator pattern delegates task breakdown to Claude, which then spawns specialist agents for booking workflows, data processing, and user-facing interactions. Initial testing with Haiku 4.5 showed promise for simple tasks, but complex multi-step workflows—particularly those requiring nuanced reasoning about constraint conflicts or novel booking scenarios—were failing silently or producing suboptimal task graphs.

The core question: Is this a model capability gap, or a configuration/infrastructure issue?

Technical Investigation: Checking System Health

Before upgrading models, we needed to verify that our orchestrator instance was actually receiving and processing requests correctly.

Step 1: Verify Service Status

ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ubuntu@34.239.233.28 "systemctl status jada-agent.service 2>&1 | head -20"

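This command checks whether the jada-agent.service systemd unit on the orchestrator instance is running. The 5-second ConnectTimeout prevents hangs if the instance is unreachable. StrictHostKeyChecking=no disables host key verification entirely, which is convenient for automation but removes protection against man-in-the-middle attacks; StrictHostKeyChecking=accept-new, or a pre-populated known_hosts file, is a safer choice for unattended use.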
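The same check can be wrapped in a small function so scripts can branch on the result. This is a minimal sketch, not the session's actual tooling: check_service is a hypothetical helper name, it uses systemctl is-active (which prints a single state word) instead of the full status output, and it assumes an OpenSSH version that supports accept-new (7.6 or later).

```shell
# Sketch: report the state of a systemd unit on a remote host.
# check_service is a hypothetical helper name, not part of any tool.
check_service() (
  host=$1
  unit=$2
  # accept-new records an unknown host key on first connect but refuses a changed one.
  state=$(ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new "$host" \
    "systemctl is-active '$unit'" 2>/dev/null)
  # An empty result means SSH itself failed (timeout, auth, unreachable host).
  echo "${state:-unreachable}"
  [ "$state" = "active" ]
)

# Usage (not executed here):
#   check_service ubuntu@34.239.233.28 jada-agent.service && echo "service is up"
```

The function prints the state and also returns a non-zero exit status for anything other than active, so it works both for logging and for conditionals.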

Step 2: Inspect AWS Lightsail Instance State

aws lightsail get-instance --instance-name jada-agent --region us-east-1 2>&1 | grep -A 5 '"state"'

We use AWS Lightsail for the orchestrator to reduce operational overhead compared to raw EC2. This command verifies the instance is in a running state rather than stopped or pending.
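Filtering JSON with grep depends on how the output happens to be laid out. As an alternative sketch, the AWS CLI's own --query and --output options can return just the state name. The helper name instance_state and its default region argument are assumptions made for illustration.

```shell
# Sketch: print only the Lightsail state name (for example "running").
# instance_state is a hypothetical helper; --query and --output are standard AWS CLI options.
instance_state() {
  aws lightsail get-instance \
    --instance-name "$1" \
    --region "${2:-us-east-1}" \
    --query 'instance.state.name' \
    --output text
}

# Usage (not executed here):
#   [ "$(instance_state jada-agent)" = "running" ] || echo "instance is not running" >&2
```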

Step 3: Audit Model Configuration Across Environments

grep -n '"model"' /Users/cb/.claude/settings.json /Users/cb/Documents/repos/.claude/settings.json

We discovered multiple settings.json files across different directory contexts. The Claude CLI respects a hierarchy: project-local settings override user-home settings. By auditing these files, we identified that the orchestrator was inheriting a default model setting that wasn't explicitly documented.

env | grep -i claude | grep -i model

Environment variables can also override CLI defaults. This check ensures no conflicting model specifications are set in the shell environment.
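To make the file hierarchy explicit, the audit can be expressed as "first file that defines a model wins". The sketch below is an approximation under stated assumptions: first_model is a hypothetical helper, it matches text with sed instead of parsing JSON, and it expects the "model" key and its value to sit on one line.

```shell
# Sketch: print the first "model" value found, checking files in the order given.
# first_model is a hypothetical helper; pass the project-level file before the user-level one.
first_model() {
  for f in "$@"; do
    [ -r "$f" ] || continue
    # Pull the value out of a line such as:  "model": "some-model-id",
    m=$(sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$f" | head -n 1)
    if [ -n "$m" ]; then
      echo "$f: $m"
      return 0
    fi
  done
  echo "no model set in: $*" >&2
  return 1
}

# Usage (not executed here):
#   first_model /Users/cb/Documents/repos/.claude/settings.json /Users/cb/.claude/settings.json
```

Listing the project file first mirrors the override order described above; a command-line flag or environment variable would still take effect outside this check.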

Infrastructure: The File Descriptor Limit Issue

One command that appeared in the session logs caught our attention:

ulimit -n 2147483646

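What this does: Requests a file descriptor limit of 2,147,483,646 (exactly 2^31 - 2) for the current shell and the processes it starts. On Linux the kernel rejects any value above its fs.nr_open setting (1,048,576 by default), so a request this large normally fails unless fs.nr_open has been raised first.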

Why it matters: When an orchestrator spawns multiple concurrent agents, each agent may open file handles for:

  • Network sockets to Claude API endpoints
  • Temporary files for prompt staging and response caching
  • Log file descriptors for structured logging
  • Database connection pools

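The default soft limit on Linux is typically 1024 per process. Each process has its own limit, so the risk is highest when many agents run inside a single orchestrator process: with 10+ concurrent agents sharing one process, we could exhaust descriptors and see cryptic "too many open files" errors.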
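Before raising any limit, it helps to measure how close a process actually is to it. The following is a Linux-only sketch that reads /proc; fd_count and fd_report are hypothetical helper names.

```shell
# Sketch (Linux only, relies on /proc): compare a process's open descriptors with its soft limit.
# fd_count and fd_report are hypothetical helper names.
fd_count() {
  # Each entry under /proc/<pid>/fd is one open descriptor.
  ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

fd_report() {
  # The soft limit is the 4th field of the "Max open files" row in /proc/<pid>/limits.
  soft=$(awk '/^Max open files/ {print $4}' "/proc/$1/limits")
  echo "pid=$1 open=$(fd_count "$1") soft_limit=$soft"
}

# Example: inspect the current shell.
fd_report "$$"
```

Running fd_report against the orchestrator's PID under load shows whether descriptor exhaustion is a realistic explanation for failures or whether the default limit already leaves ample headroom.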

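Critical note: This command only affects the current shell session and the processes started from it. Entries in /etc/security/limits.conf are applied by pam_limits to login sessions, so they do not reach a systemd service. To make a limit persistent for the jada-agent service, set LimitNOFILE in a drop-in for the unit (the file name and value below are illustrative assumptions; size the value to the workload):

/etc/systemd/system/jada-agent.service.d/limits.conf
[Service]
LimitNOFILE=65536

Then run sudo systemctl daemon-reload and restart the service to pick up the new limit.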
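Whichever mechanism sets the limit, it is worth confirming what systemd actually applies to the unit. A minimal sketch, assuming a systemd version that exposes the LimitNOFILE and LimitNOFILESoft properties; service_nofile is a hypothetical helper name.

```shell
# Sketch: print the descriptor limits systemd applies to a unit, as "soft hard".
# service_nofile is a hypothetical helper name.
service_nofile() {
  systemctl show "$1" -p LimitNOFILESoft -p LimitNOFILE |
    awk -F= '$1 == "LimitNOFILESoft" {soft = $2}
             $1 == "LimitNOFILE"     {hard = $2}
             END {print soft, hard}'
}

# Usage on the orchestrator host (not executed here):
#   service_nofile jada-agent.service
```

If the printed values do not change after a restart, the setting is not reaching the service and the unit configuration is the place to look.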

Model Selection: Technical Decision Framework

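We evaluated three options. The prices are approximate input-token list prices as we noted them; check Anthropic's current pricing page before budgeting, and remember that output tokens are billed at a higher rate: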

  • Haiku 4.5: ~$1 per 1M input tokens. Fast (good for latency-sensitive tasks). Limited reasoning depth for complex task decomposition.
  • Sonnet 4.6: ~$3 per 1M input tokens. Balanced capability/cost. Excellent for orchestration and multi-step reasoning.
  • Opus 4.7: ~$15 per 1M input tokens. Maximum reasoning capability. Overkill for orchestration; better reserved for specialist agents handling novel problems.

Decision: Sonnet 4.6 for the orchestrator. Reasoning:

  • Task decomposition is the critical path—failures here cascade to all downstream agents
  • Sonnet's improved instruction-following reduces hallucinated task edges in the DAG
  • At typical orchestration token budgets (~5–10K tokens per workflow), cost increase is acceptable
  • At the rates quoted above, Sonnet's input price is one-fifth of Opus's, which matters when scaling to 100+ daily orchestration runs
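The cost reasoning above can be checked with simple arithmetic: daily input cost is tokens per run times runs per day, divided by one million, times the price per million tokens. The sketch below uses the figures quoted in this post; daily_input_cost is a hypothetical helper, and output tokens are left out.

```shell
# Sketch: daily input-token cost in USD = tokens_per_run * runs_per_day / 1e6 * price_per_million.
# daily_input_cost is a hypothetical helper; output tokens are not included.
daily_input_cost() {
  awk -v t="$1" -v r="$2" -v p="$3" 'BEGIN { printf "%.2f\n", t * r / 1000000 * p }'
}

daily_input_cost 10000 100 3    # Sonnet at the quoted ~$3 rate
daily_input_cost 10000 100 15   # Opus at the quoted ~$15 rate
```

At the upper end of the stated budget (10K input tokens per workflow, 100 runs per day), this works out to about $3 per day on Sonnet versus $15 per day on Opus, counting input tokens only.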

Configuration Changes: Making Model Updates Persistent

We updated the default model in two places:

User-Level Configuration

~/.claude/settings.json
{
  "model": "claude-sonnet-4-6",
  "defaultProvider": "anthropic",
  "apiVersion": "2024-06"
}

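User-level settings apply to every claude invocation by this user, whatever the working directory; project-level settings, where present, override them. Of the keys shown, model is the one the rest of this post depends on. Confirm that defaultProvider and apiVersion are recognized by your CLI version before relying on them.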

Project-Level Override

~/Documents/repos/.claude/settings.json
{
  "model": "claude-sonnet-4-6",
  "contextWindow": "200k",
  "temperature": 0.3
}

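A lower temperature (0.3 instead of the default 1.0) is intended to make the orchestrator's task decomposition more deterministic; it reduces variation but does not guarantee identical output. When agents need creative problem-solving, they're invoked separately with a higher temperature. Treat contextWindow and temperature as unverified settings keys and confirm that your CLI version reads them from settings.json.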

CLI Override for One-Off Tasks

cd ~/Documents/repos && claude --dangerously-skip-permissions --model claude-sonnet-4-6

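The --dangerously-skip-permissions flag disables all interactive permission prompts, so Claude can read and write files and run commands in the repo unattended. That is convenient for automated workflows but removes the main safety check, so reserve it for sandboxed or disposable environments. The --model override takes precedence over settings files.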

Important caveat: Changes to settings files take effect only on subsequent CLI invocations. A claude session that is already running keeps the model it started with.

Verification: Is the Orchestrator Receiving Work?

To confirm the Lightsail instance is actively processing tasks:

aws lightsail get-instance --instance-name jada-agent --region us-east-1

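The networking section of the output lists open ports and the monthly transfer allowance, not recent traffic. For recent activity, query the instance metrics through the Lightsail API, which requires a unit and a statistic (adjust the time window to the period you want to inspect):

aws lightsail get-instance-metric-data --instance-name jada-agent --region us-east-1 --metric-name NetworkIn --period 300 --start-time 2024-01-15T00:00:00Z --end-time 2024-01-15T02:00:00Z --unit Bytes --statistics Sum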
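A hard-coded time window goes stale quickly. The sketch below computes a rolling window and queries NetworkIn through Lightsail's get-instance-metric-data call. It rests on two assumptions: recent_network_in is a hypothetical helper name, and the AWS CLI is given Unix epoch seconds for the timestamps, which it accepts alongside ISO 8601.

```shell
# Sketch: query NetworkIn for the last N seconds instead of a fixed window.
# recent_network_in is a hypothetical helper; timestamps are passed as Unix epoch seconds.
recent_network_in() (
  instance=$1
  window=${2:-7200}            # look back two hours by default
  end=$(date -u +%s)
  start=$((end - window))
  aws lightsail get-instance-metric-data \
    --instance-name "$instance" \
    --region us-east-1 \
    --metric-name NetworkIn \
    --period 300 \
    --start-time "$start" \
    --end-time "$end" \
    --unit Bytes \
    --statistics Sum
)

# Usage (not executed here):
#   recent_network_in jada-agent          # last two hours
#   recent_network_in jada-agent 600      # last ten minutes
```

A run of zero-byte data points across the window suggests the instance is up but not receiving work, which points back at the dispatch path and away from the model choice.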