Diagnosing and Remediating the JADA Agent Daemon: OAuth Token Expiration, Turn Limits, and Service Health
During a routine health check of the JADA orchestrator daemon running on AWS Lightsail instance 34.239.233.28, we discovered a critical OAuth token failure in the port_sheet_sync.py script, alongside expected (but worth documenting) Claude API turn-limit exits. This post covers the diagnostic process, root causes, and remediation strategy.
What Was Done
- Established SSH access to the Lightsail instance using AWS Lightsail temporary credentials (via the Lightsail API) after discovering the persistent jada-key private key was not stored locally
- Collected comprehensive daemon health telemetry: service status, uptime, CPU/memory/disk usage, and CloudWatch metrics
- Analyzed 48 hours of daemon logs to identify recurring error patterns and session behavior
- Isolated the root cause of port_sheet_sync failures: expired or revoked Google OAuth token
- Documented the expected behavior of Claude API turn-limit exits (exit code 1) and confirmed no service crash occurred
- Flagged the need for OAuth token re-authentication and potential turn-limit adjustment for complex tasks
Technical Details: Daemon Health Snapshot
Service Status
The jada-agent.service systemd unit has been active and running since May 10, 2026—three days of continuous uptime with no unexpected restarts. The instance itself has 11 days of uptime, indicating stable infrastructure. Load average sits at 0.00 during idle periods, with CPU averaging 0.65% during the 60-second polling loop. Memory consumption is minimal at 144MB of 914MB available; disk usage is 6.2GB of 39GB (17%), leaving ample headroom for logs and task artifacts.
Session Activity (UTC, May 13)
The daemon uses a 5-session-per-day quota. Today's breakdown:
- Session 1 (00:00 UTC): Hit max turns (30), exit code 1—expected behavior when task complexity exceeds Claude's turn budget within a single session
- Session 2 (00:02 UTC): Completed successfully, processed e-signature link blockers and crew page generator code, created a needs-you task
- Session 3 (00:05 UTC): Hit max turns (30), exit code 1—again, a turn limit rather than a crash
- Post-Session 3: Daemon idled, no new tasks available in the progress dashboard
Yesterday's Pattern (May 12)
The daemon hit the hard stop of 5/5 sessions before midnight UTC, with 3 pending tasks queued. As expected, these tasks cleared at midnight when the session quota reset. This is the intended behavior for rate-limiting and cost control.
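The quota behavior described above can be sketched as a small counter keyed on the UTC calendar date. This is a minimal illustration, not the daemon's actual code: the `SessionQuota` class and its method names are hypothetical, and only the limit of 5 and the midnight UTC reset come from the observations in this post.

```python
from datetime import datetime, timezone

MAX_SESSIONS_PER_DAY = 5  # quota stated in this post


class SessionQuota:
    """Counts sessions per UTC calendar day (hypothetical helper)."""

    def __init__(self, limit=MAX_SESSIONS_PER_DAY):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_start(self, now=None):
        """Return True and count a session if today's quota allows it."""
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()
        if today != self.day:
            # First call after midnight UTC: start a fresh count
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False  # hard stop; pending tasks wait for the reset
        self.used += 1
        return True
```

With this shape, a sixth call on May 12 is refused, and the first call on May 13 succeeds because the date comparison resets the counter.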
Critical Issue: Google OAuth Token Failure in port_sheet_sync.py
Every 30-minute sync cycle since at least the afternoon of May 13 has failed with:
[port-sheet] token error: HTTP Error 400: Bad Request
Root Cause: The Google OAuth token stored for port_sheet_sync.py (located in the secrets directory referenced by repos.env) is expired or has been revoked by Google. OAuth 2.0 refresh tokens can expire if unused for more than 6 months, or if the user revokes access via Google Account settings.
Impact: Port sheet syncs have been non-functional. Any task that depends on up-to-date port sheet data will lack fresh information. If the port sheet is critical to booking automation or crew scheduling, this represents a data staleness risk.
Why This Happened: The Google OAuth token for this integration was likely issued months ago and has not been renewed since. Google's OAuth 2.0 implementation automatically invalidates long-idle refresh tokens as a security measure. The auth_ga.py script we created during this session uses the google-auth-oauthlib library and supports re-authentication, but it was designed for Analytics (GA4) credentials, not for the general-purpose Google Sheets API token used by port_sheet_sync.py.
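The log line only shows "HTTP Error 400: Bad Request", which does not by itself distinguish an expired refresh token from a malformed request. Google's token endpoint returns a JSON body with an `error` field, and `invalid_grant` is the value that indicates an expired or revoked refresh token. The sketch below reads that body and maps it to a next step. It assumes the sync script talks to the token endpoint through `urllib` (inferred from the log format, not confirmed), and the function names and return labels are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def classify_token_error(status, body):
    """Map a token-endpoint error response to a suggested next step."""
    try:
        error = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        error = ""  # body was not a JSON object
    if status == 400 and error == "invalid_grant":
        return "reauth_required"      # refresh token expired or revoked
    if error == "invalid_client":
        return "check_client_secret"  # client_id / client_secret mismatch
    return "unknown"


def refresh_access_token(client_id, client_secret, refresh_token):
    """Attempt a refresh; return (access_token, None) or (None, diagnosis)."""
    data = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode()
    request = urllib.request.Request(TOKEN_URL, data=data)
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return json.load(resp)["access_token"], None
    except urllib.error.HTTPError as exc:
        # The error body carries the detail that str(exc) omits
        body = exc.read().decode("utf-8", "replace")
        return None, classify_token_error(exc.code, body)
```

Logging the diagnosis alongside the HTTP status would make a future recurrence identifiable from the daemon logs alone.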
Infrastructure & Credential Management
SSH Access Pattern
The jada-key private key (key pair name: jada-key) is managed through AWS Lightsail. Since we do not persist the private key locally in the repository, we use the Lightsail API to request temporary SSH access credentials:
aws lightsail get-instance-access-details \
--instance-name jada-agent \
--region us-east-1
This returns a temporary key valid for a limited window, reducing the blast radius if credentials are compromised. The alternative, SSM Session Manager, is also viable but requires extra setup: IAM permissions, a registered SSM agent, and, in locked-down networks, a VPC endpoint.
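The response from the command above includes an `accessDetails` object with `privateKey`, `certKey`, `username`, and `ipAddress` fields. A minimal sketch of turning that response into a usable SSH invocation follows. The helper name and file names are hypothetical; the one real convention it relies on is that OpenSSH picks up a certificate named `<key>-cert.pub` next to the private key.

```python
import os
import tempfile


def write_temp_ssh_files(response, directory=None):
    """Write the temporary key and certificate; return an ssh argument list."""
    details = response["accessDetails"]
    directory = directory or tempfile.mkdtemp()  # private directory by default
    key_path = os.path.join(directory, "lightsail_temp_key")
    cert_path = key_path + "-cert.pub"  # OpenSSH looks for <key>-cert.pub

    # Create the private key as owner-read/write only from the start
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(details["privateKey"])
    with open(cert_path, "w") as fh:
        fh.write(details["certKey"])

    target = f'{details["username"]}@{details["ipAddress"]}'
    return ["ssh", "-i", key_path, target]
```

In practice the `response` dict would come from parsing the CLI's JSON output, and the files should be deleted once the session ends, since the credentials are short-lived anyway.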
Secrets Storage Location
Google OAuth tokens and other credentials are stored in a dedicated secrets directory referenced in repos.env. During this session, we verified that the GA4 credentials (client_id and client_secret for dangerouscentaur@gmail.com) exist and are valid, allowing the new auth_ga.py script to reuse them for subsequent GA4 property enumeration and reporting.
File Permissions
We enforced restrictive permissions on the client secrets file (chmod 600) to ensure only the owner can read credentials—a best practice for any file containing API keys or OAuth credentials.
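The same check can be automated so a health check catches a secrets file whose permissions have drifted. This is a small sketch using only the standard library; the function name is hypothetical, and the path would be whichever file in the secrets directory is being audited.

```python
import os
import stat


def ensure_owner_only(path):
    """Set mode 0600 if any group/other bits are present; return the final mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Group or other has some access: tighten to owner read/write
        os.chmod(path, 0o600)
        mode = 0o600
    return mode
```

A file that is already 0600 (or stricter, such as 0400) is left untouched, so the helper is safe to run repeatedly.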
Claude API Turn Limits: Expected Behavior
Two of today's three agent sessions exited with code 1 after hitting the 30-turn limit. This is not a daemon failure; it's the expected behavior when a single task or task queue is too complex for one 30-turn session. The daemon logs this as an error for visibility, but it continues running and picks up new tasks on the next session cycle.
Why This Occurs: Complex multi-step tasks (e.g., refactoring booking widget JavaScript, analyzing analytics reports, or generating new content pages) can easily consume 20–30 turns when they involve iterative debugging, multiple file edits, or API calls with error handling. Once the turn limit is hit, the session cleanly exits, and the task is either completed (if it was simple enough to finish) or remains partially done until the next session.
Mitigation Strategies:
- Break complex tasks into smaller, single-purpose jobs (e.g., "fix booking widget JavaScript" as one task, "deploy to staging" as another)
- Increase the turn limit if tasks are routinely incomplete after 30 turns (requires adjusting the daemon configuration)
- Use external tools (shell scripts, Python utilities) to handle repetitive or deterministic work outside the agent loop, reducing turn consumption
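Because a turn-limit exit and a real crash both surface as exit code 1, it helps to separate them before deciding whether to alert or simply requeue the task. The sketch below shows one way to do that. The `SessionResult` type, the labels, and the `"max turns"` marker string are all assumptions for illustration; the marker in particular should be matched to whatever the daemon actually writes to its log.

```python
from dataclasses import dataclass

MAX_TURNS_MARKER = "max turns"  # assumed log phrase; adjust to the real output


@dataclass
class SessionResult:
    exit_code: int
    log_tail: str  # last few lines of the session's output


def classify_session(result):
    """Return 'ok', 'turn_limit', or 'failure' for one finished session."""
    if result.exit_code == 0:
        return "ok"
    if MAX_TURNS_MARKER in result.log_tail.lower():
        # Task may be partially done: requeue it rather than raise an alert
        return "turn_limit"
    return "failure"
```

Counting `turn_limit` results per task over several days would also show which tasks are candidates for splitting or for a higher turn limit.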
Key Decisions
- Ephemeral SSH Credentials Over Persistent Keys: We chose Lightsail's temporary credential API over storing a private key in the repo. This reduces exposure and aligns with AWS security best practices.
- OAuth Token Re-authentication for port_sheet_sync.py: Rather than