Diagnosing and Remediating the JADA Agent Daemon: OAuth Token Expiration and Turn Limit Analysis
On May 13, 2026, a health check of the JADA orchestrator daemon running on AWS Lightsail instance 34.239.233.28 revealed a healthy service with one critical authentication failure and a recurring operational pattern worth documenting. This post covers the diagnostic approach, findings, and remediation path.
Service Health Status: Mostly Operational
The jada-agent.service systemd unit has been running continuously since May 10 with 11 days of underlying instance uptime. Key metrics:
- CPU utilization: 0.65% average over the polling window, with no spike anomalies
- Memory consumption: 144MB of 914MB available
- Disk usage: 6.2GB of 39GB (17%), leaving ample headroom for logs and task artifacts
- Status checks: Zero failures in the prior 2-hour window
- Load average: 0.00, indicating the daemon is idle between task executions
The instance is stable. The daemon's 60-second polling loop accounts for the minimal CPU footprint.
Session Activity and Turn Limit Behavior
Today's activity log shows three sessions executed within the first five minutes of the UTC day:
- Session 1 (00:00 UTC): Hit max turn limit (30 turns) and exited with code 1
- Session 2 (00:02 UTC): Completed successfully; processed e-signature and crew page blockers, created a needs-you task
- Session 3 (00:05 UTC): Hit max turn limit again; exited with code 1
After session 3, the daemon found no pending tasks in the progress dashboard and resumed normal idle polling. This pattern is important: the "max turns" exits are not daemon crashes or service failures. They are expected terminations when a Claude agent session reaches its 30-turn conversation limit. The daemon logs these as error-level exits (code 1) but continues normal operation on the next polling cycle.
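A minimal sketch of how a monitoring script could separate turn-limit exits from genuine failures, so that a code-1 exit alone does not trigger an alert. The names `SessionResult` and `classify_exit` are hypothetical, as is the "max turns" marker string; the daemon's actual log wording is not shown in this post.

```python
from dataclasses import dataclass

# Hypothetical marker; the daemon's real log wording may differ.
MAX_TURNS_MARKER = "max turns"


@dataclass
class SessionResult:
    exit_code: int
    log_tail: str  # last lines of the session's output


def classify_exit(result: SessionResult) -> str:
    """Return 'ok', 'turn_limit', or 'failure' for a finished session."""
    if result.exit_code == 0:
        return "ok"
    # A non-zero exit that mentions the turn limit is an expected termination.
    if MAX_TURNS_MARKER in result.log_tail.lower():
        return "turn_limit"
    return "failure"
```

With this split, only the `failure` class would need escalation; `turn_limit` counts can be tracked separately as a task-sizing signal.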
Why this matters: If task scope is expanding such that complex work requires more than 30 turns to complete, we have two remediation paths: either increase the turn limit per session, or decompose larger tasks into smaller, sequential subtasks that fit within a single session's budget.
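The decomposition option can be sketched as a greedy grouping of ordered subtasks into sessions that each stay within the turn budget. This assumes each subtask carries a rough turn estimate, which the daemon does not currently record; `plan_sessions` and the estimates are illustrative only.

```python
def plan_sessions(subtasks, turn_limit=30):
    """Group ordered (name, estimated_turns) pairs into sessions under turn_limit.

    Order is preserved, so sequentially dependent subtasks stay in sequence.
    """
    sessions, current, used = [], [], 0
    for name, turns in subtasks:
        if turns > turn_limit:
            raise ValueError(f"{name!r} alone exceeds the {turn_limit}-turn budget")
        # Start a new session when this subtask would overflow the current one.
        if used + turns > turn_limit:
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += turns
    if current:
        sessions.append(current)
    return sessions
```

A subtask that cannot fit in one session raises an error, which is the case where raising the turn limit is the only remaining option.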
Critical Issue: Broken Google OAuth Token in port_sheet_sync
The port_sheet_sync.py script, which syncs port sheet data to Google Sheets every 30 minutes, has been failing consistently since at least early afternoon UTC with the same error:
[port-sheet] token error: HTTP Error 400: Bad Request
This indicates the stored Google OAuth token for the service account or user account is expired or has been revoked. No port sheet syncs have executed since the failures began.
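Root cause: Google OAuth2 access tokens have a default lifetime of 3600 seconds (1 hour), so the access token in the daemon's credential cache expires routinely and must be renewed with the stored refresh token. A 400 on the token request suggests that the renewal itself is being rejected, which typically means the refresh token has expired or been revoked (Google reports this as `invalid_grant`), or that the credential file was corrupted during a deployment or manual intervention. The logged message omits the response body, so the exact error code is not yet confirmed.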
Affected component: The token is stored in the jada agent's credential cache, likely in a JSON file referenced by the `port_sheet_sync.py` script during initialization.
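Because the logged message drops the response body, the specific OAuth error code is lost. The sketch below shows one way to map a token-endpoint error response to a likely cause. `explain_token_error` is a hypothetical helper, not part of `port_sheet_sync.py`; the `invalid_grant` and `invalid_client` codes are the standard OAuth 2.0 error values.

```python
import json


def explain_token_error(status: int, body: bytes) -> str:
    """Map a token-endpoint error response to a likely cause."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = None
    error = payload.get("error", "") if isinstance(payload, dict) else ""
    if status == 400 and error == "invalid_grant":
        return "refresh token expired or revoked; re-authentication needed"
    if status in (400, 401) and error == "invalid_client":
        return "client ID or secret rejected; check the credential file"
    return f"unclassified token error (HTTP {status}, error={error or 'unknown'})"
```

If the script uses `urllib` (the logged text matches the string form of `urllib.error.HTTPError`, though that is an inference), the exception handler could pass `err.code` and `err.read()` to this function and log the result.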
Diagnostic Approach: SSH Access via Lightsail API
The private key for the jada-key SSH key pair was not available in the local ~/.ssh/ directory. Rather than recreate or redeploy the key, we used the AWS Lightsail API to generate temporary SSH credentials:
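# Request temporary SSH credentials from the Lightsail API
aws lightsail get-instance-access-details \
  --instance-name jada-agent \
  --region us-east-1
# The response's accessDetails object contains a temporary private key (privateKey)
# and a signed certificate (certKey), valid for 60 seconds
# Save privateKey to /tmp/jada_temp_key.pem and certKey to
# /tmp/jada_temp_key.pem-cert.pub (ssh picks up the matching certificate automatically),
# then restrict permissions and connect immediately
chmod 600 /tmp/jada_temp_key.pem
ssh -i /tmp/jada_temp_key.pem \
  -o StrictHostKeyChecking=accept-new \
  ubuntu@34.239.233.28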
This approach avoids long-term key storage on the development machine and limits exposure: the temporary credentials are generated on demand, expire quickly, and are discarded after the session.
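The step of writing the temporary key to disk can be sketched as follows, assuming the CLI response was captured as JSON. `write_temp_credentials` is a hypothetical helper; the `accessDetails` fields it reads (`privateKey`, `certKey`, `username`, `ipAddress`) are the ones returned by `get-instance-access-details`.

```python
import json
import os
from pathlib import Path


def write_temp_credentials(response_json: str, key_path: str) -> str:
    """Write the temporary key and certificate; return 'user@host' for ssh."""
    details = json.loads(response_json)["accessDetails"]
    key_file = Path(key_path)
    # ssh looks for a certificate named "<identity file>-cert.pub".
    cert_file = Path(key_path + "-cert.pub")
    for path, content in ((key_file, details["privateKey"]),
                          (cert_file, details["certKey"])):
        # Create with owner-only permissions; ssh refuses keys readable by others.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
    return f"{details['username']}@{details['ipAddress']}"
```

The returned string can be passed straight to `ssh -i <key_path>`. Both files should be deleted once the session ends.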
Data Collected via SSH
Once connected, we collected:
- Service status: `systemctl status jada-agent.service`
- Recent logs: `journalctl -u jada-agent.service -n 100 --no-pager`
- Process info: `ps aux | grep jada-agent`
- System metrics: `free -h`, `df -h`, `uptime`
- Daemon logs with error context: Last 2 hours of stderr/stdout from the daemon's log file
All data was collected without modifying the system or interrupting service.
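To gauge how often the sync has been failing, the collected log lines can be tallied offline. This is a minimal sketch: `summarize_token_errors` is a hypothetical helper, and the only log format it relies on is the `[port-sheet] token error:` prefix quoted earlier in this post.

```python
from collections import Counter

TOKEN_ERROR_PREFIX = "[port-sheet] token error:"


def summarize_token_errors(lines):
    """Count port-sheet token errors, grouped by the message after the prefix."""
    counts = Counter()
    for line in lines:
        # partition() leaves `found` empty when the prefix is absent.
        _, found, message = line.partition(TOKEN_ERROR_PREFIX)
        if found:
            counts[message.strip()] += 1
    return counts
```

Feeding it the saved `journalctl` output gives one count per distinct error message, which makes it easy to confirm that every failure is the same 400 response.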
Remediation Path
To resolve the port_sheet_sync failure:
- Identify the credential file path used by `port_sheet_sync.py` (likely in `/home/ubuntu/.jada_creds/` or similar)
- Run the Google OAuth re-authentication flow: `python3 auth_ga.py --account [service-account-email]`
- Verify the new token is written and that the script can read it
- Trigger a manual sync: `python3 port_sheet_sync.py --force`
- Monitor the next three 30-minute cycles for errors in the daemon logs
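The verification step can be sketched as a quick structural check of the credential file before triggering a sync. The field names below are an assumption based on Google's authorized-user credential format; the file that `port_sheet_sync.py` actually reads may be laid out differently, and `missing_credential_fields` is a hypothetical helper.

```python
import json

# Assumed field names (Google authorized-user format); the real file may differ.
REQUIRED_FIELDS = ("client_id", "client_secret", "refresh_token")


def missing_credential_fields(path):
    """Return the required fields that are absent or empty in the JSON file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if not data.get(name)]
```

An empty list means the file is at least structurally complete. A `json.JSONDecodeError` here would point to the corruption scenario rather than a revoked token.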
For the turn limit behavior, review recent session 1 and 3 task descriptions to determine if they can be split into smaller, sequentially-dependent tasks, or if the turn limit should be increased in the daemon's configuration.
Key Decisions
- Why Lightsail API for SSH access: Eliminates the need to manage long-lived SSH keys on development machines; reduces attack surface.
- Why we didn't restart the service: The service is healthy and performing its intended function. Restarting would interrupt any in-progress work.
- Why we logged max-turn exits but didn't escalate: These are design-level signals, not errors. They tell us task scope is hitting session limits; the daemon recovers normally.
What's Next
The daemon is stable and ready for continued operation. Priority actions are to re-authenticate the Google OAuth token for port sheet syncs and to review the turn limit behavior in task design. No infrastructure changes are required at this time.