```html

Diagnosing and Remediating a Broken OAuth Token in the Port Sheet Sync Daemon

During a routine health check of the jada-agent orchestrator daemon running on our Lightsail instance (34.239.233.28), we discovered a critical authentication failure in the port sheet synchronization workflow. This post covers the diagnosis, root cause analysis, and remediation strategy for OAuth token expiration in long-running daemon processes.

What Was Done

We performed a comprehensive health audit of the jada-agent.service running on a 2-vCPU, 1GB RAM Lightsail instance. The audit included:

  • Service status verification and uptime analysis
  • CPU, memory, and disk utilization metrics over the last 2 hours via Lightsail API
  • Systemd journal inspection for daemon logs and error patterns
  • Session counter review from the progress dashboard (3 of 5 daily sessions used)
  • Task queue analysis for pending work items
  • OAuth token validation for integrated Google APIs
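
The CPU portion of the metrics pull can be sketched roughly as follows. This is a minimal illustration, not the health check script itself: it assumes boto3 is installed with credentials configured, and the instance name passed to `fetch_cpu_average` is a hypothetical placeholder. The helper names are also made up for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def average_of(datapoints: list) -> Optional[float]:
    """Average the 'average' field across metric datapoints, or None if empty."""
    values = [p["average"] for p in datapoints if "average" in p]
    return sum(values) / len(values) if values else None


def fetch_cpu_average(instance_name: str, hours: int = 2) -> Optional[float]:
    """Fetch average CPU utilization (percent) over the last `hours` hours."""
    import boto3  # imported lazily so average_of() works without boto3

    end = datetime.now(timezone.utc)
    response = boto3.client("lightsail").get_instance_metric_data(
        instanceName=instance_name,  # hypothetical; use the real instance name
        metricName="CPUUtilization",
        period=300,  # 5-minute datapoints
        startTime=end - timedelta(hours=hours),
        endTime=end,
        unit="Percent",
        statistics=["Average"],
    )
    return average_of(response["metricData"])
```

Calling `fetch_cpu_average("my-instance")` would return a single percentage comparable to the 0.65% average reported in this audit, or `None` if no datapoints came back.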

Key Finding: The daemon is healthy overall (11 days uptime, 0.65% average CPU), but the port_sheet_sync.py subprocess has failed on every 30-minute run since at least the afternoon of May 13 (UTC), each time with the same "HTTP Error 400: Bad Request" from Google's OAuth2 service.

Technical Details: The OAuth Failure

The port_sheet_sync.py script runs as a scheduled subprocess within the jada-agent daemon, syncing data to a Google Sheet at 30-minute intervals. The script reads OAuth2 credentials from the repos.env configuration file and authenticates via the google-auth-oauthlib library.

Every sync attempt since approximately 2026-05-13 14:00 UTC has logged:

[port-sheet] token error: HTTP Error 400: Bad Request

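Because the log line is a token error, the 400 means Google rejected the token refresh request itself. Access tokens expire routinely (after about an hour) and are renewed from the refresh token, so ordinary access token expiry is not the problem. The likely causes are:

  • Refresh token expiration: The stored refresh token is no longer valid, so every attempt to renew the access token fails
  • Token revocation: The user has revoked access via their Google account security settings
  • Scope mismatch: The stored token was not authorized for the required Google Sheets API scope, although a missing scope more often surfaces as a 403 from the Sheets API than as a 400 during refresh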

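Given the sudden onset (rather than gradual degradation) and the consistent 400 response, revocation or invalidation of the refresh token is most likely. This could occur if:

  • The Google account associated with the token changed its password (Google invalidates refresh tokens that carry Gmail scopes when this happens)
  • The user revoked OAuth app permissions in their Google Account security dashboard
  • Google's backend invalidated the token due to suspicious activity patterns
  • The refresh token itself expired: Google invalidates refresh tokens that go unused for six months, and tokens issued by an OAuth app still in "Testing" publishing status expire after 7 days. The six-month case is unlikely here, since the daemon uses the token every 30 minutes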
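
To tell these causes apart, the response body from the token endpoint is more useful than the status line. The sketch below assumes the sync script currently logs only the exception string and drops the body. It performs one manual refresh with the standard library and maps the OAuth2 `error` field to a hint. The function names are hypothetical, and the credentials would come from repos.env.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"

# Standard OAuth2 token-endpoint error codes mapped to next steps.
HINTS = {
    "invalid_grant": "refresh token expired or revoked; re-authenticate",
    "invalid_client": "client ID or secret is wrong or was deleted",
    "invalid_request": "malformed refresh request; check the parameters",
    "invalid_scope": "requested scope is invalid or exceeds the grant",
}


def classify_token_error(body: bytes) -> str:
    """Map a token-endpoint error body to a short, actionable hint."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "unparseable error body"
    if not isinstance(payload, dict):
        return "unparseable error body"
    code = payload.get("error", "")
    return HINTS.get(code, f"unrecognized error code: {code!r}")


def try_refresh(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Attempt one refresh and return 'ok' or a diagnostic hint."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    request = urllib.request.Request(TOKEN_URL, data=form)
    try:
        with urllib.request.urlopen(request, timeout=10):
            return "ok"
    except urllib.error.HTTPError as exc:
        # str(exc) is only "HTTP Error 400: Bad Request"; the body names the cause.
        return classify_token_error(exc.read())
```

An `invalid_grant` result would confirm the revocation or expiration hypothesis and point straight at re-authentication.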

Infrastructure and Process Context

The jada-agent daemon is deployed as a systemd service on a single Lightsail instance:

  • Service name: jada-agent.service
  • Instance: AWS Lightsail (IP: 34.239.233.28)
  • Uptime: 11 days (last restart: 2026-05-02)
  • Resource allocation: 2 vCPU, 1GB memory, 40GB disk

The daemon operates on a session-based execution model with daily limits:

  • Daily session quota: 5 sessions per 24-hour UTC rollover
  • Session usage (2026-05-13): 3 of 5 consumed
  • Max turns per session: 30 (the max-turns limit applied to each Claude session)
  • Session 1 & 3: Hit the 30-turn limit (exit code 1, logged as error)
  • Session 2: Completed successfully, created work items for e-signature and crew page blockers
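
The quota model above can be illustrated with a small counter that resets at the UTC date boundary. This is a hypothetical sketch of the behavior described, not the daemon's actual code. The class and method names are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone


@dataclass
class DailySessionQuota:
    """Tracks sessions used per UTC day against a fixed limit."""
    limit: int = 5
    used: int = 0
    day: date = field(default_factory=lambda: datetime.now(timezone.utc).date())

    def try_start(self, now: datetime) -> bool:
        """Consume one session if quota remains; `now` must be timezone-aware."""
        today = now.astimezone(timezone.utc).date()
        if today != self.day:
            # UTC rollover: start a fresh day with a zeroed counter.
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Starting from the state in this audit (3 of 5 used on 2026-05-13), two more sessions would be allowed before midnight UTC, and the counter would reset on the first session of 2026-05-14.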

The port_sheet_sync.py script is a separate subprocess invoked by the daemon on a scheduled interval, not a core agent function. Its failure does not block task execution but does prevent real-time sync of progress metrics to the shared Google Sheet used for dashboard visibility.

Key Decisions and Architecture Patterns

Why we use OAuth2 with stored refresh tokens: Rather than embedding service account credentials (which would require rotating a shared secret across environments), we use OAuth2 with user-delegated access. This allows fine-grained permission scoping and audit trails through the user's Google Account.

Why 30-minute sync intervals: The port sheet acts as a real-time source of truth for agent session progress, task queue status, and error rates. A 30-minute window balances API quota consumption against dashboard staleness. Shorter intervals would risk hitting Google Sheets API quotas; longer intervals reduce visibility into daemon state.

Why we authenticate via auth_ga.py: The auth_ga.py tool in /Users/cb/Documents/repos/tools/ handles the OAuth2 flow for all Google API integrations (Analytics, Sheets, etc.). It persists the refresh token to a secure configuration file and provides a single source of truth for credential management across multiple scripts.

Why we monitor this at all: The daemon runs unattended and processes tasks for hours or days without direct supervision. Automated health checks catch OAuth failures, token expirations, and resource exhaustion before they compound into larger outages. The metrics we pull (CPU, memory, network, status checks) provide leading indicators of infrastructure drift.

Remediation Steps

To restore port sheet synchronization:

  1. Re-authenticate the Google account: Run the auth_ga.py script with the correct account flag to re-generate the OAuth2 refresh token.
    python3 ~/Documents/repos/tools/auth_ga.py --account [account-email]
    This will prompt for interactive browser-based authentication and store the new refresh token in repos.env.
  2. Restart the port_sheet_sync subprocess: Either restart the jada-agent.service entirely or send a signal to gracefully restart the port-sheet sync task:
    sudo systemctl restart jada-agent.service
  3. Verify sync success: Monitor the daemon logs for successful port-sheet sync completion and confirm the Google Sheet is being updated at the next 30-minute interval mark.
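  4. Add token failure alerts: Google invalidates refresh tokens that go unused for six months, but revocation can happen at any time, so alert on refresh failures rather than on token age alone. Set a CloudWatch alarm or daemon log watcher to alert when "[port-sheet] token error" lines repeat across consecutive sync attempts. If the token could ever sit idle, also alert as it approaches six months without use.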
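
A daemon log watcher for step 4 could look something like the sketch below. It assumes the failure line keeps the "[port-sheet] token error" prefix seen in the journal, and that alerting after two failures in the window (two missed syncs) is an acceptable threshold. The function names and the threshold are illustrative choices.

```python
import subprocess
from typing import Iterable, List

ERROR_MARKER = "[port-sheet] token error"


def count_token_errors(lines: Iterable[str]) -> int:
    """Count journal lines that report a port-sheet token failure."""
    return sum(1 for line in lines if ERROR_MARKER in line)


def should_alert(lines: Iterable[str], threshold: int = 2) -> bool:
    """Alert once the number of token failures reaches the threshold."""
    return count_token_errors(lines) >= threshold


def recent_journal_lines(unit: str = "jada-agent.service",
                         since: str = "2 hours ago") -> List[str]:
    """Read recent journal output for the unit (requires journalctl access)."""
    result = subprocess.run(
        ["journalctl", "-u", unit, "--since", since, "--no-pager", "-o", "cat"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Running `should_alert(recent_journal_lines())` from the health check would flag the failure within about an hour of onset, instead of waiting for a manual audit.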

Session Limit Pattern and Future Optimization

Two of today's three agent sessions hit the 30-turn limit (exit code 1). While the daemon correctly logs these as errors and continues running, hitting the limit means tasks were not completed in that session and remain in the queue for the next session. Session 2 completed normally and produced meaningful work items.

If max-turns exits become frequent, consider:

  • Breaking complex multi-step tasks into smaller, single-focus tasks
  • Implementing a continuation mechanism that carries context forward across sessions
  • Increasing the effective turn budget by optimizing prompts for conciseness

What's Next

Immediately: Re-authenticate the Google OAuth token and restart jada-agent.service so the port-sheet sync subprocess picks up the new token.

Short-term: Add token expiration monitoring to the health check script; implement