Diagnosing and Resolving OAuth Token Expiration in Automated Google Sheets Sync Pipeline

What Was Done

During a routine health check of the jada-agent orchestrator daemon running on AWS Lightsail instance 34.239.233.28, we identified a critical failure in the automated port sheet synchronization pipeline. The port_sheet_sync.py script, which syncs bidirectionally between a Google Sheet and our internal task tracking system, had been failing on every 30-minute run for more than 18 hours with HTTP 400 errors. This post documents the diagnosis, root cause analysis, and remediation strategy.

Technical Details: The Failure Pattern

The jada-agent daemon maintains a 60-second poll loop that checks for pending tasks in the progress dashboard and routes them to Claude for execution. Simultaneously, a cron job fires port_sheet_sync.py every 30 minutes to keep our Google Sheets data synchronized with our task queue.

Examining the daemon logs via SSH connection revealed a consistent error pattern:

[port-sheet] token error: HTTP Error 400: Bad Request
Timestamp: 2026-05-13 12:30:15 UTC
Timestamp: 2026-05-13 13:00:22 UTC
Timestamp: 2026-05-13 13:30:44 UTC
... (repeating every ~30 minutes)
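A quick way to confirm that the failures track the cron schedule rather than the daemon's 60-second poll loop is to measure the gaps between them. The sketch below uses only the three timestamps shown in the excerpt; the function names and the 120-second tolerance are illustrative choices, not part of the original tooling.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above (UTC).
FAILURES = [
    "2026-05-13 12:30:15",
    "2026-05-13 13:00:22",
    "2026-05-13 13:30:44",
]

def failure_gaps_seconds(stamps):
    """Return the gap in seconds between consecutive failure timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

def matches_cadence(gaps, period_s=1800, tolerance_s=120):
    """True if every gap is within tolerance of the expected cron period."""
    return all(abs(g - period_s) <= tolerance_s for g in gaps)

gaps = failure_gaps_seconds(FAILURES)
print(gaps, matches_cadence(gaps))  # [1807, 1822] True
```

Gaps of roughly 1800 seconds are consistent with the 30-minute cron trigger being the only caller that hits the failing token refresh.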

The error most likely originates from the OAuth 2.0 token refresh step: the script's access token has expired, and the request to exchange the stored refresh token for a new access token is being rejected. Access tokens are short-lived (typically valid for 3600 seconds). The refresh token used to obtain new ones lasts much longer, but it too can expire or be revoked. This is a common failure mode when:

The refresh token goes unused for six months
The user revokes the connected app's permissions in their Google Account settings
The OAuth consent screen configuration changes and requires re-authorization
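The OAuth client credentials (client ID and secret) are rotated or deleted without updating the stored token (service accounts are not involved here, since they do not use refresh tokens)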
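The log line only records the HTTP status, but a token endpoint normally returns a JSON body whose "error" field narrows down the cause. The sketch below maps the error codes defined in RFC 6749, section 5.2 to a coarse remediation hint. The function name, the hint strings, and the sample body are illustrative assumptions; the actual response body was not captured in the logs.

```python
import json

# Error codes defined for token endpoint responses in RFC 6749, section 5.2.
REAUTH_REQUIRED = {"invalid_grant"}
CONFIG_PROBLEM = {"invalid_client", "unauthorized_client", "invalid_scope"}

def classify_token_error(body: str) -> str:
    """Map a token-endpoint error body to a coarse remediation hint."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown"
    code = payload.get("error") if isinstance(payload, dict) else None
    if code in REAUTH_REQUIRED:
        return "reauthorize"          # refresh token expired or revoked
    if code in CONFIG_PROBLEM:
        return "check-client-config"  # client ID, secret, or scope issue
    return "unknown"

# Illustrative body only; not taken from the actual logs.
sample = '{"error": "invalid_grant", "error_description": "Token has been expired or revoked."}'
print(classify_token_error(sample))  # reauthorize
```

Logging the response body alongside the status code would let a future failure distinguish an expired or revoked refresh token from a client configuration problem without a manual investigation.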

Infrastructure Context: The Sync Pipeline Architecture

The port sheet synchronization operates within this architecture:

Cron trigger: Runs every 30 minutes on the jada-agent Lightsail instance
Script location: /usr/local/bin/port_sheet_sync.py
OAuth credentials storage: Stored in the repos.env configuration file (located at ~/Documents/repos/repos.env on development machines and in secure storage on the Lightsail instance)
Target Google Sheet: Shared drive under the dangerouscentaur@gmail.com account
Python dependencies: google-auth-oauthlib, google-auth-httplib2, and the Google Sheets API v4 client

The script uses a two-step authentication flow: it loads a stored OAuth token (containing both access and refresh tokens), attempts to use the access token for API calls, and automatically refreshes using the refresh token when the access token expires. The 400 error shows that the refresh request itself is being rejected, which points to a refresh token that is no longer valid.
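The refresh step described above can be sketched with the standard library alone. This is not the code in port_sheet_sync.py, which relies on the Google client libraries; it only illustrates the request that a refresh boils down to. The helper names, the 60-second skew, and the placeholder credentials are assumptions.

```python
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URI = "https://oauth2.googleapis.com/token"

def needs_refresh(expires_at: float, now: float, skew_s: float = 60.0) -> bool:
    """True once the access token is within skew_s seconds of expiring."""
    return now >= expires_at - skew_s

def build_refresh_request(client_id: str, client_secret: str, refresh_token: str):
    """Build (but do not send) the form-encoded refresh_token grant request."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(TOKEN_URI, data=form, headers=headers, method="POST")

# Placeholder values for illustration; nothing is sent over the network here.
req = build_refresh_request("example-client-id", "example-secret", "example-refresh-token")
print(req.get_method(), req.full_url)
```

When a request like this is sent with an invalid refresh token, the endpoint answers with a 400 status. Python's urllib reports that as an HTTPError whose string form is "HTTP Error 400: Bad Request", which matches the wording in the log, although the HTTP client the script actually uses is not shown here.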

Diagnosis Process

We connected to the Lightsail instance over SSH using temporary credentials from the Lightsail API, since the instance's SSH key was not available locally. The process was:

# Get temporary SSH credentials from Lightsail API
aws lightsail get-instance-access-details \
  --instance-name jada-agent \
  --region us-east-1

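# Save accessDetails.privateKey from the response to /tmp/lightsail_key and
# accessDetails.certKey to /tmp/lightsail_key-cert.pub, then connect
chmod 600 /tmp/lightsail_key
ssh -i /tmp/lightsail_key ubuntu@34.239.233.28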

# Check service status
systemctl status jada-agent.service

# Examine recent logs
journalctl -u jada-agent.service -n 100 --no-pager

# Verify cron logs for port_sheet_sync
grep port_sheet_sync /var/log/syslog | tail -20

The daemon itself remains healthy: 11 days uptime, 0.65% average CPU utilization, 144MB memory usage on a 914MB system, and zero status check failures. The issue is isolated to the OAuth token used by the port sheet sync script.

Root Cause and Key Decisions

The OAuth token stored in repos.env has expired or been revoked. Rather than attempting to refresh in-place, we must re-authenticate the dangerouscentaur@gmail.com account through Google's OAuth 2.0 consent flow. This requires:

Running the auth_ga.py script (located at /Users/cb/Documents/repos/tools/auth_ga.py) on a development machine with browser access
Authenticating as dangerouscentaur@gmail.com and granting Google Sheets API permissions
Retrieving the new refresh token and access token from the authorization response
Updating the OAuth credentials in repos.env on both development and production environments
Restarting the jada-agent daemon to pick up the new token configuration
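The contents of auth_ga.py are not shown here, so the following is only a sketch of the consent URL such a script would need to open in the browser. The important parameters are access_type=offline, which asks Google to issue a refresh token, and prompt=consent, which forces the consent screen so that a new refresh token is returned even if the account authorized the app before. The function name, the client ID, and the redirect URI are placeholders.

```python
import urllib.parse

AUTH_URI = "https://accounts.google.com/o/oauth2/v2/auth"
SHEETS_SCOPE = "https://www.googleapis.com/auth/spreadsheets"

def build_consent_url(client_id: str, redirect_uri: str, login_hint: str = "") -> str:
    """Build the Google OAuth 2.0 authorization URL for an offline Sheets grant."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": SHEETS_SCOPE,
        "access_type": "offline",  # request a refresh token
        "prompt": "consent",       # force consent so a new refresh token is issued
    }
    if login_hint:
        params["login_hint"] = login_hint  # pre-select the account to sign in with
    return AUTH_URI + "?" + urllib.parse.urlencode(params)

# Placeholder client ID and redirect URI for illustration.
url = build_consent_url(
    "example-client-id.apps.googleusercontent.com",
    "http://localhost:8080/",
    "dangerouscentaur@gmail.com",
)
print(url)
```

In practice the google-auth-oauthlib dependency listed earlier can build this URL and handle the local redirect; the sketch only makes the relevant parameters visible.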

We chose re-authentication over attempting token refresh because:

Token refresh failures typically indicate the refresh token itself is invalid, not just the access token
Re-authorization ensures we have explicit, current user consent for API access
This approach is more transparent and auditable than attempting silent recovery
It gives us an opportunity to validate that the stored token structure matches the current library expectations
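The last point can be checked mechanically before the new token is deployed. The sketch below assumes the stored token is a JSON object in the authorized-user style, where refreshing depends on refresh_token, client_id, and client_secret being present. The exact format of the token in repos.env is not shown in this post, so the field list and the function name should be treated as assumptions.

```python
import json

# Fields a refresh depends on in an authorized-user style token (assumed format).
REQUIRED_FIELDS = ("refresh_token", "client_id", "client_secret")

def missing_token_fields(token_json: str) -> list:
    """Return the required fields that are absent or empty in the stored token."""
    data = json.loads(token_json)
    if not isinstance(data, dict):
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if not data.get(field)]

# Illustrative token with the refresh token missing.
stored = '{"client_id": "example-id", "client_secret": "example-secret", "token": "abc"}'
print(missing_token_fields(stored))  # ['refresh_token']
```

An empty result means the token has everything a refresh needs; it does not prove the refresh token is still valid, which only a real refresh request can confirm.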

Session Activity Context

The jada-agent completed three sessions on 2026-05-13 UTC:

Session 1 (00:00 UTC): Hit the 30-turn limit (expected for complex multi-step tasks)
Session 2 (00:02 UTC): Successfully completed, created a "needs-you" task for e-signature and crew page blockers
Session 3 (00:05 UTC): Hit the 30-turn limit again

After session 3, the daemon found no new tasks and entered an idle state (0.00 load average). The port sheet sync failures were unrelated to agent task processing; they occurred in parallel on the cron schedule.

What's Next

Immediate action items:

Execute auth_ga.py --account dangerouscentaur@gmail.com on a development machine with a web browser
Complete the Google OAuth consent flow in the browser
Extract and validate the new refresh token from the response
Update the GA_TOKEN and related credential entries in repos.env
Deploy updated repos.env to the Lightsail instance
Restart the jada-agent service: sudo systemctl restart jada-agent.service
Verify the next cron execution (within 30 minutes) succeeds with no HTTP 400 errors
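The repos.env update in the steps above is easy to get wrong by hand, for example by leaving the old entry in place alongside the new one. A minimal sketch of replacing a single KEY=value entry follows. It assumes plain KEY=value lines with no "export" prefix or quoting rules, which may not match the real file, and the function name is illustrative.

```python
def set_env_value(text: str, key: str, value: str) -> str:
    """Return env-file text with KEY=value replaced, or appended if absent."""
    lines = text.splitlines()
    prefix = key + "="
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value  # overwrite every existing entry for this key
            replaced = True
    if not replaced:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"

# Illustrative content; the real repos.env entries are not shown in this post.
before = "OTHER_SETTING=1\nGA_TOKEN=old-value\n"
print(set_env_value(before, "GA_TOKEN", "new-value"), end="")
```

Writing the result to a temporary file and renaming it over repos.env would avoid a half-written file if the cron job fires during the update.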

The daemon is otherwise operating normally with excellent uptime and resource efficiency. Once a new OAuth token has been issued through re-authorization and deployed, the port sheet sync pipeline should resume normal operation.