Diagnosing and Resolving OAuth Token Expiration in Automated Google Sheets Sync Pipeline
What Was Done
During a routine health check of the jada-agent orchestrator daemon running on AWS Lightsail instance 34.239.233.28, we identified a critical failure in the automated port sheet synchronization pipeline. The port_sheet_sync.py script, responsible for bidirectional syncing between a Google Sheet and our internal task tracking system, had been failing on every 30-minute run for more than 18 hours with HTTP 400 errors. This post documents the diagnosis, root cause analysis, and remediation strategy.
Technical Details: The Failure Pattern
The jada-agent daemon maintains a 60-second poll loop that checks for pending tasks in the progress dashboard and routes them to Claude for execution. Simultaneously, a cron job fires port_sheet_sync.py every 30 minutes to keep our Google Sheets data synchronized with our task queue.
Examining the daemon logs over SSH revealed a consistent error pattern:
[port-sheet] token error: HTTP Error 400: Bad Request
Timestamp: 2026-05-13 12:30:15 UTC
Timestamp: 2026-05-13 13:00:22 UTC
Timestamp: 2026-05-13 13:30:44 UTC
... (repeating every ~30 minutes)
The error is raised when the script tries to exchange its refresh token for a new OAuth 2.0 access token. Access tokens are short-lived (typically valid for 3600 seconds), so their expiry is routine; the problem here is that the long-lived refresh token used to obtain new ones has itself expired or been revoked. This is a common failure mode when:
- The refresh token goes unused for six months
- The user revokes the connected app's permissions in their Google Account settings
- The OAuth consent screen configuration changes and requires re-authorization
- The OAuth client credentials (client ID and secret) are rotated without updating the stored token
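The log line only records the HTTP status, while the token endpoint's JSON response body is what distinguishes these causes. The sketch below maps the standard OAuth 2.0 error codes (RFC 6749, section 5.2) to a suggested next step. It assumes the sync script can capture the response body; `classify_token_error` is a hypothetical helper, not part of port_sheet_sync.py.

```python
import json


def classify_token_error(body: str) -> str:
    """Map an OAuth 2.0 token-endpoint error body to a suggested action.

    Hypothetical helper: assumes `body` is the raw text of the HTTP 400
    response returned by the token endpoint.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown: response body is not JSON"
    if not isinstance(payload, dict):
        return "unknown: response body is not a JSON object"

    error = payload.get("error", "")
    if error == "invalid_grant":
        # Refresh token expired, revoked, or issued to a different client.
        return "reauthorize: refresh token is no longer valid"
    if error == "invalid_client":
        # Client ID or secret does not match what the provider has on file.
        return "check client: client ID or secret rejected"
    if error == "invalid_request":
        # The refresh request itself is malformed (missing parameter, etc.).
        return "check request: malformed refresh request"
    return f"unknown: {error or 'no error field'}"


if __name__ == "__main__":
    print(classify_token_error('{"error": "invalid_grant"}'))
```

An `invalid_grant` result points to re-authorization; the other codes suggest a configuration problem that a new consent flow would not fix.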
Infrastructure Context: The Sync Pipeline Architecture
The port sheet synchronization operates within this architecture:
- Cron trigger: Runs every 30 minutes on the jada-agent Lightsail instance
- Script location: /usr/local/bin/port_sheet_sync.py
- OAuth credentials storage: Stored in the repos.env configuration file (located at ~/Documents/repos/repos.env on development machines and in secure storage on the Lightsail instance)
- Target Google Sheet: Shared drive under the dangerouscentaur@gmail.com account
- Python dependencies: google-auth-oauthlib, google-auth-httplib2, and the Google Sheets API v4 client
The script uses a two-step authentication flow: it loads a stored OAuth token (containing both access and refresh tokens), attempts to use the access token for API calls, and automatically refreshes using the refresh token when the access token expires. The 400 error indicates the refresh token is no longer valid.
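For readers unfamiliar with the refresh step, here is a minimal sketch of what it amounts to: a form-encoded POST to Google's token endpoint with `grant_type=refresh_token`. In practice the client library performs this call, so the function names and the 60-second skew below are illustrative assumptions rather than the script's actual code. The request is only constructed, never sent.

```python
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URI = "https://oauth2.googleapis.com/token"


def access_token_expired(expiry_epoch: float, now: float,
                         skew_seconds: float = 60.0) -> bool:
    """Treat the access token as expired slightly early to avoid races."""
    return now >= expiry_epoch - skew_seconds


def build_refresh_request(client_id: str, client_secret: str,
                          refresh_token: str) -> urllib.request.Request:
    """Build (but do not send) the refresh-token grant request."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URI,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder values only; real credentials would come from repos.env.
    req = build_refresh_request("example-id", "example-secret", "example-rt")
    print(req.get_method(), req.full_url)
```

When the refresh token is invalid, the endpoint answers this request with HTTP 400, which matches the status recorded in the logs.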
Diagnosis Process
We connected to the Lightsail instance over SSH using temporary credentials from the Lightsail API, because the instance's SSH key was not available locally. The process was:
# Get temporary SSH credentials from Lightsail API
aws lightsail get-instance-access-details \
--instance-name jada-agent \
--region us-east-1
# Write temporary key and connect
ssh -i /tmp/lightsail_key ubuntu@34.239.233.28
# Check service status
systemctl status jada-agent.service
# Examine recent logs
journalctl -u jada-agent.service -n 100 --no-pager
# Verify cron logs for port_sheet_sync
grep port_sheet_sync /var/log/syslog | tail -20
The daemon itself remains healthy: 11 days uptime, 0.65% average CPU utilization, 144MB memory usage on a 914MB system, and zero status check failures. The issue is isolated to the OAuth token used by the port sheet sync script.
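To confirm that the failures follow the 30-minute cron schedule rather than the daemon's 60-second poll loop, the gaps between logged failures can be computed. This sketch assumes the `Timestamp: YYYY-MM-DD HH:MM:SS UTC` layout shown in the excerpt earlier in this post; `failure_gaps_minutes` is a hypothetical helper.

```python
from datetime import datetime, timezone

PREFIX = "Timestamp: "
SUFFIX = " UTC"


def failure_gaps_minutes(lines):
    """Return the gaps, in minutes, between consecutive failure timestamps.

    Lines that do not match the assumed 'Timestamp: ... UTC' layout are
    ignored.
    """
    stamps = []
    for line in lines:
        line = line.strip()
        if line.startswith(PREFIX) and line.endswith(SUFFIX):
            raw = line[len(PREFIX):-len(SUFFIX)]
            parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
            stamps.append(parsed.replace(tzinfo=timezone.utc))
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "[port-sheet] token error: HTTP Error 400: Bad Request",
        "Timestamp: 2026-05-13 12:30:15 UTC",
        "Timestamp: 2026-05-13 13:00:22 UTC",
        "Timestamp: 2026-05-13 13:30:44 UTC",
    ]
    for gap in failure_gaps_minutes(sample):
        print(f"{gap:.1f} min")
```

For the three timestamps in the excerpt, both gaps come out at just over 30 minutes, consistent with the cron trigger.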
Root Cause and Key Decisions
The OAuth token stored in repos.env has expired or been revoked. Rather than attempting to refresh in-place, we must re-authenticate the dangerouscentaur@gmail.com account through Google's OAuth 2.0 consent flow. This requires:
- Running the auth_ga.py script (located at /Users/cb/Documents/repos/tools/auth_ga.py) on a development machine with browser access
- Authenticating as dangerouscentaur@gmail.com and granting Google Sheets API permissions
- Retrieving the new refresh token and access token from the authorization response
- Updating the OAuth credentials in repos.env on both development and production environments
- Restarting the jada-agent daemon to pick up the new token configuration
We chose re-authentication over attempting token refresh because:
- Token refresh failures typically indicate the refresh token itself is invalid, not just the access token
- Re-authorization ensures we have explicit, current user consent for API access
- This approach is more transparent and auditable than attempting silent recovery
- It gives us an opportunity to validate that the stored token structure matches the current library expectations
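As an illustration of that last point, a structural check on the stored token could look like the sketch below. It assumes the token is kept as an "authorized user" JSON object; to our understanding, `refresh_token`, `client_id`, and `client_secret` are the fields google-auth needs to rebuild refreshable credentials from such an object. How repos.env actually encodes the token is not shown in this post, and `missing_token_fields` is a hypothetical helper.

```python
import json

# Fields assumed necessary for refreshable user credentials.
REQUIRED_FIELDS = ("refresh_token", "client_id", "client_secret")


def missing_token_fields(token_json: str) -> list:
    """Return the required fields that are absent or empty in the token."""
    info = json.loads(token_json)
    if not isinstance(info, dict):
        # Not a JSON object at all, so every field counts as missing.
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if not info.get(field)]


if __name__ == "__main__":
    stored = '{"refresh_token": "example-rt", "client_id": "example-id"}'
    print(missing_token_fields(stored))
```

An empty list means the structure looks complete; it says nothing about whether the refresh token is still accepted by Google.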
Session Activity Context
The jada-agent completed three sessions on 2026-05-13 UTC:
- Session 1 (00:00 UTC): Hit the 30-turn limit (expected for complex multi-step tasks)
- Session 2 (00:02 UTC): Successfully completed, created a "needs-you" task for e-signature and crew page blockers
- Session 3 (00:05 UTC): Hit the 30-turn limit again
After session 3, the daemon found no new tasks and entered idle state (0.00 load average). The port sheet sync failures were unrelated to agent task processing: they ran in parallel on the cron schedule.
What's Next
Immediate action items:
- Execute auth_ga.py --account dangerouscentaur@gmail.com on a development machine with a web browser
- Complete the Google OAuth consent flow in the browser
- Extract and validate the new refresh token from the response
- Update the GA_TOKEN and related credential entries in repos.env
- Deploy updated repos.env to the Lightsail instance
- Restart the jada-agent service: systemctl restart jada-agent.service
- Verify the next cron execution (within 30 minutes) succeeds with no HTTP 400 errors
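The repos.env update in the steps above can be scripted so that the same change is applied identically on development and production. The sketch below assumes repos.env holds plain `KEY=VALUE` lines with no quoting, `export` prefixes, or multi-line values; `set_env_value` is a hypothetical helper that works on the file's text and leaves reading and writing the file to the caller.

```python
def set_env_value(env_text: str, key: str, value: str) -> str:
    """Replace the KEY=VALUE line for `key`, or append it if absent.

    Comment lines and unrelated entries are preserved as-is.
    """
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    before = "# credentials\nOTHER=1\nGA_TOKEN=old-value\n"
    print(set_env_value(before, "GA_TOKEN", "new-value"), end="")
```

Keeping the function pure makes it easy to diff the result before overwriting the real file.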
The daemon is otherwise operating normally with excellent uptime and resource efficiency. Once a new OAuth token has been issued and deployed, the port sheet sync pipeline should resume normal operation.