Diagnosing and Remediating a Multi-Site Deployment Pipeline: Auth Token Failures, CloudFront Cache Invalidation, and Daemon Health Monitoring
Overview
This session involved troubleshooting a distributed deployment pipeline spanning multiple domains (86from.com, sailjada.com, queenofsandiego.com), debugging Google Analytics authentication failures, and performing comprehensive health diagnostics on the jada-agent orchestrator daemon running on AWS Lightsail. The work uncovered a critical OAuth token expiration in the port sheet sync process and identified architectural patterns in agent session management that needed clarification for future task scoping.
What Was Done
1. Remote Daemon Health Diagnostics via AWS Lightsail
The jada-agent service running on Lightsail instance 34.239.233.28 required comprehensive health verification. Since the SSH private key was not stored locally in ~/.ssh/, we employed AWS Lightsail's temporary credential API rather than relying on stored key material:
# Retrieve temporary SSH access credentials via Lightsail API
# (This avoids storing persistent keys in version control)
aws lightsail get-instance-access-details \
--instance-name jada-agent \
--region us-east-1
Why this approach: Temporary, time-bound credentials are more secure than persistent SSH keys in a development environment. The Lightsail API generates a certificate paired with a temporary private key that expires after 60 minutes, reducing the attack surface if credentials leak.
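As a rough illustration of how the API response can be turned into a usable SSH session, the sketch below writes the temporary key and certificate to disk and builds the `ssh` command. It assumes the response was saved as JSON and has an `accessDetails` object with `privateKey`, `certKey`, `username`, and `ipAddress` fields. The function name and the file names are hypothetical and not part of the original session.

```python
import os
from pathlib import Path


def write_temp_ssh_files(access_details: dict, out_dir: str) -> list[str]:
    """Write the temporary key and certificate, then return an ssh argv list.

    `access_details` is the parsed JSON from `get-instance-access-details`
    (field names are assumed, see the paragraph above).
    """
    details = access_details["accessDetails"]
    out = Path(out_dir)
    key_path = out / "lightsail-temp-key"  # hypothetical file name
    # ssh looks for "<identity>-cert.pub" next to the identity file.
    cert_path = out / "lightsail-temp-key-cert.pub"

    # Create the private key with owner-only permissions from the start.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(details["privateKey"])
    os.chmod(key_path, 0o600)  # also covers a pre-existing file
    cert_path.write_text(details["certKey"])

    target = f'{details["username"]}@{details["ipAddress"]}'
    return ["ssh", "-i", str(key_path), target]
```

A caller would load the saved CLI output with `json.load`, pass it in, and hand the returned list to `subprocess.run`. The files should be deleted once the session ends, since the credential is only valid for a short time.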
Once connected, we collected:
- Service status: systemctl status jada-agent.service confirmed active, 3-day uptime
- System metrics: CPU (0.65% avg), memory (144MB / 914MB), disk (6.2GB / 39GB), load average 0.00
- Recent logs: Parsed daemon logs from the last 24 hours to identify session counts and error patterns
- Process activity: Verified daemon is actively polling for tasks and executing within expected turn limits
The daemon is healthy: no CPU spikes, normal memory usage, and proper service uptime. The instance passed all AWS status checks in the last 2 hours.
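The "healthy" verdict can be made repeatable with a small utilization check. The sketch below applies it to the memory and disk figures reported above. The 80% warning threshold and the function name are assumptions chosen for illustration, not values from the session.

```python
def utilization_report(used: dict, total: dict, warn_at: float = 0.80) -> dict:
    """Return the used/total ratio per resource and whether it is under the threshold."""
    report = {}
    for name, used_value in used.items():
        ratio = used_value / total[name]
        report[name] = {"ratio": round(ratio, 3), "ok": ratio < warn_at}
    return report


# Figures taken from the metrics collected in this session.
report = utilization_report(
    used={"memory_mb": 144, "disk_gb": 6.2},
    total={"memory_mb": 914, "disk_gb": 39},
)
```

Both resources come out at roughly 16% used, well under the assumed threshold.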
2. Google Analytics OAuth Token Remediation
During metric collection, we discovered a critical failure in the port sheet sync process. Every 30-minute scheduled execution was failing with:
[port-sheet] token error: HTTP Error 400: Bad Request
This indicated an expired or revoked Google OAuth token for the port_sheet_sync.py script. The script lives at /Users/cb/Documents/repos/tools/port_sheet_sync.py and uses OAuth2 credentials stored in a service account or user token file.
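Root cause: The stored Google OAuth credentials could no longer be refreshed, so every sync attempt failed at the token step. Unlike long-lived service account keys, user-delegated OAuth access tokens have a finite lifetime (typically 1 hour) and are renewed with a refresh token. Refresh tokens can themselves expire or be revoked. For example, Google issues refresh tokens that expire after 7 days to OAuth apps whose consent screen is still in Testing status. Once that happens, re-authentication is required.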
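To gauge how long the sync had been failing, the daemon log can be scanned for the error line quoted above. The sketch below counts matches by HTTP status code. The regular expression is derived from that one log line, so the exact format of other log entries is an assumption.

```python
import re
from collections import Counter

# Pattern based on the single error line shown in this write-up.
TOKEN_ERROR = re.compile(r"\[port-sheet\] token error: HTTP Error (\d{3})")


def count_token_errors(lines) -> Counter:
    """Count port-sheet token errors per HTTP status code."""
    counts = Counter()
    for line in lines:
        match = TOKEN_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file works, because the function only iterates over lines. A count close to 48 for one day would match a failure on every 30-minute run.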
Remediation approach: Created /Users/cb/Documents/repos/tools/auth_ga.py to handle OAuth2 re-authentication:
# Command to re-authenticate Google Analytics credentials
python3 ~/Documents/repos/tools/auth_ga.py --account dangerouscentaur@gmail.com
This script uses the google-auth-oauthlib library to:
- Initiate the OAuth2 flow with Google's authorization server
- Persist a refresh token locally (with strict file permissions: chmod 600)
- Allow port_sheet_sync.py to automatically refresh expired access tokens without manual intervention
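The three steps above might look roughly like the following. This is a minimal sketch and not the contents of auth_ga.py: the scope, the file paths passed in, and the function names are assumptions. The Google libraries are imported inside the functions so that the permission-handling helper can be used without them installed.

```python
import os

# Assumed scope; the real script may request different ones.
SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]


def save_token(token_json: str, token_path: str) -> None:
    """Write the token file so that only the owner can read it (chmod 600)."""
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token_json)
    os.chmod(token_path, 0o600)  # also tightens a pre-existing file


def reauthenticate(client_secrets_path: str, token_path: str) -> None:
    """Run the browser consent flow and persist the resulting credentials."""
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_secrets_file(client_secrets_path, SCOPES)
    creds = flow.run_local_server(port=0)
    save_token(creds.to_json(), token_path)


def load_fresh_credentials(token_path: str):
    """Load stored credentials and refresh the access token if it has expired."""
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials

    creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())
        save_token(creds.to_json(), token_path)
    return creds
```

In this arrangement the sync script would call only `load_fresh_credentials`, and `reauthenticate` would be run by hand when the refresh token itself is rejected.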
Why this matters: The port sheet is the source of truth for booking data. Without sync working, analytics dashboards and booking data become stale, breaking downstream reporting pipelines.
3. Multi-Site Static Content Deployment Pipeline
Significant effort went into HTML content updates and CloudFront cache invalidation across three domains:
- 86from.com: Renamed from 86dfrom.com (correcting domain nomenclature), deployed updated index.html with SEO enhancements, added new content page what-does-86d-mean/index.html
- sailjada.com: 16 iterative updates to index.html (evident from session log), refining HTML/CSS/booking widget integration
- queenofsandiego.com: Updates to BookingAutomation.gs (Google Apps Script), modifying booking automation logic
Deployment process:
# Deploy updated content to S3 origin bucket
aws s3 sync ./sites/86from.com/site s3://86from-origin/ --delete
# Invalidate CloudFront cache to force edge nodes to fetch fresh content
aws cloudfront create-invalidation \
--distribution-id E1A2B3C4D5E6F7 \
--paths "/*"
(Distribution IDs have been anonymized; actual values are stored in infrastructure-as-code templates.)
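The same invalidation can be issued from Python with boto3, which makes it easier to invalidate only the files a deploy actually changed instead of "/*". The sketch below is one possible shape under those assumptions. The helper names and the caller-reference format are hypothetical, and boto3 is imported lazily so the batch builder can be checked without AWS access.

```python
import time


def build_invalidation_batch(paths: list[str]) -> dict:
    """Build the InvalidationBatch structure expected by CloudFront."""
    unique = sorted(set(paths))
    for path in unique:
        if not path.startswith("/"):
            raise ValueError(f"invalidation path must start with '/': {path!r}")
    return {
        "Paths": {"Quantity": len(unique), "Items": unique},
        # CallerReference must be unique for each invalidation request.
        "CallerReference": f"deploy-{time.time_ns()}",
    }


def invalidate(distribution_id: str, paths: list[str]) -> str:
    """Submit the invalidation and return its ID."""
    import boto3

    client = boto3.client("cloudfront")
    response = client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
    return response["Invalidation"]["Id"]
```

For the 86from.com change, the path list could be as short as `["/index.html", "/what-does-86d-mean/index.html"]`.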
4. Booking Widget JavaScript Syntax Remediation
Multiple HTML updates to sailjada.com/index.html revealed a subtle but critical issue: the booking widget contained unescaped double-brace template syntax ({{ }}) that conflicted with JavaScript parsing. This typically occurs when:
- Frontend framework syntax (e.g., Vue.js, Angular) is embedded in HTML served via CloudFront
- The HTML is processed by a server-side templating engine that also uses double braces
- JavaScript minification or build steps fail to properly escape template delimiters
Verification and remediation:
# Locate the script tags that bound the booking widget code
grep -n "script type=" sailjada.com/index.html | head -5
# Count double-brace occurrences (grep -c would count matching lines, not matches)
grep -o "{{" sailjada.com/index.html | wc -l
# Replace opening and closing delimiters together so braces stay balanced;
# -i.bak works with both GNU and BSD (macOS) sed and keeps a backup copy
sed -i.bak -e 's/{{ /{/g' -e 's/ }}/}/g' sailjada.com/index.html # Example; actual regex scoped to widget div
After remediation, we re-deployed to a staging CloudFront distribution to validate parsing before production rollout.
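A whole-file grep also counts double braces that sit harmlessly in ordinary markup. A narrower check reports only those inside script blocks, which is where they interfere with JavaScript parsing. The sketch below does this with regular expressions. It is an approximation rather than a real HTML parser, and the function name is hypothetical.

```python
import re

# Matches the body of each <script ...>...</script> block.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def find_double_braces(html: str) -> list[tuple[int, str]]:
    """Return (line number, delimiter) for each {{ or }} inside a script block."""
    hits = []
    for block in SCRIPT_BLOCK.finditer(html):
        body = block.group(1)
        for match in re.finditer(r"\{\{|\}\}", body):
            offset = block.start(1) + match.start()
            line_no = html.count("\n", 0, offset) + 1
            hits.append((line_no, match.group(0)))
    return hits
```

The hits are candidates rather than confirmed errors, because `}}` also occurs in valid JavaScript whenever two nested object literals or blocks close together.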
Infrastructure and Architecture Decisions
Why Temporary SSH Credentials Over Persistent Keys
Rather than storing the jada-agent SSH private key in the development environment (e.g., ~/.ssh/jada-key), we leveraged AWS Lightsail's built-in credential API. This approach:
- Eliminates key sprawl: No persistent SSH keys kept under ~/.ssh/ or in version control
- Enables audit trails: AWS CloudTrail logs which IAM principal requested access and when
- Reduces blast radius: A leaked temporary credential is only valid for 60 minutes