```html

OAuth Token Refresh & Google Workspace Integration: Automating Monthly Revenue Reports

This post covers the infrastructure and automation work required to integrate Google Workspace APIs (Gmail, Sheets, Drive) into a backend service running on EC2, with a focus on solving token refresh failures and building a repeatable monthly reporting pipeline.

The Problem

The JADA operations backend needed to generate monthly revenue reports by pulling charter data from DynamoDB, cross-referencing guest lists and trip sheets from Google Sheets, and emailing statements to stakeholders. The existing OAuth flow had a critical failure: Google access tokens were expiring without being refreshed, causing API calls to fail silently and blocking report generation.

Root Cause Analysis

The initial diagnostic revealed that reauth_google.py on the EC2 instance was attempting to refresh tokens using a stale or malformed refresh token. The script's token storage logic wasn't properly handling the OAuth 2.0 refresh grant flow. Specifically:

  • The refresh token path in ~/.secrets/google_refresh_token was being read correctly, but the token object wasn't being serialized back to disk after refresh
  • The Google API client library (google-auth) wasn't configured to use credentials.to_json() before persisting to storage
  • No retry logic existed for transient OAuth failures, causing the entire report generation to halt

Solution Architecture

Rather than manually refreshing tokens before each report run, we implemented an automatic token refresh wrapper that:

  1. Checks token expiry on startup: Calls credentials.expired before any API interaction
  2. Refreshes automatically: Uses google.auth.transport.requests.Request() to refresh in-place
  3. Persists the new token: Writes the refreshed credentials back to ~/.secrets/google_refresh_token as JSON
  4. Validates before returning: Confirms the new token is valid and has sufficient scopes

Technical Implementation

The patched reauth_google.py script on the EC2 instance now includes:


# Load persisted refresh token
with open(os.path.expanduser('~/.secrets/google_refresh_token'), 'r') as f:
    creds_dict = json.load(f)
    credentials = google.oauth2.credentials.Credentials.from_authorized_user_info(creds_dict)

# Check and refresh if needed
if credentials.expired:
    request = google.auth.transport.requests.Request()
    credentials.refresh(request)
    # Persist refreshed token immediately
    with open(os.path.expanduser('~/.secrets/google_refresh_token'), 'w') as f:
        f.write(credentials.to_json())

This approach ensures that each time the backend wakes up (via cron, Lambda, or manual trigger), it has a valid token before attempting to hit Gmail, Sheets, or Drive APIs.

API Scopes & Workspace Integration

The OAuth flow was configured to request the following scopes:

  • https://www.googleapis.com/auth/gmail.readonly – Read archived booking confirmations and guest lists from Gmail
  • https://www.googleapis.com/auth/gmail.send – Send completed revenue statements to stakeholders
  • https://www.googleapis.com/auth/drive.readonly – Access shared Drive folders containing trip sheets and prior reports
  • https://www.googleapis.com/auth/spreadsheets – Read/write the master revenue sheet and monthly templates

Each scope was validated during the refresh cycle to prevent downstream failures when a scope was missing.

Monthly Report Pipeline

With token refresh working, we built the reporting pipeline in stages:

Stage 1: Data Collection

Python scripts in /tmp/ on the EC2 instance pull raw data:

  • build_sheet.py – Queries the crew-dispatch DynamoDB table for all charters in the reporting month, extracting guest counts, revenue totals, and crew assignments
  • gmail_contacts.py – Searches Gmail for booking confirmations from Jennifer Sanderson to cross-reference guest names and booking details
  • sheet_inspect.py – Downloads the Drive master spreadsheet to inspect tab structure and prior-month formatting rules

Stage 2: Template & Calculation

build_may_tab.py creates a new monthly tab by:

  1. Cloning the Template tab from the master workbook
  2. Populating charter rows with guest counts, per-person rates ($125pp standard, variable for charters)
  3. Calculating reportable revenue (guest count × rate, excluding crew comps)
  4. Summing monthly totals and comparing against DynamoDB records for audit purposes
  5. Formatting currency cells with proper number formatting (accounting style, 2 decimals)

Stage 3: Upload & Distribution

upload.py uses the Sheets API to:

  • Upload the completed monthly workbook to the JADA Internal Drive folder
  • Update the file metadata (description, modification timestamp)

send_statement.py then:

  • Generates a single-tab XLSX attachment in the required house format
  • Searches Gmail for prior statement recipients (from the last month's email thread)
  • Composes and sends the statement via Gmail API with CC to Sergio

Infrastructure & Storage

The reporting system uses:

  • DynamoDB: crew-dispatch and crew-roster tables in us-east-1 and us-west-2 for redundancy
  • Google Drive: JADA Business folder with /revenue-ledger and /reports subdirectories
  • EC2 Instance: ubuntu@34.239.233.28 (t3.medium, Amazon Linux 2) running cron jobs and Python scripts
  • Secrets Storage: ~/.secrets/ directory (mode 0700) containing refresh tokens, SSH keys, and credentials
  • Temp Scripts: /tmp/ directory for monthly report generation scripts (cleaned after each run)

Key Decisions

Why persist the refresh token? – Long-lived refresh tokens eliminate the need for interactive OAuth logins. Once the initial OAuth flow completes (offline_access), the token persists across EC2 reboots and doesn't require user interaction.

Why check expiry on every run? – Google tokens have a 1-hour TTL by default. Checking at startup ensures we never attempt an API call with a stale token, which would fail partway through a multi-step report.

Why use openpyxl for XLSX generation? – openpyxl allows us to preserve formatting (currency cells, merged headers, footnotes) and control sheet tabs programmatically, which is critical for matching the stakeholder-facing statement format