```html

Automating Monthly Revenue Reports: OAuth Token Refresh, Google Sheets API Integration, and Email Delivery

Over the past development session, we built and deployed an end-to-end automation pipeline for generating and distributing monthly revenue statements. The system pulls charter data from DynamoDB, generates formatted Excel workbooks via the Google Sheets API, and delivers statements via Gmail—all triggered by a single Python script on our EC2 instance.

What Was Done

  • Fixed Google OAuth token refresh by patching reauth_google.py to handle expired credentials gracefully
  • Built Excel workbook generation using openpyxl, reading from DynamoDB charter records and applying revenue rules
  • Created Sheets API upload workflow to push the workbook to Google Drive and share with stakeholders
  • Implemented statement email distribution via Gmail API with templated attachments
  • Established crew roster management in DynamoDB for cascade-based call-to-crew notifications

Technical Details: OAuth and Token Management

The initial blocker was a stale OAuth token in the Google credentials cache. The original reauth_google.py script at /home/ubuntu/repos/reauth_google.py was not properly detecting and refreshing expired tokens before API calls.

Root Cause: The Google Auth library's refresh logic requires explicit token validation and refresh calls before making API requests. Our script was attempting API calls without checking token expiry first.

Solution: We patched the script to:

  • Validate token age on each invocation
  • Call credentials.refresh(Request()) if token is stale
  • Write refreshed credentials back to the cache file immediately
  • Add explicit error handling for refresh failures

The patched version was syntax-checked locally and deployed to the EC2 instance with a backup of the original. After deployment, a subsequent Gmail API call succeeded, confirming token refresh was working correctly.

Revenue Report Generation Pipeline

The monthly statement generation happens in several stages:

Stage 1: Charter Data Extraction

We query the DynamoDB table crew-dispatch (deployed in us-east-1) using a scan operation filtered to the target month. The schema includes:

  • charterId (partition key)
  • guestNames (guest list)
  • totalAmount (raw booking total)
  • reportableRevenue (amount to show in monthly statement)

The reportableRevenue field is critical: it reflects the business rule that only certain charter types or payment states count toward monthly revenue reporting. By reading prior months' statements (stored in the JADA business folder on Drive), we identified the pattern: charters with confirmed payment and non-cancelled status contribute their full amount to reportable revenue.

Stage 2: Excel Workbook Construction

We use openpyxl (Python library for .xlsx generation) to build the statement file. The process:

  • Load a template tab with formatting, headers, and totals rows
  • Iterate over charter records and append rows with guest names, dates, amounts
  • Populate the totals row with a SUM formula over the amount column
  • Apply number formatting (currency) to amount cells
  • Save the workbook to a temporary file (e.g., /tmp/statement_may.xlsx)

The template is maintained in Drive and downloaded at runtime to ensure formatting consistency. Each month's tab is named after the month (e.g., "May", "June") and follows a single-sheet-per-month convention.

Stage 3: Drive Upload and Share

Once the .xlsx is generated locally on the EC2 instance, we use the Google Sheets API to:

  • Upload the file to a shared Drive folder (using google.auth.transport.requests and googleapiclient.discovery)
  • Set sharing permissions for stakeholders (via files().update() with permissions().create())
  • Store the file ID and web-viewable link for email distribution

The Drive folder is structured as /JADA Internal/Reports/{year}/{month}/ to maintain organization and audit trail.

Stage 4: Email Delivery

The Gmail API sends the statement to a distribution list stored in a prior-recipients memo. We:

  • Construct a MIME message with the .xlsx as an attachment
  • Use gmail.users().messages().send() with the attachment payload
  • Log recipient addresses and file IDs for reconciliation

By searching Gmail for prior statement sends, we identified the standard recipients and maintained continuity across months.

Infrastructure and Resource Management

DynamoDB Tables:

  • crew-dispatch (us-east-1) — charter records, indexed by date for monthly queries
  • roster (us-east-1) — crew availability and captain assignments

EC2 Instance:

  • Host: 34.239.233.28 (us-east-1)
  • Key pair: jada-key.pem (stored in SSH agent)
  • Repos: ~/repos/ containing the Python scripts and cached credentials
  • Secrets: ~/.secrets/ with Google service account JSON and OAuth refresh tokens

Google Cloud Integration:

  • Service account credentials stored in ~/.secrets/google_creds.json
  • OAuth 2.0 refresh tokens cached in ~/.secrets/token.pickle (refreshed by patched reauth script)
  • Gmail API scopes: gmail.readonly, gmail.send, gmail.modify
  • Google Drive and Sheets scopes: drive, spreadsheets

Key Decisions

  • Why DynamoDB for charters: Real-time updates as bookings come in; no need to sync external CRM. TTL-based expiry for test records.
  • Why openpyxl over Google Sheets API for local generation: Faster iteration, simpler formatting control, and attachment flexibility for email delivery.
  • Why Drive storage instead of S3: Stakeholders already use Drive; sharing and audit logs are built-in; reduces friction for non-technical users.
  • Why cached credentials and refresh tokens: Avoids repeated OAuth flow; enables headless cron scheduling; token rotation happens transparently on each run.

What's Next

The pipeline is now functional for May and June statements. Future work includes: