```html

Automating Charter Revenue Reporting: Google Sheets API Integration & OAuth Token Management on EC2

What Was Done

We built an end-to-end charter revenue reporting pipeline that pulls real booking data from DynamoDB, generates monthly Excel workbooks via the Google Sheets API, and delivers formatted statements via Gmail. The solution involved:

  • Patching OAuth token refresh logic in a production reauth service running on EC2
  • Building Python scripts to query DynamoDB tables (crew-dispatch and charter-chats) for May charter records
  • Generating multi-tab Excel workbooks with proper number formatting and footnotes
  • Uploading workbooks to Google Drive and distributing via Gmail with authenticated send
  • Establishing SSH access patterns and credential management for the JADA operations host

Technical Details: The OAuth Token Refresh Problem

The core blocker was a failing Google API token refresh cycle. The production host at ubuntu@34.239.233.28 (JADA ops EC2 instance) runs a reauth service located at ~/repos/reauth_google.py. When the token expired, the refresh was failing silently, breaking downstream Sheets and Gmail API calls.

Root cause: the refresh token validation in the original script was checking for a refresh_token field in the cached token JSON, but Google's OAuth flow sometimes omits this field on subsequent refreshes. The fix involved:

# Original logic (failing)
if 'refresh_token' not in creds_dict:
    return False  # Forces re-auth

# Patched logic
if not creds_dict.get('access_token'):
    return False  # Only fail if access_token is missing

This change allows the script to use the stored refresh token even when it's not explicitly present in the cached JSON. We deployed the patched version with a backup of the original, then verified the token refresh cycle completed successfully before proceeding to data pulls.

Infrastructure: DynamoDB Data Layer

Charter booking data lives in two DynamoDB tables in the JADA account:

  • crew-dispatch: contains full charter records with fields for vessel, captain, date, guest count, and amounts
  • charter-chats: tracks conversation history and proposal states for each charter

We queried these tables using EC2's IAM role (which already had DynamoDB read permissions) to extract May charter records without exposing credentials. The query pattern filtered by date range and extracted guest names, captain assignments, and revenue figures. Key insight: some charters had multiple revenue entries (e.g., separate amounts for catering vs. captain fees), so aggregation logic had to sum across all line items for a single charter event.

These raw records became the source data for the Excel workbook generation. By querying directly from DynamoDB rather than CSV exports, we eliminated manual data entry and kept the pipeline fully automated.

Excel Generation & Google Sheets Integration

We generated Excel workbooks in two stages:

Stage 1: Local Generation with openpyxl

Built a Python script (/tmp/build_may_tab.py) that:

  • Created a workbook from a Template sheet (mimicking prior April report structure)
  • Populated the May tab with charter records, dates, guest names, and revenue figures
  • Applied Excel number formatting (currency, bold headers, frozen panes)
  • Added a footnote explaining the "reportable revenue" calculation rule
from openpyxl import load_workbook
from openpyxl.styles import numbers

# Load template, access May sheet
wb = load_workbook('template.xlsx')
ws = wb['May']

# Write charter data rows starting at row 5
for idx, charter in enumerate(may_charters, start=5):
    ws[f'A{idx}'] = charter['date']
    ws[f'B{idx}'] = charter['guest_names']
    ws[f'C{idx}'] = charter['total_amount']
    ws[f'C{idx}'].number_format = '$#,##0.00'

Stage 2: Upload to Google Drive & Sheets API

The generated workbook was uploaded to Google Drive using authenticated credentials (the repaired OAuth token). Once in Drive, we used the Sheets API to:

  • Verify sheet tabs and inspect cell values
  • Confirm formatting survived the upload
  • Pull down a Drive-native copy for final review

This approach decouples local file generation from Cloud storage, allowing us to version the workbook locally while keeping a canonical copy in Drive accessible to stakeholders.

Email Distribution & Gmail API

The final step was sending the May revenue statement to stakeholders. We built a sender script that:

  • Constructed an authenticated Gmail message with MIME multipart encoding
  • Attached the standalone Excel file (same format as the Drive version)
  • Set recipient list from prior statement emails (discovered via Gmail search)
  • Included CC for Carole (ops lead) and reply-to for statement questions
from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request
from googleapiclient.discovery import build
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText

# Build Gmail service with refreshed credentials
service = build('gmail', 'v1', credentials=creds)

# Create message with attachment
msg = MIMEMultipart()
msg['to'] = 'recipient@example.com'
msg['from'] = 'operations@example.com'
msg['subject'] = 'May 2024 Revenue Statement'

# Attach the Excel file
with open('may_statement.xlsx', 'rb') as attachment:
    part = MIMEBase('application', 'octet-stream')
    part.set_payload(attachment.read())
    part.add_header('Content-Disposition', f'attachment; filename= may_statement.xlsx')
    msg.attach(part)

# Send via Gmail API
raw_message = base64.urlsafe_b64encode(msg.as_bytes()).decode()
service.users().messages().send(userId='me', body={'raw': raw_message}).execute()

The Gmail API requires the gmail.send OAuth scope, which the patched token now correctly refreshes. Recipients were validated against prior statement emails to avoid sending to stale addresses.

Key Decisions

Why Query DynamoDB Directly?

Rather than requiring manual CSV export from a dashboard, we query the source tables. This eliminates a manual step, reduces copy-paste errors, and allows future automation of monthly reports without human intervention.

Why Two Excel Versions (Local + Drive)?

The Drive version is canonical (accessible to all team members), but the email attachment is local-generated. This gives us:

  • A guarantee that the attachment matches what was sent (no accidental Drive updates)
  • A version archive on the EC2 box for audit purposes