Automating Monthly Revenue Reports: OAuth Token Refresh, Google Sheets API Integration, and Email Delivery
Over the past development session, we built and deployed an end-to-end automation pipeline for generating and distributing monthly revenue statements. The system pulls charter data from DynamoDB, generates formatted Excel workbooks via the Google Sheets API, and delivers statements via Gmail—all triggered by a single Python script on our EC2 instance.
What Was Done
- Fixed Google OAuth token refresh by patching
reauth_google.pyto handle expired credentials gracefully - Built Excel workbook generation using openpyxl, reading from DynamoDB charter records and applying revenue rules
- Created Sheets API upload workflow to push the workbook to Google Drive and share with stakeholders
- Implemented statement email distribution via Gmail API with templated attachments
- Established crew roster management in DynamoDB for cascade-based call-to-crew notifications
Technical Details: OAuth and Token Management
The initial blocker was a stale OAuth token in the Google credentials cache. The original reauth_google.py script at /home/ubuntu/repos/reauth_google.py was not properly detecting and refreshing expired tokens before API calls.
Root Cause: The Google Auth library's refresh logic requires explicit token validation and refresh calls before making API requests. Our script was attempting API calls without checking token expiry first.
Solution: We patched the script to:
- Validate token age on each invocation
- Call
credentials.refresh(Request())if token is stale - Write refreshed credentials back to the cache file immediately
- Add explicit error handling for refresh failures
The patched version was syntax-checked locally and deployed to the EC2 instance with a backup of the original. After deployment, a subsequent Gmail API call succeeded, confirming token refresh was working correctly.
Revenue Report Generation Pipeline
The monthly statement generation happens in several stages:
Stage 1: Charter Data Extraction
We query the DynamoDB table crew-dispatch (deployed in us-east-1) using a scan operation filtered to the target month. The schema includes:
charterId(partition key)guestNames(guest list)totalAmount(raw booking total)reportableRevenue(amount to show in monthly statement)
The reportableRevenue field is critical: it reflects the business rule that only certain charter types or payment states count toward monthly revenue reporting. By reading prior months' statements (stored in the JADA business folder on Drive), we identified the pattern: charters with confirmed payment and non-cancelled status contribute their full amount to reportable revenue.
Stage 2: Excel Workbook Construction
We use openpyxl (Python library for .xlsx generation) to build the statement file. The process:
- Load a template tab with formatting, headers, and totals rows
- Iterate over charter records and append rows with guest names, dates, amounts
- Populate the totals row with a SUM formula over the amount column
- Apply number formatting (currency) to amount cells
- Save the workbook to a temporary file (e.g.,
/tmp/statement_may.xlsx)
The template is maintained in Drive and downloaded at runtime to ensure formatting consistency. Each month's tab is named after the month (e.g., "May", "June") and follows a single-sheet-per-month convention.
Stage 3: Drive Upload and Share
Once the .xlsx is generated locally on the EC2 instance, we use the Google Sheets API to:
- Upload the file to a shared Drive folder (using
google.auth.transport.requestsandgoogleapiclient.discovery) - Set sharing permissions for stakeholders (via
files().update()withpermissions().create()) - Store the file ID and web-viewable link for email distribution
The Drive folder is structured as /JADA Internal/Reports/{year}/{month}/ to maintain organization and audit trail.
Stage 4: Email Delivery
The Gmail API sends the statement to a distribution list stored in a prior-recipients memo. We:
- Construct a MIME message with the .xlsx as an attachment
- Use
gmail.users().messages().send()with the attachment payload - Log recipient addresses and file IDs for reconciliation
By searching Gmail for prior statement sends, we identified the standard recipients and maintained continuity across months.
Infrastructure and Resource Management
DynamoDB Tables:
crew-dispatch(us-east-1) — charter records, indexed by date for monthly queriesroster(us-east-1) — crew availability and captain assignments
EC2 Instance:
- Host:
34.239.233.28(us-east-1) - Key pair:
jada-key.pem(stored in SSH agent) - Repos:
~/repos/containing the Python scripts and cached credentials - Secrets:
~/.secrets/with Google service account JSON and OAuth refresh tokens
Google Cloud Integration:
- Service account credentials stored in
~/.secrets/google_creds.json - OAuth 2.0 refresh tokens cached in
~/.secrets/token.pickle(refreshed by patched reauth script) - Gmail API scopes:
gmail.readonly,gmail.send,gmail.modify - Google Drive and Sheets scopes:
drive,spreadsheets
Key Decisions
- Why DynamoDB for charters: Real-time updates as bookings come in; no need to sync external CRM. TTL-based expiry for test records.
- Why openpyxl over Google Sheets API for local generation: Faster iteration, simpler formatting control, and attachment flexibility for email delivery.
- Why Drive storage instead of S3: Stakeholders already use Drive; sharing and audit logs are built-in; reduces friction for non-technical users.
- Why cached credentials and refresh tokens: Avoids repeated OAuth flow; enables headless cron scheduling; token rotation happens transparently on each run.
What's Next
The pipeline is now functional for May and June statements. Future work includes:
- Scheduled automation: