Automating Monthly Revenue Reporting: Google Sheets API Integration with Python OAuth and EC2 Deployment
What Was Done
This session focused on building an end-to-end automated revenue reporting pipeline for the charter business. The goal was to eliminate manual spreadsheet work by programmatically pulling real booking data from DynamoDB, formatting it into a Google Sheets workbook, and delivering monthly statements via email. The implementation involved OAuth2 token refresh repair, Python-based Sheets API integration, and secure credential management on a shared EC2 instance.
Technical Details: The Revenue Pipeline
The core workflow required three major components:
- Data Source: DynamoDB tables (
crew-dispatchandcharter-chats) in two AWS regions, queried for May charter records with amount fields - Processing: Python scripts to pull records, extract revenue figures, and build formatted XLSX workbooks with openpyxl
- Delivery: Gmail API integration to send statements to stakeholders (Sergio, Jennifer, etc.)
The implementation used a patched version of reauth_google.py (located on EC2 at ~/repos/queenofsandiego/) to handle token refresh. The original script had a hardcoded token path that prevented the Gmail search and Sheets upload operations from completing. The patch modified the token refresh logic to correctly resolve the unified token location stored in the shared secrets directory.
Python scripts were staged in /tmp/ during development and testing:
gmail_jennifer.py– Gmail search for booking context (Jennifer Sanderson queries)build_sheet.py– Template-based workbook generationsheet_inspect.pyandsheet_diag.py– Verification of Sheets API connectivity and file metadatabuild_may_tab.py– Revenue tab construction from live charter recordssend_statement.py– Email delivery via Gmail API
The workbook generation pipeline followed a specific rule: reportable revenue = charter total amount (not all fields in the record; specifically the `amount` field from charter records). Prior months' reports were examined to match formatting conventions and establish this baseline.
Infrastructure and Deployment
The architecture relies on AWS and Google Cloud integration:
- EC2 Host: Ubuntu instance at IP 34.239.233.28 running the JADA operational scripts and maintaining the unified Google token
- DynamoDB: Tables in us-east-1 and us-west-2 regions holding charter records
- Google Drive / Sheets: Monthly workbooks and the master template stored as Drive files; accessed via Sheets API
- Gmail: Statements sent from the shared account with proper OAuth2 scopes (gmail.send for message delivery, drive.file for workbook uploads)
The credential strategy uses a shared secrets directory on the EC2 instance with restrictive file permissions (verified via ls -la ~/repos/.secrets/). The unified token is persisted there and refreshed on-demand by the patched reauth script.
An S3 bucket and CloudFront distribution manage the shipcaptaincrew website origin, verified during infrastructure inspection to understand the overall deployment footprint.
Key Technical Decisions
Why OAuth2 Token Refresh Was Necessary: Gmail API and Sheets API both require valid access tokens. The original reauth script had a path bug that prevented the token from being refreshed before API calls, causing auth failures on every attempt to list Gmail messages or upload workbooks. The fix was minimal—correcting the token file path resolution—but critical for the entire pipeline.
Why DynamoDB Over SQL: Charter records are already stored in DynamoDB as part of the JADA operational system. Querying directly (using boto3) avoided duplicating data and ensured real-time accuracy. The schema inspection step pulled full records to understand field structure before writing the extraction logic.
Why openpyxl for XLSX Generation: The prior month reports were Excel files with specific formatting (headers, number formats, footnotes). openpyxl allowed Python code to replicate that formatting programmatically without manual intervention. A test confirmed the library was available on the EC2 environment before deployment.
Why Template-Based Workbooks: Instead of generating sheets from scratch, the pipeline copies a template workbook from Google Drive, then adds a new tab with the current month's data. This preserves formatting conventions and makes it easy to add a new month without redesigning the layout.
Why SCP for Intermediate Transfers: Large files (the May workbook) were generated on the EC2 instance but needed to be inspected locally on the Mac before final upload. Using scp (secure copy) avoided exposing files to insecure channels and made the workflow transparent.
Command Examples and Workflow
A typical session flow looked like:
# Verify SSH connectivity and locate the reauth script
ssh ubuntu@34.239.233.28 "ls -la ~/repos/queenofsandiego/reauth_google.py"
# Run the token refresh logic
ssh ubuntu@34.239.233.28 "python3 ~/repos/queenofsandiego/reauth_google.py"
# Execute the May revenue tab builder
ssh ubuntu@34.239.233.28 "python3 /tmp/build_may_tab.py"
# Download the generated workbook for local inspection
scp ubuntu@34.239.233.28:/tmp/may_statement.xlsx ~/Desktop/
# Upload to Google Drive and send via Gmail
ssh ubuntu@34.239.233.28 "python3 /tmp/upload.py && python3 /tmp/send_statement.py"
Each script was tested incrementally: Gmail search was verified before Sheets access, and the workbook was validated locally before being sent to stakeholders. This reduced debugging time and caught formatting issues early.
What's Next
The pipeline is now functional for May and establishes a pattern for June and beyond. Future work includes:
- Scheduling the workflow (e.g., a cron job or Lambda function to auto-generate and send statements on the first of each month)
- Adding captain contact automation (pulling sails from the JADA Internal calendar, reading crew availability, and sending call-to-crew emails with proper CC rules)
- Expanding the roster management system to store captain details (phone, email) and validate contact information before dispatch
- Implementing error logging and alerting so failures notify the team instead of silently failing
The foundation is solid: OAuth2 is working, the Sheets and Gmail APIs are authenticated, DynamoDB queries are reliable, and the XLSX generation logic is proven. Scaling to automation is now a matter of orchestration.
```