Automating Monthly Revenue Reporting: Building a Google Sheets Pipeline for Charter Operations

```html

Charter operations generate revenue across multiple booking channels and crew configurations. Reconciling these sources into a single, auditable monthly statement requires pulling data from Gmail booking confirmations, DynamoDB charter records, and manual trip sheets—then synthesizing them into a formatted Excel workbook that matches accounting expectations. This post covers the infrastructure and automation patterns we built to eliminate manual data entry for Sheraton monthly reports.

The Problem: Multi-Source Revenue Reconciliation

Prior to this work, monthly statements were assembled manually:

Gmail searches for booking confirmations (from Jennifer Sanderson, the booking contact)
Manual lookups in the crew-dispatch DynamoDB table for charter details
Cross-reference with captain trip sheets (stored as Google Sheets) for revenue figures
Reconstruction of the prior month's statement format to match
Manual Excel workbook creation with proper formula structure and formatting

This process took 2–3 hours and introduced transcription errors. The goal was to automate the data-gathering phase while preserving the formatted output that accounting and the charter company required.

Architecture: Multi-Service Data Pipeline

We designed a four-stage pipeline running on an EC2 instance with orchestration scripts in Python:

Stage 1: Gmail API Integration for Booking Discovery

The booking contact (Jennifer Sanderson) receives all charter confirmations. We query Gmail using the Gmail API to extract booking records:

gmail_contacts.py  # Search Gmail for bookings from Jennifer Sanderson
gmail_diag.py      # Diagnose token refresh and API connectivity
gmail_jennifer.py  # Query Jennifer's email for booking summaries

Why Gmail API over email forwarding: Email forwarding to a structured inbox is fragile; a single misconfigured rule loses data. The Gmail API lets us query the source of truth (Jennifer's mailbox) on demand, with full search syntax (date ranges, sender filters, subject keywords). Queries are idempotent—run the same search twice and get the same results.

Token management: Google OAuth tokens expire after 1 hour. We patched reauth_google.py on the EC2 instance to handle refresh-token rotation transparently, so long-running jobs don't stall mid-execution. The patched script validates the token before each API call and refreshes silently if needed.

Stage 2: DynamoDB Record Retrieval

Charter metadata lives in the crew-dispatch DynamoDB table. Once we identify booking dates from Gmail, we query DynamoDB for the full charter record:

Table: crew-dispatch (us-east-1 and us-west-2 replicas)
Schema: Charter records with guest names, boat assignments, crew rosters, and total amounts
Access pattern: Query by date range (e.g., all charters in May 2024)

build_sheet.py  # Fetch charter records from DynamoDB and structure them
sheet_inspect.py  # Validate schema and inspect record structure
sheet_diag.py   # Diagnose DynamoDB connectivity

Why DynamoDB over a relational DB: The charter operations data was already stored in DynamoDB for high-write throughput (crew availability updates, chat logs, dispatch updates). Introducing a separate SQL database would require a secondary ETL pipeline. Querying the existing DynamoDB tables is simpler and doesn't duplicate data.

Stage 3: Captain Trip Sheet Enrichment

Captains maintain individual Google Sheets with detailed trip records (guest names, revenue figures, fuel costs, tips). These sheets live in the JADA Business folder on Google Drive. For each charter, we cross-reference the crew captain's sheet to extract verified revenue:

build_may_tab.py  # Read captain trip sheets and extract revenue figures

Reportable revenue rule: A charter's revenue is "reportable" only if the captain's trip sheet includes a dollar figure in the designated revenue cell. This rule ensures we don't report estimated or pending revenue. Inspecting prior April reports showed us the exact cell placement and formula structure (e.g., row totals, footnote placement).

Stage 4: Workbook Generation and Upload

We generate a multi-tab Excel workbook using openpyxl (since the environment may not have xlsxwriter):

build_sheet.py  # Generate .xlsx with Template tab and month-specific tabs
upload.py       # Upload completed workbook to Google Drive

The workbook structure matches accounting expectations:

Template tab: Read-only reference showing column headers and formula patterns
Month tab (e.g., May): Populated with charter records, guest names, revenue figures, and row totals
Number formatting: Currency columns formatted as USD, dates as MM/DD/YYYY
Footnotes: Preserved from prior months (e.g., "Revenue per captain trip sheet")

Infrastructure and Deployment

EC2 host: ubuntu@34.239.233.28 (us-east-1 region, t3.medium instance)

Script locations:

/home/ubuntu/repos/queenofsandiego/ — Main site and JADA ops scripts
/home/ubuntu/repos/.secrets/ — Google service account credentials (locked to 0700 permissions)
/tmp/ — Transient working files (scripts, intermediate .xlsx files)

Credentials storage: Google OAuth credentials (service account JSON and refresh tokens) are stored in /home/ubuntu/repos/.secrets/ with restrictive filesystem permissions. The SSH key used for EC2 access is stored locally as jada-key.pem in the user's SSH config, not committed to version control.

Data flow:

Local machine: SSH into EC2, run build_sheet.py to pull Gmail, DynamoDB, and Google Sheets data
EC2: Script writes intermediate .xlsx to /tmp/, uploads to Google Drive via Google Sheets API
Local machine: scp the .xlsx from EC2 to Mac for inspection and final delivery

Key Decisions and Trade-Offs

1. Why not fully automate delivery? The final workbook is still reviewed by hand before sending to the charter company. Automating the last-mile upload to the accounting system would require additional integration and approval workflow. Instead, we automate the data-gathering phase (80% of the work) and leave the final review and send step to a human.

2. Why pull captain trip sheets at report time instead of pre-populating? Captains update their trip sheets throughout the month, adding revenue figures and corrections. Querying at report time ensures we have the latest figures. Pre-computing a monthly snapshot early in the month would miss late updates.

3. Why use openpyxl instead of Google Sheets API directly? Google Sheets API is powerful but verbose for complex formatting (merged cells, currency formatting, multiple worksh