Building a Dynamic Charter Document Pipeline: From Calendar Events to Live S3 Manifests
During the last development session, I implemented an end-to-end document generation and publishing pipeline for JADA Operations' weekend charter events. The system pulls charter data from the Google Calendar API, generates structured HTML manifests and trip sheets, publishes them to S3, and invalidates the CloudFront cache so the crew management portal always serves the current version of each document.
What Was Done
The primary objective was to automate the creation and publication of charter documents—specifically passenger manifests and trip sheets—that needed to be available on the queenofsandiego.com crew portal within minutes of event confirmation. Previously, this was a manual process prone to delays and formatting inconsistencies.
- Fetched weekend charter events from the JADA Google Calendar using OAuth tokens
- Parsed calendar event data to extract passenger names, contact information, and charter details
- Generated formatted HTML manifests and trip sheets with consistent styling
- Published generated documents to multiple S3 locations for redundancy
- Invalidated CloudFront distribution caches to ensure immediate live availability
- Integrated document provisioning into Lambda functions for event-driven processing
Technical Architecture
Calendar Integration
The system begins by querying the Google Calendar API for events on the JADA calendar within a specific date range (in this case, the upcoming weekend). Authentication uses Google OAuth 2.0 credentials whose access tokens are refreshed as needed, so queries keep working across session boundaries:
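import requests
from google.auth.transport.requests import Request
from google.oauth2.service_account import Credentials
# Key file path, calendar_id, time_min and time_max are placeholders
creds = Credentials.from_service_account_file(
    'service-account.json', scopes=['https://www.googleapis.com/auth/calendar.readonly'])
creds.refresh(Request())  # the Google auth library fetches a fresh access token
resp = requests.get(
    f'https://www.googleapis.com/calendar/v3/calendars/{calendar_id}/events',
    headers={'Authorization': f'Bearer {creds.token}'},
    params={'timeMin': time_min, 'timeMax': time_max, 'singleEvents': 'true'})
events = resp.json().get('items', [])  # booking details, passenger lists, charter metadata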
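The query window for "the upcoming weekend" can be computed with the standard library alone. The sketch below assumes a weekend runs from Saturday 00:00 to Monday 00:00 in the timezone of the datetime passed in. `upcoming_weekend_range` is a hypothetical helper and is not part of the original code. The Calendar API expects `timeMin`/`timeMax` as RFC 3339 timestamps with an offset, so `now` should be timezone-aware.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def upcoming_weekend_range(now: datetime) -> tuple[str, str]:
    """Return (timeMin, timeMax) RFC 3339 strings covering the next Saturday and Sunday.

    Assumption: a "weekend" is Saturday 00:00 up to Monday 00:00 in now's timezone.
    If now already falls on a Saturday or Sunday, that weekend is returned.
    `now` must be timezone-aware so isoformat() includes the UTC offset.
    """
    days_until_saturday = (5 - now.weekday()) % 7  # Monday=0 ... Saturday=5
    if now.weekday() == 6:  # Sunday: step back to the current weekend's Saturday
        days_until_saturday = -1
    start = (now + timedelta(days=days_until_saturday)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    end = start + timedelta(days=2)  # exclusive upper bound: Monday 00:00
    return start.isoformat(), end.isoformat()


if __name__ == '__main__':
    # Wednesday, 12 June 2024 in UTC
    print(upcoming_weekend_range(datetime(2024, 6, 12, 15, 30, tzinfo=timezone.utc)))
```

For a Wednesday, 12 June 2024 UTC input, the helper returns `('2024-06-15T00:00:00+00:00', '2024-06-17T00:00:00+00:00')`. These two strings can be passed as the `timeMin` and `timeMax` query parameters.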
The calendar serves as the source of truth for charter information. Each event contains structured custom fields with passenger names, contact phone numbers, and charter-specific metadata. This avoids maintaining a separate database and keeps operations teams working in their existing tools.
Document Generation
Charter manifests are generated as static HTML files with embedded styling. The manifest template (/Users/cb/Documents/repos/jada-ops/quinn-male/quinn-male-manifest.html) includes:
- Vessel information and charter date/time
- Captain and crew assignments
- Complete passenger manifest with names and phone numbers
- Trip sheet with location coordinates and timing information
- CSS styling for print-friendly formatting
The generation process is straightforward: extract the calendar event data, populate the HTML templates with passenger information, and write the completed documents to the local filesystem before publishing:
# charter_provisioner.py handles manifest generation
# Pattern: calendar_event → extract_fields() → render_template() → write_html()
# Example structure from generated manifests:
# - Passenger names extracted from calendar event description
# - Phone numbers parsed from custom event fields
# - Vessel details populated from charter metadata
# - CSS includes print styles for crew operations
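The `render_template()` and `write_html()` steps of that pattern could look roughly like the sketch below, which uses only the standard library's `string.Template` and `html.escape`. The one-table template is a hypothetical stand-in for the real manifest template. The `fields` dictionary shape (`charter_name`, `start`, `passengers`, `phones`) is also assumed. Escaping matters here because names and notes come straight from free-form calendar text.

```python
import html
from itertools import zip_longest
from pathlib import Path
from string import Template

# Hypothetical minimal template; the real manifest template is far richer.
MANIFEST_TEMPLATE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>$title</title>
<style>@media print { body { font-size: 11pt; } }</style></head>
<body><h1>$title</h1><p>$start</p>
<table><tr><th>Passenger</th><th>Phone</th></tr>
$rows
</table></body></html>""")


def render_template(fields: dict) -> str:
    """Fill the manifest template, HTML-escaping every calendar-supplied value."""
    rows = '\n'.join(
        f'<tr><td>{html.escape(name)}</td><td>{html.escape(phone)}</td></tr>'
        # zip_longest keeps a passenger on the manifest even if no phone was recorded
        for name, phone in zip_longest(fields['passengers'], fields['phones'], fillvalue='')
    )
    return MANIFEST_TEMPLATE.substitute(
        title=html.escape(fields['charter_name']),
        start=html.escape(fields['start'] or ''),
        rows=rows,
    )


def write_html(document: str, path: Path) -> Path:
    """Write the rendered document to the local filesystem before publishing."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(document, encoding='utf-8')
    return path
```

Plain `zip` would silently drop any passenger without a matching phone number, which is the wrong failure mode for a passenger manifest. That is why the sketch uses `zip_longest`.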
S3 Publishing Strategy
Documents are published to two distinct S3 locations to support different access patterns:
- Primary location: s3://shipcaptaincrew/docs/crew-page/{event_id}/ holds the documents linked from the crew management portal SPA.
- Secondary location: s3://shipcaptaincrew/snapshots/print/{charter_name}/ holds backup copies for durability and historical reference.
Both uploads set the Content-Type header to text/html so browsers render the documents directly instead of prompting a download. The S3 bucket policy allows public read access, so crew members and passengers can open a document from a shared link.
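For reference, the sketch below shows one way to derive both object keys and a shareable URL for a document. `document_keys` and `public_url` are hypothetical helpers. The URL format assumes the bucket lives in us-west-2, the region the S3 client uses, and that it is reached through S3's virtual-hosted-style endpoint rather than through CloudFront. `quote()` percent-encodes characters such as spaces in charter names while leaving `/` intact.

```python
from urllib.parse import quote

BUCKET = 'shipcaptaincrew'
REGION = 'us-west-2'  # assumption: bucket region matches the S3 client's region


def document_keys(event_id: str, charter_name: str, filename: str) -> dict:
    """S3 keys for one document in both publishing locations."""
    return {
        'primary': f'docs/crew-page/{event_id}/{filename}',
        'snapshot': f'snapshots/print/{charter_name}/{filename}',
    }


def public_url(key: str) -> str:
    """Virtual-hosted-style S3 URL; quote() encodes spaces etc. but keeps '/'."""
    return f'https://{BUCKET}.s3.{REGION}.amazonaws.com/{quote(key)}'
```

Generating keys in one place keeps the upload code and any links sent to crew from drifting apart.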
Publishing is handled through the AWS SDK with explicit credential refresh to avoid session timeout issues:
# send_charter_emails.py demonstrates the publishing pattern
import boto3
s3_client = boto3.client('s3', region_name='us-west-2')
# event_id, charter_name and manifest_html come from the generation step
# Publish manifest to crew-page docs
s3_client.put_object(
    Bucket='shipcaptaincrew',
    Key=f'docs/crew-page/{event_id}/manifest.html',
    Body=manifest_html.encode('utf-8'),
    ContentType='text/html; charset=utf-8'  # render inline instead of downloading
)
# Publish to backup location
s3_client.put_object(
    Bucket='shipcaptaincrew',
    Key=f'snapshots/print/{charter_name}/manifest.html',
    Body=manifest_html.encode('utf-8'),
    ContentType='text/html; charset=utf-8'
)
CloudFront Cache Invalidation
The critical piece that enables live document delivery is CloudFront cache invalidation. The queenofsandiego.com domain is backed by CloudFront distribution E2Y7EXAMPLE (exact ID redacted), which caches all S3 content with a 24-hour default TTL.
Without explicit invalidation, updated manifests would take up to 24 hours to appear live. The solution is to trigger CloudFront invalidation immediately after S3 upload:
import time
import boto3
# CloudFront is a global service; its API endpoint is in us-east-1
cloudfront = boto3.client('cloudfront', region_name='us-east-1')
invalidation_response = cloudfront.create_invalidation(
    DistributionId='DISTRIBUTION_ID',  # redacted distribution ID
    InvalidationBatch={
        'Paths': {
            'Quantity': 2,  # must equal len(Items)
            'Items': [
                f'/docs/crew-page/{event_id}/manifest.html',
                f'/docs/crew-page/{event_id}/trip-sheet.html'
            ]
        },
        'CallerReference': str(time.time())
    }
)
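The invalidation lists exact file paths rather than a wildcard. CloudFront allows up to 3,000 individual file paths in progress per distribution but only 15 wildcard paths, so exact paths leave headroom when several charters publish at once. CallerReference is set to the current timestamp because CloudFront requires a unique value per request and uses it to recognize accidental resubmissions of the same request.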
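Building the batch in a small helper keeps `Quantity` in sync with `Items` automatically. This is a hypothetical sketch: `build_invalidation_batch` does not appear in the original code. It also replaces the bare timestamp with a timestamp plus `uuid4()`, so two invocations issued at the same instant still get distinct CallerReference values.

```python
from __future__ import annotations

import time
import uuid


def build_invalidation_batch(event_id: str, filenames: list[str]) -> dict:
    """InvalidationBatch for create_invalidation covering one event's documents."""
    items = [f'/docs/crew-page/{event_id}/{name}' for name in filenames]
    return {
        'Paths': {'Quantity': len(items), 'Items': items},  # Quantity derived, never stale
        # Timestamp for readability in the console, UUID for guaranteed uniqueness
        'CallerReference': f'{event_id}-{time.time()}-{uuid.uuid4()}',
    }
```

Usage: `cloudfront.create_invalidation(DistributionId='DISTRIBUTION_ID', InvalidationBatch=build_invalidation_batch(event_id, ['manifest.html', 'trip-sheet.html']))`.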
Lambda Integration
The document pipeline is integrated into /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/shipcaptaincrew/lambda_function.py, which serves as the event handler for crew page document requests. When users navigate to a crew event page, the Lambda function:
- Queries the calendar API for the specific event
- Checks if cached documents exist in S3
- Generates manifests on-demand if missing
- Returns document URLs to the frontend SPA for display
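The four steps can be sketched as a plain function with the calendar and S3 calls injected. This keeps the lazy-generation logic testable without network access. Everything in the sketch is hypothetical and does not reflect the actual contents of lambda_function.py:

- `handle_crew_page_request` and the signatures of its callables
- `base_url`
- the API Gateway-style `{'statusCode', 'body'}` response

In a real deployment, `exists` could wrap `s3_client.head_object` and `upload` could wrap `put_object`.

```python
import json
from typing import Callable, Optional

DOC_NAMES = ('manifest.html', 'trip-sheet.html')


def handle_crew_page_request(
    event_id: str,
    fetch_event: Callable[[str], Optional[dict]],  # calendar lookup, None if unknown
    exists: Callable[[str], bool],                 # is this key already in S3?
    render: Callable[[dict], dict],                # event -> {filename: html}
    upload: Callable[[str, str], None],            # (key, body) -> None
    base_url: str,
) -> dict:
    """Lazy generation: render and upload only the documents missing from S3."""
    calendar_event = fetch_event(event_id)  # 1. query the calendar for this event
    if calendar_event is None:
        return {'statusCode': 404, 'body': json.dumps({'error': 'unknown event'})}
    keys = {name: f'docs/crew-page/{event_id}/{name}' for name in DOC_NAMES}
    missing = [name for name, key in keys.items() if not exists(key)]  # 2. cache check
    if missing:
        documents = render(calendar_event)  # 3. generate on demand
        for name in missing:
            upload(keys[name], documents[name])
    urls = [f'{base_url}/{keys[name]}' for name in DOC_NAMES]
    return {'statusCode': 200, 'body': json.dumps({'urls': urls})}  # 4. URLs for the SPA
```

Rendering happens at most once per request and only when something is missing. A second request for the same event therefore touches S3 but never re-renders.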
This lazy-generation approach avoids redundant document creation, because a document is generated only when a crew member requests it and it is not already in S3, while still guaranteeing that any requested document exists.
Key Architectural Decisions
Why Google Calendar as the source of truth? The operations team already maintains charter schedules in Google Calendar with custom fields for passenger details. Using the calendar API eliminates the need for custom database infrastructure and keeps the system aligned with existing workflows.
Why dual S3 locations? The crew-page docs location is directly referenced by the web portal frontend, while the snapshots location provides historical archives and backup copies. This redundancy protects against accidental deletion and supports compliance/record-keeping requirements.
Why explicit CloudFront invalidation? Cache invalidation is more reliable than relying on TTL expiration for time-sensitive operational documents. Crew members need current manifests immediately, not eventual consistency hours later.
Why static HTML over dynamic rendering? Pre-generated manifests load instantly and are trivial to cache. Dynamic rendering would add latency and complexity during high-traffic periods (multiple crews accessing documents simultaneously before charter departure).
Implementation Details
The complete