Publishing Charter Documents to S3 with CloudFront Cache Invalidation: A Case Study in Document Delivery
During a recent development session managing JADA operations charters, we implemented a complete document publishing pipeline that moved charter manifests and trip sheets from local development directories to production S3 storage, then invalidated the CloudFront cache so that live content reflected the updates. This post covers the technical decisions, infrastructure patterns, and implementation specifics.
What Was Done
We built a document publishing workflow for charter operations:
- Generated charter manifest and trip sheet HTML documents locally in /tmp/
- Published these documents to two distinct S3 locations for redundancy
- Invalidated CloudFront cache to push fresh content to edge locations
- Verified live URLs returned HTTP 200 with correct passenger data
- Stored copies in a durable JADA operations repository for audit trails
Technical Details: Document Generation and Publishing
Charter documents were initially created as HTML files in the temp directory during development:
/tmp/quinn-male-manifest.html
/tmp/quinn-male-trip-sheet.html
These files contained passenger manifests with names, contact information, and trip details. Rather than leaving documents in ephemeral temp storage, we implemented a three-tier storage strategy:
- Local durability: /Users/cb/Documents/repos/jada-ops/quinn-male/ (version-controlled backup)
- Primary S3 location: s3://shipcaptaincrew/docs/manifests/ (standard document prefix)
- Crew page docs location: s3://shipcaptaincrew/crew-page/docs/ (secondary location for event page rendering)
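This redundancy keeps documents reachable through an alternate path if one prefix is misconfigured or its objects are accidentally deleted. Because both prefixes live in the same bucket, it does not protect against a bucket-level problem; the local repository copy covers that case.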
Infrastructure: AWS S3 and CloudFront Integration
The shipcaptaincrew S3 bucket serves as the origin for a CloudFront distribution that handles edge caching for the crew page application. Document publishing required understanding the cache invalidation lifecycle:
S3 Bucket Structure:
s3://shipcaptaincrew/
├── docs/
│   ├── manifests/
│   │   └── quinn-male-manifest.html
│   └── trip-sheets/
│       └── quinn-male-trip-sheet.html
└── crew-page/
    └── docs/
        ├── manifests/
        └── trip-sheets/
CloudFront Distribution Details:
The shipcaptaincrew CloudFront distribution caches documents at edge locations globally. When documents are updated in S3, the cache must be explicitly invalidated because CloudFront doesn't automatically detect S3 object changes. The distribution ID (referenced in invalidation commands but not exposed here) is managed through the Lambda function that handles document downloads.
Document access flows through a Lambda function at /Users/cb/Documents/repos/sites/queenofsandiego.com/tools/shipcaptaincrew/lambda_function.py, which receives requests with parameters like event_id and doc_type (manifest or trip-sheet). The Lambda builds S3 keys dynamically and returns presigned URLs or direct S3 redirects.
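The key-building and redirect behavior described above might look roughly like the following. This is a minimal sketch, not the contents of lambda_function.py: the function names build_key and handle_download, the slug format for event_id, the 300-second expiry, and the "{event_id}-{doc_type}.html" naming scheme are all assumptions inferred from the file names in this post. The s3_client argument is assumed to be a boto3 S3 client.

```python
import re

BUCKET = "shipcaptaincrew"
# Folder names follow the bucket layout shown in this post.
DOC_TYPE_FOLDERS = {"manifest": "manifests", "trip-sheet": "trip-sheets"}
# Assumption: event IDs are lowercase slugs such as "quinn-male".
EVENT_ID_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def build_key(event_id: str, doc_type: str) -> str:
    """Build the S3 key for a document (hypothetical naming scheme)."""
    if doc_type not in DOC_TYPE_FOLDERS:
        raise ValueError(f"unsupported doc_type: {doc_type!r}")
    # Reject anything that is not a plain slug so a request cannot
    # reach outside the docs/ prefix.
    if not EVENT_ID_PATTERN.fullmatch(event_id):
        raise ValueError(f"invalid event_id: {event_id!r}")
    return f"docs/{DOC_TYPE_FOLDERS[doc_type]}/{event_id}-{doc_type}.html"


def handle_download(params: dict, s3_client, expires_in: int = 300) -> dict:
    """Return a redirect to a presigned URL, or a 400 on bad input."""
    try:
        key = build_key(params["event_id"], params["doc_type"])
    except KeyError as exc:
        return {"statusCode": 400, "body": f"missing parameter: {exc.args[0]}"}
    except ValueError as exc:
        return {"statusCode": 400, "body": str(exc)}
    url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires_in,
    )
    return {"statusCode": 302, "headers": {"Location": url}, "body": ""}
```

Validating event_id before it is interpolated into the key keeps a crafted request from addressing objects outside the document prefixes.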
Publishing Pipeline Implementation
The publishing process involved several discrete steps, each requiring verification:
Step 1: Initial Document Publication
Documents were uploaded to the primary S3 location with appropriate content-type headers (text/html). AWS credentials were refreshed before each operation to ensure valid session tokens.
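An upload that sets the content type explicitly could be sketched as follows. The helper name upload_document is hypothetical, and s3_client is assumed to be a boto3 S3 client created after the credentials were refreshed; the post does not say whether the original upload used boto3 or the AWS CLI.

```python
import mimetypes
from pathlib import Path


def upload_document(s3_client, local_path: str, bucket: str, key: str) -> str:
    """Upload one file and return the Content-Type that was sent."""
    # Derive the type from the file extension; ".html" maps to "text/html".
    content_type, _ = mimetypes.guess_type(str(local_path))
    if content_type is None:
        content_type = "application/octet-stream"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=Path(local_path).read_bytes(),
        ContentType=content_type,
    )
    return content_type
```

Under these assumptions, a call such as upload_document(s3, "/tmp/quinn-male-manifest.html", "shipcaptaincrew", "docs/manifests/quinn-male-manifest.html") would publish the manifest with text/html, so browsers render it instead of downloading it.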
Step 2: Live URL Verification
After initial publication, we spot-checked live URLs by:
- Constructing the full S3 URL based on bucket name and object path
- Making HTTP requests to verify HTTP 200 response codes
- Checking that returned HTML content matched the locally-generated documents
- Confirming passenger names appeared correctly in the live manifest
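The spot checks listed above can be expressed as a small script. This is an illustrative sketch using only the standard library; the function names fetch and check_document are hypothetical, and the live URL and passenger names would be supplied by the caller.

```python
import urllib.error
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (status, body) for a URL, including for HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; report the status instead of failing.
        return exc.code, exc.read().decode("utf-8", errors="replace")


def check_document(status: int, body: str, local_html: str,
                   expected_names: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if body != local_html:
        problems.append("live content differs from local copy")
    for name in expected_names:
        if name not in body:
            problems.append(f"missing passenger name: {name}")
    return problems
```

Keeping the checks separate from the HTTP call means the same check_document function can be run against a response captured from either S3 location.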
Step 3: Secondary Location Publication
Documents were then published to the crew-page docs prefix, allowing the event detail page rendering logic to discover and link documents without Lambda intermediation. This pattern decouples document discovery from the download handler.
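One way to populate the secondary prefix is a server-side copy from the primary key, sketched below. This is an assumption about mechanism: the post does not say whether the second publication was a copy or a fresh upload. The helper name copy_to_crew_page is hypothetical, and s3_client is assumed to be a boto3 S3 client.

```python
def copy_to_crew_page(s3_client, bucket: str, primary_key: str) -> str:
    """Copy docs/<...> to crew-page/docs/<...> and return the new key."""
    if not primary_key.startswith("docs/"):
        raise ValueError(f"expected a key under docs/: {primary_key!r}")
    secondary_key = "crew-page/" + primary_key
    # With the default metadata directive, S3 carries the source object's
    # metadata (including Content-Type) over to the copy.
    s3_client.copy_object(
        Bucket=bucket,
        Key=secondary_key,
        CopySource={"Bucket": bucket, "Key": primary_key},
    )
    return secondary_key
```

Copying from the already-verified primary object guarantees both locations hold identical bytes, at the cost of making the secondary publication depend on the first.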
Step 4: CloudFront Cache Invalidation
After publishing to both S3 locations, we invalidated the CloudFront distribution cache using batch invalidation commands. This forced all edge nodes to fetch fresh content on the next user request, rather than serving stale cached versions.
The invalidation pattern targeted both document prefixes:
/docs/manifests/*
/docs/trip-sheets/*
/crew-page/docs/*
This breadth ensures that regardless of which URL path a client uses, they receive updated content.
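An invalidation covering these three paths could be issued along the following lines. This is a sketch under assumptions: the helper names are hypothetical, cloudfront_client is assumed to be a boto3 CloudFront client, and the distribution ID is passed in by the caller since it is not published here.

```python
import time

# The three path patterns listed in this post.
INVALIDATION_PATHS = [
    "/docs/manifests/*",
    "/docs/trip-sheets/*",
    "/crew-page/docs/*",
]


def build_invalidation_batch(paths, caller_reference=None) -> dict:
    """Build the InvalidationBatch structure CloudFront expects."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp-based
        # value is one simple (assumed) choice.
        "CallerReference": caller_reference or f"charter-docs-{int(time.time())}",
    }


def invalidate(cloudfront_client, distribution_id: str,
               paths=INVALIDATION_PATHS) -> str:
    """Submit the invalidation and return its ID for later status checks."""
    response = cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
    return response["Invalidation"]["Id"]
```

Submitting all three patterns in one batch keeps the operation to a single request, and the returned ID can be used to check when the invalidation has finished.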
Key Architectural Decisions
Why Two S3 Locations?
The crew page frontend rendering logic in the SPA discovers documents by reading from the crew-page docs prefix. By publishing to both locations, we maintain compatibility with two independent code paths: the Lambda-based download handler and the direct S3 listing used by event detail pages. This avoids the need for a coordinated deployment of frontend and backend changes.
Why CloudFront Invalidation Instead of S3 Versioning?
A CloudFront invalidation applies to every edge location and usually completes within minutes, whereas S3 versioning alone does not change what CloudFront serves for an unchanged URL: clients would have to request the new object version explicitly. For time-sensitive charter operations (where a trip sheet might need corrections hours before departure), invalidating the cache gives stronger assurance that the corrected document is what gets delivered.
Why Local Version Control?
Storing documents in /Users/cb/Documents/repos/jada-ops/ creates audit trails and enables rollback if documents require corrections. The directory structure mirrors S3 organization, making it easy to reconstruct bucket state from version control history.
Document Generation Specifics
Charter documents were generated with specific HTML structure matching the shipcaptaincrew brand and layout expectations. The manifest template included sections for:
- Passenger manifest (name, contact, emergency information)
- Charter details (date, time, vessel, captain assignments)
- Safety and liability information
- Payment status and confirmation numbers
The trip sheet provided crew-facing operational details: weather conditions, fuel requirements, route notes, and passenger special requests or accommodations.
Verification and Monitoring
Post-publication verification confirmed:
- Both S3 locations returned HTTP 200 for manifest and trip sheet URLs
- Content-Type headers were set to text/html (not octet-stream, which would force downloads)
- Passenger names and charter details matched the source data
- CloudFront edge cache was invalidated successfully
- Subsequent requests to live URLs returned fresh content (not cached versions)
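The Content-Type check in particular is easy to automate against object metadata. The following is a sketch only: the helper name content_type_problems is hypothetical, and s3_client is assumed to be a boto3 S3 client with permission to call head_object on the listed keys.

```python
def content_type_problems(s3_client, bucket: str, keys: list[str],
                          expected: str = "text/html") -> dict[str, str]:
    """Map each key whose stored Content-Type is not `expected` to its actual value."""
    problems = {}
    for key in keys:
        actual = s3_client.head_object(Bucket=bucket, Key=key).get("ContentType", "")
        # Ignore parameters such as "; charset=utf-8" when comparing.
        media_type = actual.split(";")[0].strip().lower()
        if media_type != expected:
            problems[key] = actual
    return problems
```

An empty result means every listed object would be served as HTML; any entry flags an object that browsers might download instead of render.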
What's Next
Future improvements to this pipeline could include:
- Automating document generation and publication via the charter booking workflow (reduce manual steps)
- Implementing S3 lifecycle policies to archive