Publishing Charter Documents to S3 and Invalidating CloudFront Cache: A Multi-Step Deployment Pipeline
During this development session, I implemented a complete document publishing and cache invalidation workflow for the JADA operations charter management system. This involved creating manifest and trip sheet documents for a charter booking, publishing them to multiple S3 locations, and strategically invalidating CloudFront cache to ensure fresh content delivery across different parts of the application. Here's how the architecture came together.
What Was Done
- Generated structured HTML manifest and trip sheet documents for the Quinn Male charter
- Published documents to two distinct S3 prefixes: shipcaptaincrew/docs/print/ and shipcaptaincrew/docs/crew-page/
- Invalidated the affected CloudFront cache paths to force fresh content delivery
- Verified live URLs returned correct HTTP 200 responses with matching content
- Ensured passenger names and charter details were consistently rendered across all published locations
Technical Details: Document Generation and Publishing
The charter documents were created as standalone HTML files with embedded styling, making them self-contained and portable across different deployment targets:
/Users/cb/Documents/repos/jada-ops/quinn-male/quinn-male-manifest.html
/Users/cb/Documents/repos/jada-ops/quinn-male/quinn-male-trip-sheet.html
These files were initially staged in /tmp/ for validation before being moved to the persistent repository location. The documents include passenger roster information, trip details, and crew assignments—critical data that needed to be consistently published across multiple locations.
Maintaining dual S3 locations (print/ and crew-page/) was a deliberate decision to support different consumption patterns. The print/ prefix serves documents that are primarily accessed as downloadable, standalone files. The crew-page/ prefix serves the same documents, but on the understanding that they are embedded or referenced within the crew management frontend application.
Infrastructure: S3 and CloudFront Integration
The shipcaptaincrew S3 bucket structure follows a clear organizational pattern:
s3://shipcaptaincrew/docs/print/ # Printable manifests and trip sheets
s3://shipcaptaincrew/docs/crew-page/ # Frontend-referenced documents
s3://shipcaptaincrew/snapshots/ # Archive and reference materials
Documents were published with appropriate HTTP headers. The Content-Type header was explicitly set to text/html to ensure browsers render the documents rather than attempting to download them:
aws s3 cp quinn-male-manifest.html s3://shipcaptaincrew/docs/print/quinn-male-manifest.html \
--content-type text/html --region us-west-2
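The same upload step can also be scripted so that both prefixes always receive identical files. The following is a minimal sketch, not the pipeline's actual code: `build_upload_plan` and `upload_all` are hypothetical helpers, and the S3 client is assumed to be a boto3 client created by the caller (for example with `boto3.client("s3", region_name="us-west-2")`).

```python
from pathlib import PurePosixPath

BUCKET = "shipcaptaincrew"
PREFIXES = ("docs/print/", "docs/crew-page/")


def build_upload_plan(local_files, prefixes=PREFIXES, bucket=BUCKET):
    """Return one upload entry per (local file, S3 prefix) pair."""
    plan = []
    for local_path in local_files:
        name = PurePosixPath(local_path).name
        for prefix in prefixes:
            plan.append({
                "Filename": local_path,
                "Bucket": bucket,
                "Key": prefix + name,
                # Set the type explicitly instead of relying on guessing.
                "ExtraArgs": {"ContentType": "text/html"},
            })
    return plan


def upload_all(s3_client, plan):
    """Upload every planned entry using a caller-supplied boto3 S3 client."""
    for entry in plan:
        s3_client.upload_file(
            entry["Filename"],
            entry["Bucket"],
            entry["Key"],
            ExtraArgs=entry["ExtraArgs"],
        )
```

Building the plan separately from executing it keeps the key layout easy to inspect before anything is written to the bucket.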
The CloudFront distribution in front of this S3 bucket required cache invalidation after publishing so that viewers received the updated manifest immediately rather than a stale cached version. CloudFront caches objects by URL path (and by query string, when the cache policy includes it), and when the origin sends no caching headers the default TTL (time-to-live) of 24 hours applies, so outdated content could be served for up to a day without explicit invalidation.
Cache Invalidation Strategy
Rather than invalidating the entire distribution (which is expensive and impacts all cached content), we used path-specific invalidations:
/docs/print/quinn-male-manifest.html
/docs/print/quinn-male-trip-sheet.html
/docs/crew-page/quinn-male-manifest.html
/docs/crew-page/quinn-male-trip-sheet.html
This surgical approach minimized the blast radius: only the specific objects we modified were purged from edge caches, while other cached assets (images, CSS, static JS) remained untouched. A wildcard invalidation such as /docs/print/* counts as a single path for billing purposes, but it purges every cached object under that prefix, so each of them has to be re-fetched from S3 on its next request.
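A path-specific invalidation request can be sketched as follows. This is an illustration under assumptions rather than the code used in the session: `build_invalidation_batch` and `invalidate` are hypothetical helper names, the distribution ID is not given in this write-up and must be supplied by the caller, and the client is assumed to be a boto3 CloudFront client.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for CloudFront's CreateInvalidation."""
    unique = sorted(set(paths))  # drop duplicate paths
    for path in unique:
        if not path.startswith("/"):
            raise ValueError(f"invalidation path must start with '/': {path}")
    return {
        "Paths": {"Quantity": len(unique), "Items": unique},
        # CallerReference must be unique per request; a timestamp is one option.
        "CallerReference": caller_reference or f"docs-publish-{int(time.time())}",
    }


def invalidate(cloudfront_client, distribution_id, paths):
    """Submit the invalidation using a caller-supplied boto3 CloudFront client."""
    return cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

Deduplicating the paths keeps the `Quantity` field consistent with `Items`, which the API requires.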
The verification workflow was critical: after publishing and invalidating cache, we confirmed that live URLs returned HTTP 200 and that the content matched the source files. This prevented the silent failure mode where documents publish successfully but CloudFront serves old cached versions due to invalidation lag.
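The status-and-content check described above could look roughly like this. It is a sketch using only the Python standard library; `check_live_copy` and `fetch` are hypothetical names, and the live URL is left to the caller because the CloudFront domain is not stated here.

```python
import hashlib
import urllib.request


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def check_live_copy(status: int, body: bytes, source: bytes) -> list:
    """Return a list of problems; an empty list means the live copy is good."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    if sha256_hex(body) != sha256_hex(source):
        problems.append("live content differs from source file")
    return problems


def fetch(url: str):
    """Fetch a URL and return (status, body).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    that want to report a 403 or 404 should catch that exception.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read()
```

Comparing digests rather than eyeballing the page catches the case where a 200 response still carries the previous version of the document.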
Document Routing and Access Patterns
The application's Lambda functions handle document access through multiple pathways. The handle_get_doc function routes document requests based on event context:
GET /api/events/{event_id}/docs/{doc_type}
GET /api/crew-page/docs/{event_id}/{doc_type}
These handlers construct S3 URLs dynamically based on the event ID and document type, then either redirect to CloudFront URLs or serve the content directly. By maintaining documents in both print/ and crew-page/ prefixes, we enable different handler logic paths while keeping source-of-truth files identical.
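To make the routing idea concrete, here is a hypothetical sketch of mapping the two routes to S3 keys. It is not the body of handle_get_doc, which is not shown in this write-up. Two things are assumptions made for illustration: that the events route maps to print/ and the crew-page route to crew-page/, and that keys follow an `{event_id}-{doc_type}.html` pattern inferred from the published file names.

```python
import re

# Assumed route-to-prefix mapping; the real handler logic may differ.
ROUTES = (
    (re.compile(r"^/api/events/(?P<event_id>[^/]+)/docs/(?P<doc_type>[^/]+)$"),
     "docs/print/"),
    (re.compile(r"^/api/crew-page/docs/(?P<event_id>[^/]+)/(?P<doc_type>[^/]+)$"),
     "docs/crew-page/"),
)


def resolve_doc_key(path):
    """Return the S3 key for a request path, or None if no route matches."""
    for pattern, prefix in ROUTES:
        match = pattern.match(path)
        if match:
            # Assumed key layout: <prefix><event_id>-<doc_type>.html
            return f"{prefix}{match['event_id']}-{match['doc_type']}.html"
    return None
```

Under these assumptions, a request for `/api/events/quinn-male/docs/manifest` resolves to `docs/print/quinn-male-manifest.html`.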
The crew page frontend SPA ingests document URLs from the event details endpoint, which lists all associated documents for a given charter. When the manifest renders with passenger names, those names come from the HTML document itself rather than a separate API call, reducing complexity and ensuring the displayed manifest exactly matches the downloadable version.
Key Decisions and Rationale
Dual S3 Publishing Locations: Rather than creating symbolic links or aliases, publishing the same document to two locations provides independence and clarity. If future requirements demand different transformations or access patterns for print vs. crew-page documents, the infrastructure is already separated.
HTML Documents Over JSON APIs: Generating complete HTML documents rather than storing data in databases and rendering on-demand provides several advantages: documents are self-contained and portable, they render identically across all contexts, they're trivial to download and print, and they require no database queries at serving time.
Explicit Cache Invalidation Over TTL Waiting: Setting aggressive TTLs (short cache windows) would reduce CloudFront efficiency. Explicit invalidation means content is fresh shortly after publishing, once the invalidation has propagated to the edge locations, while cache hit rates stay high during normal operation. This is appropriate for documents that represent point-in-time charter details.
Content-Type Headers: Explicitly setting text/html as the object's Content-Type avoids relying on extension-based MIME type guessing at upload time. CloudFront passes through whatever Content-Type is stored on the S3 object, so a key without a standard extension could otherwise be stored with a generic binary type and be downloaded by the browser instead of rendered.
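The extension-guessing problem is easy to demonstrate with Python's standard mimetypes module, which behaves in a similar spirit to upload tools that infer a type from the file name. The helper `content_type_for` below is a hypothetical name, and the extension-less key is an invented example.

```python
import mimetypes


def content_type_for(key, default="text/html"):
    """Guess a Content-Type from the key's extension, falling back to a default."""
    guessed, _encoding = mimetypes.guess_type(key)
    return guessed or default


# With a standard extension the guess succeeds.
with_ext = mimetypes.guess_type("docs/print/quinn-male-manifest.html")[0]

# Without an extension there is nothing to guess from, so the result is None.
without_ext = mimetypes.guess_type("docs/print/quinn-male-manifest")[0]
```

Because the second guess yields no type at all, an explicit Content-Type is the only reliable way to publish such a key as HTML.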
Verification and Quality Assurance
The deployment pipeline included multiple verification checkpoints:
- Local file validation: ensuring HTML documents contained expected passenger names and charter details
- S3 upload confirmation: verifying files existed at intended paths
- HTTP response codes: confirming CloudFront returned 200 (not 404 or 403)
- Content matching: spot-checking that live URLs contained identical content to source files
- Consistency across locations: verifying the same manifest rendered identically from both print/ and crew-page/ prefixes
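The last checkpoint, consistency across locations, can be automated by grouping fetched bodies by file name and comparing digests. This is a sketch with a hypothetical helper name, `find_inconsistent_docs`; how the bodies are fetched is left to the caller.

```python
import hashlib
from collections import defaultdict


def find_inconsistent_docs(bodies_by_key):
    """Return sorted file names whose copies differ between prefixes.

    bodies_by_key maps a key such as "docs/print/quinn-male-manifest.html"
    to the bytes served for it.
    """
    digests = defaultdict(set)
    for key, body in bodies_by_key.items():
        name = key.rsplit("/", 1)[-1]  # file name without its prefix
        digests[name].add(hashlib.sha256(body).hexdigest())
    return sorted(name for name, seen in digests.items() if len(seen) > 1)
```

An empty result means every document is byte-identical under print/ and crew-page/.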
What's Next
Future enhancements could include automated manifest generation triggered by charter booking updates, implementing document versioning with timestamped S3 keys, and adding digital signatures to manifests for compliance purposes. The current infrastructure provides a solid foundation for these extensions while maintaining the simplicity and reliability of direct HTML publication.
The pattern established here—generate content locally, validate, publish to S3 with appropriate headers, invalidate CloudFront strategically, and verify live URLs—becomes a reusable template for other document types in the JADA operations system.