Preventing S3 Deployment Regressions: Hard Rules for Multi-Environment Static Sites
Last week, a stale local index.html was deployed to production S3, undoing three completed changes on queenofsandiego.com: it wiped the hero JADA → BOOK NOW crossfade animation and the Stripe embedded checkout booking flow, and it resurrected a deleted "For Ranch & Coast readers..." hero line. The root cause wasn't a code bug; it was a process gap. This post documents the hard rules we've now baked into every session for this project, and why each one matters.
What Went Wrong
The deployment pipeline for static sites like queenofsandiego.com is deceptively simple: edit local index.html, run a sync command to S3, CloudFront invalidates the cache, users see the new version. But "simple" invites shortcuts, and this deploy took four of them:
- No pre-flight diff: The deploy happened without pulling the current S3 state and comparing it to local. The local copy was three weeks old.
- Dual-target in one command: Both staging.queenofsandiego.com and queenofsandiego.com were synced in a single AWS CLI call, violating the staging-first validation rule.
- Ignored prior warnings: The session summary from two hours earlier had explicitly flagged "S3 may be ahead of local — always pull first." This was read but not acted on.
- No snapshot before overwrite: S3 versioning is not enabled for this bucket (cost vs. risk tradeoff), so the three features are gone unless manually restored from git history.
The Eight Hard Rules (D1–D8)
We've encoded these directly into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md, which auto-loads at the start of every session:
D1: Always Pull S3 Before Editing
aws s3 sync s3://queenofsandiego.com ./local/qos-prod/ --exclude "*" --include "index.html" --dryrun
aws s3 sync s3://queenofsandiego-staging ./local/qos-staging/ --exclude "*" --include "index.html" --dryrun
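Never assume local is current. With --dryrun, these commands only list the files that would be downloaded; drop the flag to actually pull the copies, then diff them against the working copy and print the diff to stdout. If S3 has changes local doesn't, escalate to CB before proceeding.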
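One way to make D1 checkable rather than aspirational is a small wrapper that fetches the live file to a scratch location and diffs it against the working copy. This is a minimal sketch, not our actual tooling: the helper names fetch_live and preflight_diff are hypothetical, and it assumes the working copy lives at ./index.html.

```shell
# Sketch only: fetch_live and preflight_diff are hypothetical helper names.

# Download the live index.html from a bucket to a scratch file.
# Usage: fetch_live <bucket> <dest-file>
fetch_live() {
  aws s3 cp "s3://$1/index.html" "$2" --only-show-errors
}

# Compare the working copy with the fetched live copy.
# Prints a unified diff; returns 0 if identical, 1 otherwise.
# Usage: preflight_diff <local-file> <live-file>
preflight_diff() {
  if diff -u "$1" "$2"; then
    echo "PREFLIGHT OK: local matches S3"
    return 0
  fi
  echo "PREFLIGHT BLOCKED: local and S3 differ; escalate before editing" >&2
  return 1
}
```

A session would run something like `fetch_live queenofsandiego.com /tmp/live-index.html && preflight_diff ./index.html /tmp/live-index.html`. In the diff output, lines prefixed with + exist only in the S3 copy, which is exactly the "S3 ahead of local" case.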
D2: Staging Only, Single Target
Deploy to queenofsandiego-staging first, always alone:
aws s3 cp ./index.html s3://queenofsandiego-staging/ --cache-control "max-age=0"
Wait 30 seconds. Verify on staging.queenofsandiego.com. Only then promote to prod.
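The single-target rule can also be enforced mechanically before any upload runs. The guard below is a sketch under our own assumptions: assert_staging_only is a hypothetical function, not part of the AWS CLI, and it hard-codes the staging bucket name used in this post.

```shell
# Sketch only: assert_staging_only is a hypothetical guard.
# Accepts exactly one target bucket, and only the staging bucket.
assert_staging_only() {
  if [ "$#" -ne 1 ]; then
    echo "REFUSED: expected exactly one target, got $#" >&2
    return 1
  fi
  case "$1" in
    queenofsandiego-staging) return 0 ;;
    *) echo "REFUSED: '$1' is not the staging bucket" >&2; return 1 ;;
  esac
}
```

Chaining it in front of the upload, as in `assert_staging_only queenofsandiego-staging && aws s3 cp ./index.html s3://queenofsandiego-staging/ --cache-control "max-age=0"`, means a dual-target or prod-first deploy fails before anything is written.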
D3: One File Per Logical Change
If the PR touches the hero, the footer, and the booking form, deploy them as three separate uploads to staging, validate each one independently, then batch-promote to prod. This makes root-cause analysis trivial if something breaks.
D4: Obey Your Own Prior Session Warnings
Every session summary includes a "Risks & Warnings" section. If it says "S3 ahead of local" or "stale file detected," treat it as a blocker. Escalate rather than overrule yourself.
D5: Snapshot Prod Before Overwriting (No Versioning Strategy)
Since versioning is disabled, take a manual backup before any prod sync:
aws s3 cp s3://queenofsandiego.com/index.html ./backups/index.html.$(date +%s).backup
Store this in git so the team can audit what changed and when.
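A snapshot only counts as a rollback point if it actually landed on disk. The wrapper below is a sketch around the backup command above: snapshot_prod and the COPY_CMD override are hypothetical names we introduce for illustration, and the default backup directory is assumed to be ./backups.

```shell
# Sketch only: snapshot_prod and COPY_CMD are hypothetical (bash/sh).
# COPY_CMD defaults to the real AWS CLI; override it to rehearse without touching S3.
# Usage: snapshot_prod [backup-dir]   (prints the snapshot path on success)
snapshot_prod() {
  local dest_dir="${1:-./backups}"
  local dest="$dest_dir/index.html.$(date +%s).backup"
  mkdir -p "$dest_dir" || return 1
  # Left unquoted on purpose so "aws s3 cp" splits into separate words.
  ${COPY_CMD:-aws s3 cp} "s3://queenofsandiego.com/index.html" "$dest" >&2 || return 1
  # An empty or missing snapshot is not a rollback point.
  [ -s "$dest" ] || { echo "SNAPSHOT FAILED: $dest is empty" >&2; return 1; }
  echo "$dest"
}
```

The printed path is what goes on the Rollback line of the proof block, for example `rollback=$(snapshot_prod) || exit 1`.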
D6: Print a Six-Line Proof Block Before Any cp
Before executing an S3 upload, print this in chat:
--- DEPLOYMENT PROOF BLOCK ---
Source file: ./index.html (md5: ...)
Target bucket: s3://queenofsandiego-staging/
CloudFront dist: EXXXXXXXX (queenofsandiego.com)
Features validated: [ JADA fade, Stripe checkout, footer links ]
Rollback: ./backups/index.html.TIMESTAMP
Command: aws s3 cp ./index.html s3://queenofsandiego-staging/ --cache-control "max-age=0"
--- END PROOF BLOCK ---
This forces a pause for human review and makes the entire chain auditable in the transcript.
D7: Feature Token Registry
Maintain a grep-able list of key HTML markers in S3 prod:
Before declaring a deploy successful, grep the live S3 file for each token. Missing any token = rollback.
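The token check is easy to script once the registry is a plain text file. This is a sketch under stated assumptions: check_tokens is a hypothetical helper, the registry format (one literal marker per line, with blank lines and # comments skipped) is our invention, and any markers shown in usage are placeholders rather than the site's real tokens.

```shell
# Sketch only: check_tokens is a hypothetical helper; the registry file holds
# one literal marker per line (blank lines and lines starting with # are skipped).
# Usage: check_tokens <html-file> <registry-file>
check_tokens() {
  local html="$1" registry="$2" missing=0 token
  while IFS= read -r token || [ -n "$token" ]; do
    case "$token" in ''|'#'*) continue ;; esac
    # -F matches the marker literally, so HTML punctuation is not treated as regex.
    if ! grep -qF -- "$token" "$html"; then
      echo "MISSING TOKEN: $token" >&2
      missing=$((missing + 1))
    fi
  done < "$registry"
  [ "$missing" -eq 0 ]
}
```

After fetching the live file, `check_tokens /tmp/live-index.html ./feature-tokens.txt || echo "ROLLBACK"` would report every missing marker in one pass instead of stopping at the first.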
D8: Escalate to CB if S3 Is Ahead of Local
If the pre-flight diff shows S3 has code that local doesn't, stop. This is not a situation to resolve alone. Message CB with the diff and wait for direction.
Infrastructure Details
- S3 buckets:
  - queenofsandiego.com: production static site root
  - queenofsandiego-staging: staging clone, same schema
- CloudFront distribution: EXXXXXXXX (omitting actual ID for security). Cache behavior set to 0 TTL on index.html, 86400s on assets. Invalidation happens post-deploy.
- Route53 zone: queenofsandiego.com. CNAME from staging subdomain to CloudFront; prod domain points to main distribution.
- GAS webhooks: Stripe payment_intent.succeeded events redirect to a GAS endpoint that verifies the session, updates the Sheets Bookings tab, and emails confirmation.
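The post-deploy invalidation mentioned for the CloudFront distribution can be sketched as a small function. Treat this as illustrative only: invalidate_index and the DRY_RUN switch are hypothetical, EXXXXXXXX is the redacted placeholder from the list above, and we assume only /index.html needs invalidating.

```shell
# Sketch only: invalidate_index and DRY_RUN are hypothetical; EXXXXXXXX is a placeholder ID.
# With DRY_RUN=1 the command is printed instead of executed.
# Usage: invalidate_index <distribution-id>
invalidate_index() {
  local dist_id="$1"
  # Build the command as positional parameters so it can be printed or run unchanged.
  set -- aws cloudfront create-invalidation --distribution-id "$dist_id" --paths "/index.html"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running it once with DRY_RUN=1 gives a line that can be pasted into the proof block before the real invalidation is issued.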
Key Architectural Decisions
Why staging sync first, not just testing locally? Browser dev tools don't capture CloudFront cache behavior, compression, or CORS headers in the same way the live CDN does. Staging on the real infrastructure catches issues local testing misses.
Why separate feature tokens instead of hash-based validation? If the entire file changes (e.g., a minifier re-orders code), a hash diff will fail even though all features are present. Semantic tokens (the actual HTML comments and element IDs) are resilient to reformatting.
Why manual snapshots instead of S3 versioning? Enabling versioning costs nothing by itself, but every retained version is billed as ordinary storage, about $0.023/GB/month per copy. For a 60KB file that is negligible, so cost was not the deciding factor: the policy decision was to keep the bucket "clean" and rely on git for version history instead. We consider this an acceptable risk given the low deploy frequency (~2 per week) and CB's ability to recover from git.
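To put a number on "negligible", the arithmetic can be run directly. This is a back-of-the-envelope sketch that takes the quoted $0.023/GB/month rate and 60KB file size at face value; the helper name version_cost_per_month and the use of decimal units (1 GB = 1,000,000 KB) are our assumptions.

```shell
# Sketch only: version_cost_per_month is a hypothetical helper.
# Decimal units (1 GB = 1,000,000 KB) are an assumption.
# Usage: version_cost_per_month <size-kb> <usd-per-gb-month> <retained-versions>
version_cost_per_month() {
  awk -v kb="$1" -v rate="$2" -v n="$3" \
    'BEGIN { printf "%.8f\n", kb / 1e6 * rate * n }'
}
```

`version_cost_per_month 60 0.023 1` prints 0.00000138, and a year of deploys at roughly 2 per week (`version_cost_per_month 60 0.023 104`) prints 0.00014352, i.e. well under a cent per month even if every version were kept.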