Preventing S3 Deployment Regressions: Hard Rules for Multi-Environment Deploys
Over the last 3 hours, a previous session deployed a stale local index.html to production S3 and caused three regressions on queenofsandiego.com: it wiped the hero JADA → BOOK NOW crossfade animation, wiped the Stripe embedded checkout booking flow, and accidentally resurrected a deleted "For Ranch & Coast readers..." hero line. This post documents the root cause, the architectural lessons, and the hard rules now baked into our deployment workflow.
What Went Wrong: The Stale-Local Problem
The deployment command looked straightforward:
aws s3 cp /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html s3://qos-prod-web/
But the local file was older than what was already live in S3. The session had not pulled S3 prod to compare before editing; instead, it edited a stale local copy, then deployed both staging and prod targets in the same command (violating the staging-first rule). When the overwrite landed, it erased ~200 lines of working feature code that had been pushed in a prior session.
Why didn't we catch it?
- No pre-deploy aws s3 sync or diff against current prod
- No feature-token registry to grep against prod state before deploy
- No single-target deploy rule (staging-only first)
- S3 versioning is not enabled, so the overwritten version is gone
- The session's own prior summary warned about "stale local files", but the warning was ignored
Architecture & Infrastructure Context
Both queenofsandiego.com and sailjada.com use this stack:
- S3 buckets: qos-prod-web, qos-staging-web, sailjada-prod-web, sailjada-staging-web
- CloudFront distributions: one for prod and one for staging per domain; invalidated on every deploy
- Route53: Four zones (sailjada.com, 86from.email, queenofsandiego.com, and internal DNS)
- GAS (Google Apps Script): Three deployed projects (referral webhooks, booking confirmations, crew uniform blasts) running against Sheets and Stripe APIs
- Local source: /Users/cb/Documents/repos/sites/, the single source of truth, but only if kept in sync with S3 prod
The index.html file on queenofsandiego.com is 3,650 lines, containing inline CSS, three separate embedded JavaScript bundles (hero animations, Stripe checkout, booking state machine), and HTML templating for four cruise offerings. It is not minified or split — one file, one deploy.
The Hard Rules: D1–D8
To prevent this class of regression, eight rules are now documented in /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md and auto-load at the start of every session for that site:
D1: Pull S3 prod and diff before editing any HTML, CSS, or JS.
aws s3 cp s3://qos-prod-web/index.html ./index.html.prod-current
diff -u ./index.html.prod-current ./index.html
This catches stale local state immediately. If local is behind, pull the prod version and reapply local edits on top.
D2: One logical change per deploy; deploy to staging only, never staging and prod in the same command.
Staging-first enforces a checkpoint: verify the change on staging, then promote. Multi-target deploys hide failures.
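One way to make the single-target rule mechanical rather than a matter of discipline is a wrapper that accepts exactly one file and one bucket, and refuses anything that is not a staging bucket. This is a minimal sketch, assuming bucket names keep the *-staging-web convention from the list above; deploy_staging and the DRY_RUN switch are hypothetical names, not existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper for D2: deploy exactly one file to exactly one
# staging bucket. Bucket names are assumed to follow the *-staging-web
# convention from the bucket list above.

deploy_staging() {
  # Exactly two arguments: one file, one target. Anything else is refused.
  if [ "$#" -ne 2 ]; then
    echo "usage: deploy_staging FILE STAGING_BUCKET (one target only)" >&2
    return 2
  fi
  local file="$1" bucket="$2"
  case "$bucket" in
    *-staging-web) ;;
    *)
      echo "REFUSED: $bucket is not a staging bucket" >&2
      return 1
      ;;
  esac
  # DRY_RUN=1 prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "aws s3 cp $file s3://$bucket/$(basename "$file")"
  else
    aws s3 cp "$file" "s3://$bucket/$(basename "$file")"
  fi
}
```

Usage: deploy_staging index.html qos-staging-web. Passing a prod bucket, or a second target, fails before any aws call is made.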
D3: Before any aws s3 cp, snapshot prod and print a six-line proof block.
aws s3 cp s3://qos-prod-web/index.html ./snapshots/index.html.$(date +%s)
echo "DEPLOY PROOF: $USER, $(date), target: staging, file: index.html, size: $(wc -c < index.html) bytes"
shasum -a 256 index.html
grep -c 'hero-crossfade' index.html
grep -c 'stripe-checkout' index.html
The proof block must be pasted into chat before deploy. It forces a human checkpoint.
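The echo, hash, and grep lines above can be folded into a single function that always prints exactly six lines, which makes a pasted proof block easy to eyeball. This is a sketch under assumptions: proof_block and sha256_of are hypothetical names, the two checked patterns are the ones from the rule, and the prod snapshot (the aws s3 cp line) stays a separate step.

```shell
#!/usr/bin/env bash
# Sketch of a six-line proof-block printer for D3. The two grep patterns
# are the ones from the rule above; extend the list as features are added.

sha256_of() {
  # Prefer GNU coreutils sha256sum; fall back to shasum (ships with macOS).
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

proof_block() {
  local file="$1" target="$2"
  echo "DEPLOY PROOF: ${USER:-unknown}, $(date), target: $target"
  echo "file: $file"
  # tr strips the padding BSD wc adds on macOS.
  echo "size: $(wc -c < "$file" | tr -d ' ') bytes"
  echo "sha256: $(sha256_of "$file")"
  # grep -c prints 0 (not an error message) when a token is absent.
  echo "hero-crossfade: $(grep -c 'hero-crossfade' "$file")"
  echo "stripe-checkout: $(grep -c 'stripe-checkout' "$file")"
}
```

Usage: proof_block index.html staging. A count of 0 on either feature line is the signal to stop before deploying.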
D4: Maintain a feature-token registry.
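Every major feature (hero crossfade, Stripe embedded checkout, referral code logic) gets a unique comment token or CSS class. The /* ... */ form below is only valid inside a <style> or <script> block; a token placed in plain HTML markup needs <!-- ... --> instead: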
/* FEATURE-TOKEN: jada-hero-crossfade-v2 */
/* FEATURE-TOKEN: stripe-embedded-checkout-keely */
/* FEATURE-TOKEN: crew-uniform-cron-cascade */
Before deploying to prod, grep S3 prod for these tokens:
aws s3 cp s3://qos-prod-web/index.html /tmp/prod-check.html
grep 'FEATURE-TOKEN:' /tmp/prod-check.html | sort
If a token is missing from local, escalate to CB immediately — do not deploy.
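The grep above lists prod's tokens but leaves the comparison with local to the reader. A set difference makes the "missing from local" check explicit and gives a non-zero exit code a deploy script can act on. This is a sketch, assuming tokens always appear as "FEATURE-TOKEN: name" with names made of letters, digits, dots, underscores, and hyphens; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report FEATURE-TOKENs present in the prod copy but absent locally.
# Works for tokens inside /* ... */ or <!-- ... --> comments alike.

extract_tokens() {
  grep -o 'FEATURE-TOKEN: [A-Za-z0-9._-]*' "$1" | sort -u
}

missing_tokens() {
  local tmp_prod tmp_local
  tmp_prod="$(mktemp)"
  tmp_local="$(mktemp)"
  extract_tokens "$1" > "$tmp_prod"
  extract_tokens "$2" > "$tmp_local"
  # comm -23 keeps lines that appear only in the first (prod) list.
  comm -23 "$tmp_prod" "$tmp_local"
  rm -f "$tmp_prod" "$tmp_local"
}

check_tokens() {
  local missing
  missing="$(missing_tokens "$1" "$2")"
  if [ -n "$missing" ]; then
    echo "MISSING FROM LOCAL, do not deploy; escalate to CB:" >&2
    echo "$missing" >&2
    return 1
  fi
  echo "All prod feature tokens present locally."
}
```

Usage, after the download above: check_tokens /tmp/prod-check.html index.html.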
D5: Obey your own prior session-summary warnings.
Every session ends with a summary of what was touched and what risks remain (e.g., "stale local files"). If the next session inherits that warning, address it explicitly before editing.
D6: S3 versioning is off; treat snapshots as the only backup.
Before overwriting any multi-KB file, copy it to ./snapshots/filename.$(date +%s) locally and commit. No second chances.
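The snapshot step can be wrapped so the timestamped copy and the commit always happen together. This is a minimal sketch, assuming the default ./snapshots directory sits inside the site's git repo; snapshot is a hypothetical helper name, and the commit is skipped when the directory is not in a git work tree.

```shell
#!/usr/bin/env bash
# Sketch of D6: keep a timestamped copy of a file before it is overwritten.
# The default ./snapshots directory and the git commit step are assumptions
# about repo layout.

snapshot() {
  local file="$1" dir="${2:-./snapshots}" dest
  mkdir -p "$dir" || return 1
  # Epoch-second suffix, matching filename.$(date +%s) in the rule above.
  # Two snapshots of the same file within one second would collide.
  dest="$dir/$(basename "$file").$(date +%s)"
  cp -p "$file" "$dest" || return 1
  # Commit the snapshot only if the directory is inside a git work tree.
  if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    git -C "$dir" add "$(basename "$dest")" &&
      git -C "$dir" commit -q -m "snapshot: $(basename "$dest")" >/dev/null
  fi
  echo "$dest"
}
```

Since the file worth preserving is the live prod object, the natural sequence is to download it first (for example with the D1 aws s3 cp command) and then run snapshot ./index.html.prod-current.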
D7: Deploy only to staging first; get CB sign-off before prod promote.
Staging URL: https://staging.queenofsandiego.com. Test the feature, verify no regressions, share a link with CB. Only then run the prod promote command (which is a separate, single-target aws s3 cp to prod with CloudFront invalidation).
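The prod promote described above is two AWS calls: a single-target aws s3 cp, then aws cloudfront create-invalidation for the changed path. Below is a sketch of that step with the sign-off made explicit. CB_SIGNOFF and PROD_DISTRIBUTION_ID are hypothetical environment variables (the real distribution ID is not given in this post), and promote_prod is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch of the D7 prod promote: one file, one prod bucket, then a
# CloudFront invalidation. CB_SIGNOFF and PROD_DISTRIBUTION_ID are
# hypothetical environment variables, not part of any AWS tooling.

promote_prod() {
  if [ "$#" -ne 2 ]; then
    echo "usage: promote_prod FILE PROD_BUCKET (one target only)" >&2
    return 2
  fi
  local file="$1" bucket="$2" key run=""
  if [ "${CB_SIGNOFF:-}" != "yes" ]; then
    echo "REFUSED: no CB sign-off on staging" >&2
    return 1
  fi
  if [ -z "${PROD_DISTRIBUTION_ID:-}" ]; then
    echo "REFUSED: PROD_DISTRIBUTION_ID is not set" >&2
    return 1
  fi
  key="$(basename "$file")"
  # DRY_RUN=1 prints both commands instead of running them.
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  # The invalidation runs only if the upload succeeded.
  $run aws s3 cp "$file" "s3://$bucket/$key" &&
    $run aws cloudfront create-invalidation \
      --distribution-id "$PROD_DISTRIBUTION_ID" --paths "/$key"
}
```

Invalidating only the uploaded path keeps the promote as narrow as the upload itself; a wildcard path would also work but touches more of the cache than one changed file requires.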
D8: If S3 prod is ahead of local, escalate to CB. Never overwrite forward progress.
A diff that shows S3 has lines local doesn't means another human pushed something you don't have. Stop, pull, merge, then proceed.
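With prod as the first argument to diff, every line that exists only in prod is prefixed with "<", so counting those lines gives a direct "is prod ahead?" check. This is a sketch; the function names are hypothetical. It flags any prod-only line, including lines that were deleted locally on purpose. That is deliberate, since an intentional deletion should still be confirmed before it overwrites prod.

```shell
#!/usr/bin/env bash
# Sketch of the D8 check: detect lines that exist in the prod copy but not
# in the local file. Any such line means prod has changes local lacks.

count_prod_only() {
  # diff prefixes lines found only in the first file (prod) with "<".
  diff "$1" "$2" | grep -c '^<'
}

check_prod_ahead() {
  local n
  n="$(count_prod_only "$1" "$2")"
  if [ "$n" -gt 0 ]; then
    echo "S3 prod has $n line(s) that local lacks; stop and escalate to CB." >&2
    diff "$1" "$2" | grep '^<' >&2
    return 1
  fi
  echo "No prod-only lines found."
}
```

Usage: check_prod_ahead ./index.html.prod-current ./index.html, using the file pulled in D1.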
The Cost of This Incident
Regression impact: 48 hours lost for Keely's referral booking flow, three regressions in prod, and a prod rollback. The hard rules add about one extra minute per deploy (mostly producing and pasting the six-line proof block), and they are designed to catch this class of error before deploy rather than in production.
What's Next
These rules are now in the QOS CLAUDE.md and referenced from the top-level CLAUDE.md for all sites. The next session will inherit them. Additionally:
- Enable S3 versioning on all four web buckets (prod + staging per domain)
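Versioning is a per-bucket setting applied with aws s3api put-bucket-versioning. Below is a sketch that loops over the buckets passed to it; enable_versioning and DRY_RUN are hypothetical names, and the bucket names are the four from the infrastructure list.

```shell
#!/usr/bin/env bash
# Sketch for the versioning follow-up. Pass bucket names as arguments;
# DRY_RUN=1 prints each command instead of executing it.

enable_versioning() {
  local run="" bucket
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  for bucket in "$@"; do
    $run aws s3api put-bucket-versioning --bucket "$bucket" \
      --versioning-configuration Status=Enabled || return 1
  done
}
```

Usage: enable_versioning qos-prod-web qos-staging-web sailjada-prod-web sailjada-staging-web, then confirm each bucket with aws s3api get-bucket-versioning --bucket qos-prod-web. Once this is on, an overwrite leaves the previous object version recoverable instead of gone.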