Preventing S3 Deployment Regressions: A Case Study in State Management and Safe Deploy Workflows

Over the last three hours, a deployment incident on queenofsandiego.com revealed a critical gap in our local-to-S3 sync workflow: stale local files silently overwrote newer production state, reverting three changes (the hero image crossfade, Stripe embedded checkout, and the deliberate removal of a content line). This post documents the incident, the root causes, and the hard rules we've now encoded to prevent recurrence.

What Happened

A single cp command deployed a local index.html to both the staging/ and prod/ prefixes of the S3 bucket at once, wiping working features that existed only in the live S3 version. Git log inspection showed the local file was at least two commits behind the current prod state, yet the deployment went through with no safety checks or diff review.

Lost feature 1: Hero section JADA→BOOK NOW crossfade animation (CSS transitions working in prod, missing in local)
Lost feature 2: Stripe embedded checkout flow (live in prod S3, absent from local src)
Lost feature 3: Removal of "For Ranch & Coast readers..." hero line (intentionally deleted in prod, resurrected by stale local copy)
Violation: Single command deployed to both staging/ and prod/ prefixes in S3, bypassing the staging-first validation rule already documented but not enforced

Root Cause Analysis

The failure chain:

No pre-deploy S3 pull: The local working directory was never compared against current S3 state before overwriting
No diff review: A six-line proof block (file hash, byte count, feature token scan) was not printed and reviewed in chat before cp executed
No staging gate: Despite existing documentation, the deploy targeted both staging and prod in one command, eliminating the human review checkpoint
Ignored prior warnings: The summary from the prior session explicitly warned that local files were stale; this warning was not re-checked before deploying
No S3 versioning snapshot: Production files were overwritten without a timestamped backup, making recovery depend on git + manual re-deploy

Technical Details of the Fix

We've encoded eight hard rules (D1–D8) in the queenofsandiego.com CLAUDE.md project memory file, which is loaded automatically on every session:

D1: Pull and Diff Before Edit

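# Pull the live prod prefix (no --dryrun, so files are actually downloaded)
aws s3 sync s3://queenofsandiego-prod/prod/ ./s3-prod-snapshot/ --region us-west-2
# Compare the snapshot with the local source tree and keep a log of the differences
diff -r ./s3-prod-snapshot/ ./src/ | tee s3-diff.log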

Every session begins by pulling the current S3 prod state into a snapshot directory. Any diff is logged and reviewed before local edits begin.
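One caveat with piping diff into tee is that the pipeline reports tee's exit status, so a script cannot tell from it whether anything differed. The following is a minimal sketch of a D1 gate that keeps diff's exit status; the function names, the SNAPSHOT_DIR and SRC_DIR variables, and the use of --delete to mirror the prefix exactly are assumptions for illustration, not part of our existing tooling.

```bash
# Sketch of a D1 gate: refresh the snapshot, then stop if it differs from ./src/.
# SNAPSHOT_DIR, SRC_DIR, and both function names are illustrative assumptions.
SNAPSHOT_DIR="./s3-prod-snapshot"
SRC_DIR="./src"

# Download the current prod prefix (requires AWS credentials).
# --delete removes local snapshot files that no longer exist in S3.
pull_prod_snapshot() {
  aws s3 sync "s3://queenofsandiego-prod/prod/" "$SNAPSHOT_DIR/" \
    --region us-west-2 --delete
}

# Compare two directories, write the diff to a log file,
# and return non-zero when they differ.
diff_gate() {
  snapshot="$1"; src="$2"; log="${3:-s3-diff.log}"
  if diff -r "$snapshot" "$src" > "$log"; then
    echo "D1 OK: local matches S3 snapshot"
    return 0
  fi
  echo "D1 STOP: local and S3 differ; review $log before editing"
  return 1
}

# Typical use at the start of a session:
#   pull_prod_snapshot && diff_gate "$SNAPSHOT_DIR" "$SRC_DIR"
```

Because diff_gate returns non-zero on any difference, it can be chained with && in front of later steps so that an unreviewed diff blocks them.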

D2: Staging-First, Single-Target Deploys

All aws s3 cp and aws s3 sync commands must name exactly one target: either s3://queenofsandiego-prod/staging/ or s3://queenofsandiego-prod/prod/, never both. Staging always goes first; a change is promoted to prod only after CB reviews staging in a browser.
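The single-target rule could be checked mechanically before any write. This is a rough sketch of such a guard; check_single_target is a hypothetical helper name, and it only validates the destination string, it does not perform the deploy.

```bash
# Hypothetical D2 guard: accept exactly one destination, and only
# under the staging/ or prod/ prefix of the site bucket.
check_single_target() {
  if [ "$#" -ne 1 ]; then
    echo "D2 STOP: expected exactly one target, got $#" >&2
    return 1
  fi
  case "$1" in
    s3://queenofsandiego-prod/staging/*) echo "staging" ;;
    s3://queenofsandiego-prod/prod/*)    echo "prod" ;;
    *) echo "D2 STOP: unrecognized target $1" >&2; return 1 ;;
  esac
}

# Example:
#   check_single_target "s3://queenofsandiego-prod/staging/index.html" \
#     && aws s3 cp ./src/index.html "s3://queenofsandiego-prod/staging/index.html" --region us-west-2
```

The guard prints which environment the target belongs to, so the operator sees "staging" or "prod" spelled out before the copy runs.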

D3: One Logical Change Per Deploy

Files are deployed individually or as a tightly scoped group (e.g., "index.html + hero.css for JADA crossfade"). Unrelated file changes are staged separately. This keeps rollback granular and makes git bisect practical if a regression is discovered post-deploy.

D4: Obey Your Own Prior Session-Summary Warnings

Before any S3 write, re-read the prior session's summary comments. If it flags "local files are stale" or "S3 is ahead of git," escalate to D8 (ask CB) rather than proceeding.

D5: Snapshot Production Before Overwrite

When deploying to prod, generate a dated backup in a backups/ S3 prefix:

aws s3 cp s3://queenofsandiego-prod/prod/index.html \
  s3://queenofsandiego-prod/backups/index.html.$(date +%Y%m%d-%H%M%S) \
  --region us-west-2

This gives us a recovery path even without S3 versioning (which we don't currently have enabled).
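For completeness, here is a sketch of what the recovery side might look like with the naming scheme above. The helper names backup_uri and restore_from_backup are hypothetical, and the timestamp in the usage comment is a placeholder rather than a real backup.

```bash
BUCKET="s3://queenofsandiego-prod"

# Build the backup URI for a file name and timestamp,
# following the backups/<file>.<YYYYmmdd-HHMMSS> pattern.
backup_uri() {
  printf '%s/backups/%s.%s\n' "$BUCKET" "$1" "$2"
}

# Copy a chosen backup over the live object (requires AWS credentials).
restore_from_backup() {
  aws s3 cp "$(backup_uri "$1" "$2")" "$BUCKET/prod/$1" --region us-west-2
}

# List the available snapshots first:
#   aws s3 ls "$BUCKET/backups/" --region us-west-2
# Then restore one (placeholder timestamp):
#   restore_from_backup index.html 20240101-120000
```

A restore is itself a prod write, so it should go through the same single-target and proof-block checks as any other deploy.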

D6: Six-Line Proof Block Before Any cp

Before executing cp or sync, print and wait for acknowledgment:

md5sum ./src/index.html
wc -c ./src/index.html
grep -c "BOOK NOW" ./src/index.html
grep -E "Stripe|embedded|checkout" ./src/index.html
grep -E "Ranch (&|&amp;) Coast" ./src/index.html || echo "NOT FOUND (correct)"
echo "TARGET: s3://queenofsandiego-prod/staging/index.html"

This is a manual human gate. The operator must see the token count and feature presence before proceeding.
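The proof block and the acknowledgment step could be wrapped in two small functions so they are always run together. This is a sketch only: proof_block and confirm_deploy are hypothetical names, the checks print match counts rather than the matching lines, and md5sum is assumed to be available as in the commands above.

```bash
# Print a six-line proof block for one file and one target.
proof_block() {
  file="$1"; target="$2"
  md5sum "$file"
  wc -c "$file"
  # grep -c prints 0 when nothing matches, which is what we want to see here.
  echo "BOOK NOW lines: $(grep -c 'BOOK NOW' "$file")"
  echo "Stripe/checkout lines: $(grep -cE 'Stripe|embedded|checkout' "$file")"
  echo "Ranch & Coast lines: $(grep -c 'Ranch & Coast' "$file")"
  echo "TARGET: $target"
}

# Wait for an explicit "yes" on stdin; anything else returns non-zero.
confirm_deploy() {
  printf 'Type "yes" to deploy: '
  read -r answer
  [ "$answer" = "yes" ]
}

# Example:
#   t="s3://queenofsandiego-prod/staging/index.html"
#   proof_block ./src/index.html "$t" && confirm_deploy \
#     && aws s3 cp ./src/index.html "$t" --region us-west-2
```

Chaining with && means the copy only runs after the block has been printed and the operator has typed "yes".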

D7: Feature-Token Registry

We maintain a FEATURE_TOKENS.md in the repo listing every deployed feature by grep-searchable tokens (e.g., "hero-jada-fade", "stripe-embedded-checkout", "no-ranch-coast-line"). Before pushing to staging or prod, we scan the local file and prod S3 against these tokens to ensure no accidental removal.
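A token scan along these lines might look like the sketch below. It assumes a plain token list with one grep-searchable token per line and "#" for comments; the actual layout of FEATURE_TOKENS.md may differ, and missing_tokens is a hypothetical helper name.

```bash
# Report tokens that are present in the prod snapshot but missing
# from the local file. Returns non-zero if any token was lost.
# Assumed token file format: one token per line, "#" starts a comment.
missing_tokens() {
  tokens="$1"; prod_file="$2"; local_file="$3"
  status=0
  while IFS= read -r token || [ -n "$token" ]; do
    case "$token" in ''|'#'*) continue ;; esac
    # -F treats the token as a fixed string, not a regex.
    if grep -qF -- "$token" "$prod_file" && ! grep -qF -- "$token" "$local_file"; then
      echo "MISSING in local: $token"
      status=1
    fi
  done < "$tokens"
  return "$status"
}

# Example (tokens.txt is a hypothetical extract of the registry):
#   missing_tokens tokens.txt ./s3-prod-snapshot/index.html ./src/index.html
```

This only catches removals of tokens that prod currently has. A rule such as "this line must stay absent" would need a separate check for tokens that must not appear.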

D8: Escalate to CB if S3 Is Ahead of Local

If diff -r shows S3 prod contains features not in local git, do not overwrite. Instead, message CB with the exact diff and ask:

Should we pull S3 changes into git first?
Is the local version intentionally stripped-down?
Should we cherry-pick the S3 features into local before deploying?

Infrastructure & Bucket Configuration

S3 Bucket: queenofsandiego-prod (us-west-2)
CloudFront Distribution: E1ABC2DEF3GHI (points to queenofsandiego-prod.s3.us-west-2.amazonaws.com)
Prefixes:
- prod/ — live website
- staging/ — staging environment (same CF dist, different origin prefix)
- backups/ — timestamped snapshots
Route53 Zone: queenofsandiego.com (hosted zone ID Z1ABC2XYZ)
CNAME: www