```html

Preventing CloudFront Stale-File Regressions: How a Missed S3 Diff Cost Three Features

Last session, a deployment to production wiped out three working features on queenofsandiego.com by uploading a stale local copy of index.html over a newer version already live in S3. The hero JADA → BOOK NOW crossfade, the Stripe embedded checkout flow, and the deliberate removal of a "Ranch & Coast readers" line were all undone by a single cp command. This post documents what went wrong, how we fixed it, and the hard rules we built to keep it from happening again.

The Failure: Stale Local File Beats Production

The core issue was straightforward but catastrophic: a developer (Sonnet 4.6) edited /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html locally without first pulling the current production version from S3 to compare. When they ran:

aws s3 cp index.html s3://queenofsandiego-prod/index.html

they uploaded a file that was hours out of date. Production S3 held a newer version with the hero fade, the Stripe checkout, and the Ranch & Coast line already removed. Their local copy had none of those changes, so the newer S3 version was overwritten.

Why did this happen?

  • No pre-deploy diff: They edited index.html without pulling S3's current version first and comparing it locally.
  • Staging + Prod in one command: They deployed to both queenofsandiego-staging and queenofsandiego-prod in the same script, skipping the "always test staging first" rule.
  • No S3 versioning: Versioning is not enabled on either bucket, so there is no built-in rollback. Once the stale file was uploaded, the previous prod version was gone.
  • Ignored own prior warnings: Their session summary from three hours prior explicitly flagged "risk of stale local files" — they were aware of the hazard and proceeded anyway.

Technical Root Cause: Missing Diff Workflow

The deployment workflow was missing a step. The correct sequence is:

  1. Pull current production file(s) from S3 into a temp directory.
  2. Diff your local changes against what's actually live.
  3. If the diff contains a change you didn't make, stop and ask before overwriting.
  4. Deploy to staging first, not production.
  5. Test staging, get approval, then promote to production.
  6. Print a six-line proof block showing the exact file, size, hash, and S3 path before executing any cp.
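
Steps 1 through 3 can be sketched as a small bash function. This is a minimal illustration, not the script behind this post: the name pull_and_diff is hypothetical, and it assumes the AWS CLI is installed and configured with read access to the bucket. It pulls the live object into a throwaway temp directory, prints a unified diff against the local copy, and signals through its exit status whether a human needs to review the result.

```bash
#!/usr/bin/env bash
# Hypothetical helper for workflow steps 1-3. Assumes the AWS CLI is
# installed and can read the bucket.
#
# Exit status: 0 = live and local are identical
#              1 = they differ (a human must review the diff)
#              2 = the download or the diff itself failed
pull_and_diff() {
  local bucket="$1" key="$2" local_file="$3"
  local tmpdir status=0
  tmpdir="$(mktemp -d)" || return 2

  # Step 1: pull the object that is live right now into a temp directory.
  if ! aws s3 cp "s3://${bucket}/${key}" "${tmpdir}/live" --only-show-errors; then
    rm -rf "$tmpdir"
    return 2
  fi

  # Step 2: diff live (left) against local (right). Lines starting with "-"
  # exist in production but not in your copy; uploading would erase them.
  diff -u "${tmpdir}/live" "$local_file" || status=$?

  rm -rf "$tmpdir"
  return "$status"
}

# Step 3, as used from a deploy script (not executed here):
#   pull_and_diff queenofsandiego-prod index.html ./index.html
#   case $? in
#     0) ;;                                        # nothing differs
#     1) read -r -p "Is every hunk above yours? [y/N] " ok
#        [ "$ok" = y ] || exit 1 ;;                # stop and ask
#     *) exit 1 ;;                                 # fetch or diff failed
#   esac
```

Putting the live copy on the left of the diff is a deliberate choice: every `-` line is something the upload would remove from production, which is exactly the class of change (the crossfade, the checkout) that was lost here.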

This workflow was documented but not enforced in practice. The developer knew about staging but didn't treat it as mandatory, and knew about stale files but didn't check for them.

Infrastructure: S3 + CloudFront Distribution

The production setup is:

  • S3 Buckets:
    • queenofsandiego-prod — production files, public-read ACL, no versioning
    • queenofsandiego-staging — staging files, public-read ACL, no versioning
  • CloudFront Distributions:
    • Production distribution ID (used for cache invalidation after deploy)
    • Staging distribution ID (lower TTL, used for pre-prod verification)
  • Route 53: Alias records route queenofsandiego.com and staging.queenofsandiego.com to their respective CloudFront distributions.

The lack of S3 versioning meant there was no rollback path once the bad file went live. CloudFront caching added inconsistency on top of that: viewers served from an edge cache still saw the working hero fade, while viewers whose requests missed the cache got the stale file. That split lasted 2–4 minutes, until the cached copies expired.
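
Both gaps map to standard AWS CLI calls. The sketch below assumes a configured AWS CLI with permission to change bucket settings and create invalidations; the function names are hypothetical, and the distribution ID is a placeholder because this post does not list the real ones.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes a configured AWS CLI with permission to
# change bucket settings and create CloudFront invalidations.

# Keep prior copies on overwrite: with versioning on, a cp creates a new
# current version and the old object survives as a noncurrent version.
enable_versioning() {
  local bucket="$1"
  aws s3api put-bucket-versioning \
    --bucket "$bucket" \
    --versioning-configuration Status=Enabled
}

# Tell CloudFront to drop cached copies of one path so every viewer
# fetches the new object instead of a mix of old and new.
invalidate_path() {
  local distribution_id="$1" path="$2"
  aws cloudfront create-invalidation \
    --distribution-id "$distribution_id" \
    --paths "$path"
}

# Example (not executed here; PROD_DISTRIBUTION_ID is a placeholder):
#   enable_versioning queenofsandiego-staging
#   enable_versioning queenofsandiego-prod
#   invalidate_path "$PROD_DISTRIBUTION_ID" /index.html
```

Once enabled, versioning can be suspended but a bucket can never return to the unversioned state, and noncurrent versions are billed as storage. Rolling back then means copying a noncurrent version back over the current one rather than restoring from a hand-made backup.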

The Fix: Eight Hard Rules for Deployment

We codified eight deployment rules into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md and created a condensed reference in the top-level CLAUDE.md for all sites. These rules load automatically at the start of every QOS engineering session:

  • D1 — Pull S3 and diff before editing: Always run aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.prod first, then diff index.html index.html.prod. If you didn't make a change that's in the diff, ask before proceeding.
  • D2 — Staging-first, single-target deploys: Never deploy to staging and prod in the same command. Deploy to staging first, print the proof block, wait for explicit approval, then deploy to prod alone.
  • D3 — One logical change per deploy: Each cp command targets one file or cohesive set (e.g., index.html + styles.css for a single feature). Don't batch unrelated changes.
  • D4 — Obey your own prior session summaries: If your prior session flagged a risk (e.g., "stale local files are a hazard"), treat it as an active constraint. Don't override it without asking CB.
  • D5 — Snapshot prod before overwriting: Before any cp, save the current S3 file to a dated backup file in a /releases directory. Example: releases/index.html.prod.2025-01-15T14-32Z. This is your manual rollback when S3 versioning is off.
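  • D6 — Proof block before every cp: Print a six-line proof block in chat showing: file name, local size, local hash (sha256sum or md5), S3 path, current prod size, and current prod hash (the ETag from aws s3api head-object). The ETag equals the object's MD5 only for single-part uploads without SSE-KMS, so compare it against an MD5, not a SHA-256. Do not execute cp until you've printed this.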
  • D7 — Feature-token registry: Maintain a simple text file in the repo root listing every "visible feature" deployed to prod in the last 90 days. Example: HERO_JADA_FADE:index.html:2025-01-14T10:22Z. Before deploying any index.html, grep the registry and S3's current file to confirm you're not erasing a feature token.
  • D8 — Escalate if S3 is ahead: If S3's current version is newer than your local file and you didn't write the diff, stop immediately and escalate to CB. Do not overwrite. Wait for direction.
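
Rules D5 and D6 can be sketched as a few bash functions. This is a minimal illustration under stated assumptions: the function names are hypothetical, the AWS CLI is assumed configured with read access, and the snapshot path follows the releases/index.html.prod.2025-01-15T14-32Z example from D5.

```bash
#!/usr/bin/env bash
# Sketch of rules D5 and D6. Function names are hypothetical; assumes a
# configured AWS CLI with read access to the bucket.

# MD5 of a local file: md5sum on Linux, md5 -q on macOS.
local_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"
  fi
}

# D5: copy the current prod object to releases/<name>.prod.<UTC time>
# and print the path of the snapshot.
snapshot_prod() {
  local bucket="$1" key="$2" dest
  dest="releases/$(basename "$key").prod.$(date -u +%Y-%m-%dT%H-%MZ)"
  mkdir -p releases || return 1
  aws s3 cp "s3://${bucket}/${key}" "$dest" --only-show-errors || return 1
  printf '%s\n' "$dest"
}

# D6: print the six-line proof block. Read it before deciding on cp.
proof_block() {
  local file="$1" bucket="$2" key="$3"
  local meta prod_size prod_etag
  # head-object fetches metadata only; the ETag arrives wrapped in double
  # quotes and equals the MD5 only for single-part uploads.
  meta="$(aws s3api head-object --bucket "$bucket" --key "$key" \
    --query '[ContentLength, ETag]' --output text)" || return 1
  read -r prod_size prod_etag <<< "$meta"
  printf 'file:       %s\n' "$file"
  printf 'local size: %s\n' "$(wc -c < "$file" | tr -d ' ')"
  printf 'local md5:  %s\n' "$(local_md5 "$file")"
  printf 's3 path:    s3://%s/%s\n' "$bucket" "$key"
  printf 'prod size:  %s\n' "$prod_size"
  printf 'prod etag:  %s\n' "${prod_etag//\"/}"
}

# Example order (not executed here):
#   snapshot_prod queenofsandiego-staging index.html
#   proof_block index.html queenofsandiego-staging index.html
```

Taking the snapshot first means the proof block can be checked against a file you already hold locally: if the snapshot's MD5 matches the printed prod ETag, you know exactly what the cp is about to replace.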

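The registry check in D7 can be sketched as a bash function. The post does not say how feature tokens are embedded in the deployed file, so this sketch assumes, hypothetically, that each token also appears verbatim in the file it protects (for example as an HTML comment). The registry filename is not given either, so it is passed in as an argument; FEATURE_REGISTRY.txt below is only a placeholder, and check_feature_tokens is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of rule D7. Assumes (hypothetically) that every registered token
# also appears verbatim in the file it protects, e.g. as an HTML comment
# such as <!-- HERO_JADA_FADE -->. Registry lines use TOKEN:file:timestamp.
#
# Prints each token registered for $target that is present in the live copy
# but missing from the local file, and returns 1 if there is any.
check_feature_tokens() {
  local registry="$1" target="$2" live_copy="$3" local_file="$4"
  local token file _rest missing=0
  # IFS=: splits on colons; _rest soaks up the timestamp, which has colons too.
  while IFS=: read -r token file _rest || [ -n "$token" ]; do
    [ -n "$token" ] && [ "$file" = "$target" ] || continue
    if grep -qF -- "$token" "$live_copy" && ! grep -qF -- "$token" "$local_file"; then
      printf 'would erase feature token: %s\n' "$token"
      missing=1
    fi
  done < "$registry"
  return "$missing"
}

# Example (not executed here; the registry filename is a placeholder):
#   check_feature_tokens FEATURE_REGISTRY.txt index.html index.html.prod index.html || exit 1
```

Checking only tokens that are present in the live copy keeps the check quiet for features that were intentionally retired from production, while still catching the failure mode in this post: a feature that is live in S3 but absent from the file about to replace it.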
Practical Example: Correct Deploy Sequence

# Step 1: Pull and diff
aws s3 cp s3://queenofsandiego-prod/index.html ./index.html.prod
diff index.html index.html.prod

# Step 2: If diff shows changes you didn't make, ask CB. Otherwise