
Preventing S3 Deployment Regressions: A Case Study in Stale Local State and the Hard Rules That Fix It

Last week, a deployment session accidentally broke three working features on queenofsandiego.com by pushing a stale local index.html over a newer production version in S3. The hero crossfade animation (JADA → BOOK NOW) and the Stripe embedded checkout flow vanished, and a "For Ranch & Coast readers..." headline that had been deliberately removed came back. This post documents what went wrong, why it happened, and the hard rules we've now encoded to prevent it.

The Incident: What Happened

The session deployed to both S3 staging and S3 production in a single command, violating the staging-first principle. It used a local copy of index.html that was several commits behind what was already live in the production S3 bucket. The deployment tool (aws s3 cp with --recursive) overwrote the newer remote file with the older local one, erasing three independent features.

Root cause: The agent ignored its own prior session summary, which warned that local files might be stale relative to S3. No diff was performed before the push. No snapshot of production was taken beforehand. The staging-only rule exists in the codebase but was not checked before executing the deploy.

Technical Details: The Deployment Flow

Our deployment pipeline for queenofsandiego.com follows this sequence:

  • Local dev: Edit files in /Users/cb/Documents/repos/sites/queenofsandiego.com/
  • Staging push: aws s3 cp --recursive ./path-to-built-files s3://staging.queenofsandiego.com/ --profile qos
  • CloudFront invalidation (staging): Invalidate staging-dist-id after push
  • Human review: Visit staging.queenofsandiego.com, verify no regressions
  • Production push: Only after staging sign-off, aws s3 cp --recursive ./path-to-built-files s3://queenofsandiego.com/ --profile qos
  • CloudFront invalidation (prod): Invalidate production distribution ID

The incident broke this sequence in two places. Between the local edit and the staging push, no pre-deployment diff was run and no snapshot of production was taken. And the human-review step never happened: staging and prod were pushed in the same operation, leaving no opportunity for review in between.
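The sequence above can be sketched as a small planner that emits the aws CLI commands for exactly one target per call, so a single run can never push staging and production together. This is a sketch under assumptions: the target names `staging`/`prod`, the `PROD_DIST_ID` placeholder, and invalidating `/*` are illustrative and not taken from the real pipeline; `staging-dist-id` is the placeholder the pipeline already uses.

```python
"""Sketch: plan a single-target deploy (upload, then CloudFront invalidation).

Assumptions: target names "staging"/"prod", the PROD_DIST_ID placeholder,
and invalidating "/*" are illustrative, not taken from the real pipeline.
"""
import shlex
from typing import Dict, List

BUCKETS: Dict[str, str] = {
    "staging": "staging.queenofsandiego.com",
    "prod": "queenofsandiego.com",
}
# Placeholder distribution IDs; replace with the real CloudFront IDs.
DISTRIBUTIONS: Dict[str, str] = {
    "staging": "staging-dist-id",
    "prod": "PROD_DIST_ID",  # hypothetical placeholder
}
PROFILE = "qos"


def plan_deploy(target: str, build_dir: str) -> List[List[str]]:
    """Return argv lists for one target only: upload, then invalidate.

    Taking a single target (not a list) means one call can never touch
    both staging and production.
    """
    if target not in BUCKETS:
        raise ValueError(f"unknown target {target!r}; expected 'staging' or 'prod'")
    upload = [
        "aws", "s3", "cp", "--recursive", build_dir,
        f"s3://{BUCKETS[target]}/", "--profile", PROFILE,
    ]
    invalidate = [
        "aws", "cloudfront", "create-invalidation",
        "--distribution-id", DISTRIBUTIONS[target],
        "--paths", "/*",
        "--profile", PROFILE,
    ]
    return [upload, invalidate]


if __name__ == "__main__":
    # Print the plan for human review. shlex.join quotes "/*" so the line is
    # safe to paste into a shell; pass the argv lists to subprocess.run to execute.
    for argv in plan_deploy("staging", "./path-to-built-files"):
        print(shlex.join(argv))
```

Production is planned with a separate `plan_deploy("prod", ...)` call, made only after someone has reviewed staging.queenofsandiego.com.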

Infrastructure: S3, CloudFront, and State

Our infrastructure uses:

  • S3 buckets: queenofsandiego.com (production), staging.queenofsandiego.com (staging). Both are private; traffic flows through CloudFront only.
  • CloudFront distributions: One for each bucket. Cache behavior is TTL 300 seconds for HTML, longer for assets.
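  • Route53: DNS records point queenofsandiego.com and staging.queenofsandiego.com to their respective CloudFront distribution domain names. The apex (queenofsandiego.com) must use a Route53 alias record rather than a CNAME, since DNS does not allow a CNAME at the zone apex.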
  • No S3 versioning enabled. This is the critical gap: we overwrite objects in place, so old versions are unrecoverable without manual backups.
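Whether that gap still exists can be confirmed from the CLI: `aws s3api get-bucket-versioning` reports a `Status` of `Enabled` or `Suspended`, and no `Status` at all for a bucket that has never had versioning turned on. A minimal sketch, assuming the aws CLI and the `qos` profile are available; the helper names are hypothetical.

```python
"""Sketch: check whether an S3 bucket keeps old object versions on overwrite.

Assumptions: the aws CLI is installed and the "qos" profile has
s3:GetBucketVersioning permission. Function names are hypothetical.
"""
import json
import subprocess


def fetch_versioning_output(bucket: str, profile: str = "qos") -> str:
    """Run `aws s3api get-bucket-versioning` and return its raw JSON text."""
    result = subprocess.run(
        ["aws", "s3api", "get-bucket-versioning", "--bucket", bucket, "--profile", profile],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def versioning_status(cli_output: str) -> str:
    """Map the CLI output to "Enabled", "Suspended", or "Never enabled"."""
    text = cli_output.strip()
    data = json.loads(text) if text else {}
    # A bucket that never had versioning enabled has no Status field.
    return data.get("Status", "Never enabled")


def overwrites_recoverable(status: str) -> bool:
    """Treat only Enabled as safe: in a Suspended bucket, new writes get the
    null version ID and replace any existing null version."""
    return status == "Enabled"
```

For example, `versioning_status(fetch_versioning_output("queenofsandiego.com"))` returning "Never enabled" confirms that an overwrite cannot be undone from S3 alone.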

The stale-local problem arises when a developer pulls the repo, edits a file locally, and then lets weeks pass before deploying. Meanwhile, another session (or a manual push) has updated S3 directly. A naive aws s3 cp --recursive then clobbers the newer remote file with the older local one, because cp uploads unconditionally without comparing timestamps or contents.
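One cheap way to notice that local and S3 have diverged, before anything is copied, is to compare the object's ETag with a local MD5. This holds only for objects uploaded in a single part without SSE-KMS or SSE-C encryption; multipart ETags contain a `-` and are not a plain MD5, so the sketch reports "unknown" (`None`) for them. It tells you only *whether* the files differ, not which one is newer. The helper names are hypothetical, and `aws s3api head-object` is assumed to run under the `qos` profile.

```python
"""Sketch: detect local/S3 divergence by comparing an MD5 with the S3 ETag.

Assumption: the object was uploaded in a single part without SSE-KMS/SSE-C,
so its ETag is the hex MD5 of the content. Helper names are hypothetical.
"""
import hashlib
import json
import subprocess
from typing import Optional


def fetch_etag(bucket: str, key: str, profile: str = "qos") -> str:
    """Return the ETag reported by `aws s3api head-object` (no download)."""
    result = subprocess.run(
        ["aws", "s3api", "head-object", "--bucket", bucket, "--key", key, "--profile", profile],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["ETag"]


def local_matches_remote(local_bytes: bytes, etag: str) -> Optional[bool]:
    """True if identical, False if different, None if the ETag is not a plain MD5."""
    etag = etag.strip('"')  # S3 wraps ETags in double quotes
    if "-" in etag:
        return None  # multipart upload: ETag is not the MD5 of the whole object
    return hashlib.md5(local_bytes).hexdigest() == etag
```

A `False` result is the cue for D1 and D8 below: pull, diff, and decide before any upload.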

The Hard Rules: Preventing Regressions

We've now encoded eight hard rules into /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md and a condensed summary into the top-level CLAUDE.md for cross-site awareness:

  • D1 — Pull and diff before edit: Before modifying any file destined for S3, run aws s3 cp s3://queenofsandiego.com/index.html ./index.html.prod --profile qos and diff -u index.html.prod index.html locally. Document the result in the session.
  • D2 — Staging only, single target: Never deploy to both staging and production in one command. Always stage first, always in isolation: aws s3 cp ./index.html s3://staging.queenofsandiego.com/index.html --profile qos
  • D3 — One logical change per deploy: If editing the hero fade, deploy only the hero fade. Don't batch unrelated fixes. This isolates regression scope.
  • D4 — Obey prior session warnings: If a prior session summary says "local files may be stale," treat it as blocking. Re-pull and re-verify before proceeding.
  • D5 — Snapshot production before overwriting: aws s3 cp s3://queenofsandiego.com/index.html ./backups/index.html.$(date +%s) --profile qos before any local-to-S3 cp.
  • D6 — Six-line proof block: Before executing any deployment, print a block showing: old hash, new hash, S3 bucket name, file path, timestamp, and staging/prod target. Require explicit human confirmation in chat.
  • D7 — Feature-token registry: Maintain a registry (grep-able in code comments) of major features and their unique CSS class or ID. Before a prod push, grep the current S3 version for these tokens and confirm that every token found there also appears in the file about to be uploaded, so nothing vanishes. Example: /* FEATURE_TOKEN: jada-hero-crossfade-1 */
  • D8 — Escalate to CB if S3 is ahead: If S3 has a newer version than local, pause and ask CB whether to pull-and-rebase or proceed with a merge strategy. Never overwrite without decision.
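Rules D1 and D6 can be sketched together: fetch the current S3 object, show a unified diff against the local file, and build the six-line proof block for a human to confirm. Assumptions: SHA-256 as the hash (the rule does not name one), UTC ISO-8601 timestamps, `aws s3 cp <s3-uri> -` streaming the object to stdout, and the hypothetical helper names.

```python
"""Sketch of D1 (pull and diff) and D6 (six-line proof block).

Assumptions: SHA-256 hashes, UTC timestamps, and these helper names are
illustrative; the rules themselves do not specify them.
"""
import difflib
import hashlib
import subprocess
from datetime import datetime, timezone
from typing import Optional


def fetch_s3_object(bucket: str, key: str, profile: str = "qos") -> bytes:
    """Download an object into memory; "-" makes `aws s3 cp` write to stdout."""
    result = subprocess.run(
        ["aws", "s3", "cp", f"s3://{bucket}/{key}", "-", "--profile", profile],
        capture_output=True, check=True,
    )
    return result.stdout


def unified_diff_text(prod_text: str, local_text: str, name: str = "index.html") -> str:
    """D1: unified diff of production (old) against local (new); empty if identical."""
    return "".join(difflib.unified_diff(
        prod_text.splitlines(keepends=True),
        local_text.splitlines(keepends=True),
        fromfile=f"{name}.prod",
        tofile=name,
    ))


def proof_block(old: bytes, new: bytes, bucket: str, key: str, target: str,
                now: Optional[datetime] = None) -> str:
    """D6: the six facts a human must confirm before the upload runs."""
    now = now or datetime.now(timezone.utc)
    return "\n".join([
        f"old sha256: {hashlib.sha256(old).hexdigest()}",
        f"new sha256: {hashlib.sha256(new).hexdigest()}",
        f"bucket:     {bucket}",
        f"file path:  {key}",
        f"timestamp:  {now.isoformat()}",
        f"target:     {target}",
    ])
```

In use, `old` would be the object currently in the target bucket (fetched with `fetch_s3_object`) and `new` the local file's bytes; the printed block is what gets pasted into chat for explicit confirmation.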

Key Decisions and Rationale

Why not enable S3 versioning? Cost and complexity. With versioning, every overwrite creates a new object version; over weeks, the bill grows. Our snapshot approach (D5) gives us recovery without ongoing cost, provided we catch the regression quickly.
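The snapshot approach comes down to two commands: copy the live object to a timestamped local file before any upload, and copy the newest snapshot back if a regression slips through. A sketch that builds those commands, mirroring D5's `./backups/index.html.$(date +%s)` naming; the helper names are hypothetical and the restore helper is an assumed recovery path.

```python
"""Sketch of D5: timestamped production snapshots and restoring the newest one.

Assumptions: snapshots live in ./backups as <name>.<unix-epoch>, matching
D5's `date +%s` naming; helper names are hypothetical.
"""
import os
import time
from pathlib import Path
from typing import List, Optional


def snapshot_argv(bucket: str, key: str, backup_dir: str = "./backups",
                  profile: str = "qos", epoch: Optional[int] = None) -> List[str]:
    """Command that copies the live object to <backup_dir>/<name>.<epoch>."""
    epoch = int(time.time()) if epoch is None else epoch
    dest = os.path.join(backup_dir, f"{os.path.basename(key)}.{epoch}")
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest, "--profile", profile]


def latest_snapshot(backup_dir: str, name: str) -> Optional[Path]:
    """Newest <name>.<epoch> file in backup_dir, ignoring non-numeric suffixes."""
    found = []
    for path in Path(backup_dir).glob(f"{name}.*"):
        suffix = path.name[len(name) + 1:]
        if suffix.isdigit():
            found.append((int(suffix), path))
    return max(found, key=lambda item: item[0])[1] if found else None


def restore_argv(snapshot: Path, bucket: str, key: str, profile: str = "qos") -> List[str]:
    """Command that puts a snapshot back in place of the live object."""
    return ["aws", "s3", "cp", str(snapshot), f"s3://{bucket}/{key}", "--profile", profile]
```

Because the epoch suffix sorts numerically, "newest" is unambiguous, and files such as index.html.prod from D1 are skipped.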

Why require staging review before prod? Staging is cheap to verify and is the only environment where a regression is invisible to users. Catching regressions here prevents customer-facing outages.
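One way to make the staging review binding is to record the hash of exactly what was reviewed on staging and refuse a production push of anything else. This is a sketch of one possible enforcement: `StagingSignoffs` is a hypothetical in-memory record keyed by SHA-256, and a real setup would need to persist it somewhere the deploy step can read.

```python
"""Sketch: allow a prod push only for content a human signed off on staging.

Assumptions: StagingSignoffs is a hypothetical in-memory record keyed by
SHA-256; persisting it between sessions is left out.
"""
import hashlib
from dataclasses import dataclass, field
from typing import Set


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


@dataclass
class StagingSignoffs:
    approved: Set[str] = field(default_factory=set)

    def approve(self, content: bytes) -> str:
        """Record that a human reviewed this exact content on staging."""
        digest = sha256_hex(content)
        self.approved.add(digest)
        return digest

    def check_prod_candidate(self, content: bytes) -> None:
        """Raise unless this exact content was approved on staging."""
        digest = sha256_hex(content)
        if digest not in self.approved:
            raise PermissionError(
                f"content {digest[:12]} was never signed off on staging; stage it first"
            )
```

Keying on content rather than on a "staging looked fine" flag means a last-minute edit after review forces another trip through staging.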

Why feature tokens? Visual inspection of staging works, but it is fallible, especially for animation or interaction features. Grepping production HTML against a known token set is deterministic and can be scripted.
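A scripted version of that check, using the `FEATURE_TOKEN:` comment format from D7: collect the tokens in the current production HTML and report any that are missing from the file about to be uploaded. A sketch; the regex assumes token names use only letters, digits, hyphens, and underscores.

```python
"""Sketch of D7: report feature tokens that would vanish with this upload.

Assumption: tokens are written as `FEATURE_TOKEN: <name>` (as in D7's
example) with names made of letters, digits, hyphens, and underscores.
"""
import re
from typing import Set

TOKEN_RE = re.compile(r"FEATURE_TOKEN:\s*([A-Za-z0-9_-]+)")


def feature_tokens(html: str) -> Set[str]:
    """All FEATURE_TOKEN names that appear anywhere in the HTML."""
    return set(TOKEN_RE.findall(html))


def missing_tokens(current_prod_html: str, candidate_html: str) -> Set[str]:
    """Tokens live in production that the candidate file no longer contains."""
    return feature_tokens(current_prod_html) - feature_tokens(candidate_html)
```

If production contains `jada-hero-crossfade-1` and the local file does not, `missing_tokens` returns `{'jada-hero-crossfade-1'}`; any non-empty result should block the push.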

Why escalate to CB instead of auto-merging? Merging stale local and new remote requires understanding intent. CB owns the decision of which version is canonical; the agent should not guess.
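The escalation rule can be made mechanical with a three-way comparison: the hash of the S3 object when it was last pulled (the base), the local file, and S3 now. If S3 no longer matches the base, someone else changed it and the decision goes to CB. A sketch; recording the base hash at pull time is an assumption the rules do not spell out, and the names are hypothetical.

```python
"""Sketch of D8: decide whether to proceed or escalate using a recorded base hash.

Assumption: the S3 object's hash is recorded when it is pulled (D1); these
names are hypothetical.
"""
from enum import Enum
from typing import Optional


class Decision(Enum):
    NOTHING_TO_DEPLOY = "local and S3 are identical"
    PROCEED_TO_STAGING = "S3 unchanged since the last pull; local edits sit on top of it"
    ESCALATE_TO_CB = "S3 changed since the last pull (or base unknown); CB decides"


def decide(base_hash: Optional[str], local_hash: str, remote_hash: str) -> Decision:
    if local_hash == remote_hash:
        return Decision.NOTHING_TO_DEPLOY
    if base_hash is not None and remote_hash == base_hash:
        return Decision.PROCEED_TO_STAGING
    # S3 moved (or we never recorded where it was): never overwrite without a decision.
    return Decision.ESCALATE_TO_CB
```

Assuming the base hash had been recorded, the incident's situation (S3 several commits past the version the local file started from) would have returned `ESCALATE_TO_CB`.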

What's Next

These rules are now loaded automatically in every queenofsandiego.com session via the CLAUDE.md file. We're implementing a pre-deployment checklist as a markdown table in the session context so agents can cross-check