```html

Preventing S3 Deployment Regressions: A Case Study in Stale Local Files and Feature Loss

This post documents a critical production incident on queenofsandiego.com and the hardened deployment ruleset we built to prevent a repeat. The incident: a stale local index.html was deployed to S3, silently reverting three live changes (the hero image crossfade, the Stripe embedded checkout, and the removal of a deprecated marketing line, which reappeared). The root cause was skipping a diff against the live S3 copy before deploying. The fix is a formal pre-deploy validation protocol with hard rules and feature-token tracking.

What Happened

During a routine deployment to s3://queenofsandiego.com/, the agent uploaded a local copy of /Users/cb/Documents/repos/sites/queenofsandiego.com/index.html without first pulling the current S3 version and diffing it against the local file. The S3 file was newer and contained three changes absent from the stale local copy:

  • A JADA → BOOK NOW hero image crossfade (JavaScript-driven animation linking to the Jada Sailing booking flow)
  • Stripe embedded checkout integration in the booking modal
  • Removal of the "For Ranch & Coast readers..." marketing hero line (previously deprecated)

The deployment was also executed as a single command that targeted both staging and production at once, bypassing the staging-first review that should have caught the stale file before it reached prod.

Technical Details: The Deployment Flow

The affected resources:

  • S3 Bucket: queenofsandiego.com (CloudFront origin)
  • CloudFront Distribution: d-queenofsandiego (an invalidation is required before edge caches serve the new file)
  • File Path: /index.html (root landing page)
  • Staging Origin: staging.queenofsandiego.com (separate S3 bucket for pre-prod validation)

The correct deployment sequence should have been:

# Step 1: Pull current S3 version and save as reference
aws s3 cp s3://queenofsandiego.com/index.html ./index.html.s3-current

# Step 2: Diff local against S3 current
diff -u ./index.html.s3-current ./index.html | head -50

# Step 3: If diff is non-empty and unexpected, STOP and escalate
# If diff is expected, proceed to staging only

# Step 4: Deploy to staging bucket only
aws s3 cp ./index.html s3://staging.queenofsandiego.com/index.html

# Step 5: Invalidate staging CloudFront (or load staging.queenofsandiego.com directly)
aws cloudfront create-invalidation \
  --distribution-id d-staging-queenofsandiego \
  --paths "/index.html"

# Step 6: CB reviews staging, confirms feature state

# Step 7: Only then, deploy to prod and invalidate
aws s3 cp ./index.html s3://queenofsandiego.com/index.html
aws cloudfront create-invalidation \
  --distribution-id d-queenofsandiego \
  --paths "/index.html"

Steps 1–2 were skipped entirely, and Steps 4–7 were collapsed into a single operation, so the Step 6 review never happened.
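Steps 1–3 can be sketched as a small shell guard. `check_drift` is a hypothetical function name, not an existing tool; it assumes the S3 copy was already pulled as in Step 1 and treats any difference as a stop signal for a human to review.

```shell
# Hypothetical guard for Steps 2-3: compare the pulled S3 copy with the local file.
# Returns 0 if identical, 1 if they differ (diff shown for review), 2 if a file is missing.
check_drift() {
  local s3_copy="$1" local_file="$2"
  if [ ! -f "$s3_copy" ] || [ ! -f "$local_file" ]; then
    echo "check_drift: missing input file" >&2
    return 2
  fi
  if cmp -s "$s3_copy" "$local_file"; then
    echo "no drift: local matches S3 current"
    return 0
  fi
  # Same preview as Step 2: first 50 lines of the unified diff.
  diff -u "$s3_copy" "$local_file" | head -50
  echo "DRIFT: review the diff above; continue only if every change is expected" >&2
  return 1
}
```

Called as `check_drift ./index.html.s3-current ./index.html || exit 1` right after Step 1, it halts a deploy script whenever the files differ and leaves the "is this diff expected?" decision to a person.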

Root Cause Analysis

The agent's prior session summary had explicitly warned: "Snapshot prod before overwriting; no S3 versioning on this bucket means stale local copies can wipe live features silently." The agent had written this warning itself but did not consult it before the risky deployment. Additionally, the agent had a standing rule in memory (loaded from /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md) stating "Deploy to staging first, always single-target," but executed a dual-target deploy instead.

The incident reveals two failure modes:

  • State Loss: S3 Versioning is not enabled on this bucket, so an older local file overwrites newer prod data with no recovery path.
  • Rule Evasion: Rules existed but were not checked before the risky action.

The Fix: Hard Rules and Feature-Token Registry

We implemented eight mandatory rules in /Users/cb/Documents/repos/sites/queenofsandiego.com/CLAUDE.md, a file that is loaded automatically at the start of every session:

D1. Pull S3 current before editing or deploying any shared file (index.html, CSS, JS in root).
    Command: aws s3 cp s3://queenofsandiego.com/[FILE] ./[FILE].s3-current
    Reason: Versioning is not enabled on this bucket; stale local copies silently overwrite live features.

D2. Diff local vs. S3 current before any deploy. If diff is non-empty and unexpected, STOP.
    Command: diff -u ./[FILE].s3-current ./[FILE] | head -50
    Reason: Catch feature regressions before they go live.

D3. Deploy to staging only on the first attempt. No dual-target deploys.
    Target: s3://staging.queenofsandiego.com/[FILE]
    Invalidate: aws cloudfront create-invalidation --distribution-id d-staging-queenofsandiego --paths "/[FILE]"
    Reason: Staging is CB's checkpoint; one-target ops are atomic and auditable.
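D3's single-target rule is easy to enforce mechanically. The sketch below assumes a deploy script built from two hypothetical helpers, `bucket_for` and `deploy_cmd`; they accept exactly one target name and print, rather than run, the resulting copy command so it can be reviewed before anything touches S3.

```shell
# Hypothetical helper: map exactly one target name to its bucket; reject anything else.
bucket_for() {
  if [ "$#" -ne 1 ]; then
    echo "bucket_for: exactly one target required (staging or prod)" >&2
    return 1
  fi
  case "$1" in
    staging) echo "s3://staging.queenofsandiego.com" ;;
    prod)    echo "s3://queenofsandiego.com" ;;
    *)
      echo "bucket_for: unknown target '$1' (use staging or prod, never both)" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: build (but do not execute) the single-target copy command.
deploy_cmd() {
  local target="$1" file="$2" bucket
  bucket="$(bucket_for "$target")" || return 1
  printf 'aws s3 cp ./%s %s/%s\n' "$file" "$bucket" "$file"
}
```

Anything other than a single `staging` or `prod` argument fails with a non-zero status, which turns a dual-target deploy into a hard error rather than a habit.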

D4. Obey prior session warnings. If your own session summary says "don't do X," treat it as law.
    Reason: Prevents repeat of known failures in the same codebase.

D5. Snapshot prod S3 state before overwriting. Save as [FILE].prod-snapshot-[TIMESTAMP].
    Reason: Emergency rollback if a silent regression is caught post-deploy.
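One way to implement D5 is sketched below, under stated assumptions: `s3_cp`, `snapshot_name`, and `snapshot_prod` are hypothetical names, the timestamp is UTC in a sortable `YYYYmmddTHHMMSSZ` form (the rule itself does not fix a format), and `s3_cp` is a thin wrapper around `aws s3 cp` so a dry run can redefine it as plain `cp`.

```shell
# Thin wrapper so a dry run can redefine it without touching AWS.
s3_cp() { aws s3 cp "$@"; }

# Hypothetical: build "[FILE].prod-snapshot-[TIMESTAMP]" with a sortable UTC timestamp.
snapshot_name() {
  printf '%s.prod-snapshot-%s\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Hypothetical: copy the live prod object to a timestamped local snapshot.
# Optional $2 overrides the source (defaults to the prod bucket path for $1).
snapshot_prod() {
  local file="$1" src="${2:-s3://queenofsandiego.com/$1}" dest
  dest="$(snapshot_name "$file")"
  # Send copy progress to stderr so stdout carries only the snapshot path.
  if ! s3_cp "$src" "./$dest" >&2; then
    echo "snapshot_prod: snapshot failed; do not deploy" >&2
    return 1
  fi
  echo "$dest"
}
```

A deploy script can run `snap=$(snapshot_prod index.html) || exit 1` before the prod copy and keep `$snap` as the rollback source.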

D6. Print a six-line proof block in chat before any S3 cp command:
    - Old MD5 (S3 current)
    - New MD5 (local)
    - Lines changed (diff line count)
    - Intended target (staging or prod, never both)
    - Feature tokens affected (see D7)
    - One-line reason
    Reason: Forces explicit review and creates a durable audit trail.
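The D6 proof block can be generated rather than typed by hand. This is a sketch, not the workflow's actual tooling: `md5_of` and `proof_block` are hypothetical helpers, "lines changed" is interpreted here as added plus removed lines in the unified diff, and the tokens and reason are passed in as free text.

```shell
# Hypothetical: portable MD5 (GNU coreutils md5sum, else the macOS md5 tool).
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{print $1}'
  else
    md5 -q "$1"
  fi
}

# Hypothetical: print the six-line proof block for one single-target deploy.
proof_block() {
  local s3_copy="$1" local_file="$2" target="$3" tokens="$4" reason="$5" changed
  case "$target" in
    staging|prod) ;;
    *) echo "proof_block: target must be staging or prod, never both" >&2; return 1 ;;
  esac
  # Skip the two ---/+++ header lines, then count added and removed lines.
  changed=$(diff -u "$s3_copy" "$local_file" | tail -n +3 | grep -c '^[-+]' || true)
  echo "Old MD5 (S3 current): $(md5_of "$s3_copy")"
  echo "New MD5 (local):      $(md5_of "$local_file")"
  echo "Lines changed:        $changed"
  echo "Intended target:      $target"
  echo "Feature tokens:       $tokens"
  echo "Reason:               $reason"
}
```

For example, `proof_block ./index.html.s3-current ./index.html staging "STRIPE_CHECKOUT" "fix checkout button label"` prints the block to paste into chat before the `aws s3 cp` runs.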

D7. Maintain a feature-token registry in comments within index.html.
    Format: 
    Tokens: JADA_CROSSFADE, STRIPE_CHECKOUT, RANCH_COAST_HERO, BOOK_NOW_LINK, etc.
    Before deploying, grep the S3 current version for each registry token and confirm every token it contains is also present in the local file.
    Reason: Quickly detect if a deploy would erase known features.
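The D7 grep step reduces to a set check: every registry token found in the S3 copy must also appear locally. In this sketch `check_tokens` is a hypothetical name, the token list holds only the four tokens D7 names (the "etc." entries are not guessed), and matching is a plain substring search, so it works whatever comment syntax the registry uses.

```shell
# Registry tokens named in D7; extend with the real list.
TOKENS="JADA_CROSSFADE STRIPE_CHECKOUT RANCH_COAST_HERO BOOK_NOW_LINK"

# Hypothetical: fail if any token present in the S3 copy is missing from the local file.
check_tokens() {
  local s3_copy="$1" local_file="$2" token missing=0
  for token in $TOKENS; do
    if grep -qF "$token" "$s3_copy" && ! grep -qF "$token" "$local_file"; then
      echo "MISSING in local: $token" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Because the check is a substring match, a token that is a prefix of another (for example `BOOK_NOW_LINK` inside a longer name) can produce a false "present"; distinct token names avoid that.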

D8. If S3 is ahead of local (newer, larger, or has features local lacks), escalate to CB.
    Do not overwrite. Reason: Prevents data loss and ensures CB is aware of drift.
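D8 lists three signals that S3 is ahead; the "larger" signal is the simplest to automate. The sketch below covers only that one: `s3_ahead` is a hypothetical helper, and file size is a heuristic, so a "not ahead" result does not replace the D2 diff or the D7 token check.

```shell
# Hypothetical: succeed (return 0) when the S3 copy is larger than the local file.
s3_ahead() {
  local s3_size local_size
  # tr strips the leading spaces that BSD/macOS wc prints.
  s3_size=$(wc -c < "$1" | tr -d ' ')
  local_size=$(wc -c < "$2" | tr -d ' ')
  if [ "$s3_size" -gt "$local_size" ]; then
    echo "S3 copy is larger ($s3_size > $local_size bytes): escalate to CB, do not overwrite" >&2
    return 0
  fi
  return 1
}
```

Used as `if s3_ahead ./index.html.s3-current ./index.html; then exit 1; fi`, it stops the script and leaves the escalation to a person.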

Additionally, a condensed pointer was added to the top-level /Users/cb/Documents/repos/CLAUDE.md so that sessions for other (non-QOS) sites also load the gist:

Before deploying any shared S3 file:
  1. Pull S3 current.
  2. Diff local vs. S3 current.
  3. Deploy to staging first.
  4. Obey prior warnings.
  5. Snapshot prod.
  See sites/queenofsandiego.com/CLAUDE.md for full hard rules (D1–D8).

Infrastructure and Automation