Recovering from a Broken Booking Calendar Deploy: How to Safely Rollback 23 Production Files from S3
During a recent development session delegated to another agent, a race condition fix intended for sailjada.com's booking calendar went sideways. The agent correctly identified and patched the jadaOpenBook() function across 22 pages so that it waits for availability data before opening the modal, which was a legitimate fix. However, the deployment introduced critical syntax errors that broke the booking system on every deployed page. This post documents the incident, the investigation process, and the surgical rollback strategy used to restore production functionality.
The Problem: Valid Fix, Invalid Syntax
The original issue was sound: jadaOpenBook() was firing immediately without waiting for jadaBookingState to populate with availability data, allowing users to interact with an empty calendar. The agent's fix was architecturally correct—add a loading state check and ensure data arrival before modal interaction.
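However, during the multi-file updates across s3://sailjada.com/, Python format-string syntax leaked into the JavaScript implementation. In a Python format template, a literal brace has to be written doubled ({{ or }}) so that str.format() emits a single brace at build time. Those doubled braces, such as {{ isLoading: false }}, were left unresolved in the compiled HTML. Because {{ isLoading: false }} is not a valid JavaScript object literal, script parsing failed entirely.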
The root cause: the local development files are Python format templates with placeholders like {STRIPE_LINK} that get resolved at build time. The agent successfully applied logical fixes but didn't account for the templating layer when deploying.
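To make the escaping rule concrete, here is a minimal sketch of what such a build step could look like. The function name render_template and the use of python3 are assumptions for illustration; the project's real build script is not shown in this post. The sketch shows that {{ and }} collapse to single braces while {STRIPE_LINK} is substituted, which is why a file that skips this step ships doubled braces to the browser.

```shell
#!/usr/bin/env bash
# Hypothetical build step (not the project's actual script): render one
# Python format template with str.format(). Assumes python3 is on PATH and
# that STRIPE_LINK is the only placeholder in the template.
render_template() {
  local template="$1" stripe_link="$2"
  python3 - "$template" "$stripe_link" <<'PY'
import sys

path, link = sys.argv[1], sys.argv[2]
with open(path, encoding="utf-8") as f:
    # "{{" and "}}" become literal "{" and "}"; "{STRIPE_LINK}" is replaced.
    sys.stdout.write(f.read().format(STRIPE_LINK=link))
PY
}
```

Running render_template on a template line containing {{ isLoading: false }} prints { isLoading: false }, a valid object literal. Uploading the template without this step leaves the doubled form in place.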
Investigation: Comparing Production vs. Staging
The first diagnostic step was to compare what was live in production against what was staged:
# Fetch the current production index.html from S3
aws s3 cp s3://sailjada.com/index.html ./prod-index.html
# Check staging deployment
aws s3 ls s3://queenofsandiego.com/_staging/sailjada/
# Diff to understand scope of changes
diff -u prod-index.html local/index.html | head -100
The diff revealed 47 lines added and 12 lines removed in the booking calendar section alone. More critically, a search across all HTML files confirmed the pattern:
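# grep cannot read s3:// URLs, so copy the production HTML locally first
aws s3 sync s3://sailjada.com/ /tmp/sailjada-prod/ --exclude "*" --include "*.html"
# Find all files with the broken syntax (-F matches the braces literally)
grep -rlF "{{ isLoading" /tmp/sailjada-prod/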
# Count affected pages
find . -name "*.html" -type f -exec grep -l "jadaBookingState" {} \; | wc -l
# Result: 23 files affected
This wasn't a single-file issue—it was systematic across every page where the booking system appears.
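Because the failure is a class of leftovers rather than one specific string, a broader scan is a useful complement. This is a minimal sketch with a hypothetical function name, assuming the production HTML has already been copied to a local directory and that rendered pages should never contain doubled braces. That holds for templates rendered with str.format(), but it would be an assumption for any page that embeds a client-side template language.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list rendered HTML files that still contain doubled
# braces ("{{" or "}}"), i.e. str.format() escapes that were never rendered.
# Assumes rendered pages should contain no doubled braces at all.
find_unrendered_pages() {
  local dir="$1"
  # -r: recurse, -l: print file names only, -F: fixed strings (no regex)
  grep -rlF --include='*.html' -e '{{' -e '}}' "$dir" | sort
}
```

For example, find_unrendered_pages /tmp/sailjada-prod would list every page carrying unrendered escapes, whether or not they involve isLoading.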
Rollback Strategy: Surgical Restoration from S3
Rather than attempting to manually fix 23 files with templating issues, the safest approach was to restore from S3's versioning and then apply only the validated fix.
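# List all HTML files currently in production
aws s3 ls s3://sailjada.com/ --recursive | grep "\.html$"
# For each affected file, back up the live copy, then restore the version
# that preceded the broken deploy (requires bucket versioning)
for file in index.html about.html contact.html experiences.html; do
  aws s3 cp "s3://sailjada.com/$file" "./backup/$file"
  # Versions are listed newest first; take the newest non-current one
  prev=$(aws s3api list-object-versions --bucket sailjada.com --prefix "$file" \
    --query "Versions[?Key=='$file' && IsLatest==\`false\`] | [0].VersionId" \
    --output text)
  aws s3api copy-object --bucket sailjada.com --key "$file" \
    --copy-source "sailjada.com/$file?versionId=$prev"
done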
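Before relying on version IDs, it is worth confirming that the bucket keeps versions at all. A minimal sketch follows; the function name is hypothetical, while get-bucket-versioning is a real s3api command. A bucket that has never had versioning turned on returns no Status field, so the check fails, and a bucket with versioning suspended reports "Suspended".

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeed only if versioning is enabled on a bucket.
# A bucket that never had versioning turned on returns no Status field,
# so the comparison below fails for it, as it does for "Suspended".
versioning_enabled() {
  local bucket="$1" status
  status=$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --query Status --output text)
  [ "$status" = "Enabled" ]
}
```

Usage: versioning_enabled sailjada.com || echo "no object versions to restore from"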
However, S3 versioning had never been enabled on the bucket, so there were no previous object versions to restore. The true source of truth became the old site backup: by comparing s3://sailjada.com/ against the git history and archived versions, we identified the last known-good state.
# Check git log for sailjada.com changes
git log --oneline -20 -- s3-deployments/sailjada/
# Identify the last successful deployment commit, then extract the file from it
# (set GOOD_COMMIT to the hash found in the log above)
git show "$GOOD_COMMIT:sailjada/index.html" > restored-index.html
This restored version had:
- The original, working jadaOpenBook() without the race condition fix
- No Python format placeholders beyond the expected build-time ones such as {STRIPE_LINK}
- No broken JavaScript syntax
- Legitimate CSS double-braces in style blocks, which str.format() renders as the single braces CSS needs
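The single-file git show above generalizes to the whole page set. Here is a minimal sketch, with a hypothetical function name, that copies every .html file under a directory as it existed at a given commit into a separate output directory, leaving the working tree untouched. It assumes it is run from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every .html file under src_dir, as it existed at
# the given commit, into out_dir without touching the working tree.
# Run from the repository root; --full-name keeps paths root-relative so they
# can be passed straight to `git show <commit>:<path>`.
restore_html_from_commit() {
  local commit="$1" src_dir="$2" out_dir="$3"
  git ls-tree -r --name-only --full-name "$commit" "$src_dir" | grep '\.html$' |
    while IFS= read -r path; do
      mkdir -p "$out_dir/$(dirname "$path")"
      git show "$commit:$path" > "$out_dir/$path"
    done
}
```

Writing to a separate directory keeps the restored copies easy to diff against both the broken deploy and the working tree before anything is uploaded.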
Validation and Production Deployment
Before pushing to production, we validated the restored files:
# Verify jadaBookingState functions are present and correctly formed
grep -n "jadaBookingState\|jadaOpenBook" restored-index.html
# Check for any remaining Python format placeholders
grep -E "\{[A-Z_]+\}" restored-index.html | head -10
# Should return: only legitimate {STRIPE_LINK} placeholders that get resolved at build time
# Count all local HTML files to ensure we have complete set
find . -name "*.html" -type f | wc -l
# Confirmed: 23 files needed restoration
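The placeholder grep above still relies on eyeballing its output. A minimal sketch of an allow-list version follows; the function and variable names are hypothetical, and STRIPE_LINK is the only build-time placeholder named in this post, so the real allow-list may be longer.

```shell
#!/usr/bin/env bash
# Hypothetical check: print any {UPPER_CASE} placeholder in a template that is
# not on the allow-list. Add further allowed placeholders one per line.
ALLOWED_PLACEHOLDERS='{STRIPE_LINK}'

unexpected_placeholders() {
  local file="$1"
  # -o prints each match on its own line; sort -u removes repeats;
  # grep -vxF drops lines that exactly match an allowed placeholder.
  grep -oE '\{[A-Z_]+\}' "$file" | sort -u |
    grep -vxF "$ALLOWED_PLACEHOLDERS" || true
}
```

Empty output means the file contains only expected placeholders, which makes the check easy to loop over all 23 files.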
All 23 files were restored to their last known-good state and tested locally. The booking calendar modal opened correctly, waited for availability data, and maintained interactive calendar functionality.
# Deploy restored files to production S3
stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# -print0 with read -d '' keeps file names containing spaces intact
find . -name "*.html" -type f -print0 | while IFS= read -r -d '' file; do
  aws s3 cp "$file" "s3://sailjada.com/$(basename "$file")" \
    --content-type "text/html" \
    --metadata "deployment=restored,timestamp=$stamp"
done
Production deployment to s3://sailjada.com/ was completed. The CloudFront distribution (queenofsandiego.com) then served the updated files as its cached copies expired under the configured TTL.
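Waiting out the TTL means some visitors keep receiving the broken pages for a while. An invalidation makes CloudFront drop its cached copies right away. A minimal sketch with a hypothetical wrapper function; create-invalidation is the real CLI command, and the distribution ID is a placeholder you would look up with aws cloudfront list-distributions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask CloudFront to drop cached copies so fixed files are
# served immediately instead of after the TTL expires.
invalidate_cdn() {
  local distribution_id="$1"
  shift
  # Remaining arguments are paths such as "/index.html"; default to everything.
  if [ "$#" -eq 0 ]; then
    set -- "/*"
  fi
  aws cloudfront create-invalidation \
    --distribution-id "$distribution_id" --paths "$@"
}
```

Usage: invalidate_cdn EXAMPLEDISTID "/*" after the corrected files are in S3. Invalidating first would only re-cache the broken source.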
Cleanup: Remove Broken Staging Deployment
The staging bucket still contained the broken deployment that was prepared for code review:
# List what was staged
aws s3 ls s3://queenofsandiego.com/_staging/sailjada/
# Remove the broken staging deployment
aws s3 rm s3://queenofsandiego.com/_staging/sailjada/ --recursive
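A recursive delete run next to a production bucket deserves a guard. A minimal sketch, with a hypothetical function name and safety policy: it refuses any target that is not below a _staging/ prefix, and by default only previews the deletion using the real --dryrun flag of aws s3 rm.

```shell
#!/usr/bin/env bash
# Hypothetical guard: only delete below a _staging/ prefix, preview by default.
# Pass --yes as the second argument to actually delete.
remove_staging() {
  local target="$1" confirm="${2:-}"
  case "$target" in
    s3://*/_staging/?*) ;;
    *) echo "refusing: $target is not under a _staging/ prefix" >&2; return 1 ;;
  esac
  if [ "$confirm" = "--yes" ]; then
    aws s3 rm "$target" --recursive
  else
    # --dryrun prints the deletions without performing them.
    aws s3 rm "$target" --recursive --dryrun
  fi
}
```

The ?* in the pattern also refuses a bare s3://bucket/_staging/, so the helper cannot wipe every staged site at once.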
Key Decisions and Lessons
- Why restore instead of fix? With 23 affected files and the added complexity of the templating layer, manual fixes risked leaving the files inconsistent with each other. Restoration guaranteed behavior identical to the last known-good production state.
- Why not use CloudFront invalidation alone? Cache invalidation doesn't fix broken source files in S3. The source had to be corrected first.
- Why verify against git history? S3 versioning wasn't enabled, so git became the audit trail and recovery point. This highlighted the need for versioning policies on production buckets.
- Why stage before production? The broken staging deployment caught the issue before it impacted end users, but only because it was reviewed. Automated validation testing could have caught syntax errors before staging.
What's Next
Future improvements will include:
- Enable S3 versioning on s3://sailjada.com/ with lifecycle policies for cost management
- Add pre-deployment HTML/JavaScript validation in the CI/CD pipeline to catch syntax errors
- Document the Python templating layer and distinguish between build-time placeholders and runtime code
- Implement staging validation gates before production pushes
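The first item could look roughly like this. It is a minimal sketch with a hypothetical function name. put-bucket-versioning and put-bucket-lifecycle-configuration are real s3api commands, but the 30-day retention window is an assumption, not a figure from this incident.

```shell
#!/usr/bin/env bash
# Hypothetical setup: turn on versioning, then expire noncurrent versions
# after a retention window so old copies do not accumulate cost.
# The 30-day default is an assumption.
enable_versioning_with_expiry() {
  local bucket="$1" days="${2:-30}" policy rc
  policy=$(mktemp)
  cat > "$policy" <<EOF
{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": $days }
    }
  ]
}
EOF
  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled &&
    aws s3api put-bucket-lifecycle-configuration --bucket "$bucket" \
      --lifecycle-configuration "file://$policy"
  rc=$?
  rm -f "$policy"
  return "$rc"
}
```

Expiring only noncurrent versions keeps the live site untouched while still bounding how long rollback points are retained.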