Debugging a Deployment Gone Wrong: Race Conditions, Format-String Escaping, and Multi-Site Staging Recovery
What Happened
A previous agent (Claude 4.5) was tasked with fixing a booking calendar race condition on sailjada.com. The fix itself was conceptually sound: it prevented jadaOpenBook() from opening the booking modal until availability data had loaded. However, the deployment process introduced several critical issues:
- Invalid JavaScript syntax was introduced across 22+ HTML pages due to improper escaping of Python format-string delimiters ({{ and }})
- Files were staged to s3://queenofsandiego.com/_staging/ without proper testing or review
- Production files were restored from S3 backups, but the scope and impact of the staging deployment remained unclear
- A secondary staging deployment to the brandicarlile subdomain context introduced further complications
This post documents the investigation, root cause analysis, and recovery strategy.
Root Cause: Python Template Escaping in Static HTML
The core issue stems from a critical misunderstanding of the deployment pipeline. The sailjada.com site exists as static HTML files served directly from S3, not as Python templates. However, many of these HTML files contain legitimate Python format-string placeholders that are meant to be filled in during a pre-deployment build step.
Example of a legitimate placeholder in /Users/cb/Documents/repos/sites/sailjada.com/index.html:
<a href="{STRIPE_LINK}">Pay Now</a>
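The build step itself is not described in this post. Assuming it renders each page with Python's str.format(), a minimal sketch of it shows both halves of the convention: a {NAME} field is substituted, and a doubled brace comes out as one literal brace. The render and placeholder_names helpers, the CSS line, and the example URL are all hypothetical.

```python
from string import Formatter


def placeholder_names(template: str) -> set[str]:
    """Return the str.format() field names that appear in a template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    # field_name is None for pure literal chunks, which is how escaped
    # doubled braces ({{ and }}) come back.
    return {field for _, field, _, _ in Formatter().parse(template) if field is not None}


def render(template: str, **values: str) -> str:
    """Hypothetical build step: substitute placeholders with str.format()."""
    return template.format(**values)


page = (
    "<style>:root {{ --accent: #0a5; }}</style>\n"
    '<a href="{STRIPE_LINK}">Pay Now</a>\n'
)

print(placeholder_names(page))
print(render(page, STRIPE_LINK="https://example.com/pay"))
```

After rendering, the CSS line reads `:root { --accent: #0a5; }` and the link carries the substituted URL; unformatted, the doubled braces would stay doubled.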
The 4.5 agent, when fixing the booking calendar race condition, needed to add this JavaScript object:
const state = { isLoading: false, bookingData: null }
But the surrounding template context used Python format-string escaping for CSS and other content, so the agent incorrectly wrote:
const state = {{ isLoading: false, bookingData: null }}
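This is invalid JavaScript. In Jinja2, {{ }} delimits an expression to interpolate; in Python's str.format(), {{ and }} are escapes that each produce one literal brace. Doubled braces are therefore only correct in text that actually passes through str.format(). In a file that reaches the browser as-is, {{ arrives unchanged, and because { cannot start a property name inside an object literal, the parser throws a SyntaxError. The confusion arose because: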
- Legitimate CSS in the same files wraps its rule blocks, including those that declare CSS custom properties, in {{ ... }} (valid wherever str.format() collapses each pair to a single brace)
- Legitimate Python format strings use {PLACEHOLDER} syntax (which is also valid and already present)
- The new JavaScript code conflicted with these conventions
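A check along these lines could have caught the problem before staging. The sketch below, a hypothetical doubled_braces_in_scripts helper, flags {{ or }} inside <script> blocks. It only makes sense for files that are uploaded without passing through str.format(), and it shows that the same line is harmless once formatting has collapsed the braces. It is a regex heuristic, not an HTML parser.

```python
import re

# Heuristic scan: regexes, not a real HTML parser. It can flag legitimate
# nested JavaScript such as "{ a: { b: 1 }}" (the closing "}}").
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
DOUBLED_RE = re.compile(r"\{\{|\}\}")


def doubled_braces_in_scripts(html: str) -> list[int]:
    """Return the 1-based line numbers where {{ or }} occurs inside <script> blocks."""
    lines = set()
    for script in SCRIPT_RE.finditer(html):
        body_start = script.start(1)
        for match in DOUBLED_RE.finditer(script.group(1)):
            # Translate the offset inside the script body back to a line in the file.
            lines.add(html.count("\n", 0, body_start + match.start()) + 1)
    return sorted(lines)


broken = "<script>\nconst state = {{ isLoading: false, bookingData: null }}\n</script>\n"
print(doubled_braces_in_scripts(broken))           # [2]: served as-is, this line is a SyntaxError
print(doubled_braces_in_scripts(broken.format()))  # []: str.format() collapsed the braces
```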
Investigation Process and Commands
To understand the scope, we ran a series of diagnostic queries:
# List every HTML file containing the broken jadaBookingState code
grep -rl --include="*.html" "jadaBookingState" /Users/cb/Documents/repos/sites/sailjada.com/
This identified 23 affected files across the sailjada repository.
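When the list of affected files needs to feed later steps (diffing, restoring), the same search can be scripted in Python. A minimal sketch; files_containing is a hypothetical helper, and the demo directory is a throwaway stand-in for the real repository.

```python
import tempfile
from pathlib import Path


def files_containing(root: Path, needle: str, pattern: str = "*.html") -> list[Path]:
    """Recursively list files under root whose text contains needle (like grep -rl)."""
    return sorted(
        path
        for path in root.rglob(pattern)
        if path.is_file() and needle in path.read_text(encoding="utf-8", errors="replace")
    )


# Throwaway demo tree: one affected page, one unaffected page in a subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "blog").mkdir()
    (root / "index.html").write_text("<script>const jadaBookingState = {};</script>", encoding="utf-8")
    (root / "blog" / "post.html").write_text("<p>no booking code here</p>", encoding="utf-8")
    demo_hits = [p.relative_to(root).as_posix() for p in files_containing(root, "jadaBookingState")]

print(demo_hits)  # ['index.html']
```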
# Compare production S3 against the local repository copy
aws s3 cp s3://queenofsandiego.com/sailjada/index.html index.html.prod --no-sign-request
diff -u index.html.prod /Users/cb/Documents/repos/sites/sailjada.com/index.html | head -100
This revealed that the production S3 bucket (s3://queenofsandiego.com/) was still serving the correct, pre-race-condition-fix version of the files. The broken versions existed only in the local development repository and in the staging deployment at s3://queenofsandiego.com/_staging/sailjada/.
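Repeating this comparison for every affected file is easier from a script. A sketch using Python's difflib that mirrors `diff -u ... | head -100`; head_of_diff is a hypothetical helper, and the sample strings are illustrative, not taken from the real pages.

```python
import difflib
from itertools import islice


def head_of_diff(prod: str, local: str, limit: int = 100,
                 prod_name: str = "production", local_name: str = "local") -> str:
    """Return the first `limit` lines of a unified diff, like diff -u ... | head -100."""
    diff = difflib.unified_diff(
        prod.splitlines(keepends=True),
        local.splitlines(keepends=True),
        fromfile=prod_name,
        tofile=local_name,
    )
    return "".join(islice(diff, limit))


prod_html = "<p>Book now</p>\n"
local_html = "<p>Book now</p>\n<script>const state = {{ isLoading: false }};</script>\n"
print(head_of_diff(prod_html, local_html))
```

An empty result means the two copies are identical, which is the signal that a file needs no restore.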
Recovery Strategy
Step 1: Restore All 23 Broken Files from Production S3
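# Restore each broken file from production, keeping its path relative to the repo root
cd /Users/cb/Documents/repos/sites/sailjada.com
grep -rl --include="*.html" "jadaBookingState" . | while read -r file; do
  aws s3 cp "s3://queenofsandiego.com/sailjada/${file#./}" "$file"
done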
This restored the booking system to its last known working state. The race condition that the 4.5 agent was attempting to fix remained present in production, but it's a known issue that can be addressed in a future, properly-tested iteration.
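The critical detail in any restore loop is the mapping from a local path to its production object. Using only the basename discards subdirectories, so a nested page such as a hypothetical blog/index.html would be overwritten with the root index.html. A sketch of a path-preserving mapping, assuming (as in the aws s3 cp comparison above) that sailjada.com files live under the sailjada/ prefix of the production bucket; production_uri is a hypothetical helper.

```python
from pathlib import PurePosixPath


def production_uri(local_path: str, repo_root: str,
                   bucket: str = "queenofsandiego.com", prefix: str = "sailjada/") -> str:
    """Map a local repo file to its production S3 URI, keeping its subdirectories."""
    relative = PurePosixPath(local_path).relative_to(repo_root)
    return f"s3://{bucket}/{prefix}{relative.as_posix()}"


root = "/Users/cb/Documents/repos/sites/sailjada.com"
for page in (f"{root}/index.html", f"{root}/blog/index.html"):
    # Both pages share the basename "index.html"; the full mapping keeps them distinct.
    print(PurePosixPath(page).name, "->", production_uri(page, root))
```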
Step 2: Delete Broken Staging Deployment
# Remove all staged files that contained the broken code
aws s3 rm s3://queenofsandiego.com/_staging/sailjada/ --recursive
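Because _staging/ is a prefix inside the production bucket rather than a separate bucket, a mistyped prefix in a recursive rm can delete live files. A wrapper script could run a guard like the hypothetical one below before calling `aws s3 rm --recursive`; it is a sketch of one safety check, not part of the AWS CLI.

```python
def require_staging_uri(uri: str, bucket: str = "queenofsandiego.com") -> str:
    """Return uri unchanged if it is strictly inside the bucket's _staging/ prefix, else raise."""
    staging_root = f"s3://{bucket}/_staging/"
    # Reject anything outside _staging/, and _staging/ itself (which would wipe every staged file).
    if not uri.startswith(staging_root) or uri == staging_root:
        raise ValueError(f"refusing recursive delete outside {staging_root}: {uri}")
    return uri


print(require_staging_uri("s3://queenofsandiego.com/_staging/sailjada/"))
for bad in ("s3://queenofsandiego.com/sailjada/", "s3://queenofsandiego.com/_staging/"):
    try:
        require_staging_uri(bad)
    except ValueError as exc:
        print(exc)
```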
Step 3: Audit All Staging Deployments
A broader concern emerged: the 4.5 agent had also staged files to multiple locations:
- s3://queenofsandiego.com/_staging/sailjada/ (deleted)
- s3://queenofsandiego.com/_staging/events.html (from qos site)
- s3://queenofsandiego.com/_staging/maintenance.html (from qos site)
- s3://queenofsandiego.com/_staging/index.html (qos homepage, significant deletions of booking widget)
- s3://queenofsandiego.com/_staging/brandicarlile.html (subdomain context unclear)
Each of these had to be examined before any production push could be considered safe.
Key Findings for Each Staged File
- events.html: Identical to production; safe
- maintenance.html: Only whitespace changes; safe but unnecessary to deploy
- index.html (qos): Removed the entire booking widget section and several call-to-action links. Not ready for production; requires business review
- brandicarlile.html: Appears to be a subdomain context file; unclear purpose and requires stakeholder confirmation before deployment
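The three outcomes above (identical, whitespace-only, substantive change) can be checked mechanically, leaving only the substantive ones for human review. A rough sketch; classify_change is hypothetical, and it treats any difference in runs of whitespace as cosmetic, which is usually but not always true for HTML (inside <pre>, for example, it is not).

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace to a single space and trim the ends."""
    return " ".join(text.split())


def classify_change(staged: str, production: str) -> str:
    """Return 'identical', 'whitespace-only', or 'changed' for a staged file vs production."""
    if staged == production:
        return "identical"
    if normalize_whitespace(staged) == normalize_whitespace(production):
        return "whitespace-only"
    return "changed"


print(classify_change("<p>Hi</p>\n", "<p>Hi</p>\n"))
print(classify_change("<p>Hi</p>  \n\n", "<p>Hi</p>\n"))
print(classify_change("<p>Hi</p>\n", "<p>Book now</p>\n"))
```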
Infrastructure Context
The queenofsandiego.com site is served via:
- S3 origin: s3://queenofsandiego.com/ (primary production bucket)
- CloudFront distribution: serves all content with caching; the distribution ID is needed for cache invalidation after each deployment
- Route53: manages DNS for queenofsandiego.com and its subdomains (e.g., brandicarlile.queenofsandiego.com)
- Staging prefix: s3://queenofsandiego.com/_staging/ (a prefix inside the production bucket, not a separate bucket; used for pre-production review)
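Once anything is promoted from _staging/ to production, CloudFront has to drop its cached copies. A sketch of building the InvalidationBatch argument that the CloudFront CreateInvalidation call expects, assuming each object is served at a URL path equal to its S3 key; invalidation_batch is a hypothetical helper.

```python
import time


def invalidation_batch(s3_keys: list[str]) -> dict:
    """Build the InvalidationBatch argument for CloudFront's CreateInvalidation API."""
    # Assumes each object is served at a URL path equal to its S3 key.
    # CloudFront paths must start with "/"; duplicate paths are dropped.
    paths = sorted({"/" + key.lstrip("/") for key in s3_keys})
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # CallerReference should be unique per request; CloudFront uses it to detect resubmission.
        "CallerReference": f"deploy-{time.time_ns()}",
    }


batch = invalidation_batch(["sailjada/index.html", "events.html"])
print(batch["Paths"])
```

With boto3 installed and credentials configured, the result would be passed as `boto3.client("cloudfront").create_invalidation(DistributionId=..., InvalidationBatch=batch)`; the distribution ID is not recorded in this post.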
What's Ready for Production
Nothing yet. The current state is:
- Production is restored to a known-good state (pre-race-condition fix)
- Staging contains multiple files of unclear intent and varying readiness
- No files have been reviewed by stakeholders (Carole, Sergio, CB) for business logic changes
What Needs Testing and Review Before Production Push
- qos index.html changes: Verify why the booking widget was removed. Is this intentional? Does it affect conversion? Schedule call with Carole.