Debugging a Cascading Deployment Failure: When AI Agents Break Production Templates
Last week, a seemingly routine task to fix a booking calendar race condition on sailjada.com turned into a production incident involving broken HTML templates, invalid JavaScript, and a staged deployment that should never have reached S3. This post walks through what happened, how we diagnosed it, and the safeguards we're implementing to keep it from happening again.
The Initial Task: A Legitimate Race Condition Fix
The original issue was sound: jadaOpenBook() was opening the booking modal before availability data loaded, creating a poor UX where users could interact with an empty calendar. The fix—wrapping the modal trigger in a loading state check—was architecturally correct.
But the execution introduced a critical problem.
What Went Wrong: JavaScript Double Braces in a Jinja2 Template Context
The 4.5 agent was tasked with applying the race condition fix across 22 HTML files in /Users/cb/Documents/repos/sites/sailjada.com/ and its subdirectories. The agent successfully identified all files containing jadaOpenBook and made the necessary edits to the booking logic.
The problem: the original HTML files are Python Jinja2 templates, not static HTML. They use double-brace syntax like {{ variable_name }} for template interpolation. When the agent edited these files locally, it introduced actual JavaScript object literals with double braces—{{ isLoading: false }}—directly adjacent to legitimate CSS custom properties that also use double braces—--color-primary: {{ color_var }}.
This created a semantic collision:
// This is a Python Jinja2 variable (should remain untouched):
const stripeLink = "{{ STRIPE_LINK }}";
// This got added as JavaScript (syntax error when template processes):
if ({{ isLoading: false }}) {
  jadaOpenBook();
}
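Jinja2 treats everything between {{ and }} as an expression to evaluate. When the Python build process rendered these templates, the JavaScript object literal wasn't escaped, so it was handed to the template engine instead of passing through as JavaScript, resulting in malformed code.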
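The collision above can be caught mechanically before a template is ever rendered. The following is a minimal sketch, not the check the team actually ran: it assumes that a legitimate interpolation is a plain or dotted name with optional simple filters, and it reports every other {{ ... }} span for review. The function name find_suspicious_interpolations and both regular expressions are hypothetical.

```python
import re

# Matches one {{ ... }} span on a single line (non-greedy).
INTERPOLATION = re.compile(r"\{\{(.*?)\}\}")

# Assumption: a legitimate template variable is a plain or dotted name,
# optionally followed by simple filters such as `| upper`.
TEMPLATE_VARIABLE = re.compile(r"^\s*[A-Za-z_][\w.]*(\s*\|\s*\w+)*\s*$")


def find_suspicious_interpolations(text):
    """Return (line_number, span) pairs whose contents do not look like a variable."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in INTERPOLATION.finditer(line):
            if not TEMPLATE_VARIABLE.match(match.group(1)):
                findings.append((line_number, match.group(0)))
    return findings


sample = '''const stripeLink = "{{ STRIPE_LINK }}";
if ({{ isLoading: false }}) {
  jadaOpenBook();
}'''

print(find_suspicious_interpolations(sample))
# prints [(2, '{{ isLoading: false }}')]
```

The allow-list is deliberately narrow, so templates that use richer Jinja2 expressions (function calls, arithmetic) would also be reported and need a human decision.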
The Diagnosis Process
The discovery chain was systematic:
- File audit: Examined all 23 local HTML files against production S3 backups at s3://queenofsandiego.com/
- Line count comparison: Detected that staged versions had dramatically fewer lines than production, suggesting wholesale removal
- Diff analysis: Compared local index.html against production version from CloudFront cache
- Pattern matching: Grepped for unmatched {{ and }} pairs to distinguish CSS custom properties from template variables
- Git history: Reviewed the agent's edit log to correlate changes with the race condition commits
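The pattern-matching step can be approximated in a few lines of Python. This is a sketch under the assumption that an interpolation opens and closes on the same line; the helper name unbalanced_delimiter_lines is hypothetical and is not the grep command used during the incident.

```python
def unbalanced_delimiter_lines(text):
    """Return (line_number, opens, closes) for lines where {{ and }} counts differ."""
    report = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        opens = line.count("{{")
        closes = line.count("}}")
        if opens != closes:
            report.append((line_number, opens, closes))
    return report


html_sample = """<style>:root { --color-primary: {{ color_var }}; }</style>
<script>
const stripeLink = "{{ STRIPE_LINK }}";
if (state) { if (ready) { jadaOpenBook(); }}
</script>"""

print(unbalanced_delimiter_lines(html_sample))
# prints [(4, 0, 1)]
```

Line 4 of the sample is flagged even though it is valid JavaScript: two closing braces of nested blocks look like a template delimiter. Output from a scan like this is a review list, not a verdict.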
The root cause became clear: the agent had modified template files as if they were static HTML. In the process it removed the production booking system's complete JavaScript implementation and introduced syntax errors in the replacement code.
Remediation: Restore and Prevent
Immediate actions taken:
- Restored all 23 files from production S3 using boto3 to download the full object versions
- Deleted the broken staging deployment at s3://queenofsandiego.com/_staging/sailjada/
- Verified booking system restoration by checking for presence of jadaBookingState, jadaGetAvailability(), and related functions in the restored index.html
Commands used (sanitized):
# List all objects in staging to verify broken deployment
aws s3 ls s3://queenofsandiego.com/_staging/ --recursive
# Download production version for comparison
aws s3 cp s3://queenofsandiego.com/index.html ./prod_backup_index.html
# Restore local files from production
aws s3 sync s3://queenofsandiego.com/ ./sailjada/ --exclude "*" --include "*.html"
# Remove staging deployment
aws s3 rm s3://queenofsandiego.com/_staging/sailjada/ --recursive
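The verification step, checking that the restored index.html still contains the booking code, is easy to script. The sketch below assumes a plain substring check is enough; the marker list contains only the two identifiers named above and would need extending with the "related functions", and missing_markers is a hypothetical helper name.

```python
# Identifiers named in this post; extend with the other booking functions.
REQUIRED_MARKERS = ("jadaBookingState", "jadaGetAvailability(")


def missing_markers(html, markers=REQUIRED_MARKERS):
    """Return the markers that do not appear anywhere in the HTML text."""
    return [marker for marker in markers if marker not in html]


# Inline samples stand in for the restored file so the sketch runs anywhere.
restored = "<script>const jadaBookingState = {}; function jadaGetAvailability() {}</script>"
broken = "<script>function jadaOpenBook() {}</script>"

print(missing_markers(restored))  # prints []
print(missing_markers(broken))    # prints ['jadaBookingState', 'jadaGetAvailability(']
```

In practice the text would come from the restored file on disk, for example by reading index.html with pathlib, and a non-empty result would fail the restore.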
Key Infrastructure Details
For context on the deployment architecture:
- S3 bucket: s3://queenofsandiego.com/ serves as the source of truth for production content
- Staging prefix: s3://queenofsandiego.com/_staging/ (a prefix inside the same bucket) is where deployments are validated before going live
- CloudFront distribution: Serves cached content with invalidation via boto3 scripts
- Route53 DNS: Points sailjada.com to the CloudFront distribution (no subdomain redirects needed)
- Template processing: Python build pipeline in CI/CD renders Jinja2 templates with environment variables before S3 sync
The staging deployment was synced directly to S3 without going through the template rendering pipeline—this was the critical gap.
Why This Happened: Process Gaps
Several safeguards failed:
- No pre-deployment syntax validation: The staged files weren't run through a JavaScript linter or template parser before upload
- No template-aware editing: The agent treated .html files as static assets rather than template sources
- Staging deployment without review gate: The agent auto-deployed to staging without waiting for manual approval (this violated the intended STAGING_RULE protocol)
- No diff-to-production check: No automated comparison of staged vs. production to flag suspicious removals of code
Preventive Changes Going Forward
- Template file naming: Rename all Jinja2 templates to use the .html.j2 extension to make template context explicit to both humans and tools
- Pre-staging validation: Add a CI step that lints JavaScript, validates HTML structure, and checks for orphaned template variables before staging deployment
- Diff threshold alerts: Flag any staged deployment that removes >5% of production code without explicit approval
- Agent guardrails: Require manual review before deploying to staging; remove auto-deployment capability from future agent tasks
- Template rendering in build: Move all {{ }} interpolation to the build pipeline, not production deployment
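The diff threshold alert could be prototyped with the standard library alone. This is one possible sketch, not the planned implementation: it assumes "removes >5% of production code" means the share of production lines that are deleted or replaced in the staged copy, and the function names removed_fraction and exceeds_threshold are hypothetical.

```python
import difflib


def removed_fraction(production_text, staged_text):
    """Fraction of production lines that are deleted or replaced in the staged text."""
    production_lines = production_text.splitlines()
    staged_lines = staged_text.splitlines()
    if not production_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=production_lines, b=staged_lines, autojunk=False)
    removed = 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both consume lines from the production side.
        if tag in ("delete", "replace"):
            removed += i2 - i1
    return removed / len(production_lines)


def exceeds_threshold(production_text, staged_text, threshold=0.05):
    """True when the staged text drops more than `threshold` of production lines."""
    return removed_fraction(production_text, staged_text) > threshold


production = "\n".join(f"line {i}" for i in range(100))
# Simulate a staged copy that lost ten lines of the booking script.
staged = "\n".join(f"line {i}" for i in range(100) if not 10 <= i < 20)

print(exceeds_threshold(production, staged))      # prints True
print(exceeds_threshold(production, production))  # prints False
```

Because replaced lines count as removed, a large but legitimate rewrite would also trip the alert, which fits the intent of requiring explicit approval rather than blocking outright.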
Status
Production: Fully restored. All 23 sailjada.com files running with original booking system intact.
Staging: Cleaned up. The broken deployment has been removed.
Root Cause: Agent-driven template editing without awareness of Jinja2 semantics combined with missing deployment validation gates.
This incident highlights why AI agents need explicit domain context (in this case, "these are Jinja2 templates, not static HTML") and why automated deployments still require validation checkpoints, even in development environments.