Diagnosing and Staging a Critical Deposit Outage: Apps Script Access Control & Multi-Endpoint Deployment Strategy
During a recent operational incident, we discovered that the deposit/reservation functionality across 11 event pages had silently failed due to misconfigured access controls on our Google Apps Script backend. This post covers the systematic diagnosis, root-cause analysis, and staged remediation we implemented.
The Problem: Dark Funnel, Zero Visibility
Our public-facing event pages (sailjada.com and queenofsandiego.com) all expose a "Reserve" widget that triggers booking and deposit flows. Without monitoring on the frontend button clicks or backend 4xx responses, the failure was invisible until manual testing revealed:
- Primary endpoint: https://script.google.com/macros/s/.../44Pme8wCA/exec returning 403 Forbidden
- Fallback endpoint: https://script.google.com/macros/s/.../AFsLWaO3/exec returning 404 Not Found
Every inbound reservation attempt was failing silently. Unlike a visible server error, this meant we had no data on lost bookings—only the knowledge that the funnel was broken.
Root Cause: Apps Script Deployment & Access Control Mismatch
We maintain two Apps Script deployments:
- Primary (Production): Project ID 1dDpSK8JZda7XUpKIGlyyAX19KLL4JqFjYVtpcunB5ZE3-NMX_9v0lQJ5, Deployment ID 44Pme8wCA
- Fallback (Worship): Separate project, Deployment ID AFsLWaO3
The primary deployment had been set to "Restricted" access (likely during a security audit or permission reconfiguration). This prevented unauthenticated requests—which our frontend forms rely on—from executing the script. The fallback had been deleted entirely.
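The two status codes point at two different causes, and that mapping can be written down as a small triage helper. This is only a sketch: diagnoseStatus is a hypothetical name, and the messages reflect what we saw in this incident rather than any official Google error taxonomy.

```javascript
// Hypothetical helper: maps an HTTP status from an Apps Script /exec URL
// to the likely cause seen in this incident. The mapping is an assumption
// drawn from this outage, not an official list of Apps Script errors.
function diagnoseStatus(status) {
  if (status >= 200 && status < 300) {
    return "ok: deployment reachable";
  }
  if (status >= 300 && status < 400) {
    return "redirect: follow it before judging the endpoint";
  }
  if (status === 403) {
    return "forbidden: check the deployment's 'Who has access' setting";
  }
  if (status === 404) {
    return "not found: deployment may have been deleted or the ID is wrong";
  }
  return "unexpected: status " + status;
}

console.log(diagnoseStatus(403)); // primary endpoint in this incident
console.log(diagnoseStatus(404)); // fallback endpoint in this incident
```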
Diagnostic Approach: Endpoint Testing & Deployment Inspection
We validated the outage with direct HTTP calls:
curl -i https://script.google.com/macros/s/AKfycbz.../44Pme8wCA/exec
Confirmed 403. We then cross-referenced the live event pages to identify which endpoints they called:
grep -r "script.google.com" /path/to/event-pages/ | head -20
All 11 pages referenced the primary endpoint. We also audited the source repo for any stale clasp configurations or secondary project references:
find ~/Documents/repos \( -name "appsscript.json" -o -name ".clasp.json" \) -print0 | xargs -0 cat
This confirmed a single Apps Script project was the source of truth, but two distinct deployments existed (one active, one orphaned).
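The same cross-reference can be done programmatically when the page sources are already loaded as strings. The sketch below is an illustration, not the tooling we used: extractScriptEndpoints and the SAMPLE_* deployment IDs are hypothetical, and the pattern assumes endpoints always end in /exec.

```javascript
// Matches Apps Script web app URLs ending in /exec. The character class
// stops at whitespace, quotes, and angle brackets so a match does not run
// past the end of an attribute or string literal.
const EXEC_URL = /https:\/\/script\.google\.com\/macros\/s\/[^\s"'<>]+?\/exec\b/g;

// Returns the unique /exec URLs found in a page or script source,
// in order of first appearance.
function extractScriptEndpoints(source) {
  const matches = source.match(EXEC_URL) || [];
  return [...new Set(matches)];
}

// SAMPLE_PRIMARY_ID and SAMPLE_FALLBACK_ID are placeholders, not real IDs.
const sample = `
  const PRIMARY = "https://script.google.com/macros/s/SAMPLE_PRIMARY_ID/exec";
  const FALLBACK = 'https://script.google.com/macros/s/SAMPLE_FALLBACK_ID/exec';
  fetch("https://script.google.com/macros/s/SAMPLE_PRIMARY_ID/exec");
`;
console.log(extractScriptEndpoints(sample));
```

Deduplicating the matches makes it easy to see at a glance how many distinct deployments a set of pages depends on.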
Infrastructure: S3, CloudFront, and Event Page Distribution
Our event pages are distributed via:
- S3 Bucket: queenofsandiego-assets (region: us-west-2)
- CloudFront Distribution: ID E2ABC1234XYZ (covers sailjada.com and queenofsandiego.com CNAMEs)
- Route53 Zone: Hosted zone for queenofsandiego.com with A records pointing to CloudFront
Event page HTML files are stored at paths like:
s3://queenofsandiego-assets/events/YYYY-MM-DD/index.html
s3://queenofsandiego-assets/events/YYYY-MM-DD/assets/form.js
The form.js file contains the hardcoded Apps Script endpoint URL. A CloudFront invalidation is required after any URL change to purge cache.
Staged Remediation: Two-Path Strategy
Rather than immediately modifying live pages, we staged the fix with these steps:
Phase 1: Redeploy with Correct Access Control
Inside the Google Apps Script console:
- Navigate to project 1dDpSK8JZda7XUpKIGlyyAX19KLL4JqFjYVtpcunB5ZE3-NMX_9v0lQJ5
- Go to Deploy → Manage deployments
- Select the primary deployment (44Pme8wCA)
- Change Who has access to "Anyone"
- Set Execute as to "Me" (the Google account that owns the script, not a service account)
- Click Deploy
This reconfigures the existing deployment's permissions without changing the URL, so no page updates are needed.
Phase 2: Restore Fallback Deployment (Optional, for Redundancy)
If required, redeploy the fallback endpoint in a separate Apps Script project, then update the fallback URL in form.js. This provides a secondary path if the primary becomes unavailable again.
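For the fallback to help, form.js has to try it when the primary fails and report when both fail. The following is a minimal sketch under stated assumptions, not the actual contents of form.js: submitWithFallback, the POST-with-JSON request shape, and the example.invalid URLs are all hypothetical.

```javascript
// Tries each endpoint in order and returns the first successful result.
// `fetchImpl` is a parameter so the function can run in the browser
// (pass the global fetch) or under test with a stub.
async function submitWithFallback(endpoints, payload, fetchImpl) {
  const attempts = [];
  for (const url of endpoints) {
    try {
      const res = await fetchImpl(url, {
        method: "POST",
        body: JSON.stringify(payload),
      });
      attempts.push({ url, status: res.status });
      if (res.ok) {
        return { ok: true, url, attempts };
      }
    } catch (err) {
      // Network-level failure (DNS, CORS, offline): record it and move on.
      attempts.push({ url, error: String(err) });
    }
  }
  // Every endpoint failed: return the attempts so the caller can show an
  // error to the visitor and report the failure instead of dropping it.
  return { ok: false, url: null, attempts };
}

// Stub that mimics a partial outage: primary returns 403, fallback 200.
const stubFetch = async (url) =>
  url.includes("PRIMARY")
    ? { ok: false, status: 403 }
    : { ok: true, status: 200 };

submitWithFallback(
  ["https://example.invalid/PRIMARY/exec", "https://example.invalid/FALLBACK/exec"],
  { name: "Test Guest" },
  stubFetch
).then((result) => console.log(result.ok, result.url));
```

Returning the list of attempts rather than a bare boolean is deliberate: it gives the page something concrete to log, which is exactly the visibility that was missing during this outage.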
Phase 3: Verify and Invalidate Cache
After redeploy:
curl -iL https://script.google.com/macros/s/.../44Pme8wCA/exec
# Expected: 200 OK with reservation response (-L follows the redirect Apps Script may issue before the final response)
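Then invalidate CloudFront if form.js or any page changed (for example, after a Phase 2 fallback URL update); a permissions-only fix leaves the cached files valid: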
aws cloudfront create-invalidation --distribution-id E2ABC1234XYZ --paths "/*"
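A wildcard invalidation is the simplest option, but a narrower one can target only the files that changed. The sketch below builds those paths from event dates. It assumes the CloudFront path mirrors the S3 key with no origin path prefix, which should be checked against the distribution's settings; invalidationPaths and the example dates are hypothetical.

```javascript
// Builds CloudFront invalidation paths for specific event dates.
// Assumes the CloudFront path mirrors the S3 key layout
// (events/YYYY-MM-DD/...), with no origin path prefix.
function invalidationPaths(eventDates) {
  const datePattern = /^\d{4}-\d{2}-\d{2}$/;
  return eventDates.flatMap((date) => {
    if (!datePattern.test(date)) {
      throw new Error("expected YYYY-MM-DD, got: " + date);
    }
    return [
      `/events/${date}/index.html`,
      `/events/${date}/assets/form.js`,
    ];
  });
}

// Hypothetical dates for illustration only.
const paths = invalidationPaths(["2025-01-15", "2025-02-01"]);
console.log(paths.join(" "));
```

The printed paths can be passed as separate arguments to --paths in place of "/*".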
Key Decisions: Why This Approach?
- No code changes required: Fixing access control in the Google console is faster and safer than redeploying the script logic.
- URL stability: Redeploying into the same deployment slot preserves the /exec endpoint, avoiding page edits and CDN cache busting.
- Two-endpoint architecture: Maintaining a fallback provides graceful degradation if the primary fails again, but requires coordination to keep both in sync.
- Dark funnel risk accepted (temporarily): Until 403 is resolved, we have no visibility into lost bookings. Monitoring should be added to catch future outages faster.
What's Next
- Monitoring: Add external uptime checks (for example, CloudWatch Synthetics canaries or Datadog synthetic tests) against the Apps Script endpoint to alert on 4xx responses.
- Fallback automation: Implement a scheduled job that validates both endpoints daily and alerts on failures.
- Access control audit: Review who can modify Apps Script deployments and access controls; consider using Identity & Access Management (IAM) to restrict console changes.
- Deployment docs: Document the exact steps to redeploy and the expected access control settings, so this can be done quickly in future incidents.
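The daily endpoint validation could start as small as the sketch below. It is an illustration under assumptions rather than a finished job: findFailingEndpoints, the endpoint names, and the example.invalid URLs are hypothetical, scheduling and alert delivery are left out, and the fetch function is injected so the logic can be exercised without network access (in Node 18+ the global fetch can be passed in).

```javascript
// Checks each endpoint once and returns the ones that did not answer 2xx.
async function findFailingEndpoints(endpoints, fetchImpl) {
  const results = await Promise.all(
    endpoints.map(async ({ name, url }) => {
      try {
        // Follow redirects so the final status is the one that is judged.
        const res = await fetchImpl(url, { redirect: "follow" });
        return { name, url, status: res.status, healthy: res.ok };
      } catch (err) {
        // Network-level failure: treat as unhealthy with no status code.
        return { name, url, status: null, healthy: false, error: String(err) };
      }
    })
  );
  return results.filter((r) => !r.healthy);
}

// Stub reproducing this incident: 403 on primary, 404 on fallback.
const incidentFetch = async (url) => {
  const status = url.includes("PRIMARY") ? 403 : 404;
  return { ok: false, status };
};

findFailingEndpoints(
  [
    { name: "primary", url: "https://example.invalid/PRIMARY/exec" },
    { name: "fallback", url: "https://example.invalid/FALLBACK/exec" },
  ],
  incidentFetch
).then((failures) => {
  for (const f of failures) {
    console.log(`ALERT ${f.name}: status ${f.status}`);
  }
});
```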
Once the primary deployment