
Diagnosing and Staging a Critical Deposit Outage: Apps Script Access Control & Multi-Endpoint Deployment Strategy

During a recent operational incident, we discovered that the deposit/reservation functionality across 11 event pages had silently failed due to misconfigured access controls on our Google Apps Script backend. This post covers the systematic diagnosis, root-cause analysis, and staged remediation we implemented.

The Problem: Dark Funnel, Zero Visibility

Our public-facing event pages (sailjada.com and queenofsandiego.com) all expose a "Reserve" widget that triggers booking and deposit flows. Without monitoring on the frontend button clicks or backend 4xx responses, the failure was invisible until manual testing revealed:

  • Primary endpoint: https://script.google.com/macros/s/.../44Pme8wCA/exec returning 403 Forbidden
  • Fallback endpoint: https://script.google.com/macros/s/.../AFsLWaO3/exec returning 404 Not Found

Every inbound reservation attempt was failing silently. Unlike a visible server error, this meant we had no data on lost bookings—only the knowledge that the funnel was broken.

Root Cause: Apps Script Deployment & Access Control Mismatch

We maintain two Apps Script deployments:

  1. Primary (Production): Project ID 1dDpSK8JZda7XUpKIGlyyAX19KLL4JqFjYVtpcunB5ZE3-NMX_9v0lQJ5, Deployment ID 44Pme8wCA
  2. Fallback (Worship): Separate project, Deployment ID AFsLWaO3

The primary deployment's access had been tightened to a restricted setting (likely during a security audit or permission reconfiguration). Our frontend forms send unauthenticated requests, so every call was rejected with a 403 before the script could run. The fallback deployment had been deleted entirely, which is why its URL returned 404.

Diagnostic Approach: Endpoint Testing & Deployment Inspection

We validated the outage with direct HTTP calls:

curl -i https://script.google.com/macros/s/AKfycbz.../44Pme8wCA/exec

Confirmed 403. We then cross-referenced the live event pages to identify which endpoints they called:

grep -r "script.google.com" /path/to/event-pages/ | head -20

All 11 pages referenced the primary endpoint. We also audited the source repo for any stale clasp configurations or secondary project references:

find ~/Documents/repos \( -name "appsscript.json" -o -name ".clasp.json" \) -print -exec cat {} \;

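This confirmed that the repo tracked a single Apps Script project as the source of truth, while the pages referenced two distinct deployments: the primary (still present but access-restricted) and the fallback (orphaned once its deployment was deleted).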
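
The grep step above can be tightened so that it reports each distinct endpoint and how many times it is referenced, which makes a stray or stale URL easier to spot. This is a minimal sketch: the function name list_endpoints is hypothetical, and the pattern assumes the usual /macros/s/<deployment-id>/exec URL shape.

```shell
#!/usr/bin/env bash
# Sketch only: list_endpoints is a hypothetical helper, not part of our tooling.

# list_endpoints DIR
# Print "<count> <url>" for each distinct Apps Script /exec URL found under DIR,
# most frequently referenced first.
list_endpoints() {
  grep -rhoE 'https://script\.google\.com/macros/s/[A-Za-z0-9_-]+/exec' "$1" \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'   # normalize the padding that uniq -c adds
}

# Usage (path is a placeholder):
#   list_endpoints /path/to/event-pages/
```

A single line of output means every page agrees on one endpoint; more than one line is a prompt to check which deployment each URL belongs to.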

Infrastructure: S3, CloudFront, and Event Page Distribution

Our event pages are distributed via:

  • S3 Bucket: queenofsandiego-assets (region: us-west-2)
  • CloudFront Distribution: ID E2ABC1234XYZ (covers sailjada.com and queenofsandiego.com CNAMEs)
  • Route53 Zone: Hosted zone for queenofsandiego.com with A records pointing to CloudFront

Event page HTML files are stored at paths like:

s3://queenofsandiego-assets/events/YYYY-MM-DD/index.html
s3://queenofsandiego-assets/events/YYYY-MM-DD/assets/form.js

The form.js file contains the hardcoded Apps Script endpoint URL. A CloudFront invalidation is required after any URL change to purge cache.

Staged Remediation: Two-Path Strategy

Rather than immediately modifying live pages, we staged the fix with these steps:

Phase 1: Redeploy with Correct Access Control

Inside the Google Apps Script console:

  1. Navigate to project 1dDpSK8JZda7XUpKIGlyyAX19KLL4JqFjYVtpcunB5ZE3-NMX_9v0lQJ5
  2. Go to Deploy → Manage Deployments
  3. Select the primary deployment (44Pme8wCA) and open it for editing
  4. Change Who has access to "Anyone"
  5. Set Execute as to "Me" (the Google account that owns the deployment, not a service account)
  6. Click Deploy

This reconfigures the existing deployment's permissions without changing the URL, so no page updates are needed.
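
The same settings can also be recorded in the project's appsscript.json manifest, where the web app's access level lives under webapp.access and the console's "Anyone" option corresponds to ANYONE_ANONYMOUS. The sketch below is a rough check of what a repo copy of the manifest declares. The function names are hypothetical, and the manifest only shows what the repo would deploy, not what the live deployment is currently set to.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and use plain text matching,
# not a real JSON parser.

# manifest_access FILE
# Print the value of webapp.access from an appsscript.json manifest.
manifest_access() {
  # Strip whitespace so the manifest becomes a single line, then capture the
  # "access" value that sits inside the "webapp" object.
  tr -d ' \n\t' < "$1" \
    | sed -n 's/.*"webapp":{[^}]*"access":"\([A-Z_]*\)".*/\1/p'
}

# manifest_is_public FILE
# Succeed only if the manifest allows unauthenticated callers.
manifest_is_public() {
  [ "$(manifest_access "$1")" = "ANYONE_ANONYMOUS" ]
}

# Usage (path is a placeholder):
#   manifest_is_public path/to/appsscript.json || echo "web app is not public"
```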

Phase 2: Restore Fallback Deployment (Optional, for Redundancy)

If required, redeploy the fallback endpoint in a separate Apps Script project, then update the fallback URL in form.js. This provides a secondary path if the primary becomes unavailable again.
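
Updating the fallback URL means editing form.js, re-uploading it, and invalidating the cached copy. The following is a sketch under stated assumptions: replace_endpoint and publish_form_js are hypothetical helpers, the old and new URLs are supplied by the caller, and the CloudFront path is assumed to mirror the S3 key.

```shell
#!/usr/bin/env bash
# Sketch only: replace_endpoint and publish_form_js are hypothetical helpers.

# replace_endpoint FILE OLD_URL NEW_URL
# Replace every literal occurrence of OLD_URL in FILE with NEW_URL.
# index/substr are used instead of a regex so URL characters need no escaping.
replace_endpoint() {
  local file=$1 old=$2 new=$3 tmp
  [ -n "$old" ] || return 1
  tmp=$(mktemp) || return 1
  awk -v old="$old" -v new="$new" '
    {
      line = $0; out = ""
      while ((i = index(line, old)) > 0) {
        out = out substr(line, 1, i - 1) new
        line = substr(line, i + length(old))
      }
      print out line
    }' "$file" > "$tmp" && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}

# publish_form_js LOCAL_FILE EVENT_DATE
# Upload the edited file and invalidate only its own path.
# Assumption: the CloudFront path matches the S3 key.
publish_form_js() {
  local file=$1 event_date=$2
  local key="events/$event_date/assets/form.js"
  aws s3 cp "$file" "s3://queenofsandiego-assets/$key" &&
    aws cloudfront create-invalidation \
      --distribution-id E2ABC1234XYZ --paths "/$key"
}
```

Invalidating the single changed path keeps the rest of the cache warm. A wildcard invalidation is the simpler choice when many event pages change at once.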

Phase 3: Verify and Invalidate Cache

After redeploy:

curl -iL https://script.google.com/macros/s/.../44Pme8wCA/exec
# Expected: 200 OK with reservation response (-L follows the redirect Apps Script issues for ContentService output)

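If form.js changed (for example, a new fallback URL from Phase 2), invalidate CloudFront so the updated file is served. A permissions-only fix in Phase 1 leaves the pages untouched and needs no invalidation: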

aws cloudfront create-invalidation --distribution-id E2ABC1234XYZ --paths "/*"

Key Decisions: Why This Approach?

  • No code changes required: Fixing access control in the Google console is faster and safer than redeploying the script logic.
  • URL stability: Editing the existing deployment, rather than creating a new one, preserves the /exec endpoint, avoiding page edits and CDN cache busting.
  • Two-endpoint architecture: Maintaining a fallback provides graceful degradation if the primary fails again, but requires coordination to keep both in sync.
  • Dark funnel risk accepted (temporarily): Until the 403 is resolved, we have no visibility into lost bookings. Monitoring should be added to catch future outages faster.

What's Next

  • Monitoring: Add an external synthetic check (for example, a CloudWatch Synthetics canary or a Datadog synthetic test) that calls the Apps Script endpoint and alerts on 4xx responses.
  • Fallback automation: Implement a scheduled job that validates both endpoints daily and alerts on failures.
  • Access control audit: Review who has edit access to the Apps Script project, since editors can change deployments and their access settings; limit that access to the accounts that need it.
  • Deployment docs: Document the exact steps to redeploy and the expected access control settings, so this can be done quickly in future incidents.
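
The daily endpoint validation could start as small as the sketch below, run from cron or any scheduler. All function names are hypothetical. The sketch treats a redirect that ends on a Google sign-in page as a failure even when the final status is 200, on the assumption that some restricted access settings send unauthenticated callers there instead of returning a 403.

```shell
#!/usr/bin/env bash
# Sketch only: probe, is_healthy and check_endpoints are hypothetical helpers.

# probe URL
# Follow redirects and print "<status> <final-url>".
probe() {
  curl -sS -L -o /dev/null --max-time 20 \
    -w '%{http_code} %{url_effective}' "$1"
}

# is_healthy STATUS FINAL_URL
# Succeed only for a 200 that did not end on a Google sign-in page.
is_healthy() {
  [ "$1" = "200" ] || return 1
  case "$2" in
    https://accounts.google.com/*) return 1 ;;
  esac
  return 0
}

# check_endpoints URL...
# Print one OK/FAIL line per endpoint; return 1 if any endpoint is unhealthy.
check_endpoints() {
  local url result status final failed=0
  for url in "$@"; do
    # A curl failure (DNS, timeout) is reported as status 000.
    result=$(probe "$url") || result="000 $url"
    status=${result%% *}
    final=${result#* }
    if is_healthy "$status" "$final"; then
      echo "OK   $status $url"
    else
      echo "FAIL $status $url"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (URLs are placeholders for the primary and fallback /exec endpoints):
#   check_endpoints "$PRIMARY_URL" "$FALLBACK_URL" || echo "alert: endpoint down"
```

Because check_endpoints exits non-zero on any failure, the scheduler's own failure notification can serve as the first alert channel until a proper monitor is in place.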

Once the primary deployment