
Building a Comprehensive Infrastructure Snapshot: v1.0 Disaster Recovery for Multi-Site JADA Ecosystem

What Was Done

We executed a full-stack infrastructure snapshot of the JADA ecosystem—three production sites (queenofsandiego.com, sailjada.com, salejada.com) plus all associated AWS resources, Google Apps Script projects, and local configuration files. This snapshot, designated v1.0, captures the complete state of infrastructure, code, and configuration as insurance against future regression incidents.

Technical Details: The Snapshot Architecture

Multi-Layer Snapshot Strategy

Rather than a single monolithic backup, we implemented a layered approach targeting distinct failure domains:

  • Compute Layer: Lightsail instance snapshot (`jada-agent-v1.0-20260509`) capturing the entire base image
  • Storage Layer: S3 bucket synchronization across 45 buckets (production, staging, archive, and specialized buckets)
  • Code Layer: Google Apps Script projects exported via clasp (main JADA GAS, Rady Shell replacement, Rady Shell legacy, EYD project)
  • Configuration Layer: CloudFront distributions, Route53 hosted zones, Lambda function code + environment variables, DynamoDB table schemas
  • Local State Layer: Site configuration files, build artifacts, handoff documents, LaunchAgent definitions, and secrets manifest
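
The layers above land in a single snapshot tree on disk. The exact layout isn't documented here, but a structure along these lines would be consistent with the `./snapshot/v1.0/...` paths used for the Apps Script exports below (directory names are illustrative):

# One directory per layer; the Lightsail layer stays in AWS as a native snapshot rather than on disk
mkdir -p ./snapshot/v1.0/{s3,gas,cloudfront,route53,lambda,dynamodb,local}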

S3 Bucket Inventory

The snapshot synchronized all 45 JADA-related S3 buckets, organized by function:

  • Production origins: queenofsandiego-prod, sailjada-prod, salejada-prod
  • Staging mirrors: queenofsandiego-staging, sailjada-staging, salejada-staging
  • CloudFront origins: Dedicated buckets for each distribution
  • Archive/backup: Historical versions and disaster recovery buckets
  • Application data: Lambda-generated content, DynamoDB exports, SES logs

The sync strategy used batch processing to avoid CloudFront rate limits: batch A and batch B synchronized in parallel, with progress monitored via `aws s3 ls` and CloudFront cache invalidation deferred until all data had landed.
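
A minimal sketch of that batching pattern, assuming batch A covers the three production origins; the destination bucket `jada-snapshot-v1-0`, the distribution ID, and the batch grouping are placeholders, since the source doesn't spell them out:

# Batch A: sync production origins as parallel background jobs (batch B follows the same pattern)
for bucket in queenofsandiego-prod sailjada-prod salejada-prod; do
  aws s3 sync "s3://$bucket" "s3://jada-snapshot-v1-0/$bucket/" &
done
wait    # every sync must finish before any cache invalidation is issued

# Spot-check how much data has landed
aws s3 ls s3://jada-snapshot-v1-0/ --recursive --summarize | tail -n 2

# Deferred step: invalidate any affected distribution only after all batches complete
aws cloudfront create-invalidation --distribution-id E1EXAMPLE --paths "/*"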

Infrastructure Components Captured

CloudFront Configuration

All 41 CloudFront distributions were exported with their exact configurations:

  • Origin configurations (S3, custom domain, API Gateway origins)
  • Behavior rules and path patterns
  • Cache policies and origin request policies
  • Lambda@Edge function associations
  • WAF rules and geo-blocking configurations
  • Custom domain aliases and ACM certificate bindings

This granularity matters because CloudFront distribution IDs and behavior routing are often the source of invisible bugs—a misconfigured path pattern can silently route traffic to the wrong origin.
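
The export mechanism isn't shown above; one plausible approach with the AWS CLI, assuming the `./snapshot/v1.0/cloudfront/` output path, is:

# Save each distribution's full configuration (the response carries the ETag needed for later updates)
for id in $(aws cloudfront list-distributions --query "DistributionList.Items[].Id" --output text); do
  aws cloudfront get-distribution-config --id "$id" > "./snapshot/v1.0/cloudfront/$id.json"
done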

Route53 Hosted Zones

We exported 11 Route53 hosted zones containing:

  • Apex domain records (A, AAAA, MX, TXT, SPF, DKIM)
  • CloudFront alias records and their associated distribution IDs
  • API Gateway custom domain mappings
  • Health check configurations and failover routing policies
  • Traffic policy definitions (if used)

Route53 configuration is critical because DNS changes propagate asynchronously; capturing exact TTL values, routing policies, and health check thresholds allows us to reproduce routing behavior identically.
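
A sketch of how those zone exports could be pulled with the AWS CLI (output paths assumed):

# Dump every record set, including alias targets, TTLs, and routing policies
for zone in $(aws route53 list-hosted-zones --query "HostedZones[].Id" --output text); do
  zone_id="${zone##*/}"    # strip the "/hostedzone/" prefix from the returned ID
  aws route53 list-resource-record-sets --hosted-zone-id "$zone_id" > "./snapshot/v1.0/route53/$zone_id.json"
done
aws route53 list-health-checks > ./snapshot/v1.0/route53/health-checks.json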

Lambda Functions

We exported code and configuration for 21 Lambda functions across all JADA projects:

  • Function source code (from Lambda console or via AWS SDK)
  • Environment variables (excluding secrets stored in Secrets Manager)
  • IAM execution role policies
  • Layer dependencies and runtimes
  • Event source mappings (API Gateway, S3, DynamoDB Streams, SNS, SQS)
  • Concurrency settings and timeout configurations

Environment variables are especially important—Lambda behavior often diverges between environments due to missing or incorrect env vars. By capturing these, we can quickly diagnose "works locally, fails in Lambda" scenarios.
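
A sketch of the per-function export, with output paths assumed; event source mappings would be captured separately via `aws lambda list-event-source-mappings`:

for fn in $(aws lambda list-functions --query "Functions[].FunctionName" --output text); do
  # Configuration: env vars, execution role ARN, layers, runtime, timeout, concurrency
  aws lambda get-function-configuration --function-name "$fn" > "./snapshot/v1.0/lambda/$fn-config.json"
  # Code.Location is a short-lived pre-signed URL to the deployment package
  pkg_url=$(aws lambda get-function --function-name "$fn" --query "Code.Location" --output text)
  curl -s "$pkg_url" -o "./snapshot/v1.0/lambda/$fn.zip"
done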

Google Apps Script Projects

Using `clasp pull`, we extracted all GAS project code into the snapshot:

clasp pull --rootDir ./snapshot/v1.0/gas/main-jada
clasp pull --rootDir ./snapshot/v1.0/gas/rady-replacement
clasp pull --rootDir ./snapshot/v1.0/gas/rady-old
clasp pull --rootDir ./snapshot/v1.0/gas/eyd-project

GAS project code lives on Google's servers rather than in a local repository, so clasp is the practical way to create offline backups. The pull captures all `.gs` files, `appsscript.json` manifests, and bound script configurations.

DynamoDB Tables

We scanned and documented 14 DynamoDB tables:

  • Table schemas (partition keys, sort keys, attributes)
  • Global and local secondary indexes
  • Billing mode (on-demand vs. provisioned)
  • Stream specifications (needed for Lambda triggers)
  • TTL configurations
  • Point-in-time recovery settings

For full recovery, we also exported table data as JSON, though this wasn't included in the primary snapshot due to size constraints—a separate process handles DynamoDB backups to S3.
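
A sketch of the schema documentation step (table data export, as noted, runs as a separate process); file names are illustrative:

for table in $(aws dynamodb list-tables --query "TableNames[]" --output text); do
  # Keys, attributes, secondary indexes, billing mode, and stream specification
  aws dynamodb describe-table --table-name "$table" > "./snapshot/v1.0/dynamodb/$table-schema.json"
  # TTL configuration is reported by a separate call
  aws dynamodb describe-time-to-live --table-name "$table" > "./snapshot/v1.0/dynamodb/$table-ttl.json"
done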

Key Decisions: Why This Approach

Parallel Processing Over Sequential Backups

Four background agents ran simultaneously to minimize snapshot duration. Sequential processing would have taken 2-3 hours; parallel processing completed in ~40 minutes. The tradeoff: temporary increase in AWS API rate consumption, but well within service quotas.
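
The four agents can be modeled as ordinary background jobs; the script names below are hypothetical stand-ins for each agent's workload:

# Launch one job per snapshot layer, then block until all four finish
./snapshot-s3.sh         > logs/s3.log         2>&1 &
./snapshot-cloudfront.sh > logs/cloudfront.log 2>&1 &
./snapshot-lambda.sh     > logs/lambda.log     2>&1 &
./snapshot-gas.sh        > logs/gas.log        2>&1 &
wait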

Lightsail Instance Snapshot as Foundation

Instead of attempting to reconstruct the development environment from individual components, we snapshotted the Lightsail instance first. This captures:

  • OS configuration and installed packages
  • Local project files and git histories
  • Build tool configurations (.env files, config files)
  • SSH keys and LaunchAgent definitions

A Lightsail snapshot is faster to restore than rebuilding the environment piece-by-piece—you get a working machine in minutes, not hours.
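
Creating and later restoring the snapshot via the CLI looks roughly like this; only the snapshot name comes from the source, while the instance name, availability zone, and bundle are assumptions:

# Create the base-image snapshot
aws lightsail create-instance-snapshot \
  --instance-name jada-agent \
  --instance-snapshot-name jada-agent-v1.0-20260509

# Restore by launching a fresh instance from the snapshot
aws lightsail create-instances-from-snapshot \
  --instance-snapshot-name jada-agent-v1.0-20260509 \
  --instance-names jada-agent-restore \
  --availability-zone us-west-2a \
  --bundle-id medium_2_0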

Staging Buckets as "Live" Backups

For the three primary sites, we synchronized production buckets into dedicated staging buckets:

aws s3 sync s3://queenofsandiego-prod s3://queenofsandiego-staging --delete
aws s3 sync s3://sailjada-prod s3://sailjada-staging --delete
aws s3 sync s3://salejada-prod s3://salejada-staging --delete

The `--delete` flag ensures staging is an exact mirror. This serves dual purposes: (1) it's an immediate rollback target if production is corrupted, and (2) it allows testing changes in production-like conditions without risking the live site.

Manifest-