
Building a Comprehensive Infrastructure Snapshot: Lessons from a Multi-Service Rollback Recovery

When unexpected changes ripple through a production environment, affecting three interconnected sites—queenofsandiego.com, sailjada.com, and salejada.com—the ability to quickly snapshot and restore becomes critical infrastructure insurance. This post details the technical approach taken to create a v1.0 snapshot encompassing 45 S3 buckets, 66 CloudFront distributions, 21 Lambda functions, multiple Google Apps Script projects, and local tooling across a complex JADA ecosystem.

What Was Done

A complete infrastructure snapshot was created to preserve state across multiple service layers. This wasn't a single backup—it was a layered capture strategy targeting different failure domains:

  • Lightsail instance snapshot: Full system state of the primary compute instance
  • S3 bucket synchronization: 45 distinct buckets of content spanning three sites and supporting infrastructure
  • Lambda function exports: Code, environment variables, configuration, and layer information for 21 functions
  • Infrastructure-as-Code exports: CloudFront distributions, Route53 DNS configurations, DynamoDB table schemas, API Gateway configurations
  • Google Apps Script projects: Four GAS projects supporting JADA workflows (main JADA, Rady Shell main, Rady Shell legacy, EYD)
  • Local development tooling: Python deployment scripts (update_dashboard.py, release.py), configuration files, and documentation

Technical Architecture and Parallel Strategy

Given the scale of resources, a sequential approach would have consumed hours. Instead, four background agents were launched in parallel to handle distinct concerns:

Agent 1 (S3 Sync):
  - Task: aws s3 sync for all 45 buckets
  - Target: /snapshot/v1.0/s3-buckets/
  - Status tracking: Batch A and Batch B parallel execution

Agent 2 (Lambda Export):
  - Task: aws lambda get-function for all 21 functions
  - Capture: Function code, configuration, environment variables, layers
  - Target: /snapshot/v1.0/lambda-functions/[function-name]/

Agent 3 (AWS Configuration Export):
  - Task: CloudFront distributions (41 found across all zones)
  - Task: Route53 hosted zones (11 zones)
  - Task: DynamoDB table schemas (14 tables scanned)
  - Task: API Gateway, SES, ACM certificate inventory
  - Target: /snapshot/v1.0/aws-configs/

Agent 4 (Local Files and GAS):
  - Task: Copy site repositories and development files
  - Task: Clasp pull from all four Google Apps Script projects
  - Task: Archive LaunchAgents, secrets manifest, documentation
  - Target: /snapshot/v1.0/local-files/ and /snapshot/v1.0/gas-projects/

This parallel approach reduced total snapshot time from an estimated 4+ hours to approximately 45 minutes; the Lightsail instance snapshot (AWS-managed, ~15 minutes) ran concurrently alongside the agents.
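The four-agent launch pattern above can be sketched as follows. This is a minimal illustration, not the actual agent implementation: the agent names and the placeholder `echo` commands are assumptions standing in for the real sync and export work.

```python
# Sketch of the parallel-agent pattern: each "agent" is a shell command
# executed in its own worker thread. Commands here are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

AGENT_COMMANDS = {
    "s3-sync": "echo sync-all-buckets",           # stand-in for the S3 batch sync
    "lambda-export": "echo export-21-functions",  # stand-in for Lambda exports
    "aws-configs": "echo export-aws-configs",     # stand-in for config dumps
    "local-and-gas": "echo copy-local-and-gas",   # stand-in for local/GAS capture
}

def run_agent(name, cmd):
    """Run one agent's command; return its name and exit code."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return name, result.returncode

def run_all(commands):
    """Launch every agent concurrently and collect exit codes by name."""
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = [pool.submit(run_agent, n, c) for n, c in commands.items()]
        return dict(f.result() for f in futures)

if __name__ == "__main__":
    print(run_all(AGENT_COMMANDS))
```

Because the agents are I/O-bound (network transfers, not CPU work), threads are sufficient here; the wall-clock time collapses to roughly the slowest single agent.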

S3 Bucket Inventory and Organization

The 45 JADA-related buckets were organized into logical categories within the snapshot:

  • Production site buckets: Content distribution for queenofsandiego.com, sailjada.com, salejada.com
  • Staging buckets: Dedicated staging copies, either under a _staging path within production buckets or as separate CloudFront origins
  • Media and asset buckets: Product images, user uploads, archive materials
  • CloudFront origin buckets: Cache sources for CDN distributions
  • Lambda function source buckets: Deployment packages and layer storage
  • Operational buckets: Logs, monitoring data, temporary processing

Bucket syncing used exclude flags to skip large log directories and prior snapshot artifacts, reducing bandwidth and sync time:

aws s3 sync s3://bucket-name /snapshot/v1.0/s3-buckets/bucket-name \
  --exclude "logs/*" \
  --exclude "previous-snapshots/*" \
  --exclude ".git/*"
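Repeating that command by hand across 45 buckets is error-prone, so command generation can be scripted. The sketch below builds the argv list for each bucket; the bucket name shown is a placeholder, and the snapshot path matches the layout described above.

```python
# Build per-bucket `aws s3 sync` commands with the exclude flags shown
# above. Bucket names are illustrative placeholders.
import shlex

EXCLUDES = ["logs/*", "previous-snapshots/*", ".git/*"]
SNAPSHOT_ROOT = "/snapshot/v1.0/s3-buckets"

def sync_command(bucket):
    """Return the argv list for syncing one bucket into the snapshot tree."""
    cmd = ["aws", "s3", "sync", f"s3://{bucket}", f"{SNAPSHOT_ROOT}/{bucket}"]
    for pattern in EXCLUDES:
        cmd += ["--exclude", pattern]
    return cmd

print(shlex.join(sync_command("example-bucket")))
```

Emitting argv lists (rather than shell strings) avoids quoting bugs when bucket names or patterns contain shell metacharacters.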

Google Apps Script Project Preservation

Four distinct GAS projects were pulled using Clasp and archived:

  • Main JADA GAS: Core workflow automation and data processing
  • Rady Shell (Current): Active version of Rady school shell scripts
  • Rady Shell (Legacy): Previous implementation for historical reference and potential rollback
  • EYD GAS: Specialized project for EYD workflows

Each project was captured with:

cd [project-directory]  # .clasp.json here holds the script ID
clasp pull
# Captures: appsscript.json, all .gs files, manifest structure
# Stored in: /snapshot/v1.0/gas-projects/[project-name]/
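Pulling all four projects can be planned with a short loop. A minimal sketch, assuming illustrative directory names (the real project directories may differ); each entry pairs a working directory with the `clasp pull` invocation a runner would execute there.

```python
# Sketch: plan a `clasp pull` for each GAS project directory. clasp reads
# the script ID from .clasp.json in the working directory, so the plan
# pairs each directory with the same two-word command.
from pathlib import Path

GAS_ROOT = Path("/snapshot/v1.0/gas-projects")
PROJECTS = ["jada-main", "rady-shell-current", "rady-shell-legacy", "eyd"]  # illustrative names

def pull_plan(root, projects):
    """Return (workdir, argv) pairs; a runner executes each with cwd=workdir."""
    return [(root / name, ["clasp", "pull"]) for name in projects]

for workdir, argv in pull_plan(GAS_ROOT, PROJECTS):
    print(workdir, " ".join(argv))
```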

Key Infrastructure Details Captured

CloudFront Distributions: 66 distributions across multiple origins, with review confirming correct origin mapping between production and staging distributions. Cache invalidation patterns were documented for typical deployment workflows.

Route53 Configuration: 11 hosted zones with DNS records pointing to CloudFront distributions, S3 website endpoints, and API Gateway custom domains. Zone file exports enabled off-site DNS restoration.
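The zone-file export step can be sketched by flattening a Route53 `list-resource-record-sets` response into zone-file-style lines. The record values below are illustrative samples, not the actual DNS data; the `AliasTarget`/`ResourceRecords` keys follow the real Route53 API response shape.

```python
# Sketch: flatten a Route53 list-resource-record-sets response into
# zone-file-style lines for off-site archival. Sample records are
# illustrative.
def to_zone_lines(record_sets):
    lines = []
    for rr in record_sets:
        if "AliasTarget" in rr:
            # Alias records carry no TTL/values; note the target instead.
            lines.append(f'{rr["Name"]} ALIAS {rr["Type"]} {rr["AliasTarget"]["DNSName"]}')
        else:
            for v in rr.get("ResourceRecords", []):
                lines.append(f'{rr["Name"]} {rr["TTL"]} IN {rr["Type"]} {v["Value"]}')
    return lines

sample = [
    {"Name": "queenofsandiego.com.", "Type": "A",
     "AliasTarget": {"DNSName": "d111abc.cloudfront.net."}},
    {"Name": "www.queenofsandiego.com.", "Type": "CNAME", "TTL": 300,
     "ResourceRecords": [{"Value": "queenofsandiego.com."}]},
]
print("\n".join(to_zone_lines(sample)))
```

Alias records are Route53-specific and would need to be re-entered as aliases on restore, which is why they are flagged distinctly rather than serialized as standard records.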

Lambda Environment Variables: Captured encrypted environment configuration without exposing secrets. A manifest was created documenting which Lambda functions depend on specific environment variables, enabling downstream recovery procedures.
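The dependency manifest can be built by recording only the environment variable *names* per function, so secret values never land in the snapshot. A sketch, assuming `get-function-configuration`-shaped input; the function names and variables in the sample are hypothetical.

```python
# Sketch: manifest of which env-var names each Lambda function uses,
# without recording values (so secrets stay out of the snapshot).
import json

def env_manifest(configs):
    """configs: list of get-function-configuration-style dicts."""
    return {
        c["FunctionName"]: sorted(c.get("Environment", {}).get("Variables", {}))
        for c in configs
    }

sample = [
    {"FunctionName": "jada-orders",  # hypothetical function
     "Environment": {"Variables": {"TABLE_NAME": "x", "API_KEY": "secret"}}},
    {"FunctionName": "jada-cron"},   # function with no environment block
]
print(json.dumps(env_manifest(sample), indent=2))
```

During recovery, the manifest tells the operator exactly which variables each redeployed function must be re-provisioned with, without the snapshot itself being sensitive material.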

DynamoDB Tables: 14 tables identified with schemas exported. This enables table recreation if needed, with understanding that data would need separate restore procedures from AWS Backup or point-in-time recovery.
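Schema-only recreation can be sketched by reducing a `describe-table` response to the arguments a create-table call needs. The table name and keys in the sample are hypothetical; the response keys (`KeySchema`, `AttributeDefinitions`, `BillingModeSummary`) follow the real DynamoDB API shape.

```python
# Sketch: reduce a DynamoDB describe-table response to the arguments
# needed to recreate the table. Schema only -- data needs a separate
# restore path (AWS Backup or point-in-time recovery).
def recreate_args(desc):
    t = desc["Table"]
    return {
        "TableName": t["TableName"],
        "KeySchema": t["KeySchema"],
        "AttributeDefinitions": t["AttributeDefinitions"],
        "BillingMode": t.get("BillingModeSummary", {}).get("BillingMode", "PROVISIONED"),
    }

sample = {"Table": {  # hypothetical table
    "TableName": "jada-orders",
    "KeySchema": [{"AttributeName": "order_id", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "order_id", "AttributeType": "S"}],
    "BillingModeSummary": {"BillingMode": "PAY_PER_REQUEST"},
}}
print(recreate_args(sample))
```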

Staging Synchronization Verification

During snapshot creation, a critical verification step ensured staging buckets were synchronized with production. This included:

  • Comparing file counts between production and staging buckets for queenofsandiego.com
  • Validating dedicated staging bucket contents (e.g., bobdylan bucket staging paths)
  • Checking CloudFront staging origin configurations
  • Invalidating staging CloudFront caches to ensure fresh content delivery

Deployment Scripts and Tooling

Critical deployment tools were included in the snapshot:

  • /Users/cb/Documents/repos/tools/update_dashboard.py: Dashboard synchronization and update logic
  • /Users/cb/Documents/repos/tools/release.py: Release automation script handling version management and deployment
  • Memory documents: Workflow tracking and decision logs

What's Next: Recovery Procedures

With v1.0 snapshot complete, the next phase involves documenting recovery procedures for each service layer. This includes:

  • Step-by-step Lambda function redeployment from snapshot code
  • S3 bucket restoration and cache invalidation workflows
  • Route53 DNS restoration procedures
  • GAS project redeployment and version rollback techniques