Building a Comprehensive Infrastructure Snapshot: Lessons from a Multi-Service Rollback Recovery

When unexpected changes ripple through a production environment affecting three interconnected sites (queenofsandiego.com, sailjada.com, and salejada.com), the ability to snapshot and restore quickly becomes critical insurance. This post details the technical approach taken to create a v1.0 snapshot encompassing 46 S3 buckets, 66 CloudFront distributions, 21 Lambda functions, four Google Apps Script projects, and local tooling across the JADA ecosystem.

What Was Done

A complete infrastructure snapshot was created to preserve state across multiple service layers. This was not a single backup but a layered capture strategy, with each layer targeting a different failure domain:

Lightsail instance snapshot: Full system state of the primary compute instance
S3 bucket synchronization: 45 distinct buckets holding content for the three sites and their supporting infrastructure
Lambda function exports: Code, environment variables, configuration, and layer information for 21 functions
Infrastructure-as-Code exports: CloudFront distributions, Route53 DNS configurations, DynamoDB table schemas, API Gateway configurations
Google Apps Script projects: Four GAS projects supporting JADA workflows (main JADA, Rady Shell main, Rady Shell legacy, EYD)
Local development tooling: Python deployment scripts (update_dashboard.py, release.py), configuration files, and documentation

Technical Architecture and Parallel Strategy

Given the number of resources, capturing them one after another would take hours. Instead, four background agents were launched in parallel, each handling a distinct concern:

Agent 1 (S3 Sync):
  - Task: aws s3 sync for all 45 buckets
  - Target: /snapshot/v1.0/s3-buckets/
  - Status tracking: Batch A and Batch B parallel execution

Agent 2 (Lambda Export):
  - Task: aws lambda get-function for all 21 functions
  - Capture: Function code, configuration, environment variables, layers
  - Target: /snapshot/v1.0/lambda-functions/[function-name]/

Agent 3 (AWS Configuration Export):
  - Task: CloudFront distributions (41 found across all zones)
  - Task: Route53 hosted zones (11 zones)
  - Task: DynamoDB table schemas (14 tables scanned)
  - Task: API Gateway, SES, ACM certificate inventory
  - Target: /snapshot/v1.0/aws-configs/

Agent 4 (Local Files and GAS):
  - Task: Copy site repositories and development files
  - Task: Clasp pull from all four Google Apps Script projects
  - Task: Archive LaunchAgents, secrets manifest, documentation
  - Target: /snapshot/v1.0/local-files/ and /snapshot/v1.0/gas-projects/

This parallel approach reduced total snapshot time from an estimated 4+ hours to approximately 45 minutes, with the Lightsail instance snapshot (AWS-managed, ~15 minutes) forming the longest single task.
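The fan-out described above can be sketched with Python's standard `concurrent.futures`. This is a minimal illustration, not the tooling actually used: the four task functions are hypothetical placeholders that only return a label, standing in for the real S3, Lambda, AWS configuration, and local/GAS work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for the four agents. In a real run each would shell
# out to the AWS CLI or clasp; here they only return a status label.
def sync_s3():
    return "s3-buckets"

def export_lambdas():
    return "lambda-functions"

def export_aws_configs():
    return "aws-configs"

def copy_local_and_gas():
    return "local-files+gas-projects"

def run_agents(tasks):
    """Run every task concurrently and return {task name: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure so one broken agent does not hide the others.
                results[name] = exc
    return results

AGENTS = {
    "agent-1": sync_s3,
    "agent-2": export_lambdas,
    "agent-3": export_aws_configs,
    "agent-4": copy_local_and_gas,
}

if __name__ == "__main__":
    for name, outcome in sorted(run_agents(AGENTS).items()):
        print(name, "->", outcome)
```

Threads are a reasonable fit here because each agent spends most of its time waiting on network transfers or subprocesses rather than on Python computation.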

S3 Bucket Inventory and Organization

The 45 JADA-related buckets were organized into logical categories within the snapshot:

Production site buckets: Content distribution for queenofsandiego.com, sailjada.com, salejada.com
Staging buckets: Staging copies kept either under a _staging-suffixed path inside production buckets or behind separate CloudFront origins
Media and asset buckets: Product images, user uploads, archive materials
CloudFront origin buckets: Cache sources for CDN distributions
Lambda function source buckets: Deployment packages and layer storage
Operational buckets: Logs, monitoring data, temporary processing

Each sync used --exclude filters to skip large log files, earlier snapshots, and repository metadata, which reduced both transfer time and bandwidth:

aws s3 sync s3://bucket-name /snapshot/v1.0/s3-buckets/bucket-name \
  --exclude "logs/*" \
  --exclude "previous-snapshots/*" \
  --exclude ".git/*"
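To apply the same command to every bucket, the argument list can be generated rather than typed by hand. The following is a sketch under stated assumptions: the bucket names are hypothetical, and `--dryrun` is added so the output can be reviewed before anything is downloaded.

```python
import shlex

SNAPSHOT_ROOT = "/snapshot/v1.0/s3-buckets"
EXCLUDES = ["logs/*", "previous-snapshots/*", ".git/*"]

def build_sync_command(bucket, root=SNAPSHOT_ROOT, excludes=EXCLUDES, dry_run=False):
    """Return the argv list that syncs one bucket into the snapshot tree."""
    cmd = ["aws", "s3", "sync", f"s3://{bucket}", f"{root}/{bucket}"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    if dry_run:
        # --dryrun lists the operations without performing them.
        cmd.append("--dryrun")
    return cmd

if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    for bucket in ["example-site-bucket", "example-media-bucket"]:
        print(shlex.join(build_sync_command(bucket, dry_run=True)))
```

Returning a list instead of a string keeps the patterns safe to pass to `subprocess.run` without shell quoting issues.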

Google Apps Script Project Preservation

Four distinct GAS projects were pulled using Clasp and archived:

Main JADA GAS: Core workflow automation and data processing
Rady Shell (Current): Active version of Rady school shell scripts
Rady Shell (Legacy): Previous implementation for historical reference and potential rollback
EYD GAS: Specialized project for EYD workflows

Each project was captured with:

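# clasp pull takes no project ID; it reads scriptId from the .clasp.json in
# the current directory (created beforehand with: clasp clone [script-id])
cd /snapshot/v1.0/gas-projects/[project-name]/ && clasp pull
# Captures: appsscript.json and all script files (.gs, written locally as .js by default)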
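Because clasp identifies a project through the `.clasp.json` file in its working directory, one way to script the pull for several projects is to prepare a directory per project first. This is a minimal sketch: the project name and script ID below are hypothetical, and the function only prepares the directory and returns the command rather than running clasp.

```python
import json
import tempfile
from pathlib import Path

def prepare_clasp_project(gas_root, project_name, script_id):
    """Create the project directory, write .clasp.json, and return (dir, argv)."""
    project_dir = Path(gas_root) / project_name
    project_dir.mkdir(parents=True, exist_ok=True)
    config = {"scriptId": script_id}
    (project_dir / ".clasp.json").write_text(json.dumps(config, indent=2))
    # The caller is expected to run argv with cwd=project_dir, for example:
    #   subprocess.run(argv, cwd=project_dir, check=True)
    return project_dir, ["clasp", "pull"]

if __name__ == "__main__":
    # A temporary directory stands in for /snapshot/v1.0/gas-projects/.
    with tempfile.TemporaryDirectory() as root:
        project_dir, argv = prepare_clasp_project(
            root, "example-project", "HYPOTHETICAL_SCRIPT_ID"
        )
        print(project_dir, argv)
        print((project_dir / ".clasp.json").read_text())
```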

Key Infrastructure Details Captured

CloudFront Distributions: 66 distributions across multiple origins. A review of the staging setup confirmed that production and staging distributions each map to the correct origin. Cache invalidation patterns used in typical deployment workflows were documented.

Route53 Configuration: 11 hosted zones with DNS records pointing to CloudFront distributions, S3 website endpoints, and API Gateway custom domains. Zone file exports enabled off-site DNS restoration.
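To illustrate how an exported zone could be replayed, the sketch below converts the `ResourceRecordSets` list returned by `aws route53 list-resource-record-sets` into a change batch suitable for `aws route53 change-resource-record-sets --change-batch file://...`. The function name and the sample zone are hypothetical. It assumes the zone is being restored into a hosted zone that already exists, so the apex NS and SOA records are skipped.

```python
import json

def build_restore_batch(record_sets, zone_name):
    """Turn exported ResourceRecordSets into an UPSERT change batch.

    The apex NS and SOA records are skipped: Route53 creates them with every
    hosted zone, and a recreated zone receives its own values.
    """
    apex = zone_name if zone_name.endswith(".") else zone_name + "."
    changes = []
    for rrset in record_sets:
        if rrset["Name"] == apex and rrset["Type"] in ("NS", "SOA"):
            continue
        changes.append({"Action": "UPSERT", "ResourceRecordSet": rrset})
    return {"Comment": f"Restore {apex} from snapshot", "Changes": changes}

if __name__ == "__main__":
    # Hypothetical export for an example zone.
    exported = [
        {"Name": "example.com.", "Type": "NS", "TTL": 172800,
         "ResourceRecords": [{"Value": "ns-1.example.net."}]},
        {"Name": "www.example.com.", "Type": "CNAME", "TTL": 300,
         "ResourceRecords": [{"Value": "d111111abcdef8.cloudfront.net"}]},
    ]
    print(json.dumps(build_restore_batch(exported, "example.com"), indent=2))
```

Route53 caps the size of a single change batch, so a large zone may need its changes split across several requests.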

Lambda Environment Variables: Environment configuration was captured in encrypted form so that secrets are not exposed in the snapshot. A manifest documents which Lambda functions depend on which environment variables, which supports downstream recovery procedures.
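A dependency manifest of this kind can be derived from the JSON returned by `aws lambda get-function-configuration`. The sketch below is one possible shape, not the manifest format actually used: it records variable names only and discards the values, and the function names and variables in the example are hypothetical.

```python
def build_env_manifest(function_configs):
    """Map each function name to the sorted names of its environment variables.

    Only variable names are kept; values are dropped so the manifest can sit
    next to the snapshot without carrying secrets.
    """
    manifest = {}
    for config in function_configs:
        # Functions without environment variables have no "Environment" key.
        variables = config.get("Environment", {}).get("Variables", {})
        manifest[config["FunctionName"]] = sorted(variables)
    return manifest

def functions_using(manifest, variable_name):
    """Return the functions that depend on a given environment variable."""
    return sorted(name for name, keys in manifest.items() if variable_name in keys)

if __name__ == "__main__":
    # Hypothetical configurations, shaped like get-function-configuration output.
    configs = [
        {"FunctionName": "example-orders",
         "Environment": {"Variables": {"TABLE_NAME": "x", "API_KEY": "y"}}},
        {"FunctionName": "example-mailer",
         "Environment": {"Variables": {"API_KEY": "z"}}},
        {"FunctionName": "example-noop"},
    ]
    manifest = build_env_manifest(configs)
    print(manifest)
    print(functions_using(manifest, "API_KEY"))
```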

DynamoDB Tables: 14 tables were identified and their schemas exported. The schemas allow each table to be recreated if needed, with the understanding that the data itself must be restored separately, through AWS Backup or point-in-time recovery.
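An exported schema is not directly reusable, because `aws dynamodb describe-table` returns read-only fields (status, item counts, ARNs) that `create-table` rejects. The sketch below shows one way to reduce a description to the accepted fields. It is a simplification under stated assumptions: local secondary indexes, streams, TTL, and tags are ignored, and the function names are hypothetical.

```python
def _throughput(described):
    """Keep only the two capacity fields that create-table accepts."""
    return {
        "ReadCapacityUnits": described["ReadCapacityUnits"],
        "WriteCapacityUnits": described["WriteCapacityUnits"],
    }

def create_table_args(description):
    """Reduce describe-table output to arguments accepted by create-table."""
    table = description["Table"]
    args = {
        "TableName": table["TableName"],
        "AttributeDefinitions": table["AttributeDefinitions"],
        "KeySchema": table["KeySchema"],
    }
    billing = table.get("BillingModeSummary", {}).get("BillingMode")
    on_demand = billing == "PAY_PER_REQUEST"
    if on_demand:
        args["BillingMode"] = "PAY_PER_REQUEST"
    else:
        args["BillingMode"] = "PROVISIONED"
        args["ProvisionedThroughput"] = _throughput(table["ProvisionedThroughput"])

    indexes = []
    for gsi in table.get("GlobalSecondaryIndexes", []):
        index = {key: gsi[key] for key in ("IndexName", "KeySchema", "Projection")}
        if not on_demand:
            index["ProvisionedThroughput"] = _throughput(gsi["ProvisionedThroughput"])
        indexes.append(index)
    if indexes:
        args["GlobalSecondaryIndexes"] = indexes
    return args
```

The resulting dictionary can be written to a JSON file and passed to `aws dynamodb create-table --cli-input-json file://...`.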

Staging Synchronization Verification

During snapshot creation, a critical verification step ensured staging buckets were synchronized with production. This included:

Comparing file counts between production and staging buckets for queenofsandiego.com
Validating dedicated staging bucket contents (e.g., bobdylan bucket staging paths)
Checking CloudFront staging origin configurations
Invalidating staging CloudFront caches to ensure fresh content delivery

Deployment Scripts and Tooling

Critical deployment tools were included in the snapshot:

/Users/cb/Documents/repos/tools/update_dashboard.py: Dashboard synchronization and update logic
/Users/cb/Documents/repos/tools/release.py: Release automation script handling version management and deployment
Memory documents: Workflow tracking and decision logs

What's Next: Recovery Procedures

With the v1.0 snapshot complete, the next phase is to document recovery procedures for each service layer. This includes:

Step-by-step Lambda function redeployment from snapshot code
S3 bucket restoration and cache invalidation workflows
Route53 DNS restoration procedures
GAS project redeployment and version rollback techniques