Building a Comprehensive Infrastructure Snapshot: Lessons from a Full-Stack Rollback Recovery
When unexpected infrastructure changes roll back weeks of work, the response needs to be systematic and complete. This post documents the technical approach used to create a comprehensive v1.0 snapshot of a distributed three-site platform spanning AWS S3, CloudFront, Lambda, Route53, Google Apps Script, and local tooling infrastructure.
The Problem: Scope of Loss
The original incident affected three interconnected sites:
- queenofsandiego.com — primary site with event pages, brand styling, navigation
- sailjada.com — product catalog and listing pages
- salejada.com — secondary product domain
Work was lost across multiple systems simultaneously: S3 bucket contents, CloudFront cache states, Google Apps Script project versions, and local development files. A point-in-time snapshot across all systems became critical.
Infrastructure Inventory: What Had to Be Captured
Before snapshotting, we enumerated the full infrastructure footprint (a counting sketch follows this list):
- S3 Buckets: 46 total across production, staging, and archive tiers
  - Primary: queenofsandiego.com, sailjada.com, salejada.com
  - Staging mirrors: queenofsandiego.com-staging, sailjada-staging, salejada-staging
  - Archive/backup buckets following the *-backup, *-archive naming convention
  - Lambda deployment packages and build artifacts
- CloudFront Distributions: 66 total
  - Primary distributions for each domain
  - Staging distribution origins pointing to -staging buckets
  - Legacy distributions (Bob Dylan catalog, Manager Candy support)
  - All origin configurations, cache behaviors, invalidation patterns
- Lambda Functions: 21 total
  - Edge functions for header injection, authentication, request routing
  - Origin request/response handlers
  - Viewer request/response processors
  - Environment variables, IAM execution roles, VPC configurations
- Route53 Hosted Zones: 16 total
  - DNS records, alias configurations, health checks
  - CNAME routing to CloudFront distributions
  - MX records for SES integration
- Google Apps Script Projects: 4 total
  - Main JADA project (clasp ID tracking)
  - Rady Shell replacement automation
  - Rady Shell legacy version
  - EYD (Events and Yachts Database) project
- Supporting AWS Services
  - DynamoDB tables (14 identified) for data persistence
  - SES configuration for email delivery
  - API Gateway endpoints
  - ACM certificates
  - IAM roles and policies
- Local Development Infrastructure
  - /Users/cb/Documents/repos/ (primary repository root)
  - tools/ directory containing update_dashboard.py and release.py
  - LaunchAgent configurations for background automation
  - Handoff documentation and runbooks
  - Environment configuration files (.env files tracked securely)
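A minimal counting sketch for that enumeration, assuming default AWS credentials and the us-west-2 region used throughout (the JMESPath length() queries return totals only):
# Count resources per service before capturing any state
aws s3api list-buckets --query 'length(Buckets)'
aws cloudfront list-distributions --query 'length(DistributionList.Items)'
aws lambda list-functions --region us-west-2 --query 'length(Functions)'
aws route53 list-hosted-zones --query 'length(HostedZones)'
aws dynamodb list-tables --region us-west-2 --query 'length(TableNames)'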
Snapshot Strategy: Parallel Execution
Given the volume of data and number of API calls required, we implemented four parallel agents to avoid timeout issues and maximize throughput:
# Agent 1: S3 Bucket Synchronization
# Sync all 46 S3 buckets to local snapshot directory
aws s3 sync s3://[bucket-name]/ /snapshot/v1.0/s3/[bucket-name]/ \
--region us-west-2 \
--no-progress \
> /tmp/s3-sync-[bucket-name].log 2>&1 &
This agent synced all buckets in batches of 8-10 concurrent operations, totaling approximately 68MB of static assets, HTML files, CSS, JavaScript, and images. File count verification was performed after each batch to ensure no objects were skipped.
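A minimal sketch of that batch loop, assuming a buckets.txt file listing one bucket per line (the file name and the concurrency cap of 8 are illustrative; wait -n requires bash 4.3+):
# Sync every bucket, keeping at most 8 jobs in flight
while read -r bucket; do
  aws s3 sync "s3://${bucket}/" "/snapshot/v1.0/s3/${bucket}/" \
    --region us-west-2 --no-progress \
    > "/tmp/s3-sync-${bucket}.log" 2>&1 &
  while [ "$(jobs -rp | wc -l)" -ge 8 ]; do wait -n; done  # throttle
done < buckets.txt
wait  # block until the final batch drains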
# Agent 2: Lambda Function Export
# Export code, configuration, and environment for all 21 Lambda functions
aws lambda get-function \
--function-name [function-name] \
--region us-west-2 \
> /snapshot/v1.0/lambda/[function-name]/config.json
aws lambda get-function-code-signing-config \
  --function-name arn:aws:lambda:us-west-2:[account-id]:function:[function-name] \
  > /snapshot/v1.0/lambda/[function-name]/signing-config.json
For each function, we captured: deployment package ZIP, environment variables, VPC configuration, execution role, layers, tags, and code signing configuration.
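Note that get-function returns a presigned URL for the deployment package rather than the package bytes themselves, so the ZIP has to be fetched separately; a short sketch, assuming curl is available:
# Extract the presigned Code.Location URL, then download the package
url=$(aws lambda get-function \
  --function-name [function-name] \
  --region us-west-2 \
  --query 'Code.Location' --output text)
curl -sSL "$url" -o /snapshot/v1.0/lambda/[function-name]/package.zip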
# Agent 3: AWS Infrastructure Export
# CloudFront distributions with all cache behaviors and origins
aws cloudfront list-distributions --query 'DistributionList.Items' \
> /snapshot/v1.0/cloudfront/distributions-list.json
aws cloudfront get-distribution-config \
--id [distribution-id] \
> /snapshot/v1.0/cloudfront/[distribution-id]-config.json
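# To avoid hand-feeding 66 IDs, the per-distribution export can be driven
# from the list call above (a sketch; IDs are read via a JMESPath query)
for id in $(aws cloudfront list-distributions \
    --query 'DistributionList.Items[].Id' --output text); do
  aws cloudfront get-distribution-config --id "$id" \
    > "/snapshot/v1.0/cloudfront/${id}-config.json"
done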
# Route53 zones and record sets
aws route53 list-hosted-zones \
> /snapshot/v1.0/route53/hosted-zones.json
aws route53 list-resource-record-sets \
--hosted-zone-id [zone-id] \
> /snapshot/v1.0/route53/[zone-id]-records.json
This agent also captured DynamoDB table schemas, SES domain configurations, API Gateway endpoints, and ACM certificate metadata.
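The equivalent calls for those supporting services, sketched with illustrative output paths (note that ACM certificates used by CloudFront live in us-east-1):
# Table schemas, SES identities, API Gateway inventory, certificate metadata
aws dynamodb describe-table --table-name [table-name] --region us-west-2 \
  > /snapshot/v1.0/dynamodb/[table-name]-schema.json
aws ses list-identities --region us-west-2 \
  > /snapshot/v1.0/ses/identities.json
aws apigateway get-rest-apis --region us-west-2 \
  > /snapshot/v1.0/apigateway/rest-apis.json
aws acm list-certificates --region us-east-1 \
  > /snapshot/v1.0/acm/certificates.json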
# Agent 4: Google Apps Script and Local Files
# Pull the latest version of each GAS project via clasp
# (clasp clone fetches a project by script ID into the given root directory)
clasp clone [JADA-main-project-id] --rootDir jada-main
clasp clone [rady-replacement-project-id] --rootDir rady-replacement
clasp clone [rady-legacy-project-id] --rootDir rady-legacy
clasp clone [eyd-project-id] --rootDir eyd
# Copy to snapshot with source tracking
cp -r /path/to/gas/projects /snapshot/v1.0/gas/
Local development files were captured from /Users/cb/Documents/repos/, including all source code, tooling scripts, and configuration templates (with secrets redacted).
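A sketch of that local capture, with illustrative exclude patterns standing in for the actual redaction list:
# Mirror the repo root into the snapshot, leaving secret material behind
rsync -a \
  --exclude='.env' --exclude='*.pem' --exclude='node_modules/' \
  /Users/cb/Documents/repos/ /snapshot/v1.0/local/repos/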
Lightsail Instance Snapshot
In parallel with the data exports, we initiated an AWS Lightsail instance snapshot named jada-agent-v1.0-20260509. This captures the full filesystem and on-disk system state of any Lightsail-based infrastructure (in-memory state and running processes are not preserved), providing a recovery point independent of the S3 and API-level exports above.
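The snapshot call itself, with the instance name left as a placeholder:
# Disk-level recovery point for the Lightsail instance
aws lightsail create-instance-snapshot \
  --region us-west-2 \
  --instance-name [instance-name] \
  --instance-snapshot-name jada-agent-v1.0-20260509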