Full-Stack Infrastructure Snapshot Strategy: Implementing v1.0 State Capture for Multi-Site AWS Deployment
After discovering that previous work on event pages had been inadvertently reverted, we needed a comprehensive snapshot mechanism to capture the complete state of our multi-site infrastructure across three domains (queenofsandiego.com, sailjada.com, salejada.com) and all supporting AWS resources. This post documents the technical approach, architecture decisions, and execution strategy for creating a v1.0 snapshot that captures everything from application code to infrastructure configuration.
The Problem: Why Snapshots Matter
Managing three interconnected sites with 46 S3 buckets, 66 CloudFront distributions, 21 Lambda functions, and 16 Route53 hosted zones means you're operating at a scale where manual recovery is unrealistic. A single reverted deployment can cascade through multiple systems. We needed a reproducible, granular snapshot mechanism that could capture:
- All application code from Git repositories and Google Apps Script (GAS) projects
- Complete Lambda function code with environment variables and configuration
- S3 bucket contents and bucket policies
- CloudFront distribution configurations and cache behaviors
- Route53 DNS records and health check configurations
- DynamoDB table schemas and sample data
- IAM roles, policies, and trust relationships
- Local development files, handoff documentation, and environment secrets manifest
Technical Architecture: Four-Agent Parallel Strategy
Rather than sequential snapshots (which would take hours), we implemented a four-agent parallel architecture:
Agent 1: S3 Sync (45 buckets)
├─ aws s3 sync across all JADA-related buckets
├─ Captures: bucket contents + versioning metadata
└─ Status: 30/45 buckets synced (68MB processed)
Agent 2: Lambda Export (21 functions)
├─ Export function code via aws lambda get-function
├─ Extract environment variables, VPC config, timeout settings
└─ Status: 10/21 functions exported
Agent 3: AWS Infrastructure Config
├─ CloudFront: list-distributions + get-distribution-config for all 66 distros
├─ Route53: list-resource-record-sets for all 16 zones
├─ DynamoDB: scan and describe-table for all 14 tables
├─ API Gateway: get-rest-apis and export stages
├─ SES: list-verified-email-addresses
├─ ACM: list-certificates
└─ Status: CloudFront 41/41 ✓, Route53 11/11 ✓, others in progress
Agent 4: Local Files & GAS Projects
├─ Clone/copy all Git repos from Documents/repos
├─ Export Google Apps Script projects (via GAS API)
├─ Capture LaunchAgent configurations
├─ Archive tools, wiki, and handoff documentation
└─ Status: queenofsandiego.com + sailjada.com ✓, rest in progress
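A minimal orchestration sketch of this layout, assuming each agent lives in its own shell script (the agent script names and log locations below are illustrative, not the actual implementation):
#!/usr/bin/env bash
# Launch the four snapshot agents as background jobs and wait for all of them.
# SNAPSHOT_ROOT matches the snapshot directory described below; the agent*.sh
# script names are assumptions for illustration.
set -euo pipefail
SNAPSHOT_ROOT=/Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/v1.0-snapshot
mkdir -p "$SNAPSHOT_ROOT"
./agent1-s3-sync.sh       "$SNAPSHOT_ROOT" > "$SNAPSHOT_ROOT/agent1.log" 2>&1 &
./agent2-lambda-export.sh "$SNAPSHOT_ROOT" > "$SNAPSHOT_ROOT/agent2.log" 2>&1 &
./agent3-aws-config.sh    "$SNAPSHOT_ROOT" > "$SNAPSHOT_ROOT/agent3.log" 2>&1 &
./agent4-local-gas.sh     "$SNAPSHOT_ROOT" > "$SNAPSHOT_ROOT/agent4.log" 2>&1 &
wait  # blocks until every agent has exited
echo "v1.0 snapshot agents complete"
Because each agent writes to its own log, a slow or failing job in one agent never blocks or pollutes the others.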
Snapshot Directory Structure
The v1.0 snapshot is organized hierarchically under /Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/v1.0-snapshot:
v1.0-snapshot/
├── s3-buckets/
│ ├── queenofsandiego-cdn/
│ ├── queenofsandiego-uploads/
│ ├── queenofsandiego-backups/
│ ├── sailjada-assets/
│ ├── salejada-products/
│ └── [42 more buckets...]
├── lambda-functions/
│ ├── jada-event-processor/
│ │ ├── function.zip
│ │ ├── config.json (timeout, memory, VPC, env vars)
│ │ └── source/
│ ├── jada-email-handler/
│ └── [19 more functions...]
├── cloudfront-distributions/
│ ├── E2A1B3C4D5E6F7G8H9I0J/config.json
│ └── [65 more distribution configs...]
├── route53-zones/
│ ├── queenofsandiego.com/records.json
│ ├── sailjada.com/records.json
│ ├── salejada.com/records.json
│ └── [13 more zones...]
├── dynamodb-tables/
│ ├── jada-events/
│ │ ├── table-schema.json
│ │ └── sample-data.json
│ └── [13 more tables...]
├── iam-config/
│ ├── roles.json
│ ├── policies.json
│ └── trust-relationships.json
├── local-repos/
│ ├── queenofsandiego.com/.git
│ ├── sailjada.com/.git
│ └── salejada.com/.git
├── gas-projects/
│ ├── JADA-Events-Manager/script.gs
│ └── [other GAS projects...]
├── environment-manifest.json (structure only, no secrets)
└── SNAPSHOT-METADATA.json
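SNAPSHOT-METADATA.json records when and how the snapshot was taken. A sketch of how it could be generated at the end of a run; the field names are illustrative assumptions, not a documented schema:
SNAPSHOT_ROOT=/Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/v1.0-snapshot
# Hypothetical metadata fields: version, timestamp, AWS account, resource counts
cat > "$SNAPSHOT_ROOT/SNAPSHOT-METADATA.json" <<EOF
{
  "snapshot_version": "1.0",
  "created_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "aws_account": "$(aws sts get-caller-identity --query Account --output text)",
  "resource_counts": {
    "s3_buckets": 46,
    "cloudfront_distributions": 66,
    "lambda_functions": 21,
    "route53_zones": 16,
    "dynamodb_tables": 14
  }
}
EOF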
Key Technical Decisions
Why Parallel Agents Instead of Sequential?
With 46 S3 buckets and 66 CloudFront distributions, sequential snapshots would take 4-6 hours. Parallel agents reduce this to ~20 minutes because network I/O and AWS API calls can happen simultaneously. Each agent has its own credential context and doesn't block others.
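The same principle applies inside a single agent. As a hedged sketch, Agent 1 could fan the per-bucket syncs out with xargs (the buckets.txt file and the concurrency level of 8 are assumptions, not part of the actual tooling):
# buckets.txt: one JADA-related bucket name per line
SNAPSHOT_ROOT=/Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/v1.0-snapshot
xargs -P 8 -I {} \
  aws s3 sync "s3://{}" "$SNAPSHOT_ROOT/s3-buckets/{}" --no-progress \
  < buckets.txt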
Why Export Lambda Code + Config Separately?
Lambda functions need both their .zip deployment package AND their configuration (environment variables, VPC settings, timeout, memory allocation, reserved concurrency). We export both because code alone doesn't capture runtime behavior. For example, a function's environment variables might control which S3 bucket it writes to — losing that config means losing operational context.
Why Include Google Apps Script?
GAS projects drive critical workflows (email notifications, event scheduling, form processing). They're not stored in traditional Git repos, so they need explicit export via the Apps Script API. Without GAS snapshots, we lose business logic.
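One convenient way to drive that export is Google's clasp CLI, which wraps the Apps Script API. A hedged sketch, assuming clasp is installed and authenticated (the script ID is a placeholder):
# Pull the project's .gs/.html source files into the snapshot tree
SNAPSHOT_ROOT=/Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/v1.0-snapshot
mkdir -p "$SNAPSHOT_ROOT/gas-projects/JADA-Events-Manager"
npx clasp clone "SCRIPT_ID_PLACEHOLDER" \
  --rootDir "$SNAPSHOT_ROOT/gas-projects/JADA-Events-Manager"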
Why Capture Route53 + CloudFront Together?
CloudFront distributions are worthless without their DNS aliases in Route53. By capturing both simultaneously, we document the end-to-end request flow: DNS query → Route53 alias → CloudFront distribution → S3 origin.
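A minimal sketch of capturing both halves of that flow for one domain (the hosted-zone ID is a placeholder; the distribution ID reuses the example from the directory tree above):
mkdir -p route53-zones/queenofsandiego.com cloudfront-distributions/E2A1B3C4D5E6F7G8H9I0J
# DNS side: every record set in the zone, including the alias records
aws route53 list-resource-record-sets \
  --hosted-zone-id Z0PLACEHOLDER123 \
  > route53-zones/queenofsandiego.com/records.json
# CDN side: the distribution config, which carries its aliases and S3 origins
aws cloudfront get-distribution-config \
  --id E2A1B3C4D5E6F7G8H9I0J \
  > cloudfront-distributions/E2A1B3C4D5E6F7G8H9I0J/config.json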
Infrastructure Details
S3 Snapshot Command Pattern
aws s3 sync s3://queenofsandiego-cdn \
/Users/cb/.claude/projects/v1.0-snapshot/s3-buckets/queenofsandiego-cdn \
--no-progress
This pulls each bucket's contents down into the local snapshot directory. We run it for all 46 buckets with the --no-progress flag to avoid log spam when the syncs run in parallel.
Lambda Export Command Pattern
aws lambda get-function \
--function-name jada-event-processor \
--query 'Configuration' > /path/to/config.json
aws lambda get-function \
--function-name jada-event-processor \
--query 'Code.Location' --output text | xargs curl -o function.zip
We capture both the function configuration (IAM