Building a Production Snapshot System for Multi-Site JADA Infrastructure: v1.0 Strategy
After discovering that critical work on event pages had been reverted, we needed a comprehensive snapshot and recovery strategy. This post documents how we built v1.0—a complete point-in-time capture of the JADA ecosystem across three production sites, multiple cloud services, and local development artifacts.
What We Needed to Capture
The JADA infrastructure spans three primary domains:
- queenofsandiego.com — main site with events, products, and dynamic content
- sailjada.com — e-commerce presence
- salejada.com — secondary sales channel
Each site depends on interconnected AWS resources: 45 S3 buckets, 66 CloudFront distributions, 21 Lambda functions, 16 Route53 hosted zones, DynamoDB tables for content, SES for email, and Google Apps Script projects managing backend logic. Local development also included handoff documents, configuration files, memory files, and CLI tool snapshots.
The challenge: capturing all of these atomically to create a recovery point that could restore the entire system if something broke again.
The Snapshot Architecture
We used a parallel, distributed approach with four background agents running simultaneously:
Agent 1: S3 Bucket Synchronization
Synced all 45 S3 buckets to local storage using AWS CLI batch operations. This included:
- Production bucket content for all three sites
- Staging bucket snapshots for comparison
- Build artifacts and deployment caches
- Lambda function code repositories
- Static assets (images, CSS, JavaScript)
We discovered dedicated staging buckets and verified that production-to-staging syncs were complete before the snapshot, ensuring staging matched production file counts exactly. This was critical: snapshotting mid-sync would have captured a staging copy that no longer matched production.
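Agent 1's sync loop can be sketched as follows. The bucket names are placeholders (not the real JADA buckets), and the script prints each sync command for review rather than running it; swapping the echo for background execution with `&` and a final `wait` gives the parallel variant.

```shell
#!/usr/bin/env bash
# Sketch of Agent 1: sync each S3 bucket into its own snapshot directory.
# Bucket names below are placeholders; substitute the real prod/staging buckets.
set -euo pipefail

DEST="./v1.0/s3-buckets"

# Build the sync command for one bucket. Kept as a function so it can be
# printed for review (dry run) or executed.
sync_cmd() {
  local bucket="$1"
  printf 'aws s3 sync s3://%s %s/%s --no-progress' "$bucket" "$DEST" "$bucket"
}

BUCKETS=(example-prod-site example-staging-site example-build-artifacts)
for bucket in "${BUCKETS[@]}"; do
  echo "$(sync_cmd "$bucket")"   # replace echo with execution + '&' to parallelize
done
```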
Agent 2: Lambda Function Export
Extracted all 21 Lambda functions including:
- Function code (as ZIP archives)
- Runtime configurations and memory allocations
- Environment variable structures (without values, for security)
- IAM role attachments and permissions
- VPC configurations and security group associations
- Trigger configurations (API Gateway, S3, EventBridge)
We captured environment variable names and structures but deliberately excluded values—this allows reconstruction without exposing secrets. Each Lambda was exported with its associated CloudWatch Logs group names and retention policies.
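A minimal sketch of Agent 2's export for one function, using a placeholder function name. The AWS CLI commands are shown (echoed) rather than executed; the helper below illustrates the names-without-values idea on a simplified `KEY=value` list, where the real export operated on the JSON from `get-function-configuration`.

```shell
#!/usr/bin/env bash
# Sketch of Agent 2: export one Lambda function's code and configuration.
# FN is a placeholder function name, not a real JADA Lambda.
set -euo pipefail

FN="example-handler"

# The real export would run commands like these (printed here, not executed):
echo "aws lambda get-function --function-name $FN --query Code.Location --output text"
echo "aws lambda get-function-configuration --function-name $FN"

# Illustrative helper: keep environment variable NAMES, drop their values,
# so the snapshot records structure without exposing secrets.
# "API_KEY=abc,TABLE_NAME=jada" -> "API_KEY,TABLE_NAME"
strip_env_values() {
  local out="" pair
  IFS=',' read -ra pairs <<< "$1"
  for pair in "${pairs[@]}"; do
    out+="${pair%%=*},"
  done
  printf '%s' "${out%,}"
}
```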
Agent 3: AWS Service Configurations
Pulled complete configuration exports for:
- CloudFront: All 66 distributions with origin configurations, cache behaviors, SSL certificates, and custom domain mappings
- Route53: All 16 hosted zones with complete DNS record sets (A, CNAME, MX, TXT, NS records)
- DynamoDB: Table schemas, global secondary indexes, and stream configurations (14 tables identified)
- ACM: SSL/TLS certificate inventory with domain names and renewal status
- API Gateway: REST API definitions, stages, models, and authorization configurations
- SES: Verified email identities and sending limits
- IAM: Role definitions, trust relationships, and policy documents
CloudFront was particularly important—we documented the origin bucket mappings for each distribution, which informed our staging workflow design (e.g., verifying that staging CloudFront origins pointed to the correct staging S3 buckets).
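The configuration dumps above reduce to list-then-get loops. A sketch with placeholder IDs (commands printed, not executed): one wrinkle worth noting is that Route53 returns zone IDs as `/hostedzone/Z...`, while other CLI calls want the bare ID, so a small normalizer helps.

```shell
#!/usr/bin/env bash
# Sketch of Agent 3: dump per-resource configuration JSON.
set -euo pipefail

# Route53 reports zone IDs as "/hostedzone/Z123..."; strip to the bare ID.
bare_zone_id() { printf '%s' "${1##*/}"; }

# The export loop would look like this (printed here, not executed):
echo 'aws cloudfront list-distributions --query "DistributionList.Items[].Id" --output text'
echo 'aws cloudfront get-distribution-config --id <DIST_ID> > cloudfront-configs/<DIST_ID>.json'
echo "aws route53 list-resource-record-sets --hosted-zone-id $(bare_zone_id /hostedzone/Z0EXAMPLE) > route53-zones/Z0EXAMPLE.json"
```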
Agent 4: Google Apps Script & Local Artifacts
Used clasp pull to export Google Apps Script projects:
- Main JADA GAS project (core backend logic)
- Rady Shell replacement GAS
- Rady Shell old GAS (legacy, preserved for reference)
- EYD GAS project
Each GAS project was pulled, copied to the snapshot directory structure, and verified. We also captured:
- Local site repositories (/Users/cb/Documents/repos/)
- Memory and feedback documents tracking decisions
- CLI tools and utility scripts
- LaunchAgent configurations for background automation
- Development notes and architecture diagrams
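Agent 4's pull-copy-verify cycle can be sketched as below. The project directory names are placeholders for the four checkouts; the clasp and copy commands are printed rather than run, and the `count_files` helper is the verification step used to compare source and snapshot trees.

```shell
#!/usr/bin/env bash
# Sketch of Agent 4: pull each GAS project with clasp, copy it into the
# snapshot tree, and verify by comparing file counts.
set -euo pipefail

count_files() { find "$1" -type f | wc -l | tr -d ' '; }

SNAP="./v1.0/gas-projects"
for project in main-jada rady-replacement rady-old eyd; do
  echo "(cd gas/$project && clasp pull)"    # printed here, not executed
  echo "cp -R gas/$project $SNAP/$project"
  # Verification: count_files gas/$project must equal
  # count_files $SNAP/$project after the copy.
done
```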
Technical Decisions & Rationale
Why Parallel Agents Over Sequential Export
AWS API rate limits and network I/O made sequential export impractical. Running four agents in parallel reduced total snapshot time from ~2 hours (sequential) to ~45 minutes. Each agent operated independently with its own IAM permissions and output directory.
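The parallel-agent pattern itself is a short shell idiom: launch each agent in the background, collect the PIDs, and wait on each so a failing agent surfaces. The agent script names in the comment are hypothetical.

```shell
#!/usr/bin/env bash
# Run several independent commands concurrently and wait for all of them.
set -euo pipefail

run_parallel() {
  local pids=() cmd pid
  for cmd in "$@"; do
    bash -c "$cmd" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid"   # propagates a nonzero exit from any agent
  done
}

# In the real snapshot this would be something like:
# run_parallel "bash agent1-s3.sh" "bash agent2-lambda.sh" \
#              "bash agent3-configs.sh" "bash agent4-gas.sh"
```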
Why Snapshot Staging Alongside Production
We discovered a critical workflow: queenofsandiego.com (QOS) had a dedicated staging CloudFront distribution pointing to a _staging subfolder in the production S3 bucket. By capturing both production and staging, we could:
- Verify staging matched production file counts (quality assurance)
- Recover either production or staging state independently
- Track divergence between staging and production for debugging
This revealed a staging workflow pattern: edits go to staging first, are reviewed, then promoted to production via CloudFront invalidation and S3 sync.
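The promote step reduces to two commands: copy the _staging prefix over the production root, then invalidate the CloudFront cache. A sketch with placeholder bucket and distribution IDs, printing the commands for review rather than running them:

```shell
#!/usr/bin/env bash
# Sketch of the staging-to-production promote step.
# Bucket name and distribution ID are placeholders.
set -euo pipefail

promote_cmds() {
  local bucket="$1" dist_id="$2"
  printf 'aws s3 sync s3://%s/_staging/ s3://%s/\n' "$bucket" "$bucket"
  printf 'aws cloudfront create-invalidation --distribution-id %s --paths "/*"\n' "$dist_id"
}

promote_cmds example-prod-bucket E1EXAMPLEDIST   # prints both commands for review
```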
Why Include GAS Projects
Google Apps Script contains server-side logic that isn't version-controlled in Git. By pulling all GAS projects directly from Google Drive via clasp, we preserved:
- Backend form handlers
- Email automation logic
- Database query functions
- Integration code with third-party APIs
GAS is often overlooked in disaster recovery—including it here ensures we can restore business logic, not just static content.
Snapshot Manifest & Structure
The v1.0 snapshot was organized as:
v1.0/
├── s3-buckets/ # All 45 S3 bucket contents
├── lambda-functions/ # 21 Lambda function ZIPs + configs
├── cloudfront-configs/ # 66 distribution configurations (JSON)
├── route53-zones/ # 16 hosted zones (JSON)
├── dynamodb-schemas/ # 14 table definitions
├── gas-projects/ # All GAS code pulls
│ ├── main-jada/
│ ├── rady-replacement/
│ ├── rady-old/
│ └── eyd/
├── local-repos/ # Development repositories
├── env-structures/ # Environment variable names (no values)
├── iam-roles/ # Role definitions
├── acm-certificates/ # Certificate inventory
└── MANIFEST.md # Complete index with file counts
The MANIFEST.md documented every resource: bucket names, CloudFront distribution IDs, Lambda ARNs, Route53 zone IDs, and file counts for verification.
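Manifest generation of this kind can be sketched as a per-directory file count, which is what makes a later restore verifiable against known numbers. The snapshot root path in the usage comment is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: emit one MANIFEST.md line per snapshot directory with its file
# count, so a restore can be checked against the recorded counts.
set -euo pipefail

manifest_line() {
  local dir="$1"
  printf -- '- %s: %s files\n' "$dir" "$(find "$dir" -type f | wc -l | tr -d ' ')"
}

# Usage (against the real snapshot root):
# { echo "# v1.0 Snapshot Manifest"
#   for d in v1.0/*/; do manifest_line "$d"; done; } > v1.0/MANIFEST.md
```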
Why This Matters