Building a Comprehensive Infrastructure Snapshot: Lessons from Multi-Tenant AWS Architecture
When working with complex, interconnected AWS infrastructure across multiple properties—in this case, three distinct e-commerce and content sites sharing infrastructure, databases, and serverless functions—a single misconfiguration or unintended reversion can cascade across the entire system. This post documents the architectural approach and technical execution of creating a comprehensive v1.0 snapshot of JADA-related infrastructure, capturing not just current state but the dependencies, configurations, and code that make the system work.
The Problem: Distributed State Across Multiple Services
The infrastructure in question spans:
- 46 S3 buckets — containing static assets, backups, staging environments, and archived content
- 66 CloudFront distributions — CDN layer for the three primary sites and various subdomains
- 21 AWS Lambda functions — serverless compute for API endpoints, event handlers, and background jobs
- 16 Route53 hosted zones — DNS management for primary domains and subdomains
- 14 DynamoDB tables — persistent state across the system
- Google Apps Script (GAS) projects — four separate GAS codebases controlling business logic
- Local configuration — environment variables, LaunchAgents, site code, handoff documentation
- Lightsail instance — traditional compute layer for legacy systems
The challenge: there was no single source of truth capturing all of these pieces simultaneously. A rollback or misconfiguration in one layer (e.g., CloudFront cache invalidation, S3 replication rules) could silently break others.
Technical Architecture of the Snapshot Strategy
Parallel Agent-Based Export
Rather than sequential downloads (which would take hours), the snapshot process used four parallel background agents, each responsible for a distinct layer:
- Agent 1: S3 Sync — aws s3 sync for all 46 buckets to local snapshot directories, preserving folder structure
- Agent 2: Lambda Export — aws lambda get-function for each function, extracting the code ZIP, environment variables, concurrency settings, and IAM role attachments
- Agent 3: AWS Metadata Export — CloudFront configs (aws cloudfront get-distribution-config), Route53 records (aws route53 list-resource-record-sets), DynamoDB schemas (aws dynamodb describe-table), SES configuration, and API Gateway stage variables
- Agent 4: Local State Capture — site source code, sanitized environment files, LaunchAgent plist files, and Google Apps Script projects via clasp
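The four-agent fan-out can be sketched as background shell jobs gated by a single wait. This is a minimal illustration, not the actual tooling: the function names, example arguments, and output paths are assumptions, and each agent body shows only one representative AWS call.

```shell
# Sketch of the four-agent layout: each agent is a function launched as a
# background job, and `wait` blocks until all of them complete.
SNAP=snapshot/v1.0

agent_s3()       { aws s3 sync "s3://$1" "$SNAP/s3_buckets/$1"; }
agent_lambda()   { aws lambda get-function --function-name "$1" \
                     > "$SNAP/lambda/$1-function.json"; }
agent_metadata() { aws cloudfront get-distribution-config --id "$1" \
                     > "$SNAP/cloudfront/$1-config.json"; }
agent_local()    { cp -R "$1" "$SNAP/local-state/"; }

run_agents() {
  mkdir -p "$SNAP/s3_buckets" "$SNAP/lambda" "$SNAP/cloudfront" "$SNAP/local-state"
  agent_s3 "$1" &        # Agent 1
  agent_lambda "$2" &    # Agent 2
  agent_metadata "$3" &  # Agent 3
  agent_local "$4" &     # Agent 4
  wait                   # block until every background agent finishes
}
# Example: run_agents my-bucket my-fn E1ABC2DEF3GHI ./site-code
```

Because the agents only share the read-only snapshot root, they can fail and be retried independently without corrupting each other's output.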
Google Apps Script Integration
GAS projects cannot be exported through AWS APIs. Instead, the snapshot used clasp pull, run from inside each project's directory (clasp resolves the script ID from the local .clasp.json rather than taking a project name as an argument), to fetch code from the four GAS projects:
(cd jada-main && clasp pull)              # JADA Main Project
(cd rady-shell-replacement && clasp pull) # Rady Shell Replacement
(cd rady-shell-old && clasp pull)         # Rady Shell Old
(cd eyd-project && clasp pull)            # EYD GAS Project
These were copied into the snapshot structure under /snapshot/v1.0/gas/ with each project in its own subdirectory, preserving file relationships and deployment metadata.
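The copy step can be sketched as a small loop; the function name is an assumption, and the destination path follows the /snapshot/v1.0/gas/ layout described above. Using `"$proj/."` as the copy source brings along dotfiles such as .clasp.json, which is what preserves the deployment metadata.

```shell
# Copy each pulled GAS project, including its hidden .clasp.json
# deployment metadata, into the versioned snapshot tree.
copy_gas_projects() {
  for proj in "$@"; do
    mkdir -p "snapshot/v1.0/gas/$proj"
    cp -R "$proj/." "snapshot/v1.0/gas/$proj/"   # "dir/." copies dotfiles too
  done
}
# Example: copy_gas_projects jada-main rady-shell-replacement rady-shell-old eyd-project
```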
Snapshot Directory Structure
The v1.0 snapshot was organized hierarchically:
snapshot/v1.0/
├── s3_buckets/
│ ├── queenofsandiego-prod/
│ ├── queenofsandiego-staging/
│ ├── sailjada-prod/
│ ├── sailjada-staging/
│ ├── salejada-prod/
│ ├── salejada-staging/
│ ├── [40 additional buckets organized by purpose]
├── lambda/
│ ├── jada-api-v1/
│ │ ├── function-config.json
│ │ ├── environment-vars.json
│ │ ├── code.zip
│ │ └── iam-role-policy.json
│ ├── [20 additional Lambda functions]
├── cloudfront/
│ ├── distribution-configs/
│ │ ├── E1ABC2DEF3GHI-config.json
│ │ └── [65 additional distributions]
├── route53/
│ ├── hosted-zones.json
│ └── record-sets/
│ ├── zone-ABC123.json
│ └── [15 additional zones]
├── dynamodb/
│ ├── table-schemas.json
│ ├── jada-events-table-schema.json
│ └── [13 additional table definitions]
├── gas/
│ ├── jada-main/
│ ├── rady-shell-replacement/
│ ├── rady-shell-old/
│ └── eyd-project/
├── local-state/
│ ├── queenofsandiego-com/
│ ├── sailjada-com/
│ ├── salejada-com/
│ ├── environment-vars-manifest.json
│ ├── launchagents/
│ └── secrets-manifest.md
└── MANIFEST.md
Key Technical Decisions
Why Parallel Agents Instead of Sequential Export?
S3 bucket syncs alone span 46 buckets holding terabytes of data; a sequential export would take 8-12 hours. Parallel agents reduced this to roughly 45 minutes by leveraging system concurrency. Each agent was isolated to avoid credential contention and AWS API rate limiting.
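Within the S3 agent itself, per-bucket concurrency can be bounded so that many syncs run at once without tripping rate limits. A minimal sketch using xargs -P follows; the function name, bucket names, and the concurrency level of 8 are illustrative assumptions, not values from the actual run.

```shell
# Fan the bucket list out to at most 8 concurrent `aws s3 sync` processes.
# -P 8 caps parallelism; -I{} substitutes each bucket name into both the
# source URI and the local destination path.
sync_buckets() {
  printf '%s\n' "$@" |
    xargs -P 8 -I{} aws s3 sync "s3://{}" "snapshot/v1.0/s3_buckets/{}"
}
# Example: sync_buckets queenofsandiego-prod sailjada-prod salejada-prod
```

Bounded parallelism is the middle ground the post describes: far faster than a serial loop, but without launching 46 unthrottled processes against the S3 API at once.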
Why Include CloudFront Configs Separately?
CloudFront distribution configurations include cache behaviors, origin settings, function associations, and WAF rules. These cannot be reconstructed from S3 alone. A distribution config export (aws cloudfront get-distribution-config) captures the entire distribution state, including the ETag required by any later update-distribution call, and can be replayed idempotently during recovery.
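The per-distribution export can be sketched as a loop over the IDs returned by list-distributions. The function name and output directory are assumptions; the two AWS CLI subcommands and the JMESPath query are standard.

```shell
# Save the full config (and its ETag, needed for update-distribution)
# for every CloudFront distribution in the account.
export_cloudfront_configs() {
  local out_dir="$1"
  mkdir -p "$out_dir"
  for dist_id in $(aws cloudfront list-distributions \
                     --query 'DistributionList.Items[].Id' --output text); do
    aws cloudfront get-distribution-config --id "$dist_id" \
      > "$out_dir/${dist_id}-config.json"
  done
}
# Example: export_cloudfront_configs snapshot/v1.0/cloudfront/distribution-configs
```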
Why Capture DynamoDB Schema, Not Data?
DynamoDB tables store millions of events and transactional records. Exporting table schemas (via aws dynamodb describe-table) captures GSI configurations, TTL settings, stream specifications, and billing mode without massive data transfers. Production data can be recovered from continuous backups or point-in-time recovery features.
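The schema-only export can be sketched the same way: describe-table returns GSIs, TTL, stream specification, and billing mode without touching item data. The function name and output directory are illustrative assumptions.

```shell
# Capture each table's definition (indexes, TTL, streams, billing mode)
# without exporting any of the stored items.
export_dynamodb_schemas() {
  local out_dir="$1"
  mkdir -p "$out_dir"
  for table in $(aws dynamodb list-tables --query 'TableNames[]' --output text); do
    aws dynamodb describe-table --table-name "$table" \
      > "$out_dir/${table}-schema.json"
  done
}
# Example: export_dynamodb_schemas snapshot/v1.0/dynamodb
```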
Why Sanitize Local Environment Files?
Environment variable files and configuration secrets were captured in a manifest format that lists variable names and types without exposing actual values. This allows rapid verification of what configurations exist without storing credentials in the snapshot.
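One way to produce such a names-only manifest is a single sed substitution over each env file. This is a minimal sketch under stated assumptions: the actual snapshot stored a richer JSON manifest with types, while this version emits plain KEY=<redacted> lines, and the file paths are illustrative.

```shell
# Rewrite every KEY=value line as KEY=<redacted>, preserving variable
# names and order so manifests stay diffable across snapshot versions.
sanitize_env() {
  sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=.*/\1=<redacted>/' "$1"
}
# Example: sanitize_env .env > snapshot/v1.0/local-state/env-manifest.txt
```

Comment lines and anything else that is not a KEY=value pair pass through untouched, which keeps surrounding documentation in the env file intact.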
Validation and Manifest Generation
After all agents completed, a MANIFEST.md was generated documenting:
- Total snapshot size (breakdown by component)
- File counts per S3 bucket (to verify completeness)
- Lambda function count and total code size
- CloudFront distribution count and cache behavior rules
- Route53 record count per hosted zone
- DynamoDB table schema details
- GAS project structure and file counts
- Timestamp and agent completion status
This manifest serves as a quick reference to validate that the snapshot is complete and to track changes between versions.
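Manifest generation can be sketched as a walk over the snapshot's top-level components, emitting a size, a timestamp, and per-component file counts. The function name and the exact fields are assumptions; the real MANIFEST.md recorded more detail, such as per-bucket counts and agent completion status.

```shell
# Write a minimal MANIFEST.md: total size, UTC timestamp, and a file
# count per top-level snapshot component.
write_manifest() {
  local snap="$1"
  {
    echo "# Snapshot Manifest"
    echo "- Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "- Total size: $(du -sh "$snap" | cut -f1)"
    for dir in "$snap"/*/; do
      echo "- $(basename "$dir"): $(find "$dir" -type f | wc -l) files"
    done
  } > "$snap/MANIFEST.md"
}
# Example: write_manifest snapshot/v1.0
```

Because the counts are derived from the files actually on disk, re-running the generator after a partial agent failure makes the gap visible immediately when compared against the expected totals.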