Building a Comprehensive Infrastructure Snapshot: Lessons from Multi-Tenant AWS Architecture
When working with complex, interconnected AWS infrastructure across multiple properties—in this case, three distinct e-commerce and content sites sharing infrastructure, databases, and serverless functions—a single misconfiguration or unintended reversion can cascade across the entire system. This post documents the architectural approach and technical execution of creating a comprehensive v1.0 snapshot of JADA-related infrastructure, capturing not just current state but the dependencies, configurations, and code that make the system work.
The Problem: Distributed State Across Multiple Services
The infrastructure in question spans:
- 46 S3 buckets — containing static assets, backups, staging environments, and archived content
- 66 CloudFront distributions — CDN layer for the three primary sites and various subdomains
- 21 AWS Lambda functions — serverless compute for API endpoints, event handlers, and background jobs
- 16 Route53 hosted zones — DNS management for primary domains and subdomains
- 14 DynamoDB tables — persistent state across the system
- Google Apps Script (GAS) projects — four separate GAS codebases controlling business logic
- Local configuration — environment variables, LaunchAgents, site code, handoff documentation
- Lightsail instance — traditional compute layer for legacy systems
The challenge: there was no single source of truth capturing all of these pieces simultaneously. A rollback or misconfiguration in one layer (e.g., CloudFront cache invalidation, S3 replication rules) could silently break others.
Technical Architecture of the Snapshot Strategy
Parallel Agent-Based Export
Rather than sequential downloads (which would take hours), the snapshot process used four parallel background agents, each responsible for a distinct layer:
- Agent 1: S3 Sync — aws s3 sync for all 46 buckets to local snapshot directories, preserving folder structure
- Agent 2: Lambda Export — aws lambda get-function for each function, extracting the code ZIP, environment variables, concurrency settings, and IAM role attachments
- Agent 3: AWS Metadata Export — CloudFront configs (aws cloudfront get-distribution-config), Route53 records (aws route53 list-resource-record-sets), DynamoDB schemas (aws dynamodb describe-table), SES configuration, and API Gateway stage variables
- Agent 4: Local State Capture — site source code, sanitized environment files, LaunchAgent plist files, and Google Apps Script projects via clasp
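The four-agent fan-out can be sketched as background shell jobs gated by a single wait. This is a minimal illustration, not the actual tooling: the function names, example arguments, and output paths are assumptions, and each agent body shows only one representative AWS call.

```shell
# Sketch of the four-agent layout: each agent is a function launched as a
# background job, and `wait` blocks until all of them complete.
SNAP=snapshot/v1.0

agent_s3()       { aws s3 sync "s3://$1" "$SNAP/s3_buckets/$1"; }
agent_lambda()   { aws lambda get-function --function-name "$1" \
                     > "$SNAP/lambda/$1-function.json"; }
agent_metadata() { aws cloudfront get-distribution-config --id "$1" \
                     > "$SNAP/cloudfront/$1-config.json"; }
agent_local()    { cp -R "$1" "$SNAP/local-state/"; }

run_agents() {
  mkdir -p "$SNAP/s3_buckets" "$SNAP/lambda" "$SNAP/cloudfront" "$SNAP/local-state"
  agent_s3 "$1" &        # Agent 1
  agent_lambda "$2" &    # Agent 2
  agent_metadata "$3" &  # Agent 3
  agent_local "$4" &     # Agent 4
  wait                   # block until every background agent finishes
}
# Example: run_agents my-bucket my-fn E1ABC2DEF3GHI ./site-code
```

Because the agents only share the read-only snapshot root, they can fail and be retried independently without corrupting each other's output.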
Google Apps Script Integration
GAS projects cannot be exported through AWS APIs. Instead, the snapshot used clasp pull, run from inside each project's directory (clasp resolves the script ID from the local .clasp.json rather than taking a project name as an argument), to fetch code from the four GAS projects:
(cd jada-main && clasp pull)              # JADA Main Project
(cd rady-shell-replacement && clasp pull) # Rady Shell Replacement
(cd rady-shell-old && clasp pull)         # Rady Shell Old
(cd eyd-project && clasp pull)            # EYD GAS Project
These were copied into the snapshot structure under /snapshot/v1.0/gas/ with each project in its own subdirectory, preserving file relationships and deployment metadata.
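The copy step can be sketched as a small loop; the function name is an assumption, and the destination path follows the /snapshot/v1.0/gas/ layout described above. Using `"$proj/."` as the copy source brings along dotfiles such as .clasp.json, which is what preserves the deployment metadata.

```shell
# Copy each pulled GAS project, including its hidden .clasp.json
# deployment metadata, into the versioned snapshot tree.
copy_gas_projects() {
  for proj in "$@"; do
    mkdir -p "snapshot/v1.0/gas/$proj"
    cp -R "$proj/." "snapshot/v1.0/gas/$proj/"   # "dir/." copies dotfiles too
  done
}
# Example: copy_gas_projects jada-main rady-shell-replacement rady-shell-old eyd-project
```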
Snapshot Directory Structure
The v1.0 snapshot was organized hierarchically:
snapshot/v1.0/
├── s3_buckets/
│ ├── queenofsandiego-prod/
│ ├── queenofsandiego-staging/
│ ├── sailjada-prod/
│ ├── sailjada-staging/
│ ├── salejada-prod/
│ ├── salejada-staging/
│ ├── [40 additional buckets organized by purpose]
├── lambda/
│ ├── jada-api-v1/
│ │ ├── function-config.json
│ │ ├── environment-vars.json
│ │ ├── code.zip
│ │ └── iam-role-policy.json
│ ├── [20 additional Lambda functions]
├── cloudfront/
│ ├── distribution-configs/
│ │ ├── E1ABC2DEF3GHI-config.json
│ │ └── [65 additional distributions]
├── route53/
│ ├── hosted-zones.json
│ └── record-sets/
│ ├── zone-ABC123.json
│ └── [15 additional zones]
├── dynamodb/
│ ├── table-schemas.json
│ ├── jada-events-table-schema.json
│ └── [13 additional table definitions]
├── gas/
│ ├── jada-main/
│ ├── rady-shell-replacement/
│ ├── rady-shell-old/
│ └── eyd-project/
├── local-state/
│ ├── queenofsandiego-com/
│ ├── sailjada-com/
│ ├── salejada-com/
│ ├── environment-vars-manifest.json
│ ├── launchagents/
│ └── secrets-manifest.md
└── MANIFEST.md
Key Technical Decisions
Why Parallel Agents Instead of Sequential Export?
S3 bucket syncs alone span 46 buckets holding terabytes of data; a sequential export would take 8-12 hours. Parallel agents reduced this to roughly 45 minutes by leveraging system concurrency. Each agent was isolated to avoid credential contention and AWS API rate limiting.
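Within the S3 agent itself, per-bucket concurrency can be bounded so that many syncs run at once without tripping rate limits. A minimal sketch using xargs -P follows; the function name, bucket names, and the concurrency level of 8 are illustrative assumptions, not values from the actual run.

```shell
# Fan the bucket list out to at most 8 concurrent `aws s3 sync` processes.
# -P 8 caps parallelism; -I{} substitutes each bucket name into both the
# source URI and the local destination path.
sync_buckets() {
  printf '%s\n' "$@" |
    xargs -P 8 -I{} aws s3 sync "s3://{}" "snapshot/v1.0/s3_buckets/{}"
}
# Example: sync_buckets queenofsandiego-prod sailjada-prod salejada-prod
```

Bounded parallelism is the middle ground the post describes: far faster than a serial loop, but without launching 46 unthrottled processes against the S3 API at once.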
Why Include CloudFront Configs Separately?
CloudFront distribution configurations include cache behaviors, origin settings, function associations, and WAF rules. These cannot be reconstructed from S3 alone. A distribution config export (aws cloudfront get-distribution-config) captures the entire distribution state, including the ETag required by any later update-distribution call, and can be replayed idempotently during recovery.
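The per-distribution export can be sketched as a loop over the IDs returned by list-distributions. The function name and output directory are assumptions; the two AWS CLI subcommands and the JMESPath query are standard.

```shell
# Save the full config (and its ETag, needed for update-distribution)
# for every CloudFront distribution in the account.
export_cloudfront_configs() {
  local out_dir="$1"
  mkdir -p "$out_dir"
  for dist_id in $(aws cloudfront list-distributions \
                     --query 'DistributionList.Items[].Id' --output text); do
    aws cloudfront get-distribution-config --id "$dist_id" \
      > "$out_dir/${dist_id}-config.json"
  done
}
# Example: export_cloudfront_configs snapshot/v1.0/cloudfront/distribution-configs
```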
Why Capture DynamoDB Schema, Not Data?
DynamoDB tables store millions of events and transactional records. Exporting table schemas (via aws dynamodb describe-table) captures GSI configurations, TTL settings, stream specifications, and billing mode without massive data transfers. Production data can be recovered from continuous backups or point-in-time recovery features.
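The schema-only export can be sketched the same way: describe-table returns GSIs, TTL, stream specification, and billing mode without touching item data. The function name and output directory are illustrative assumptions.

```shell
# Capture each table's definition (indexes, TTL, streams, billing mode)
# without exporting any of the stored items.
export_dynamodb_schemas() {
  local out_dir="$1"
  mkdir -p "$out_dir"
  for table in $(aws dynamodb list-tables --query 'TableNames[]' --output text); do
    aws dynamodb describe-table --table-name "$table" \
      > "$out_dir/${table}-schema.json"
  done
}
# Example: export_dynamodb_schemas snapshot/v1.0/dynamodb
```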
Why Sanitize Local Environment Files?
Environment variable files and configuration secrets were captured in a manifest format that lists variable names and types without exposing actual values. This allows rapid verification of what configurations exist without storing credentials in the snapshot.
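One way to produce such a names-only manifest is a single sed substitution over each env file. This is a minimal sketch under stated assumptions: the actual snapshot stored a richer JSON manifest with types, while this version emits plain KEY=<redacted> lines, and the file paths are illustrative.

```shell
# Rewrite every KEY=value line as KEY=<redacted>, preserving variable
# names and order so manifests stay diffable across snapshot versions.
sanitize_env() {
  sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=.*/\1=<redacted>/' "$1"
}
# Example: sanitize_env .env > snapshot/v1.0/local-state/env-manifest.txt
```

Comment lines and anything else that is not a KEY=value pair pass through untouched, which keeps surrounding documentation in the env file intact.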
Validation and Manifest Generation
After all agents completed, a MANIFEST.md was generated documenting:
- Total snapshot size (breakdown by component)
- File counts per S3 bucket (to verify completeness)
- Lambda function count and total code size
- CloudFront distribution count and cache behavior rules
- Route53 record count per hosted zone
- DynamoDB table schema details
- GAS project structure and file counts
- Timestamp and agent completion status
This manifest serves as a quick reference to validate that the snapshot is complete and to track changes between versions.
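Manifest generation can be sketched as a walk over the snapshot's top-level components, emitting a size, a timestamp, and per-component file counts. The function name and the exact fields are assumptions; the real MANIFEST.md recorded more detail, such as per-bucket counts and agent completion status.

```shell
# Write a minimal MANIFEST.md: total size, UTC timestamp, and a file
# count per top-level snapshot component.
write_manifest() {
  local snap="$1"
  {
    echo "# Snapshot Manifest"
    echo "- Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "- Total size: $(du -sh "$snap" | cut -f1)"
    for dir in "$snap"/*/; do
      echo "- $(basename "$dir"): $(find "$dir" -type f | wc -l) files"
    done
  } > "$snap/MANIFEST.md"
}
# Example: write_manifest snapshot/v1.0
```

Because the counts are derived from the files actually on disk, re-running the generator after a partial agent failure makes the gap visible immediately when compared against the expected totals.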