Injecting Structured Data at Scale: Automating Event Page SEO Across Multi-Site Infrastructure
One of the most common—and costly—SEO mistakes is deploying hundreds of pages without structured data. We just fixed this across 12 concert event pages by building an automated injection pipeline. Here's how we did it, why it matters, and what the infrastructure looks like now.
The Problem: Invisible Event Data
Event pages across our Rady Shell Events subdomains were being indexed by Google, but the search engine had no semantic understanding of what made them valuable: dates, ticket prices, venue info, performer names, ratings. Without JSON-LD markup, these pages were competing as generic text documents instead of rich, structured events.
A manual audit revealed zero structured data across all active concert pages. With multiple event subdomains (each with its own S3 bucket and CloudFront distribution), manual markup wasn't scalable.
Technical Architecture: The Injection Pipeline
Source Code: Structured Data Injection Script
We built /Users/cb/Documents/repos/tools/inject_structured_data.py to automate this process. The script:
- Reads HTML files from Rady Shell Events directories
- Parses event metadata (title, date, performer, venue, price) from the page DOM
- Generates two JSON-LD blocks: an Event schema and a LocalBusiness schema
- Injects them into the <head> tag before any analytics or tracking scripts
- Validates the output against Google's Structured Data Testing Tool format
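The core of that injection step can be sketched as follows. This is a minimal illustration, not the actual script: the `inject_json_ld` helper, the metadata dict, and its keys are assumptions, and only a subset of the schema fields is shown.

```python
import json

# Hypothetical core of inject_structured_data.py: given a page's HTML and
# pre-parsed event metadata, build a JSON-LD block and place it right
# after the opening <head> tag. Field names follow schema.org/Event; the
# metadata dict and its keys are illustrative assumptions.
def inject_json_ld(html: str, event: dict) -> str:
    json_ld = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": event["title"],
        "startDate": event["start_date"],  # ISO 8601, e.g. "2025-06-14T19:30:00-07:00"
        "location": {"@type": "Place", "name": event["venue"]},
        "offers": {
            "@type": "Offer",
            "price": event["price"],
            "priceCurrency": "USD",
        },
    }
    block = f'<script type="application/ld+json">{json.dumps(json_ld)}</script>'
    # Injecting immediately after <head> keeps the markup ahead of any
    # analytics or tracking scripts already in the document.
    return html.replace("<head>", "<head>" + block, 1)
```

Because the block is inserted as a single string right after `<head>`, it lands before any existing script tags without touching the rest of the DOM.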
The script targets the Rady Shell Events directory structure:
/Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/
├── index.html (main events listing)
├── [event-subdomain-1]/
│ └── [year]/
│ └── [event-slug]/
│ └── index.html
└── [event-subdomain-2]/
└── [year]/
└── [event-slug]/
└── index.html
Why this approach: Centralizing injection logic prevents markup inconsistencies and makes future updates (new schema versions, additional metadata) a single-file change rather than manual edits across 12+ pages.
Schema Implementation Details
Each injected <script type="application/ld+json"> block contains:
- Event schema: name, startDate (ISO 8601), endDate, location (with address and geo), performer details, offers (price, currency, availability), organizer (JADA), aggregateRating if applicable
- LocalBusiness schema: name (venue name), address, telephone, geo coordinates, aggregateRating from Google Business Profile
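Putting those fields together, an injected Event block looks roughly like the payload below. Every concrete value here (artist, dates, price, coordinates) is a placeholder, not real event data.

```python
import json

# Illustrative Event payload with the fields listed above; all values
# are placeholders, not actual event data.
event_schema = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Artist Live at The Rady Shell",
    "startDate": "2025-06-14T19:30:00-07:00",
    "endDate": "2025-06-14T22:00:00-07:00",
    "location": {
        "@type": "Place",
        "name": "The Rady Shell at Jacobs Park",
        "address": {"@type": "PostalAddress", "addressLocality": "San Diego", "addressRegion": "CA"},
        "geo": {"@type": "GeoCoordinates", "latitude": 32.7066, "longitude": -117.1627},
    },
    "performer": {"@type": "MusicGroup", "name": "Example Artist"},
    "offers": {
        "@type": "Offer",
        "price": "89.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "organizer": {"@type": "Organization", "name": "JADA"},
}
print(json.dumps(event_schema, indent=2))
```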
Placement was critical. We injected immediately after the <head> opening tag and before Google Analytics (GA4), so crawlers encounter the structured data early in the document, ahead of any tracking scripts.
Deployment Infrastructure: Multi-Site Synchronization
S3 Bucket Architecture
Event subdomains are distributed across separate S3 buckets for isolation and independent scaling:
- paulsimonradyshell.com → S3 bucket: paulsimonradyshell.com
- sailjada.queenofsandiego.com → S3 bucket: sailjada.queenofsandiego.com
- Additional event subdomains follow the same pattern: [subdomain-name]-rady-shell-events
We synced updated HTML files using AWS CLI with cache-invalidation flags:
aws s3 sync /Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/ \
s3://paulsimonradyshell.com/ \
--exclude "*" \
--include "index.html" \
--metadata-directive REPLACE \
--cache-control "max-age=300,public"
Why short cache TTL: Event details (dates, pricing, availability) change frequently. A 5-minute cache allows near-real-time updates without forcing full cache invalidation.
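Running that same sync across every event bucket is easiest from a small wrapper. The sketch below shells out to the AWS CLI with the exact flags shown above; the bucket list, helper names, and loop are assumptions, not part of the published tooling.

```python
import subprocess

# Buckets named in the article; the per-bucket loop and these helpers
# are illustrative, not the actual deployment script.
BUCKETS = ["paulsimonradyshell.com", "sailjada.queenofsandiego.com"]
SOURCE = "/Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/"

def build_sync_command(bucket: str) -> list[str]:
    # Mirrors the flags used above: only index.html files, with metadata
    # replaced so the short cache-control TTL is applied on every sync.
    return [
        "aws", "s3", "sync", SOURCE, f"s3://{bucket}/",
        "--exclude", "*",
        "--include", "index.html",
        "--metadata-directive", "REPLACE",
        "--cache-control", "max-age=300,public",
    ]

def deploy_all() -> None:
    for bucket in BUCKETS:
        subprocess.run(build_sync_command(bucket), check=True)
```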
CloudFront Distribution Invalidation
After S3 sync, we invalidated CloudFront edge caches to prevent stale versions from serving to users. Each event subdomain has its own CloudFront distribution:
- paulsimonradyshell.com → Distribution ID: E2ABCD1234XYZ (example)
- sailjada.queenofsandiego.com → Distribution ID: E9WXYZ5678ABC (example)
Invalidation pattern:
aws cloudfront create-invalidation \
--distribution-id E2ABCD1234XYZ \
--paths "/*/index.html" "/index.html"
We used wildcard paths to cover all event subdirectories rather than listing each file individually, reducing API calls and simplifying the deployment script.
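The same invalidation can be issued programmatically with boto3 (assumed available and configured; the distribution map reuses the example IDs above, and the helper names are illustrative). Note that CloudFront requires a unique CallerReference per request.

```python
import time

# Example distribution IDs from the article; the map and helpers are a
# sketch, not production code.
DISTRIBUTIONS = {
    "paulsimonradyshell.com": "E2ABCD1234XYZ",
    "sailjada.queenofsandiego.com": "E9WXYZ5678ABC",
}

def build_invalidation_batch(paths: list[str]) -> dict:
    # CallerReference must be unique per request; a timestamp suffices here.
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        "CallerReference": str(time.time()),
    }

def invalidate_all() -> None:
    import boto3  # assumed installed, with AWS credentials configured
    client = boto3.client("cloudfront")
    for domain, dist_id in DISTRIBUTIONS.items():
        client.create_invalidation(
            DistributionId=dist_id,
            InvalidationBatch=build_invalidation_batch(["/*/index.html", "/index.html"]),
        )
```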
Key Decisions and Rationale
Why Automate Instead of Manual Markup?
At scale, manual structured data maintenance becomes a liability:
- Consistency: Automated generation eliminates typos and formatting errors across 12+ pages
- Maintenance: When schema standards evolve (Google adds new recommended fields), one script update fixes all pages
- Scalability: New event pages can be auto-scaffolded with correct markup
- Auditability: The injection script is version-controlled; you can diff schema changes across commits
Why JSON-LD Over Microdata?
JSON-LD was chosen over embedded microdata attributes because:
- Decoupled from HTML structure—safe to inject without modifying existing DOM
- Easier to validate with Google's Rich Results Test
- Search engines prefer JSON-LD; it's the recommended format in Google's developer documentation
Why Separate S3 Buckets Per Domain?
While a monolithic bucket would technically work, separate buckets provide:
- Independent cache invalidation (one site's updates don't flush another's CloudFront)
- Cleaner access control and billing attribution per domain
- Future flexibility for third-party vendor access or separate CDN configuration
Validation and Testing
After injection, we validated using Google's Rich Results Test (not just the deprecated Structured Data Testing Tool). Key checks:
- Event schema detected and marked as "Valid"
- Dates parsed correctly in ISO 8601 format
- Price and currency aligned
- No validation errors or warnings blocking rich result eligibility
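Those checks can be partially automated before deployment. The sketch below is an assumed local pre-flight (not a replacement for the Rich Results Test): it pulls each JSON-LD block out of a page and verifies the fields the checks above depend on. The function name is illustrative.

```python
import json
import re
from datetime import datetime

# Lightweight local check run before deployment: extract each JSON-LD
# block and verify the Event fields that rich-result eligibility needs.
def check_event_markup(html: str) -> list[str]:
    errors = []
    blocks = re.findall(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL
    )
    if not blocks:
        return ["no JSON-LD blocks found"]
    for raw in blocks:
        data = json.loads(raw)
        if data.get("@type") != "Event":
            continue
        try:
            # fromisoformat accepts the ISO 8601 dates the schema requires.
            datetime.fromisoformat(data["startDate"])
        except (KeyError, ValueError):
            errors.append("startDate missing or not ISO 8601")
        offers = data.get("offers", {})
        if "price" not in offers or "priceCurrency" not in offers:
            errors.append("offers missing price or priceCurrency")
    return errors
```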
Sample validation command (local testing before deployment):
python3 inject_structured_data.py \
--input