Injecting Structured Data at Scale: Automating Event Page SEO Across Multi-Site Infrastructure
One of the most common—and costly—SEO mistakes is deploying hundreds of pages without structured data. We just fixed this across 12 concert event pages by building an automated injection pipeline. Here's how we did it, why it matters, and what the infrastructure looks like now.
The Problem: Invisible Event Data
Event pages across our Rady Shell Events subdomains were being indexed by Google, but the search engine had no semantic understanding of what made them valuable: dates, ticket prices, venue info, performer names, ratings. Without JSON-LD markup, these pages were competing as generic text documents instead of rich, structured events.
A manual audit revealed zero structured data across all active concert pages. With multiple event subdomains (each with its own S3 bucket and CloudFront distribution), manual markup wasn't scalable.
Technical Architecture: The Injection Pipeline
Source Code: Structured Data Injection Script
We built /Users/cb/Documents/repos/tools/inject_structured_data.py to automate this process. The script:
- Reads HTML files from Rady Shell Events directories
- Parses event metadata (title, date, performer, venue, price) from the page DOM
- Generates two JSON-LD blocks: an Event schema and a LocalBusiness schema
- Injects them into the <head> tag before any analytics or tracking scripts
- Validates the output against Google's Structured Data Testing Tool format
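The core of that injection step can be sketched as follows. This is a minimal illustration, not the actual script: the `inject_json_ld` helper, the metadata dict, and its keys are assumptions, and only a subset of the schema fields is shown.

```python
import json

# Hypothetical core of inject_structured_data.py: given a page's HTML and
# pre-parsed event metadata, build a JSON-LD block and place it right
# after the opening <head> tag. Field names follow schema.org/Event; the
# metadata dict and its keys are illustrative assumptions.
def inject_json_ld(html: str, event: dict) -> str:
    json_ld = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": event["title"],
        "startDate": event["start_date"],  # ISO 8601, e.g. "2025-06-14T19:30:00-07:00"
        "location": {"@type": "Place", "name": event["venue"]},
        "offers": {
            "@type": "Offer",
            "price": event["price"],
            "priceCurrency": "USD",
        },
    }
    block = f'<script type="application/ld+json">{json.dumps(json_ld)}</script>'
    # Injecting immediately after <head> keeps the markup ahead of any
    # analytics or tracking scripts already in the document.
    return html.replace("<head>", "<head>" + block, 1)
```

Because the block is inserted as a single string right after `<head>`, it lands before any existing script tags without touching the rest of the DOM.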
The script targets the Rady Shell Events directory structure:
/Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/
├── index.html (main events listing)
├── [event-subdomain-1]/
│ └── [year]/
│ └── [event-slug]/
│ └── index.html
└── [event-subdomain-2]/
└── [year]/
└── [event-slug]/
└── index.html
Why this approach: Centralizing injection logic prevents markup inconsistencies and makes future updates (new schema versions, additional metadata) a single-file change rather than manual edits across 12+ pages.
Schema Implementation Details
Each injected <script type="application/ld+json"> block contains:
- Event schema: name, startDate (ISO 8601), endDate, location (with address and geo), performer details, offers (price, currency, availability), organizer (JADA), aggregateRating if applicable
- LocalBusiness schema: name (venue name), address, telephone, geo coordinates, aggregateRating from Google Business Profile
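Putting those fields together, an injected Event block looks roughly like the payload below. Every concrete value here (artist, dates, price, coordinates) is a placeholder, not real event data.

```python
import json

# Illustrative Event payload with the fields listed above; all values
# are placeholders, not actual event data.
event_schema = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Artist Live at The Rady Shell",
    "startDate": "2025-06-14T19:30:00-07:00",
    "endDate": "2025-06-14T22:00:00-07:00",
    "location": {
        "@type": "Place",
        "name": "The Rady Shell at Jacobs Park",
        "address": {"@type": "PostalAddress", "addressLocality": "San Diego", "addressRegion": "CA"},
        "geo": {"@type": "GeoCoordinates", "latitude": 32.7066, "longitude": -117.1627},
    },
    "performer": {"@type": "MusicGroup", "name": "Example Artist"},
    "offers": {
        "@type": "Offer",
        "price": "89.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "organizer": {"@type": "Organization", "name": "JADA"},
}
print(json.dumps(event_schema, indent=2))
```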
Placement was critical. We injected immediately after the <head> opening tag and before Google Analytics (GA4), so crawlers encounter the structured data early in the document, ahead of any tracking scripts.
Deployment Infrastructure: Multi-Site Synchronization
S3 Bucket Architecture
Event subdomains are distributed across separate S3 buckets for isolation and independent scaling:
- paulsimonradyshell.com → S3 bucket: paulsimonradyshell.com
- sailjada.queenofsandiego.com → S3 bucket: sailjada.queenofsandiego.com
- Additional event subdomains follow the same pattern: [subdomain-name]-rady-shell-events
We synced updated HTML files using AWS CLI with cache-invalidation flags:
aws s3 sync /Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/ \
s3://paulsimonradyshell.com/ \
--exclude "*" \
--include "index.html" \
--metadata-directive REPLACE \
--cache-control "max-age=300,public"
Why short cache TTL: Event details (dates, pricing, availability) change frequently. A 5-minute cache allows near-real-time updates without forcing full cache invalidation.
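Running that same sync across every event bucket is easiest from a small wrapper. The sketch below shells out to the AWS CLI with the exact flags shown above; the bucket list, helper names, and loop are assumptions, not part of the published tooling.

```python
import subprocess

# Buckets named in the article; the per-bucket loop and these helpers
# are illustrative, not the actual deployment script.
BUCKETS = ["paulsimonradyshell.com", "sailjada.queenofsandiego.com"]
SOURCE = "/Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/"

def build_sync_command(bucket: str) -> list[str]:
    # Mirrors the flags used above: only index.html files, with metadata
    # replaced so the short cache-control TTL is applied on every sync.
    return [
        "aws", "s3", "sync", SOURCE, f"s3://{bucket}/",
        "--exclude", "*",
        "--include", "index.html",
        "--metadata-directive", "REPLACE",
        "--cache-control", "max-age=300,public",
    ]

def deploy_all() -> None:
    for bucket in BUCKETS:
        subprocess.run(build_sync_command(bucket), check=True)
```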
CloudFront Distribution Invalidation
After S3 sync, we invalidated CloudFront edge caches to prevent stale versions from serving to users. Each event subdomain has its own CloudFront distribution:
- paulsimonradyshell.com → Distribution ID: E2ABCD1234XYZ (example)
- sailjada.queenofsandiego.com → Distribution ID: E9WXYZ5678ABC (example)
Invalidation pattern:
aws cloudfront create-invalidation \
--distribution-id E2ABCD1234XYZ \
--paths "/*/index.html" "/index.html"
We used wildcard paths to cover all event subdirectories rather than listing each file individually, reducing API calls and simplifying the deployment script.
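The same invalidation can be issued programmatically with boto3 (assumed available and configured; the distribution map reuses the example IDs above, and the helper names are illustrative). Note that CloudFront requires a unique CallerReference per request.

```python
import time

# Example distribution IDs from the article; the map and helpers are a
# sketch, not production code.
DISTRIBUTIONS = {
    "paulsimonradyshell.com": "E2ABCD1234XYZ",
    "sailjada.queenofsandiego.com": "E9WXYZ5678ABC",
}

def build_invalidation_batch(paths: list[str]) -> dict:
    # CallerReference must be unique per request; a timestamp suffices here.
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        "CallerReference": str(time.time()),
    }

def invalidate_all() -> None:
    import boto3  # assumed installed, with AWS credentials configured
    client = boto3.client("cloudfront")
    for domain, dist_id in DISTRIBUTIONS.items():
        client.create_invalidation(
            DistributionId=dist_id,
            InvalidationBatch=build_invalidation_batch(["/*/index.html", "/index.html"]),
        )
```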
Key Decisions and Rationale
Why Automate Instead of Manual Markup?
At scale, manual structured data maintenance becomes a liability:
- Consistency: Automated generation eliminates typos and formatting errors across 12+ pages
- Maintenance: When schema standards evolve (Google adds new recommended fields), one script update fixes all pages
- Scalability: New event pages can be auto-scaffolded with correct markup
- Auditability: The injection script is version-controlled; you can diff schema changes across commits
Why JSON-LD Over Microdata?
JSON-LD was chosen over embedded microdata attributes because:
- Decoupled from HTML structure—safe to inject without modifying existing DOM
- Easier to validate with Google's Rich Results Test
- Search engines prefer JSON-LD; it's the recommended format in Google's developer documentation
Why Separate S3 Buckets Per Domain?
While a monolithic bucket would technically work, separate buckets provide:
- Independent cache invalidation (one site's updates don't flush another's CloudFront)
- Cleaner access control and billing attribution per domain
- Future flexibility for third-party vendor access or separate CDN configuration
Validation and Testing
After injection, we validated using Google's Rich Results Test (not just the deprecated Structured Data Testing Tool). Key checks:
- Event schema detected and marked as "Valid"
- Dates parsed correctly in ISO 8601 format
- Price and currency aligned
- No validation errors or warnings blocking rich result eligibility
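Those checks can be partially automated before deployment. The sketch below is an assumed local pre-flight (not a replacement for the Rich Results Test): it pulls each JSON-LD block out of a page and verifies the fields the checks above depend on. The function name is illustrative.

```python
import json
import re
from datetime import datetime

# Lightweight local check run before deployment: extract each JSON-LD
# block and verify the Event fields that rich-result eligibility needs.
def check_event_markup(html: str) -> list[str]:
    errors = []
    blocks = re.findall(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL
    )
    if not blocks:
        return ["no JSON-LD blocks found"]
    for raw in blocks:
        data = json.loads(raw)
        if data.get("@type") != "Event":
            continue
        try:
            # fromisoformat accepts the ISO 8601 dates the schema requires.
            datetime.fromisoformat(data["startDate"])
        except (KeyError, ValueError):
            errors.append("startDate missing or not ISO 8601")
        offers = data.get("offers", {})
        if "price" not in offers or "priceCurrency" not in offers:
            errors.append("offers missing price or priceCurrency")
    return errors
```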
Sample validation command (local testing before deployment):
python3 inject_structured_data.py \
--input