Injecting Structured Data Into Event Pages: A Multi-Site JSON-LD Strategy for Concert Venue Discovery
This session tackled a critical SEO gap across our event subdomain network: 12 concert pages were live and generating traffic, but completely invisible to search engines as structured events. By implementing automated JSON-LD injection and deploying updates across multiple S3 buckets with CloudFront invalidation, we recovered significant discoverability potential without touching a single HTML template.
The Problem: Dark Data in Plain Sight
Our event subdomains—including paulsimonradyshell.com, sailjada.queenofsandiego.com, and others—were hosting beautifully designed concert landing pages. Google had indexed them. Organic traffic was flowing in. But the pages contained zero machine-readable event metadata. Search engines were parsing raw text to understand dates, locations, and ticket availability instead of consuming properly structured Event and LocalBusiness JSON-LD schemas.
The audit revealed: 0% of active event pages had structured data. This meant:
- No rich snippets in SERPs (no star ratings, dates, or venue names in the preview)
- No Google Event eligibility for the event carousel
- Schema validators flagging every page as incomplete
- Missing opportunity for voice search and smart speakers to surface event details
Solution Architecture: Automated Injection Pipeline
Rather than manually editing 12 pages across 3 separate S3 buckets, we built a Python-based injection system:
File: /Users/cb/Documents/repos/tools/inject_structured_data.py
The script follows a three-phase pattern:
Phase 1: Schema Generation
For each page, the script generates two complementary schemas:
- Event Schema: Captures concert name, startDate, endDate, location (a Place with a PostalAddress), ticket URL (as an Offer), and performer details
- LocalBusiness Schema: Anchors the venue with address, telephone, and aggregate rating (pulled from review counts)
Both are wrapped in a single <script type="application/ld+json"> block inserted into the document <head>, immediately after the <title> tag and before any stylesheets. This placement ensures:
- Search engine bots parse it before rendering
- No DOM interference with JavaScript frameworks
- Clean semantic separation from presentation
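The schema generation step can be sketched roughly as follows. This is a minimal illustration, not the contents of inject_structured_data.py: the function names, the input dictionary keys, and all sample values are assumptions made for the example.

```python
import json

def build_schemas(event: dict, venue: dict) -> list:
    """Return Event and LocalBusiness JSON-LD objects for one page.

    The input keys (name, start, street, ...) are hypothetical.
    """
    address = {
        "@type": "PostalAddress",
        "streetAddress": venue["street"],
        "addressLocality": venue["city"],
        "addressRegion": venue["region"],
        "postalCode": venue["postal_code"],
    }
    event_schema = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": event["name"],
        "startDate": event["start"],  # ISO 8601 date-time string
        "endDate": event["end"],
        "location": {"@type": "Place", "name": venue["name"], "address": address},
        "performer": {"@type": event["performer_type"], "name": event["performer"]},
        "offers": {"@type": "Offer", "url": event["ticket_url"]},
    }
    business_schema = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": venue["name"],
        "address": address,
        "telephone": venue["telephone"],
    }
    return [event_schema, business_schema]

def render_script_tag(schemas: list) -> str:
    """Serialize both schemas into a single JSON-LD script block."""
    payload = json.dumps(schemas, indent=2)
    # Escape "</" so a stray "</script>" inside the data cannot close the tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

# Placeholder data only; none of these values come from a real page.
example_event = {
    "name": "Example Concert",
    "start": "2030-06-01T19:30:00-07:00",
    "end": "2030-06-01T22:00:00-07:00",
    "performer": "Example Artist",
    "performer_type": "Person",
    "ticket_url": "https://tickets.example.com/example-concert",
}
example_venue = {
    "name": "Example Venue",
    "street": "123 Example St",
    "city": "San Diego",
    "region": "CA",
    "postal_code": "92101",
    "telephone": "+1-555-555-0100",
}
tag = render_script_tag(build_schemas(example_event, example_venue))
print(tag)
```

Emitting both objects as one JSON array keeps them in a single script block; an aggregateRating could be added to the LocalBusiness object in the same way when review data is available.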
Phase 2: Page-Specific Metadata Extraction
The script reads each HTML file from disk and extracts:
- Event name and date: Parsed from <h1>, <h2>, and meta tags
- Venue address: Extracted from visible text blocks (regex pattern matching for "San Diego, CA")
- Ticket URL: Found via href attributes containing "ticket", "eventbrite", or "ticketmaster"
- Performer/artist names: Parsed from page headers and content sections
For pages where metadata wasn't reliably extractable via regex, manual data enrichment was applied (stored as YAML configuration per subdomain).
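A simplified sketch of this extraction step, using only the standard library, might look like the following. The class and function names, the address regex, and the sample HTML are illustrative assumptions; the overrides dictionary stands in for values loaded from the per-subdomain YAML file.

```python
import re
from html.parser import HTMLParser

class PageScanner(HTMLParser):
    """Collect the first <h1> text and every <a href> value."""

    def __init__(self):
        super().__init__()
        self.h1 = None
        self.hrefs = []
        self._in_h1 = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.h1 is None:
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        if self._in_h1:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            # Collapse internal whitespace in the heading text.
            self.h1 = " ".join("".join(self._buf).split())
            self._in_h1 = False

TICKET_HINTS = ("ticket", "eventbrite", "ticketmaster")
# Assumed pattern: text on one line ending in "San Diego, CA" plus optional ZIP.
ADDRESS_RE = re.compile(r"[^<>\n]*San Diego, CA(?: \d{5})?")

def extract_metadata(html, overrides=None):
    """Extract what the page exposes; manual values fill whatever is missing."""
    scanner = PageScanner()
    scanner.feed(html)
    ticket = next(
        (h for h in scanner.hrefs if any(k in h.lower() for k in TICKET_HINTS)),
        None,
    )
    address = ADDRESS_RE.search(html)
    found = {
        "name": scanner.h1,
        "address": address.group(0).strip() if address else None,
        "ticket_url": ticket,
    }
    merged = dict(overrides or {})  # manual enrichment as the baseline
    merged.update({k: v for k, v in found.items() if v})  # extracted values win
    return merged

SAMPLE_HTML = """<html><head><title>Example</title></head><body>
<h1>Example   Concert</h1>
<p>123 Example St, San Diego, CA 92101</p>
<a href="https://tickets.example.com/buy">Buy</a>
</body></html>"""

print(extract_metadata(SAMPLE_HTML))
```

Here extracted values take priority and the manual configuration only fills gaps. If extraction on a given page is known to be unreliable, reversing the merge order so that manual values win would be the safer choice.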
Phase 3: Injection and Validation
The script inserts the JSON-LD block, then validates:
- Valid JSON syntax (no serialization errors)
- Required Event properties present (@type, name, startDate, location)
- URL formats are absolute, not relative paths
- Schema.org vocabulary compliance
If validation fails, the page is skipped and logged for manual review.
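The first three checks can be approximated with a short function like the one below. It is a sketch under assumptions: the function names are hypothetical, the required-property list mirrors the one quoted above, and full Schema.org vocabulary compliance is not covered here.

```python
import json
from urllib.parse import urlparse

REQUIRED_EVENT_PROPS = ("@type", "name", "startDate", "location")

def find_urls(node):
    """Yield every string stored under a 'url' key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from find_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_urls(item)

def validate_event_jsonld(raw):
    """Return a list of problems; an empty list means the block passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    items = data if isinstance(data, list) else [data]
    events = [i for i in items if isinstance(i, dict) and i.get("@type") == "Event"]
    if not events:
        return ["no Event object found"]
    problems = []
    for event in events:
        for prop in REQUIRED_EVENT_PROPS:
            if not event.get(prop):
                problems.append(f"missing required property: {prop}")
    for url in find_urls(items):
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"URL is not absolute: {url}")
    return problems

# Placeholder block with two deliberate faults: no startDate, relative URL.
bad_block = json.dumps({
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Concert",
    "location": {"@type": "Place", "name": "Example Venue"},
    "offers": {"@type": "Offer", "url": "/tickets"},
})
for problem in validate_event_jsonld(bad_block):
    print(problem)
```

A caller would skip the page and log the returned problems whenever the list is non-empty, matching the skip-and-review behavior described above.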
Deployment Infrastructure
S3 Buckets: Each event subdomain has a dedicated bucket for static HTML serving:
- paulsimonradyshell.com → Bucket: paulsimonradyshell-site-bucket
- sailjada.queenofsandiego.com → Bucket: sailjada-qos-event-bucket
- Additional event subdomains follow the same naming convention
CloudFront Distribution IDs: Each bucket sits behind a CloudFront distribution for caching and edge delivery:
- paulsimonradyshell.com → Distribution: E2ABCD1234XYZ (example ID)
- sailjada.queenofsandiego.com → Distribution: E3EFGH5678UVW (example ID)
After uploading updated pages to S3, we invalidated affected paths in CloudFront using the AWS CLI:
aws cloudfront create-invalidation \
--distribution-id E2ABCD1234XYZ \
--paths "/index.html" "/event-pages/*" \
--region us-east-1
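This prompts edge caches worldwide to drop the old copies and fetch the updated pages from S3. Propagation is typically on the order of 60 seconds, though AWS does not guarantee a completion time, so search engine crawlers may briefly receive a stale page until the invalidation finishes.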
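When several sites are deployed in one pass, the upload and invalidation commands can be generated from a single mapping. The sketch below only builds and prints the AWS CLI commands; the SITES structure and the deploy_commands helper are assumptions for illustration, and the bucket names and distribution IDs are the example values listed earlier.

```python
import shlex

# Example values from this write-up; the distribution IDs are placeholders.
SITES = {
    "paulsimonradyshell.com": {
        "bucket": "paulsimonradyshell-site-bucket",
        "distribution_id": "E2ABCD1234XYZ",
    },
    "sailjada.queenofsandiego.com": {
        "bucket": "sailjada-qos-event-bucket",
        "distribution_id": "E3EFGH5678UVW",
    },
}

def deploy_commands(site, local_file, key, paths):
    """Return the upload and invalidation commands for one page as argv lists."""
    cfg = SITES[site]
    upload = [
        "aws", "s3", "cp", local_file, f"s3://{cfg['bucket']}/{key}",
        "--content-type", "text/html",
    ]
    invalidate = [
        "aws", "cloudfront", "create-invalidation",
        "--distribution-id", cfg["distribution_id"],
        "--paths", *paths,
    ]
    return [upload, invalidate]

commands = deploy_commands(
    "paulsimonradyshell.com", "index.html", "index.html",
    ["/index.html", "/event-pages/*"],
)
for cmd in commands:
    # shlex.join quotes the wildcard path so the shell does not expand it.
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute it. Keeping the commands as argv lists rather than shell strings avoids quoting problems with wildcard paths.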
Key Technical Decisions
Why JSON-LD and Not Microdata?
We selected JSON-LD over HTML microdata (itemprop attributes) because:
- Non-invasive: Doesn't require modifying HTML markup—critical when managing multiple template files
- Maintainable: Schema changes don't require re-rendering pages; just update the JSON block
- Google preference: Google's structured data documentation recommends JSON-LD whenever the site setup allows it
- Validation tooling: A JSON-LD block can be pasted as a code snippet into Google's Rich Results Test, so it is testable before the page is deployed
Head vs. Body Placement
We placed JSON-LD in the <head> rather than before </body> because:
- Googlebot encounters <head> markup in the initial HTML, before any JavaScript executes
- The block stays clear of visible content, so inserting it cannot shift the page layout
- Google accepts JSON-LD in either <head> or <body>, but head placement avoids async timing issues
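Head placement immediately after the title can be sketched as a plain string insertion. This is an assumed approach rather than the script's confirmed logic: the helper name is hypothetical, and the check that skips pages already containing a JSON-LD block is an addition made here so that repeated runs do not duplicate the markup.

```python
import re

TITLE_CLOSE = re.compile(r"</title\s*>", re.IGNORECASE)
JSONLD_MARKER = 'type="application/ld+json"'

def inject_after_title(html, script_tag):
    """Insert script_tag right after </title>; skip pages that already have JSON-LD."""
    if JSONLD_MARKER in html:
        return html  # assumed idempotency guard, so re-runs do not duplicate the block
    match = TITLE_CLOSE.search(html)
    if match is None:
        raise ValueError("no </title> tag found; page needs manual review")
    pos = match.end()
    return html[:pos] + "\n" + script_tag + html[pos:]

page = (
    "<html><head><title>Example Concert</title>\n"
    '<link rel="stylesheet" href="site.css"></head><body></body></html>'
)
block = '<script type="application/ld+json">{"@type": "Event"}</script>'
updated = inject_after_title(page, block)
print(updated)
```

Working on the raw string instead of re-serializing the document through an HTML parser leaves the rest of the markup byte-for-byte unchanged, which suits a template-agnostic tool.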
Handling Multi-Site Consistency
Event pages are scattered across multiple repos and subdomains. Rather than create site-specific injection scripts, we centralized logic in /Users/cb/Documents/repos/tools/inject_structured_data.py and made it parameterizable:
- Config file per site: YAML mapping event page paths to metadata (fallback when regex extraction fails)
- Template-agnostic: Script works on any HTML file regardless of source (hand-coded, template-generated, etc.)
- Dry-run mode: Test injection without