Injecting Structured Data Into Event Pages: A Multi-Site JSON-LD Strategy for Concert Venue Discovery

This session tackled a critical SEO gap across our event subdomain network: 12 concert pages were live and drawing traffic, but none of them exposed structured event data to search engines. By automating JSON-LD injection and deploying the updated pages to multiple S3 buckets with CloudFront invalidation, we closed that gap without hand-editing a single HTML template.

The Problem: Dark Data in Plain Sight

Our event subdomains, including paulsimonradyshell.com, sailjada.queenofsandiego.com, and others, were hosting beautifully designed concert landing pages. Google had indexed them. Organic traffic was flowing in. But the pages contained zero machine-readable event metadata. Search engines had to infer dates, locations, and ticket availability from raw text instead of reading structured Event and LocalBusiness JSON-LD.

The audit found that none of the active event pages had structured data. This meant:

No rich snippets in SERPs (no star ratings, date, venue name in preview)
No Google Event eligibility for the event carousel
Schema validators flagging every page as incomplete
Missing opportunity for voice search and smart speakers to surface event details

Solution Architecture: Automated Injection Pipeline

Rather than manually editing 12 pages across 3 separate S3 buckets, we built a Python-based injection system:

File: /Users/cb/Documents/repos/tools/inject_structured_data.py

The script follows a three-phase pattern:

Phase 1: Schema Generation

For each page, the script generates two complementary schemas:

Event Schema: Captures concert name, startDate, endDate, location (a Place with a PostalAddress), ticket URL (as an Offer), and performer details
LocalBusiness Schema: Anchors the venue with address, telephone, and aggregate rating (pulled from review counts)
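
The two schemas can be sketched as plain Python dictionaries. This is a minimal illustration, not the contents of inject_structured_data.py: the function name build_graph, the meta keys (start_date, venue_name, and so on), the use of "@graph" to combine both schemas, and the MusicGroup performer type are all assumptions made for the example.

```python
import json


def _drop_empty(node):
    """Recursively remove keys whose value is None so optional fields vanish."""
    if isinstance(node, dict):
        return {k: _drop_empty(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [_drop_empty(v) for v in node]
    return node


def build_graph(meta: dict) -> dict:
    """Build Event + LocalBusiness JSON-LD from a flat metadata dict (hypothetical keys)."""
    address = {
        "@type": "PostalAddress",
        "streetAddress": meta.get("street"),
        "addressLocality": meta.get("city"),
        "addressRegion": meta.get("region"),
        "postalCode": meta.get("postal_code"),
    }
    event = {
        "@type": "Event",
        "name": meta["name"],
        "startDate": meta["start_date"],  # ISO 8601 string
        "endDate": meta.get("end_date"),
        "location": {"@type": "Place", "name": meta["venue_name"], "address": address},
    }
    if meta.get("ticket_url"):
        event["offers"] = {"@type": "Offer", "url": meta["ticket_url"]}
    if meta.get("performer"):
        event["performer"] = {"@type": "MusicGroup", "name": meta["performer"]}

    venue = {
        "@type": "LocalBusiness",
        "name": meta["venue_name"],
        "address": address,
        "telephone": meta.get("telephone"),
    }
    if meta.get("rating_value") is not None and meta.get("review_count"):
        venue["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": meta["rating_value"],
            "reviewCount": meta["review_count"],
        }
    # Assumption: both schemas share one document via "@graph".
    return _drop_empty({"@context": "https://schema.org", "@graph": [event, venue]})


if __name__ == "__main__":
    sample = {
        "name": "Example Concert",
        "start_date": "2025-08-14T19:30:00-07:00",
        "venue_name": "Example Venue",
        "city": "San Diego",
        "region": "CA",
    }
    print(json.dumps(build_graph(sample), indent=2))
```

Optional fields are dropped rather than emitted as null, since validators tend to treat empty properties as errors.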

Both are wrapped in a single <script type="application/ld+json"> block inserted into the document <head>, immediately after the <title> tag and before any stylesheets. This placement ensures:

Crawlers can read it from the raw HTML, before any rendering step
No DOM interference with JavaScript frameworks
Clean semantic separation from presentation
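
The placement described above can be sketched with the standard library alone. The function name inject_json_ld is hypothetical, and locating the insertion point with a regex on the closing title tag is an assumption about how the real script finds it.

```python
import json
import re


def inject_json_ld(html: str, data: dict) -> str:
    """Insert a JSON-LD script block immediately after the closing </title> tag."""
    # Escape "</" so a string value can never terminate the script element early.
    # "<\/" is a valid JSON escape and decodes back to "</".
    payload = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    block = f'<script type="application/ld+json">\n{payload}\n</script>'

    match = re.search(r"</title\s*>", html, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no </title> tag found; cannot place JSON-LD block")
    pos = match.end()
    return html[:pos] + "\n" + block + html[pos:]


if __name__ == "__main__":
    page = "<html><head><title>Show</title><link rel='stylesheet' href='a.css'></head></html>"
    print(inject_json_ld(page, {"@context": "https://schema.org", "@type": "Event"}))
```

Raising when no title tag exists lets the caller skip and log the page instead of guessing at a position.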

Phase 2: Page-Specific Metadata Extraction

The script reads each HTML file from disk and extracts:

Event name and date: Parsed from <h1>, <h2>, and meta tags
Venue address: Extracted from visible text blocks (regex pattern matching for "San Diego, CA")
Ticket URL: Found via href attributes containing "ticket", "eventbrite", or "ticketmaster"
Performer/artist names: Parsed from page headers and content sections

For pages where metadata wasn't reliably extractable via regex, manual data enrichment was applied (stored as YAML configuration per subdomain).
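
A rough sketch of this extraction step follows. The regular expressions, the function names extract_metadata and with_fallback, and the rule that extracted values win over configured ones are all assumptions for illustration. The overrides dict stands in for one page's entry from the per-subdomain YAML file, already parsed.

```python
import re
from html import unescape

# Substrings that mark a link as a ticket link, per the list above.
TICKET_HINTS = ("ticket", "eventbrite", "ticketmaster")


def extract_metadata(html: str) -> dict:
    """Best-effort regex extraction; missing fields are simply left out."""
    meta = {}

    heading = re.search(r"<h1[^>]*>(.*?)</h1>", html, flags=re.IGNORECASE | re.DOTALL)
    if heading:
        # Strip nested tags such as <em> and decode entities.
        meta["name"] = unescape(re.sub(r"<[^>]+>", "", heading.group(1))).strip()

    for href in re.findall(r'href=["\']([^"\']+)["\']', html, flags=re.IGNORECASE):
        if any(hint in href.lower() for hint in TICKET_HINTS):
            meta["ticket_url"] = href
            break

    # Capture the text run that ends in "San Diego, CA" plus an optional ZIP code.
    address = re.search(r"([^<>\n]*San Diego, CA(?:\s+\d{5})?)", html)
    if address:
        meta["address"] = address.group(1).strip()

    return meta


def with_fallback(extracted: dict, overrides: dict) -> dict:
    """Fill gaps in the extracted data from manually maintained config values."""
    merged = dict(overrides)
    merged.update({key: value for key, value in extracted.items() if value})
    return merged
```

If the manual values should instead take precedence over extracted ones, swapping the two arguments to with_fallback reverses the priority.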

Phase 3: Injection and Validation

The script inserts the JSON-LD block, then validates:

Valid JSON syntax (no serialization errors)
Required Event properties present (@type, name, startDate, location)
URL formats are absolute, not relative paths
Schema.org vocabulary compliance

If validation fails, the page is skipped and logged for manual review.
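
The first three checks can be sketched as below. The function name validate_event and the choice to treat every "url" key as a URL to check are assumptions. Schema.org vocabulary compliance is not covered here, since it needs the vocabulary itself or an external validator.

```python
import json
from urllib.parse import urlparse

REQUIRED_EVENT_KEYS = ("@type", "name", "startDate", "location")


def _iter_urls(node):
    """Yield every string stored under a "url" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from _iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_urls(item)


def validate_event(raw_json: str) -> list:
    """Return a list of problems; an empty list means the block passed."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    errors = []
    for key in REQUIRED_EVENT_KEYS:
        if not data.get(key):
            errors.append(f"missing required property: {key}")

    for url in _iter_urls(data):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"URL is not absolute: {url}")
    return errors
```

A caller would skip and log the page whenever the returned list is non-empty.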

Deployment Infrastructure

S3 Buckets: Each event subdomain has a dedicated bucket for static HTML serving:

paulsimonradyshell.com → Bucket: paulsimonradyshell-site-bucket
sailjada.queenofsandiego.com → Bucket: sailjada-qos-event-bucket
Additional event subdomains follow the same naming convention

CloudFront Distribution IDs: Each bucket sits behind a CloudFront distribution for caching and edge delivery:

paulsimonradyshell.com → Distribution: E2ABCD1234XYZ (example ID)
sailjada.queenofsandiego.com → Distribution: E3EFGH5678UVW (example ID)

After uploading updated pages to S3, we invalidated affected paths in CloudFront using the AWS CLI:

aws cloudfront create-invalidation \
  --distribution-id E2ABCD1234XYZ \
  --paths "/index.html" "/event-pages/*" \
  --region us-east-1

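The invalidation tells the edge caches to drop their copies of those paths, so search engine crawlers stop receiving stale pages once it completes. Completion usually takes from a few seconds to a few minutes; CloudFront does not guarantee a fixed time.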
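
For a script that already runs in Python, the same invalidation could be issued through boto3 instead of the CLI. This is a sketch under assumptions: the helper names are hypothetical, boto3 is a third-party package that must be installed with credentials configured, and the distribution ID is the example ID from above.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure that CloudFront expects."""
    items = list(paths)
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique per request; a timestamp is one simple choice.
        "CallerReference": caller_reference or f"jsonld-{int(time.time())}",
    }


def invalidate(distribution_id, paths):
    """Send the invalidation request (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


if __name__ == "__main__":
    # Example ID from the text above; replace with a real distribution ID.
    print(build_invalidation_batch(["/index.html", "/event-pages/*"]))
```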

Key Technical Decisions

Why JSON-LD and Not Microdata?

We selected JSON-LD over HTML microdata (itemprop attributes) because:

Non-invasive: Doesn't require modifying HTML markup—critical when managing multiple template files
Maintainable: Schema changes don't require re-rendering pages; just update the JSON block
Google preference: Google's structured data documentation recommends JSON-LD over microdata and RDFa wherever possible
Validation tooling: A JSON-LD block can be pasted into Google's Rich Results Test as a code snippet and checked before the page is deployed

Head vs. Body Placement

We placed JSON-LD in the <head> rather than before </body> because:

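The block is part of the initial HTML response, so crawlers can read it without executing JavaScript
A JSON-LD script block is never rendered, so it cannot cause layout shift in either position; keeping it in <head> simply separates it from visible content
Google accepts JSON-LD in either <head> or <body>, but static head placement avoids the timing issues of markup injected asynchronously by scripts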

Handling Multi-Site Consistency

Event pages are scattered across multiple repos and subdomains. Rather than create site-specific injection scripts, we centralized logic in /Users/cb/Documents/repos/tools/inject_structured_data.py and made it parameterizable:

Config file per site: YAML mapping event page paths to metadata (fallback when regex extraction fails)
Template-agnostic: Script works on any HTML file regardless of source (hand-coded, template-generated, etc.)
Dry-run mode: Test injection without