```html

Injecting Structured Data at Scale: Automating JSON-LD Addition Across 12 Event Subdomain Pages

Overview: The Problem

Our event subdomain pages (concert venue pages under *.queenofsandiego.com) gave search engines nothing machine-readable to work with. Despite having rich event information, venue details, and review data, none of the pages contained structured data markup. Crawlers could still index the text, but they had to infer event dates, ticket availability, and location from prose, which left the pages ineligible for event rich results.

The manual approach (hand-editing 12+ HTML files with JSON-LD) was error-prone and non-repeatable. We needed an automated solution that could:

  • Scan all event pages for existing structured data
  • Inject Event and LocalBusiness JSON-LD schemas without corrupting existing markup
  • Maintain consistent formatting across all subdomains
  • Deploy changes to S3 buckets and invalidate CloudFront caches automatically

Technical Implementation: The Injection Script

Created: /Users/cb/Documents/repos/tools/inject_structured_data.py

This Python script targets concert/event pages and injects two schema types:

  • Event Schema: Captures event name, date, description, and ticket information
  • LocalBusiness Schema: Provides venue address, phone, opening hours, and aggregateRating
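
As an illustration of the two payloads, here is a minimal sketch of how they might be assembled. The helper names (build_event, build_local_business, to_script_tag) and their parameters are hypothetical and not taken from inject_structured_data.py; only the schema.org type and property names are standard. Opening hours and aggregateRating are omitted for brevity.

```python
import json


def build_event(name, start_date, description, url, venue_name, ticket_url=None):
    """Return a schema.org Event as a plain dict (hypothetical helper)."""
    event = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date or date-time string
        "description": description,
        "url": url,
        "location": {"@type": "Place", "name": venue_name},
    }
    if ticket_url:
        # Ticket information is expressed as an Offer attached to the event.
        event["offers"] = {"@type": "Offer", "url": ticket_url}
    return event


def build_local_business(name, street, city, region, postal_code, telephone):
    """Return a schema.org LocalBusiness as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
    }


def to_script_tag(payload):
    """Serialize a payload into a JSON-LD <script> element."""
    body = json.dumps(payload, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot end the element early.
    # "\/" is a valid JSON escape, so the decoded value is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Building the payloads as dicts and serializing them with json.dumps avoids the quoting mistakes that hand-written JSON tends to pick up.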

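The injection strategy inserts the JSON-LD <script> blocks immediately after the opening <head> tag. Google accepts JSON-LD in either <head> or <body>, so this placement is a convention rather than a requirement. We chose it because:

  • Crawlers find the structured data early in the document, before any body content
  • Existing <meta> tags and analytics scripts are left untouched
  • Structured data stays cleanly separated from page content

One caveat: the HTML specification requires a <meta charset> declaration to fall within the first 1024 bytes of the document, so a large JSON-LD block placed ahead of it can push it out of range. Inserting after the charset declaration avoids this.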
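
The insertion step itself can be sketched in a few lines. This is an assumption about how such a step might look, not the actual code in inject_structured_data.py; inject_after_head is a hypothetical name, and the regex approach assumes generator-produced pages with a single, uncommented <head> tag.

```python
import re

# \b keeps the pattern from matching <header>; attributes on <head> are allowed.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def inject_after_head(html, script_tags):
    """Insert the given script elements right after the first opening <head> tag."""
    match = HEAD_OPEN.search(html)
    if match is None:
        raise ValueError("no <head> tag found")
    block = "\n" + "\n".join(script_tags)
    return html[: match.end()] + block + html[match.end():]
```

Hand-written HTML with unusual markup would be safer to process with a real HTML parser than with a regex.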

Execution: 12 Pages Updated

Scanned the following event subdomain structure:

  • /Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/ (master directory for all concert pages)

Updated pages included:

  • Individual concert landing pages (e.g., concert-name/index.html)
  • Venue information pages
  • Ticket and booking pages
  • All subdomain variants under events-*.sailjada.queenofsandiego.com

Verified before injection:

grep -r "schema\.org" /Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/

Result: 0 matches. None of the 12 pages contained any schema.org markup.
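
The same check can be run from Python, which is convenient if the injection script should skip pages that already carry markup. A minimal sketch, where pages_missing_schema is a hypothetical helper and the substring test is as coarse as the grep above:

```python
from pathlib import Path


def pages_missing_schema(root):
    """Return sorted HTML paths under root that never mention schema.org."""
    missing = []
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if "schema.org" not in text:
            missing.append(path)
    return missing
```

Calling pages_missing_schema on the rady-shell-events directory before injection would be expected to list every page, matching the empty grep result.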

Infrastructure Deployment: S3 and CloudFront

S3 Bucket Targets

Event subdomain pages are distributed across three S3 buckets:

  • events-north.sailjada.queenofsandiego.com
  • events-south.sailjada.queenofsandiego.com
  • events-central.sailjada.queenofsandiego.com

Synced updated HTML files to each bucket:

aws s3 sync /path/to/local/rady-shell-events/ s3://events-north.sailjada.queenofsandiego.com/ --delete

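The --delete flag removes objects from the bucket that no longer exist in the local directory, so stale files don't persist in S3. Because it deletes data, preview the sync with --dryrun before running it for real.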
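
To repeat the sync for all three buckets, the commands can be generated rather than typed. The sketch below only builds and prints the commands; it assumes every bucket receives the same local directory, which the steps above do not state explicitly. sync_command is a hypothetical helper, and it defaults to the AWS CLI's --dryrun flag so nothing is deleted by accident.

```python
import shlex

BUCKETS = [
    "events-north.sailjada.queenofsandiego.com",
    "events-south.sailjada.queenofsandiego.com",
    "events-central.sailjada.queenofsandiego.com",
]


def sync_command(local_dir, bucket, dry_run=True):
    """Build the aws s3 sync argument list for one bucket."""
    cmd = ["aws", "s3", "sync", local_dir, f"s3://{bucket}/", "--delete"]
    if dry_run:
        cmd.append("--dryrun")  # preview uploads and deletions without applying them
    return cmd


for bucket in BUCKETS:
    print(shlex.join(sync_command("/path/to/local/rady-shell-events/", bucket)))
```

Once the dry-run output looks right, the same argument lists can be passed to subprocess.run with dry_run=False.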

CloudFront Invalidation

After the S3 upload, we invalidated the CloudFront distributions so edge caches fetch the updated HTML instead of waiting for cached copies to expire:

  • E2NXYZ1A2B3C (events-north distribution)
  • E3QRST4D5E6F (events-south distribution)
  • E4UVWX7G8H9I (events-central distribution)

Invalidation command (example):

aws cloudfront create-invalidation --distribution-id E2NXYZ1A2B3C --paths "/*"

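Full path invalidation (/*) was used because:

  • The injection touched the <head> of every HTML page, so listing paths individually would gain nothing
  • Event pages link to each other, and refreshing them together keeps visitors from seeing a mix of old and new pages
  • CloudFront bills a wildcard path as a single invalidation path, so the cost is negligible compared to the risk of stale content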
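
For scripted deployments, the three invalidations can be issued in a loop. This sketch assumes boto3 as the client library, which the steps above do not mention; invalidation_batch and invalidate_all are hypothetical helpers, and the client is passed in so the logic can be exercised without AWS credentials.

```python
import time

DISTRIBUTIONS = {
    "events-north": "E2NXYZ1A2B3C",
    "events-south": "E3QRST4D5E6F",
    "events-central": "E4UVWX7G8H9I",
}


def invalidation_batch(paths=("/*",), reference=None):
    """Build the InvalidationBatch structure that create_invalidation expects."""
    items = list(paths)
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique per request; a timestamp is one option.
        "CallerReference": reference or f"jsonld-{time.time_ns()}",
    }


def invalidate_all(client, distributions=DISTRIBUTIONS):
    """Submit one invalidation per distribution and return the invalidation IDs."""
    ids = {}
    for label, dist_id in distributions.items():
        response = client.create_invalidation(
            DistributionId=dist_id,
            InvalidationBatch=invalidation_batch(),
        )
        ids[label] = response["Invalidation"]["Id"]
    return ids
```

With boto3 installed and credentials configured, the call would be invalidate_all(boto3.client("cloudfront")).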

Key Technical Decisions

Why JSON-LD Over Microdata/RDFa?

JSON-LD was chosen because:

  • Google's preferred format (explicitly recommended in Search Central docs)
  • Non-invasive: doesn't require modifying HTML elements
  • Easy to validate independently from page markup
  • Trivial to audit with a simple text search

Why Inject in Python, Not During Build?

The event pages are currently generated by render_event_sites.py (located in /Users/cb/Documents/repos/sites/queenofsandiego.com/rady-shell-events/tools/). Rather than modifying that generator (which could break existing templates), we:

  • Created a separate post-processing injection script
  • Kept concerns separated: page rendering vs. SEO enhancement
  • Maintained ability to re-inject structured data without regenerating pages
  • Enabled easier updates as schema requirements evolve
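
Re-injection only stays safe if running the script twice does not stack duplicate blocks. One way to get that property, sketched here under the assumption that injected blocks carry a marker attribute (the data-injected attribute and the reinject helper are hypothetical, not part of the existing script):

```python
import re

MARKER = 'data-injected="structured-data"'  # hypothetical marker attribute
INJECTED = re.compile(
    r"\s*<script\b[^>]*" + re.escape(MARKER) + r"[^>]*>.*?</script>",
    re.IGNORECASE | re.DOTALL,
)
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def reinject(html, json_bodies):
    """Drop previously injected blocks, then insert fresh ones after <head>.

    json_bodies are already-serialized JSON strings, assumed not to contain
    a literal "</script>".
    """
    cleaned = INJECTED.sub("", html)
    match = HEAD_OPEN.search(cleaned)
    if match is None:
        raise ValueError("no <head> tag found")
    tags = "".join(
        f'\n<script type="application/ld+json" {MARKER}>{body}</script>'
        for body in json_bodies
    )
    return cleaned[: match.end()] + tags + cleaned[match.end():]
```

Because old blocks are removed before new ones are written, a schema change only requires rerunning the script over the same files.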

Validation and Testing

After injection, we confirmed the live pages were serving the new markup:

curl -s https://events-north.sailjada.queenofsandiego.com/concert-name/ | grep -A 20 '"@context"'

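All 12 pages now return Event and LocalBusiness JSON-LD blocks. Checking the output against schema.org and Google's documentation confirmed that the following properties were present:

  • Event: name, url, startDate, description
  • LocalBusiness: name, address, telephone
  • Optional enhancements: aggregateRating (from existing review data), image

schema.org itself marks no property as required. The required and recommended lists come from Google's rich result documentation, which requires name, startDate, and location for Event and name and address for LocalBusiness, so eligibility is worth confirming with Google's Rich Results Test as well.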
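
The grep above only displays the markup. A property check can be automated with the standard library, as in this minimal sketch. The EXPECTED sets mirror the properties listed above; JsonLdExtractor and missing_properties are hypothetical names, and the check covers presence of top-level properties only, not full schema validity.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            # json.loads raises ValueError if the block is not well-formed JSON.
            self.items.append(json.loads("".join(self._buffer)))
            self._buffer = None


EXPECTED = {
    "Event": {"name", "url", "startDate", "description"},
    "LocalBusiness": {"name", "address", "telephone"},
}


def missing_properties(html):
    """Map each expected @type to the properties absent from the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = {
        item.get("@type"): item for item in parser.items if isinstance(item, dict)
    }
    return {
        schema_type: sorted(props - set(found.get(schema_type, {})))
        for schema_type, props in EXPECTED.items()
    }
```

A page passes when every list in the returned dict is empty; a missing block shows up as the full property list for that type.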

What's Next

This was a point-in-time fix for existing pages. For sustainability:

  • Integrate into render_event_sites.py: Next rebuild should generate structured data natively, eliminating the need for post-processing
  • Monitor Search Console: Track impressions on rich snippets after crawl refresh (typically 2-4 weeks