
Multi-Site GA4 Audit & Orchestrator-Driven Analytics Pipeline: Architecture & Implementation

What Was Done

We executed a comprehensive Google Analytics 4 (GA4) audit across all Queen of San Diego platform properties, identified tracking gaps, pulled 30-day traffic data programmatically, and fed those findings into an orchestrator system to generate actionable operational recommendations. The audit surfaced three critical blockers: missing GA Data API service account permissions, an unapproved Mother's Day email campaign (4 days to launch), and a Paul Simon promotional blast proof deadline (6 days out).

Technical Details: GA Code Audit Pipeline

The audit ran a recursive HTML file scan across all site repositories to detect GA tracking implementation:


# Pseudo-command structure (run against each site root)
find /Users/cb/Documents/repos/*/public -name "*.html" -type f | \
  xargs grep -lE "gtag|ga\(" | \
  xargs grep -E "gtag\(|GA_MEASUREMENT_ID|googletagmanager"

This identified which pages had instrumentation and which were dark. The tool checked for:

  • Global gtag.js injection — presence of Google Tag Manager container tag in <head>
  • Measurement ID binding — GA4 property IDs correctly mapped to each domain
  • Event firing patterns — page_view, view_item, purchase, and custom event tracking
  • Cross-domain tracking setup — proper referrer domain allowlisting for multi-property environments
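The per-file checks above can also be expressed as a small Python scanner. This is a sketch under the assumption that plain pattern matching (as in the shell pipeline) suffices; the pattern set, function name, and return shape are illustrative, not the audit tool itself:

```python
# Illustrative per-file GA audit check. Assumes text-level pattern matching
# is enough (no HTML parsing), mirroring the grep-based shell pipeline.
import re
from pathlib import Path

# Patterns roughly mirroring the grep stage: gtag.js loader, a GA4
# measurement ID, and an actual gtag() call.
PATTERNS = {
    "gtag_loader": re.compile(r"googletagmanager\.com/gtag/js"),
    "measurement_id": re.compile(r"G-[A-Z0-9]{6,}"),
    "gtag_call": re.compile(r"gtag\("),
}

def audit_site(root: str) -> dict:
    """Return {relative_path: [missing check names]} for every HTML file."""
    gaps = {}
    for page in Path(root).rglob("*.html"):
        text = page.read_text(errors="ignore")
        missing = [name for name, pat in PATTERNS.items() if not pat.search(text)]
        if missing:
            gaps[str(page.relative_to(root))] = missing
    return gaps
```

Pages that pass every check are omitted from the result, so a non-empty dict is exactly the list of dark or partially instrumented pages.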

The audit discovered that while primary marketing pages had GA4 instrumentation, several secondary properties and subdomains lacked tracking codes entirely. This explains why traffic visibility was incomplete and why orchestrator recommendations couldn't be fully data-driven.

GA4 Data API Access & OAuth Flow

To pull programmatic traffic data, we needed to establish GA4 Data API credentials. The existing /Users/cb/Documents/repos/tools/reauth_ga.py script handles OAuth2 token refresh:


# Script structure: reauth_ga.py
# 1. Reads existing client secret from ~/.credentials/ga_client_secret.json
# 2. Uses service account or user OAuth to request access token
# 3. Validates scopes: https://www.googleapis.com/auth/analytics.readonly
# 4. Stores refreshed token in ~/.credentials/ga_token.json
# 5. Returns token for use in subsequent API calls
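The refresh step can be sketched with the standard google-auth library. This is a minimal illustration of the pattern, not the actual reauth_ga.py: the client-secret fallback, first-time consent flow, and error handling are omitted, and the function name is ours.

```python
# Minimal token-refresh sketch (illustrative, not reauth_ga.py itself).
SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]

def refresh_ga_token(token_path: str):
    """Refresh the stored GA OAuth token if expired; return credentials."""
    # Third-party imports kept local so the module loads without google-auth.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials

    creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())          # exchanges the refresh token
        with open(token_path, "w") as f:
            f.write(creds.to_json())      # persist for the next scheduled run
    return creds
```

The returned credentials object can be passed straight to the Data API client constructor.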

The critical blocker was that the service account used for automated reporting hadn't been granted "Viewer" or "Editor" role in GA Admin console for the target properties. Resolution required manual grant in Google Analytics Admin UI → Account Access Management → add service account → assign Viewer role to each property ID.

Once permissions were in place, the pipeline pulled the last 30 days of data using the GA4 Data API (v1beta) client library:


# API call pattern (Python)
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import RunReportRequest

client = BetaAnalyticsDataClient()
request = RunReportRequest(
    property=f"properties/{GA_PROPERTY_ID}",
    date_ranges=[{"start_date": "30daysAgo", "end_date": "today"}],
    metrics=[{"name": "activeUsers"}, {"name": "screenPageViews"}],
    dimensions=[{"name": "pagePath"}, {"name": "date"}]
)
response = client.run_report(request)

This returned page-level traffic metrics across all properties, which became the input dataset for the orchestrator.
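Before handoff, the protobuf response is typically flattened into plain records. A minimal sketch, assuming the request shape above (pagePath/date dimensions, activeUsers/screenPageViews metrics); the function name and output shape are ours, not the pipeline's:

```python
# Flatten a RunReport response into a list of plain dicts for the
# orchestrator. Header order matches value order in each row.
def flatten_report(response) -> list[dict]:
    dim_names = [d.name for d in response.dimension_headers]
    met_names = [m.name for m in response.metric_headers]
    rows = []
    for row in response.rows:
        record = {n: v.value for n, v in zip(dim_names, row.dimension_values)}
        # Metric values arrive as strings; cast for downstream arithmetic.
        record.update({n: float(v.value) for n, v in zip(met_names, row.metric_values)})
        rows.append(record)
    return rows
```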

Orchestrator Integration & Report Card Generation

The orchestrator consumed the GA4 traffic dataset and the audit findings, then generated a structured report card:

  • Traffic trends — 30-day comparisons, top-performing pages, traffic sources
  • Instrumentation gaps — which pages/domains lack GA tracking
  • Operational recommendations — conversion funnel analysis, bounce rate hotspots, email campaign correlation
  • Campaign status — pulled from Constant Contact API and internal blast logs
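The report card assembly can be sketched as a single normalization function over those inputs. The actual schema lives in the orchestrator; every key and the recommendation wording here are illustrative only:

```python
# Hypothetical report-card builder combining the three input streams.
def build_report_card(traffic_rows, instrumentation_gaps, campaign_status):
    """traffic_rows: flattened GA4 records; instrumentation_gaps: audit dict;
    campaign_status: campaign-name -> status mapping."""
    top_pages = sorted(traffic_rows, key=lambda r: r["activeUsers"], reverse=True)[:10]
    return {
        "traffic_trends": {"window": "30daysAgo..today", "top_pages": top_pages},
        "instrumentation_gaps": instrumentation_gaps,
        "recommendations": [
            f"Add GA4 tags to {page}" for page in instrumentation_gaps
        ],
        "campaign_status": campaign_status,
    }
```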

The report landed as kanban card t-31aa2593 on the progress dashboard at https://progress.queenofsandiego.com/#card-t-31aa2593.

Infrastructure & Resource Names

Key resources identified during the audit:

  • GA4 Properties across platforms:
    • jada.queenofsandiego.com — Property ID: [JADA_PROPERTY_ID]
    • queenofsandiego.com — Property ID: [QOS_PROPERTY_ID]
    • dangerouscentaur.com — Property ID: [DC_PROPERTY_ID] (newly added to Search Console)
  • CloudFront Distribution for dangerouscentaur.com — distribution ID confirmed; S3 origin bucket s3://dangerouscentaur-origin/
  • Constant Contact API integration — campaign logs stored in S3 under s3://campaign-logs/constant-contact/
  • Blast script deduplication logic — reads contact CSV from export path, cross-references against campaign log to avoid duplicate sends
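The deduplication logic described in the last bullet can be sketched as follows, assuming the contact export is a CSV with an "email" column and the campaign log lists one already-sent address per line; both assumptions and the function name are illustrative:

```python
# Sketch of blast deduplication: contacts in the export that do not
# appear in the campaign log. Addresses compared case-insensitively.
import csv

def contacts_to_send(export_csv: str, campaign_log: str) -> list[str]:
    with open(campaign_log) as f:
        already_sent = {line.strip().lower() for line in f if line.strip()}
    with open(export_csv, newline="") as f:
        return [
            row["email"].lower()
            for row in csv.DictReader(f)
            if row["email"].lower() not in already_sent
        ]
```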

Key Architectural Decisions

Why use service accounts + OAuth refresh tokens: Service accounts allow headless, scheduled reporting without maintaining user session cookies. The refresh token pattern ensures the pipeline can run overnight without manual intervention, and token rotation happens automatically via reauth_ga.py.

Why route GA audit through the orchestrator: The orchestrator normalizes data from multiple sources (GA4 API, HTML audit, email campaign logs, Search Console) into a single structured report. This avoids analysts having to cross-reference three different dashboards and ensures recommendations are data-grounded.

Why prioritize GA Data API access over raw dashboard exports: API access enables real-time, programmatic analysis. Spreadsheet exports become stale immediately and can't feed into automated alerting. The 3-minute permission grant pays dividends in operational velocity.

Email Campaign Status & Blast Management

The audit also surfaced critical campaign deadlines:

  • Mother's Day Emergency Blast — scheduled for April 29, currently unapproved. The event is 4 days away. Template located at /repos/email-templates/mothers-day-2024.html; blast script at /repos/tools/send_constant_contact_blast.py. A needs-you card was created on the dashboard for sign-off.
  • Paul Simon Promotional Blast — proof deadline May 12 (6 days out). Template confirmed; proof email prepared for CB review.
  • Active campaigns in Constant Contact — the audit pulled the full campaign list and send status from the dedup logs to identify any stuck campaigns.

Search Console & Site Verification

During the audit, dangerouscentaur.com was added to Google Search Console. An HTML verification token was generated and uploaded to s3://dangerouscentaur-origin/