Building an Orchestrator-Driven Analytics Audit Pipeline: GA Code Coverage, Traffic Analysis, and Campaign Intelligence
Over the past development session, we tackled a critical operational gap: comprehensive visibility into Google Analytics instrumentation across all Queen of San Diego platforms, coupled with automated traffic analysis and email campaign orchestration. This post details the architecture, decisions, and technical implementation.
The Problem: Fragmented Analytics Visibility
Before this work, we had no programmatic way to answer basic questions:
- Which pages across all platforms have GA tracking codes?
- What's the last 30 days of traffic by property?
- Are there GA implementation gaps?
- What's the status of scheduled email campaigns?
- Where are the operational bottlenecks?
Manual audits don't scale. We needed an automated pipeline that could run end-to-end and surface findings in our existing kanban workflow.
Architecture: Multi-Stage Audit Pipeline
Stage 1: GA Code Coverage Audit
We built a file-system scanner that traverses HTML across all site repositories and checks for GA4 measurement IDs:
- /Users/cb/Documents/repos/jada/ — JADA e-commerce HTML templates
- /Users/cb/Documents/repos/burial-at-sea/ — Event booking site
- /Users/cb/Documents/repos/sail-jada/ — Sailing booking platform
- /Users/cb/Documents/repos/dangerouscentaur/ — Luxury travel blog
- /Users/cb/Documents/repos/qos-main/ — Main marketing site
Why this approach: Rather than checking the Google Analytics Admin console (which only reflects data that has already been collected), we audit the actual deployed code. This catches instrumentation bugs, missing IDs, and configuration mismatches before they distort reporting.
The scanner looks for the GA4 global site tag pattern:
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
And validates that each property ID maps to an actual GA4 property in our Google Analytics account.
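A minimal sketch of the scanner's core loop, assuming a file-system walk with a regex match on the gtag loader (the function name `scan_repo` and return shape are illustrative, not the actual implementation):

```python
import re
from pathlib import Path

# Matches the GA4 gtag.js loader tag and captures the measurement ID.
GA4_TAG_RE = re.compile(r'googletagmanager\.com/gtag/js\?id=(G-[A-Z0-9]+)')

def scan_repo(repo_root):
    """Walk every .html file under repo_root and return a map of
    relative path -> list of GA4 measurement IDs found in that file.
    An empty list marks a page with no tracking code (a coverage gap)."""
    coverage = {}
    root = Path(repo_root)
    for html_file in root.rglob("*.html"):
        text = html_file.read_text(errors="ignore")
        coverage[str(html_file.relative_to(root))] = GA4_TAG_RE.findall(text)
    return coverage
```

Running this per repository and diffing the found IDs against the expected property list is what surfaces the coverage gaps.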
Stage 2: Traffic Data Pull via GA4 Data API
Once we confirmed GA code coverage, we needed programmatic access to traffic data. This required OAuth setup:
- Created a service account in Google Cloud (project: qos-analytics-prod)
- Generated the client secret JSON and stored it in /Users/cb/.config/google/
- Granted the service account the Editor role on all GA4 properties via the Admin console
- Implemented a token refresh flow in /Users/cb/Documents/repos/tools/reauth_ga.py
The reauth script handles OAuth token lifecycle:
python3 /Users/cb/Documents/repos/tools/reauth_ga.py \
--client-secret ~/.config/google/qos-analytics-client-secret.json \
--scopes analytics.readonly
Why service accounts instead of user OAuth: Service accounts don't expire on password changes and don't require interactive login. They're ideal for scheduled, background data pulls. We use analytics.readonly scope (principle of least privilege) rather than full edit access.
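For context, the service-account flow works by exchanging a signed JWT for a short-lived access token. The sketch below shows only the claim set of that JWT (per the OAuth 2.0 JWT bearer grant); the signing and exchange are delegated to a library such as google-auth in the real reauth script, and the helper name is hypothetical:

```python
import time

TOKEN_URL = "https://oauth2.googleapis.com/token"
SCOPE = "https://www.googleapis.com/auth/analytics.readonly"

def build_jwt_claims(sa_email, now=None):
    """Claim set for the service-account JWT bearer grant.
    The claims are signed with the service account's private key and
    exchanged at TOKEN_URL for a short-lived access token."""
    now = int(time.time() if now is None else now)
    return {
        "iss": sa_email,    # service account identity, not a user
        "scope": SCOPE,     # least privilege: read-only analytics
        "aud": TOKEN_URL,   # token endpoint audience
        "iat": now,
        "exp": now + 3600,  # Google caps access-token lifetime at 1 hour
    }
```

Because the grant is re-signed on each refresh, there is no long-lived refresh token to rotate and no user password in the loop.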
With valid credentials, we pull the last 30 days of traffic for each property:
Property: GA-391842 (JADA e-commerce)
Property: GA-412756 (Sail JADA)
Property: GA-398201 (Burial at Sea)
Property: GA-445923 (QOS main site)
The GA4 Data API call structure:
POST https://analyticsdata.googleapis.com/v1beta/properties/{propertyId}:runReport
Request body:
{
"dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
"metrics": [{"name": "activeUsers"}, {"name": "screenPageViews"}],
"dimensions": [{"name": "pagePath"}]
}
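Wired together, the call looks roughly like this. This is a stdlib-only sketch (the real pipeline may use a Google client library); `run_report` and `API_BASE` are illustrative names:

```python
import json
import urllib.request

API_BASE = "https://analyticsdata.googleapis.com/v1beta"

# Same request body as above: 30 days of users and page views per path.
REPORT_BODY = {
    "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
    "metrics": [{"name": "activeUsers"}, {"name": "screenPageViews"}],
    "dimensions": [{"name": "pagePath"}],
}

def run_report(property_id, access_token, body=REPORT_BODY):
    """POST a runReport request for one GA4 property and return the
    parsed JSON response (rows of pagePath / users / page views)."""
    req = urllib.request.Request(
        f"{API_BASE}/properties/{property_id}:runReport",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Looping this over the four property IDs gives the per-property traffic table the audit needs.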
Stage 3: Email Campaign Intelligence
In parallel with the traffic analysis, we check Constant Contact for scheduled campaigns. The script:
- Connects via Constant Contact OAuth (separate service account)
- Lists all campaigns with status and send dates
- Flags any campaigns past their approval deadline
- Correlates with email templates in /Users/cb/Documents/repos/email-templates/
We discovered two campaigns in critical states:
- Mother's Day blast — scheduled April 29 (4 days out), still unapproved
- Paul Simon concert blast — proof needed by May 12 (6 days out)
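The deadline check behind those flags reduces to a simple predicate once campaigns are normalized out of the Constant Contact response. A sketch, assuming campaigns arrive as dicts with `name`, `send_date`, and `approved` fields (these names are illustrative):

```python
from datetime import date, timedelta

def flag_at_risk(campaigns, today, window_days=7):
    """Return campaigns that are unapproved and due to send within
    window_days of today -- the 'critical state' condition above.
    Each campaign is a dict with 'name', 'send_date' (a date), and
    'approved' (a bool)."""
    horizon = today + timedelta(days=window_days)
    return [
        c for c in campaigns
        if not c["approved"] and today <= c["send_date"] <= horizon
    ]
```

Anything this returns gets escalated to the orchestrator as an urgent item rather than buried in a report.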
Stage 4: Orchestrator Report Generation
Rather than dumping raw data to console, we feed all findings into our orchestrator system, which generates a structured kanban card. The orchestrator process:
- Receives audit results: GA code gaps, traffic metrics, campaign status
- Generates a multi-section dashboard card with deep links
- Posts to the progress dashboard at https://progress.queenofsandiego.com
- Marks urgent items as needs-you cards (unblocks stakeholders)
The orchestrator writes its output to /Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/ for persistence across sessions.
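A sketch of that persistence step, assuming cards are stored as one JSON file each (the `write_card` helper, the card schema, and the id derivation are hypothetical; only the memory path comes from the pipeline):

```python
import hashlib
import json
from pathlib import Path

DEFAULT_MEMORY = Path(
    "/Users/cb/.claude/projects/-Users-cb-Documents-repos/memory"
)

def write_card(title, sections, memory_dir=DEFAULT_MEMORY, urgent=False):
    """Persist one orchestrator card as JSON in the memory directory.
    The derived id doubles as the dashboard deep-link fragment, so a
    card is addressable as #card-<id> once the dashboard picks it up."""
    card_id = "t-" + hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]
    card = {
        "id": card_id,
        "title": title,
        "sections": sections,  # e.g. GA gaps, traffic, campaign status
        "needs_you": urgent,   # urgent items surface as needs-you cards
    }
    memory_dir = Path(memory_dir)
    memory_dir.mkdir(parents=True, exist_ok=True)
    (memory_dir / f"{card_id}.json").write_text(json.dumps(card, indent=2))
    return card_id
```

Writing to a flat directory of JSON files keeps cards greppable and lets any later session reload state without a database.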
Infrastructure: Dashboard Deep Linking
The progress dashboard at progress.queenofsandiego.com supports hash-based deep links. When the orchestrator creates a card, it generates a linkable URL:
https://progress.queenofsandiego.com/#card-t-31aa2593
The dashboard JS handles hash routing via standard browser navigation:
window.location.hash = `#card-${cardId}`
This lets us reference findings directly from Slack, email, or documentation without losing context.
Key Decisions and Rationale
1. Programmatic GA Access vs. Manual Reporting
Decision: Build reauth scripts and use GA4 Data API instead of exporting PDFs from the GA console.
Why: Automation scales. Tomorrow we can run this weekly. In six months, we can feed traffic data into predictive models. Manual reporting is a dead end.
2. Service Account OAuth for Background Jobs
Decision: Use service account JSON keys rather than user credentials for reauth_ga.py.
Why: Eliminates password dependency. When the team member who set up GA eventually leaves, their credentials don't break the pipeline. Service accounts are tied to the GCP project, not individuals.
3. Orchestrator + Kanban Over Raw Reports
Decision: Pipe audit findings into the progress dashboard instead of generating static HTML reports.