Integrating GA4 Data API with Multi-Site Orchestrator: Audit, Authorization, and Real-Time Reporting Architecture
Over the past development session, we completed a comprehensive Google Analytics 4 audit across all Queen of San Diego properties, implemented programmatic GA4 Data API access, and integrated findings into our orchestrator-driven dashboard reporting system. This post details the technical implementation, infrastructure changes, and architectural decisions that enable real-time traffic analytics across our platform portfolio.
Problem Statement
Prior to this work, our analytics pipeline lacked programmatic access to GA4 data. While manual GA4 console access existed, there was no automated way to:
- Pull historical traffic data for the last 30 days across all properties
- Verify GA tracking code deployment across all sites
- Generate consolidated reports without manual console navigation
- Feed analytics insights into our orchestrator-driven kanban dashboard
Additionally, we needed visibility into which pages across our multi-property ecosystem were properly instrumented with GA tracking codes.
Technical Architecture
GA4 Data API Service Account Setup
The solution leverages Google Cloud service accounts with OAuth 2.0 credentials. The flow is:
Service Account (JSON key)
→ OAuth 2.0 Bearer Token
→ GA4 Data API v1
→ Property metrics/dimensions
→ Orchestrator processing
→ Dashboard card generation
Key file: /Users/cb/Documents/repos/tools/reauth_ga.py
This Python script handles the OAuth token refresh cycle. Rather than storing long-lived tokens, we use the service account private key to request fresh tokens on demand. Each API call includes the current Bearer token in the Authorization: Bearer {token} header.
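As a minimal sketch of that pattern, the request construction might look like the following. The property ID, access token, and report body here are placeholders, not values from the actual system:

```python
import json
import urllib.request

GA4_ENDPOINT = "https://analyticsdata.googleapis.com/v1beta/properties/{pid}:runReport"

def build_report_request(property_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build an authenticated runReport request (constructed, not yet sent)."""
    return urllib.request.Request(
        GA4_ENDPOINT.format(pid=property_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # current token from the refresh cycle
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_report_request(
    "123456789",            # placeholder property ID
    "ya29.example-token",   # placeholder access token
    {"dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
     "metrics": [{"name": "activeUsers"}]},
)
```

The token itself would come from the service-account exchange that reauth_ga.py performs; only the header-attachment step is shown here.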
GA Property ID Mapping
We discovered GA property IDs scattered across multiple repositories:
- Queen of San Diego (main): Property ID found in tracking configuration
- JADA Property: Distinct property ID for separate reporting
- Additional properties: Mapped across tools, templates, and site configurations
All property IDs were catalogued in our project memory at /Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/MEMORY.md for future reference and orchestrator use.
Multi-Site Tracking Code Audit
A systematic audit verified GA tracking code presence across all HTML assets. The audit searched for:
- Google tag (gtag.js) loader scripts (format: <script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXX">)
- GA initialization code (gtag('config', 'G-XXXXX');)
- Missing instrumentation on critical pages
This identified gaps where pages lacked proper tracking, allowing us to prioritize instrumentation efforts.
GA4 Data API Integration
API Request Pattern
The GA4 Data API v1 uses a POST-based query model. Example request structure:
POST https://analyticsdata.googleapis.com/v1beta/properties/{propertyId}:runReport
Authorization: Bearer {accessToken}
Content-Type: application/json
{
  "dateRanges": [
    {
      "startDate": "2024-04-09",
      "endDate": "2024-05-09"
    }
  ],
  "metrics": [
    {"name": "activeUsers"},
    {"name": "screenPageViews"}
  ],
  "dimensions": [
    {"name": "pagePath"}
  ]
}
The response provides row-by-row traffic data, which we parse into structured records for downstream processing.
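The runReport response pairs dimensionHeaders/metricHeaders with per-row dimensionValues/metricValues, so the flattening step can be sketched as follows (the exact record shape used downstream is an assumption):

```python
def parse_report(response: dict) -> list[dict]:
    """Flatten a GA4 runReport response into one dict per row."""
    dim_names = [h["name"] for h in response.get("dimensionHeaders", [])]
    met_names = [h["name"] for h in response.get("metricHeaders", [])]
    records = []
    for row in response.get("rows", []):
        # Zip header names against the positional values in each row.
        record = dict(zip(dim_names, (v["value"] for v in row.get("dimensionValues", []))))
        record.update(zip(met_names, (v["value"] for v in row.get("metricValues", []))))
        records.append(record)
    return records
```

Note that the API returns metric values as strings, so numeric casting (if needed) happens downstream.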
Token Management Pattern
Rather than hardcoding tokens, reauth_ga.py implements token refresh logic:
- Read service account JSON from a secure location (not in version control)
- Request fresh token from Google OAuth 2.0 endpoint
- Cache token with TTL to minimize API calls
- Refresh automatically when token expires
This ensures every orchestrator job has valid credentials without manual token rotation.
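The cache-with-TTL step can be sketched like this; the actual service-account exchange is stubbed behind a fetch callable, and the TTL/skew values are illustrative:

```python
import time

class TokenCache:
    """Cache an access token and refresh it shortly before expiry."""

    def __init__(self, fetch_token, ttl_seconds=3600, skew=60):
        self._fetch = fetch_token   # callable that returns a fresh token
        self._ttl = ttl_seconds     # token lifetime (GA access tokens last ~1h)
        self._skew = skew           # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.monotonic()
        if self._token is None or now >= self._expires_at - self._skew:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

Each orchestrator job calls get() and transparently receives either the cached token or a freshly minted one.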
Orchestrator Integration
Our orchestrator agent consumed the GA4 audit results and generated a comprehensive report, which landed as a live dashboard card: t-31aa2593
The card structure includes:
- Section 1: Last 30 days traffic metrics (activeUsers, screenPageViews by site)
- Section 2: GA code coverage audit (% of pages instrumented per property)
- Section 3: Top-performing pages and content recommendations
- Section 4: Traffic bottlenecks and improvement opportunities
- Section 5: Operational excellence gaps (email deliverability, landing page speed, etc.)
The deep link for direct access: https://progress.queenofsandiego.com/#card-t-31aa2593
Dashboard Deep Linking Implementation
The progress dashboard supports hash-based navigation. Card IDs are referenced as:
https://progress.queenofsandiego.com/#card-{cardId}
The frontend JavaScript (in dashboard JS files) watches for hash changes and scrolls/highlights the matching card. This pattern enables shareable deep links for specific findings without requiring URL parameters or query strings.
Infrastructure and Storage
Campaign Logs and Deduplication
Email blast operations log sent status to prevent duplicate campaigns. The dedup logic checks:
- Campaign log location: S3 bucket with date-partitioned structure
- Contact CSV source: Constant Contact exports, stored in /repos/data/contacts/
- Sent status tracking: Per-contact flags to prevent resending during multi-batch operations
Mother's Day and Paul Simon blasts both leveraged this log to ensure no duplicate sends.
Key Decisions and Trade-offs
- Service Account OAuth vs. User OAuth: We chose service account credentials because they avoid interactive consent flows and refresh-token expiry; short-lived access tokens are minted on demand from the key, making them ideal for backend orchestrator jobs.
- Hash-based Deep Linking: Instead of URL parameters, we use hash navigation to avoid server-side routing complexity. The dashboard is client-side-heavy, so hash routing is efficient and stateless.
- POST-based GA Queries: The GA4 Data API uses POST (not GET) for complex queries, enabling dimensional filtering and metric aggregation within a single request.
- Property ID Centralization: All GA property IDs are now catalogued in a single memory file to prevent inconsistencies across tools and templates.
What's Next
- Automated Daily Reports: Schedule orchestrator GA jobs to run nightly, pushing updated traffic insights to the dashboard
- Page Speed Integration: Layer in Core Web Vitals metrics alongside traffic data