Automating Multi-Site Analytics Audits with Service Account Integration and Orchestrator-Driven Reporting
What Was Done
We executed a comprehensive Google Analytics audit across five distinct properties (JADA, Queen of San Diego, Sail JADA, Burial at Sea, and Sale JADA) to establish baseline traffic patterns, identify tracking gaps, and surface operational recommendations. The audit pulled 30 days of GA4 data, validated tracking code deployment across all HTML files, identified missing API access, and generated a structured report card on our progress dashboard—all orchestrated through a single agent-driven workflow.
Technical Architecture
The solution leverages three core components:
- GA Data API Client: Python service account authentication against Google Analytics 4 properties, using OAuth 2.0 credentials stored in ~/.config/gcloud/
- Static Site Crawler: Recursive HTML file scanning across repo directories to locate and validate GA measurement IDs in tracking scripts
- Orchestrator Agent: Stateful task coordinator that spawns sub-agents, aggregates findings, and writes results to the dashboard via API
The GA Data API integration required explicit service account permissions in the Google Analytics Admin Console. We granted the service account the Viewer role at the GA4 property level rather than the account level, following the principle of least privilege.
Implementation Details
GA4 Property Enumeration
Before pulling data, we needed to map all GA properties to their corresponding sites. Properties were identified in three locations:
- HTML tracking snippets across /Users/cb/Documents/repos/ subdirectories
- Configuration files in site-specific config directories
- Google Analytics Admin API responses via service account credentials
Properties discovered:
- JADA site: GA4 property ID 123456789
- Queen of San Diego: GA4 property ID 234567890
- Sail JADA: GA4 property ID 345678901
- Burial at Sea: GA4 property ID 456789012
- Sale JADA: GA4 property ID 567890123
We extracted these by querying the Google Analytics Admin API using the Analytics Admin Python client library:
from google.analytics.admin import AnalyticsAdminServiceClient
from google.oauth2 import service_account

# The read-only scope is sufficient for listing properties
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account-key.json',
    scopes=['https://www.googleapis.com/auth/analytics.readonly']
)
client = AnalyticsAdminServiceClient(credentials=credentials)

# list_properties selects the account through a filter expression,
# not a parent keyword argument
parent = "accounts/ACCOUNT_ID"
properties = client.list_properties(request={"filter": f"parent:{parent}"})
Traffic Data Extraction
Once properties were enumerated, we pulled the last 30 days of traffic using the Google Analytics Data API v1beta. The extraction script in /Users/cb/Documents/repos/tools/ ran batch queries for each property to minimize API calls:
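from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    RunReportRequest,
    DateRange,
    Metric,
    Dimension
)

# Reuses the service account credentials built during enumeration
client = BetaAnalyticsDataClient(credentials=credentials)

# property_id is the numeric GA4 property ID for the site being queried
request = RunReportRequest(
    property=f"properties/{property_id}",
    date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
    metrics=[Metric(name="activeUsers"), Metric(name="screenPageViews")],
    dimensions=[Dimension(name="pagePath"), Dimension(name="deviceCategory")]
)
response = client.run_report(request)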
Results were aggregated into a unified JSON structure and cached to /tmp/ga_audit_results.json for orchestrator consumption.
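As a rough illustration of the aggregation step, the sketch below flattens a run_report response into plain records and writes one JSON document to the cache path. The attribute names it reads (dimension_headers, metric_headers, rows, dimension_values, metric_values) follow the Data API response shape. The function names and the layout of the JSON document are assumptions made for this example, not the audit's actual schema.

```python
import json
from pathlib import Path


def rows_to_records(response):
    """Flatten a RunReportResponse-like object into a list of dicts."""
    dim_names = [h.name for h in response.dimension_headers]
    met_names = [h.name for h in response.metric_headers]
    records = []
    for row in response.rows:
        record = {n: v.value for n, v in zip(dim_names, row.dimension_values)}
        # Metric values arrive as strings; both metrics used here are integer counts.
        record.update({n: int(v.value) for n, v in zip(met_names, row.metric_values)})
        records.append(record)
    return records


def summarize(records):
    """Sum each integer metric across all rows of one property."""
    totals = {}
    for record in records:
        for key, value in record.items():
            if isinstance(value, int):
                totals[key] = totals.get(key, 0) + value
    return totals


def write_cache(results_by_site, path="/tmp/ga_audit_results.json"):
    """Write {site: {"totals": ..., "rows": ...}} as one JSON document (assumed layout)."""
    payload = {
        site: {"totals": summarize(records), "rows": records}
        for site, records in results_by_site.items()
    }
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload
```

One caveat with this approach: activeUsers is not additive across dimensions, so summing it over pagePath and deviceCategory rows counts a user once per row and gives an upper bound. A separate query without dimensions would return the true 30-day figure.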
Tracking Code Audit
We scanned all HTML files across production and staging environments for valid GA measurement IDs. The crawler (a Python script in the tools directory) performed:
- Recursive directory traversal of all repo roots
- Regex matching against GA4 and UA tracking patterns
- Cross-reference against the enumerated property list
- Flagging of unmatched or orphaned tracking IDs
Results identified two sites with incomplete tracking:
- dangerouscentaur.com: missing GA measurement ID entirely
- example-staging.com: using deprecated Universal Analytics (UA) ID
These gaps were added to the dashboard as actionable cards.
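The crawler steps described in this section could look roughly like the sketch below. It is not the script from the tools directory: the regular expressions are approximate patterns for GA4 (G-...) and Universal Analytics (UA-...) IDs, and the function names and status labels are invented for illustration. known_ga4_ids is assumed to be a set of measurement IDs (the G- identifiers attached to each property's web data stream), since the numeric property IDs do not appear in the HTML.

```python
import re
from pathlib import Path

# Approximate ID shapes; tighten these if the real IDs differ.
GA4_ID = re.compile(r"\bG-[A-Z0-9]{6,12}\b")
UA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def scan_html(root):
    """Map each HTML file under root to the tracking IDs found in it."""
    found = {}
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        found[str(path)] = {
            "ga4": sorted(set(GA4_ID.findall(text))),
            "ua": sorted(set(UA_ID.findall(text))),
        }
    return found


def classify(ids, known_ga4_ids):
    """Label one file's IDs as ok, missing, deprecated_ua, or orphaned."""
    if not ids["ga4"]:
        return "deprecated_ua" if ids["ua"] else "missing"
    if any(i not in known_ga4_ids for i in ids["ga4"]):
        return "orphaned"
    return "ok"
```

Running scan_html over each repo root and passing every entry through classify yields the per-file status that can then be rolled up per site.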
Infrastructure & Deployment Changes
Service Account Provisioning
We created a new Google Cloud service account named analytics-audit-bot@PROJECT_ID.iam.gserviceaccount.com with:
- Roles: roles/analytics.viewer on each GA4 property
- API enablement: Google Analytics Admin API, Google Analytics Data API v1beta
- Key type: JSON service account key, stored in ~/.config/gcloud/analytics-audit-key.json
The key was referenced via environment variable in orchestrator startup to avoid hardcoding paths.
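A minimal sketch of that lookup follows. The write-up does not name the environment variable, so GA_AUDIT_KEY_PATH is a hypothetical name, as is the helper itself. (Google's auth libraries also read GOOGLE_APPLICATION_CREDENTIALS for Application Default Credentials, which may be what the orchestrator relies on instead.)

```python
import os
from pathlib import Path


def resolve_key_path(env_var="GA_AUDIT_KEY_PATH",
                     default="~/.config/gcloud/analytics-audit-key.json"):
    """Return the key file path from the environment, falling back to default.

    env_var is a hypothetical variable name chosen for this example.
    """
    raw = os.environ.get(env_var) or default
    path = Path(raw).expanduser()
    if not path.is_file():
        # Fail early with a clear message instead of a later auth error.
        raise FileNotFoundError(f"service account key not found: {path}")
    return path
```

The returned path can be passed straight to service_account.Credentials.from_service_account_file.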
Dashboard Integration
The orchestrator writes results to our kanban dashboard at progress.queenofsandiego.com via a RESTful endpoint. The report card (ID: t-31aa2593) was created with five sections:
- Traffic Summary — 30-day active users, page views by property
- Tracking Coverage — which sites have valid GA codes
- Top Pages — highest-traffic pages across all properties
- Gaps & Recommendations — missing APIs, deprecated tracking IDs, performance bottlenecks
- Email Campaign Status — scheduled blasts from Constant Contact
The dashboard deep link format uses hash navigation: https://progress.queenofsandiego.com/#card-t-31aa2593
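To make the link format and the write step concrete, here is a small sketch. Only the host and the #card-<id> hash format come from this write-up. The API path, HTTP method, and payload keys are placeholders, because the dashboard's actual endpoint and schema are not described here. The request is built but deliberately not sent.

```python
import json
from urllib.request import Request

DASHBOARD = "https://progress.queenofsandiego.com"


def card_deep_link(card_id):
    """Build the hash-navigation link for a dashboard card."""
    return f"{DASHBOARD}/#card-{card_id}"


def build_card_request(card_id, sections, api_path="/api/cards"):
    """Prepare (but do not send) a JSON PUT for one report card.

    api_path, the PUT method, and the payload keys are hypothetical.
    """
    body = json.dumps({"id": card_id, "sections": sections}).encode("utf-8")
    return Request(
        f"{DASHBOARD}{api_path}/{card_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

For the report card above, card_deep_link("t-31aa2593") reproduces the deep link shown in the text.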
Key Architectural Decisions
Why Service Accounts over OAuth Desktop Flow? Service accounts provide headless authentication that is straightforward to rotate. No user consent step is needed, and a key can be revoked on the service account itself without touching individual machines.
Why GA Data API v1beta over Reporting API? The Data API is the reporting interface for GA4 properties; the older Reporting API v4 only covers Universal Analytics. The v1beta API also offers more granular dimension combinations and batch reporting, and it aligns with Google's long-term roadmap.
Why Cache Results to Disk? Data API quotas are enforced per property, with daily and hourly token budgets. Caching allows the orchestrator to re-run sub-tasks without re-fetching raw data, and provides an audit trail of what was queried when.
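The caching decision can be sketched as a read-through helper: reuse the file if it is recent enough, otherwise fetch and rewrite it. This is an illustration under assumptions, not the orchestrator's code. The fetched_at envelope, the function name, and the 24-hour default are all invented for the example.

```python
import json
import time
from pathlib import Path


def load_or_fetch(path, fetch, max_age_seconds=24 * 3600):
    """Return cached JSON data if fresh, else call fetch() and rewrite the cache."""
    cache = Path(path)
    if cache.is_file():
        cached = json.loads(cache.read_text())
        if time.time() - cached["fetched_at"] < max_age_seconds:
            return cached["data"]
    data = fetch()
    # Store the fetch time next to the data so the file doubles as an audit record.
    cache.write_text(json.dumps({"fetched_at": time.time(), "data": data}))
    return data
```

A sub-task would call load_or_fetch with a function that runs the GA queries, so repeated runs within the freshness window spend no quota.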
Operational Outcomes
The audit surfaced three immediate action items:
- Mother's Day Email Blast — scheduled for April 29, requires approval (4 days to event)
- Paul Simon Campaign Proof — due May 12 (6-