Automating GA4 Traffic Audits and Dashboard Deep Linking: A Multi-Service Orchestration Pattern
In a recent development session, we built and deployed an automated Google Analytics 4 audit system that simultaneously crawls site infrastructure, validates tracking codes, pulls 30-day traffic data, and surfaces actionable insights through a kanban-style dashboard. This post details the technical architecture, orchestration pattern, and infrastructure decisions behind it.
The Problem We Solved
Managing analytics across multiple domains (sailjada.com, burialsatsea.com, salejada.com, dangerouscentaur.com) creates visibility gaps:
- No centralized view of which pages have GA tracking codes deployed
- Manual process to pull traffic data across properties
- Campaign status scattered across email platforms and dashboards
- No programmatic access to GA4 Data API
The solution: build an orchestrator agent that runs these audits in parallel, aggregates findings, and surfaces them as interactive dashboard cards with deep linking support.
Architecture: Agent-Driven Orchestration
The system uses a "mother agent" pattern that delegates work to specialized subprocess agents:
Agent: "GA audit + orchestrator report"
├── Subprocess 1: GA code audit (HTML crawler)
├── Subprocess 2: GA4 Data API pull (last 30 days)
├── Subprocess 3: Constant Contact campaign check
└── Subprocess 4: Recommendations synthesis → dashboard card
Each subprocess runs independently and reports back to a central aggregator. The orchestrator doesn't block—it spawns background tasks and returns immediately with a notification pointing to the dashboard card where results will land.
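The fan-out-and-aggregate pattern above can be sketched with the standard library's thread pool. The three audit functions here are stand-ins for the real subprocesses (the actual agents run as separate processes), but the shape of the orchestration is the same: submit everything, then collect findings as each task finishes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-ins for the real audit subprocesses; each returns a findings dict.
def ga_code_audit():
    return {"task": "ga_code_audit", "missing_codes": []}

def ga4_data_pull():
    return {"task": "ga4_data_pull", "days": 30}

def campaign_check():
    return {"task": "campaign_check", "campaigns": []}

def run_audits():
    """Spawn each audit concurrently and aggregate results as they finish."""
    audits = [ga_code_audit, ga4_data_pull, campaign_check]
    results = []
    with ThreadPoolExecutor(max_workers=len(audits)) as pool:
        futures = [pool.submit(a) for a in audits]
        for fut in as_completed(futures):
            results.append(fut.result())  # central aggregator
    return results
```

Because `as_completed` yields results in finish order rather than submit order, a fast audit never waits on a slow one before reaching the aggregator.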
Technical Implementation: GA Code Audit
The GA code audit crawls HTML files across all site repos and flags missing or misconfigured tracking codes.
File paths scanned:
- /Users/cb/Documents/repos/jada-main/ (sailjada.com, salejada.com)
- /Users/cb/Documents/repos/burial-at-sea/ (burialsatsea.com)
- /Users/cb/Documents/repos/dangerouscentaur/ (dangerouscentaur.com)
The audit looks for GA4 gtag initialization in <head> tags with the pattern:
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-XXXXXXXXXX');
</script>
The script extracts the property ID (G-XXXXXXXXXX) and cross-references it against a master mapping of all GA4 properties to their respective sites. Any missing properties or mismatched IDs are flagged.
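The extraction-and-cross-reference step can be sketched as a small regex check. The `EXPECTED_IDS` mapping and measurement IDs below are hypothetical placeholders for the real master mapping file:

```python
import re

# Hypothetical master mapping of site -> expected GA4 measurement ID.
EXPECTED_IDS = {"sailjada.com": "G-AAAA111111"}

# Matches the gtag.js loader and captures the measurement ID.
GTAG_RE = re.compile(r"googletagmanager\.com/gtag/js\?id=(G-[A-Z0-9]+)")

def audit_html(site, html):
    """Return a flag string, or None if the page's measurement ID checks out."""
    match = GTAG_RE.search(html)
    if not match:
        return f"{site}: no GA4 gtag snippet found"
    found = match.group(1)
    expected = EXPECTED_IDS.get(site)
    if found != expected:
        return f"{site}: expected {expected}, found {found}"
    return None
```

Running this over every HTML file in a repo yields the coverage report: `None` means the page is correctly tagged, anything else is a flag for the dashboard card.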
GA4 Data API Integration
To pull traffic data programmatically, we set up service account authentication (the two-legged OAuth2 flow) for the Google Analytics Data API:
Key setup steps:
- Created a service account in Google Cloud Console and downloaded its JSON key file
- Installed the google-analytics-data Python client library: pip install google-analytics-data --break-system-packages
- Stored credential handling in /Users/cb/Documents/repos/tools/reauth_ga.py with proper scope handling
- Granted the service account the Editor role in GA4 Admin for each property
The reauth script handles token refresh and maintains a cached token to avoid repeated auth calls. GA4 property IDs are extracted from dashboard screenshot URLs and stored in a central mapping file.
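The caching logic in the reauth script boils down to: serve the cached token while it has time left, otherwise refresh and re-cache. A minimal sketch, assuming a `fetch_token` callable that wraps the real Google OAuth2 refresh call (the cache path and safety margin are illustrative):

```python
import json
import time
from pathlib import Path

CACHE = Path("/tmp/ga4_token_cache.json")  # illustrative cache location

def load_cached_token(fetch_token, now=None):
    """Return a cached access token if still valid, else refresh and re-cache.

    fetch_token is a callable returning (token, expires_in_seconds); in the
    real reauth script it would wrap the Google OAuth2 token refresh.
    """
    now = now if now is not None else time.time()
    if CACHE.exists():
        cached = json.loads(CACHE.read_text())
        if cached["expires_at"] - 60 > now:  # 60 s safety margin
            return cached["token"]
    token, expires_in = fetch_token()
    CACHE.write_text(json.dumps({"token": token, "expires_at": now + expires_in}))
    return token
```

This keeps repeated audit runs from hammering the token endpoint: only the first call in an hour-long window actually hits Google.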
Query pattern used:
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
RunReportRequest,
DateRange,
Metric,
Dimension,
)
client = BetaAnalyticsDataClient()
request = RunReportRequest(
property=f"properties/{GA4_PROPERTY_ID}",
date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
metrics=[Metric(name="activeUsers"), Metric(name="sessions")],
dimensions=[Dimension(name="pagePath")],
)
response = client.run_report(request)
This returns aggregated traffic by page path over the last 30 days, giving us a definitive inventory of which pages are actually receiving traffic.
Dashboard Integration and Deep Linking
Results are pushed to the dashboard as a kanban card with five sections:
- GA Code Coverage Report: which pages are missing tracking codes
- Last 30-Day Traffic Summary: sessions, users, and engagement metrics by property
- Campaign Status Check: Constant Contact blast statuses and scheduled dates
- Traffic Growth Recommendations: content gaps, high-bounce pages, etc.
- Operational Excellence Gaps: missing APIs, incomplete configs, auth issues
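A sketch of the card payload the synthesis subprocess pushes, validating that all five sections are present before the card is published. The payload shape and field names are illustrative, not the dashboard's actual schema:

```python
def build_audit_card(card_id, sections):
    """Assemble a dashboard card payload from the five audit sections.

    The section names mirror the report structure; the dict shape and
    deep-link format are illustrative assumptions.
    """
    expected = [
        "GA Code Coverage Report",
        "Last 30-Day Traffic Summary",
        "Campaign Status Check",
        "Traffic Growth Recommendations",
        "Operational Excellence Gaps",
    ]
    missing = [name for name in expected if name not in sections]
    if missing:
        raise ValueError(f"card missing sections: {missing}")
    return {
        "id": card_id,
        "deep_link": f"https://progress.queenofsandiego.com/#card-{card_id}",
        "sections": sections,
    }
```

Failing fast on a missing section means an audit subprocess that silently died never produces a half-empty card.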
The dashboard HTML supports hash-based deep linking. Cards are referenced using the format:
https://progress.queenofsandiego.com/#card-{id}
Example: Card t-31aa2593 is accessed at https://progress.queenofsandiego.com/#card-t-31aa2593. The dashboard JavaScript listens for hash changes and scrolls to the corresponding card element:
function scrollToCardFromHash() {
  const cardId = window.location.hash.replace('#card-', '');
  const card = document.getElementById(cardId);
  if (card) card.scrollIntoView({ behavior: 'smooth' });
}
// Handle both in-page hash changes and direct navigation to a deep link,
// since hashchange does not fire on initial page load.
window.addEventListener('hashchange', scrollToCardFromHash);
window.addEventListener('DOMContentLoaded', scrollToCardFromHash);
Infrastructure: Google Cloud and Search Console Integration
During the audit, we also addressed a critical infrastructure gap: dangerouscentaur.com was never verified in Google Search Console.
Verification process:
- Identified the CloudFront distribution ID for dangerouscentaur.com
- Found the S3 origin bucket for that distribution
- Generated a GSC HTML verification token
- Uploaded the verification HTML file to the root of the S3 bucket: s3://{bucket-name}/google{verification-hash}.html
- Invalidated the CloudFront cache to ensure the file was immediately available: aws cloudfront create-invalidation --distribution-id {DIST_ID} --paths "/*"
- Completed verification in Search Console
- Submitted the sitemap to GSC for crawling
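The two AWS CLI steps above can be scripted so the audit agent emits them reproducibly. This sketch only builds the argument lists (suitable for `subprocess.run`); bucket name, token file, and distribution ID are placeholders for the real values:

```python
def gsc_verification_commands(bucket, token_file, dist_id):
    """Build the AWS CLI invocations for the GSC verification steps.

    Returns (upload_cmd, invalidate_cmd) as argv lists; all arguments
    here are placeholders for the real bucket/token/distribution values.
    """
    upload = [
        "aws", "s3", "cp", token_file,
        f"s3://{bucket}/{token_file}",
        "--content-type", "text/html",
    ]
    invalidate = [
        "aws", "cloudfront", "create-invalidation",
        "--distribution-id", dist_id,
        "--paths", "/*",
    ]
    return upload, invalidate
```

Setting `--content-type text/html` explicitly avoids S3 guessing a type that makes Google's verifier reject the file; the broad `"/*"` invalidation matches what we ran, though invalidating just the token path would be cheaper.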
Key Decisions and Rationale
Why agent-driven orchestration? Each audit type (code crawl, API pull, campaign check) has different dependencies and timelines. Running them in parallel with background agents keeps any single slow audit from blocking the rest, and lets the orchestrator return immediately while results land on the dashboard card as each subprocess finishes.