Automating GA4 Traffic Audits and Dashboard Deep Linking: A Multi-Service Orchestration Pattern
In a recent development session, we built and deployed an automated Google Analytics 4 audit system that simultaneously crawls site infrastructure, validates tracking codes, pulls 30-day traffic data, and surfaces actionable insights through a kanban-style dashboard. This post details the technical architecture, orchestration pattern, and infrastructure decisions behind it.
The Problem We Solved
Managing analytics across multiple domains (sailjada.com, burialsatsea.com, salejada.com, dangerouscentaur.com) creates visibility gaps:
- No centralized view of which pages have GA tracking codes deployed
- Manual process to pull traffic data across properties
- Campaign status scattered across email platforms and dashboards
- No programmatic access to GA4 Data API
The solution: build an orchestrator agent that runs these audits in parallel, aggregates findings, and surfaces them as interactive dashboard cards with deep linking support.
Architecture: Agent-Driven Orchestration
The system uses a "mother agent" pattern that delegates work to specialized subprocess agents:
Agent: "GA audit + orchestrator report"
├── Subprocess 1: GA code audit (HTML crawler)
├── Subprocess 2: GA4 Data API pull (last 30 days)
├── Subprocess 3: Constant Contact campaign check
└── Subprocess 4: Recommendations synthesis → dashboard card
Each subprocess runs independently and reports back to a central aggregator. The orchestrator doesn't block—it spawns background tasks and returns immediately with a notification pointing to the dashboard card where results will land.
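The fan-out-and-aggregate pattern above can be sketched with the standard library's thread pool. The three audit functions here are stand-ins for the real subprocesses (the actual agents run as separate processes), but the shape of the orchestration is the same: submit everything, then collect findings as each task finishes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-ins for the real audit subprocesses; each returns a findings dict.
def ga_code_audit():
    return {"task": "ga_code_audit", "missing_codes": []}

def ga4_data_pull():
    return {"task": "ga4_data_pull", "days": 30}

def campaign_check():
    return {"task": "campaign_check", "campaigns": []}

def run_audits():
    """Spawn each audit concurrently and aggregate results as they finish."""
    audits = [ga_code_audit, ga4_data_pull, campaign_check]
    results = []
    with ThreadPoolExecutor(max_workers=len(audits)) as pool:
        futures = [pool.submit(a) for a in audits]
        for fut in as_completed(futures):
            results.append(fut.result())  # central aggregator
    return results
```

Because `as_completed` yields results in finish order rather than submit order, a fast audit never waits on a slow one before reaching the aggregator.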
Technical Implementation: GA Code Audit
The GA code audit crawls HTML files across all site repos and flags missing or misconfigured tracking codes.
File paths scanned:
- /Users/cb/Documents/repos/jada-main/ (sailjada.com, salejada.com)
- /Users/cb/Documents/repos/burial-at-sea/ (burialsatsea.com)
- /Users/cb/Documents/repos/dangerouscentaur/ (dangerouscentaur.com)
The audit looks for GA4 gtag initialization in <head> tags with the pattern:
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-XXXXXXXXXX');
</script>
The script extracts the property ID (G-XXXXXXXXXX) and cross-references it against a master mapping of all GA4 properties to their respective sites. Any missing properties or mismatched IDs are flagged.
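The extraction-and-cross-reference step can be sketched as a small regex check. The `EXPECTED_IDS` mapping and measurement IDs below are hypothetical placeholders for the real master mapping file:

```python
import re

# Hypothetical master mapping of site -> expected GA4 measurement ID.
EXPECTED_IDS = {"sailjada.com": "G-AAAA111111"}

# Matches the gtag.js loader and captures the measurement ID.
GTAG_RE = re.compile(r"googletagmanager\.com/gtag/js\?id=(G-[A-Z0-9]+)")

def audit_html(site, html):
    """Return a flag string, or None if the page's measurement ID checks out."""
    match = GTAG_RE.search(html)
    if not match:
        return f"{site}: no GA4 gtag snippet found"
    found = match.group(1)
    expected = EXPECTED_IDS.get(site)
    if found != expected:
        return f"{site}: expected {expected}, found {found}"
    return None
```

Running this over every HTML file in a repo yields the coverage report: `None` means the page is correctly tagged, anything else is a flag for the dashboard card.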
GA4 Data API Integration
To pull traffic data programmatically, we set up service account authentication (the two-legged OAuth2 flow) for the Google Analytics Data API:
Key setup steps:
- Created a service account in Google Cloud Console and downloaded its JSON key file
- Installed the google-analytics-data Python client library: pip install google-analytics-data --break-system-packages
- Stored credential handling in /Users/cb/Documents/repos/tools/reauth_ga.py with proper scope handling
- Granted the service account the Editor role in GA4 Admin for each property
The reauth script handles token refresh and maintains a cached token to avoid repeated auth calls. GA4 property IDs are extracted from dashboard screenshot URLs and stored in a central mapping file.
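The caching logic in the reauth script boils down to: serve the cached token while it has time left, otherwise refresh and re-cache. A minimal sketch, assuming a `fetch_token` callable that wraps the real Google OAuth2 refresh call (the cache path and safety margin are illustrative):

```python
import json
import time
from pathlib import Path

CACHE = Path("/tmp/ga4_token_cache.json")  # illustrative cache location

def load_cached_token(fetch_token, now=None):
    """Return a cached access token if still valid, else refresh and re-cache.

    fetch_token is a callable returning (token, expires_in_seconds); in the
    real reauth script it would wrap the Google OAuth2 token refresh.
    """
    now = now if now is not None else time.time()
    if CACHE.exists():
        cached = json.loads(CACHE.read_text())
        if cached["expires_at"] - 60 > now:  # 60 s safety margin
            return cached["token"]
    token, expires_in = fetch_token()
    CACHE.write_text(json.dumps({"token": token, "expires_at": now + expires_in}))
    return token
```

This keeps repeated audit runs from hammering the token endpoint: only the first call in an hour-long window actually hits Google.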
Query pattern used:
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
RunReportRequest,
DateRange,
Metric,
Dimension,
)
client = BetaAnalyticsDataClient()
request = RunReportRequest(
property=f"properties/{GA4_PROPERTY_ID}",
date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
metrics=[Metric(name="activeUsers"), Metric(name="sessions")],
dimensions=[Dimension(name="pagePath")],
)
response = client.run_report(request)
This returns aggregated traffic by page path over the last 30 days, giving us a definitive inventory of which pages are actually receiving traffic.
Dashboard Integration and Deep Linking
Results are pushed to the dashboard as a kanban card with five sections:
- GA Code Coverage Report: which pages are missing tracking codes
- Last 30-Day Traffic Summary: sessions, users, and engagement metrics by property
- Campaign Status Check: Constant Contact blast statuses and scheduled dates
- Traffic Growth Recommendations: content gaps, high-bounce pages, etc.
- Operational Excellence Gaps: missing APIs, incomplete configs, auth issues
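A sketch of the card payload the synthesis subprocess pushes, validating that all five sections are present before the card is published. The payload shape and field names are illustrative, not the dashboard's actual schema:

```python
def build_audit_card(card_id, sections):
    """Assemble a dashboard card payload from the five audit sections.

    The section names mirror the report structure; the dict shape and
    deep-link format are illustrative assumptions.
    """
    expected = [
        "GA Code Coverage Report",
        "Last 30-Day Traffic Summary",
        "Campaign Status Check",
        "Traffic Growth Recommendations",
        "Operational Excellence Gaps",
    ]
    missing = [name for name in expected if name not in sections]
    if missing:
        raise ValueError(f"card missing sections: {missing}")
    return {
        "id": card_id,
        "deep_link": f"https://progress.queenofsandiego.com/#card-{card_id}",
        "sections": sections,
    }
```

Failing fast on a missing section means an audit subprocess that silently died never produces a half-empty card.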
The dashboard HTML supports hash-based deep linking. Cards are referenced using the format:
https://progress.queenofsandiego.com/#card-{id}
Example: Card t-31aa2593 is accessed at https://progress.queenofsandiego.com/#card-t-31aa2593. The dashboard JavaScript listens for hash changes and scrolls to the corresponding card element:
function scrollToCardFromHash() {
  const cardId = window.location.hash.replace('#card-', '');
  const card = document.getElementById(cardId);
  if (card) card.scrollIntoView({ behavior: 'smooth' });
}
// Handle both in-page hash changes and direct navigation to a deep link,
// since hashchange does not fire on initial page load.
window.addEventListener('hashchange', scrollToCardFromHash);
window.addEventListener('DOMContentLoaded', scrollToCardFromHash);
Infrastructure: Google Cloud and Search Console Integration
During the audit, we also addressed a critical infrastructure gap: dangerouscentaur.com was never verified in Google Search Console.
Verification process:
- Identified the CloudFront distribution ID for dangerouscentaur.com
- Found the S3 origin bucket for that distribution
- Generated a GSC HTML verification token
- Uploaded the verification HTML file to the root of the S3 bucket: s3://{bucket-name}/google{verification-hash}.html
- Invalidated the CloudFront cache to ensure the file was immediately available: aws cloudfront create-invalidation --distribution-id {DIST_ID} --paths "/*"
- Completed verification in Search Console
- Submitted the sitemap to GSC for crawling
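The two AWS CLI steps above can be scripted so the audit agent emits them reproducibly. This sketch only builds the argument lists (suitable for `subprocess.run`); bucket name, token file, and distribution ID are placeholders for the real values:

```python
def gsc_verification_commands(bucket, token_file, dist_id):
    """Build the AWS CLI invocations for the GSC verification steps.

    Returns (upload_cmd, invalidate_cmd) as argv lists; all arguments
    here are placeholders for the real bucket/token/distribution values.
    """
    upload = [
        "aws", "s3", "cp", token_file,
        f"s3://{bucket}/{token_file}",
        "--content-type", "text/html",
    ]
    invalidate = [
        "aws", "cloudfront", "create-invalidation",
        "--distribution-id", dist_id,
        "--paths", "/*",
    ]
    return upload, invalidate
```

Setting `--content-type text/html` explicitly avoids S3 guessing a type that makes Google's verifier reject the file; the broad `"/*"` invalidation matches what we ran, though invalidating just the token path would be cheaper.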
Key Decisions and Rationale
Why agent-driven orchestration? Each audit type (code crawl, API pull, campaign check) has different dependencies and timelines. Running them in parallel with background agents keeps any single slow audit from blocking the rest, and lets the orchestrator return immediately while results land on the dashboard card as each subprocess finishes.