
Auditing GA4 Data in BigQuery: A Practical Guide

If you’ve already linked your Google Analytics 4 (GA4) property to BigQuery for raw-data export, you’re ahead of the curve. But simply exporting the data isn’t enough — without regular auditing you may be building dashboards on shaky ground. This post walks you through why auditing matters, how to do it in BigQuery, and how to turn insights into action.


Why Audit Your GA4 → BigQuery Export?

1. Trustworthy data = better decisions

If your data pipeline has errors or blind spots, every downstream report inherits them. For GA4, that might mean missing events, duplicated events, untracked parameters, or mis-tagged traffic sources.

2. Full raw-data access reveals hidden issues

Dashboards in the GA4 UI are useful, but BigQuery gives you raw event-level access that lets you spot subtle problems like:

  • Unusual user-property values
  • Missing UTMs or entirely un-attributed sessions (“not set”)
  • Event parameter bloat that slows queries and drives up cost
  • Data retention gaps or mis-linking of streams

3. Cost, performance & governance

Exporting GA4 data to BigQuery is powerful, but without care you can incur large query costs, inefficient storage, or audit blind spots. Best practices make a difference: avoid broad queries without date filters, and remove unused event parameters.


The Audit Framework: BigQuery Edition

Here’s a practical 4-step framework you can follow to audit your GA4 dataset in BigQuery.

Step A: Identify your data scope & baseline

  • Confirm the GA4 export link is active: in GA4 → Admin → Product Links → BigQuery.
  • Check table naming / sharding: GA4 exports one date-sharded table per day (events_YYYYMMDD), plus events_intraday_YYYYMMDD if streaming export is enabled.
  • Determine your business baseline: key events (e.g., purchase, lead_submit), key traffic sources, typical user counts.
  • Make note of your dataset size and recent query patterns (so you can spot big shifts post-audit).
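As a quick sanity check on the export link, you can list the most recent daily tables straight from BigQuery's INFORMATION_SCHEMA (a sketch; `your_project.your_dataset` is a placeholder for your own export dataset):

```sql
-- Show the seven most recent daily export tables and when they were created.
-- A gap in the date sequence suggests the export link has stalled.
SELECT table_name, creation_time
FROM `your_project.your_dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'events_%'
ORDER BY table_name DESC
LIMIT 7;
```

If yesterday's table is missing, check the link status in GA4 Admin before auditing anything else.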

Step B: Schema & event-parameter hygiene

  • Unused parameters: Are you collecting event parameters or user properties that aren’t used? That causes dataset bloat and query slow-downs.
  • Parameter duplication or mis-naming: e.g., utm_source, utm_source1, source_utm – hard to query and clean.
  • Partitioning & timestamps: Always query with a date bound (on these date-sharded tables, filter _TABLE_SUFFIX or event_date) so you don't scan the entire export. Querying too broad a range is one of the most common and most expensive audit mistakes.
  • Retention & linking: The GA4 UI retains event-level data for a limited window (14 months on standard properties); the BigQuery export is what gives you a durable history, so verify the link is healthy and check your dataset's table-expiration settings.
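Because the export is date-sharded, the cheapest way to bound a query is a _TABLE_SUFFIX filter. A minimal sketch (dataset name is a placeholder):

```sql
-- Scans only the January 2025 tables instead of the entire export history.
SELECT event_name, COUNT(*) AS event_count
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY event_name
ORDER BY event_count DESC;
```

BigQuery prunes the shards outside the suffix range before reading any data, so this pattern caps both cost and runtime.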

Step C: Data-quality checks (SQL queries you should run)

Here are sample checks you can implement in BigQuery SQL:

  • Missing or “not set” traffic source:
    SELECT
      event_date,
      COUNTIF(traffic_source.source = '(not set)' OR traffic_source.source IS NULL) AS not_set_sessions,
      COUNT(*) AS total_sessions
    FROM `your_project.your_dataset.events_*`
    WHERE event_name = 'session_start'
      AND _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    GROUP BY event_date
    ORDER BY event_date;
    If you see “not set” traffic sources rising, you may have tagging or redirect issues.
  • Duplicate events:
    SELECT
      event_name,
      COUNT(*) AS total_count,
      COUNT(DISTINCT event_bundle_sequence_id) AS unique_sequences
    FROM `your_project.your_dataset.events_*`
    WHERE _TABLE_SUFFIX = '20251025'
    GROUP BY event_name
    ORDER BY total_count DESC
    LIMIT 20;
    A large gap between total_count and unique_sequences can point to duplication, though several events legitimately share one bundle, so treat it as a prompt for closer inspection rather than proof.
  • Unused event parameters:
    SELECT
      event_name,
      param.key AS param_name,
      COUNT(*) AS param_count
    FROM `your_project.your_dataset.events_*`,
      UNNEST(event_params) AS param
    WHERE _TABLE_SUFFIX BETWEEN '20251001' AND '20251025'
    GROUP BY event_name, param_name
    ORDER BY param_count ASC
    LIMIT 50;
    Low-count parameters may be candidates for cleanup.
  • Data volume trends: Track daily row counts and month-over-month growth to spot anomalies such as sudden drops (e.g., a broken export) or spikes (e.g., spam or mis-tagging).
    SELECT
      event_date,
      COUNT(*) AS daily_rows
    FROM `your_project.your_dataset.events_*`
    WHERE _TABLE_SUFFIX >= '20250101'
    GROUP BY event_date
    ORDER BY event_date;
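If you track purchases, a stricter duplicate check than the bundle-sequence comparison above is to look for repeated transaction IDs. A sketch using the export's ecommerce.transaction_id field (adjust if you pass the ID as an event parameter instead):

```sql
-- Purchase events sharing a transaction_id usually mean a double-fired tag.
SELECT
  ecommerce.transaction_id AS transaction_id,
  COUNT(*) AS purchase_events
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20251001' AND '20251025'
  AND event_name = 'purchase'
GROUP BY transaction_id
HAVING COUNT(*) > 1
ORDER BY purchase_events DESC;
```

Any row returned here is worth chasing: duplicated purchases inflate revenue in every downstream report.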

Step D: Governance & ongoing process

  • Audit cadence: Set a monthly check for key metrics plus a quarterly deep audit.
  • Alerting: Use BigQuery with Looker/Looker Studio or another BI tool to alert on large increases in “not set” traffic, unusual event counts, and query-cost spikes.
  • Document your dataset: Maintain a data catalog describing each event, parameter, its purpose, owner, and known issues.
  • Audit logs & cost monitoring: Review BigQuery audit logs and query cost reports. For example, Google Cloud’s audit-logs best practices apply. Google Cloud Documentation
  • Clean up: Remove old/unneeded parameters, archive old partitions if required, enforce naming conventions.
  • Training & ownership: Make sure someone owns the GA4 → BigQuery pipeline, and team members know how to query, interpret, and flag issues.
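One way to automate the volume check from Step C is a scheduled query that flags days deviating sharply from their trailing average. A sketch; the 30% threshold is an arbitrary starting point you should tune to your traffic:

```sql
-- Flag days whose event volume deviates more than 30% from the
-- trailing 7-day average (threshold is a tunable assumption).
WITH daily AS (
  SELECT event_date, COUNT(*) AS daily_rows
  FROM `your_project.your_dataset.events_*`
  WHERE _TABLE_SUFFIX >= '20250101'
  GROUP BY event_date
)
SELECT
  event_date,
  daily_rows,
  AVG(daily_rows) OVER (
    ORDER BY event_date
    ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
  ) AS trailing_avg
FROM daily
QUALIFY ABS(daily_rows - trailing_avg) / trailing_avg > 0.30
ORDER BY event_date;
```

Run it as a BigQuery scheduled query and route non-empty results to an email or Slack alert, and the “export broke silently” failure mode largely disappears.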

Example Use-Case: Conversion Funnel Discrepancy

You launch a new campaign and notice that the GA4 UI shows conversions up 50% versus last month, but your CRM data doesn’t align. So you run an audit:

  • Check day-by-day table row counts → spike on the campaign launch day
  • Query event parameters for the purchase event → see a new parameter promo_code but null for many rows
  • Check traffic source for these rows → lots of (not set)
Conclusion: the campaign redirect stripped the UTM parameters, so GA4 recorded the sessions but could not attribute them correctly. You fix the redirect, then rerun the audit to confirm the “not set” share drops. This kind of issue is reliably caught only when you have raw-data access via BigQuery.
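The second and third checks in that walkthrough can be combined into one query (a sketch; promo_code is the hypothetical parameter from the example, and the dataset name is a placeholder):

```sql
-- Missing promo_code counts per traffic source for purchase events.
SELECT
  traffic_source.source AS source,
  COUNTIF(promo.value.string_value IS NULL) AS missing_promo_code,
  COUNT(*) AS purchases
FROM `your_project.your_dataset.events_*`
LEFT JOIN UNNEST(event_params) AS promo
  ON promo.key = 'promo_code'
WHERE _TABLE_SUFFIX BETWEEN '20251001' AND '20251025'
  AND event_name = 'purchase'
GROUP BY source
ORDER BY missing_promo_code DESC;
```

The LEFT JOIN on UNNEST keeps purchases that lack the parameter entirely, which is exactly the population you are hunting for; if the missing rows cluster under one source (or “(not set)”), you have found the broken link in the chain.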

Best Practices Recap

  • Always include date filters and partitions when querying GA4 data in BigQuery to avoid scanning large volumes.
  • Regularly audit and remove unused event parameters / user properties so the dataset stays lean and queries run faster.
  • Align your audit with business goals: what event, what conversion, what metric matters.
  • Automate audit checks, set alerts, and document your dataset.
  • Use raw data export as part of your data governance and analytics maturity journey.

Call to Action: Try GAAuditor.com

If this sounds like a lot to coordinate manually, you’re not alone. That’s why we built GAAuditor.com — a tool designed to help you automate the auditing of your GA4 + BigQuery pipeline, surface anomalies, enforce naming conventions, and provide actionable insights in minutes instead of hours.

Ready to get started? Head over to GAAuditor.com and launch your free trial today. Spend less time digging for issues and more time driving insights.
