GA4 data export to BigQuery valaidation

Many organizations have specific data requirements that cannot be met by standard reporting tools provided by Google Analytics. To be honest, many of them. To handle this, Google Analytics provides access to raw data, such as Google GA4 export to BigQuery.

From a technical perspective, this creates two data sources that should be aligned on a report level or should be consistent, integrated, and comprehensive.

To reach this target, we can implement several reports that are easy to create on both data sides and check if the reports mismatch

New users dayly report

alt text


select event_date, count(distinct user) uniq_users
from (
select event_date, coalesce(user_id, user_pseudo_id) as user
from bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*
where _TABLE_SUFFIX BETWEEN '20210128' AND '20210130' and event_name IN ('first_visit', 'first_open')
)
group by event_date

Users with at least 2 pageviews report

As second example lets consider something that illustrate users amount and their behevior like page interaction

alt text

--amount of users who had at least 2 page_views per day
select event_date, count(distinct user) users
from(
select event_date,coalesce(user_id, user_pseudo_id) as user, count(*) page_views
from bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*
where _TABLE_SUFFIX BETWEEN '20210128' AND '20210130' and event_name = 'page_view'
group by event_date,user
) 
where page_views >= 2
group by event_date

Sessions by landing pages

Last one might be report that indicates session amount related to some page interaction like determing landing page

alt text

--amount of sessions for each landing page
select landing_page,count(distinct session_id) sessions
from (
select
    concat((select value.int_value from unnest(event_params) where key = "ga_session_id") ,
    user_pseudo_id) session_id,
    (select value.string_value from unnest(event_params) where key = "page_location")  landing_page
from bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*
where _TABLE_SUFFIX BETWEEN '20210128' AND '20210130' and event_name = 'session_start'
)
group by landing_page
order by sessions desc

Data Consistency Monitoring

In spite of getting a general understanding of the consistency of two sources by comparison reports, this approach has significant limitations: it doesn't cover all dimensions and metrics, and it requires manual checking. To overcome these challenges, a comprehensive data consistency monitoring report must be created. This type of report can check hundreds of metrics and dimensions automatically and send notifications with information about discrepancy levels and locations