Materialized country_summary table, plus two bonus functions

362 views
Skip to first unread message

Rick Viscomi

unread,
May 12, 2020, 7:57:38 PM5/12/20
to chrome-ux-report
Hi everyone,

In today's 202004 announcement I mentioned the availability of a new convenience table to help you query country-level data. In this post I'll share more info about the table itself and how to use it.

I've added a new recipe to the CrUX Cookbook to get the 75th percentile LCP across all countries for phone users of web.dev in the latest 202004 dataset: p75-lcp-country.sql.
SELECT
country_code,
`chrome-ux-report`.experimental.GET_COUNTRY(country_code) AS country,
p75_lcp
FROM
`chrome-ux-report.materialized.country_summary`
WHERE
yyyymm = 202004 AND
origin = 'https://web.dev' AND
device = 'phone'
ORDER BY
country_code

This query completes in 0.7 seconds and only consumes 889.3 MB of data!

A few things to note:
  • The country_code field contains the ISO 3166-1 alpha-2 codes for each country.
  • You can use the experimental GET_COUNTRY function to map the two-character country codes to their human-readable names.
  • p75_lcp is a precomputed metric and 75 is the only percentile included.
  • Metrics are aggregated at the device (AKA form factor) level.
  • The table is partitioned and clustered, resulting in much faster queries that consume much less data, which should be a big relief for users of the 1 TB free tier.
As an exercise, try remixing this query to solve these similar use cases:
  1. compare the results of Finland, Sweden, and Norway only
  2. get the percent of fast LCP experiences on desktop
  3. compare the 75th percentile FID of web.dev and developers.google.com
  4. track how CLS experiences of phone users in India have changed in 2020
If you need any help with your queries, feel free to reply to this thread.

I mentioned that p75 is the only percentile included in this table, so you might be wondering if there's an easy way to get other metrics, like medians. There is! The country_summary table is materialized from the experimental.country table, which is basically what you'd get if you combined all country tables together without any precomputation. So you can query the raw data in this table much like you would a country-specific table like chrome-ux-report.country_us.202004.

SELECT
country_code,
`chrome-ux-report`.experimental.GET_COUNTRY(country_code) AS country,
`chrome-ux-report`.experimental.PERCENTILE(ARRAY_AGG(lcp), 50) AS p50_lcp
FROM
`chrome-ux-report.experimental.country`,
UNNEST(largest_contentful_paint.histogram.bin) AS lcp
WHERE
yyyymm = 202004 AND
origin = 'https://web.dev' AND
form_factor.name = 'phone'
GROUP BY
country_code
ORDER BY
country_code

This query completes in 2 seconds and consumes 17.7 GB of data.

As you can see, this query is not as refined as the last. We need to unnest each bin of the LCP histogram, calculate the percentile manually, refer to the device using its proper "form_factor.name" name, and group by the unaggregated country code. But what we lose in convenience, we gain in control over the granularity of data we can access.

Did you catch the surprise I added to the query? In addition to the GET_COUNTRY convenience function, I'm also experimenting with a PERCENTILE function to help you quickly and easily extract stats about metrics' distribution. Just pass in the array of bins and the percentile (1-100), and the function will do the rest. Because the functions are type-sensitive, if you want to get a CLS percentile, you'll need to use the PERCENTILE_NUMERIC function instead.


So that's what's new. I hope these tables and functions unlock many more insights for CrUX users to discover. I'd be happy to answer any questions, and if you have any feedback about these dataset ergonomics please feel free to let me know.

Rick
Reply all
Reply to author
Forward
0 new messages