CrUX Schema Change from String to Record Over Time

121 views
Skip to first unread message

Ryan Siddle

unread,
Nov 24, 2020, 4:47:30 PM11/24/20
to Chrome UX Report (Discussions)
Hi all, we have noticed the schema for "largest_contentful_paint" changed from a cast type String to Record in the tables for 2019/09.

Most of the CrUX example queries on the Github repo[0] demo the first_contentful_paint metric, which has no schema change.

How would one normally go about creating queries that compensate for the changing of schema over time?


Thanks,

Ryan

Rick Viscomi

unread,
Nov 24, 2020, 5:04:35 PM11/24/20
to Chrome UX Report (Discussions), Ryan Siddle
Hi Ryan,

Here's an example of querying LCP before and after 201909:

SELECT
  _TABLE_SUFFIX AS month,
  SUM(IF(lcp.start < 2500, lcp.density, 0)) AS good_lcp
FROM
  `chrome-ux-report.all.2019*`,
  UNNEST(largest_contentful_paint.histogram.bin) AS lcp
WHERE
GROUP BY
  month
ORDER BY
  month

Prior to 201909, the LCP metric didn't exist at all in the schema (not even as a string) so this query only returns results for months 09 through 12.

The experimental.global table is equivalent to every all.YYYYMM table combined, and the schema for this table uses NULL values as placeholders when LCP data wasn't yet available. Here's an example of querying this table:

SELECT
  yyyymm,
  SUM(IF(lcp.start < 2500, lcp.density, 0)) AS good_lcp
FROM
  `chrome-ux-report.experimental.global`,
  UNNEST(largest_contentful_paint.histogram.bin) AS lcp
WHERE
  origin = 'https://www.example.com' AND
  yyyymm BETWEEN 201901 AND 201912
GROUP BY
  yyyymm
ORDER BY
  yyyymm

It's very similar except for the addition of the yyyymm field, and returns similar results for data between September and December.

Finally, if you're only looking to aggregate high-level stats about a metric like the percent of good experiences (as these examples are), then I'd recommend using the materialized.metrics_summary table or device_summary/country_summary if you need to segment by device or country. For example:

SELECT
  yyyymm,
  fast_lcp AS good_lcp
FROM
  `chrome-ux-report.materialized.metrics_summary`
WHERE
  origin = 'https://www.example.com' AND
  date BETWEEN '2019-01-01' AND '2019-12-01'
ORDER BY
  yyyymm

These summary tables contain pre-aggregated fast/slow values for each metric, or NULL if there is no data. So in this case we get NULL until September. Being pre-aggregated means that your queries are simpler and much faster/cheaper to run.

Hope that clarifies how to use the CrUX tables to query data before and after metrics are added.  If not, let me know if you're trying to query a specific country table whose schema doesn't match that of the global tables and please share the query/table that you're having issues with.


Thanks,

Rick

Ryan Siddle

unread,
Nov 30, 2020, 5:48:18 AM11/30/20
to Chrome UX Report (Discussions), Rick Viscomi, Ryan Siddle
Hi Rick,

Thanks for the examples above.

We're building toolkits around the CrUX BigQuery dataset, so we're looking for stability rather than breaking changes.  It would be helpful to understand what "experimental" means for this BigQuery dataset. My current assumption is that it means an alpha/beta for new attributes and it's subject to schema changes and tests, whereas the non-experimental could be stable or due for deprecation at some point in the near future?

Best regards,

Ryan

Rick Viscomi

unread,
Nov 30, 2020, 2:48:49 PM11/30/20
to Chrome UX Report (Discussions), Ryan Siddle, Rick Viscomi
Hi Ryan,

In the CrUX context, "experimental" is used in a couple of ways:
  1. The "experimental" namespace that is used as a prefix for metrics like "experimental.permission.notifications" signifies that the metrics are not necessarily web platform standards or that their implementation may change over time. So for these metrics there is no guarantee of stability. Think of them as incubators for interesting data.

    Metrics without that namespace like "cumulative_layout_shift" are built on standards and expected to remain relatively stable. However, note that the standards themselves may change as is the case with CLS, requiring us to update our implementation.

    Metrics have "graduated" from experimental to stable but so far we've never deprecated one of them. It's not impossible, so if that does happen we'll post to the Announcement channel accordingly.

  2. The "experimental" dataset containing tables like "experimental.global" signifies that the table schemas themselves (how we organize the data) are liable to change. Tables and persistent functions like "experimental.GET_COUNTRY" may also be added or removed over time, but in general any major changes will be accompanied by an announcement.

    Tables in the "materialized" dataset also behave like an "experimental" table in that we're playing with more convenient ways to access the data and they're liable to change. They have additional logic built in to assess fast/slow performance so these thresholds may need to be updated over time in alignment with web.dev/vitals.

Hope that helps,

Rick
Reply all
Reply to author
Forward
0 new messages