Getting Started with CrUX video walkthroughs

265 views
Skip to first unread message

Paul Calvano

unread,
May 6, 2018, 7:43:15 AM5/6/18
to chrome-ux-report
Hello!

If anyone here is new to BigQuery or CrUX, I created some short videos that will walk you through getting started and running some queries - 

Regards,
Paul

Ilya Grigorik

unread,
May 7, 2018, 5:56:53 PM5/7/18
to paulc...@gmail.com, Rick Viscomi, chrome-ux-report
Paul, these are super! I mentioned it on twitter, but worth repeating here as well.. thank you!

The fact that you started at the beginning (how to setup the project and view the dataset) and then did a step by step exploration, building up to the more complex queries is awesome.

+Rick Viscomi perhaps worth integrating links to Pauls walkthrough in the cookbook and/or other docs?

--
You received this message because you are subscribed to the Google Groups "chrome-ux-report" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chrome-ux-repo...@chromium.org.
To post to this group, send email to chrome-u...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chrome-ux-report/13f32814-e2ea-4eef-bf77-42322c506309%40chromium.org.

joe....@mavens.co.uk

unread,
May 9, 2018, 12:00:54 PM5/9/18
to chrome-ux-report
Hi Paul,
This is fabulous and a great help. I wonder if you can help further?
I am trying to use a query based on your example below to output the onload distribution of tablet, desktop and phone for a specific domain, when the domain uses http and https.  I have tried to use "WHERE origin Like "%.example.com", but percentages  for each bucket/device look strange. The only way I can get the percentages to calculate correctly is to output the data separately for each protocol variant (http vs https) as in the example below. I'm new to SQL and guess there must be a way to merge the data so that it is merged and output for *.example.com as a whole. I tried different uses of net.host(origin) , but that didn't work either.


CREATE TEMPORARY FUNCTION spreadBins(bins ARRAY<STRUCT<start INT64, `end` INT64, density FLOAT64>>)

RETURNS ARRAY<STRUCT<client STRING, start INT64, density FLOAT64>>

LANGUAGE js AS """

  // Convert into 100ms bins and spread the density around.

  const WIDTH = 100;

  return (bins || []).reduce((bins, bin) => {

    bin.start = +bin.start;

    bin.end = Math.min(bin.end, bin.start + 5000);

    const binWidth = bin.end - bin.start;

    for (let start = bin.start; start < bin.end; start += WIDTH) {

      bins.push({

        start,

        density: bin.density / (binWidth / WIDTH)

      });

    }

    return bins;

  }, []);

""";

 

SELECT origin, form_factor.name as client,

       "onload" as metric,

       bin.start / 1000 AS bin,

       SUM(bin.density) as volume

FROM (

  SELECT origin, form_factor, spreadBins(onload.histogram.bin) AS bins

  FROM `chrome-ux-report.all.201804`

  WHERE origin IN ("http://www.example.com", "https://www.example.com")

CROSS JOIN UNNEST (bins) as bin 

GROUP BY origin, client, bin

Paul Calvano

unread,
May 9, 2018, 2:43:03 PM5/9/18
to chrome-ux-report
Hi Joe,

Thanks - I'm glad you found these useful. 


 I have tried to use "WHERE origin Like "%.example.com", but percentages  for each bucket/device look strange. The only way I can get the percentages to calculate correctly is to output the data separately for each protocol variant (http vs https) as in the example below.

The origin column in CrUX includes the protocol and hostname for each site.    In the second video I mentioned that this might be confusing, and it confused me when I first started looking at this data too.   If you look at query such as this one, you'll notice that the density is close to 1 (or 100%) for each origin). 

SELECT origin,
             ROUND(SUM(onload.density),4) as density
FROM `chrome-ux-report.all.201804`,
    UNNEST (onload.histogram.bin) as onload
WHERE origin LIKE "%.example.com"
GROUP BY origin



 If you combine these with NET.HOST() then the densities will be combine to 1.993 (close to 200%) because it is additive.

SELECT NET.HOST(origin) as hostname,
             ROUND(SUM(onload.density),4) as density
FROM `chrome-ux-report.all.201804`,
    UNNEST (onload.histogram.bin) as onload
WHERE origin LIKE "%.example.com"
GROUP BY hostname


        

If you add a dimension such as form_factor.name to this query, then the %s don't add up (which you saw in the more complex query above).  This is a simplified example of the problem you described above when trying to create your histogram.

SELECT NET.HOST(origin) as hostname,
       form_factor.name as client,
       ROUND(SUM(onload.density),4) as density
FROM `chrome-ux-report.all.201804`,
    UNNEST (onload.histogram.bin) as onload
WHERE origin LIKE "%.example.com"
GROUP BY hostname, client



So what can we do about it?   Once solution would be to create a query where we divide the density based on the number of origins factored into the results.   ie, divide by 2 in this example.   However, if you are doing a wildcard query across your subdomains then you'll want to use a denominator based on distinct origins.   Here's a way that you can do this:

SELECT NET.HOST(origin) as hostname,
       form_factor.name as client,
       ROUND(
          SUM(onload.density) /  (SELECT COUNT(DISTINCT origin) FROM `chrome-ux-report.all.201802` WHERE origin LIKE "%.example.com")
       ,4) as density
FROM `chrome-ux-report.all.201804`,
    UNNEST (onload.histogram.bin) as onload
WHERE origin LIKE "%.example.com"
GROUP BY hostname, client

The results for that give me which is closer to what you are looking for.


With a histogram, you may not need to normalize the %s to 100%.   For example, the following query where I use NET.HOST() on the origin will give me the result you are expecting.   All the densities add up to 200% though.   So if you wanted to make a CDF plot out of this then you would need to normalize using the example above (ie, dividing by distinct origins).


CREATE TEMPORARY FUNCTION spreadBins(bins ARRAY<STRUCT<start INT64, `end` INT64, density FLOAT64>>)
RETURNS ARRAY<STRUCT<client STRING, start INT64, density FLOAT64>>
LANGUAGE js AS """
  // Convert into 100ms bins and spread the density around.
  const WIDTH = 100;
  return (bins || []).reduce((bins, bin) => {
    bin.start = +bin.start;
    bin.end = Math.min(bin.end, bin.start + 5000);
    const binWidth = bin.end - bin.start;
    for (let start = bin.start; start < bin.end; start += WIDTH) {
      bins.push({
        start,
        density: bin.density / (binWidth / WIDTH)
      });
    }
    return bins;
  }, []);
""";

SELECT hostname, form_factor.name as client,

       "onload" as metric,
       bin.start / 1000 AS bin,
       SUM(bin.density) as volume
FROM (
  SELECT NET.HOST(origin) as hostname, form_factor, spreadBins(onload.histogram.bin) AS bins
  FROM `chrome-ux-report.all.201804`
  WHERE origin LIKE "%.example.com"

)
CROSS JOIN UNNEST (bins) as bin
GROUP BY hostname, client, bin



I hope this helps!

Auto Generated Inline Image 1
Auto Generated Inline Image 2
Auto Generated Inline Image 3
Auto Generated Inline Image 4
Auto Generated Inline Image 5

Paul Calvano

unread,
May 9, 2018, 2:55:46 PM5/9/18
to chrome-ux-report
Minor correction.   In my query above, I used

SELECT NET.HOST(origin) as hostname,
       form_factor.name as client,
       ROUND(
          SUM(onload.density) /  (SELECT COUNT(DISTINCT origin) FROM `chrome-ux-report.all.201802` WHERE origin LIKE "%.example.com")
       ,4) as density
FROM `chrome-ux-report.all.201804`,
    UNNEST (onload.histogram.bin) as onload
WHERE origin LIKE "%.example.com"
GROUP BY hostname, client

I was unintentionally referencing an older dataset in my subquery.  Different affect the output, but here's a corrected version anyway -

SELECT NET.HOST(origin) as hostname,
       form_factor.name as client,
       ROUND(
          SUM(onload.density) /  (SELECT COUNT(DISTINCT origin) FROM `chrome-ux-report.all.201804` WHERE origin LIKE "%.example.com")
Reply all
Reply to author
Forward
0 new messages