Rank buckets not adding up?

Ariana Mirian

Nov 16, 2023, 1:03:23 PM

to Chrome UX Report (Discussions)

Hi folks,

First off, thank you for all the work that you do to make CrUX publicly available! I am a huge fan and appreciate your support.

I had one quick question. I am trying to join the CrUX materialized metrics_summary table with an external dataset. I wanted to look at the ranking of sites, but when I look at a month of data, the rank buckets don't always add up. In this screenshot, you'll notice that there appear to be only 5,000 domains that have rank 10000.

I would expect that each rank would have the number of domains it purports to have (e.g. the 1000 rank has 1000 domains). Why would it be otherwise? Am I missing something?

Thanks in advance for your time! I've pasted my query below for ease of use.

```
SELECT
  rank,
  COUNT(*)
FROM `chrome-ux-report.materialized.metrics_summary`
WHERE date > DATE("2023-01-01") AND date < DATE("2023-02-02")
GROUP BY rank
```

[Attachment: Screenshot 2023-11-16 at 10.00.16 AM.png]

Barry Pollard

Nov 16, 2023, 1:11:49 PM

to Ariana Mirian, Chrome UX Report (Discussions)

This is expected. Each origin is assigned to exactly one rank, so an origin cannot be in rank 1,000 and also in rank 10,000.

There is more explanation here:

https://developer.chrome.com/blog/crux-rank-magnitude/

However, since that post we have also introduced half-step ranks, so:

Rank 1,000 - top 1,000 origins

Rank 5,000 - next 4,000 origins (these + the 1,000 rank-1,000 origins = 5,000 origins).

Rank 10,000 - next 5,000 origins (these + the 1,000 rank-1,000 origins + the 4,000 rank-5,000 origins = 10,000 origins).
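To check the bucket arithmetic above against the data, a running total over the per-rank counts is one option. This is only a sketch: it assumes the `date` column holds the first day of each monthly release (so `2023-02-01` selects the same month as the query earlier in the thread) and that the table has an `origin` column; the names `per_rank`, `origins_in_bucket` and `cumulative_origins` are made up for illustration.

```sql
-- Sketch: per-bucket and cumulative origin counts for one month.
-- Assumption: `date` is the first day of the monthly release.
WITH per_rank AS (
  SELECT
    rank,
    COUNT(DISTINCT origin) AS origins_in_bucket
  FROM `chrome-ux-report.materialized.metrics_summary`
  WHERE date = DATE '2023-02-01'
  GROUP BY rank
)
SELECT
  rank,
  origins_in_bucket,
  -- Running total: each bucket plus every lower-rank bucket.
  SUM(origins_in_bucket) OVER (ORDER BY rank) AS cumulative_origins
FROM per_rank
ORDER BY rank
```

If the buckets behave as described, `cumulative_origins` should line up with the `rank` value on every row except the largest, incomplete bucket.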

Mostly you should use "WHERE rank <= N" clauses, so that you include all the lower ranks along with the one you're actually interested in.
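As a rough illustration of that pattern, the following sketch counts the origins in the top 10,000. It assumes the same single-month `date` filter as the original query and an `origin` column; `top_10k_origins` is just an illustrative alias.

```sql
-- Sketch: select the whole top 10,000, not only the rank = 10000 bucket.
SELECT
  COUNT(DISTINCT origin) AS top_10k_origins
FROM `chrome-ux-report.materialized.metrics_summary`
WHERE date = DATE '2023-02-01'  -- assumption: first day of the monthly release
  AND rank <= 10000             -- includes the rank 1000 and rank 5000 buckets
```

Filtering with `rank = 10000` instead would return only the 5,000 origins that sit between the top 5,000 and the top 10,000.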


Ariana Mirian

Nov 16, 2023, 1:16:30 PM

to Barry Pollard, Chrome UX Report (Discussions)

Hi Barry,

Thanks for the quick response and pointer! That makes sense and I had missed that blog.

In this case, why does row 2 of the screenshot (the 50M rank) only have ~8M domains listed? Is this because 40M don't qualify for inclusion in CrUX, but are still ranked among the 50M most popular origins?

Thanks again,

Ariana

Barry Pollard

Nov 16, 2023, 1:25:41 PM

to Ariana Mirian, Chrome UX Report (Discussions)

The CrUX dataset only has about 18 million origins in it.

So the largest rank (rank = 50M) is incomplete and only has the leftover origins that didn't fit in the rank <= 10 million buckets.

The number of origins varies slightly depending on how many meet the eligibility requirements, but I think it'll be a while before we see 50 million eligible ones! And if/when we do, we'll have a new rank = 100 million bucket, again with the leftovers that didn't fit in the lower buckets.

So the largest rank bucket is not any indicator of how many origins are in CrUX versus how many are released publicly. It's more just the largest bucket needed in the rank steps we use (10 raised to the power of x, with the half steps in between, so 1,000, 5,000, 10,000, 50,000, etc.).
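To see how the largest bucket relates to the dataset as a whole, a query along these lines may help. It is a sketch under the same assumptions as before (a single-month `date` filter and an `origin` column), and it hard-codes 50,000,000 as the largest rank, which holds for the data discussed in this thread but may change later; the output aliases are illustrative.

```sql
-- Sketch: total origins in the month versus the leftover in the largest bucket.
SELECT
  COUNT(DISTINCT origin) AS total_origins,
  MAX(rank) AS largest_rank_bucket,
  -- Assumption: 50,000,000 is currently the largest rank value.
  COUNT(DISTINCT IF(rank = 50000000, origin, NULL)) AS origins_in_largest_bucket
FROM `chrome-ux-report.materialized.metrics_summary`
WHERE date = DATE '2023-02-01'  -- assumption: first day of the monthly release
```

Here `total_origins` is the figure to compare against the "about 18 million" mentioned above, while `largest_rank_bucket` only reflects the step needed to hold the remainder.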

Ariana Mirian

Nov 16, 2023, 1:28:00 PM

to Barry Pollard, Chrome UX Report (Discussions)

Noted, thanks so much for this clear explanation! I appreciate it.

Cheers,

Ariana
