Querying ndt.ndt7 vs ndt.unified_[down|up]loads

Enrico Marocco

Sep 2, 2025, 1:39:53 PM
to discuss
Hi all,

Newbie here, so first of all, thanks for everything: what's being done and shared by this community is astonishingly great!

I've been playing a little with the data on BigQuery, and one of the first things I noticed is that querying the ndt.ndt7 table (using the raw.[Down|Up]load.UUID value as the upload/download discriminator) returns results statistically almost identical to the unified views, at least for recent data, while processing ~2 orders of magnitude less data.

So the question is: what risk does one run by querying the lightweight tables rather than the unified ones?

I read the documentation and understand that the unified views also aggregate data from other test protocols, but I'm still not clear whether the "completeness" filtering it mentions (which seems to be the big differentiator) is applied only to the unified views or also to the other ndt.* tables.

Thanks again for the great work!


Nathan Kinkade

Sep 2, 2025, 2:42:07 PM
to Enrico Marocco, discuss
Hi Enrico,

In large part due to the per-user, per-day quotas that we placed on the M-Lab data in BigQuery around 1.5 years ago (currently 10 TiB), I have been recommending that people avoid the unified views. As I understand it, the primary motivation for creating the unified views 5 or 6 years ago was to provide a single view for querying both ndt5 and ndt7 data at the same time, and they may largely be considered a transitional tool following the release of ndt7. Since that time (around 2019/2020), the vast majority of NDT tests have been ndt7, so unless you have a specific use case for querying ndt5 data, the unified views are not worth the complexity and data processing costs. The unified views do make some effort to filter out bad or incomplete tests, as you read. But as you have also found, the final aggregated results are probably not much different.

In summary, I would recommend querying the measurement-lab.ndt.ndt7 view directly and avoiding the unified views.

Best regards,

Nathan 

Enrico Marocco

Sep 4, 2025, 12:28:17 PM
to discuss, kin...@measurementlab.net, discuss, Enrico Marocco
Thanks Nathan, that makes perfect sense!

It would probably be useful to add a note to the documentation, to save people some time and GCP credits. If you can point out which files in the website repo it would fit best in, I'll be happy to draft some text and submit a patch.

Nathan Kinkade

Sep 4, 2025, 6:03:38 PM
to Enrico Marocco, discuss
Hi Enrico,

You are absolutely right, and I appreciate the offer. The M-Lab website is derived from this GitHub repo:


... and the NDT page which references and recommends the unified views is this one:

https://github.com/m-lab/website/blob/main/_pages/tests/ndt/ndt.md

Someone can correct me if I am wrong, but I believe that the only difference between the view measurement-lab.ndt_raw.ndt7 and measurement-lab.ndt.ndt7 is that the latter has server and client annotations (network and geo details).

If you are able to work up a PR with better recommendations, that would be super helpful. There are numerous other places on the site, both pages and blog posts, that discuss the unified views, but just changing that one page would be a good start.

Thanks!

Nathan

Enrico Marocco

Sep 5, 2025, 5:12:17 PM
to discuss, kin...@measurementlab.net, discuss, Enrico Marocco
All right, I went ahead and submitted https://github.com/m-lab/website/pull/841 without opening an issue first, as the contributing guide doesn't seem to require one.

For search engines and bots: in case I messed up the process or the PR gets lost for any reason, here's the relevant text (added as a separate section and linked from the BigQuery intro):

Quota Strategies

While the Unified Views (ndt.unified_uploads and ndt.unified_downloads) are the recommended long-term supported views, they are also significantly more resource-intensive: queries against them typically consume 10–20× more quota than queries against the underlying raw tables.

A good compromise, especially during the exploration phase of a project, is to query the ndt.ndt7 table (for data from 2020 onward) or the ndt5 and web100 tables (for earlier periods). These tables use a schema very similar to the unified views, and on large windows of analysis (aggregations of thousands of samples or more) they produce results that are statistically almost identical.

One important difference is that the raw tables contain both upload and download measurements. These can be separated using the raw field attributes. For example, in ndt7 filtering with

  • raw.Download.UUID IS NOT NULL selects only download tests

  • raw.Upload.UUID IS NOT NULL selects only upload tests

This approach allows researchers to conserve their daily quota while still producing high-quality, reproducible results.
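
The filtering described above can be sketched as a small Python helper that builds the BigQuery SQL for one direction at a time. This is only an illustration, not an official M-Lab query: the helper name `build_ndt7_query` is hypothetical, and the `date` partition column and the `a.MeanThroughputMbps` field are assumptions about the ndt.ndt7 schema that should be checked against the table before running anything. Only the raw.Download.UUID / raw.Upload.UUID discriminators come from the text above.

```python
from datetime import date

# Fully qualified view name as given in the thread.
NDT7_TABLE = "measurement-lab.ndt.ndt7"

# Discriminator fields named in the thread: a row is a download test when
# raw.Download.UUID is set, and an upload test when raw.Upload.UUID is set.
_DIRECTION_FIELD = {
    "download": "raw.Download.UUID",
    "upload": "raw.Upload.UUID",
}


def build_ndt7_query(direction: str, start: date, end: date) -> str:
    """Return SQL computing a per-day test count and median throughput.

    ASSUMPTIONS (not confirmed by the thread): the view has a `date`
    column and an `a.MeanThroughputMbps` field.
    """
    if direction not in _DIRECTION_FIELD:
        raise ValueError("direction must be 'download' or 'upload'")
    if start > end:
        raise ValueError("start must not be after end")

    uuid_field = _DIRECTION_FIELD[direction]
    return (
        "SELECT\n"
        "  date,\n"
        "  COUNT(*) AS tests,\n"
        "  APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(50)] AS median_mbps\n"
        f"FROM `{NDT7_TABLE}`\n"
        # Restricting the date range keeps the number of scanned bytes down.
        f"WHERE date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'\n"
        f"  AND {uuid_field} IS NOT NULL\n"
        "GROUP BY date\n"
        "ORDER BY date"
    )


if __name__ == "__main__":
    print(build_ndt7_query("download", date(2025, 8, 1), date(2025, 8, 7)))
```

The printed SQL can be pasted into the BigQuery console, where the estimate of bytes processed shown before execution gives a quick check of how much daily quota the query would use.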

