Connecting NUTS 3 (eu_NUTS_3_2021_01m) to Cities (v0_cities

Bradley Kalgovas

unread,

Jul 28, 2021, 3:47:52 PM7/28/21

to discuss

Hi there, I want to connect the data in the cities table to the NUTS 3 tables that are in the database. However, in the cities table I can only see ISO3166_2region1 as a potential linking code. How did map the cities data to NUTS 3? What do I join on? Thanks!! :)

Chris Ritzo

unread,

Jul 28, 2021, 5:39:19 PM7/28/21

to discuss, b.ra...@gmail.com

Thanks for asking this question. The two tables you mentioned are perhaps more useful for different purposes.

While the table v0_cities_asn contains statistics by city and ASN as generated by our statistics pipeline service, it does not currently contain a field that could be joined to NUTS geographies. This could potentially be added to the statistics output in the future during the generation of the statistics.

The tables defining EU's NUTS geographies are useful for geographic joins, using BigQuery's GIS support. These tables define the shape outline of NUTS regions in the geometry field. During the generation of statistics, we can identify the geographic region in which individual NDT test rows were geolocated using GIS SQL functions. NDT rows have a latitude and longitude annotation derived from geolocation of the originating IP address, and a GIS function can be used to identify which region that point is within.

This blog post provides an example query that uses a different geographic area, but demonstrates the approach you could use with the geographies in the NUTS tables to join individual NDT tests with the NUTS geographies.

So in summary, I would suggest that joining `measurement-lab.geographies.eu_NUTS_3_2021_01m` with `measurement-lab.statistics.v0_cities_asn` is not the right approach to obtaining cities aggregate data for NUTS3 geographies.

Instead, you could take the base query that generates the cities-asn statistics, and modify it to select only tests within EU NUTS 3 geographies, using the Geographic join approach described above and in the blog post.

I hope this is helpful.

Best regards,

Chris - M-Lab Support

Bradley Kalgovas

unread,

Jul 30, 2021, 2:24:24 PM7/30/21

to discuss, Chris Ritzo, Bradley Kalgovas

Hi Chris, Thanks so much for the info, we ran the following query and got this error message: SELECT * FROM `measurement-lab.ndt.unified_uploads_20201026x` LIMIT 1

Yesterday it was working fine if you can figure out what happened that would be great :)

Kind regards,

Bradley

Message has been deleted

Chris Ritzo

unread,

Jul 30, 2021, 2:27:22 PM7/30/21

to discuss, b.ra...@gmail.com

Hi Bradley,

The error message means you have to use a date limiter such as WHERE date >= "2021-01-01" in your SQL statement. This is the default for our tables so if it worked yesterday that would be an anomaly. This constraint is applied to our tables and views to encourage efficient queries. Just add something like the above to your query and it should work fine.

Best,
Chris

Bradley Kalgovas

unread,

Aug 2, 2021, 11:05:08 AM8/2/21

to discuss, Chris Ritzo, Bradley Kalgovas

Hi Chris, one final question. Sorry to bug you. In the cities table ( v0_cities_asn ) , there is min ( download_MIN ) and max download ( download_MAX ) speeds, but we only see average in the NDT ( unified_downloads ) table. How can we get min and max from the NDA ( unified_downloads) table? Thanks!

Chris Ritzo

unread,

Aug 2, 2021, 11:13:47 AM8/2/21

to discuss, b.ra...@gmail.com, Chris Ritzo

Hello Bradley, It's no problem.

I think an important distinction to note here is that ndt.unified_downloads and ndt.unified_uploads contain individual tests, where the tables in the statistics dataset, such as v0_cities_asn contain aggregated data based on the unified_downloads and unified_uploads tables.

The average field you are referring to in unified_downloads (meanThroughputMbps) is the throughput measurement of an individual test. When an NDT test is run, the server captures a series of snapshots or samples over the time the test is run. These samples are used to calculate meanThroughputMbps. You should not consider this value a mean or average really, unless you are exploring data from individual snapshots or samples within a single NDT test.

If you are wanting to get Min and Max from unified_downloads, I would expect you are wanting those metrics over all tests within a time range and location. You can use BigQuery's aggregation functions, combined with GROUP BY statements in your queries to do this.

See:

https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions

https://cloud.google.com/bigquery/docs/reference/standard-sql/statistical_aggregate_functions

I hope this helps. - Chris

b.ra...@gmail.com

unread,

Aug 2, 2021, 1:07:25 PM8/2/21

to Chris Ritzo, discuss

Hi Chris,

Thank you for your help, I think we almost there.

One more clarification questions: if city aggregation tables forming from NDT table, how exactly is it aggregated: minimum of all the tests in the area, average of the bottom 5-10%, or any other way?

Thank you! Also likewise how is the max calculated.

Kind regards,

Bradley

From: Chris Ritzo <cri...@measurementlab.net>
Sent: Monday, August 2, 2021 8:14 AM
To: discuss <dis...@measurementlab.net>
Cc: b.ra...@gmail.com <b.ra...@gmail.com>; Chris Ritzo <cri...@measurementlab.net>
Subject: Re: Connecting NUTS 3 (eu_NUTS_3_2021_01m) to Cities (v0_cities_asn)

Hello Bradley, It's no problem.

I think an important distinction to note here is that ndt.unified_downloads and ndt.unified_uploads contain individual tests, where the tables in the statistics dataset, such as v0_cities_asn contain aggregated data based on the unified_downloads and unified_uploads tables.

The average field you are referring to in unified_downloads (meanThroughputMbps) is the throughput measurement of an individual test. When an NDT test is run, the server captures a series of snapshots or samples over the time the test is run. These samples are used to calculate meanThroughputMbps. You should not consider this value a mean or average really, unless you are exploring data from individual snapshots or samples within a single NDT test.

If you are wanting to get Min and Max from unified_downloads, I would expect you are wanting those metrics over all tests within a time range and location. You can use BigQuery's aggregation functions, combined with GROUP BY statements in your queries to do this.

See:

https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions

https://cloud.google.com/bigquery/docs/reference/standard-sql/statistical_aggregate_functions

I hope this helps. - Chris

On Monday, August 2, 2021 at 11:05:08 AM UTC-4 b.ra...@gmail.com wrote:

Hi Chris, one final question. Sorry to bug you. In the cities table ( v0_cities_asn ) , there is min ( download_MIN ) and max download ( download_MAX ) speeds, but we only see average in the NDT ( unified_downloads ) table. How can we get min and max from the NDA ( unified_downloads) table? Thanks!

On Friday, 30 July 2021 at 11:27:22 UTC-7 Chris Ritzo wrote:

Hi Bradley,

The error message means you have to use a date limiter such as WHERE date >= "2021-01-01" in your SQL statement. This is the default for our tables so if it worked yesterday that would be an anomaly. This constraint is applied to our tables and views to encourage efficient queries. Just add something like the above to your query and it should work fine.

Best,
Chris

On Friday, July 30, 2021 at 2:24:24 PM UTC-4 b.ra...@gmail.com wrote:

Hi Chris, Thanks so much for the info, we ran the following query and got this error message: SELECT * FROM `measurement-lab.ndt.unified_uploads_20201026x` LIMIT 1

Yesterday it was working fine if you can figure out what happened that would be great :)

Kind regards,
Bradley

Chris Ritzo

unread,

Aug 2, 2021, 1:08:54 PM8/2/21

to discuss, b.ra...@gmail.com

Hi Bradley,

This information is described in the stats-pipeline documentation. I would suggest you start there and follow up with any questions or clarifications that are needed.