Issue with Geolocation Granularity in M-Lab NDT Data (Bahrain Use Case)

fatma almahfood

Apr 6, 2026, 6:23:11 AM
to discuss

Hello,

I am currently working on a project analyzing broadband performance in Bahrain by combining M-Lab NDT data with cellular tower locations to identify coverage gaps and performance issues.

I have attempted to extract speed test data using multiple tables, including:

  • measurement-lab.ndt.unified_downloads
  • measurement-lab.ndt.unified_uploads
  • measurement-lab.ndt.ndt7

Example queries I used:

SELECT DISTINCT
    client.Geo.City,
    client.Geo.Latitude,
    client.Geo.Longitude,
    client.Network.ASName,
    a.MeanThroughputMbps AS throughput_mbps,
    a.Direction AS direction
FROM
    `measurement-lab.ndt.unified_downloads`
WHERE
    client.Geo.CountryCode = "BH"
    AND date BETWEEN "2020-01-01" AND "2026-02-28"
    AND client.Geo.Latitude IS NOT NULL
    AND client.Geo.Longitude IS NOT NULL;


SELECT
    a.TestTime AS test_time,
    client.Geo.City AS city,
    client.Geo.CountryName AS country,
    client.Geo.Subdivision1Name AS region,
    client.Geo.Latitude AS latitude,
    client.Geo.Longitude AS longitude,
    client.Network.ASName AS ISP_name,
    a.MeanThroughputMbps AS throughput_mbps,
    a.Direction AS direction,
    a.MinRTT AS latency_ms
FROM
    `measurement-lab.ndt.ndt7`
WHERE
    client.Geo.CountryCode = 'BH'
    AND date BETWEEN '2020-01-01' AND '2024-12-31'
    AND client.Geo.Latitude IS NOT NULL
    AND client.Geo.Longitude IS NOT NULL;


SELECT *
FROM (
  SELECT
    a.TestTime AS test_date,
    client.Geo.City AS city,
    client.Geo.CountryName AS country,
    client.Geo.Subdivision1Name AS region,
    client.Geo.Latitude AS latitude,
    client.Geo.Longitude AS longitude,
    client.Network.ASName AS ISP_name,

    -- Fix: upload vs download naming
    a.MeanThroughputMbps AS upload_mbps,
    a.Direction AS direction,

    a.MinRTT AS latency_ms,

    -- Grid creation
    CAST(ROUND(client.Geo.Latitude, 3) AS STRING) AS lat_grid,
    CAST(ROUND(client.Geo.Longitude, 3) AS STRING) AS lon_grid,

    -- Keep latest record per grid
    ROW_NUMBER() OVER (
      PARTITION BY
        CAST(ROUND(client.Geo.Latitude, 3) AS STRING),
        CAST(ROUND(client.Geo.Longitude, 3) AS STRING)
      ORDER BY a.TestTime DESC
    ) AS rn

  FROM
    `measurement-lab.ndt.unified_uploads`

  WHERE
    client.Geo.CountryCode = "BH"
    AND date BETWEEN "2020-01-01" AND "2026-12-31"
    AND client.Geo.Latitude IS NOT NULL
    AND client.Geo.Longitude IS NOT NULL
    AND client.Geo.AccuracyRadiusKm <= 5
)

WHERE rn = 1;
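For context on the grid in the third query: rounding coordinates to three decimal places produces cells of roughly 100 m on a side at Bahrain's latitude. That is far finer than the 5 km AccuracyRadiusKm cut-off used in the same query. The sketch below estimates the cell size on a spherical Earth; the 6371 km radius and the 26°N latitude are approximations chosen for illustration.

```python
import math

# Mean Earth radius in km; a spherical approximation, not the WGS84 ellipsoid.
EARTH_RADIUS_KM = 6371.0


def grid_cell_size_km(latitude_deg, decimals):
    """Approximate (north-south, east-west) extent in km of a cell produced by
    rounding latitude and longitude to `decimals` decimal places."""
    step_deg = 10.0 ** -decimals
    # One degree of arc on a sphere spans (pi / 180) * R km.
    km_per_degree = math.pi / 180.0 * EARTH_RADIUS_KM
    north_south = step_deg * km_per_degree
    # Meridians converge toward the poles, so a longitude step shrinks by cos(lat).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    # Bahrain lies at roughly 26 degrees north (approximation for illustration).
    ns, ew = grid_cell_size_km(26.0, 3)
    print(f"ROUND(..., 3) cell at 26N: {ns * 1000:.0f} m x {ew * 1000:.0f} m")
    # prints: ROUND(..., 3) cell at 26N: 111 m x 100 m
```

A circle with a 5 km radius covers about 78.5 km², i.e. several thousand such cells. A finer grid therefore cannot add spatial detail that the underlying geolocation does not contain.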

However, I am consistently encountering the following issue:

  • Many records share identical latitude/longitude values
  • In some cases, all measurements within a city appear at the same coordinates
  • This results in very limited spatial distribution when visualizing the data
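One way to quantify this symptom is to measure how concentrated the reported coordinates are. The sketch below assumes the latitude/longitude columns from the queries above have been exported into Python as (latitude, longitude) pairs. It counts the distinct points and the share of tests at the busiest one. A large share at a single point suggests the tests were placed at a city-level location rather than where the user actually was. The sample coordinates are hypothetical.

```python
from collections import Counter


def coordinate_concentration(points, decimals=4):
    """Summarize how concentrated reported test locations are.

    points: iterable of (latitude, longitude) pairs.
    decimals: coordinates are rounded to this many places before counting,
    so tiny floating-point differences do not create spurious distinct points.
    Returns (number_of_distinct_points, share_of_tests_at_busiest_point).
    """
    counts = Counter((round(lat, decimals), round(lon, decimals)) for lat, lon in points)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    _, busiest = counts.most_common(1)[0]
    return len(counts), busiest / total


# Hypothetical coordinates: eight tests at one point, two elsewhere.
sample = [(26.2285, 50.586)] * 8 + [(26.13, 50.555), (26.07, 50.62)]
print(coordinate_concentration(sample))  # (3, 0.8)
```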

This is problematic for my use case, as my project requires:

  • Mapping speed measurements to specific areas/blocks within Bahrain
  • Comparing performance against nearby cellular tower locations
  • Identifying localized coverage gaps and infrastructure issues

At the moment, the data appears too spatially aggregated to support this level of analysis.
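For the tower comparison, a minimal sketch is to match each test to its nearest tower by great-circle (haversine) distance. It then flags matches that the location uncertainty could overturn. The Tower class, its fields, and the ambiguity rule are hypothetical illustrations; tower coordinates would come from your own dataset.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


@dataclass
class Tower:
    # Hypothetical record type for one cellular tower from your own dataset.
    tower_id: str
    lat: float
    lon: float


def nearest_tower(test_lat, test_lon, accuracy_radius_km, towers):
    """Return (tower, distance_km, ambiguous) for the tower closest to the
    reported test location. `ambiguous` is True when location uncertainty
    could make a different tower the true nearest one."""
    ranked = sorted(
        ((haversine_km(test_lat, test_lon, t.lat, t.lon), t) for t in towers),
        key=lambda pair: pair[0],
    )
    if not ranked:
        raise ValueError("no towers supplied")
    best_km, best = ranked[0]
    # The true position may be up to accuracy_radius_km from the reported one,
    # shifting every tower distance by at most that much; a runner-up within
    # 2 * radius of the best match could therefore actually be closer.
    ambiguous = len(ranked) > 1 and ranked[1][0] - best_km < 2 * accuracy_radius_km
    return best, best_km, ambiguous
```

The rule follows from the triangle inequality. If the true position is within r km of the reported point, each tower distance can change by at most r. A runner-up less than 2r farther than the best match could therefore be the real nearest tower.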


My questions:

  1. Are the client.Geo.Latitude and client.Geo.Longitude fields intentionally coarsened or anonymized (e.g., city-level or grid-level resolution)?
  2. Is there any dataset, field, or method within M-Lab that provides higher-resolution geolocation for NDT tests?
  3. Are there recommended approaches for performing fine-grained spatial analysis using M-Lab data (e.g., grid aggregation, alternative datasets, or APIs)?
  4. Would applying filters such as client.Geo.AccuracyRadiusKm meaningfully improve spatial precision?
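On question 3, one generic approach (an illustration, not an M-Lab recommendation) is to aggregate tests into grid cells no smaller than the location uncertainty and to drop cells with too few tests. The sketch below uses 0.05-degree cells, a 10-test minimum, and the median throughput per cell. All three are illustrative choices. The median is used because it is less sensitive than the mean to a few unusually fast or slow tests.

```python
import math
import statistics
from collections import defaultdict


def aggregate_to_grid(tests, cell_deg=0.05, min_tests=10):
    """Group tests into square latitude/longitude cells and summarize each cell.

    tests: iterable of (latitude, longitude, throughput_mbps) tuples.
    cell_deg: cell edge in degrees; 0.05 deg is about 5.6 km north-south.
    min_tests: cells with fewer tests are dropped as too sparse to summarize.
    Returns {(lat_index, lon_index): (test_count, median_mbps)}.
    """
    cells = defaultdict(list)
    for lat, lon, mbps in tests:
        # math.floor keeps cell edges consistent for negative values, unlike int().
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(mbps)
    return {
        key: (len(values), statistics.median(values))
        for key, values in cells.items()
        if len(values) >= min_tests
    }


def cell_center(key, cell_deg=0.05):
    """Latitude/longitude of the center of a grid cell, e.g. for plotting."""
    lat_idx, lon_idx = key
    return ((lat_idx + 0.5) * cell_deg, (lon_idx + 0.5) * cell_deg)
```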

Pavlos Sermpezis

Apr 6, 2026, 6:48:06 AM
to discuss, fatmaal...@gmail.com
Hello, 

1. M-Lab uses MaxMind's GeoLite2 geolocation database to annotate tests. Geo granularity is intentionally kept coarse for privacy reasons, because all of M-Lab's data are publicly accessible. City-level accuracy may also have limitations (see, e.g., https://www.measurementlab.net/blog/improving-m-lab-geolocation/#geolocation-in-the-public-dataset).

2/3. There are alternative datasets that provide geolocation at finer resolution, e.g., MaxMind GeoIP (paid), IPinfo, and others. However, M-Lab does not use any of them. You would need to obtain access to one of these datasets and annotate M-Lab's tests yourself.
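Annotating tests yourself amounts to looking up each test's client IP address in the external database. The sketch below is a minimal longest-prefix match over an in-memory table. It assumes you have a client IP address for each test and have exported the licensed database into (CIDR prefix, latitude, longitude, accuracy_km) rows. That row format is hypothetical, and the sample prefixes used to exercise it are reserved documentation ranges, not real allocations.

```python
import ipaddress


def build_index(rows):
    """rows: iterable of (cidr, latitude, longitude, accuracy_km) tuples,
    a hypothetical export format from an external geolocation database."""
    index = [(ipaddress.ip_network(cidr), lat, lon, acc) for cidr, lat, lon, acc in rows]
    # Longest prefixes first, so the first match is the most specific one.
    index.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return index


def annotate(ip, index):
    """Return (latitude, longitude, accuracy_km) for the most specific prefix
    covering `ip`, or None if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    for network, lat, lon, acc in index:
        # Skip prefixes of the other IP version (IPv4 vs IPv6).
        if network.version == addr.version and addr in network:
            return lat, lon, acc
    return None
```

A linear scan is fine for a sketch; for a full database, the vendor's own reader library or a radix tree would be the practical choice.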

4. The AccuracyRadiusKm field comes from MaxMind's database. It can indeed be used as a filter to improve precision, but it is still an estimate.
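Because the radius is still an estimate, tightening the filter trades sample size for nominal precision. A quick way to see that trade-off before choosing a cut-off is to compute the fraction of tests retained at several thresholds. The dictionary key accuracy_radius_km below is a hypothetical name for the exported client.Geo.AccuracyRadiusKm column.

```python
def filter_by_accuracy(tests, max_radius_km):
    """Keep tests whose accuracy radius is known and at most max_radius_km.

    tests: list of dicts with a hypothetical "accuracy_radius_km" key.
    Returns (kept_tests, fraction_kept).
    """
    kept = [
        t for t in tests
        if t.get("accuracy_radius_km") is not None
        and t["accuracy_radius_km"] <= max_radius_km
    ]
    fraction = len(kept) / len(tests) if tests else 0.0
    return kept, fraction


def retention_curve(tests, thresholds):
    """Fraction of tests kept at each candidate threshold (km)."""
    return {r: filter_by_accuracy(tests, r)[1] for r in thresholds}
```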

Finally, we would recommend using the ndt7_union table instead of the ndt7 table, since it includes measurements from more servers. https://www.measurementlab.net/blog/dynamic-data-bq/

Best regards,
Pavlos