Can I use M-Lab data to estimate the market share of fixed broadband ISPs?


Le Yan

May 29, 2024, 1:33:37 PM
to discuss
Hi, 

I am new to the group and I have not spent enough time on the M-Lab data yet. I just want to bounce the idea around and see if anyone has done this before.

Can I use the M-Lab data to understand the market share of fixed broadband ISPs? Since the M-Lab speed test is a large sample of internet users, if ISP information is included in the data, I should be able to estimate the market share of each ISP, right?

Any suggestions/comments are deeply appreciated!

Regards

Le

Fabion Kauker

May 29, 2024, 1:40:53 PM
to Le Yan, discuss
Yes and no.

The hard part is the geo granularity. I've found that the city level is OK but beyond that you need to augment with other data sets.

Here are two samples that can be obtained from MLAB using some queries.

Let me know what you think.

Fabion

sample_ip.csv
sample_all.csv

Felipe sarmiento

May 29, 2024, 1:58:07 PM
to Fabion Kauker, Le Yan, discuss

Hi there,

At the national level or for large cities with sufficient sample sizes, the data could potentially provide some indication of market share. However, keep in mind that individual users can perform multiple tests, which might skew the results.

I would recommend using M-Lab data more for analyzing ISP footprints rather than market share. For reliable market share data, it's usually best to refer to official sources from the ICT ministry or the National Regulatory Agency in the relevant country.


Lai Yi Ohlsen

May 29, 2024, 2:12:41 PM
to Felipe sarmiento, Fabion Kauker, Le Yan, discuss
Hey all, 

Thanks for the question and thoughts here. For context, M-Lab collects and publishes about 4 million measurements per day from integrations of our open source measurements globally. These are active measurements, meaning users have to opt in to them, though some do so on an automatic, recurring basis. In other words, we're not doing active scans of the Internet; we're depending on users to run tests.

This means that the devices and networks represented in M-Lab's dataset are not conclusive representations of all users or networks in a region or the market share. And to the point of automatic testing, I'd recommend aggregating per IP address to ensure that no one client overrepresents a particular network.
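As a rough illustration of per-IP aggregation, here is a minimal Python sketch. It assumes the tests have already been exported as `(client_ip, isp)` pairs; the function name, the sample rows, and the ISP labels are all hypothetical and not part of any M-Lab schema. It counts each IP address once per ISP, so a client that tests repeatedly does not inflate its network's share.

```python
from collections import defaultdict


def share_by_distinct_ip(tests):
    """Return {isp: share of distinct client IPs} for (client_ip, isp) pairs."""
    ips_per_isp = defaultdict(set)
    for client_ip, isp in tests:
        # A set keeps each IP once, however many tests it ran.
        ips_per_isp[isp].add(client_ip)
    total = sum(len(ips) for ips in ips_per_isp.values())
    if total == 0:
        return {}
    return {isp: len(ips) / total for isp, ips in ips_per_isp.items()}


# Hypothetical rows: one "ISP A" client ran three tests.
sample_tests = [
    ("192.0.2.1", "ISP A"),
    ("192.0.2.1", "ISP A"),
    ("192.0.2.1", "ISP A"),
    ("192.0.2.2", "ISP A"),
    ("198.51.100.7", "ISP B"),
    ("198.51.100.8", "ISP B"),
]
shares = share_by_distinct_ip(sample_tests)
print(shares)
```

By raw test count "ISP A" would hold 4 of 6 tests; counted per distinct IP, both ISPs have two addresses each. This still says nothing about users who never run a test, so it is a view of the sample rather than of the market.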

It's also worth noting that the majority of tests come from the Google Search integration of NDT, which is accessed through the browser. Like all data collection techniques, this comes with its own biases that might affect how representative the data is of the market.

Will also add that I tend to think one of the strengths of our data is the longitudinal consistency. For example, because M-Lab has been collecting and publishing data for 15 years, if representation of an ISP increases in a region over time, that might suggest they have increased their presence. 
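One way to look at that kind of trend, sketched in Python under stated assumptions: the input is a list of `(year, client_ip, isp)` tuples, and the function name and sample rows are hypothetical. It reports, per year, the fraction of distinct client IPs that belong to one ISP.

```python
from collections import defaultdict


def yearly_share(tests, isp):
    """Return {year: share of distinct IPs seen for `isp`} from (year, ip, isp) rows."""
    per_year = defaultdict(lambda: defaultdict(set))
    for year, client_ip, name in tests:
        per_year[year][name].add(client_ip)
    trend = {}
    for year in sorted(per_year):
        total = sum(len(ips) for ips in per_year[year].values())
        trend[year] = len(per_year[year].get(isp, set())) / total
    return trend


# Hypothetical rows: "ISP B" goes from 1 of 4 distinct IPs to 2 of 4.
sample_rows = [
    (2022, "192.0.2.1", "ISP A"),
    (2022, "192.0.2.2", "ISP A"),
    (2022, "192.0.2.3", "ISP A"),
    (2022, "198.51.100.7", "ISP B"),
    (2023, "192.0.2.1", "ISP A"),
    (2023, "192.0.2.2", "ISP A"),
    (2023, "198.51.100.7", "ISP B"),
    (2023, "198.51.100.8", "ISP B"),
]
trend = yearly_share(sample_rows, "ISP B")
print(trend)
```

A rising value might suggest a growing presence, but it could also reflect a change in who runs tests, so it is best read alongside other regional sources.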

All of this said, depending on the data volume in the region and if analyzed with all the above in mind, it could be quite informative and at least give you a starting point for what's available. Agree with others that it will be best used as a complement to other regional authorities. 

Happy to discuss in more detail on a call (with others on thread as well, if wanted). 



--
Lai Yi Ohlsen



Le Yan

May 29, 2024, 3:10:44 PM
to Felipe sarmiento, Fabion Kauker, discuss
Thank you both, very helpful!

For duplicates (a single user running multiple tests), I assume I can dedup based on latitude/longitude/IP?




Le Yan

May 29, 2024, 3:10:47 PM
to discuss, la...@measurementlab.net, f.ka...@gmail.com, Le Yan, discuss, thecolombia...@gmail.com
Thanks Lai Yi. I think the biggest issue might be sample bias. Fiber users are a lot less likely to run a speed test than cable or even copper-line users. As a result, the fiber providers' share observed in the dataset will be much smaller than in reality.

Any thoughts on this?

Bradley Kalgovas

Jul 8, 2024, 12:11:53 PM
to discuss, yann...@gmail.com, la...@measurementlab.net, f.ka...@gmail.com, discuss, thecolombia...@gmail.com
Hi all, I'm doing a similar thing. I was thinking that a similar dedup might be better done based on UUID. Does that stand for unique user ID? Thanks!

Marc Goldburg

Jul 10, 2024, 1:06:06 PM
to discuss, yann...@gmail.com
APNIC publishes estimates of numbers of customers per AS, which is a proxy for ISP size.  The methodology is based on ad tracking, and described here and here.

This approach may be less affected by the self-selection bias than MLAB results (Google search users, users experiencing connection issues, etc.).

Marc

Bradley Kalgovas

Jul 10, 2024, 4:51:17 PM
to discuss, ma...@connectivitycap.com, yann...@gmail.com

Hi Marc, Thanks so much for your help. Totally understand and that is a great check. 

Quick question: how can I tell whether someone is running multiple speed tests from the same household/property/location? I just want to get a sense of how many people run duplicate speed tests, and I was thinking of using IP, lat and long as a proxy. However, I noticed that the IP looks like a public IP, and I heard that those are sometimes shared (are they?), and lat and long look to be at city level. Are there any ways to do this?

Any help appreciated!

Marc Goldburg

Jul 11, 2024, 11:33:21 AM
to discuss, b.ra...@gmail.com, Marc Goldburg, yann...@gmail.com
Whether IP addresses are shared/reused, and how, is dependent on the ISP.  For IPv4, many fixed-line ISPs provide their end-users with unique public addresses that persist for long times (different from so-called static addresses that, for a fee, are guaranteed not to change).  Others provide unique per-user addresses that change on a regular basis, e.g., every 24 hours.  Some fixed-line ISPs and most cellular ones -- e.g., home Internet via LTE -- use CGNAT to map multiple end-users to a single routable address.  Starlink also uses CGNAT.

For IPv6 with its larger address space, my guess is that CGNAT is not used even with cellular connections.  v6 addresses may have long-term persistence or be updated on a regular basis.  There are major ISPs that follow each model.

If you're interested in a limited number of ISPs, you could research the address assignment scheme of each and come up with an algorithm to determine whether individual speed tests come from the same end-user.
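A minimal Python sketch of what such a per-ISP rule could look like, assuming you have already researched each ISP and labelled it with one of three hypothetical policies (`STABLE`, `ROTATING`, `CGNAT`). The labels, the function name, and the one-day rotation window are illustrative assumptions, not properties of any real ISP or of the M-Lab data.

```python
from datetime import date

# Hypothetical policy labels, assigned per ISP after manual research.
STABLE, ROTATING, CGNAT = "stable", "rotating", "cgnat"


def likely_same_end_user(a, b, policy):
    """Return True only when the IP gives evidence that tests a and b share an end-user.

    a, b: dicts with "ip" (str) and "day" (datetime.date).
    False means "not shown to be the same", not "known to be different".
    """
    if policy == CGNAT or a["ip"] != b["ip"]:
        # Under CGNAT many subscribers share one address, so a match proves nothing.
        return False
    if policy == ROTATING:
        # Assumes addresses rotate about daily: trust a match only within one day.
        return a["day"] == b["day"]
    return policy == STABLE


t1 = {"ip": "192.0.2.1", "day": date(2024, 7, 1)}
t2 = {"ip": "192.0.2.1", "day": date(2024, 7, 2)}
print(likely_same_end_user(t1, t2, STABLE))    # same IP, long-lived address
print(likely_same_end_user(t1, t2, ROTATING))  # same IP, but a day apart
```

An unknown policy falls through to `False`, which keeps the rule conservative: tests are merged only when the ISP's addressing scheme supports it.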