I've encountered some issues and questions while working with MSAK data. I'd greatly appreciate any insights or guidance on the following matters:
1. Querying the measurement-lab.msak.throughput1 table

When retrieving M-Lab MSAK data, I first queried the measurement-lab.msak.throughput1 table like this to see its structure:
SELECT
*
FROM
`measurement-lab.msak.throughput1`
WHERE
date BETWEEN '2024-09-25' AND '2024-09-27'
LIMIT 1
I receive the following error:

"Cannot query over table 'mlab-oti.raw_msak.annotation2' without a filter over column(s) 'date' that can be used for partition elimination."
This is puzzling because I am using date as a partition. Could someone explain why this error occurs and how to resolve it?
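One variant I'm considering (whether it makes any difference is exactly what I'm unsure about) is spelling the filter with explicit DATE literals, so the partition filter is unambiguously a constant expression that could be pushed down to the partitioned tables behind the view:

```sql
-- Guess, not a confirmed fix: use explicit DATE literals rather than
-- strings, so the filter is a constant expression usable for partition
-- elimination on the underlying joined tables.
SELECT
  *
FROM
  `measurement-lab.msak.throughput1`
WHERE
  date BETWEEN DATE '2024-09-25' AND DATE '2024-09-27'
LIMIT 1
```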
2. Discrepancy in number of entries retrieved
I've recorded the IPv6 address and time for each device before and after each M-Lab speedtest. However, when I query the measurement-lab.msak_raw.throughput1 table using the IPv6 address and date, I'm getting more entries than the actual number of tests performed.
For example, with this query:
SELECT
*
FROM `measurement-lab.msak_raw.throughput1`
WHERE
date BETWEEN "2024-09-27" AND "2024-09-28"
AND raw.Client LIKE '%specific_ipv6_address%'
AND raw.Direction = 'download'
ORDER BY date DESC
I performed around 16 tests, but I'm getting 63 entries. This pattern repeats for other devices I've tested - about 15-16 tests yield around 60 entries.
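To illustrate the pattern, a grouped version of the query shows how many rows each test produced (I'm assuming the rows share a MeasurementID field under raw; that field name is my reading of the schema):

```sql
-- Sketch: count rows per measurement to check whether each test
-- produced several rows (e.g. one per stream). raw.MeasurementID
-- is assumed; 'specific_ipv6_address' is a placeholder.
SELECT
  raw.MeasurementID AS measurement_id,
  COUNT(*) AS rows_per_test
FROM `measurement-lab.msak_raw.throughput1`
WHERE
  date BETWEEN "2024-09-27" AND "2024-09-28"
  AND raw.Client LIKE '%specific_ipv6_address%'
  AND raw.Direction = 'download'
GROUP BY measurement_id
ORDER BY rows_per_test DESC
```

If each test yields roughly four rows here, that would explain 16 tests producing about 63 entries.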
3. Calculating download/upload speeds
Lastly, I'd like to confirm the correct method for calculating download/upload speeds from the retrieved data. Currently, I'm using this approach:
Is this calculation method correct? Are there any direct ways to obtain these speeds from the data?
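For illustration only, here is the kind of calculation I mean, sketched in SQL. The measurement array, Application.BytesReceived, and ElapsedTime field names are assumptions borrowed from the ndt7 convention (bytes and microseconds) and are not verified against the MSAK schema:

```sql
-- Sketch, not verified against the MSAK schema: per-stream mean throughput.
-- With BytesReceived in bytes and ElapsedTime in microseconds,
-- bytes * 8 / microseconds gives megabits per second directly.
SELECT
  raw.MeasurementID,
  id,
  MAX(m.Application.BytesReceived) * 8 / MAX(m.ElapsedTime) AS stream_mbps
FROM
  `measurement-lab.msak_raw.throughput1`,
  UNNEST(raw.ServerMeasurements) AS m
WHERE
  date = "2024-09-27"
  AND raw.Direction = 'download'
GROUP BY raw.MeasurementID, id
```

For a multi-stream test, the per-stream values sharing one MeasurementID would presumably then be summed; whether that is the right aggregation is part of what I'd like to confirm.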
Thank you in advance for any help or clarification you can provide. Your insights will be invaluable for my research.
Best,
Jiayi Liu

> This returned 40 rows. After grouping by MeasurementID as you suggested, I got the following result:

These results look OK to me. I've looked at the raw.ClientMetadata fields to find out the client: the single-stream tests are from speed.measurementlab.net, where we're currently running a calibration experiment against ndt7 and testing MSAK with different configurations (at that time, MSAK was running with a single BBR stream). Tests with 3 streams did not provide a "client_name" querystring parameter -- if that's your own client, I recommend sending "client_name" so they are easier to identify in the BQ data.

> Then, I filtered these rows based on the start and end times I recorded for each M-Lab test, keeping only the rows where raw.startTime falls within these intervals. Here's what I found after filtering:

I believe the system clock on the machine where the client ran might be off by ~40 seconds, which in this case is impacting your analysis. All the M-Lab servers run systemd-timesyncd to keep the clocks synchronized via NTP. Data saved in BigQuery is generated by the MSAK server, and the timestamp is the one from the server's system clock. I looked at the mlab1-nuq08 machine (the one from MeasurementID "5b0a5e50-2f71-4a86-8392-b83c01b05004") and the system clock is currently accurate to the second. I recommend not using the exact start time recorded by the client to aggregate the data (or, at least, accounting for +/- 1 minute in the query if you need to do so).

> I'm curious about your mention of 4 threads. Is it possible that they're being split into 1+3?

No, you're just observing a combination of single-stream tests (the ones from speed.measurementlab.net) and tests from your client, which is configured to use 3 streams. I previously mentioned 4 streams only as an example; MSAK's number of streams is configurable by the client and can change on a test-by-test basis.

> Regarding calculating download/upload speeds: Do you plan to create a unified view for MSAK data similar to the NDT? This would allow us to directly obtain download/upload speeds through a MeanThroughputMbps field, which would be very helpful for analysis.

We plan on adding an msak.throughput1_uploads view which only contains uploads, aggregated with the same methodology as msak.throughput1_downloads.

Let me know if you have any further questions!

P.S. I don't know if it was intentional, but your reply didn't go to the Discuss group but to me only. I did not cc the group, but feel free to do it again if needed!

-Roberto

On Thu, Oct 3, 2024 at 8:13 AM Jiayi Liu <miali...@gmail.com> wrote:

Hi Roberto,

Thank you for your detailed response. I'm glad the issue with querying measurement-lab.msak.throughput1 has been resolved.

Regarding the discrepancy in the number of entries retrieved, I've done some further investigation based on your explanation. Here's what I found:

I queried the MSAK data using this SQL:
SELECT
*
FROM `measurement-lab.msak_raw.throughput1`
WHERE
date BETWEEN "2024-09-29" AND "2024-09-30"
AND raw.Client LIKE '%2600:1010:b047:e1eb:f0a5:4cd4:4861:2b37%'
AND raw.Direction = 'download'
ORDER BY date ASC

This returned 40 rows. After grouping by MeasurementID as you suggested, I got the following result:

Then, I filtered these rows based on the start and end times I recorded for each M-Lab test, keeping only the rows where raw.startTime falls within these intervals. Here's what I found after filtering:

Interestingly, each MeasurementID within the time intervals appears only once. I've verified this pattern across multiple devices.
I'm curious about your mention of 4 threads. Is it possible that they're being split into 1+3? If so, I'm wondering why, for example, the MeasurementID 5b0a5e50-2f71-4a86-8392-b83c01b05004 has 3 rows all with a Start Time of 2024-09-30 11:21:25 (PDT), which is earlier than the start time I recorded for my first experiment? I have attached the related files below.
Regarding calculating download/upload speeds: Do you plan to create a unified view for MSAK data similar to the NDT? This would allow us to directly obtain download/upload speeds through a MeanThroughputMbps field, which would be very helpful for analysis.
Thank you again for your assistance. I look forward to your insights on these follow-up questions.

Best,
Jiayi Liu
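Following the earlier suggestion to allow for client clock skew, a query that matches server-side start times within a +/- 1 minute window of a client-recorded start time might look like this (the timestamp literal is a placeholder for the locally recorded time, and raw.startTime's exact type is my assumption):

```sql
-- Sketch: match server-side test start times within +/- 60 seconds of a
-- client-recorded start time, to absorb client clock skew.
-- "2024-09-30 18:21:25 UTC" is a placeholder for the recorded time.
SELECT *
FROM `measurement-lab.msak_raw.throughput1`
WHERE
  date = "2024-09-30"
  AND raw.Direction = 'download'
  AND raw.startTime BETWEEN
    TIMESTAMP_SUB(TIMESTAMP "2024-09-30 18:21:25 UTC", INTERVAL 60 SECOND)
    AND TIMESTAMP_ADD(TIMESTAMP "2024-09-30 18:21:25 UTC", INTERVAL 60 SECOND)
```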
Based on your explanation, am I correct in understanding that the current web-based MSAK tests on https://speed.measurementlab.net are single-stream? Upon reviewing my filtered data, I noticed that for the MeasurementIDs with 3 rows, their client metadata indeed doesn't include browser information like "firefox".
For NDT data:
Should I use MeanThroughputMbps for both Download and Upload speeds?
For latency, should I use the minRTT from the download test?
For MSAK data:
Should I focus only on the results with a single row?
For these single-row results, should I calculate the speed using Application.BytesReceived?
Thank you again for your help in clarifying these points. Your guidance is invaluable in ensuring the accuracy of my comparative analysis.
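For context, this is the kind of query I have in mind for the NDT side of the comparison, using the unified views' MeanThroughputMbps and MinRTT fields (a sketch; the IP value is a placeholder):

```sql
-- Sketch: NDT download speed and latency from the unified downloads view.
-- 'specific_ipv6_address' is a placeholder for the client's address.
SELECT
  date,
  a.MeanThroughputMbps,
  a.MinRTT
FROM `measurement-lab.ndt.unified_downloads`
WHERE
  date BETWEEN "2024-09-27" AND "2024-09-28"
  AND client.IP = 'specific_ipv6_address'
ORDER BY date
```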