k6_http_req_duration_$quantile_stat Metrics Are the Same Across Quantiles for Certain APIs


Zhang Zhao

Oct 3, 2024, 8:59:29 PM
to Prometheus Users
Background: 
I am working on an integration between k6 (a performance testing framework) and Prometheus & Grafana, so that performance test result metrics are fed to Prometheus and presented on a Grafana dashboard.

Issue:
I am facing an issue with Prometheus when using k6 to send trend metrics with the $quantile_stat method. We're using the older method of sending metrics, where trend quantiles such as http_req_duration_$quantile_stat are pre-calculated in k6 before being sent to Prometheus and displayed in Grafana.

When running a specific test case and switching the trend metric query to different quantile values in Grafana, the panels don't update properly. In each iteration of the test, a login API is called, followed by one of several other APIs based on the challenge case selected, and then a cleanup API. The only API that reflects changes when switching the quantile is the login API; all other APIs remain static, showing no differences across the quantiles.

To troubleshoot, I viewed the Prometheus graphs for the k6_http_req_duration_$quantile_stat metrics. I plotted all the APIs on a single graph. Switching between quantile values did not cause any changes in the graphs except for the login API.

Attached are screenshots of the graphs showing the results for api1 with quantile min and max. As you can see, they are the same.

Test case code:
function run(data, challengeCase) {
  login_api();
  switch (challengeCase) {
    case 1:
      api1(data);
      break;
    case 2:
      api2(data);
      break;
    ...
    case 8:
      api8(data);
      break;
    case 9:
      break;
  }
  cleanup_api(data);
}

export function testName(data) {
  let caseNum = randomIntBetween(1, 8);
  run(data, caseNum);
}

In this setup, each case triggers a different API call, with a “cleanup” API running at the end of each iteration. In Prometheus, when graphing k6_http_req_duration_$quantile_stat for each API, the login API is the only one that changes when the $quantile_stat is modified, while the others remain unaffected by the $quantile_stat. I initially thought this might be because the login API runs with every iteration, which could explain why it changes with the quantile. However, the cleanup API also runs at the end of every iteration, yet its metrics remain static regardless of the quantile.


Additional Tests

Since this test case is part of a larger codebase with many dependencies, I wanted to isolate the issue. To do so, I created a custom test case with dummy API calls, similar to this one, and when I reran the test, everything worked perfectly — the quantile metrics updated as expected across the board.

This leaves me wondering if there’s something specific about my original test case or APIs causing the min, p90, p95, p99, and max values to remain the same for an API, regardless of the quantile.

Has anyone experienced this before, or does anyone have ideas about why the quantiles wouldn’t change for an API with this type of test case or executor? Could there be something I’m overlooking that causes the values to remain identical for different quantiles?
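
For illustration only, here is a minimal sketch of one arithmetic situation in which min, p90, p95, p99, and max all coincide: the set of samples being aggregated contains a single value (or only identical values). It is not k6's actual aggregation code, and nothing in the thread confirms this is the cause here. The helper names (quantile, trendStats, allStatsEqual) and the durations are hypothetical, and the nearest-rank quantile rule is an assumption chosen for simplicity.

```javascript
// Hypothetical helper: nearest-rank quantile over a sorted copy of the samples.
// Illustration only; this is NOT how k6 is known to compute its trend stats.
function quantile(samples, q) {
  const sorted = [...samples].sort((a, b) => a - b);
  if (sorted.length === 0) return NaN;
  // Nearest rank: the smallest value with at least a fraction q of samples at or below it.
  const rank = Math.max(1, Math.ceil(q * sorted.length));
  return sorted[rank - 1];
}

// Compute the same set of stats the post mentions.
function trendStats(samples) {
  return {
    min: quantile(samples, 0),
    p90: quantile(samples, 0.9),
    p95: quantile(samples, 0.95),
    p99: quantile(samples, 0.99),
    max: quantile(samples, 1),
  };
}

// True when every stat has the same value.
function allStatsEqual(stats) {
  const values = Object.values(stats);
  return values.every((v) => v === values[0]);
}

// Made-up durations in milliseconds.
const spread = trendStats([120, 80, 95, 300, 110]);
const single = trendStats([250]);

console.log(allStatsEqual(spread)); // false: min and max differ
console.log(allStatsEqual(single)); // true: one sample, so every stat is that sample
```

If something like this were happening, comparing the number of samples recorded per API between the original test and the dummy test would be a reasonable first check.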

Reference:

The integration uses Prometheus Remote Write to feed k6 metrics to Prometheus.

https://grafana.com/docs/k6/latest/results-output/real-time/prometheus-remote-write/#send-test-metrics-to-a-remote-write-endpoint



ZZ


max.png
min.png

Brian Candler

Oct 4, 2024, 5:15:06 AM
to Prometheus Users
On Friday 4 October 2024 at 01:59:29 UTC+1 Zhang Zhao wrote:
When running a specific test case and switching the trend metric query to different quantile values in Grafana, the panels don't update properly.

I think you should first remove Grafana from the equation entirely. If the problem is something to do with Grafana, e.g. Grafana dashboard variables, then the appropriate place to ask would be the Grafana Community: https://community.grafana.com/

However, in this case it seems that the problem is likely in how you are generating the metrics in the first place and submitting them using the Remote Write protocol. You haven't shown any code which does that. If that code is part of the "k6" framework that you refer to, then the place to ask is probably a discussion group for that framework.

Is "$quantile_stat" a feature of Grafana or k6? That should help you decide where to focus your attention.

If you still think the issue is to do with Prometheus, then you should reproduce your problem using only Prometheus components (e.g. the Prometheus web interface, which talks directly to the Prometheus query API). You'd also need to provide basic information to allow the problem to be reproduced, such as what version of Prometheus you're running, and samples of the remote write requests.
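
As a rough sketch of checking the stored data without Grafana, the snippet below builds instant-query URLs for the Prometheus HTTP API (/api/v1/query) and compares two responses series by series. Several things here are assumptions rather than facts from this thread: the metric names k6_http_req_duration_min and k6_http_req_duration_max (obtained by substituting the stat into the name quoted above), the label "name" used to tell APIs apart, and the base URL http://localhost:9090. The helper functions are hypothetical.

```javascript
// Build an instant-query URL for the Prometheus HTTP API.
function buildQueryUrl(baseUrl, promql) {
  const url = new URL('/api/v1/query', baseUrl);
  url.searchParams.set('query', promql);
  return url.toString();
}

// Index an instant-vector response by one label so two stats can be compared.
function valuesByLabel(response, label) {
  const out = new Map();
  for (const series of response.data.result) {
    // Each sample is [unixTimestamp, "value as a string"].
    out.set(series.metric[label], Number(series.value[1]));
  }
  return out;
}

// Return the label values whose samples are exactly equal in both responses.
function identicalSeries(respA, respB, label) {
  const a = valuesByLabel(respA, label);
  const b = valuesByLabel(respB, label);
  return [...a.keys()].filter((k) => b.has(k) && a.get(k) === b.get(k));
}

// Usage sketch (needs a reachable Prometheus; names and URL are assumptions):
// const base = 'http://localhost:9090';
// const [min, max] = await Promise.all(
//   ['k6_http_req_duration_min', 'k6_http_req_duration_max'].map((m) =>
//     fetch(buildQueryUrl(base, m)).then((r) => r.json())));
// console.log(identicalSeries(min, max, 'name'));
```

If the API list printed by such a check matches what the dashboard shows, the identical values are already present in the stored samples, which points at the sender rather than at the query or display layer.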

I would say that in general, Prometheus is very good at faithfully storing the data you give it, so if you see a problem it's likely to be "garbage in, garbage out". But if you're using one of the more bleeding-edge features like native histograms, then it's possible that you've found a Prometheus issue.