Background:
I am working on an integration between k6 (a performance-testing framework) and Prometheus & Grafana, so that performance-test result metrics are fed into Prometheus and the data is presented on a Grafana dashboard.
Issue:
I am facing an issue with Prometheus when using k6 to send trend metrics with a $quantile_stat dashboard variable. We're using the trend-stats method of sending metrics, where trend quantiles such as http_req_duration_$quantile_stat are pre-calculated in k6 before being sent to Prometheus and displayed in Grafana.
When running a specific test case and switching the trend-metric query to different quantile values in Grafana, the panels don't update properly. In each iteration of the test, a login API is called, followed by one of several other APIs depending on the selected challenge case, and then a cleanup API. The only API that reflects changes when switching the quantile is the login API; all other APIs remain static, showing no differences across the quantiles.
To troubleshoot, I viewed the Prometheus graphs for the k6_http_req_duration_$quantile_stat metrics, plotting all the APIs on a single graph. Switching between quantile values did not change any of the graphs except the login API's.
Attached are screenshots of the graphs showing the results for api1 with quantiles min and max; as you can see, they are the same.
Test case code:
function run(data, challengeCase) {
    login_api();
    switch (challengeCase) {
        case 1:
            api1(data);
            break;
        case 2:
            api2(data);
            break;
        ...
        case 8:
            api8(data);
            break;
        case 9:
            break;
    }
    cleanup_api(data);
}

export function testName(data) {
    let caseNum = randomIntBetween(1, 8);
    run(data, caseNum);
}
In this setup, each case triggers a different API call, with a "cleanup" API running at the end of each iteration. In Prometheus, when graphing k6_http_req_duration_$quantile_stat for each API, the login API is the only one whose graph changes when $quantile_stat is modified; the others remain unaffected. I initially thought this might be because the login API runs in every iteration, which could explain why it changes with the quantile. However, the cleanup API also runs at the end of every iteration, yet its metrics remain static regardless of the quantile.
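For concreteness, the pairs of Prometheus queries I compared look roughly like the following (the name label value here is a placeholder; the actual label sets depend on how the requests are tagged in the script):

```
k6_http_req_duration_min{name="api1"}
k6_http_req_duration_max{name="api1"}
```

For every API except login, these two series plot as identical lines.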
Additional tests:
Since this test case is part of a larger codebase with many dependencies, I wanted to isolate the issue. To do so, I created a custom test case with dummy API calls, similar to this one, and when I reran the test, everything worked perfectly — the quantile metrics updated as expected across the board.
This leaves me wondering if there’s something specific about my original test case or APIs causing the min, p90, p95, p99, and max values to remain the same for an API, regardless of the quantile.
Has anyone experienced this before or have any ideas why the quantiles wouldn’t change for an API with this type of test case or executor? Could there be something I’m overlooking that causes the values to remain identical for different quantiles?
Reference:
The integration uses Prometheus Remote Write to feed k6 metrics to Prometheus.
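For completeness, this is roughly how the test is launched. The server URL and the exact list of stats below are illustrative, but K6_PROMETHEUS_RW_SERVER_URL and K6_PROMETHEUS_RW_TREND_STATS are the documented options that control the remote-write endpoint and which pre-calculated quantiles are emitted:

```shell
# Send metrics via Prometheus remote write, emitting pre-calculated
# trend stats (one time series per stat) rather than native histograms.
# URL and stat list are examples; adjust to the actual environment.
export K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write
export K6_PROMETHEUS_RW_TREND_STATS="min,p(90),p(95),p(99),max"
k6 run -o experimental-prometheus-rw script.js
```

With this configuration, each stat in the list becomes its own metric in Prometheus (e.g. k6_http_req_duration_min, k6_http_req_duration_p95), which is what the $quantile_stat variable switches between in Grafana.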
https://grafana.com/docs/k6/latest/results-output/real-time/prometheus-remote-write/#send-test-metrics-to-a-remote-write-endpoint