Metric similar to iostat's "avgqu-sz"?


fntn...@gmail.com

Nov 28, 2018, 11:50:48 AM
to Prometheus Users
Hello,

I'm starting to use Prometheus.
After searching a lot, I couldn't find a way to get the equivalent of "avgqu-sz" from the following output of the "iostat -nz 1" command:


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    1.00     0.00     4.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    1.00     0.00     4.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00


Does anyone have an idea how to get it (or something similar) from Prometheus?

Best,
Francisco Neves

Ben Kochie

Nov 28, 2018, 2:11:47 PM
to fntn...@gmail.com, Prometheus Users
Hrm, queue time is a good question. I don't remember exactly what metric that translates to.

One thing I look at is node IO time, for example:

rate(node_disk_io_time_seconds_total[1m])

This tells me how many seconds of I/O time per second a disk has, i.e. its utilization.

What you're looking for generally would come from the node_exporter.


Francisco Neves

Nov 29, 2018, 5:43:29 AM
to sup...@gmail.com, promethe...@googlegroups.com
Thank you for your reply.

I am also using IO time, but for measuring disk utilisation.
However, I am looking for a metric that reveals disk saturation, and queue length would be great!

If there's no metric that reveals that, maybe it could be added in a future release.

Best,
Francisco Neves

Ben Kochie

Nov 29, 2018, 8:40:17 AM
to Francisco Neves, Prometheus Users
rate(node_disk_io_time_seconds_total[1m]) is saturation.

I'm reading the source code of iostat to see if I can better understand what it's doing.

It uses a macro:
/* With S_VALUE macro, the interval of time (@p) is given in 1/100th of a second */
#define S_VALUE(m,n,p)    (((double) ((n) - (m))) / (p) * 100)

Then the actual math is:

S_VALUE(ioj->rq_ticks, ioi->rq_ticks, itv) / 1000.0

The variables used:
 * @itv   Interval of time.
 * @ioi   Current sample statistics.
 * @ioj   Previous sample statistics.

rq_ticks comes from reading /sys/block/$device/stat, which is documented[0] as:
time_in_queue    milliseconds    total wait time for all requests

The node_exporter currently uses /proc/diskstats[1] to collect all data about IO, which doesn't seem to include the queue time data at all. :-(

I've filed https://github.com/prometheus/node_exporter/issues/1179 in order to discuss this feature request.

If anyone in the community would like to help with this, I am happy to review the code.


Ben Kochie

Dec 6, 2018, 4:45:15 PM
to Francisco Neves, Prometheus Users
After some investigation, it looks like the metric isn't missing at all.

avgqu-sz is reported by rate(node_disk_io_time_weighted_seconds_total[5m])

Chris Bulleri

Mar 20, 2019, 1:53:15 PM
to Prometheus Users
I'm new to Prometheus, using the basic node_exporter, and I'm looking for the results that you would see in iostat -xkz 1 15 or something like that.

Have you done anything else like that as a follow-on? I'm looking for an iostat dashboard like the following: https://grafana.com/dashboards/4857. Have you gone down that path?

Ben Kochie

Mar 20, 2019, 3:09:53 PM
to Chris Bulleri, Prometheus Users
I guess I hadn't thought about doing a direct translation of iostat.


As you can see, not everything is a direct translation. For example, it auto-scales b/kb/mb/etc for throughput graphs, rather than being strictly kB/sec. Also, the value for average request size (avgrq-sz) is technically in 512-byte sectors in iostat, not bytes.

Chris Bulleri

Mar 20, 2019, 7:36:44 PM
to Prometheus Users
Could you share the JSON? I was looking at iostat.c on GitHub and was going to look at doing a translation, but it looks like you already have something. I would love to check it out, and if I can make it better I would of course share back.

Brian Brazil

Mar 20, 2019, 7:58:49 PM
to Chris Bulleri, Prometheus Users
On Wed, 20 Mar 2019 at 23:36, Chris Bulleri <chris....@gmail.com> wrote:
> Could you share the JSON? I was looking at iostat.c on GitHub and was going to look at doing a translation, but it looks like you already have something. I would love to check it out, and if I can make it better I would of course share back.

https://www.robustperception.io/mapping-iostat-to-the-node-exporters-node_disk_-metrics may be of use.

Harald Koch

Mar 20, 2019, 9:29:34 PM
to Prometheus Users
On Wed, Mar 20, 2019, at 19:58, Brian Brazil wrote:
> https://www.robustperception.io/mapping-iostat-to-the-node-exporters-node_disk_-metrics may be of use.

Is there a cut and paste error in that article? To my layman's eyes, it looks like the calculations for "avgrq-sz" and "await" are using the same metrics and formula.

--
Harald Koch

Ben Kochie

Mar 21, 2019, 3:27:02 AM
to Chris Bulleri, Prometheus Users
You can download the json from our Grafana instance. Just click the "share" button in the upper right.

Brian Brazil

Mar 21, 2019, 4:58:25 AM
to Harald Koch, Prometheus Users
Yip, fixed. Thanks.

Brian
 


Chris Bulleri

Mar 21, 2019, 8:11:54 AM
to Prometheus Users
So this is what I see when doing an iostat -xky, and I've checked the article out (thank you). Any insight into the others? Now to learn about templates to get the variables correct. Thank you again.

r/s      -- Yes
w/s      -- Yes
rkB/s    -- Yes
wkB/s    -- Yes
rrqm/s   -- Yes
wrqm/s   -- Yes
%rrqm    -- Yes
%wrqm    -- Yes
r_await  -- Yes
w_await  -- Yes
aqu-sz   -- I didn't see this calculation
rareq-sz -- I didn't see this calculation
wareq-sz -- I didn't see this calculation
svctm    -- Yes
%util    -- Yes

Brian Brazil

Mar 21, 2019, 8:14:16 AM
to Chris Bulleri, Prometheus Users
On Thu, 21 Mar 2019 at 12:11, Chris Bulleri <chris....@gmail.com> wrote:
> So this is what I see when doing an iostat -xky, and I've checked the article out (thank you). Any insight into the others? Now to learn about templates to get the variables correct. Thank you again.
>
> r/s      -- Yes
> w/s      -- Yes
> rkB/s    -- Yes
> wkB/s    -- Yes
> rrqm/s   -- Yes
> wrqm/s   -- Yes
> %rrqm    -- Yes
> %wrqm    -- Yes
> r_await  -- Yes
> w_await  -- Yes
> aqu-sz   -- I didn't see this calculation
> rareq-sz -- I didn't see this calculation
> wareq-sz -- I didn't see this calculation
> svctm    -- Yes
> %util    -- Yes

The missing three are also known as avgqu-sz and friends; a rough sketch follows below.

Brian


Harald Koch

Mar 25, 2019, 6:03:06 PM
to Prometheus Users


On Thu, Mar 21, 2019, at 04:58, Brian Brazil wrote:
> On Thu, 21 Mar 2019 at 01:29, Harald Koch <c...@pobox.com> wrote:
>> On Wed, Mar 20, 2019, at 19:58, Brian Brazil wrote:
>>> https://www.robustperception.io/mapping-iostat-to-the-node-exporters-node_disk_-metrics may be of use.
>>
>> Is there a cut and paste error in that article? To my layman's eyes, it looks like the calculations for "avgrq-sz" and "await" are using the same metrics and formula.
>
> Yip, fixed. Thanks.

Thanks! This helped me discover, and then eliminate, a performance issue with ActiveMQ and excessive disk writes.

--
Harald

Ben Kochie

Mar 25, 2019, 6:11:03 PM
to Harald Koch, Prometheus Users
Nice, maybe you can write up a little blog post about it? Grab a couple of graph screenshots, describe what you found/fixed? Might be a nice story for https://prometheus.io/blog/



Chris Siebenmann

Mar 25, 2019, 9:51:20 PM
to Harald Koch, Prometheus Users, cks.prom...@cs.toronto.edu
There is a minor issue with avgqu-sz (which has been renamed to 'aqu-sz'
in recent enough versions of Linux iostat); while it's based on field 11,
node_disk_io_time_weighted_seconds_total, I believe that it has to be
computed as the rate of node_disk_io_time_weighted_seconds_total divided
by the rate of node_disk_io_time_seconds_total.

(Actually looking at the current iostat code for this makes my head
hurt. I believe that dividing weighted seconds total by time seconds
total is the correct calculation for the average queue size, but I'm
not convinced that it's what the current iostat is doing.)

svctm is deprecated because it is basically meaningless, as you can
tell from the calculation. It would be nice to actually know the device
service time, but Linux doesn't provide it; all you can get is the total
wait time, covering both the time a request spent in the queue and the
time it spent actually dispatched to the device.

- cks
(I have long standing opinions on Linux disk IO stats and iostat, and I
was quite pleased to discover that node_exporter exposes the raw kernel
stats intact apart from minor changes like mapping milliseconds into
seconds and 'sectors' into bytes, which makes the metrics far more natural
in Prometheus.)

Ben Kochie

Mar 26, 2019, 3:26:49 AM
to Chris Siebenmann, Harald Koch, Prometheus Users
On Tue, Mar 26, 2019 at 2:51 AM Chris Siebenmann <cks.prom...@cs.toronto.edu> wrote:
> On Thu, Mar 21, 2019, at 04:58, Brian Brazil wrote:
> > On Thu, 21 Mar 2019 at 01:29, Harald Koch <c...@pobox.com> wrote:
> >> On Wed, Mar 20, 2019, at 19:58, Brian Brazil wrote:
> >>> https://www.robustperception.io/mapping-iostat-to-the-node-exporters-node_disk_-metrics
> >>> may be of use.
> >>
> >> Is there a cut and paste error in that article? To my layman's
> >> eyes, it looks like the calculations for "avgrq-sz" and "await" are
> >> using the same metrics and formula.
> >
> > Yip, fixed. Thanks.
>
> Thanks! This helped me discover, and then eliminate, a performance
> issue with ActiveMQ and excessive disk writes.

> There is a minor issue with avgqu-sz (which has been renamed to 'aqu-sz'
> in recent enough versions of Linux iostat); while it's based on field 11,
> node_disk_io_time_weighted_seconds_total, I believe that it has to be
> computed as the rate of node_disk_io_time_weighted_seconds_total divided
> by the rate of node_disk_io_time_seconds_total.
>
> (Actually looking at the current iostat code for this makes my head
> hurt. I believe that dividing weighted seconds total by time seconds
> total is the correct calculation for the average queue size, but I'm
> not convinced that it's what the current iostat is doing.)

I looked at that code as well, with the same reaction. :-)

I don't think it was dividing by the node_disk_io_time_seconds_total, just the number of seconds in the requested step. This is basically the same as taking a rate() in PromQL.
 

> svctm is deprecated because it is basically meaningless, as you can
> tell from the calculation. It would be nice to actually know the device
> service time, but Linux doesn't provide it; all you can get is the total
> wait time, covering both the time a request spent in the queue and the
> time it spent actually dispatched to the device.
>
>         - cks
> (I have long standing opinions on Linux disk IO stats and iostat, and I
> was quite pleased to discover that node_exporter exposes the raw kernel
> stats intact apart from minor changes like mapping milliseconds into
> seconds and 'sectors' into bytes, which makes the metrics far more natural
> in Prometheus.)

Yup, we prefer raw counter stats converted to base units in Prometheus.

 


Brian Brazil

Mar 26, 2019, 5:23:27 AM
to Ben Kochie, Chris Siebenmann, Harald Koch, Prometheus Users
On Tue, 26 Mar 2019 at 07:26, Ben Kochie <sup...@gmail.com> wrote:


> On Tue, Mar 26, 2019 at 2:51 AM Chris Siebenmann <cks.prom...@cs.toronto.edu> wrote:
>> (Actually looking at the current iostat code for this makes my head
>> hurt. I believe that dividing weighted seconds total by time seconds
>> total is the correct calculation for the average queue size, but I'm
>> not convinced that it's what the current iostat is doing.)
>
> I looked at that code as well, with the same reaction. :-)

The trick is to look at an older version of the code; that makes it a small bit easier to understand :)

Brian
 


Chris Siebenmann

Mar 26, 2019, 11:16:39 AM
to Ben Kochie, Chris Siebenmann, Harald Koch, Prometheus Users
> > There is a minor issue with avgqu-sz (which has been renamed to
> > 'aqu-sz' in recent enough versions of Linux iostat); while it's
> > based on field 11, node_disk_io_time_weighted_seconds_total,
> > I believe that it has to be computed as the rate of
> > node_disk_io_time_weighted_seconds_total divided by the rate of
> > node_disk_io_time_seconds_total.
> >
> > (Actually looking at the current iostat code for this makes my head
> > hurt. I believe that dividing weighted seconds total by time seconds
> > total is the correct calculation for the average queue size, but I'm
> > not convinced that it's what the current iostat is doing.)
>
> I looked at that code as well, with the same reaction. :-)
>
> I don't think it was dividing by the node_disk_io_time_seconds_total,
> just the number of seconds in the requested step. This is basically
> the same as taking a rate() in PromQL.

The two versions are the same at 100% utilization (i.e., when
rate(node_disk_io_time_seconds_total) is 1), but give different
answers if the utilization is lower than 100%. Based on a simple
thought experiment, I believe that dividing by the time seconds
total instead of (implicitly) assuming 100% utilization has to
be correct.

The thought experiment: imagine that your queue size is always 1 and
you have IO to the device active for half a second. This makes the rate
of time_seconds_total 0.5 and, since the queue size is always one, also
makes the rate of time_weighted_seconds_total be 0.5.

(t_w_s_t will only be greater than t_s_t if the queue size is above one.
If the queue size is always one, they count at the same rate.)

This does mean that the current iostat code gives incorrect
numbers. It wouldn't be the first time.

- cks