
Re: Increased read IO wait times after Bullseye upgrade


Gareth Evans

Nov 8, 2022, 9:30:05 PM
On Tue 8 Nov 2022, at 09:48, Vukovics Mihály <v...@informatik.hu> wrote:
> Hello Community,
>
> since I upgraded my Debian 10 to 11, the read IO wait times of all
> disks have increased dramatically.
[...]
> Chart attached, you can clearly see the date of the upgrade.
>
> Any ideas?

Hello,

I'm not an expert in the area, but a few thoughts:

---
(Seemingly unlikely issues)
Hardware problems coincident with upgrade - smartmon tests?
Fragmentation? What % capacity is used? What filesystem?
---

RAID status?

Have you rebooted?

After a quick web search, this article suggests iowait may not be the best indicator of problems (though not sure if this extends to wait times for particular operations)...

https://ioflood.com/blog/2021/01/25/iowait-in-linux-is-iowait-too-high/

...does iostat's %util give cause for concern? (per the article)

HDDs or SSDs? The article suggests ~100ms may be reasonable for HDDs. Your chart only seems to show one outstandingly high wait time - do you actually notice a difference in performance?

Not sure if any of that helps but I will follow the thread with interest.

Nice chart btw. What produced the data and the chart itself?

Best wishes,
Gareth

Gareth Evans

Nov 10, 2022, 6:40:06 AM
On Thu 10 Nov 2022, at 07:04, Vukovics Mihaly <v...@informatik.hu> wrote:
> Hi Gareth,
>
> - Smartmon/smartctl does not report any hw issues on the HDDs.
> - Fragmentation score is 1 (not fragmented at all)
> - 18% used only
> - RAID status is green (force-resynced)
> - rebooted several times
> - the IO utilization is almost zero(!) - chart attached
> - tried to change the io scheduler of the disks from mq-deadline to noop:
> does not bring change.
> - the read latency increased on ALL 4 discs by the bullseye upgrade!
> - no errors/warnings in kern.log/syslog/messages
>
> Br,
> Mihaly

Hi Mihaly,

People here often recommend SATA cable/connection checks - might this be worthwhile?

"[using]

iostat -mxy 10

[...]

If %util is consistently under 30% most of the time, most likely you don’t have a problem with disk i/o. If you’re on the fence, you can also look at r_await and w_await columns — the average amount of time in milliseconds a read or write disk request is waiting before being serviced — to see if the drive is able to handle requests in a timely manner. A value less than 10ms for SSD or 100ms for hard drives is usually not cause for concern,"

(-- from the article linked in my first response)

Are the {r,w}_await values what is shown in your first chart, inter alia? I imagine so.
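If it helps, r_await can also be sanity-checked directly from /proc/diskstats - a rough sketch, with made-up sample snapshots embedded so it runs anywhere (with the major/minor/name columns included, $4 is reads completed and $7 is milliseconds spent reading, per the kernel's iostats documentation; on the live system you'd read /proc/diskstats twice, a few seconds apart):

```shell
#!/bin/sh
# Rough r_await estimate from two /proc/diskstats snapshots.
# Sample values below are invented for illustration.
snap1="8 0 sda 1000 0 80000 5000 2000 0 160000 9000 0 6000 14000"
snap2="8 0 sda 1100 0 88000 9000 2050 0 164000 9400 0 6300 14600"

out=$(printf '%s\n%s\n' "$snap1" "$snap2" | awk '
NR == 1 { r0 = $4; t0 = $7 }            # first snapshot: reads, ms reading
NR == 2 { dr = $4 - r0; dt = $7 - t0    # deltas over the interval
          if (dr > 0) printf "r_await ~= %.1f ms", dt / dr }')
echo "$out"
```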

Does performance actually seem to be suffering?

Unfortunately I can't try to replicate this as I don't have an HDD system, let alone MDRAID5, hence the rather basic questions, which you may well already have considered.

Write wait time has also increased somewhat, according to your first graph.

Is anything hogging CPU?

Free memory/swap usage seems unlikely to be the issue after several reboots.

I might be barking up the wrong tree here but do you know which kernel version you were using on Buster?

There were minor version changes to mdadm in Bullseye
https://packages.debian.org/buster/mdadm
https://packages.debian.org/bullseye/mdadm

which made me wonder if in-kernel parts of MD changed too.

It might be interesting to diff the relevant kernel sources between Buster and Bullseye, perhaps starting with drivers/md/raid5.c extracted from
/usr/src/linux-source-{KERNEL-VERSION}.tar.xz

https://packages.debian.org/search?keywords=linux-source&searchon=names&suite=oldstable&section=all

https://packages.debian.org/bullseye/linux-source-5.10

This assumes the identification of the driver in [3] (below) is anything to go by.

$ apt-file search md/raid456

[on Bullseye] seems to agree.

[though are sources up to date?...
$ uname -a
... 5.10.149-2 ...

vs

"Package: ... (5.10.140-1)"
https://packages.debian.org/bullseye/linux-source-5.10]

I'm somewhat out of my range of experience here and this would be very much "an exercise" for me.

I'm sorry I can't try to replicate the issue. Can you, on another system, if you have time?

Best wishes,
G

[1] "Initial driver: /lib/modules/4.18.0-305.el8.x86_64/kernel/drivers/md/raid456.ko.xz
I think this has been changed?"
https://grantcurell.github.io/Notes%20on%20mdraid%20Performance%20Testing/

Gareth Evans

Nov 10, 2022, 8:40:05 AM
On Thu 10 Nov 2022, at 11:36, Gareth Evans <dono...@fastmail.fm> wrote:
[...]
> This assumes the identification of the driver in [3] (below) is
> anything to go by.

I meant [1] not [3].

Also potentially of interest:

"Queue depth

The queue depth is a number between 1 and ~128 that shows how many I/O requests are queued (in-flight) on average. Having a queue is beneficial as the requests in the queue can be submitted to the storage subsystem in an optimised manner and often in parallel. A queue improves performance at the cost of latency.

If you have some kind of storage performance monitoring solution in place, a high queue depth could be an indication that the storage subsystem cannot handle the workload. You may also observe higher than normal latency figures. As long as latency figures are still within tolerable limits, there may be no problem."

https://louwrentius.com/understanding-storage-performance-iops-and-latency.html

See

$ cat /sys/block/sdX/device/queue_depth
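To check all disks at once, something like this - sketched against a mock sysfs tree (with invented values) so it runs anywhere; on the real system you'd point SYS at /sys/block instead:

```shell
#!/bin/sh
# Report queue_depth for every disk under $SYS.
SYS=$(mktemp -d)                       # mock stand-in for /sys/block
for d in sda sdb; do
    mkdir -p "$SYS/$d/device"
    echo 31 > "$SYS/$d/device/queue_depth"   # made-up value
done

out=""
for qd in "$SYS"/*/device/queue_depth; do
    dev=${qd#"$SYS"/}; dev=${dev%%/*}  # device name from the path
    out="$out$dev=$(cat "$qd") "
done
echo "$out"
```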

Gareth Evans

Nov 10, 2022, 10:20:05 AM
On Thu 10 Nov 2022, at 11:36, Gareth Evans <dono...@fastmail.fm> wrote:
[...]
> I might be barking up the wrong tree ...

But simpler inquiries first.

I was wondering if MD might be too high-level to cause what does seem more like a "scheduley" issue -

https://www.thomas-krenn.com/de/wikiDE/images/d/d0/Linux-storage-stack-diagram_v4.10.pdf

but, apparently, even relatively "normal" processes can cause high iowait too:

NFS...
https://www.howtouselinux.com/post/troubleshoot-high-iowait-issue-on-linux-system
https://www.howtouselinux.com/post/use-linux-nfsiostat-to-troubleshoot-nfs-performance-issue

SSH...
Use iotop to find io-hogs...
https://www.howtouselinux.com/post/quick-guide-to-fix-linux-iowait-issue
https://www.howtouselinux.com/post/check-disk-io-usage-per-process-with-iotop-on-linux

What's a typical wa value (%) from top?
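(For reference, top's wa figure is essentially the iowait share of CPU time between two readings of the "cpu" line in /proc/stat - iowait is the 5th value after "cpu". A rough sketch with made-up sample lines; live use would read /proc/stat twice with a sleep in between:)

```shell
#!/bin/sh
# iowait percentage between two /proc/stat "cpu" snapshots.
# Sample lines are invented for illustration.
s1="cpu 1000 0 500 8000 500 0 0 0"
s2="cpu 1100 0 550 8800 550 0 0 0"

out=$(printf '%s\n%s\n' "$s1" "$s2" | awk '
{ tot = 0; for (i = 2; i <= NF; i++) tot += $i }   # total jiffies this line
NR == 1 { t0 = tot; w0 = $6 }                      # baseline: total, iowait
NR == 2 { printf "wa = %.1f%%", 100 * ($6 - w0) / (tot - t0) }')
echo "$out"
```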

Thanks,
G

Vukovics Mihály

Nov 11, 2022, 2:50:06 AM
Hello Gareth,

the average io wait state is 3% over the last 1d14h. I have checked the IO
usage with several tools and have not found any processes/threads
generating too many read/write requests. As you can see in my first
graph, only the read wait time increased significantly, not the write.

Br,
Mihaly
--
Köszönettel:
Vukovics Mihály

Vukovics Mihály

Nov 11, 2022, 3:00:50 AM
Hi Gareth,

I have already tried to change the queue depth for the physical disks
but that has almost no effect.
There is almost no load on the filesystem; here is a 10s sample from atop.
1-2 write requests but 30-50ms of average io.

DSK | sdc | busy 27% | read 0 | write 2 | KiB/r 0 | KiB/w 0 | MBr/s 0.0 | MBw/s 0.0 | avq 1.83 | avio 38.0 ms |
DSK | sdb | busy 18% | read 0 | write 1 | KiB/r 0 | KiB/w 1 | MBr/s 0.0 | MBw/s 0.0 | avq 1.63 | avio 52.0 ms |
DSK | sde | busy 18% | read 0 | write 1 | KiB/r 0 | KiB/w 1 | MBr/s 0.0 | MBw/s 0.0 | avq 1.63 | avio 52.0 ms |
DSK | sda | busy 17% | read 0 | write 1 | KiB/r 0 | KiB/w 1 | MBr/s 0.0 | MBw/s 0.0 | avq 1.60 | avio 48.0 ms |

Nicholas Geovanis

Nov 11, 2022, 11:40:06 AM
On Fri, Nov 11, 2022, 1:58 AM Vukovics Mihály <v...@informatik.hu> wrote:
> Hi Gareth,
>
> I have already tried to change the queue depth for the physical disks
> but that has almost no effect.
> There is almost no load on the filesystem; here is a 10s sample from atop.
> 1-2 write requests but 30-50ms of average io.
>
> DSK | sdc | busy 27% | read 0 | write 2 | KiB/r 0 | KiB/w 0 | MBr/s 0.0 | MBw/s 0.0 | avq 1.83 | avio 38.0 ms |
> DSK | sdb | busy 18% | read 0 | write 1 | KiB/r 0 | KiB/w 1 | MBr/s 0.0 | MBw/s 0.0 | avq 1.63 | avio 52.0 ms |
> DSK | sde | busy 18% | read 0 | write 1 | KiB/r 0 | KiB/w 1 | MBr/s 0.0 | MBw/s 0.0 | avq 1.63 | avio 52.0 ms |
> DSK | sda | busy 17% | read 0 | write 1 | KiB/r 0 | KiB/w 1 | MBr/s 0.0 | MBw/s 0.0 | avq 1.60 | avio 48.0 ms |

Those numbers for percentage busy seem very high to me for such a low rate of IO initiation. Either the blocks being moved are very large (not necessarily wrong, maybe just poorly configured for the load) or there are other things going on with the drives.

Are the physical drives shared with any other systems? Are multiple VMs of whatever type running on the same hardware host?

Another possibility: the drives and/or filesystems are thrashing as they respond to hardware and/or filesystem problems. Anything interesting in dmesg or the logs?

Vukovics Mihály

Nov 11, 2022, 12:00:05 PM

Hi Gareth,

dmesg is "clean", the disks are not shared in any way and there is no virtualization layer installed.

Gareth Evans

Nov 11, 2022, 8:30:05 PM


On 11 Nov 2022, at 16:59, Vukovics Mihály <v...@informatik.hu> wrote:
> Hi Gareth,
>
> dmesg is "clean", the disks are not shared in any way and there is no virtualization layer installed.

Hello, but the message was from Nicholas :)

Looking at your first graph, I noticed the upgrade seems to introduce a broad correlation between read and write iowait.  Write wait before the uptick is fairly consistent and apparently unrelated to read wait.

Does that suggest anything to anyone?

In your atop stats, one drive (sdc) is ~50% more "busy" than the others, has double the number of writes, a higher avq value and lower avio time.  Is it normal for raid devices to vary this much?

Is this degree of difference consistent over time?  Might atop stats during some tests (e.g. fio) be useful?
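If useful, the busy/avio spread is easier to eyeball when pulled out of the atop output - a rough awk sketch, with two of the quoted DSK lines embedded (trimmed to the relevant fields):

```shell
#!/bin/sh
# Extract device, busy%% and avio from atop DSK lines.
out=$(awk -F'|' '
/^DSK/ {
    dev = $2; gsub(/ /, "", dev)          # device name field
    busy = ""; avio = ""
    for (i = 3; i <= NF; i++) {
        if ($i ~ /busy/) { busy = $i; gsub(/[^0-9]/,  "", busy) }
        if ($i ~ /avio/) { avio = $i; gsub(/[^0-9.]/, "", avio) }
    }
    printf "%s busy=%s%% avio=%sms\n", dev, busy, avio
}' <<'EOF'
DSK | sdc | busy 27% | avq 1.83 | avio 38.0 ms |
DSK | sdb | busy 18% | avq 1.63 | avio 52.0 ms |
EOF
)
echo "$out"
```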

Have you done any filesystem checks? 

Thanks,
Gareth




Vuki

Nov 14, 2022, 10:40:05 AM
Hi Nicholas,

over the longer term the load on the RAID members is equal:
DSK | sde | busy 7% | read 102017 | write 217744 | KiB/r 9 | KiB/w 6 | MBr/s 0.0 | MBw/s 0.0 | avq 2.60 | avio 5.91 ms |
DSK | sdb | busy 7% | read 98877 | write 210703 | KiB/r 9 | KiB/w 5 | MBr/s 0.0 | MBw/s 0.0 | avq 2.76 | avio 6.05 ms |
DSK | sdc | busy 7% | read 89963 | write 219237 | KiB/r 10 | KiB/w 5 | MBr/s 0.0 | MBw/s 0.0 | avq 2.71 | avio 6.03 ms |
DSK | sda | busy 7% | read 100267 | write 211566 | KiB/r 9 | KiB/w 5 | MBr/s 0.0 | MBw/s 0.0 | avq 2.55 | avio 5.93 ms |

The filesystems are clear.

Br,
Mihaly

Nicholas Geovanis

Nov 14, 2022, 9:30:05 PM
On Fri, Nov 11, 2022, 7:27 PM Gareth Evans <dono...@fastmail.fm> wrote:


> On 11 Nov 2022, at 16:59, Vukovics Mihály <v...@informatik.hu> wrote:
>> Hi Gareth,
>>
>> dmesg is "clean", the disks are not shared in any way and there is no virtualization layer installed.
>
> Hello, but the message was from Nicholas :)
>
> Looking at your first graph, I noticed the upgrade seems to introduce a broad correlation between read and write iowait.  Write wait before the uptick is fairly consistent and apparently unrelated to read wait.
>
> Does that suggest anything to anyone?


What I see in the first graph that's odd is that only read latency really increased. The other wait times remained pretty stable, just a small uptick with greater variance. That graph is apparently only the sda drive.

What could bring about a jump in only read latency, yet not in write latency or device wait time? Seems to me some buffer, filesystem parameter or device queue must have changed size at the upgrade.
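A few queue-level tunables along those lines that might be worth comparing between the old and new kernels. Sketched against a mock sysfs tree with invented values so it runs anywhere; the real paths are under /sys/block/sda/queue:

```shell
#!/bin/sh
# Dump a few queue tunables worth diffing across an upgrade.
SYS=$(mktemp -d)                 # mock stand-in for /sys/block/sda/queue
echo 128 > "$SYS/read_ahead_kb"  # made-up values for illustration
echo mq-deadline > "$SYS/scheduler"
echo 256 > "$SYS/nr_requests"

out=""
for f in read_ahead_kb scheduler nr_requests; do
    out="$out$f=$(cat "$SYS/$f") "
done
echo "$out"
```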