All of my OSTs look pretty much the same:
                           read         |       write
pages per bulk r/w      rpcs   %  cum % |     rpcs   %  cum %
1:                     88811  38     38 |    46375  17     17
2:                      1497   0     38 |     7733   2     20
4:                      1161   0     39 |     1840   0     21
8:                      1168   0     39 |     7148   2     24
16:                      922   0     40 |     3297   1     25
32:                      979   0     40 |     7602   2     28
64:                     1576   0     41 |     9046   3     31
128:                    7063   3     44 |    16284   6     37
256:                  129282  55    100 |   162090  62    100
                             read         |       write
disk fragmented I/Os       ios   %  cum % |      ios   %  cum %
0:                       51181  22     22 |        0   0      0
1:                       45280  19     42 |    82206  31     31
2:                       16615   7     49 |    29108  11     42
3:                        3425   1     50 |    17392   6     49
4:                      110445  48     98 |   129481  49     98
5:                        1661   0     99 |     2702   1     99
                           read         |       write
disk I/O size            ios   %  cum % |      ios   %  cum %
4K:                    45889   8      8 |    56240   7      7
8K:                     3658   0      8 |     6416   0      8
16K:                    7956   1     10 |     4703   0      9
32K:                    4527   0     11 |    11951   1     10
64K:                  114369  20     31 |   134128  18     29
128K:                   5095   0     32 |    17229   2     31
256K:                   7164   1     33 |    30826   4     35
512K:                 369512  66    100 |   465719  64    100
Oddly, there's no 1024K row in the I/O size table.
The queue limits below also seem small to me, but I can't seem to change them.
Writing new values to either file doesn't change anything.
# cat /sys/block/sdb/queue/max_hw_sectors_kb
320
# cat /sys/block/sdb/queue/max_sectors_kb
320
Hardware in question is DELL PERC 6/E and DELL PERC H800 RAID
controllers, with MD1000 and MD1200 arrays, respectively.
Any clues on where I should look next?
Thanks,
Kevin
Kevin Hildebrand
University of Maryland, College Park
Office of Information Technology
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
First, max_sectors_kb should normally be set to a power of 2, such as
256, rather than an odd size like 320. This number should also match the
native RAID size of the device, to avoid read-modify-write cycles. (See
Bug 22886 on why not to make it > 1024 in general.)
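
As a rough illustration of the power-of-2 advice, here is a minimal Python
sketch that picks the largest power of 2 that fits under a reported limit.
The function name is made up for this example, and it only covers the
power-of-2 part; matching the native RAID size still has to be checked
against the array configuration by hand.

```python
def largest_power_of_two_at_most(limit_kb: int) -> int:
    """Return the largest power of 2 that does not exceed limit_kb.

    Hypothetical helper for illustration; limit_kb would be the value
    read from max_hw_sectors_kb.
    """
    if limit_kb < 1:
        raise ValueError("limit_kb must be a positive integer")
    # bit_length() gives the position of the highest set bit, so shifting
    # 1 by (bit_length - 1) keeps only that bit.
    return 1 << (limit_kb.bit_length() - 1)


if __name__ == "__main__":
    print(largest_power_of_two_at_most(320))  # prints 256
```

With the 320 reported in this thread, that gives 256, which is the example
value mentioned above.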
See Bug 17086 for patches to increase the max_sectors_kb limitation for
the mptsas driver to 1MB, or the true hardware maximum, rather than a
driver limit; however, the hardware may still be limited to sizes < 1MB.
Also, to clarify the sizes: the smallest bucket >= transfer_size is the
one incremented, so a 320KB IO increments the 512KB bucket. Since your
HW says it can only do a 320KB IO, there will never be a 1MB IO.
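
To make the bucketing rule concrete, here is a small Python sketch of
"smallest bucket >= transfer_size". It assumes the buckets are powers of 2
starting at 4K, as in the disk I/O size table earlier in the thread; the
function name and the default are illustrative and not taken from the
Lustre source.

```python
def io_size_bucket_kb(transfer_kb: int, smallest_bucket_kb: int = 4) -> int:
    """Return the histogram bucket (in KB) that a transfer would increment.

    Assumption: buckets double from smallest_bucket_kb (4K, 8K, 16K, ...),
    and a transfer lands in the smallest bucket that is >= its size.
    """
    if transfer_kb < 1:
        raise ValueError("transfer_kb must be a positive integer")
    bucket = smallest_bucket_kb
    while bucket < transfer_kb:
        bucket *= 2
    return bucket


if __name__ == "__main__":
    print(io_size_bucket_kb(320))  # prints 512
```

Under that assumption, every I/O between 257KB and 512KB is counted in the
512K row, so a device capped at 320KB per request fills that row and never
produces a 1024K row.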
You may want to instrument your HBA driver to see what is going on (i.e.,
why the max_hw_sectors_kb is < 1024).
Kevin
The max_sectors_kb values (320) are what is being set by default. I am able
to set them to something smaller than 320, but not larger.
Kevin
Right. You cannot set max_sectors_kb larger than max_hw_sectors_kb
(Linux normally defaults most drivers to 512, but Lustre sets them to be
the same). You may want to instrument your HBA driver to see what is
going on (i.e., why the max_hw_sectors_kb is < 1024). I don't know if it
is due to a driver limitation or a true hardware limit.
Most drivers have a limit of 512KB by default; see Bug 22850 for the
patches that fixed the QLogic and Emulex fibre channel drivers.
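
For checking this across several devices, here is a minimal Python sketch
that reads the two sysfs files shown earlier in the thread and reports
whether a requested max_sectors_kb would fit under the hardware limit. The
function names and the sysfs_root parameter are made up for this example
(the parameter only exists so the code can be tried against a scratch
directory), and the code only reads the files; it does not write to sysfs.

```python
from pathlib import Path


def read_queue_limit(device: str, name: str, sysfs_root: str = "/sys/block") -> int:
    """Read an integer queue attribute such as max_hw_sectors_kb for a device."""
    path = Path(sysfs_root) / device / "queue" / name
    return int(path.read_text().strip())


def can_set_max_sectors_kb(requested_kb: int, device: str,
                           sysfs_root: str = "/sys/block") -> bool:
    """Return True if requested_kb does not exceed the device's max_hw_sectors_kb.

    Only models the rule described above (max_sectors_kb cannot be larger
    than max_hw_sectors_kb); it says nothing about why the limit is what it is.
    """
    hw_max_kb = read_queue_limit(device, "max_hw_sectors_kb", sysfs_root)
    return 0 < requested_kb <= hw_max_kb
```

On the system described in this thread, can_set_max_sectors_kb(256, "sdb")
would be expected to return True and can_set_max_sectors_kb(1024, "sdb")
False, since max_hw_sectors_kb reads 320 there.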
Kevin
Why would the disk(s) be pegged while llobdstat shows zero activity?
After a few minutes in this state, the %util drops back down to single
digit percentages and normal I/O resumes on the clients.
Thanks,
Kevin
>
> One of the oddities I'm seeing, which has me grasping at write
> fragmentation and I/O sizes, may not be directly related to these things at
> all. Periodically, iostat will show that one or more of my OST disks is
> running at 99% utilization. Reads per second are somewhere in the
> 150-200 range, while read kB/second is quite small.
That sounds familiar. You're probably experiencing these:
https://bugzilla.lustre.org/show_bug.cgi?id=24183
http://jira.whamcloud.com/browse/LU-15
Jason
--
Jason Rappleye
System Administrator
NASA Advanced Supercomputing Division
NASA Ames Research Center
Moffett Field, CA 94035