I took the time to remeasure tiobench results on a recent kernel. The short
conclusion is that the performance regression I reported a few months ago is
still there. The machine is a 2-CPU Intel box with 2 GB of RAM and a plain SATA
drive. tiobench sequential write performance numbers with 16 threads:
2.6.29:                              AVG       STDERR
37.80 38.54 39.48             ->     38.606667 0.687475
So about 5% regression. The regression happened sometime between 2.6.29 and 2.6.30 and stays the same since then... With deadline scheduler, there's no regression. Shouldn't we do something about it?
Honza
--
Jan Kara <j...@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
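A note for readers checking the figures in the message above: the value in the STDERR column appears to match the population standard deviation of the three runs, not a standard error of the mean. The following is a minimal Python sketch that recomputes both from the three 2.6.29 samples quoted above. The function name `summarize` is an illustrative choice and is not part of tiobench.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, population std dev, standard error of the mean)."""
    mean = statistics.mean(samples)
    pop_sd = statistics.pstdev(samples)  # divides the squared deviations by n
    # Standard error of the mean: sample std dev (n - 1) divided by sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, pop_sd, sem


# The three 2.6.29 runs from the report above.
runs_2_6_29 = [37.80, 38.54, 39.48]
mean, pop_sd, sem = summarize(runs_2_6_29)
print(f"mean={mean:.6f} pop_sd={pop_sd:.6f} sem={sem:.6f}")
```

The first two printed values reproduce the 38.606667 and 0.687475 reported above; the standard error of the mean is smaller, which matters when judging whether a difference of a few percent is significant.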
> I took time and remeasured tiobench results on recent kernel. A short > conclusion is that there is still a performance regression which I reported > few months ago. The machine is Intel 2 CPU with 2 GB RAM and plain SATA > drive. tiobench sequential write performance numbers with 16 threads: > 2.6.29: AVG STDERR > 37.80 38.54 39.48 -> 38.606667 0.687475
> So about 5% regression. The regression happened sometime between 2.6.29 and > 2.6.30 and stays the same since then... With deadline scheduler, there's > no regression. Shouldn't we do something about it?
Sorry it took so long, but I've been flat out lately. I ran some numbers against 2.6.29 and 2.6.32-rc5, both with low_latency set to 0 and to 1. Here are the results (average of two runs):
Legend:
rlat - read latency
rrlat - random read latency
wlat - write latency
rwlat - random write latency
* - the two runs reported vastly different numbers: 67.53 and 172.46
So, as you can see, if we turn off the low_latency tunable, we get better numbers across the board with the exception of random writes. It's also interesting to note that the latencies reported by tiobench are more favorable with low_latency set to 0, which is counter-intuitive.
So, now it seems we don't have a regression in sequential read bandwidth, but we do have a regression in random read bandwidth (though the random write latencies look better). So, I'll look into that, as it is almost 10%, which is significant.
Hi Jeff, what hardware are you using for tests? I see aggregated random read bandwidth is larger than sequential read bandwidth, and write bandwidth greater than read. Is this a SAN with multiple independent spindles?
On Thu, Nov 5, 2009 at 9:10 PM, Jeff Moyer <jmo...@redhat.com> wrote: > Jan Kara <j...@suse.cz> writes:
>> Hi,
>> I took time and remeasured tiobench results on recent kernel. A short >> conclusion is that there is still a performance regression which I reported >> few months ago. The machine is Intel 2 CPU with 2 GB RAM and plain SATA >> drive. tiobench sequential write performance numbers with 16 threads: >> 2.6.29: AVG STDERR >> 37.80 38.54 39.48 -> 38.606667 0.687475
>> So about 5% regression. The regression happened sometime between 2.6.29 and >> 2.6.30 and stays the same since then... With deadline scheduler, there's >> no regression. Shouldn't we do something about it?
> Sorry it took so long, but I've been flat out lately. I ran some > numbers against 2.6.29 and 2.6.32-rc5, both with low_latency set to 0 > and to 1. Here are the results (average of two runs):
> Legend:
> rlat - read latency
> rrlat - random read latency
> wlat - write latency
> rwlat - random write latency
> * - the two runs reported vastly different numbers: 67.53 and 172.46
> So, as you can see, if we turn off the low_latency tunable, we get > better numbers across the board with the exception of random writes. > It's also interesting to note that the latencies reported by tiobench > are more favorable with low_latency set to 0, which is > counter-intuitive.
> So, now it seems we don't have a regression in sequential read > bandwidth, but we do have a regression in random read bandwidth (though > the random write latencies look better). So, I'll look into that, as it > is almost 10%, which is significant.
Sorry, I don't see a 10% regression in random read from your numbers. I see a larger one in sequential write for low_latency=1 (this was the regression Jan reported in the original message), but not for low_latency=0. And a 10% regression in random writes, that is not completely fixed even by disabling low_latency.
I guess your seemingly counter-intuitive results for low_latency are due to the uncommon hardware (low_latency was intended mainly for desktop-class disks). Luckily, the patches queued for 2.6.33 already address this low_latency misbehaviour.
Jeff Moyer <jmo...@redhat.com> writes: > Jan Kara <j...@suse.cz> writes:
>> Hi,
>> I took time and remeasured tiobench results on recent kernel. A short >> conclusion is that there is still a performance regression which I reported >> few months ago. The machine is Intel 2 CPU with 2 GB RAM and plain SATA >> drive. tiobench sequential write performance numbers with 16 threads: >> 2.6.29: AVG STDERR >> 37.80 38.54 39.48 -> 38.606667 0.687475
>> So about 5% regression. The regression happened sometime between 2.6.29 and >> 2.6.30 and stays the same since then... With deadline scheduler, there's >> no regression. Shouldn't we do something about it?
> Sorry it took so long, but I've been flat out lately. I ran some > numbers against 2.6.29 and 2.6.32-rc5, both with low_latency set to 0 > and to 1. Here are the results (average of two runs):
I modified the tiobench script to do a drop_caches between runs so I could stop fiddling around with the numbers myself. Extra credit goes to anyone who hacks it up to report standard deviation.
Anyway, here are the latest results, average of 3 runs each for 2.6.29 and 2.6.32-rc6 with low_latency set to 0. Note that there was a fix in CFQ that would result in properly preempting the active queue for metadata I/O.
Given those numbers, everything looks ok from a regression perspective. More investigation should be done for the random read numbers (given that they fluctuate quite a bit), but that's purely an enhancement at this point in time.
Just to be sure, I'll kick off 10 runs and make sure the averages fall out the same way. If you don't hear from me, though, assume this regression is fixed. The key is to set low_latency to 0 for this benchmark. We should probably add notes about when to switch off low_latency to the io scheduler documentation. Jens, would you mind doing that?
Jeff, Jens,
do you think we should try to do more auto-tuning of cfq parameters?
Looking at those numbers for SANs, I think we are being suboptimal in
some cases.
E.g. sequential read throughput is lower than random read.
In those cases, converting all sync queues into sync-noidle (as defined
in for-2.6.33) should allow a better aggregate throughput when there
are multiple sequential readers, as in those tiobench tests.
I also think that current slice_idle and slice_sync values are good
for devices with 8ms seek time, but they are too high for non-NCQ
flash devices, where "seek" penalty is under 1ms, and we still prefer
idling.
If we agree on this, should the measurement part (I'm thinking of
measuring things like seek time, throughput, etc...) be added to the
common elevator code, or done inside cfq?
If we want to put it in the common code, maybe we can also remove the
duplication of NCQ detection, by publishing the NCQ flag from elevator
to the io-schedulers.
Corrado Zoccolo <czocc...@gmail.com> writes: > Jeff, Jens, > do you think we should try to do more auto-tuning of cfq parameters? > Looking at those numbers for SANs, I think we are being suboptimal in > some cases. > E.g. sequential read throughput is lower than random read.
I investigated this further, and this was due to a problem in the benchmark. It was being run with only 500 samples for random I/O and 65536 samples for sequential. After fixing this, we see random I/O is slower than sequential, as expected.
> I also think that current slice_idle and slice_sync values are good > for devices with 8ms seek time, but they are too high for non-NCQ > flash devices, where "seek" penalty is under 1ms, and we still prefer > idling.
Do you have numbers to back that up? If not, throw a fio job file over the fence and I'll test it on one such device.
> If we agree on this, should the measurement part (I'm thinking to > measure things like seek time, throughput, etc...) be added to the > common elevator code, or done inside cfq?
Well, if it's something that is of interest to others, then pushing it up a layer makes sense. If only CFQ is going to use it, keep it there.
On Tue, Nov 10, 2009 at 5:47 PM, Jeff Moyer <jmo...@redhat.com> wrote: > Corrado Zoccolo <czocc...@gmail.com> writes:
>> Jeff, Jens, >> do you think we should try to do more auto-tuning of cfq parameters? >> Looking at those numbers for SANs, I think we are being suboptimal in >> some cases. >> E.g. sequential read throughput is lower than random read.
> I investigated this further, and this was due to a problem in the
> benchmark. It was being run with only 500 samples for random I/O and
> 65536 samples for sequential. After fixing this, we see random I/O is
> slower than sequential, as expected.

Ok.

>> I also think that current slice_idle and slice_sync values are good
>> for devices with 8ms seek time, but they are too high for non-NCQ
>> flash devices, where "seek" penalty is under 1ms, and we still prefer
>> idling.
> Do you have numbers to back that up? If not, throw a fio job file over > the fence and I'll test it on one such device.
It is based on reasoning. Currently idling is based on the assumption that we can wait up to 10ms, to get a better request than jumping far away, since the jump will likely cost more than that. If the jump costs around 1ms, like on flash cards, then waiting 10ms is surely wasted time. On the other hand, on flash cards a random write could cost 50ms or more, so we will need to differentiate the last idle before switching to async writes from the inter-read idles. This should be possible with the new workload based infrastructure, but we need to measure those characteristic times in order to use them in the heuristics.
>> If we agree on this, should the measurement part (I'm thinking to >> measure things like seek time, throughput, etc...) be added to the >> common elevator code, or done inside cfq?
> Well, if it's something that is of interest to others, then pushing it
> up a layer makes sense. If only CFQ is going to use it, keep it there.
If the direction is to have only one intelligent I/O scheduler, as the removal of anticipatory indicates, then it is the latter. I don't think noop or deadline will ever make any use of them. But it could still be useful for reporting performance as seen by the kernel, after the page cache.
> > Hi Jeff, > > what hardware are you using for tests? > > I see aggregated random read bandwidth is larger than sequential read > > bandwidth, and write bandwidth greater than read. > > Is this a SAN with multiple independent spindles?
> Yeah, this is a single path to an HP EVA storage array. There are 24 or > so disks striped in the pool used to create the volume I am using. Jan, > could you repeat your tests with /sys/block/sdX/queue/iosched/low_latency > set to 0?
I'll give it a spin tomorrow...
Honza
--
Jan Kara <j...@suse.cz>
SUSE Labs, CR
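For anyone repeating the test requested above, the tunable lives at the sysfs path given in the quoted message. The sketch below shows one way to build that path and write to it from Python. The helper names and the `sysfs_root` parameter are hypothetical conveniences (the latter only so the code can be exercised against a scratch directory); writing to the real file needs root and assumes CFQ is the active scheduler for the device.

```python
from pathlib import Path


def low_latency_path(device, sysfs_root="/sys"):
    """Build the path of the CFQ low_latency tunable for a block device."""
    return Path(sysfs_root) / "block" / device / "queue" / "iosched" / "low_latency"


def set_low_latency(device, enabled, sysfs_root="/sys"):
    """Write 1 or 0 to the tunable and return the path that was written."""
    path = low_latency_path(device, sysfs_root)
    path.write_text("1\n" if enabled else "0\n")
    return path


if __name__ == "__main__":
    # "sdX" is the placeholder used in the thread; substitute a real device.
    print(low_latency_path("sdX"))
```

Calling `set_low_latency("sda", False)` as root would correspond to the `low_latency` = 0 runs discussed in this thread, assuming the drive under test is `sda`.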
> > I took time and remeasured tiobench results on recent kernel. A short > > conclusion is that there is still a performance regression which I reported > > few months ago. The machine is Intel 2 CPU with 2 GB RAM and plain SATA > > drive. tiobench sequential write performance numbers with 16 threads: > > 2.6.29: AVG STDERR > > 37.80 38.54 39.48 -> 38.606667 0.687475
> > So about 5% regression. The regression happened sometime between 2.6.29 and > > 2.6.30 and stays the same since then... With deadline scheduler, there's > > no regression. Shouldn't we do something about it?
> Sorry it took so long, but I've been flat out lately. I ran some > numbers against 2.6.29 and 2.6.32-rc5, both with low_latency set to 0 > and to 1. Here are the results (average of two runs):
> Legend:
> rlat - read latency
> rrlat - random read latency
> wlat - write latency
> rwlat - random write latency
> * - the two runs reported vastly different numbers: 67.53 and 172.46
> So, as you can see, if we turn off the low_latency tunable, we get > better numbers across the board with the exception of random writes. > It's also interesting to note that the latencies reported by tiobench > are more favorable with low_latency set to 0, which is > counter-intuitive.
> So, now it seems we don't have a regression in sequential read > bandwidth, but we do have a regression in random read bandwidth (though > the random write latencies look better). So, I'll look into that, as it > is almost 10%, which is significant.
Sadly, I don't see the improvement you are seeing :(. The numbers are the
same regardless of whether low_latency is set to 0:
2.6.32-rc5 low_latency = 0:
37.39 36.43 36.51 -> 36.776667 0.434920
But my testing environment is a plain SATA drive, so that probably explains
the difference...
Honza
--
Jan Kara <j...@suse.cz>
SUSE Labs, CR
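To put the figure above next to the original report, the relative change between the two averages can be worked out directly. This is only a sketch using the two averages quoted in this thread (38.606667 for 2.6.29 and 36.776667 for 2.6.32-rc5 with `low_latency` = 0); the function name `relative_change` is illustrative.

```python
def relative_change(baseline, current):
    """Percentage change of current relative to baseline (negative = slower)."""
    return (current - baseline) / baseline * 100.0


avg_2_6_29 = 38.606667          # average from the original report
avg_2_6_32_rc5_ll0 = 36.776667  # average with low_latency = 0, reported above

print(f"{relative_change(avg_2_6_29, avg_2_6_32_rc5_ll0):.2f}%")
```

This prints roughly -4.74%, which is consistent with the "about 5% regression" stated at the start of the thread.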
Jan Kara <j...@suse.cz> writes: > Sadly, I don't see the improvement you can see :(. The numbers are the > same regardless low_latency set to 0: > 2.6.32-rc5 low_latency = 0: > 37.39 36.43 36.51 -> 36.776667 0.434920 > But my testing environment is a plain SATA drive so that probably > explains the difference...
I just retested (10 runs for each kernel) on a SATA disk with no NCQ support and I could not see a difference. I'll try to dig up a disk that supports NCQ. Is that what you're using for testing?
Cheers, Jeff
               2.6.29    2.6.32-rc6,low_latency=0
----------------------------------
Average:       34.6648   34.4475
Pop.Std.Dev.:  0.55523   0.21981
> > Sadly, I don't see the improvement you can see :(. The numbers are the > > same regardless low_latency set to 0: > > 2.6.32-rc5 low_latency = 0: > > 37.39 36.43 36.51 -> 36.776667 0.434920 > > But my testing environment is a plain SATA drive so that probably > > explains the difference...
> I just retested (10 runs for each kernel) on a SATA disk with no NCQ > support and I could not see a difference. I'll try to dig up a disk > that support NCQ. Is that what you're using for testing?
Hmm, strange. Miklos Szeredi tried tiobench on his machine and he also saw the regression. I'll try to think what could make the difference.
Honza
--
Jan Kara <j...@suse.cz>
SUSE Labs, CR
Jan Kara <j...@suse.cz> writes: > On Wed 11-11-09 12:43:30, Jeff Moyer wrote: >> Jan Kara <j...@suse.cz> writes:
>> > Sadly, I don't see the improvement you can see :(. The numbers are the >> > same regardless low_latency set to 0: >> > 2.6.32-rc5 low_latency = 0: >> > 37.39 36.43 36.51 -> 36.776667 0.434920 >> > But my testing environment is a plain SATA drive so that probably >> > explains the difference...
>> I just retested (10 runs for each kernel) on a SATA disk with no NCQ
>> support and I could not see a difference. I'll try to dig up a disk
>> that supports NCQ. Is that what you're using for testing?
> I don't think I am. How do I find out?
Good question. ;-) I grep for NCQ in dmesg output and make sure it's greater than 0/32. There may be a better way, though.
>>                2.6.29    2.6.32-rc6,low_latency=0
>> ----------------------------------
>> Average:       34.6648   34.4475
>> Pop.Std.Dev.:  0.55523   0.21981
> Hmm, strange. Miklos Szeredi tried tiobench on his machine and he also
> saw the regression. I'll try to think what could make the difference.
On Thu, Nov 12 2009, Jeff Moyer wrote: > Jan Kara <j...@suse.cz> writes:
> > On Wed 11-11-09 12:43:30, Jeff Moyer wrote: > >> Jan Kara <j...@suse.cz> writes:
> >> > Sadly, I don't see the improvement you can see :(. The numbers are the > >> > same regardless low_latency set to 0: > >> > 2.6.32-rc5 low_latency = 0: > >> > 37.39 36.43 36.51 -> 36.776667 0.434920 > >> > But my testing environment is a plain SATA drive so that probably > >> > explains the difference...
> >> I just retested (10 runs for each kernel) on a SATA disk with no NCQ > >> support and I could not see a difference. I'll try to dig up a disk > >> that support NCQ. Is that what you're using for testing? > > I don't think I am. How do I find out?
> Good question. ;-) I grep for NCQ in dmesg output and make sure it's > greater than 0/32. There may be a better way, though.
On Thu, Nov 12 2009, Jeff Moyer wrote: > Jens Axboe <jens.ax...@oracle.com> writes:
> > On Thu, Nov 12 2009, Jeff Moyer wrote: > >> Good question. ;-) I grep for NCQ in dmesg output and make sure it's > >> greater than 0/32. There may be a better way, though.
> > cat /sys/block/<dev>/device/queue_depth
> > :-)
> OK, your comment about only working for SCSI disks threw me off. > Perhaps you meant only works for devices that use the sd driver?
Yeah, only works for storage that plugs into the SCSI stack.
> > On Wed 11-11-09 12:43:30, Jeff Moyer wrote: > >> Jan Kara <j...@suse.cz> writes:
> >> > Sadly, I don't see the improvement you can see :(. The numbers are the > >> > same regardless low_latency set to 0: > >> > 2.6.32-rc5 low_latency = 0: > >> > 37.39 36.43 36.51 -> 36.776667 0.434920 > >> > But my testing environment is a plain SATA drive so that probably > >> > explains the difference...
> >> I just retested (10 runs for each kernel) on a SATA disk with no NCQ > >> support and I could not see a difference. I'll try to dig up a disk > >> that support NCQ. Is that what you're using for testing? > > I don't think I am. How do I find out?
> Good question. ;-) I grep for NCQ in dmesg output and make sure it's > greater than 0/32. There may be a better way, though.
Message in the logs:
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: Hitachi HTS722016K9SA00, DCDOC54P, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
So apparently no NCQ. /sys/block/sda/device/queue_depth shows 1, but I
guess that's just its way of saying "no NCQ".
What I thought might explain why I'm seeing the drop and you are not is the size of RAM or number of CPUs relative to the tiobench file size or number of threads. I'm running on a machine with 2 GB of RAM, using a 4 GB file size. The machine has 2 cores and I'm using 16 tiobench threads. I'm now rerunning tests with various numbers of threads to see how big a difference it makes.
Honza
--
Jan Kara <j...@suse.cz>
SUSE Labs, CR
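The dmesg check described above can be automated. The sketch below pulls the two numbers out of a libata line of the form shown in the message; it follows the thread's reading that a first value of 0 (as in "depth 0/32") means NCQ is not in use. The regular expression and the function name `parse_ncq_depth` are assumptions made for illustration, based only on the single log line quoted here.

```python
import re

# Matches the "NCQ (depth A/B)" fragment seen in the log line quoted above.
NCQ_DEPTH = re.compile(r"NCQ \(depth (\d+)/(\d+)\)")


def parse_ncq_depth(dmesg_line):
    """Return (first, second) depth values as ints, or None if not present."""
    match = NCQ_DEPTH.search(dmesg_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = "ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 0/32)"
print(parse_ncq_depth(line))
```

For the quoted line this yields `(0, 32)`. For devices behind the SCSI stack, reading `/sys/block/<dev>/device/queue_depth` as suggested earlier in the thread is the simpler check.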
> What I thought might make a difference why I'm seeing the drop and you > are not is size of RAM or number of CPUs vs the tiobench file size or > number of threads. I'm running on a machine with 2 GB of RAM, using 4 GB > filesize. The machine has 2 cores and I'm using 16 tiobench threads. I'm > now rerunning tests with various numbers of threads to see how big > difference it makes.
OK, here are the numbers (3 runs of each test):
2.6.29:
Threads  Avg        Stddev
1        42.043333  0.860439
2        40.836667  0.322938
4        41.810000  0.114310
8        40.190000  0.419603
16       39.950000  0.403072
32       39.373333  0.766913
So apparently the difference between 2.6.29 and 2.6.32-rc7 increases as the number of threads rises. With how many threads have you been running when using SATA drive and what machine is it? I'm now running a test with larger file size (8GB instead of 4) to see what difference it makes.
Honza
--
Jan Kara <j...@suse.cz>
SUSE Labs, CR
Jan Kara <j...@suse.cz> writes:

> OK, here are the numbers (3 runs of each test):
> 2.6.29:
> Threads  Avg        Stddev
> 1        42.043333  0.860439
> 2        40.836667  0.322938
> 4        41.810000  0.114310
> 8        40.190000  0.419603
> 16       39.950000  0.403072
> 32       39.373333  0.766913
> So apparently the difference between 2.6.29 and 2.6.32-rc7 increases as > the number of threads rises. With how many threads have you been running > when using SATA drive and what machine is it? > I'm now running a test with larger file size (8GB instead of 4) to see > what difference it makes.
I've been running with both 8 and 16 threads. The machine has 4 CPUs and 4GB of RAM. I've been testing with an 8GB file size.
>> So apparently the difference between 2.6.29 and 2.6.32-rc7 increases as >> the number of threads rises. With how many threads have you been running >> when using SATA drive and what machine is it? >> I'm now running a test with larger file size (8GB instead of 4) to see >> what difference it makes.
> I've been running with both 8 and 16 threads. The machine has 4 CPUs > and 4GB of RAM. I've been testing with an 8GB file size.
Other details may be relevant, e.g. the file system on which the file is located, whether the caches are dropped before starting each run, and so on.
> > So apparently the difference between 2.6.29 and 2.6.32-rc7 increases as > > the number of threads rises. With how many threads have you been running > > when using SATA drive and what machine is it? > > I'm now running a test with larger file size (8GB instead of 4) to see > > what difference it makes.
> I've been running with both 8 and 16 threads. The machine has 4 CPUs > and 4GB of RAM. I've been testing with an 8GB file size.
OK, I see a similar regression also with 8GB file size:
2.6.29:
1        41.556667  0.787415
2        40.866667  0.714112
4        40.726667  0.228376
8        38.596667  0.344706
16       39.076667  0.180801
32       37.743333  0.147271
BTW: I'm running the test always on a fresh ext3 in data=ordered mode with barrier=1.
Honza
--
Jan Kara <j...@suse.cz>
SUSE Labs, CR