
IO performance problems in 2.4.19-pre5 when writing to DVD-RAM/ZIP/MO


Andrea Arcangeli

Apr 16, 2002, 11:00:12 AM
On Fri, Apr 05, 2002 at 11:04:18PM +0200, Moritz Franosch wrote:
>
>
> Hello Andrea,
>
>
> When releasing 2.4.19-pre5, Marcelo wrote
>
> This release has -aa writeout scheduling changes, which should
> improve IO performance (and interactivity under heavy write loads).
>
> _Please_ test that extensively looking for any kind of problems
> (performance, interactivity, etc).
>
> I did test it because I noticed serious IO performance problems with
> earlier kernels.
> http://groups.google.com/groups?selm=linux.kernel.rxxn103tdbw.fsf%40synapse.t30.physik.tu-muenchen.de&output=gplain
> http://groups.google.com/groups?selm=linux.kernel.rxxsn83rd4c.fsf%40synapse.t30.physik.tu-muenchen.de&output=gplain
>
> The problem is that writing to a DVD-RAM, ZIP or MO device almost
> totally blocks reading from a _different_ device. Here is some data.
>
> nr  bench  read       write      2.4.18  2.4.19-rc5  expected  factor
>  1  dd     30GB HDD   DVD-RAM       278         490        60     8.2
>  2  dd     120GB HDD  DVD-RAM       197         438        32      14
>  3  dd     30GB HDD   ZIP           158         239        60     4.0
>  4  dd     120GB HDD  ZIP           142         249        32     7.8
>  5  dd     30GB HDD   120GB HDD      87          89        60     1.5
>  6  dd     120GB HDD  30GB HDD       66          69        32     2.2
>  7  cp     30GB HDD   120GB HDD      97          77        60     1.3
>  8  cp     120GB HDD  30GB HDD       78          65        50     1.3
>
> The columns 2.4.18 and 2.4.19-rc5 list execution times in seconds of
> the respective benchmark. The column "expected" lists the time I would
> have expected for the respective benchmark to complete with a
> "perfect" kernel. The "factor" is the factor 2.4.19-rc5 is slower than
> a perfect kernel would be.
>
> The "dd" benchmark reads by 'dd' a file of size 1GB from the device
> listed under "read" and writes to /dev/null while _another_ 'dd'
> process writes to the device listed under "write" and reads from
> /dev/zero. Please see the source below. The "cp" benchmark simply
> copies a file of size 1GB from "read" to "write".
>
> I have four IDE devices installed in that system:
> hda: 30 GB IDE HDD 5400 rpm
> hdb: 9.4 GB ATAPI DVD-RAM
> hdc: 120 GB IDE HDD 7200 rpm
> hdd: 100 MB ATAPI ZIP
>
> As you can see, the "cp" benchmark (7,8) has considerably improved
> between 2.4.18 and 2.4.19-rc5 and it is only a factor of 1.3 away from
> perfect. Good work!
>
> The performance problems can be seen in benchmarks 1-4. Writing to
> DVD-RAM while reading from the (fast) 120GB HDD (benchmark 2) almost
> totally blocks the read process. Under 2.4.19-rc5, it takes 14 times
> longer to 'dd' a 1GB file from HDD to /dev/null while writing to
> DVD-RAM than without any other IO. Without any other IO it only takes
> 32 seconds to read the 1GB file from the 120GB HDD. I would expect
> that writing simultaneously to _another_ device while reading a file
> would have no impact on the read speed. That's why I expected 32
> seconds for benchmark 2 to complete. Similarly, in benchmark 6,
> reading 1GB from the 120GB HDD should only take 32 seconds. But it
> takes more than twice that time when writing simultaneously to the
> 30GB HDD.
>
> With 'vmstat 1' I have made the observation that 2.4.19-pre5 is a bit
> "fairer" than 2.4.18. Under 2.4.18, a process writing to DVD-RAM
> almost totally blocks reading from HDD, whereas under 2.4.19-pre5 about
> 1-2 MB/s can still be read simultaneously. So I was astonished that in my
> benchmarks 1-4, kernel 2.4.19-pre5 performed much worse than
> 2.4.18. The reason may be that the main throughput stems from the
> short moments where, for whatever reason, read speed increases
> to the normal 20-30 MB/s. In benchmarks 3 and 4 the 'dd' process
> writing to the ZIP drive terminated with "no space left on device"
> before the reading 'dd' process completed. The reading 'dd' process
> then probably got a higher throughput (I checked that in X with
> xosview). That is probably why benchmarks 3 and 4 (ZIP) finished
> sooner than 1 and 2 (DVD-RAM).
>
> I ran all benchmarks just after booting into runlevel 1 (single-user
> mode) on a Suse 7.1 system. The machine is an Athlon 700 MHz, KT133
> chipset, 256 MB RAM, 256 MB swap.
>
>
> The "dd" benchmark is:
>
> #!/bin/bash
> # $1 = directory on the device being written to
> # $2 = ~1GB file on the device being read
>
> # background writer: stream zeroes onto the "write" device
> dd if=/dev/zero of=$1/tmp bs=1000000 &
> # a sleep is sometimes necessary for bad performance (to fill cache?)
> sleep 30
> # timed reader: pull the test file off the "read" device
> time dd if=$2 of=/dev/null bs=1000000
>
>
> Filesystems are:
> jfranosc@nomad:~ > mount
> /dev/hda3 on / type reiserfs (rw,noatime)
> proc on /proc type proc (rw)
> devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
> /dev/hda1 on /boot type ext2 (rw,noatime)
> /dev/hda6 on /home type ext2 (rw,noatime)
> /dev/hda7 on /lscratch type reiserfs (rw,noatime)
> /dev/hdc2 on /lscratch2 type reiserfs (rw,noatime)
> shmfs on /dev/shm type shm (rw)
> automount(pid341) on /net type autofs (rw,fd=5,pgrp=341,minproto=2,maxproto=4)
> automount(pid334) on /misc type autofs (rw,fd=5,pgrp=334,minproto=2,maxproto=4)
> /dev/hdd4 on /mzip type vfat (rw,noexec,nosuid,nodev,user=jfranosc)
> /dev/sr0 on /dvd type ext2 (rw,noexec,nosuid,nodev,user=jfranosc)
>
>
> Boot parameters in /etc/lilo.conf:
> append = "hdb=ide-scsi hdd=ide-floppy"
>
>
> I report this performance problem because, first, there is room for
> IO performance improvements of up to a factor of 2 when using multiple
> disks, and, second, writing to DVD-RAM or ZIP makes the system almost
> unusable (because read performance drops to virtually zero).
>
> If you have patches that you think should be tested, I'd like to try
> them.

The reason the hd is faster is that the new algorithm is much better
than the previous mainline code. As for the DVD-RAM hanging the machine
more, that's probably because more ram can be marked dirty with the new
changes (beneficial for some workloads, but it stalls the fast hd much
more if there's one very slow blkdev in the system). You can try
decreasing the percentage of dirty vm in the system with:

echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush
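
(For reference, a hedged sketch of working with that knob; the field
naming in the comments follows a reading of the 2.4.19-era
Documentation/sysctl/vm.txt and may differ slightly between trees.)

# Show the current settings before changing anything.
cat /proc/sys/vm/bdflush

# Suggested tuning: the 1st field (nfract) is the percentage of dirty
# memory at which bdflush starts writing back in the background, the 7th
# (nfract_sync) the percentage at which writers are forced to flush
# synchronously -- here 2% and 3% instead of the much larger defaults.
# The 5th and 6th fields (interval, age_buffer) are kupdate timing values
# in jiffies; the remaining fields are unused or tree-specific.
echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush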

hope this helps,

The right fix is different, but it is not suitable for 2.4.

Andrea

Rik van Riel

Apr 16, 2002, 11:50:07 AM
On Tue, 16 Apr 2002, Andrea Arcangeli wrote:
> On Fri, Apr 05, 2002 at 11:04:18PM +0200, Moritz Franosch wrote:

> > The problem is that writing to a DVD-RAM, ZIP or MO device almost
> > totally blocks reading from a _different_ device. Here is some data.
> >
> > nr  bench  read       write      2.4.18  2.4.19-rc5  expected  factor
> >  1  dd     30GB HDD   DVD-RAM       278         490        60     8.2
> >  2  dd     120GB HDD  DVD-RAM       197         438        32      14
> >  3  dd     30GB HDD   ZIP           158         239        60     4.0
> >  4  dd     120GB HDD  ZIP           142         249        32     7.8
> >  5  dd     30GB HDD   120GB HDD      87          89        60     1.5
> >  6  dd     120GB HDD  30GB HDD       66          69        32     2.2
> >  7  cp     30GB HDD   120GB HDD      97          77        60     1.3
> >  8  cp     120GB HDD  30GB HDD       78          65        50     1.3
> >
> > The columns 2.4.18 and 2.4.19-rc5 list execution times in seconds of
> > the respective benchmark. The column "expected" lists the time I would
> > have expected for the respective benchmark to complete with a
> > "perfect" kernel. The "factor" is the factor 2.4.19-rc5 is slower than
> > a perfect kernel would be.

> The reason the hd is faster is that the new algorithm is much better
> than the previous mainline code. As for the DVD-RAM hanging the machine
> more, that's probably because more ram can be marked dirty with the new
> changes (beneficial for some workloads, but it stalls the fast hd much
> more if there's one very slow blkdev in the system). You can try
> decreasing the percentage of dirty vm in the system with:
>
> echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush

Judging from the performance regression above it would seem the
new defaults suck rocks.

Can we please stop optimising Linux for a single workload benchmark
and start tuning it for the common case of running multiple kinds
of applications and making sure one application can't mess up the
others ?

Personally I couldn't care less if my tar went 30% faster if it
meant having my desktop unresponsive for the whole time.

regards,

Rik
--
http://www.linuxsymposium.org/2002/
"You're one of those condescending OLS attendants"
"Here's a nickle kid. Go buy yourself a real t-shirt"

http://www.surriel.com/ http://distro.conectiva.com/

Andrea Arcangeli

Apr 16, 2002, 12:30:17 PM

Your desktop is not unresponsive the whole time. The problem happens
only under a flood of writes to DVD or ZIP, and as you can see above,
2.4.18 sucks rocks for such a workload too, so this is nothing new and
not a problem introduced by my changes; a more detailed explanation
follows.

DVD-RAM and ZIP writes are dogslow, and for such a slow blkdev, allowing
60% of freeable ram to be locked in dirty buffers is exactly like having
to swap out on top of a DVD-RAM instead of on top of a 40M/sec hd while
allocating memory. You will have to wait minutes before that 60% of
freeable ram is flushed to disk: ZIP and DVD-RAM write with a bandwidth
below 1M/sec (with 256 MB of ram, 60% is roughly 150 MB, a couple of
minutes or more to drain at that rate), so swapping out on them can
clearly lead a malloc to take minutes (having to flush dirty data is, as
said, equivalent to swapping out on them). OTOH if you don't reach the
60%, the DVD-RAM and ZIP will behave better with the new changes; it's
the cp /dev/zero /dvdram that causes you to wait the 100 seconds at the
next malloc.

So if you are used to doing the equivalent of cp /dev/zero /dvdram, you
should definitely reduce the maximum dirty fraction of freeable ram to 3%,
and that's what the bdflush tuning above does.
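
A minimal sketch of that workaround wrapped around a backup run, assuming
the bdflush interface behaves as described above; the backup command and
the /data path are placeholders, only /dvd is taken from the mount list
earlier in the thread:

#!/bin/sh
# Lower the dirty-memory limits only while writing to the slow device,
# then put the previous values back afterwards.
OLD=$(cat /proc/sys/vm/bdflush)

echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush

# Placeholder for the slow-media write (DVD-RAM, ZIP, MO, ...).
cp /data/backup.tar /dvd/

# Let the slow device drain before restoring the saved settings.
sync
echo $OLD >/proc/sys/vm/bdflush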

The only way to avoid making you set the nfract levels by hand is to have
them per blkdev; I don't see that as a 2.4 requirement, but it would be
nice to have for 2.5 at least.

This is the same issue as swapping on a very slow HD: you will be slow,
be sure of that, with mainline too, and with the new changes even more
so, because you will have to swap out more. Reducing the bdflush limits
is equivalent to swapping less. If the HD were as fast as memory we
should swap more; the slower the HD, the less we must swap to be fast.
It's not that we are optimizing for a single workload, it's that it is
not possible to compute mathematically the one number that gives the
maximum possible throughput while still providing good latency. The
current heuristic is tuned for a fairly normal hd, so a slower hd
requires, and has always required in previous mainline too, tuning the
VM to avoid swapping out too much if the HD runs at less than 1M/sec.

Not to mention that writes to a slow HD will hang all the writes to the
fast HD; that's another basic design issue that is completely unchanged
between the two kernels, but which could be completely rewritten to allow
the fast HD to write at max speed while the slow HD writes at its own max
speed. This is definitely not possible right now: the fast HD will write
seldom, and in small chunks, in such a workload.

You should acknowledge that the new changes and defaults are better; they
even do a better job of exposing the basic design problems the kernel has
with devices whose performance is far below that of a normal hd or of
memory. Just trying to hide those design problems by setting a default of
3% would be wrong; doing that would make your production desktops and
servers much slower. If the system is slow only during a backup to ZIP or
DVD-RAM it's not a showstopper: there are no failures, just higher write
and read latencies and higher allocation latencies than when you run
cp /dev/zero onto a normal HD.

Andrea

Alan Cox

Apr 16, 2002, 12:30:18 PM
> > The problem is that writing to a DVD-RAM, ZIP or MO device almost
> > totally blocks reading from a _different_ device. Here is some data.

Yes, I saw this with M/O disks; that's one reason the -ac tree doesn't adopt
all the ll_rw_blk/elevator changes from the vanilla tree.

> > DVD-RAM while reading from the (fast) 130GB HDD (benchmark 2) almost
> > totally blocks the read process. Under 2.4.19-rc5, it takes 14 times

You'll see this on other things too. Large file creates seem to basically
stall anything wanting swap.

> > benchmarks 1-4, kernel 2.4.19-pre5 performed much worse than
> > 2.4.18. The reason may be that the main throughput stems from the
> > short moments where, for what reason whatsoever, read speed increases

Fairness, throughput, latency - pick any two..

> Right fix is different but not suitable for 2.4.

Curious - what do you think the right fix is ?

Andrea Arcangeli

Apr 16, 2002, 12:50:07 PM
On Tue, Apr 16, 2002 at 05:09:17PM +0100, Alan Cox wrote:
> > > The problem is that writing to a DVD-RAM, ZIP or MO device almost
> > > totally blocks reading from a _different_ device. Here is some data.
>
> Yes I saw this with M/O disks, thats one reason the -ac tree doesn't adopt
> all the ll_rw_blk/elevator changes from the vanilla tree.

That should have nothing to do with the elevator. The elevator matters
within the same disk; here it's the other (fast) disks that get slower
while you write to the slow ZIP/M-O/DVD-RAM. That has always been the
case; I remember it since the first time I ran 2.0.25 with ppa.

> > > DVD-RAM while reading from the (fast) 130GB HDD (benchmark 2) almost
> > > totally blocks the read process. Under 2.4.19-rc5, it takes 14 times
>
> You'll see this on other things too. Large file creates seem to basically
> stall anything wanting swap

the "wanting swap" bit also depends how much anon/shm ram there is in the
system compared to clean freeable cache. With the rest of the patches
applied you should not want swap during a large file create unless
you've quite a lot of physical ram mapped in shm/anon.

> > > benchmarks 1-4, kernel 2.4.19-pre5 performed much worse than
> > > 2.4.18. The reason may be that the main throughput stems from the
> > > short moments where, for what reason whatsoever, read speed increases
>
> Fairness, throughput, latency - pick any two..
>
> > Right fix is different but not suitable for 2.4.
>
> Curious - what do you think the right fix is ?

One part of the fix is not to allow dirty buffers belonging to the
zip/M-O/DVD-RAM drives to grow beyond 3-4% of the total freeable ram in
the system, so that the remaining 96-97% of freeable ram can be allocated
nearly atomically, without blocking. Really, that percentage can depend
on the user's needs too: if a user rewrites the same data over and over
and never uses more than 20% of the physical ram as cache, he will
probably want a limit of 30%, not 3-4%, even if a malloc that requires
such 30% to be flushed to disk could then take several minutes to return.
It's not a trivial problem, but at least having per-blkdev tunings would
make it much better. 60% of ram dirty against something that writes at
512bytes/sec would be totally insane, for example; if something writes at
512bytes/sec we should allow at most a few pages of cache to be dirty
simultaneously. The best would be if the kernel could learn a limit at
runtime for each blkdev; a fixed 3-4% still is not very appealing.

The other side of the fix is to rework how writes compete with other
writes in the BUF_DIRTY list: right now, even if the allocations don't
block, the other async flushes will wait for those few pages to be
written at 512bytes/sec, even though those flushes could go to the hd in
parallel at 30M/sec.

The linux vm (and this has always been true since 2.0) is tuned for, and
behaves well with, normal HDs running at similar speeds; if the speeds of
the HDs vary a lot, or if one HD is dogslow, linux's async flushing is
not optimal. The new, more aggressive tunings just bring that to light
more, while other, more server-oriented workloads are improved because of
the faster hardware and because they actually take advantage of the
larger dirty cache, unlike the dd case, which wouldn't matter if it were
synchronous.

Andrea

Rik van Riel

Apr 16, 2002, 12:50:07 PM
On Tue, 16 Apr 2002, Alan Cox wrote:

> > > benchmarks 1-4, kernel 2.4.19-pre5 performed much worse than
> > > 2.4.18. The reason may be that the main throughput stems from the
> > > short moments where, for what reason whatsoever, read speed increases
>
> Fairness, throughput, latency - pick any two..

Personally I try to go for fairness and latency in -rmap,
since most real workloads I've encountered don't seem to
have throughput problems.

The standard "it's getting slow" complaint has been about
response time and fairness 90% of the time, usually when
the system stalls one process during some other activity.

> > Right fix is different but not suitable for 2.4.
>
> Curious - what do you think the right fix is ?

Tuning the current system for latency and fairness should
keep most people happy. Desktop users really won't notice
if unpacking an RPM takes 20% longer, but having their
mp3 skip during RPM unpacking is generally considered
unacceptable.

regards,

Rik
--
http://www.linuxsymposium.org/2002/
"You're one of those condescending OLS attendants"
"Here's a nickle kid. Go buy yourself a real t-shirt"

http://www.surriel.com/ http://distro.conectiva.com/


Roger Larsson

Apr 16, 2002, 3:20:06 PM
On Tuesday 16 April 2002 16.53, Andrea Arcangeli wrote:
> The reason the hd is faster is that the new algorithm is much better
> than the previous mainline code. As for the DVD-RAM hanging the machine
> more, that's probably because more ram can be marked dirty with the new
> changes (beneficial for some workloads, but it stalls the fast hd much
> more if there's one very slow blkdev in the system). You can try
> decreasing the percentage of dirty vm in the system with:
>
>
> echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush
>
>
> hope this helps,
>
>
> Right fix is different but not suitable for 2.4.
>
>
> Andrea

In another recent thread, "PROMBLEM: CD burning at 16x uses excessive CPU,
although DMA is enabled", it was found that writing to CD-R did not use
DMA. This resulted in lots of wasted CPU cycles.

From a mail by Anssi Saari:
> cdrdao simulate -n --speed 8 foo.cue 2.62s user 3.37s system 1% cpu 6:41
> cdrdao simulate -n --speed 12 foo.cue 2.78s user 29.91s system 12% cpu 4:31
> cdrdao simulate -n --speed 16 foo.cue 2.67s user 128.8s system 52% cpu 4:11

> But even though 50% is quite high, CPU load is not the problem as such,
> the problem is getting data to the writer fast enough. And it's not
> happening. Even a single audio track that is completely cached so that
> there is no HD access has problems. It's like somehow accessing the CD
> writer hogs the system for such long periods that there is insufficient
> time to fill the writing program's buffer.

Might this be part of the problem in this case too? Moritz, please time your
commands and use vmstat too... (time spent in interrupts while running the
idle process does not always show up)
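
A hedged sketch of how that could be checked with era-appropriate tools;
the device names follow the drive listing earlier in the thread, and the
exact hdparm output format may vary:

# Is DMA enabled on the DVD-RAM (hdb) and ZIP (hdd) drives?
# "using_dma = 1 (on)" is the desired answer.
# (Note: hdb is under ide-scsi here per lilo.conf, so it may need to be
#  queried via the SCSI emulation device instead.)
hdparm -d /dev/hdb /dev/hdd

# If it is off and the chipset/drive support it, try switching it on:
#   hdparm -d1 /dev/hdd

# Rough sequential-read timings of the two hard disks, for comparison.
hdparm -tT /dev/hda /dev/hdc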


/RogerL

--
Roger Larsson
Skellefteå
Sweden

Moritz Franosch

Apr 17, 2002, 11:30:18 AM

> > The problem is that writing to a DVD-RAM, ZIP or MO device almost
> > totally blocks reading from a _different_ device. Here is some data.
> >
> > nr  bench  read       write      2.4.18  2.4.19-rc5  expected  factor
> >  1  dd     30GB HDD   DVD-RAM       278         490        60     8.2
> >  2  dd     120GB HDD  DVD-RAM       197         438        32      14
> >  3  dd     30GB HDD   ZIP           158         239        60     4.0
> >  4  dd     120GB HDD  ZIP           142         249        32     7.8
> >  5  dd     30GB HDD   120GB HDD      87          89        60     1.5
> >  6  dd     120GB HDD  30GB HDD       66          69        32     2.2
> >  7  cp     30GB HDD   120GB HDD      97          77        60     1.3
> >  8  cp     120GB HDD  30GB HDD       78          65        50     1.3

Should be -pre5, sorry.

> The reason the hd is faster is that the new algorithm is much better
> than the previous mainline code. As for the DVD-RAM hanging the machine
> more, that's probably because more ram can be marked dirty with the new
> changes (beneficial for some workloads, but it stalls the fast hd much
> more if there's one very slow blkdev in the system). You can try
> decreasing the percentage of dirty vm in the system with:
>
> echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush

With the bdflush-parameters above, I get

nr  bench  read       write    2.4.19-pre5  expected  factor
 9  dd     30GB HDD   DVD-RAM      208/0/6        60     3.5
10  dd     120GB HDD  DVD-RAM       39/0/6        32     1.2
11  dd     30GB HDD   ZIP          66/0/10        60     1.1
12  dd     120GB HDD  ZIP           85/0/7        32     2.7

Numbers in the column 2.4.19-pre5 are total time / user time / system
time in seconds.

Performance is much better with the new parameters. Also, with the new
parameters the system can read from HDD almost steadily while writing
to DVD. This should greatly improve responsiveness.

In cases 9 and 12 where performance is bad, both tested drives are on
the same IDE controller. Should that matter?

> Right fix is different but not suitable for 2.4.

I'm looking forward to the definitive solution.

Thank you very much,

Moritz


--
Dipl.-Phys. Moritz Franosch
http://Franosch.org

Moritz Franosch

Apr 17, 2002, 11:40:15 AM

> Judging from the performance regression above it would seem the
> new defaults suck rocks.

I first thought that 2.4.19-pre5 would be better than 2.4.18 because
vmstat showed that 2.4.19-pre5 could still read 1-2 MB per second from
HDD while writing to DVD-RAM, whereas 2.4.18 blocked totally for more
than 10 seconds or so. But there are short moments under both kernels
(with default bdflush parameters) where you get data from HDD at a
very high rate before it drops again. It seems the main throughput
over a long time stems from these short moments.


> Can we please stop optimising Linux for a single workload benchmark
> and start tuning it for the common case of running multiple kinds
> of applications and making sure one application can't mess up the
> others ?
>
> Personally I couldn't care less if my tar went 30% faster if it
> meant having my desktop unresponsive for the whole time.

That's why I did the benchmarks in the first place, because my desktop
was unresponsive while writing to DVD-RAM.


Moritz


--
Dipl.-Phys. Moritz Franosch
http://Franosch.org

Moritz Franosch

Apr 17, 2002, 12:00:06 PM

Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

> > > benchmarks 1-4, kernel 2.4.19-pre5 performed much worse than
> > > 2.4.18. The reason may be that the main throughput stems from the
> > > short moments where, for what reason whatsoever, read speed increases
>
> Fairness, throughput, latency - pick any two..

That's exactly the point. Writing large files to DVD-RAM leads to low
throughput when reading from HDD and to long latencies, and (with 2.4.18)
doesn't even let the HDD read as much data as is written to DVD-RAM,
which is very unfair.

My benchmarks show bad throughput. What I first observed was bad
latency when writing to DVD-RAM (no mouse movement for 3 seconds or
so, long times switching between applications, text output delayed
when typing). Fairness shouldn't be an issue, because 2.4.18/19-pre5
are also bad when the two tested disks are on different IDE controllers,
so no resources need to be shared between the reading and the
writing process (except RAM for cache, but there is plenty).

Moritz


--
Dipl.-Phys. Moritz Franosch
http://Franosch.org

Andreas Dilger

Apr 17, 2002, 12:40:08 PM
On Apr 17, 2002 17:26 +0200, Moritz Franosch wrote:
> In cases 9 and 12 where performance is bad, both tested drives are on
> the same IDE controller. Should that matter?

Yes, very much. IDE is broken in this regard - you can only do I/O
to one device on an IDE channel at a time. Either get additional
IDE controllers (about $35 or so) or split your devices so that they
are on separate IDE channels (i.e. DVD and ZIP together, and the HD on
the other channel, if copying HD <-> DVD and HD <-> ZIP).
Of course with 2 HDs, you should probably keep them on separate channels
as well.
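
A hedged sketch of how to see which drives share a channel on a 2.4
kernel; this relies on a reading of the /proc/ide layout, which may vary
between configurations:

# Each IDE channel appears as /proc/ide/ideN with its drives inside.
for chan in /proc/ide/ide*; do
    echo "$chan: $(ls $chan | grep '^hd' | tr '\n' ' ')"
done

# With the drive listing from earlier in the thread, the expected result
# would be roughly:
#   /proc/ide/ide0: hda hdb    (30GB HDD + DVD-RAM)
#   /proc/ide/ide1: hdc hdd    (120GB HDD + ZIP)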

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

Andrea Arcangeli

Apr 17, 2002, 1:40:10 PM
On Wed, Apr 17, 2002 at 05:26:13PM +0200, Moritz Franosch wrote:
>
>
> > > The problem is that writing to a DVD-RAM, ZIP or MO device almost
> > > totally blocks reading from a _different_ device. Here is some data.
> > >
> > > nr  bench  read       write      2.4.18  2.4.19-rc5  expected  factor
> > >  1  dd     30GB HDD   DVD-RAM       278         490        60     8.2
> > >  2  dd     120GB HDD  DVD-RAM       197         438        32      14
> > >  3  dd     30GB HDD   ZIP           158         239        60     4.0
> > >  4  dd     120GB HDD  ZIP           142         249        32     7.8
> > >  5  dd     30GB HDD   120GB HDD      87          89        60     1.5
> > >  6  dd     120GB HDD  30GB HDD       66          69        32     2.2
> > >  7  cp     30GB HDD   120GB HDD      97          77        60     1.3
> > >  8  cp     120GB HDD  30GB HDD       78          65        50     1.3
>
> Should be -pre5, sorry.

Never mind, that was clear :).

>
> > The reason the hd is faster is that the new algorithm is much better
> > than the previous mainline code. As for the DVD-RAM hanging the machine
> > more, that's probably because more ram can be marked dirty with the new
> > changes (beneficial for some workloads, but it stalls the fast hd much
> > more if there's one very slow blkdev in the system). You can try
> > decreasing the percentage of dirty vm in the system with:
> >
> > echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush
>
> With the bdflush-parameters above, I get
>
> nr bench read write 2.4.19-pre5 expected factor
> 9 dd 30GB HDD DVD-RAM 208/0/6 60 3.5
> 10 dd 120GB HDD DVD-RAM 39/0/6 32 1.2
> 11 dd 30GB HDD ZIP 66/0/10 60 1.1
> 12 dd 120GB HDD ZIP 85/0/7 32 2.7
>
> Numbers in the column 2.4.19-pre5 are total time / user time / system
> time in seconds.
>
> Performance is much better with the new parameters. Also, with the new
> parameters, the system can read from HDD almost steadily while writing
> to DVD. This should much increase responsiveness.

Good. As said, that's a mere workaround at the moment; something generic
and possibly autotuning is a bit more complex and still a research
item (I see it more as a 2.5 thing, because the slow reads during DVD
writes etc. aren't really a regression).

> In cases 9 and 12 where performance is bad, both tested drives are on
> the same IDE controller. Should that matter?

Yes, very much so. I should have asked you about that; I was assuming it
wasn't bound by hardware limitations. IDE isn't able to submit commands
in parallel to both drives on the same IDE channel, so you will get
much slower performance doing I/O on the master and slave of the same
ide channel. So if your "expected" numbers don't account for the IDE
channel collisions, then it sounds good.

> > Right fix is different but not suitable for 2.4.
>
> I'm looking forward to the definitive solution.

Right you are.

> Thank you very much,

You're welcome.

Andrea

Moritz Franosch

Apr 17, 2002, 3:20:13 PM

Roger Larsson <roger....@skelleftea.mail.telia.com> writes:

> In an other recent thread "PROMBLEM: CD burning at 16x uses excessive CPU,
> although DMA is enabled" it was found out that writing to CD-R did not use
> DMA. This resulted in lots of wasted CPU cycles.

> > But even though 50% is quite high, CPU load is not the problem as such,
> > the problem is getting data to the writer fast enough. And it's not
> > happening. Even a single audio track that is completely cached so that
> > there is no HD access has problems. It's like somehow accessing the CD
> > writer hogs the system for such long periods that there is insufficient
> > time to fill the writing program's buffer.
>
> Might this be part of the problem in this case too? Moritz please time your
> commands

I have timed the new benchmarks. User time is 0, system time is 3% in
the worst case.
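
For the record, a hedged sketch of how the timing and vmstat numbers can
be collected together; the test file name is a placeholder, and /lscratch2
is the 120GB HDD mount from the earlier mount list:

# Log vmstat in the background so system/interrupt time is visible even
# while the reading dd is blocked, then stop it when the read finishes.
vmstat 1 > vmstat-bench.log &
VMSTAT_PID=$!

time dd if=/lscratch2/testfile of=/dev/null bs=1000000

kill $VMSTAT_PID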

> and use vmstat too... (time spent in interrupt while running the
> idle process - does not always show up)

'vmstat 1' during benchmark 2 (reading from the 120GB HDD, writing to
DVD-RAM), kernel 2.4.19-pre5, default bdflush parameters.

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 5 2 10120 2932 7040 182276 0 6 344 501 135 217 4 2 94
1 0 2 10120 2184 7056 182976 0 0 8 0 116 115 2 2 96
0 1 2 10120 2244 7068 182932 0 0 0 3944 125 129 0 1 99
0 1 2 10120 2220 7072 182944 0 0 0 0 111 97 1 0 99
1 0 2 10120 2608 7080 182548 0 0 0 4032 136 123 1 0 99
1 0 3 10120 2676 7112 182448 0 0 4 40 168 155 1 1 98
1 1 2 10120 2180 7124 182928 0 0 4 0 112 104 1 2 97
0 1 2 10120 2192 7128 182924 0 0 0 4026 112 124 1 1 98
1 1 2 10120 2288 7132 182816 0 0 0 0 111 109 1 0 99
1 1 2 10120 2408 7136 182692 0 0 0 0 111 117 1 0 99
1 1 3 10120 2276 7160 182796 0 0 0 4044 116 142 1 1 98
1 1 2 10120 2216 7164 182840 0 0 4 12 108 92 1 0 99
1 1 2 10120 2320 7172 182744 0 0 0 0 115 132 1 1 98
1 0 2 10120 3168 7180 181888 0 0 0 4032 151 186 1 1 98
0 2 2 10120 2180 7188 182868 0 0 0 0 111 122 1 2 97
0 1 2 10120 2876 7188 182164 0 0 0 4032 111 113 1 4 95
0 2 3 10120 2772 7220 182244 0 0 0 40 113 133 1 3 96
0 1 2 10120 2192 7228 182824 0 0 0 0 116 123 1 2 97
0 1 2 10120 2300 7232 182704 0 0 0 0 111 116 1 0 99

[reading from HDD starts about here]

2 1 2 10120 2304 7256 181616 0 0 1172 4032 311 372 0 1 99
0 6 3 10120 2876 7272 180992 0 0 144 4048 113 124 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 2 3 10120 2284 7276 181672 0 0 1520 638 144 227 2 2 96
2 0 3 10120 2544 7284 181424 0 0 1020 0 161 177 1 1 98
3 0 3 10120 2820 7284 181216 0 0 1392 3876 186 209 1 1 98
0 3 3 10120 2504 7288 181536 0 0 980 0 127 133 0 1 99
2 0 3 10120 3124 7288 180972 0 0 1236 3890 220 258 2 0 98
0 2 3 10120 2596 7296 181564 0 0 0 0 113 137 1 1 98
0 2 3 10120 2204 7304 181992 0 0 516 0 122 127 1 2 97
0 2 3 10120 2416 7308 181816 0 0 240 3922 117 133 1 0 99
0 3 3 10120 2176 7312 182108 0 0 652 0 122 135 1 1 98
0 2 3 10120 2296 7316 181984 0 0 400 0 118 130 1 0 99
4 0 3 10120 2956 7316 181252 0 0 24 3930 173 200 1 1 98
0 4 3 10120 2184 7320 182048 0 0 1956 0 148 200 0 4 96
0 3 3 10120 2432 7324 181892 0 0 0 3968 111 133 0 1 99
2 0 3 10120 2544 7332 181804 0 0 1092 0 138 187 1 1 98
0 2 3 10120 2180 7336 182200 0 0 388 0 121 143 1 2 97
0 2 3 10120 2236 7340 182144 0 0 640 3968 121 148 1 1 98
0 3 3 10120 2560 7340 181760 0 0 616 0 124 138 1 3 96
0 2 3 10120 2484 7344 181932 0 0 76 0 113 128 1 2 97
5 0 3 10120 2640 7348 181780 0 0 832 3968 157 174 1 2 98
0 2 3 10120 2432 7352 182012 0 0 240 0 116 135 1 1 98
2 0 3 10120 2420 7360 182036 0 0 1348 3720 173 185 1 1 98


'vmstat 1' during benchmark 2 (reading from the 120GB HDD, writing to
DVD-RAM), kernel 2.4.19-pre5,
'echo 2 500 0 0 500 3000 3 1 0 >/proc/sys/vm/bdflush'

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 1 1 11020 110372 6956 78828 0 0 8 6946 116 88 1 10 89
0 1 2 11020 110344 6980 78832 0 0 0 44 120 96 1 0 99
0 1 1 11020 110344 6980 78832 0 0 0 3686 128 75 1 2 97
0 1 1 11020 110344 6980 78832 0 0 0 0 111 72 1 1 98
0 1 1 11020 110344 6980 78832 0 0 0 0 107 79 1 1 98
0 1 1 11020 110344 6980 78832 0 0 0 0 105 71 1 0 99
0 1 1 11020 110344 6980 78832 0 0 0 0 103 79 1 0 99
0 1 1 11020 102688 6996 86472 0 0 0 3942 111 93 1 5 94
0 1 1 11020 102688 6996 86472 0 0 0 0 114 79 0 1 99
0 1 1 11020 102684 6996 86476 0 0 0 3486 131 78 0 0 100
0 1 1 11020 102684 6996 86476 0 0 0 0 112 81 1 0 99
0 1 1 11020 94968 6996 94192 0 0 0 4372 111 84 1 5 94
1 1 2 11020 75780 7020 112368 0 0 18176 40 403 638 1 15 84
1 1 1 11020 46204 7028 141936 0 0 29576 0 582 971 1 23 76
3 1 1 11020 17368 7064 170736 0 0 28836 0 574 928 2 25 73
1 1 2 11020 4336 7092 186864 0 0 28188 3810 570 1024 1 22 77
1 1 1 11020 4436 7096 186760 0 0 27548 0 553 1155 1 19 80
1 1 1 11020 4388 5860 189088 0 1088 28704 1088 587 1190 2 25 73
1 1 1 11020 4460 5892 188984 0 0 29856 40 590 1243 1 24 75
1 1 1 11020 3204 5936 190232 0 0 27032 3540 550 1126 1 26 73
1 1 2 11020 3572 5964 189848 0 0 28828 0 570 1232 1 19 80
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 1 1 11020 4440 5992 188952 0 0 27292 3636 563 1147 1 21 78
1 1 1 11052 4416 6028 190044 0 824 28584 824 588 1225 1 26 73
0 2 3 11052 2256 6060 192168 0 0 16068 12 375 685 1 11 88

[reading and writing takes place concurrently at normal speed, but
there are still some "dropouts"]

0 2 1 11052 2276 6060 192120 0 0 0 2686 110 97 1 2 97
5 0 1 11052 2592 6060 191720 0 0 0 0 417 208 0 1 99
1 1 1 11052 3248 6132 191192 0 0 26472 5260 542 1212 1 31 68
2 1 1 11052 4384 6160 190040 0 0 28572 0 572 1173 1 20 79
2 1 1 11052 3716 6188 190680 0 0 27548 3844 570 1171 1 23 76
1 1 1 11052 4460 6216 189912 0 0 28316 0 570 1197 1 17 82
1 1 1 11704 2244 6244 192928 0 280 29096 280 595 1011 1 28 71
1 1 2 11704 3208 6300 191932 0 0 27678 4478 564 1123 1 25 74
1 1 2 11704 3332 6304 191828 0 0 29468 8 585 1229 1 27 72
1 1 2 11704 3416 6332 191700 0 0 30236 0 589 1253 1 19 80
0 2 2 11704 4400 6364 190676 0 444 29080 444 589 1241 1 24 75
1 1 1 12956 4372 6384 190968 0 508 28708 3450 588 1204 1 25 74
1 1 1 12956 4348 6412 190972 0 0 27296 0 569 1133 1 20 79
1 1 1 12956 3312 6416 192012 0 0 19092 0 417 801 1 15 84
1 1 1 12956 3308 6444 192012 0 0 28572 0 571 1181 1 25 74
1 1 1 12956 3284 6472 192012 0 0 28444 0 568 1178 1 26 73
1 1 1 12956 3284 6576 192312 0 68 26524 8276 548 1228 1 30 69
1 1 1 13944 4396 6600 191672 0 1060 29468 1060 599 1243 1 23 76


Moritz


--
Dipl.-Phys. Moritz Franosch
http://Franosch.org
