
Why are disk writes so slow?

Mark Bucciarelli

Sep 26, 2006, 11:46:44 AM
to freebsd-p...@freebsd.org
I am reading Richard Stevens' "Advanced Programming in the UNIX
Environment," a most excellent book.

Out of curiosity, I tried his I/O efficiency program on my IBM
A30 Thinkpad, running 6.0-RELEASE with default tuning parameters.
The test program reads a file on stdin and writes it to stdout, and
you change bufsize to see how the run time changes.
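For readers without the book at hand, here is a minimal sketch of that kind of copy loop. It is written in Python rather than Stevens' C and uses the `os` module, whose `os.read()` and `os.write()` wrap the same read(2)/write(2) system calls. The function name `copy_fd` is hypothetical, and this is not the book's code:

```python
import os


def copy_fd(src_fd: int, dst_fd: int, bufsize: int = 8192) -> int:
    """Copy src_fd to dst_fd using reads of at most bufsize bytes.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        buf = os.read(src_fd, bufsize)  # one read(2) call per loop iteration
        if not buf:                     # b"" means end of file
            break
        view = memoryview(buf)
        while view:                     # write(2) may write less than asked
            n = os.write(dst_fd, view)
            view = view[n:]
        total += len(buf)
    return total
```

Calling `copy_fd(0, 1, bufsize)` from a small script copies stdin to stdout. That script can then be timed the same way as the `a.out` commands below, with different bufsize values.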

As in his example (with a bufsize of 8192),

time ./a.out < 1.5M-testfile > /dev/null

runs five times faster (in wall-clock time) than

time ./a.out < 1.5M-testfile > /a.out.out

Can someone explain to me why writing is five times as slow as
reading? What's going on in the computer?

The file is not opened with O_SYNC, so the write shouldn't have to
wait for the data to reach the disk.

Later in the same chapter, he shows the impact of the O_SYNC flag. I
re-ran this experiment too, and while everything is two orders of
magnitude faster than his times in the book, the relative slowdown
from O_SYNC is almost three times as large.

                    1993     2006
                   -----    -----
normal write        2.3s    .023s
O_SYNC             13.4s    .364s
slowdown factor      5.8     15.8
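A minimal sketch of the normal-vs-O_SYNC comparison, using Python's `os` module (the flags map directly onto open(2)). The helper name `timed_write` is hypothetical and this is not Stevens' program. `os.O_SYNC` is assumed to be available, which it is on FreeBSD and Linux but not on every platform:

```python
import os
import time


def timed_write(path: str, data: bytes, bufsize: int = 8192, sync: bool = False) -> float:
    """Write data to path in bufsize chunks; return elapsed wall-clock seconds.

    With sync=True the file is opened with O_SYNC, so each write(2) returns
    only after the kernel has pushed the data out to the device (the drive's
    own write cache may still be holding it).
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        start = time.perf_counter()
        for off in range(0, len(data), bufsize):
            chunk = data[off:off + bufsize]
            written = 0
            while written < len(chunk):  # handle short writes
                written += os.write(fd, chunk[written:])
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

You can call it twice on the same 1.5 MB buffer, once with `sync=False` and once with `sync=True`. That reproduces the shape of the comparison. The absolute numbers will depend on the disk, controller, and cache settings.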

Is this all b/c disks are so much larger?

m

Bill Moran

Sep 26, 2006, 1:03:00 PM
to Mark Bucciarelli, freebsd-p...@freebsd.org
In response to Mark Bucciarelli <ma...@gaiahost.coop>:

I'm rather confused as to exactly what your question is ...

First off, writes are slower than reads if the data you're reading is
already cached in RAM. Unless you have _very_ little RAM in your
machine, anything that takes .023s to write will fit entirely in the
buffer cache, so repeated access doesn't require any real disk activity.

Secondly, as to why Stevens saw less of a slowdown with O_SYNC than
you did -- I doubt there's one easy answer. Disks are manufactured
differently now than they were in '93, and there are huge differences
between brands and interface types (e.g. SCSI/SATA), in the hardware
connecting the disks, and in the drivers for that hardware. There are
dozens of places where the difference could be occurring. I would guess
that the drive itself does write caching, which heavily speeds up async
writes but does nothing for sync writes.

--
Bill Moran
Collaborative Fusion Inc.

Mark Bucciarelli

Sep 26, 2006, 2:05:38 PM
to Bill Moran, freebsd-p...@freebsd.org
On Tue, Sep 26, 2006 at 01:03:00PM -0400, Bill Moran wrote:
> In response to Mark Bucciarelli <ma...@gaiahost.coop>:
>
> > Can someone explain to me why writing is five times as slow as
> > reading? What's going on in the computer?
>
> I'm rather confused as to exactly what your question is ...
>

Heh, I'm just trying to understand how my computer works. I was
surprised that writes were _so much_ slower than reads. I
figured somebody here knew.

> First off, writes are slower than reads if the data you're
> reading is already cached in RAM. Unless you have _very_
> little RAM in your machine, anything that takes .023s to
> write will fit entirely in the buffer cache, so repeated
> access doesn't require any real disk activity.

I could try running the test immediately after rebooting.

I have no idea whether Stevens did, though; I bet he didn't. We both
used the same size file (1.5M), and I wonder how much RAM he had.

> Secondly, as to why Stevens saw less of a slowdown with O_SYNC
> than you did -- I doubt there's one easy answer. Disks are
> manufactured differently now than they were in '93, and there
> are huge differences between brands and interface types (e.g.
> SCSI/SATA), in the hardware connecting the disks, and in the
> drivers for that hardware. There are dozens of places where the
> difference could be occurring. I would guess that the drive
> itself does write caching, which heavily speeds up async
> writes but does nothing for sync writes.

I see.

Yeah, if you compare normal writes to reads, writes did improve
relative to reads between the two data points. So that supports
your conjecture.

So you think this data has no value?

                         1993     2006
                        -----    -----
(1) /dev/null write      0.3s    .005s   <-- read speed
(2) normal write         2.3s    .023s
(3) O_SYNC              13.4s    .364s

(2) / (1)                 7.7      4.6   ~1.7x faster relative to read
(3) / (2)                 5.8     15.8   ~2.7x slower relative to async
(3) / (1)                44.7     72.8   ~1.6x slower relative to read
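The ratios can be recomputed from the raw timings with a few lines of Python. This is a minimal sketch; the dictionary layout and the `ratios` helper are only for illustration:

```python
# Timings in seconds for the 1.5M file, from the 1993 and 2006 columns.
timings = {
    1993: {"read": 0.3, "write": 2.3, "sync": 13.4},
    2006: {"read": 0.005, "write": 0.023, "sync": 0.364},
}


def ratios(t: dict) -> dict:
    """Slowdown of each operation relative to another, for one year."""
    return {
        "write/read": t["write"] / t["read"],
        "sync/write": t["sync"] / t["write"],
        "sync/read": t["sync"] / t["read"],
    }


r93, r06 = ratios(timings[1993]), ratios(timings[2006])
for key in r93:
    # last column: how much that ratio changed from 1993 to 2006
    print(f"{key:10s} {r93[key]:6.1f} {r06[key]:6.1f}  {r06[key] / r93[key]:.2f}x")
```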


It does make me wonder how this test runs on Linux, since I
think databases use O_SYNC. I guess I'd have to install Linux on
my laptop and run the same test to have any useful information.
And reboot between each test. And shut down network interface
and all daemons.

In any case, thanks for your reply.

m

Jason Stone

Sep 26, 2006, 4:33:50 PM
to Mark Bucciarelli, freebsd-p...@freebsd.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> As in his example (with a bufsize of 8192),
>
> time ./a.out < 1.5M-testfile > /dev/null
>
> runs five times faster than (clock time)
>
> time ./a.out < 1.5M-testfile > /a.out.out

a) your 1.5M-testfile is most likely still in the cache from previous test
runs or from when you created it.

b) if you read from and write to the same disk, you're going to thrash
the disk with seeks.

so, some other experiments to try might include:

a) create a whole bunch of test files, reboot, and then make sure you use
a different test file for every run.

b) try variations where you use a ramdisk for the read and disk for the
write, then a disk for the read and a ramdisk for the write, and then a
ramdisk for both.

c) try reading from /dev/zero and writing to disk as the converse of
reading from disk and writing to /dev/null, etc.
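A minimal sketch of experiments (a) and (c), in Python's `os` module. It creates several fresh test files, then measures the write side alone (/dev/zero to a file) and the read side alone (a file to /dev/null). The helper names are hypothetical. The ramdisk variants in (b) would only change the paths, pointing them at a memory-backed file system (for example an md(4) device on FreeBSD):

```python
import os
import time


def make_test_files(directory: str, count: int, size: int) -> list[str]:
    """Create count files of size random bytes; use a different one per run."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"testfile-{i}")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


def timed_copy(src_path: str, dst_path: str, nbytes: int, bufsize: int = 8192) -> float:
    """Copy up to nbytes from src_path to dst_path; return elapsed seconds.

    Reading /dev/zero isolates the write side; writing /dev/null isolates
    the read side.
    """
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        remaining = nbytes
        while remaining > 0:
            buf = os.read(src, min(bufsize, remaining))
            if not buf:  # end of a regular file
                break
            view = memoryview(buf)
            while view:  # handle short writes
                view = view[os.write(dst, view):]
            remaining -= len(buf)
        return time.perf_counter() - start
    finally:
        os.close(src)
        os.close(dst)
```

Run the file-to-/dev/null case right after a reboot, with a file that hasn't been touched since. That way the read actually comes from the disk rather than the cache.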


-Jason
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (FreeBSD)
Comment: See https://private.idealab.com/public/jason/jason.gpg

iD8DBQFFGY6vswXMWWtptckRAu1QAKDg1M3AFoDyHX7Zh3pfMz5RO3zyrQCfcQor
z78KtLyYIOKzeaAzq5xYLPY=
=Xe8O
-----END PGP SIGNATURE-----

Oliver Fromme

Sep 27, 2006, 10:02:27 AM
to freebsd-p...@freebsd.org, ma...@gaiahost.coop
Mark Bucciarelli wrote:
>                          1993     2006
>                         -----    -----
> (1) /dev/null write      0.3s    .005s   <-- read speed

If you mean to say that 1.5 MB have been read in 0.005s,
then that's certainly _not_ the read speed of your disk
drive. No single drive currently in existence can deliver
300 MB per second. Those 1.5 MB came from the cache.
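The arithmetic behind that figure, as a tiny Python sketch. It assumes "1.5 MB" means decimal megabytes (10**6 bytes), and the function name is hypothetical:

```python
def throughput_mb_per_s(nbytes: float, seconds: float) -> float:
    """Average transfer rate in decimal megabytes (10**6 bytes) per second."""
    return nbytes / seconds / 1e6


# 1.5 MB delivered in 0.005 s, the "read speed" row quoted above
print(f"{throughput_mb_per_s(1.5e6, 0.005):.0f} MB/s")  # prints "300 MB/s"
```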

> (2) normal write 2.3s .023s

Looks reasonable.

> (3) O_SYNC 13.4s .364s

Also looks reasonable. Of course it depends a lot on the
type of disk (SCSI, ATA), interface speed (PIO*, UDMA*),
drive configuration (write caching etc.), vendor of disk
and controller, etc.

> It does make me wonder how this test runs on Linux, since I
> think databases use O_SYNC.

Usually databases issue an fsync() call at important points
in time (e.g. after a full transaction has been committed).
That performs better than making every write synchronous.
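A minimal Python sketch of the pattern described above. It makes ordinary buffered writes, then issues one `os.fsync()` per committed transaction instead of opening the file with O_SYNC, which would make every write wait. The function name and record format are hypothetical, and real database engines do considerably more than this:

```python
import os


def append_transaction(fd: int, records: list[bytes]) -> None:
    """Append all records of one transaction, then make them durable once."""
    for rec in records:
        view = memoryview(rec)
        while view:  # ordinary write(2) calls land in the buffer cache
            view = view[os.write(fd, view):]
    os.fsync(fd)  # one flush per committed transaction, not per write
```

As with O_SYNC, how durable the data really is after `fsync()` returns still depends on the drive's own write cache.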

> I guess I'd have to install Linux on
> my laptop and run the same test to have any useful information.

What exactly are you trying to find out? Linux has different
file systems, different virtual memory management, different
buffer cache implementation, different scheduler, different
I/O drivers ... The numbers that you'll get won't be very
useful for comparisons.

> And reboot between each test. And shut down network interface
> and all daemons.

And don't read and write at the same time on the same drive
because the disk heads' seek times will blow the benchmark
up. If you want to measure write speed, don't read from
the disk at the same time, and vice versa. You should use
a disk which isn't used for anything else, i.e. don't use
the system disk for benchmarks.

If you want to benchmark the pure speed of the drive (not
the speed of the file system), then don't put a filesystem
on the disk at all. Instead, use the raw device.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"If you think C++ is not overly complicated, just what is a protected
abstract virtual base pure virtual private destructor, and when was the
last time you needed one?"
-- Tom Cargill, C++ Journal

Eric Anderson

Sep 26, 2006, 12:39:30 PM
to freebsd-p...@freebsd.org

It's probably because of caching on the disk. The normal write goes
in/out of the on-disk cache, while the O_SYNC write may be forced to
go to the platters.

Also, if you didn't already, you should run the test many times,
unmounting and remounting the filesystem in question between tests.
I also recommend using a block device instead of a file on a
filesystem, since the filesystem could allocate blocks for the file
differently each time, causing varying results.
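A minimal Python sketch of the "run it many times" advice. The harness times a benchmark function over several trials and calls a reset hook before each one, then reports the spread. The names are hypothetical. The reset step itself (for example unmounting and remounting the test filesystem) is system-specific and is left to the caller:

```python
import statistics
import time
from typing import Callable


def repeat_benchmark(run: Callable[[], None], trials: int = 10,
                     between: Callable[[], None] = lambda: None) -> dict:
    """Time run() over several trials, calling between() before each one."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    samples = []
    for _ in range(trials):
        between()  # e.g. flush caches / remount; caller-supplied
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "stdev": statistics.stdev(samples) if trials > 1 else 0.0,
    }
```

Reporting the median and the spread, rather than a single run, makes it easier to see whether a difference between two setups is larger than the run-to-run noise.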

Eric

--
------------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
