[PERFORM] Using IOZone to simulate DB access patterns

Josh Berkus

Apr 3, 2009, 7:12:55 PM

All,

I've been using Bonnie++ for ages to do filesystem testing of new DB
servers. But Josh Drake recently turned me on to IOZone.

Thing is, IOZone offers a huge complex series of parameters, so I'd
really like to have some idea of how to configure it so its results are
applicable to database performance.

For example, I have a database which is expected to grow to around 200GB
in size, most of which consists of two tables partitioned into 0.5GB
chunks. Reads and writes are consistent and fairly random. The system
has 8 cores. How would you configure IOZone to test a filesystem for this?

--Josh Berkus

--
Sent via pgsql-performance mailing list (pgsql-pe...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Josh Berkus

Apr 3, 2009, 8:09:53 PM

On 4/3/09 4:12 PM, Josh Berkus wrote:
> All,
>
> I've been using Bonnie++ for ages to do filesystem testing of new DB
> servers. But Josh Drake recently turned me on to IOZone.

Related to this: is IOZone really multi-threaded? I'm doing a test run
right now, and only one CPU is actually active. While there are 6
IOZone processes, most of them are idle.

--Josh

henk de wit

Apr 4, 2009, 6:00:52 AM

> I've been using Bonnie++ for ages to do filesystem testing of new DB servers. But Josh Drake recently turned me on to IOZone.

Perhaps a little off-topic here, but I'm assuming you are using Linux to test your DB server (since you mention Bonnie++). But it seems to me that IOZone only has a win32 client. How did you actually run IOZone on Linux?


Jesper Krogh

Apr 4, 2009, 6:49:52 AM

henk de wit wrote:
>> I've been using Bonnie++ for ages to do filesystem testing of new DB servers. But Josh Drake recently turned me on to IOZone.
>
> Perhaps a little off-topic here, but I'm assuming you are using Linux to
> test your DB server (since you mention Bonnie++). But it seems to me
> that IOZone only has a win32 client. How did you actually run IOZone on
> Linux?

$ apt-cache search iozone
iozone3 - Filesystem and Disk Benchmarking Tool

--
Jesper

henk de wit

Apr 4, 2009, 11:54:43 AM

> $ apt-cache search iozone
> iozone3 - Filesystem and Disk Benchmarking Tool

You are right. I was confused with IOMeter, which can't be run on Linux (the Dynamo part can, but that's not really useful without the 'command & control' part).

Josh Berkus

Apr 10, 2009, 1:41:47 AM

All,

Wow, am I really the only person here who's used IOZone?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

Mark Kirkwood

Apr 10, 2009, 2:26:58 AM

Josh Berkus wrote:
> All,
>
> Wow, am I really the only person here who's used IOZone?
>

No - I used to use it exclusively, but everyone else tended to demand I
redo stuff with bonnie before taking any finding seriously... so I've
kinda 'submitted to the Borg' as it were....

Joshua D. Drake

Apr 10, 2009, 12:00:33 PM

On Fri, 2009-04-03 at 17:09 -0700, Josh Berkus wrote:
> On 4/3/09 4:12 PM, Josh Berkus wrote:
> > All,
> >
> > I've been using Bonnie++ for ages to do filesystem testing of new DB
> > servers. But Josh Drake recently turned me on to IOZone.
>
> Related to this: is IOZone really multi-threaded? I'm doing a test run
> right now, and only one CPU is actually active. While there are 6
> IOZone processes, most of them are idle.

In order to test real interactivity (AFAIK) with iozone you have to
launch multiple iozone instances. You also need to do them from separate
directories, otherwise it all starts writing the same file. The work I
did here:

http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

was actually done with multiple bash scripts firing separate instances. The
interesting options here are -s 1000m and -r8k. They basically tell iozone
to use a 1000 meg file (like our data files) with 8k chunks (like our
pages).
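A minimal sketch of the "separate instances, separate directories" approach described above, written in Python rather than bash. It only builds the command lines; it does not run iozone. The base directory /mnt/test, the instance count, and the choice of tests (-i 0, -i 1, -i 2) are assumptions for illustration, not taken from the linked blog post. Only -s 1000m and -r 8k come from the message.

```python
from pathlib import Path
import shlex


def iozone_commands(base_dir, instances, file_size="1000m",
                    record_size="8k", tests=(0, 1, 2)):
    """Return one (working_dir, argv) pair per iozone instance.

    Each instance gets its own directory so that the instances do not
    all write to the same temporary file.
    """
    jobs = []
    for n in range(instances):
        workdir = Path(base_dir) / f"iozone{n}"  # hypothetical layout
        argv = ["iozone", "-s", file_size, "-r", record_size]
        for test in tests:  # test selection is an assumption
            argv += ["-i", str(test)]
        jobs.append((workdir, argv))
    return jobs


if __name__ == "__main__":
    # Print shell lines that would start each instance in the background.
    for workdir, argv in iozone_commands("/mnt/test", 4):
        print(f"(mkdir -p {workdir} && cd {workdir} && {shlex.join(argv)}) &")
```

Printing the commands instead of executing them makes it easy to review the run before it starts hammering the disks.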

Based on your partitioning scheme, what is the break out? Can you
reasonably expect all partitions to be used equally?

Sincerely,

Joshua D. Drake

--
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

Josh Berkus

Apr 10, 2009, 1:10:03 PM

On 4/9/09 11:26 PM, Mark Kirkwood wrote:
> Josh Berkus wrote:
>> All,
>>
>> Wow, am I really the only person here who's used IOZone?
>>
>
> No - I used to use it exclusively, but everyone else tended to demand I
> redo stuff with bonnie before taking any finding seriously... so I've
> kinda 'submitted to the Borg' as it were....

Bonnie++ has its own issues with concurrency; it's using some kind of
ad-hoc threading implementation, which results in not getting real
parallelism. I just did a test with -c 8 on Bonnie++ 1.95, and the
program only ever used 3 cores.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Josh Berkus

Apr 10, 2009, 1:15:36 PM

JD,

> In order to test real interactivity (AFAIK) with iozone you have to
> launch multiple iozone instances. You also need to do them from separate
> directories, otherwise it all starts writing the same file. The work I
> did here:

Actually, current IOZone allows you to specify multiple files. For
example, the command line I was using:

iozone -R -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 8 -l 6 -u 6 -r 8k -s 4G -F f1 f2 f3 f4 f5 f6
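For readers less familiar with the flags, here is a rough Python sketch that decodes the command line above. The test names follow the IOZone documentation as I understand it (-i selects a test, -l and -u set the lower and upper process counts, -r the record size, -s the per-file size, -F one file per process). The summarize helper is hypothetical and assumes -F is the last option.

```python
import shlex

# Names for the -i test numbers used in the command line above.
TEST_NAMES = {
    0: "write/rewrite",
    1: "read/re-read",
    2: "random read/write",
    3: "read backwards",
    4: "re-write record",
    5: "stride read",
    8: "random mix",
}


def summarize(cmdline):
    """Decode the subset of iozone options used in this thread."""
    argv = shlex.split(cmdline)[1:]  # drop the program name
    summary = {"tests": [], "files": []}
    i = 0
    while i < len(argv):
        flag = argv[i]
        if flag == "-i":
            summary["tests"].append(TEST_NAMES[int(argv[i + 1])])
            i += 2
        elif flag in ("-l", "-u"):
            key = "min_procs" if flag == "-l" else "max_procs"
            summary[key] = int(argv[i + 1])
            i += 2
        elif flag == "-r":
            summary["record_size"] = argv[i + 1]
            i += 2
        elif flag == "-s":
            summary["file_size"] = argv[i + 1]
            i += 2
        elif flag == "-F":
            summary["files"] = argv[i + 1:]  # assumes -F comes last
            break
        else:
            i += 1  # flags without an argument, such as -R
    return summary


CMD = ("iozone -R -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 8 "
       "-l 6 -u 6 -r 8k -s 4G -F f1 f2 f3 f4 f5 f6")

if __name__ == "__main__":
    for key, value in summarize(CMD).items():
        print(f"{key}: {value}")
```

With six processes and a 4G file each, the run touches 24G of data in total.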

And it does indeed launch 6 processes under that configuration.
However, I found that for pretty much all of the write tests except the
first, the processes blocked each other:

F S UID  PID PPID C PRI NI ADDR    SZ WCHAN  TTY   TIME     CMD
0 S  26 6061 5825 0  80  0 -    11714 wait   pts/3 00:00:00 iozone
1 D  26 6238 6061 0  78  0 -    11714 sync_p pts/3 00:00:03 iozone
1 D  26 6239 6061 0  78  0 -    11714 sync_p pts/3 00:00:03 iozone
1 D  26 6240 6061 0  78  0 -    11714 sync_p pts/3 00:00:03 iozone
1 D  26 6241 6061 0  78  0 -    11714 sync_p pts/3 00:00:03 iozone
1 D  26 6242 6061 0  78  0 -    11714 stext  pts/3 00:00:03 iozone
1 R  26 6243 6061 0  78  0 -    11714 -      pts/3 00:00:03 iozone
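To make the pattern in that listing easier to compare across machines, a small Python sketch can tally the process state (S column) and wait channel (WCHAN column). The listing is copied from above. In ps, S means sleeping, D means uninterruptible sleep (usually waiting on I/O), and R means running or runnable. The state_counts helper is hypothetical.

```python
from collections import Counter

# The ps listing from this message, whitespace-separated.
PS_OUTPUT = """\
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 26 6061 5825 0 80 0 - 11714 wait pts/3 00:00:00 iozone
1 D 26 6238 6061 0 78 0 - 11714 sync_p pts/3 00:00:03 iozone
1 D 26 6239 6061 0 78 0 - 11714 sync_p pts/3 00:00:03 iozone
1 D 26 6240 6061 0 78 0 - 11714 sync_p pts/3 00:00:03 iozone
1 D 26 6241 6061 0 78 0 - 11714 sync_p pts/3 00:00:03 iozone
1 D 26 6242 6061 0 78 0 - 11714 stext pts/3 00:00:03 iozone
1 R 26 6243 6061 0 78 0 - 11714 - pts/3 00:00:03 iozone
"""


def state_counts(ps_text):
    """Count processes per (state, wait channel) pair."""
    lines = ps_text.strip().splitlines()
    header = lines[0].split()
    state_col = header.index("S")
    wchan_col = header.index("WCHAN")
    counts = Counter()
    for line in lines[1:]:
        fields = line.split()
        counts[(fields[state_col], fields[wchan_col])] += 1
    return counts


if __name__ == "__main__":
    for (state, wchan), n in sorted(state_counts(PS_OUTPUT).items()):
        print(f"state={state} wchan={wchan}: {n}")
```

In this snapshot the parent is sleeping in wait, five of the six workers are in uninterruptible sleep (four of them in sync_p), and only one worker is runnable.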

Don Capps says that the IOZone code is perfect, and that pattern
indicates a problem with my system, which is possible. Can someone else
try concurrent IOZone on their system and see if they get the same
pattern? I just don't have that many multi-core machines to test on.

Also, WTF is the difference between "Children See" and "Parent Sees"?
IOZone doesn't document this anywhere.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Josh Berkus

Apr 10, 2009, 1:31:35 PM

Scott,

> FIO with profiles such as the below samples are easy to set up, and they can
> be mix/matched to test what happens with mixed read/write seq/rand -- with
> surprising and useful tuning results. Forcing a cache flush or sync before
> or after a run is trivial. Changing to asynchronous I/O, direct I/O, or
> other forms is trivial. The output result formatting is very useful as
> well.

FIO? Link?

Greg Smith

Apr 10, 2009, 2:01:33 PM

On Fri, 10 Apr 2009, Scott Carey wrote:

> FIO with profiles such as the below samples are easy to set up

There are some more sample FIO profiles with results from various
filesystems at
http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

Greg Smith

Apr 10, 2009, 3:25:01 PM

On Fri, 10 Apr 2009, Scott Carey wrote:

> I wish to thank Greg here as many of my profile variations came from the
> above as a starting point.

That page was mainly Mark Wong's work, I just remembered where it was.

M. Edward (Ed) Borasky

Apr 10, 2009, 8:03:37 PM

I've done quite a bit with IOzone, but if you're on Linux, you have lots of options. In particular, you can actually capture I/O patterns from a running application with blktrace, and then replay them with btrecord / btreplay.

The documentation for this stuff is a bit hard to find. Some of the distros don't install it by default. But have a look at

http://ow.ly/2zyW

for some "Getting Started" info.
--
M. Edward (Ed) Borasky
http://www.linkedin.com/in/edborasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Mark Wong

Apr 11, 2009, 2:44:33 PM

On Fri, Apr 10, 2009 at 11:01 AM, Greg Smith <gsm...@gregsmith.com> wrote:
> On Fri, 10 Apr 2009, Scott Carey wrote:
>
>> FIO with profiles such as the below samples are easy to set up
>
> There are some more sample FIO profiles with results from various
> filesystems at
> http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

There's a couple of potential flaws I'm trying to characterize this
weekend. I'm having second thoughts about how I did the sequential
read and write profiles. Using multiple processes doesn't let it
really do sequential i/o. I've done one comparison so far resulting
in about 50% more throughput using just one process to do sequential
writes. I just want to make sure there shouldn't be any concern for
being processor bound on one core.

The other flaw is having a minimum run time. The max of 1 hour seems
to be good for establishing steady system utilization, but letting some
tests finish in less than 15 minutes doesn't provide "good" data.
"Good" meaning looking at the time series of data and feeling
confident it's a reliable result. I think I'm describing that
correctly...

Regards,
Mark

Mark Wong

Apr 26, 2009, 11:28:13 PM

On Sat, Apr 11, 2009 at 11:44 AM, Mark Wong <mar...@gmail.com> wrote:
> On Fri, Apr 10, 2009 at 11:01 AM, Greg Smith <gsm...@gregsmith.com> wrote:
>> On Fri, 10 Apr 2009, Scott Carey wrote:
>>
>>> FIO with profiles such as the below samples are easy to set up
>>
>> There are some more sample FIO profiles with results from various
>> filesystems at
>> http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide
>
> There's a couple of potential flaws I'm trying to characterize this
> weekend. I'm having second thoughts about how I did the sequential
> read and write profiles. Using multiple processes doesn't let it
> really do sequential i/o. I've done one comparison so far resulting
> in about 50% more throughput using just one process to do sequential
> writes. I just want to make sure there shouldn't be any concern for
> being processor bound on one core.
>
> The other flaw is having a minimum run time. The max of 1 hour seems
> to be good for establishing steady system utilization, but letting some
> tests finish in less than 15 minutes doesn't provide "good" data.
> "Good" meaning looking at the time series of data and feeling
> confident it's a reliable result. I think I'm describing that
> correctly...

FYI, I've updated the wiki with the parameters I'm running with now.
I haven't updated the results yet though.

Mark Wong

Apr 26, 2009, 11:44:51 PM

On Sat, Apr 11, 2009 at 7:00 PM, Scott Carey <sc...@richrelevance.com> wrote:

>
>
> On 4/11/09 11:44 AM, "Mark Wong" <mar...@gmail.com> wrote:
>
>> On Fri, Apr 10, 2009 at 11:01 AM, Greg Smith <gsm...@gregsmith.com> wrote:
>>> On Fri, 10 Apr 2009, Scott Carey wrote:
>>>
>>>> FIO with profiles such as the below samples are easy to set up
>>>
>>> There are some more sample FIO profiles with results from various
>>> filesystems at
>>> http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide
>>
>> There's a couple of potential flaws I'm trying to characterize this
>> weekend. I'm having second thoughts about how I did the sequential
>> read and write profiles. Using multiple processes doesn't let it
>> really do sequential i/o. I've done one comparison so far resulting
>> in about 50% more throughput using just one process to do sequential
>> writes. I just want to make sure there shouldn't be any concern for
>> being processor bound on one core.
>

> FWIW, my raid array will do 1200MB/sec, and no tool I've used can saturate
> it without at least two processes. 'dd' and fio can get close (1050MB/sec),
> if the block size is <= ~32k <=64k. With a postgres sized 8k block 'dd'
> can't top 900MB/sec or so. FIO can saturate it only with two+ readers.
>
> I optimized my configuration for 4 concurrent sequential readers with 4
> concurrent random readers, and this helped the overall real world
> performance a lot. I would argue that on any system with concurrent
> queries, concurrency of all types is important to measure. Postgres isn't
> going to hold up one sequential scan to wait for another. Postgres on a
> 3.16Ghz CPU is CPU bound on a sequential scan at between 250MB/sec and
> 800MB/sec on the type of tables/queries I have. Concurrent sequential
> performance was affected by:
>   * XFS -- the gain over ext3 was large
>   * Readahead tuning -- about 2MB per spindle was optimal (20MB for me, sw
>     raid 0 on 2x[10 drive hw raid 10])
>   * Deadline scheduler (big difference with concurrent sequential + random
>     mixed)
>
> One reason your tests write so much faster than they read was the linux
> readahead value not being tuned as you later observed. This helps ext3 a
> lot, and xfs enough so that fio single threaded was faster than 'dd' to the
> raw device.
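As a worked example of the readahead figure quoted above, the Python sketch below converts "about 2MB per spindle" into the unit used by blockdev --setra, which counts 512-byte sectors. Treating the 2x[10 drive RAID 10] setup as 10 effective spindles (20 drives in mirrored pairs) is my reading of how the 20MB figure was reached, not something stated in the message. MB is taken as MiB.

```python
SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def readahead_sectors(mb_per_spindle, spindles):
    """Convert a per-spindle readahead budget into 512-byte sectors."""
    total_bytes = mb_per_spindle * spindles * 1024 * 1024
    return total_bytes // SECTOR_BYTES


if __name__ == "__main__":
    # 2 MB per spindle, 10 effective spindles (assumed): 20 MB in total.
    print(readahead_sectors(2, 10))
```

The resulting number is what would be passed to blockdev --setra for the striped device on a setup like the one described.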

>
>>
>> The other flaw is having a minimum run time. The max of 1 hour seems
>> to be good for establishing steady system utilization, but letting some
>> tests finish in less than 15 minutes doesn't provide "good" data.
>> "Good" meaning looking at the time series of data and feeling
>> confident it's a reliable result. I think I'm describing that
>> correctly...
>

> It really depends on the specific test though. You can usually get random
> iops numbers that are realistic in a fairly short time, and 1 minute long
> tests for me vary by about 3% (which can be +-35MB/sec in my case).
>
> I ran my tests on a partition that was only 20% the size of the whole
> volume, and at the front of it. Sequential transfer varies by a factor of 2
> across a SATA disk from start to end, so if you want to compare file systems
> fairly on sequential transfer rate you have to limit the partition to an
> area with relatively constant STR or else one file system might win just
> because it placed your file earlier on the drive.

That's probably what is going on with the 1 disk test:

http://207.173.203.223/~markwkm/community10/fio/linux-2.6.28-gentoo/1-disk-raid0/ext2/seq-read/io-charts/iostat-rMB.s.png

versus the 4 disk test:

http://207.173.203.223/~markwkm/community10/fio/linux-2.6.28-gentoo/4-disk-raid0/ext2/seq-read/io-charts/iostat-rMB.s.png

These are the throughput numbers, but the iops are in the same directory.

Laurent Laborde

Apr 27, 2009, 11:33:46 AM

You can also play with this tiny, shiny tool:
http://pgfoundry.org/projects/pgiosim/
It just works and heavily stresses the disk with random reads and writes.

--
F4FQM
Kerunix Flan
Laurent Laborde