First of all, thanks a lot to everyone for helping me with my earlier
problem back in June(?), where I wanted to tune ZFS send/receive over
the network (sorry, I don't have the old message anymore, hence no
direct reply to that thread).
The suggestion of mbuffer was a great one, since it showed that our
bottleneck was netcat pushing the data over the network (only about 100
MB/s - even locally). When we used socat instead, we were able to get
more than 300 MB/s transferring /dev/zero to /dev/null over the network.
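For reference, the kind of pipeline I mean looks roughly like this (host,
pool and snapshot names are just placeholders, the buffer sizes are only a
starting point, and socat can take the place of mbuffer's built-in network
mode):

  # receiving host: listen on a port, buffer the stream, feed it to zfs receive
  mbuffer -s 128k -m 1G -I 9090 | zfs receive -F atlasbackup/somefs

  # sending host: stream the snapshot through mbuffer to the receiver
  zfs send atlashome/somefs@snap | mbuffer -s 128k -m 1G -O receiver:9090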
However (there is always a "however"), when using zfs send we only get
about 160 MB/s over the network, or 180 MB/s to /dev/null locally. Right
now our zpool on the x4500 looks like this:
zpool status
  pool: atlashome
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        atlashome     ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c4t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c4t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c4t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
            c7t7d0    ONLINE       0     0     0
What steps would be necessary to get a faster zfs send, i.e. a faster
way to replicate snapshots to other servers, without losing (much)
capacity and/or risking data integrity/safety?
Thanks a lot for any hint
Cheers
Carsten
>
> What steps would be necessary to get a faster zfs send, i.e. a faster
> way to replicate snapshots to other servers, without losing (much)
> capacity and/or risking data integrity/safety?
In my brief ZFS send experiments I noticed that it was only maybe 30%
faster than rsync.
That may sound like a lot - but it took about 20 minutes to transfer 6 GB
of small files locally with zfs send, and about 30 minutes to transfer
them via rsync.
zfs send yielded 5.5 MB/s.
So, ZFS send/receive is not what I thought it would be (bypassing the
filesystem and working directly at the bit level).
So, if rsync is slow, zfs send/receive isn't much faster.
Rainer
> In my brief ZFS send experiments I noticed that it was only maybe 30%
> faster than rsync.
> That may sound like a lot - but it took about 20 minutes to transfer 6 GB
> of small files locally with zfs send, and about 30 minutes to transfer
> them via rsync.
> zfs send yielded 5.5 MB/s.
>
30% would already be a lot, since we are talking about moving terabytes of data here.
> So, ZFS send/receive is not what I thought it would be (bypassing the
> filesystem and working directly at the bit level).
> So, if rsync is slow, zfs send/receive isn't much faster.
I could test rsync as well, but zfs snapshots look nice enough that I'd
rather keep going with ZFS ;)
Thanks
Carsten
Did you compare incremental sends with rsync? I've found rsync to be
very slow at incremental updates (a couple of thousand changes in a few
million files).
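For comparison, the two kinds of incremental update look roughly like this
(dataset names, hosts and options are made up for illustration):

  # incremental send: only the blocks changed between the two snapshots cross the wire
  zfs send -i tank/fs@yesterday tank/fs@today | ssh otherhost zfs receive -F backup/fs

  # rsync has to walk and stat every file in the tree just to find the changes
  rsync -aH --delete /tank/fs/ otherhost:/backup/fs/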
--
Ian Collins.
> Are you doing full sends, or incrementals?
>
Right now, only full sends of a snapshot (27 GB, a couple of thousand
files - I need to count them at some point). I'm currently benchmarking
it with various settings, but I'm already beyond the sweet spot for both
raidz and raidz2. If people are interested, I can post the findings
later. But still, the top performance for dumping the snapshot to
/dev/null *locally* was a mere 174 MB/s from 46 disks. I don't think I
really like that performance much.
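(For the record, the local number comes from simply dumping the stream to
/dev/null, roughly along these lines - the snapshot name is made up and
mbuffer is only there to report the throughput:

  zfs send atlashome/somefs@benchsnap | mbuffer -s 128k -m 1G > /dev/null

mbuffer prints the average rate in its summary line when the stream
finishes.)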
Cheers
Carsten
The sync consisted of two zfs filesystems: 6 GB and 3.3 TB.
The smaller one consisted only of small files (thumbnails and stuff),
the large one was youtube-style flash videos...
After the test transfer of the 6 GB filesystem, we decided zfs
send/receive was not worth the trouble - especially as you don't get a
progress report.
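(To be fair, piping the stream through something like mbuffer does give a
live throughput/total display, e.g.

  zfs send tank/fs@snap | mbuffer -s 128k -m 1G | ssh otherhost zfs receive -F backup/fs

with the names made up here - but zfs send itself stays silent.)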
So we went for rsync, which took about 30 minutes for the 6 GB
filesystem and some 13 hours for the 3.3 TB filesystem (locally, from /a
to /b - no network involved).
The 3.3 TB filesystem consisted of about half a million files; I think
the 6 GB filesystem had roughly the same number of files, but I may be
wrong.
Anyway, an incremental rsync for both these filesystems took only
around 5 minutes (server has 8 GB RAM, 2*DC Opteron-F).
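For reference, the kind of invocation I mean is nothing exotic, something
like

  rsync -aH --delete /a/fs/ /b/fs/

with placeholders instead of the real paths.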
Source was a mirrored pool, thin-striped over two full MSA70s (2*25
disks) of 146 GB SAS disks; the destination was a mirrored pool,
thin-striped over 32 1 TB WD SATA2 drives in two Promise Vtrak J610sS
cases.
All hanging off two of Sun's non-RAID SAS HBAs.
The Promise JBODs seem to work very well - the price of the chassis is
about the same as the one from Sun, but you have to source the disks
yourself ;-)
cheers,
Rainer
Exactly this benchmark is running right now. When using raidz2 I get:
Average Speed (MB/s) | partition layout (number of disks per vdev)
                97.6 | 46
               131.6 | 23 23
               151.6 | 16 15 15
               162.4 | 12 12 11 11
               169.0 | 10  9  9  9  9
               165.8 |  8  8  8  8  7  7
               159.0 |  7  7  7  7  6  6  6
               147.6 |  6  6  6  6  6  6  5  5
               140.4 |  6  5  5  5  5  5  5  5  5
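(To spell out the layout notation: the "10 9 9 9 9" row, for example,
corresponds to a pool created along these lines. The pool name is a
placeholder, the grouping of the disks into vdevs is just one possible
split of the 46, and -f overrides zpool's safety warnings.)

  zpool create -f testpool \
    raidz2 c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 \
    raidz2 c7t1d0 c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 c0t3d0 c1t3d0 \
    raidz2 c4t3d0 c5t3d0 c6t3d0 c7t3d0 c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0 \
    raidz2 c0t5d0 c1t5d0 c4t5d0 c5t5d0 c6t5d0 c7t5d0 c0t6d0 c1t6d0 c4t6d0 \
    raidz2 c5t6d0 c6t6d0 c7t6d0 c0t7d0 c1t7d0 c4t7d0 c5t7d0 c6t7d0 c7t7d0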
You can see that the sweet spot here is reached with 10 or 9 disks per
vdev, and it gets slower beyond that :(
Since I'm using 46 disks I was hoping to get a speed of at least, say,
300-350 MB/s.
Cheers
Carsten
> Since I'm using 46 disks I was hoping to get a speed of at least, say,
> 300-350 MB/s.
>
If you want performance rather than capacity, stripe 23 two-way mirrors.
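Something along these lines - the pool name is a placeholder and the
pairing below is only illustrative; in practice you would spread each pair
across controllers:

  zpool create -f fastpool \
    mirror c0t0d0 c1t0d0  mirror c4t0d0 c6t0d0  mirror c7t0d0 c0t1d0 \
    mirror c1t1d0 c4t1d0  mirror c5t1d0 c6t1d0  mirror c7t1d0 c0t2d0 \
    mirror c1t2d0 c4t2d0  mirror c5t2d0 c6t2d0  mirror c7t2d0 c0t3d0 \
    mirror c1t3d0 c4t3d0  mirror c5t3d0 c6t3d0  mirror c7t3d0 c0t4d0 \
    mirror c1t4d0 c4t4d0  mirror c6t4d0 c7t4d0  mirror c0t5d0 c1t5d0 \
    mirror c4t5d0 c5t5d0  mirror c6t5d0 c7t5d0  mirror c0t6d0 c1t6d0 \
    mirror c4t6d0 c5t6d0  mirror c6t6d0 c7t6d0  mirror c0t7d0 c1t7d0 \
    mirror c4t7d0 c5t7d0  mirror c6t7d0 c7t7d0

You give up half the raw capacity in exchange for 23 top-level vdevs to
stripe across.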
--
Ian Collins.
>> You can see that the sweet spot here is reached with 10 or 9 disks per
>> vdev, and it gets slower beyond that :(
>>
> That's to be expected and why I suggested more, smaller vdevs.
But not smaller than 10 or 9 per vdev ;)
>
>> Since I'm using 46 disks I was hoping to get a speed of at least, say,
>> 300-350 MB/s.
>>
> If you want performance rather than capacity, stripe 23 two-way mirrors.
>
No, I want both, and I don't understand why ZFS seems to be so
inefficient with the given resources. With a standard Linux box and a
16-channel Areca hardware RAID6 setup, I easily get more than 350 MB/s
out of xfsdump. I know it's almost comparing apples and oranges, but
still.
Cheers
Carsten