Dedup performance

vvatashki

Apr 13, 2010, 5:37:40 AM
to zfs-fuse
What is your write performance with dedup enabled?
My build is git commit 7117902d92b9e35663f12c67be2cfdbc3e376b30 ("restore zdb -l block-device").
On my raidz setup (3 x 2 TB) over dm-crypt, with dedup=on, I get this:

dd if=/dev/frandom of=/zfs/test.bin bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 138.079 s, 7.6 MB/s

With dedup=off, the result is:

dd if=/dev/frandom of=/zfs/test.bin bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 11.0236 s, 95.1 MB/s

My /dev/frandom device produces random data at 223 MB/s. My system is Debian sid on an E5200 with 3 GB of RAM:

dd if=/dev/frandom of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 4.70152 s, 223 MB/s

My zfsrc is at its defaults. What performance do you get with dedup={on,sha256,verify}?
I'll post detailed bonnie++ and iostat results later.
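For reference, the dd rates above are simply bytes divided by elapsed time, in decimal megabytes (10^6 bytes), which is the unit GNU dd prints. A small Python sketch that recomputes them and the dedup=on/off ratio; the timings are copied from the runs above and nothing else is assumed:

```python
# dd reports decimal megabytes: 1 MB = 10**6 bytes.
BYTES = 1_048_576_000  # 1000 blocks of bs=1M (1 MiB each)


def dd_rate_mb_s(nbytes: int, seconds: float) -> float:
    """Throughput in decimal MB/s, the unit dd prints."""
    return nbytes / seconds / 1e6


runs = {
    "dedup=on": 138.079,
    "dedup=off": 11.0236,
    "/dev/null (source only)": 4.70152,
}

for name, secs in runs.items():
    print(f"{name:>24}: {dd_rate_mb_s(BYTES, secs):6.1f} MB/s")

# Ratio of elapsed times = how many times slower the dedup=on write was.
slowdown = runs["dedup=on"] / runs["dedup=off"]
print(f"dedup=on is {slowdown:.1f}x slower than dedup=off")
```

With these timings dedup=on comes out roughly 12.5x slower than dedup=off, while the source alone is more than twice as fast as the dedup=off write, so the random-data source is not what limits the test.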

Emmanuel Anne

Apr 13, 2010, 5:55:52 AM
to zfs-...@googlegroups.com
With dedup=verify it would be even slower!
It's not very surprising. Dedup performance depends directly on the number of cores and the speed of your CPU (and on the RAM available). But even on a very fast system it would still be slower than with dedup=off.

I don't know the actual performance numbers, though.
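To see why both CPU and RAM matter, here is a toy Python model of block-level dedup; it is not zfs-fuse code. Each block is hashed with SHA-256 and looked up in a table keyed by checksum, and dedup=verify additionally byte-compares the stored block on a hit. The class name, the dict-based table and the fixed 128 KiB block size (assumed to match ZFS's default recordsize) are illustrative assumptions.

```python
import hashlib

RECORDSIZE = 128 * 1024  # assumed default ZFS recordsize (128 KiB)


class ToyDedupStore:
    """Toy model of block-level dedup; NOT zfs-fuse code."""

    def __init__(self, verify: bool = False):
        self.verify = verify
        self.ddt = {}     # checksum -> [block_id, refcount]  (toy "dedup table")
        self.blocks = []  # physically stored blocks

    def write_block(self, data: bytes) -> int:
        key = hashlib.sha256(data).digest()  # CPU cost paid on every write
        entry = self.ddt.get(key)            # table lookup: cheap in RAM, a seek if not cached
        if entry is not None:
            block_id = entry[0]
            # verify: compare bytes with the stored block (in ZFS this may mean an extra read)
            if not self.verify or self.blocks[block_id] == data:
                entry[1] += 1                # dedup hit: only bump the refcount
                return block_id
            # checksum collision under verify: fall through and store a new copy
            # (toy simplification: the new copy is not added to the table)
        self.blocks.append(data)
        block_id = len(self.blocks) - 1
        if entry is None:
            self.ddt[key] = [block_id, 1]
        return block_id

    def write(self, data: bytes) -> list:
        """Split data into RECORDSIZE blocks and write each one."""
        return [self.write_block(data[i:i + RECORDSIZE])
                for i in range(0, len(data), RECORDSIZE)]
```

For example, writing `bytes(4 * RECORDSIZE)` stores a single block with a refcount of 4. In real ZFS the dedup table lives on disk and is cached in the ARC, so when it does not fit in RAM each lookup can become a random read.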

If someone has time to spend running some benchmarks (bonnie, iostat...), it would be interesting to compare:
 - no zfsrc at all, no special command line
 - the default zfsrc as it currently is in my git
 - the very latest git from yesterday with the patch for default_permissions + command line -a 3600 -e 3600 -o default_permissions
 - and possibly the fastest of the above with dedup=on

2010/4/13 vvatashki <vvat...@gmail.com>

--
To post to this group, send email to zfs-...@googlegroups.com
To visit our Web site, click on http://zfs-fuse.net/

To unsubscribe, reply using "remove me" as the subject.



--
zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

Fajar A. Nugraha

Apr 13, 2010, 6:08:54 AM
to zfs-...@googlegroups.com
On Tue, Apr 13, 2010 at 4:55 PM, Emmanuel Anne <emmanu...@gmail.com> wrote:
> With dedup=verify it would be even slower !
> It's not extremely surprising. Dedup performance is directly dependent on
> the number of cores + the power of your cpu (and the ram available). But
> even with a super system you would still be slower than with dedup=off.
>
> Don't know about the real performance numbers though.

Even on the OpenSolaris list there has been no exact guideline such as "dedup
will incur an x% performance penalty" or "you should have y amount of
memory to get decent performance".

The general guideline is that LOTS of RAM and an SSD as L2ARC will
greatly improve dedup performance. Since we can't have L2ARC (yet),
dedup on zfs-fuse would probably be unusable (performance-wise) for
any large enough dataset.

--
Fajar

Aneurin Price

Apr 14, 2010, 8:22:08 AM
to zfs-...@googlegroups.com
On Tue, Apr 13, 2010 at 11:08, Fajar A. Nugraha <fa...@fajar.net> wrote:
> Even on opensolaris list there's been no exact guideline as to "dedup
> will incur x % performance penalty" or "you should have y amount of
> memory to get decent performance".
>

This post gives the best information I've seen:
http://blogs.sun.com/roch/entry/dedup_performance_considerations1

In particular, the section titled 'So how large is the dedup table?'
is informative.

Nye

jafo

Apr 15, 2010, 1:37:13 AM
to zfs-fuse
On Apr 13, 3:37 am, vvatashki <vvatas...@gmail.com> wrote:
> What is your write performance with enabled dedup ?

Very poor.

I've been testing it this week, and here's what I've found:

System: 4 x 7200 RPM SATA drives in hardware RAID-10, quad-core 2.6 GHz Core 2 CPU, 8 GB RAM.

Virtual machine: 5.5 GB RAM, 5 x 200 GB virtual discs (on the above physical discs), quad CPU.

I've noticed very little difference between dedup=on and dedup=verify.

I'm copying some real, live data onto it from another machine. It has taken 2 days 20 hours to copy 235 GB, which is around 1 MB/s.
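A quick check of the "around 1 MB/s" figure. The post doesn't say whether 235 GB means 10^9 or 2^30 bytes, so this sketch computes both; the helper name is hypothetical:

```python
def avg_rate(size_bytes: float, seconds: float) -> float:
    """Average transfer rate in bytes per second."""
    return size_bytes / seconds


elapsed_s = (2 * 24 + 20) * 3600  # 2 days 20 hours = 244,800 s

for label, size in (("235 GB (10**9 bytes)", 235 * 10**9),
                    ("235 GiB (2**30 bytes)", 235 * 2**30)):
    rate = avg_rate(size, elapsed_s)
    print(f"{label}: {rate / 1e6:.2f} MB/s = {rate / 2**20:.2f} MiB/s")
```

Either reading lands between roughly 0.9 and 1.03 MB/s, so "around 1 MB/s" holds.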

The system is largely idle, with one CPU reporting mostly wait time and the rest entirely idle. For example:

root@zfsbackup1:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0  46220  14924 4536360    0    0   129   370   44    7  1  2 75 23
 0  1      0  46088  14924 4536364    0    0   138     2 1077 1369  0  1 75 25
 0  1      0  46088  14924 4536364    0    0   140   252 1275 1904  0  1 74 25
 0  1      0  46088  14924 4536364    0    0   117   282 1052 1670  0  0 75 24
 0  0      0  46088  14924 4536364    0    0    75   154  799 1316  1  1 75 23
 0  1      0  46088  14924 4536364    0    0    96     0  626  989  0  0 77 23
 0  1      0  46088  14924 4536364    0    0    88     0  592  973  0  0 76 24
 0  1      0  46088  14924 4536364    0    0    92     0  603 1049  0  1 76 23
1 76 23

This seems to be entirely disc-bound; for example, "iostat -x 10" shows:

Device: rrqm/s wrqm/s   r/s   w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda       0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sdb       0.00   0.10 38.60 41.00  71.60 827.50    11.30     2.20 27.68  9.89 78.70
sdc       0.00   0.30 38.50 41.00  84.50 828.30    11.48     2.35 29.60 10.47 83.20
sdd       0.00   0.40 38.30 40.80  70.50 826.80    11.34     2.37 30.09 10.80 85.40
sde       0.00   0.10 38.70 41.00  59.60 827.00    11.12     2.19 27.53  9.71 77.40
sdf       0.00   0.00 37.40 41.50  69.60 824.50    11.33     2.33 29.54 10.56 83.30

sda is the system disc. The other discs are about 80% utilized. Average wait time is around 30 ms, and service time around 10 ms. These discs have around a 10 ms average access time, so dedup is largely seek-bound, with the I/Os spread about evenly across reads and writes, though each write moves about 10x more data than each read.
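The iostat numbers are consistent with the utilization law (busy fraction = I/Os per second x service time per I/O). A Python sketch that recomputes %util from r/s, w/s and svctm, and the read/write volume from iostat's 512-byte sector counts; the rows are copied from the table above:

```python
SECTOR = 512  # iostat's rsec/s and wsec/s count 512-byte sectors

# (r/s, w/s, rsec/s, wsec/s, svctm ms, %util) from the "iostat -x 10" sample
disks = {
    "sdb": (38.60, 41.00, 71.60, 827.50, 9.89, 78.70),
    "sdc": (38.50, 41.00, 84.50, 828.30, 10.47, 83.20),
    "sdd": (38.30, 40.80, 70.50, 826.80, 10.80, 85.40),
    "sde": (38.70, 41.00, 59.60, 827.00, 9.71, 77.40),
    "sdf": (37.40, 41.50, 69.60, 824.50, 10.56, 83.30),
}

for dev, (r, w, rsec, wsec, svctm_ms, util) in disks.items():
    iops = r + w
    # IOs/s * ms per IO = ms busy per second; divide by 1000 and multiply by 100 -> /10
    busy_pct = iops * svctm_ms / 10
    print(f"{dev}: {iops:5.1f} IOPS, est. util {busy_pct:4.1f}% (iostat {util}%), "
          f"read {rsec * SECTOR / 1024:5.1f} KiB/s, write {wsec * SECTOR / 1024:5.1f} KiB/s, "
          f"write/read {wsec / rsec:4.1f}x")
```

The estimates match iostat's %util to within rounding: each disk does only about 80 small I/Os per second and is busy about 80% of the time, and the write/read volume ratio comes out between roughly 10x and 14x.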

So, yeah, dedup is seek-crazy, which isn't entirely surprising. However, I'll admit I'm surprised that it's *THIS* slow.

That's what I've got for the moment.

Sean

Emmanuel Anne

Apr 15, 2010, 4:23:11 AM
to zfs-...@googlegroups.com
Your numbers seem crazy, but don't ask me why: I can't tell what happens on a system without checking it directly.

Anyway, I did a quick test of dedup performance here on an old system (amd64 2.2 GHz, 1 core), copying a 5 GB file from an old 80 GB hard disk connected over USB 2:
 - from the external disk to jfs: 19 MB/s
 - after umount/mount to clear the cache, the same file to zfs without dedup, but with default_permissions + patch and arc_size = 300M: 17.6 MB/s. Ideally this should be closer to 19 MB/s; it would be nice to know where it's losing speed here.
 - the same again, after umount/mount, to zfs with dedup=on: 13 MB/s

That's with only 1 core; I'll probably try the same on a faster system one day (with at least 2 cores, to see if that helps).

Note that if you ran the same test with 5000 files of 1 MB instead of 1 file of 5 GB, it would probably be much slower because of the much larger number of low-level system calls through FUSE.
5000 files of 1 MB each? I might try that too...

2010/4/15 jafo <jaf...@gmail.com>

Emmanuel Anne

Apr 15, 2010, 4:24:08 AM
to zfs-...@googlegroups.com
To be more precise: the speed is measured with rsync -av, and for dedup=on it is exactly 13676916.63 bytes/sec.
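rsync -av reports its rate in bytes/sec. Converting the exact figure shows that the rounded 13 quoted earlier for dedup=on is consistent with binary megabytes (MiB, 2^20 bytes) rather than decimal MB or megabits; the helper name is hypothetical:

```python
def describe_rate(bytes_per_sec: float) -> str:
    """Express a bytes/sec rate in decimal MB/s, binary MiB/s and Mbit/s."""
    return (f"{bytes_per_sec / 1e6:.2f} MB/s = {bytes_per_sec / 2**20:.2f} MiB/s "
            f"= {bytes_per_sec * 8 / 1e6:.1f} Mbit/s")


print(describe_rate(13676916.63))
```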

2010/4/15 Emmanuel Anne <emmanu...@gmail.com>

Emmanuel Anne

Apr 15, 2010, 5:11:40 AM
to zfs-...@googlegroups.com
Dedup tests, part 2:
The same thing with rsync on a folder of 5000 files of 1 MB each, filled with zeros, with dedup=on (so dedup should kick in almost all the time; I didn't want to bother installing frandom to fill the files with random data): 11.6 MB/s.
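To reproduce the 5000 x 1 MB test without installing frandom, a Python sketch like this can generate the files. os.urandom stands in for /dev/frandom (an assumption: it may well be slower), and the function name and target directory are hypothetical:

```python
import os
from pathlib import Path

MIB = 1024 * 1024


def make_test_files(directory, count=5000, size=MIB, random_data=False):
    """Write `count` files of `size` bytes each into `directory` (hypothetical helper).

    random_data=False fills files with zeros, so every block has the same checksum;
    random_data=True uses os.urandom, so no two blocks should match.
    """
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    zeros = bytes(size)
    for i in range(count):
        payload = os.urandom(size) if random_data else zeros
        (out / f"file{i:05d}.bin").write_bytes(payload)
    return out
```

For example, `make_test_files("/tmp/dedup-src", random_data=True)` creates the random variant, which can then be copied into the pool with rsync. Zero-filled files are close to the best case for dedup (almost every block is a hit), random files the worst.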

Then I discovered that all these tests had actually been run without any zfsrc at all (this is what happens when you test too many things at the same time: I had forgotten that I had removed the zfsrc file).
So, the same test with the default_permissions zfsrc (300 MB arc_size, default_permissions + patch, -a 3600, -e 3600): 13.6 MB/s.
The 5 GB file with this same zfsrc file and dedup=on: 16.9 MB/s (much better!).

By the way, the numbers are very low on this system because, when copying the 5000 files, the system kept pausing while flushing the cache of the destination disk (even though it was reading from a USB disk!). So I should get much better numbers on a decent system that does not pause all the time. Overall, compared to the baseline jfs performance, it's not so bad.

2010/4/15 Emmanuel Anne <emmanu...@gmail.com>

Emmanuel Anne

Apr 15, 2010, 5:42:18 AM
to zfs-...@googlegroups.com
Part 3: on a more recent dual-core AMD X2 4200+ system, which does not pause all the time to flush its cache.

Here I got amazing numbers with dedup (still with the default_permissions zfsrc file):
 - copying the 5000 files from USB 2 without dedup: 17053168.91 bytes/s
 - the same after umount/mount, with dedup=on: 17164809.95 bytes/s!!!

So dedup=on is actually faster here than dedup=off!!!
That's probably because the files are all filled with zeros, so their blocks are not actually written again. It also shows that the old system should really be upgraded now! ;-)
I might install frandom later to test with 5000 different files, but not for quite some time.

As a side note, the 5 GB file with dedup=on: 18727624.42 bytes/s.
Here the source disk was busy all the time (no pauses at all), so the bottleneck is probably the speed of the very old USB drive (80 GB!).
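To put the part 3 numbers side by side, a short Python sketch that converts rsync's bytes/s figures (copied from above) to MiB/s and computes the relative difference between the two 5000-file runs:

```python
# rsync bytes/sec figures from part 3 (dual-core X2 4200+ system)
results = {
    "5000 x 1 MB, dedup=off": 17053168.91,
    "5000 x 1 MB, dedup=on": 17164809.95,
    "5 GB file, dedup=on": 18727624.42,
}


def pct_change(baseline: float, new: float) -> float:
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


for name, rate in results.items():
    print(f"{name:>24}: {rate / 2**20:6.2f} MiB/s")

gain = pct_change(results["5000 x 1 MB, dedup=off"], results["5000 x 1 MB, dedup=on"])
print(f"dedup=on vs dedup=off: {gain:+.2f}%")
```

The dedup=on run comes out about 0.65% faster. With a single run of each, a gap that small could be run-to-run noise as much as a real gain.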

2010/4/15 Emmanuel Anne <emmanu...@gmail.com>

vvatashki

Apr 16, 2010, 8:32:27 AM
to zfs-fuse
In my tests I never got anything higher than 7-12 MB/s for non-zero
data. ZFS dedup is hungry for RAM. Is the zfs-fuse L2ARC cache
functional? That could be a solution.
My best result was when I set the ARC size to the maximum RAM and turned off
caching of data in the ARC, to leave room for checksums, with:
debian:~# zfs set primarycache=metadata zpool

I found a cool use for zdb: measuring exactly how much RAM your system needs.
This prints your real dedup table:

debian:~# zdb -DD zpool
DDT-sha256-zap-duplicate: 4526 entries, size 1610 on disk, 1388 in core
DDT-sha256-zap-unique: 246330 entries, size 323 on disk, 155 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     241K   30.0G   30.0G   29.9G     241K   30.0G   30.0G   29.9G
     2    1.35K    110M    110M    110M    2.70K    220M    220M    220M
     4    3.05K   17.7M   17.7M   18.4M    12.2K   71.4M   71.4M   74.0M
     8       16    169K    169K    174K      133   1.95M   1.95M   1.98M
    16        3    384K    384K    384K       59   7.38M   7.38M   7.37M
    32        2      1K      1K   1.50K      115   57.5K   57.5K   86.0K
    64        1     512     512     766       73   36.5K   36.5K   54.6K
   128        1     512     512     766      231    116K    116K    173K
   32K        1    128K    128K    128K    39.1K   4.89G   4.89G   4.88G
 Total     245K   30.1G   30.1G   30.1G     295K   35.1G   35.1G   35.1G

dedup = 1.17, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.17
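The first two lines of the zdb -DD output already give a memory estimate: each line gives the number of entries in a dedup table plus an average entry size on disk and in core. Assuming those sizes are per-entry byte averages, entries x in-core size summed over the tables is the RAM needed to hold the whole table. A Python sketch that parses those lines; the regex and function name are my own, not part of zdb:

```python
import re

# Matches e.g. "DDT-sha256-zap-unique: 246330 entries, size 323 on disk, 155 in core";
# \s+ also tolerates lines that got wrapped, as in mail archives.
ZDB_DDT_LINE = re.compile(
    r"(?P<name>DDT-\S+):\s+(?P<entries>\d+)\s+entries,\s+size\s+(?P<on_disk>\d+)"
    r"\s+on\s+disk,\s+(?P<in_core>\d+)\s+in\s+core"
)


def ddt_in_core_bytes(zdb_output: str) -> int:
    """Sum entries * per-entry in-core size over all DDT lines found."""
    total = 0
    for m in ZDB_DDT_LINE.finditer(zdb_output):
        total += int(m["entries"]) * int(m["in_core"])
    return total


sample = """\
DDT-sha256-zap-duplicate: 4526 entries, size 1610 on disk, 1388 in core
DDT-sha256-zap-unique: 246330 entries, size 323 on disk, 155 in core
"""
total = ddt_in_core_bytes(sample)
print(f"{total} bytes = {total / 2**20:.1f} MiB")
```

For the pool above this comes to about 42 MiB of dedup table in core.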

You can estimate how much space dedup would save on all the data in a
pool with the following (it takes a while and a lot of RAM):
debian:~# zdb -S zfstest
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    6.87K    879M    724M    724M    6.87K    879M    724M    724M
     2    2.59K    332M    214M    214M    5.22K    668M    428M    429M
     4       16      2M    352K    358K      103   12.9M   2.23M   2.27M
     8        8      1M    191K    192K       64      8M   1.49M   1.50M
 Total    9.48K   1.19G    939M    939M    12.3K   1.53G   1.13G   1.13G

dedup = 1.23, compress = 1.36, copies = 1.00, dedup * compress / copies = 1.67
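The ratio line can be recomputed from the Total row. The definitions in this Python sketch are an assumed reading of the output (they reproduce the figures zdb printed above, not taken from zdb documentation); because the histogram values are rounded, results agree with zdb to about 0.01:

```python
def zdb_ratios(alloc_dsize, ref_lsize, ref_psize, ref_dsize):
    """Ratios from the Total row of a zdb DDT histogram (any consistent unit).

    Assumed definitions:
      dedup    = referenced DSIZE / allocated DSIZE
      compress = referenced LSIZE / referenced PSIZE
      copies   = referenced DSIZE / referenced PSIZE
    """
    dedup = ref_dsize / alloc_dsize
    compress = ref_lsize / ref_psize
    copies = ref_dsize / ref_psize
    return dedup, compress, copies, dedup * compress / copies


MIB_PER_GIB = 1024  # work in MiB: 1G = 1024M

# zdb -S (simulated) Total row: allocated DSIZE 939M; referenced 1.53G / 1.13G / 1.13G
sim = zdb_ratios(939, 1.53 * MIB_PER_GIB, 1.13 * MIB_PER_GIB, 1.13 * MIB_PER_GIB)
# zdb -DD Total row: allocated DSIZE 30.1G; referenced 35.1G / 35.1G / 35.1G
real = zdb_ratios(30.1 * MIB_PER_GIB, 35.1 * MIB_PER_GIB,
                  35.1 * MIB_PER_GIB, 35.1 * MIB_PER_GIB)

for name, r in (("zdb -S", sim), ("zdb -DD", real)):
    print(name, "dedup={:.2f} compress={:.2f} copies={:.2f} overall={:.2f}".format(*r))
```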

The performance will be bad if those tables are not in RAM or on a cache
device.
The amount of RAM required is roughly the Total allocated blocks x 250 bytes per block.
Here that is 9.48K x 250 = 2,370K bytes (about 2.3 MB).
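The same rule of thumb as a Python sketch. The 250 bytes per entry is the estimate quoted here, not an exact value, and the parser assumes zdb's K/M/G suffixes are powers of 1024; the function names are hypothetical:

```python
BYTES_PER_ENTRY = 250  # rule of thumb from this thread, not an exact figure

SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_zdb_count(text: str) -> float:
    """Parse zdb's abbreviated counts such as '9.48K' or '2.31M' (assumed 1024-based)."""
    text = text.strip()
    suffix = text[-1] if text[-1] in "KMG" else ""
    number = text[:-1] if suffix else text
    return float(number) * SUFFIX[suffix]


def ddt_ram_estimate(total_blocks: str, bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Estimated dedup-table RAM in bytes for a Total 'blocks' value."""
    return parse_zdb_count(total_blocks) * bytes_per_entry


for blocks in ("9.48K", "2.31M", "27M"):
    est = ddt_ram_estimate(blocks)
    print(f"{blocks:>6} blocks -> {est / 2**20:8.1f} MiB")
```

For the 27M hashes mentioned later in the thread this gives roughly 6.6 GiB, in line with the "around 6GB" estimate.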


Emmanuel Anne

Apr 16, 2010, 9:14:41 AM
to zfs-...@googlegroups.com
2010/4/16 vvatashki <vvat...@gmail.com>

> In my tests I never had anything higher than 7-12MB/s for non zero
> data. ZFS dedup is hungry for RAM. Is zfs-fuse L2ARC cache
> functional ? That can be solution.

Probably, but here I disabled it since I have no super-fast device for it.

> My best result was when I set ARC size to max RAM and turned off
> caching of data in ARC to leave room for checksums with:
> debian:~# zfs set primarycache=metadata zpool
>
> I found some cool usage for zdb in measuring exactly how much RAM does
> your system needs.
> This prints your real DeDup table:
>
> debian:~# zdb -DD zpool
> DDT-sha256-zap-duplicate: 4526 entries, size 1610 on disk, 1388 in core
> DDT-sha256-zap-unique: 246330 entries, size 323 on disk, 155 in core

Excellent, I hope they document all of this one day; it's quite a useful tool, really!

Which is not that much, really: 2.3 MB, then?
But RAM usage might explain the big difference between my old system and the very fast system that ran so slowly. After all, I only deduped 5 GB, which is tiny when you think about it!

vvatashki

Apr 16, 2010, 10:20:32 AM
to zfs-fuse
> > In my tests I never had anything higher than 7-12MB/s for non zero
> > data. ZFS dedup is hungry for RAM. Is zfs-fuse L2ARC cache
> > functional ? That can be solution.
>
> Probably but here I disabled it since I have no super fast support for it.
I think we should enable it. It will reduce random seeks on the
storage disks, and if the cache device is an SSD it will serve random reads very well.
I can test it with an Intel X25-M 80 GB.
That was from a sample pool. My main pool has around 27M hashes, which is
around 6 GB of RAM :)


jafo

Apr 17, 2010, 1:32:52 AM
to zfs-fuse
OK, my problem was probably that I was using the stock 100 MB ARC size
on the >400 GB data set I was testing. Using the zdb trick that
vvatashki mentioned (thanks!), I determined that I currently have roughly
600 MB in my dedup tables ("Total 2.31M" x 250 bytes), which is actually
less than I was expecting.

I'll probably need to do that trick of making the ARC hold only metadata,
as I only have around 4.5 GB that I could give to the ARC, at most.

So I'll probably try setting that and do another run to see how it goes.

As I said, the system where I was seeing this issue was entirely
disc-bound, and was only getting around 150 IOs per second. Part of that
is because it's a virtual machine, with the RAID-Z1 built on 5 virtual
drives that live on a 4-drive RAID-10 array. But in any case, I really
need to be pulling from cache rather than disc (or off an SSD) for this
to be realistic for serious data sets.

Sean

Emmanuel Anne

Apr 17, 2010, 2:05:30 AM
to zfs-...@googlegroups.com
No, I meant I disabled it with the zfs command:
zfs set secondarycache=none pool
or something equivalent (I'm not in front of that machine, so I can't check the exact setting, but it was probably this one).

2010/4/16 vvatashki <vvat...@gmail.com>