DEDUP state of affairs


sgheeren

Aug 13, 2010, 8:54:22 AM
to zfs-...@googlegroups.com
[Discussion teleported from bug tracker:]

Okay, I realize this is not a discussion forum, so I apologize for all
this chattery business, but I'm wondering if I may ask just one more
question.

Being convinced that I should do the upgrade this weekend, I'm reading
up on issues, e.g. the dedup FAQ that is linked to under the release
notes for 0.6.9. However, I'm confused as to how I can know which fixes
are included in zfs-fuse. The version numbers used for the Solaris
stuff are the developer builds, like snv132 (i.e. that one must've made
it into the latest osol dev ISO, which is 134 AFAIR), while zfs-fuse
versions are of course completely different. Only the "pool version" is
reported, e.g. 23 for 0.6.9, but I have no clue whether that means the
issues listed in the dedup FAQ apply to this version of zfs-fuse or
not.

Hope I'm not being too clingy here :)

(D.S.B.)

sgheeren

Aug 13, 2010, 8:55:52 AM
to zfs-...@googlegroups.com
On 08/13/2010 02:54 PM, D.S.B. wrote:
> Hope I'm not being too clingy here :)
>
> (D.S.B.)
>
>
Only a bit :)

You're right that there is a bit of a lack of transparency as to what
upstream versions have been integrated. We currently don't keep an
organized list of those. Last time I tried to continue the work of
Emmanuel, who graciously does all the work for us there, I got tangled
up in the same questions.

The best you can do is inspect the git log (e.g.
http://gitweb.zfs-fuse.net/?p=official;a=shortlog) and look for entries
starting with 'hg commit'.
These refer to numbers that you can google to find the corresponding
entries in Solaris's bug trackers.
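
That scan of the log can be automated. The following is only a minimal sketch: it filters one-line log subjects for the 'hg commit' prefix and extracts the number that follows. The subject format and the sample subjects are assumptions made up for illustration; the real zfs-fuse log entries may be worded differently.

```python
import re

# Assumed subject format: "hg commit <number> ...". The real zfs-fuse
# log may differ, in which case the pattern needs adjusting.
HG_COMMIT = re.compile(r"^hg commit (\d+)")


def upstream_revisions(subjects):
    """Return the hg revision numbers found in git log subject lines."""
    revisions = []
    for subject in subjects:
        match = HG_COMMIT.match(subject.strip())
        if match:
            revisions.append(int(match.group(1)))
    return revisions


# Hypothetical subjects, e.g. as printed by `git log --format=%s`.
sample = [
    "hg commit 11938: example upstream change",
    "fix build on 64-bit",
    "hg commit 11900: another example upstream change",
]
print(upstream_revisions(sample))
```

Feeding it the real output of `git log --format=%s` gives the list of upstream revisions that have been merged, which can then be compared against the revisions mentioned in a bug report.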

That all being said, I have a distinct recollection of Emmanuel
confirming that the important Dedup fixes should have been merged.
Then again, my personal preference is to _not_ use dedup because of
severe performance degradation (probably even more so on linux) and the
fact that it only really works for (very) specific purposes.

All my purposes are best dealt with by using snapshot/clone/promote.

I'd recommend starting with dedup on a fresh, small pool. Once the DDT
has been enabled on your pool, there is no way to revert it. Caveat
emptor.

Seth

Emmanuel Anne

Aug 13, 2010, 9:52:45 AM
to zfs-...@googlegroups.com
I have looked for tags in the onnv repository. Here are the results:
hg tags
tip                            13018:1ff0c65b2b90
onnv_146                       13000:51b1767a74cb
onnv_145                       12866:87e07d18c459
onnv_144                       12779:96016f1d9837
onnv_143                       12671:2fee57289adb
onnv_142                       12581:18307efc4636
onnv_141                       12488:810a15c88f06
onnv_140                       12378:fd645929e06e
onnv_139                       12265:f199783d527a
onnv_138                       12149:607008ac563e
onnv_137                       12039:4ba188c68c06
onnv_136                       11930:e86ade140716
onnv_135                       11836:8c4dbbb43e4c
onnv_134                       11666:76124d955ac7
...

0.6.9 includes hg commit 11938, so I guess the closest tag it has is onnv_136.
Note that these tags refer to OpenSolaris revisions; they are not specific to ZFS. For example, onnv_136 is at commit 11930, but we don't have commit 11930 in the zfs tree, because we take only a subset of these commits.
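
The lookup described here (newest tag at or below a given revision) can be sketched as follows. This is an illustration only: the function names are made up, and the tag list is a shortened copy of the `hg tags` output above. Since only a subset of commits is taken, the result says which tag is nearest, not that everything up to that tag has been merged.

```python
def parse_hg_tags(text):
    """Parse `hg tags` output into (tag name, local revision) pairs."""
    tags = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or ":" not in parts[1]:
            continue  # skip blank lines and the "..." marker
        name, revision = parts[0], int(parts[1].split(":")[0])
        if name != "tip":  # "tip" moves with every commit, so ignore it
            tags.append((name, revision))
    return tags


def closest_tag(tags, revision):
    """Return the newest tag whose revision is at or below `revision`."""
    candidates = [(rev, name) for name, rev in tags if rev <= revision]
    return max(candidates)[1] if candidates else None


# Shortened copy of the listing above.
HG_TAGS = """\
tip                            13018:1ff0c65b2b90
onnv_137                       12039:4ba188c68c06
onnv_136                       11930:e86ade140716
onnv_135                       11836:8c4dbbb43e4c
onnv_134                       11666:76124d955ac7
...
"""

print(closest_tag(parse_hg_tags(HG_TAGS), 11938))  # onnv_136
```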

Do you think it would be a good idea to import these tags into the git repository? I am not sure about that...


--
To post to this group, send email to zfs-...@googlegroups.com
To visit our Web site, click on http://zfs-fuse.net/



--
zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

sgheeren

Aug 13, 2010, 10:04:50 AM
to zfs-...@googlegroups.com
On 08/13/2010 03:52 PM, Emmanuel Anne wrote:
> I have looked for tags in the onnv repository. Here are the results:
> hg tags
> tip                            13018:1ff0c65b2b90
> onnv_146                       13000:51b1767a74cb
> onnv_145                       12866:87e07d18c459
> onnv_144                       12779:96016f1d9837
> onnv_143                       12671:2fee57289adb
> onnv_142                       12581:18307efc4636
> onnv_141                       12488:810a15c88f06
> onnv_140                       12378:fd645929e06e
> onnv_139                       12265:f199783d527a
> onnv_138                       12149:607008ac563e
> onnv_137                       12039:4ba188c68c06
> onnv_136                       11930:e86ade140716
> onnv_135                       11836:8c4dbbb43e4c
> onnv_134                       11666:76124d955ac7
> ...
>
> 0.6.9 includes hg commit 11938, so I guess the closest tag it has is onnv_136.

Thanks for this. That is useful information.

> Now notice that these tags are related to opensolaris revisions, they are not specific to zfs. Here for example, this onnv_136 is at commit 11930, but we don't have the commit 11930 in the zfs tree, we take only a subset of these commits.
>
> Do you think it would be a good idea to import these tags in the git repository ? I am not sure about that...

Yes, but only if they are fully merged.

Daniel Smedegaard Buus

Aug 13, 2010, 11:30:02 AM
to zfs-fuse
Thank you both for responding to this. You truly are gentlemen :)

On a note, I actually reconsidered the whole dedup affair when on my
bicycle heading for home (especially considering how buggy it still
seems to be), and really I don't see much use for it in my scenario in
the long run. There are a couple of cases in particular where I really
want deduplication, namely for emulation ROMs, such as Amiga/C64/PSP
images which often have tons of duplicated sectors. But, for this I
was already planning on using cromfs which also offers LZMA
compression to give extremely impressive results (better than 7-zip in
ultra mode with as large a block size as my 4G of memory allows, in
case you're wondering).

Whatever you settle on in regards to git tags vs. osol revisions, I
for one would really like to be clingy just long enough to get some
more transparency, as you say ;) Hehe

I'm not much of a source code reader once I leave work, but I do try
to track issues with stuff like ZFS when I'm considering upgrading,
which involves checking with (Open)Solaris bug trackers and with
zfs-fuse.net. So any kind of (web) transparency that allows people like
me to see which issues have been resolved where would be much
appreciated.

Of course, beggars can't be choosers, and this beggar is mainly
extremely grateful for the work you're doing!

So once again, thanks for all the help, and have a great weekend! I
have a sixpack of beer, an SFV checker script to write, and about 8TB
of data to organize :D

Cheers,
Daniel (aka DSB)

Emmanuel Anne

Aug 13, 2010, 11:48:03 AM
to zfs-...@googlegroups.com
There are no known bugs in dedup for now (since 0.6.9), but yes, if you can avoid it, then avoid it.
And note that dedup works on block boundaries, so if you use compression on your disk images you will probably lose the alignment of identical blocks, and dedup will become mostly useless.

Thanks for the good words anyway, I think I'll go and have a beer now, good idea ! :)


sgheeren

Aug 13, 2010, 11:50:44 AM
to zfs-...@googlegroups.com
On 08/13/2010 05:48 PM, Emmanuel Anne wrote:
> And notice that dedup works on block boundaries, so if you use
> compression on your disk images you will probably lose alignment of
> the same blocks, and so dedup will become mostly useless.
This seems disputable to me. Surely ZFS could compress logical blocks
of fixed size (what else would they have recordsize for?) and hash the
logical block before compression? This is an interesting question for
zfs-discuss, perhaps.
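
The idea can be sketched in a few lines, with the caveat that this is not a description of ZFS internals. If every fixed-size logical record is compressed on its own, identical records give identical compressed output, so a block-level hash still finds the duplicates. The record size and the two "images" below are invented for the example.

```python
import hashlib
import zlib

RECORD = 4096  # hypothetical record size, for illustration only


def record_keys(data):
    """Compress each fixed-size record on its own and hash the result."""
    keys = []
    for offset in range(0, len(data), RECORD):
        compressed = zlib.compress(data[offset:offset + RECORD])
        keys.append(hashlib.sha256(compressed).hexdigest())
    return keys


# Two made-up images: seven identical records after a differing first one.
tail = b"".join(bytes([i]) * RECORD for i in range(1, 8))
image_a = b"A" * RECORD + tail
image_b = b"B" * RECORD + tail

shared = set(record_keys(image_a)) & set(record_keys(image_b))
print(len(shared))  # 7: every record except the first is still a duplicate
```

Because each record is compressed independently, a difference in the first record has no effect on the records after it.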

Emmanuel Anne

Aug 13, 2010, 12:57:05 PM
to zfs-...@googlegroups.com
Well, if you are lucky there might still be enough alignment left for dedup to work. If you are unlucky and there is a difference in the 1st sector which doesn't get compressed at the same ratio, you will lose the alignment for the whole image after that.

Maybe it works better if you use internal zfs compression instead, but I am not even sure about that... Maybe yes.
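
A rough sketch of the unlucky case, under stated assumptions: it models an external compressor whose output for the first sector comes out one byte longer in one image, and hashes fixed-size blocks the way a block-level dedup would. The block size and the data are invented, and this does not model ZFS itself.

```python
import hashlib

BLOCK = 512  # hypothetical block size, for illustration only


def block_hashes(data):
    """Hash every fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + BLOCK]).hexdigest()
        for i in range(0, len(data), BLOCK)
    ]


def shared_blocks(a, b):
    """Count distinct blocks that occur in both inputs."""
    return len(set(block_hashes(a)) & set(block_hashes(b)))


# 16 blocks of made-up content in which every block is different.
body = b"".join(i.to_bytes(4, "big") for i in range(2048))

image_a = b"\xaa" * BLOCK + body
same_size = b"\xbb" * BLOCK + body          # first sector differs, same length
one_longer = b"\xbb" * (BLOCK + 1) + body   # first sector is one byte longer

print(shared_blocks(image_a, same_size))   # 16
print(shared_blocks(image_a, one_longer))  # 0
```

A single extra byte at the front shifts every later block, so none of the 16 identical blocks line up any more.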


Daniel Smedegaard Buus

Aug 13, 2010, 1:04:19 PM
to zfs-fuse
On Aug 13, 5:48 pm, Emmanuel Anne <emmanuel.a...@gmail.com> wrote:
> There are not any bugs known in dedup for now (since 0.6.9), but yes if you
> can avoid it, then avoid it.

Well, AFAICT, there's still this one? (though I'm not sure I agree
with the "bricking" part?)

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6924824

> And notice that dedup works on block boundaries, so if you use compression
> on your disk images you will probably lose alignment of the same blocks,
> and so dedup will become mostly useless.
>
> Thanks for the good words anyway, I think I'll go and have a beer now, good
> idea ! :)
>

LOL :D And here, three brown ales later, damn that BASH variable scope
when doing for...in! :O


sgheeren

Aug 13, 2010, 1:28:09 PM
to zfs-...@googlegroups.com
On 08/13/2010 07:04 PM, Daniel Smedegaard Buus wrote:
> On Aug 13, 5:48 pm, Emmanuel Anne <emmanuel.a...@gmail.com> wrote:
>
>> There are not any bugs known in dedup for now (since 0.6.9), but yes if you
>> can avoid it, then avoid it.
>>
> Well, AFAICT, there's still this one? (though I'm not sure I agree
> with the "bricking" part?)
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6924824
>
>
Yeah, that one is still a pain, seeing that the state is not 'fix
released'. It will not brick your system, but it will prevent
filesystems from coming online (being mountable/mounted) after an
import. I.e. you can (slooooooooowly) keep using your system, but
(a) a lot of zpool operations will be blocked
(b) you must not reboot/re-import, because on import all operations on
the pool will be suspended until the work is completed.

Not nice (TM), but slightly less invasive on Linux (unless you have root
on zfs-fuse, which _can_ be done but is not recommended).
