Dedup in ZFS


Ricardo M. Correia

Nov 2, 2009, 5:07:00 AM
to zfs-...@googlegroups.com
Hi,

BTW, since I'm posting to the list, I might as well let you know that
ZFS 'dedup' functionality was just integrated into build 128 of
OpenSolaris (i.e. the current build, which closes on Nov 9).

Cheers,
Ricardo


Emmanuel Anne

Nov 2, 2009, 5:30:00 AM
to zfs-...@googlegroups.com
Hi,

You seem to be the only one who knows how to merge the sources from OpenSolaris into what we have in order to upgrade zfs-fuse, so I am wondering: do you plan to keep doing it after all? If not, it would be helpful if you could explain how it is done.

2009/11/2 Ricardo M. Correia <Ricardo....@sun.com>

Ricardo M. Correia

Nov 2, 2009, 5:34:10 AM
to zfs-...@googlegroups.com

Well, I might as well let you know that build 128 also integrated a new
'force' option for 'zpool import' and 'zpool clear' that automatically
rolls back to an older uberblock in case the pool was corrupted due to
disk drivers not respecting the 'flush cache' command.

Cheers,
Ricardo


Ricardo M. Correia

Nov 2, 2009, 5:45:36 AM
to zfs-...@googlegroups.com
Hi Emmanuel,

I believe there are also other people who in the past merged the ZFS
sources from upstream into zfs-fuse, but in any case, I am attaching my
scripts here.

These scripts copy the ZFS sources from OpenSolaris into the zfs-fuse
tree.

Note that these scripts haven't been updated in more than a year, so I'm
sure they are missing some new ZFS files. Also, the 'umem' sources are
not supposed to be copied from the OpenSolaris mercurial tree, they are
supposed to be copied from the Linux libumem port.

The fixfiles.py is a python script which is run at the end and is
supposed to remove a few unimportant pragmas from the source code and
all the comments from OpenSolaris assembly sources. This was done
because Linux's gcc/gas doesn't like them. The script is a bit of an
abomination in terms of text processing, but hey, it worked for me :p
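
(Illustrative sketch only, not the attached fixfiles.py: the thread does not say which pragmas the real script removes, so `#pragma ident` is an assumption here, and the function names are made up for the example. It shows the two cleanup passes described above, dropping pragma lines from C sources and stripping C-style comments from assembly sources.)

```python
import re

# Assumption: the "unimportant pragmas" are lines like `#pragma ident "..."`.
# The real script may target a different set.
PRAGMA_RE = re.compile(r'^[ \t]*#[ \t]*pragma[ \t]+ident\b.*\n?', re.MULTILINE)

# C-style block comments, possibly spanning several lines.
BLOCK_COMMENT_RE = re.compile(r'/\*.*?\*/', re.DOTALL)


def strip_pragmas(c_source):
    """Drop whole `#pragma ident` lines from C source text."""
    return PRAGMA_RE.sub('', c_source)


def strip_asm_comments(asm_source):
    """Remove /* ... */ comments from assembly source text.

    Newlines inside a removed comment are kept, so line numbers in
    assembler diagnostics still match the original file.
    """
    def keep_newlines(match):
        return '\n' * match.group(0).count('\n')

    return BLOCK_COMMENT_RE.sub(keep_newlines, asm_source)
```

A plain regex like this would also eat a `/*` that appears inside a string literal, so it is only reasonable for sources where that does not happen.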

Feel free to use them however you like.

Cheers,
Ricardo
config.rc
copysolaris.sh
copyumem.sh
fixfiles.py

Ricardo M. Correia

Nov 2, 2009, 6:01:35 AM
to zfs-...@googlegroups.com
On Seg, 2009-11-02 at 11:30 +0100, Emmanuel Anne wrote:
> You seem to be the only one who knows how to merge the source from
> opensolaris with what we have to upgrade zfs-fuse... so I am
> wondering, do you plan to continue to do it finally, if not it would
> be interesting that you explain how to do that...

I forgot to say I don't have any plans to continue merging the sources
from OpenSolaris, but if anyone needs help doing it, feel free to ask me
questions.

Cheers,
Ricardo


Emmanuel Anne

Nov 2, 2009, 8:36:50 AM
to zfs-...@googlegroups.com
Wow, thanks for all that, it doesn't seem extremely easy, but I couldn't expect anything better !
Ok, I'll try all this later.
I guess you also subscribed to a mailing list to know when there is a release in their tree and when it's a good idea to try to resync the sources ?

2009/11/2 Ricardo M. Correia <Ricardo....@sun.com>

Emmanuel Anne

Nov 2, 2009, 8:53:42 AM
to zfs-...@googlegroups.com
And I'll complete your mail by saying that apparently I must first run the hg command given here:
http://hub.opensolaris.org/bin/view/Community+Group+tools/hg_help
to get the initial onnv-gate directory - that's
hg clone ssh://an...@hg.opensolaris.org/hg/onnv/onnv-gate
Ok, ok, all this is new for me...

2009/11/2 Emmanuel Anne <emmanu...@gmail.com>

Emmanuel Anne

Nov 2, 2009, 11:30:42 AM
to zfs-...@googlegroups.com
There seem to be a lot of changes since your latest tarball (2009-06-03, only 5 months ago?).
Some files were moved to a new dir (/usr/src/common/zfs), which is not handled yet.

Oh well, at least all this looks very interesting, I'll spend more time on it later...
Thanks again !

2009/11/2 Emmanuel Anne <emmanu...@gmail.com>

Rudd-O

Nov 3, 2009, 1:04:34 PM
to zfs-fuse
If I may interject here.

I think the best way to do a clean merge, WITHOUT losing our changes
(or, at least, with a possibility to evaluate our changes), is
actually to:

1. create a new branch, starting at the commit when we merged the
sources. at this point, our tree is guaranteed to ONLY contain the
ONNV src.
2. in that branch, put the new files there. commit the branch.
3. merge the branch into trunk. this may be problematic since we have
done some changes in the OSOL source directories, but AT LEAST with
this method we have a guarantee that we can review each one of our
changes we have made in the past, and see if they make any sense in
the context of the new sources, rite?
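
(Illustrative sketch of the three steps above as git commands. The branch name `onnv-upstream`, the trunk name `master`, the commit message and the helper names are all assumptions made for the example, and the step that copies the new ONNV files into the tree is left out since it depends on the merge scripts.)

```python
import subprocess


def vendor_branch_plan(base_commit, branch="onnv-upstream", trunk="master"):
    """Return the git commands for each step as lists of argv lists."""
    return {
        # 1. branch from the commit where the tree held only ONNV sources
        "branch": [["git", "checkout", "-b", branch, base_commit]],
        # 2. (copy the new ONNV files into the tree first, then) commit them
        "commit": [
            ["git", "add", "--all"],
            ["git", "commit", "-m", "Import new ONNV sources"],
        ],
        # 3. merge the import branch into trunk and review the conflicts
        "merge": [
            ["git", "checkout", trunk],
            ["git", "merge", "--no-ff", branch],
        ],
    }


def run_phase(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Because the import branch starts from a pure-ONNV commit, every conflict in the final merge corresponds to a local change made on trunk, which is what makes each of those changes reviewable.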

I can work on this with whomever feels like learning / teaching /
having a blast. We can share a screen session anytime. Whoever wants
to "get his patch" on, let me know!


On Nov 2, 8:30 am, Emmanuel Anne <emmanuel.a...@gmail.com> wrote:
> There seems to be a lot of changes since your latest tarball (2009-06-03,
> only 5 months ago ?).
> some files were moved to a new dir (/usr/src/common/zfs), which is not
> handled yet.
>
> Oh well, at least all this looks very interesting, I'll spend more time on
> it later...
> Thanks again !
>
> 2009/11/2 Emmanuel Anne <emmanuel.a...@gmail.com>
>
> > And I'll complete your mail by saying that apparently I must run first the
> > hg command given here :
> >http://hub.opensolaris.org/bin/view/Community+Group+tools/hg_help
> > to get the initial onv-gate directory - that's
>
> > hg clone ssh://a...@hg.opensolaris.org/hg/onnv/onnv-gate
>
> > Ok, ok, all this is new for me...
>
> > 2009/11/2 Emmanuel Anne <emmanuel.a...@gmail.com>
>
> > Wow, thanks for all that, it doesn't seem extremely easy, but I couldn't
> >> expect anything better !
> >> Ok, I'll try all this later.
> >> I guess you also subscribed to a mailing list to know when there is a
> >> release in their tree and when it's a good idea to try to resync the sources
> >> ?
>
> >> 2009/11/2 Ricardo M. Correia <Ricardo.M.Corr...@sun.com>

Emmanuel Anne

Nov 3, 2009, 1:19:20 PM
to zfs-...@googlegroups.com
On my side I have started to look for ways to make git and hg work together, and surprisingly there don't seem to be a lot of programs to do that.
I found an hg plugin to clone an hg repo into a git repo, but that's not what I want.
I'd like to be able to convert the commits from the opensolaris hg repo to some git commits here, keeping at least the dates and the authors.

hg has a few functions to export commits (hg export seems to do a nice job), but I found no way to import the result into git, even with a manual conversion. Using git-fast-import seems quite complicated for that; if someone has an idea how to do it, that would be good. The goal is to follow the changes in the opensolaris repo more easily and to merge them more easily afterwards (this first merge is going to be hard; I'd like the following merges to be easier if possible, even if I spend a lot of time on this one).

2009/11/3 Rudd-O <rud...@rudd-o.com>

dev...@web.de

Nov 3, 2009, 2:17:09 PM
to zfs-...@googlegroups.com
>On my side I have started to look for some ways to make git and hg to
> work together, and surprisingly there doesn't seem to be a lot of
> programs to do that.
> I found an hg plugin to clone an hg repo into a git repo, but that's
> not what I want.

do you refer to this one: http://hg-git.github.com/ ?
anyway, i would probably bring that up on git mailing list at g...@vger.kernel.org

regards
roland

Emmanuel Anne

Nov 3, 2009, 2:31:04 PM
to zfs-...@googlegroups.com
Yep it was this one.
I am wondering if it wouldn't be easier to switch back to hg, just to be able to communicate more easily with the solaris repo... Anyway, too early to tell, there are probably other possibilities...

2009/11/3 <dev...@web.de>

Rudd-O

Nov 3, 2009, 3:07:49 PM
to zfs-fuse
That sounds more reasonable to me. I will look into it.

Rudd-O

Nov 3, 2009, 3:08:40 PM
to zfs-fuse
I am not sure. HG is considerably less usable than Git for many
things -- you cannot do rebase easily, reorder commits locally, a
bunch of stuff that really gets the project history right. So far
what have you found regarding injecting commits from their repo into
ours?

On Nov 3, 11:31 am, Emmanuel Anne <emmanuel.a...@gmail.com> wrote:
> Yep it was this one.
> I am wondering if it wouldn't be easier to switch back to hg, just to be
> able to communicate more easily with the solaris repo... Anyway, too early
> to tell, there are probably other possibilities...
>
> 2009/11/3 <devz...@web.de>

Rudd-O

Nov 3, 2009, 3:10:37 PM
to zfs-fuse
hg-git sounds viable. instead of pulling from the git repo, we'd
check out the hg repo and push into the git repo, then push to the
public git repo. the only thing that is KEY is to start pushing from
the revision we last checked in, and to push it into a branch so we can
merge that branch into trunk. we do not want to push the commits
directly into trunk because we'd then pile up their changes atop OURS
directly, and that can bring history problems.

On Nov 3, 11:31 am, Emmanuel Anne <emmanuel.a...@gmail.com> wrote:
> Yep it was this one.
> I am wondering if it wouldn't be easier to switch back to hg, just to be
> able to communicate more easily with the solaris repo... Anyway, too early
> to tell, there are probably other possibilities...
>
> 2009/11/3 <devz...@web.de>

Will Ashford

Nov 3, 2009, 3:42:13 PM
to zfs-...@googlegroups.com
On Tue, Nov 3, 2009 at 12:08 PM, Rudd-O <rud...@rudd-o.com> wrote:
>
> I am not sure.  HG is considerably less usable than Git for many
> things -- you cannot do rebase easily, reorder commits locally, a
> bunch of stuff that really gets the project history right.

In the interest of combating FUD I feel obliged to point out that Hg
can do all of these things by enabling a few plugins that are part of
the standard distribution and merely disabled by default. Git and Hg
are actually almost identical in their underlying design as a result
of parallel evolution, most of the confusion about which is "better"
stems from the fact that they use different (and conflicting) words to
describe the same thing. Which one you use, of course, remains a
matter of personal preference.

Will

Emmanuel Anne

Nov 3, 2009, 4:36:02 PM
to zfs-...@googlegroups.com
Yes well anyway, since I wanted to experiment with all this stuff anyway, I have started with something slow, but which works.
For now the principle is:
    hg log -g -p file
to see what happened for a particular file, then
    hg export -g id > export_file
where id is the commit id we are interested in.
Then it produces a very nice file, except that it's totally unusable by git as it is...
So...
I used a perl script to :
 - generate a nice commit message from the info there
 - extract the author
 - save the diff to a separate file (and convert the file names from their repo to ours)
 - apply the diff
then print out the commands to either commit everything or revert to the previous state and you copy and paste what you want depending on how the patching worked.
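
(Illustrative sketch in Python rather than the perl script itself, which is not shown in this thread. It covers the parsing steps listed above for one `hg export -g` file: extract the author, date and commit message, and rewrite the paths in the diff. The `PATH_MAP` entry and all function names are hypothetical; the real directory mapping between the two trees is not given here.)

```python
from typing import NamedTuple

# Hypothetical mapping from ONNV paths to zfs-fuse paths, for illustration.
PATH_MAP = {
    "usr/src/uts/common/fs/zfs/": "src/lib/libzpool/",
}


class HgPatch(NamedTuple):
    author: str
    git_date: str
    message: str
    diff: str


def hg_date_to_git(hg_date):
    """Convert hg's '<epoch> <seconds west of UTC>' to '<epoch> +hhmm'."""
    epoch, offset = hg_date.split()
    east = -int(offset)  # hg counts seconds *west* of UTC
    sign = "+" if east >= 0 else "-"
    hours, minutes = divmod(abs(east) // 60, 60)
    return f"{epoch} {sign}{hours:02d}{minutes:02d}"


def remap_paths(diff):
    """Naively rewrite a/<old> and b/<old> prefixes using PATH_MAP."""
    for old, new in PATH_MAP.items():
        diff = diff.replace("a/" + old, "a/" + new)
        diff = diff.replace("b/" + old, "b/" + new)
    return diff


def parse_hg_export(text):
    """Split one `hg export -g` file into author, date, message and diff."""
    author = git_date = ""
    lines = text.splitlines(keepends=True)
    i = 0
    # Header: lines starting with '#', e.g. "# User ..." and "# Date ...".
    while i < len(lines) and lines[i].startswith("#"):
        line = lines[i].rstrip("\n")
        if line.startswith("# User "):
            author = line[len("# User "):]
        elif line.startswith("# Date "):
            git_date = hg_date_to_git(line[len("# Date "):])
        i += 1
    # Commit message: everything up to the first "diff " line.
    start = i
    while i < len(lines) and not lines[i].startswith("diff "):
        i += 1
    message = "".join(lines[start:i]).strip()
    return HgPatch(author, git_date, message, remap_paths("".join(lines[i:])))
```

The diff could then be tried with `git apply`, and on success committed with `git commit --author=... --date=...` using the extracted values, which keeps the original author and date as asked for earlier in the thread.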

It would be cool to run this thing automatically but for now it's totally impossible, I even found the 1st commit which was already partially merged in our repository (but only for some files, not all of them !).

So it's painfully slow, but it's interesting.
Of course I commit all this to a separate branch, new-solaris.
You can browse the 2 commits committed so far there:
http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary
it compiles, I didn't test it, and it's considered highly experimental for now of course, but it's interesting.

From the commit messages, there were some quite serious bug fixes, and it might fix some problems we had.
Anyway if someone has an idea to do the same thing as that but more easily, I am ready to test it, otherwise, I'll continue with this later !

2009/11/3 Will Ashford <ashin...@gmail.com>

Mike Hommey

Nov 3, 2009, 5:12:57 PM
to zfs-...@googlegroups.com
On Tue, Nov 03, 2009 at 07:19:20PM +0100, Emmanuel Anne wrote:
> On my side I have started to look for some ways to make git and hg to work
> together, and surprisingly there doesn't seem to be a lot of programs to do
> that.
> I found an hg plugin to clone an hg repo into a git repo, but that's not
> what I want.
> I'd like to be able to convert the commits from the opensolaris hg repo to
> some git commits here, keeping at least the dates and the authors.
>
> hg has a few functions to export commits (hg export seems to do a nice
> work), but I found no way to import this into git, even with a manual
> conversion... Using git-fast-import seems quite complicated for that, if
> someone has an idea to do that, it would be good... The idea is to be able
> to follow the changes in the opensolaris repo more easily and to merge the
> changes more easily after that (just this one merge is going to be hard, I'd
> like the following merges to be easier if possible even if I spend a lot of
> time on this one).

There is work under way to have hg remotes support added to git. The
premises to that work have been posted on the git mailing list a few
days ago.

There is also http://repo.or.cz/w/hg2git.git , but with the above, it is
due to disappear.

Mike

sehe

Nov 16, 2009, 6:02:03 AM
to zfs-fuse
Hi everyone, I'm interested in starting this approach.

To be honest, I find it very hard to review what Emmanuel has been
merging, especially how conflicts have been handled; this is partly
due to the 'other' things that have happened, but in a major way
because all the commits have been 'handled' one-by-one without any
guarantee as to what 'handled' means.

My experience tells me this is inevitably going to lead to errors. It
also leads to a lot of good insight of course. I might do the 'other'
approach Rudd-O had proposed and we could compare the results (no
pissing contest, just a QA measure). Apart from that, Rudd-O's
approach clearly has the benefit of being much easier to track, repeat
and safer in so many ways.

What I have now is
1. onnv_gate tip (11066:cebb50cbe4f9 Friday Nov 13th)
2. latest git repo(s)
3. wizy's 'latest'? hg repo: 375:008c531499cd (Oct 30th 2008)
4. wizy's onnv->zfs-fuse merge scripts from this thread

As far as I can see (log/CHANGELOG) the latest merge Ricardo (wizy)
had done was

2008-09-12 - Release 0.5.0
--------------------------------------------------
* Updated ZFS code to Nevada build 98.

What do you guys propose would be a good branch point to start merging
from? Perhaps you can locate a trustworthy revision in your current
Rudd-O git repo?

PS. I think these scripts (at least the ones actively used/developed)
should absolutely go into our repo (in a script or tools dir) as we
cannot really afford to have everyone hunt them down on this group and
get an old version anyway...

Seth

Emmanuel Anne

Nov 16, 2009, 6:30:26 AM
to zfs-...@googlegroups.com
I used 2009-06-03 as a starting point (the last version Ricardo uploaded here was conveniently named with this date).

The idea I had was also to check the differences between the onnv sources and ours, to see along the way if there were unexpected differences. There were a few, like some parts of some sources which didn't match the 2009-06-03 version, but very few. In the end there are very few surprises and very few conflicts. The handling idea is to always stay as close to the onnv sources as possible so that future updates are as easy as possible, unless of course there is a clear impossibility.

Now it might be a good idea to try your other approach.
The problem is that the sources changed a lot between 06-03 and now, so if you try to apply our diff from, say, 06-03 to the latest onnv version, you'll probably bump into a lot of problems, and they'll be much harder to identify; it's easier to do it progressively. The hardest part is probably getting a starting diff.
Actually as I already said, the biggest problem is to handle external libs used by sun, they love to add external stuff all the time, and I wouldn't like to try to handle this again !

Good to know there is one more dev, I was starting to feel lonely ! ;-)

2009/11/16 sehe <sghe...@hotmail.com>

sghe...@hotmail.com

Nov 16, 2009, 6:37:34 AM
to zfs-...@googlegroups.com
Emmanuel Anne wrote:
>
> Good to know there is one more dev, I was starting to feel lonely ! ;-)
Just remember, I'll still be the dev-with-no-time-to-spend-but-in-denial
