Git repository containing the entire Mozilla history

Ehsan Akhgari

Aug 23, 2011, 1:57:22 PM
to dev-pl...@lists.mozilla.org, Blake Winton, Chris Double
I'm happy to announce https://github.com/ehsan/mozilla-history, which is
a Git repository containing *the entire* history of the Mozilla project.
It is very useful for getting blames which don't end at the Mercurial
migration date but go through all of the history. I hope it will
help make our developers more productive when looking at our
source code.

I was planning to write a blog post about this but I didn't have time,
so I just posted something here:
<https://github.com/ehsan/mozilla-history-tools/blob/master/initial_conversaion/README.md>.
I will try to set up something to keep this repository up to date with
our changes on mozilla-central.

Cheers,
Ehsan
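
For readers who want to script the blame lookups mentioned above, here is a
minimal sketch. It assumes a local clone of the repository and a `git`
binary on the PATH; the function names and the example file path are
hypothetical and only illustrate the idea.

```python
import subprocess
from typing import List


def blame_command(repo_dir: str, path: str, rev: str = "HEAD") -> List[str]:
    """Build the argv for `git blame` on one file at a given revision."""
    return ["git", "-C", repo_dir, "blame", rev, "--", path]


def blame(repo_dir: str, path: str, rev: str = "HEAD") -> str:
    """Run git blame and return its output (needs a real clone on disk)."""
    result = subprocess.run(
        blame_command(repo_dir, path, rev),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Only print the command here; running it requires the clone to exist.
print(" ".join(blame_command("mozilla-history", "path/to/file.cpp")))
```

Because the Git repository joins the CVS-era and Mercurial-era history, the
blame output is not cut off at the migration commit.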

Zack Weinberg

Aug 23, 2011, 4:43:06 PM

Hm! Could we back-convert this to Mercurial and replace m-c with the
result? (I still very much prefer the Mercurial UX, and this would mean
that we got the full blame on hg.m.o.)

zw

Bobby Holley

Aug 23, 2011, 6:15:29 PM
to Zack Weinberg, dev-pl...@lists.mozilla.org
On Tue, Aug 23, 2011 at 1:43 PM, Zack Weinberg <za...@panix.com> wrote:

> Hm! Could we back-convert this to Mercurial and replace m-c with the
> result? (I still very much prefer the Mercurial UX, and this would mean
> that we got the full blame on hg.m.o.)
>

I don't think so. Every Mercurial commit is a cryptographic function of its
ancestors, so all of the SHA1 revision ids would change. This would
invalidate dependent repositories and any hg.m.o links, and there are quite a
few of those strewn about.

Someone could, however, create a separate hg clone of mozilla-history
without too much trouble.

-bholley

Paul Biggar

Aug 23, 2011, 6:40:01 PM
to Bobby Holley, dev-pl...@lists.mozilla.org, Zack Weinberg
On Tue, Aug 23, 2011 at 15:15, Bobby Holley <bobby...@gmail.com> wrote:

> I don't think so. Every mercurial commit is a cryptographic function of its
> ancestors, so all of the SHA1 revision ids would change. This would
> invalidate dependent repositories and any hg.m.o links, and there's quite a
> few of those strewn about.

I think it should be possible to insert an extra commit, just before
the existing hg commit 0, such that the new version of commit 0 has
the same hash as the old one (though perhaps we don't want to go down
this route).

Paul

--
Paul Biggar
Compiler Geek
pbi...@mozilla.com
@paulbiggar

Ehsan Akhgari

Aug 23, 2011, 7:21:03 PM
to Paul Biggar, Zack Weinberg, dev-pl...@lists.mozilla.org, Bobby Holley
On 11-08-23 6:40 PM, Paul Biggar wrote:
> On Tue, Aug 23, 2011 at 15:15, Bobby Holley<bobby...@gmail.com> wrote:
>
>> I don't think so. Every mercurial commit is a cryptographic function of its
>> ancestors, so all of the SHA1 revision ids would change. This would
>> invalidate dependent repositories and any hg.m.o links, and there's quite a
>> few of those strewn about.
>
> I think it should be possible to insert an extra commit, just before
> the existing hg commit 0, such that the new version of commit 0 has
> the same hash as the old one (though perhaps we don't want to go down
> this route).

If I remember correctly, a revision's parent is part of the data used to
generate the SHA1 identifier for the revision. So reparenting a
revision changes its SHA1 (and consequently, all of the descendants' too).

Ehsan
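
The reparenting argument can be sketched in a few lines. This is a
simplified model, assuming a revision id is the SHA-1 of the two sorted
parent ids followed by the revision text; the helper names and commit texts
are made up for illustration and this is not Mercurial's actual code.

```python
import hashlib

NULL_ID = b"\0" * 20  # stand-in for "no parent"


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hash the sorted parent ids followed by the revision text."""
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).digest()


def chain_ids(texts, first_parent=NULL_ID):
    """Return the ids of a linear chain of revisions on top of first_parent."""
    ids, parent = [], first_parent
    for text in texts:
        parent = node_id(text, parent)
        ids.append(parent)
    return ids


texts = [b"commit 0", b"commit 1", b"commit 2"]
original = chain_ids(texts)

# Put a hypothetical extra commit underneath "commit 0" and rebuild the chain.
extra = node_id(b"imported CVS history")
reparented = chain_ids(texts, first_parent=extra)

for old, new in zip(original, reparented):
    print(old.hex()[:12], "->", new.hex()[:12])
```

Every id in the second chain differs from its counterpart in the first,
even though the revision texts are identical, because each id depends on
the id of its parent.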

Joshua Cranmer

Aug 23, 2011, 7:31:37 PM
On 8/23/2011 5:40 PM, Paul Biggar wrote:
> On Tue, Aug 23, 2011 at 15:15, Bobby Holley<bobby...@gmail.com> wrote:
>
>> I don't think so. Every mercurial commit is a cryptographic function of its
>> ancestors, so all of the SHA1 revision ids would change. This would
>> invalidate dependent repositories and any hg.m.o links, and there's quite a
>> few of those strewn about.
> I think it should be possible to insert an extra commit, just before
> the existing hg commit 0, such that the new version of commit 0 has
> the same hash as the old one (though perhaps we don't want to go down
> this route).
As far as I know, SHA-1 is still considered secure against even
collision attacks, let alone reproducing a message for a given SHA-1 id.
If we could insert such a commit, I think the cryptography community
would be ecstatic to hear about it.
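
To put rough numbers on this, here is a back-of-the-envelope sketch of the
generic brute-force cost. The hash rate is an assumed, purely illustrative
figure, and the calculation ignores any cryptanalytic shortcuts; it only
compares the 2^160 work of matching a given id with the 2^80 birthday bound
for finding any two colliding messages.

```python
SHA1_BITS = 160
HASHES_PER_SECOND = 10**12  # assumed rate, purely illustrative
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_years(work_bits: int, rate: int = HASHES_PER_SECOND) -> float:
    """Years needed to perform 2**work_bits hash evaluations at the given rate."""
    return 2**work_bits / rate / SECONDS_PER_YEAR


# Matching one specific id (what keeping commit 0's hash would require).
preimage_years = expected_years(SHA1_BITS)
# Generic birthday bound for finding any pair of colliding messages.
collision_years = expected_years(SHA1_BITS // 2)

print(f"preimage:  {preimage_years:.3g} years")
print(f"collision: {collision_years:.3g} years")
```

Keeping the old hash for commit 0 while changing its parent falls in the
first, far more expensive category, since the target id is fixed in advance.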

Nicholas Nethercote

Aug 23, 2011, 10:49:02 PM
to Joshua Cranmer, dev-pl...@lists.mozilla.org
On Wed, Aug 24, 2011 at 9:31 AM, Joshua Cranmer <Pidg...@verizon.net> wrote:
>
> As far as I know, SHA-1 is still considered secure against even collision
> attacks, let alone reproducing a message for a given SHA-1 id. If we could
> insert such a commit, I think the cryptography community would be ecstatic
> to hear about it.

Someone should file a good-first-bug.

Nick

Steve Fink

Aug 24, 2011, 1:33:07 AM
to Ehsan Akhgari, Paul Biggar, Zack Weinberg, dev-pl...@lists.mozilla.org, Bobby Holley

What about taking the current m-c repo and a "full history" repo that
ends up with exactly the same bits for the source tree but gets there a
different way (adding in the old history), then merging the two
together? People with a pre-merge m-c checkout would need to merge their
changes in instead of simply committing them, but as soon as everyone
rebased/merged on top of the new repo we'd be ok.

The trick would be to do the merge in the right direction so that hg
annotate/blame would follow the "full history" fork of the graph, not
the current truncated one.

Am I talking sense? It sounds plausible to me, but I'm probably missing
something obvious. I don't know how hg bisect would deal with not having
a common ancestor before the merge, but it seems like you'd have to give
it a starting point on one side or the other anyway.
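
A toy model of the proposal may help. It assumes the same simplified id
scheme as Mercurial's (SHA-1 over sorted parent ids plus text); the graph,
helper names and commit texts are hypothetical. The point it illustrates is
that a merge only adds a new node, so existing ids stay valid, while the
merge's ancestry covers both histories.

```python
import hashlib

NULL_ID = b"\0" * 20
graph = {}  # node id -> (p1, p2)


def commit(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Add a revision to the toy graph and return its id."""
    lo, hi = sorted((p1, p2))
    nid = hashlib.sha1(lo + hi + text).digest()
    graph[nid] = (p1, p2)
    return nid


def ancestors(nid: bytes) -> set:
    """Return nid plus everything reachable through parent links."""
    seen, stack = set(), [nid]
    while stack:
        cur = stack.pop()
        if cur == NULL_ID or cur in seen:
            continue
        seen.add(cur)
        stack.extend(graph[cur])
    return seen


# The current truncated history and a separately built full history.
mc_root = commit(b"m-c commit 0")
mc_tip = commit(b"m-c latest", mc_root)
full_root = commit(b"CVS import")
full_tip = commit(b"same tree as m-c latest", full_root)
ids_before = set(graph)

# Join the two unrelated histories with a single merge revision.
merge = commit(b"join histories", full_tip, mc_tip)

print(len(ancestors(merge)), "revisions reachable from the merge")
```

Which side blame and bisect would actually follow after such a merge is a
separate question that this sketch does not answer.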

Mike Hommey

Aug 24, 2011, 3:21:29 AM
to Steve Fink, Zack Weinberg, Ehsan Akhgari, dev-pl...@lists.mozilla.org, Paul Biggar, Bobby Holley
On Tue, Aug 23, 2011 at 10:33:07PM -0700, Steve Fink wrote:
> On 08/23/2011 04:21 PM, Ehsan Akhgari wrote:
> >On 11-08-23 6:40 PM, Paul Biggar wrote:
> >>On Tue, Aug 23, 2011 at 15:15, Bobby Holley<bobby...@gmail.com> wrote:
> >>
> >>>I don't think so. Every mercurial commit is a cryptographic function
> >>>of its
> >>>ancestors, so all of the SHA1 revision ids would change. This would
> >>>invalidate dependent repositories and any hg.m.o links, and there's
> >>>quite a
> >>>few of those strewn about.
> >>
> >>I think it should be possible to insert an extra commit, just before
> >>the existing hg commit 0, such that the new version of commit 0 has
> >>the same hash as the old one (though perhaps we don't want to go down
> >>this route).
> >
> >If I remember correctly, a revision's parent is part of the data used to
> >generate the SHA1 identifier for the revision. So reparenting a revision
> >changes its SHA1 (and consequently, all of the descendents' too).

More than that, changes are also part of the data used to generate the
SHA1 identifier.

> What about taking the current m-c repo and a "full history" repo
> that ends up with exactly the same bits for the source tree but gets
> there a different way (adding in the old history), then merging the
> two together? People with a pre-merge m-c checkout would need to
> merge their changes in instead of simply committing them, but as
> soon as everyone rebased/merged on top of the new repo we'd be ok.

I think that would be a terrible idea. Most operations that deal with
history are already unbearably slow with the current 70000 changesets
(being an order of magnitude slower than git *is* unbearable). Adding
more changesets is not going to help.

Another reason is that mercurial is really bad at storing these things,
and a clone would take a whole lot more space than it currently does.
For instance, my mozilla-central .hg directory is currently already
bigger than ehsan's mozilla-history.git repository...

Mike

Joshua Cranmer

Aug 24, 2011, 9:18:46 AM
On 8/24/2011 12:33 AM, Steve Fink wrote:
> What about taking the current m-c repo and a "full history" repo that
> ends up with exactly the same bits for the source tree but gets there
> a different way (adding in the old history), then merging the two
> together? People with a pre-merge m-c checkout would need to merge
> their changes in instead of simply committing them, but as soon as
> everyone rebased/merged on top of the new repo we'd be ok.
>
> The trick would be to do the merge in the right direction so that hg
> annotate/blame would follow the "full history" fork of the graph, not
> the current truncated one.
>
> Am I talking sense? It sounds plausible to me, but I'm probably
> missing something obvious. I don't know how hg bisect would deal with
> not having a common ancestor before the merge, but it seems like you'd
> have to give it a starting point on one side or the other anyway.

How long does it take you to hg clone mozilla-central from scratch? For
me, on my VM, it takes a few minutes. Now triple that length of time
(assuming mozilla-history is ~2x the size of m-c right now). The time
spent doing simple hg operations is bad enough right now; your proposal
makes it untenably bad. I've seen proposals in hg to avoid checking out
full history, and a few partial implementations... maybe we should get
somebody to actually implement those.

Cedric Vivier

Aug 24, 2011, 10:19:16 AM
to Ehsan Akhgari, Blake Winton, Chris Double, dev-pl...@lists.mozilla.org
Great stuff.

It would be nice if you could push the mapfile as well in a separate
branch so that anyone can possibly set up hg-git with it :)
[eg. to keep it updated]


Btw, some time ago I also pushed a git mirror, which has the
particularity of cloning not only m-c but all of our main Mercurial
repositories (m-c, m-i, fx-team, devtools, ux and so on) within one
single repo using git's lightweight branches:

https://github.com/neonux/mozilla-all

It might be interesting to people who also prefer git for day-to-day
work but do not usually work on m-c directly.


Cheers,


On Wed, Aug 24, 2011 at 01:57, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> I'm happy to announce https://github.com/ehsan/mozilla-history, which is a
> Git repository containing *the entire* history of the Mozilla project.  It
> is very useful to get blames which don't end at the Mercurial migration
> date, and go through all of the history.  I hope that it would be useful to
> make our developers more productive when looking at our source code.
>
> I was planning to write a blog post about this but I didn't have time, so I
> just posted something here:
> <https://github.com/ehsan/mozilla-history-tools/blob/master/initial_conversaion/README.md>.
>  I will try to set up something to keep this repository up to date with our
> changes on mozilla-central.
>

> Cheers,
> Ehsan
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>

Kyle Huey

Aug 24, 2011, 10:21:19 AM
to Joshua Cranmer, dev-pl...@lists.mozilla.org
On Wed, Aug 24, 2011 at 9:18 AM, Joshua Cranmer <Pidg...@verizon.net> wrote:

> How long does it take you to hg clone mozilla-central from scratch? For me,
> on my VM, it takes a few minutes. Now triple that length of time (assuming
> mozilla-history is ~2x the size of m-c right now). The time spent doing
> simple hg operations is bad enough right now; your proposal makes it
> untenably bad. I've seen proposals in hg to avoid checking out full history,
> and a few partial implementations... maybe we should get somebody to
> actually implement those.


Why can't we get somebody to just fix Mercurial to not be slow in the first
place?

- Kyle

Neil

Aug 24, 2011, 10:37:58 AM
Kyle Huey wrote:

>Why can't we get somebody to just fix Mercurial to not be slow in the first place?
>
>

Why are there 70000 changesets anyway? Is that typical?

--
Warning: May contain traces of nuts.

Robert Kaiser

Aug 24, 2011, 10:38:38 AM
Zack Weinberg schrieb:

> Hm! Could we back-convert this to Mercurial and replace m-c with the
> result? (I still very much prefer the Mercurial UX, and this would mean
> that we got the full blame on hg.m.o.)

One significant problem I'm seeing there is that an m-c clone is already
pretty huge (half a GB for the .hg data only) and with that it would just
grow much larger. That was one of the major reasons why we even
started with zero when switching to Mercurial.

If "shallow clones" (i.e. with reduced history) worked, that would
be easier, but we are not there yet, AFAIK.

git has a more efficient and compact storage backend, but I guess that
mozilla-history clone is still pretty large.

I actually still like the bonsai blame UI over the hgweb and the github
one, though at least the github one, as space-inefficient as it is on
display, provides a bit more data readily visible about the commits that
changed certain lines.

Robert Kaiser.


--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)

Robert Kaiser

Aug 24, 2011, 10:41:36 AM
Kyle Huey schrieb:

> Why can't we get somebody to just fix Mercurial to not be slow in the first
> place?

Because then it would just be a git implementation in python (or maybe
it's even python that contributes to it being slow for such large
repos). ;-)

The reason why git is so much faster is very probably mainly because its
(history) storage backend is so much more efficient.

Robert Kaiser

Zack Weinberg

Aug 24, 2011, 11:02:54 AM
On 2011-08-24 7:41 AM, Robert Kaiser wrote:
> Kyle Huey schrieb:
>> Why can't we get somebody to just fix Mercurial to not be slow in the
>> first
>> place?
>
> Because then it would just be a git implementation in python (or maybe
> it's even python that contributes to it being slow for such large
> repos). ;-)

Hey, I said I didn't like git's *user interface*, not its speed or disk
usage :) Give me Mercurial's CLI and extension hooks on top of Git's
storage model and I'll be happy.

(Caveat: I have read, but cannot presently find where I read, that Git
does not record branch information permanently in its revision history;
branches are apparently just ephemeral pointers to tip revisions. If
that's true, it might be a serious problem when doing archaeology.)

zw

Nathan Froyd

Aug 24, 2011, 11:40:11 AM
to dev-pl...@lists.mozilla.org
On 8/24/2011 10:37 AM, Neil wrote:
> Why are there 70000 changesets anyway? Is that typical?

For a project of Mozilla's size and age, 70k changesets is on the small
side. As rough points of comparison, GCC's repository is almost 180k
revisions and LLVM's repository (which hosts several projects) is
approaching 140k revisions.

-Nathan

Mike Hommey

Aug 24, 2011, 11:57:32 AM
to Nathan Froyd, dev-pl...@lists.mozilla.org
On Wed, Aug 24, 2011 at 11:40:11AM -0400, Nathan Froyd wrote:
> On 8/24/2011 10:37 AM, Neil wrote:
> >Why are there 70000 changesets anyway? Is that typical?
>
> For a project of Mozilla's size and age, 70k changesets is on the
> small side. As rough points of comparison, GCC's repository is
> almost 180k revisions and LLVM's repository (which hosts several
> projects) is approaching 140k revisions.

The Linux kernel is 263K revisions, and only starts in 2005.

Mike

Hiroyuki Ikezoe

Aug 24, 2011, 6:46:32 PM
to Ehsan Akhgari, Blake Winton, Chris Double, dev-pl...@lists.mozilla.org
Wow! Wonderful job! I wanted something like this for a long time.

Do you have a plan to set up *comm-history* too?

--
hiro

smaug

Aug 25, 2011, 7:26:46 AM
On 08/24/2011 06:02 PM, Zack Weinberg wrote:
> On 2011-08-24 7:41 AM, Robert Kaiser wrote:
>> Kyle Huey schrieb:
>>> Why can't we get somebody to just fix Mercurial to not be slow in the
>>> first
>>> place?
>>
>> Because then it would just be a git implementation in python (or maybe
>> it's even python that contributes to it being slow for such large
>> repos). ;-)
>
> Hey, I said I didn't like git's *user interface*, not its speed or disk
> usage :) Give me Mercurial's CLI and extension hooks on top of Git's
> storage model and I'll be happy.

And give me a CVS blame/bonsai-like UI as well, then I'd be happy.

It is surprising that both hg and git are missing a good UI for
blame/annotate.

Martijn

Aug 25, 2011, 8:49:47 AM
to smaug, dev-pl...@lists.mozilla.org
On Thu, Aug 25, 2011 at 1:26 PM, smaug <sm...@welho.com> wrote:
> And give me also CVS blame/bonsai like UI, then I'd be happy.
>
> It is surprising that both hg and git are missing good UI for
> blame/annotate.

Yes, I'm missing that too.
With regression range finding, I need to see what files are being
changed. Current UI is not really helpful for that.

Regards,
Martijn

>>
>> (Caveat: I have read, but cannot presently find where I read, that Git
>> does not record branch information permanently in its revision history;
>> branches are apparently just ephemeral pointers to tip revisions. If
>> that's true, it might be a serious problem when doing archaeology.)
>>
>> zw
>

--
Martijn Wargers - Help Mozilla!
http://quality.mozilla.org/
http://wiki.mozilla.org/Mozilla_QA_Community
irc://irc.mozilla.org/qa - /nick mw22

Robert Kaiser

Aug 25, 2011, 9:47:41 AM
Nathan Froyd schrieb:

> On 8/24/2011 10:37 AM, Neil wrote:
>> Why are there 70000 changesets anyway? Is that typical?
>
> For a project of Mozilla's size and age, 70k changesets is on the small
> side.

That's because in March of 2007, after roughly 9 years of Mozilla
history in CVS, we split everything other than the core platform and
Firefox into separate repositories and started the new
mozilla-central repo in hg without any history.

Incidentally, this is the topic the OP started this thread with, as he
connected the old CVS history to the new hg history in a common git
repo, and there it's "a few" more than 70k changesets, I guess...

Robert Kaiser

Aug 25, 2011, 9:48:31 AM
Zack Weinberg schrieb:

> Hey, I said I didn't like git's *user interface*, not its speed or disk
> usage :) Give me Mercurial's CLI and extension hooks on top of Git's
> storage model and I'll be happy.

Seconded.

L. David Baron

Aug 25, 2011, 9:55:52 AM
to Martijn, dev-pl...@lists.mozilla.org, smaug
On Thursday 2011-08-25 14:49 +0200, Martijn wrote:
> On Thu, Aug 25, 2011 at 1:26 PM, smaug <sm...@welho.com> wrote:
> > And give me also CVS blame/bonsai like UI, then I'd be happy.
> >
> > It is surprising that both hg and git are missing good UI for
> > blame/annotate.
>
> Yes, I'm missing that too.
> With regression range finding, I need to see what files are being
> changed. Current UI is not really helpful for that.

You can get that from the command line pretty easily, though, with
"hg log -v". hg log can also take revision ranges and revsets (for
some fun with revsets, see http://www.selenic.com/blog/?p=744 ).

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla Corporation http://www.mozilla.com/ 𝄂
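
As a sketch of the `hg log -v` suggestion for regression ranges: the helper
below builds a command that lists every changeset between a known-good and
a known-bad revision, with the files each one touched. The function name
and the revision values are placeholders; `x::y` is Mercurial's DAG range
operator.

```python
from typing import List


def hg_log_range_command(repo_dir: str, good: str, bad: str) -> List[str]:
    """Build `hg log -v` for the changesets between good and bad (inclusive)."""
    revset = f"{good}::{bad}"  # descendants of good that are ancestors of bad
    return ["hg", "-R", repo_dir, "log", "-v", "-r", revset]


# Placeholder revisions; substitute the ends of a real regression range.
cmd = hg_log_range_command("mozilla-central", "GOOD_REV", "BAD_REV")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` in a real checkout;
the verbose output includes a `files:` line for each changeset.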

Ehsan Akhgari

Aug 27, 2011, 5:02:02 PM
to Cedric Vivier, Blake Winton, Chris Double, dev-pl...@lists.mozilla.org
On 11-08-24 10:19 AM, Cedric Vivier wrote:
> Great stuff.
>
> It would be nice if you could push the mapfile as well in a separate
> branch so that anyone can possibly set up hg-git with it :)
> [eg. to keep it updated]

I have published the fixed git-mapfile here:
<https://github.com/ehsan/mozilla-history-tools/blob/master/initial_conversaion/git-mapfile.new>.
I'm planning to create a script which does the syncing automatically
and also automatically updates the git-mapfile for others to grab.

There seem to be a few bugs with hg-git corrupting the author line. I
haven't gotten to debug it yet, but I'll try to look into it next week.

Cheers,
Ehsan
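
For anyone who wants to look up revisions across the two systems, here is a
minimal sketch of reading such a mapfile. It assumes each line holds a Git
SHA followed by the corresponding Mercurial SHA, separated by a single
space; check the published file before relying on that layout. The function
name and the sample ids are made up.

```python
from typing import Dict, Iterable


def parse_mapfile(lines: Iterable[str]) -> Dict[str, str]:
    """Map Git SHAs to Mercurial SHAs, assuming 'gitsha hgsha' per line."""
    git_to_hg = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        git_sha, hg_sha = line.split(" ", 1)
        git_to_hg[git_sha] = hg_sha
    return git_to_hg


# Fabricated sample ids, only to show the shape of the data.
sample = [
    "a" * 40 + " " + "1" * 40 + "\n",
    "b" * 40 + " " + "2" * 40 + "\n",
]
mapping = parse_mapfile(sample)
hg_to_git = {hg: git for git, hg in mapping.items()}
print(len(mapping), "entries")
```

With a real file, `parse_mapfile(open(path))` would produce the same kind
of dictionary, and the inverted copy answers the reverse lookup.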

Enrico Weigelt

Dec 7, 2011, 1:50:56 AM
to dev-pl...@lists.mozilla.org
* Robert Kaiser <ka...@kairo.at> wrote:

> The reason why git is so much faster is very probably mainly because its
> (history) storage backend is so much more efficient.

The performance also comes from its data model.
(which also allows easy implementation of several sophisticated
operations).


cu
--
----------------------------------------------------------------------
Enrico Weigelt, metux IT service -- http://www.metux.de/

phone: +49 36207 519931 email: wei...@metux.de
mobile: +49 151 27565287 icq: 210169427 skype: nekrad666
----------------------------------------------------------------------
Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------

Enrico Weigelt

Dec 7, 2011, 1:49:42 AM
to dev-pl...@lists.mozilla.org
* Zack Weinberg <za...@panix.com> wrote:

> (Caveat: I have read, but cannot presently find where I read, that Git
> does not record branch information permanently in its revision history;
> branches are apparently just ephemeral pointers to tip revisions. If
> that's true, it might be a serious problem when doing archaeology.)

Exactly. That's a primary design aspect, as it works with local
namespaces. Having some kind of global/permanent branches would
require some globally coordinated namespace handling, defeating
the very purpose of git as a fully distributed VCS.

If you *really* want to add such information (actually, I've never
seen a really good use case for this; does anyone know of one?),
you can easily add it to the commit headers.
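
The "branches are just pointers" model can be sketched with a toy data
structure. This is an illustration only, not Git's implementation: the
`Commit` class, the branch table and the `Branch:` line in the message are
all hypothetical, the last one standing in for the idea of recording the
branch name in the commit itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None  # note: no field for a branch name


# Branches live outside the commits, as a local name -> tip commit table.
root = Commit("initial import")
branches: Dict[str, Commit] = {"master": root}

# Work happens on a feature branch; the branch name is noted in the message.
tip = Commit("fix bug\n\nBranch: feature", parent=root)
branches["feature"] = tip

# Fast-forward master to the feature tip, then drop the feature pointer.
branches["master"] = branches["feature"]
del branches["feature"]

print(sorted(branches))
print("Branch: feature" in branches["master"].message)
```

After the pointer is deleted the commit is still reachable from `master`,
but nothing in the graph structure says it was made on `feature`; only the
text stored in the commit message preserves that.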