Version Control Software

cutems93

Jun 12, 2013, 7:27:22 PM
to
I am looking for an appropriate version control software for python development, and need professionals' help to make a good decision. Currently I am considering four software: git, SVN, CVS, and Mercurial. Of course, I already did some research on different characteristics of version software, but I concluded that listening to personal experiences and opinions from the professionals will help me a lot. What version control software do you like the most and why? What is the difference between git and Mercurial? Also, if anyone can help me by doing google-chat or skype, please let me know.

Thanks in advance!

Mark Janssen

Jun 12, 2013, 7:36:33 PM
to cutems93, pytho...@python.org
> I am looking for an appropriate version control software for python development, and need professionals' help to make a good decision. Currently I am considering four software: git, SVN, CVS, and Mercurial.

I'm not very experienced, but I understand that SVN is good if you're
hosting your own code base, and CVS is hardly used anymore as it
doesn't support atomic commits (when having many developers work on
the same code base). Git and hg have been vying for several years with
no clear winner yet.

--
MarkJ
Tacoma, Washington

Joel Goldstick

Jun 12, 2013, 7:52:01 PM
to Mark Janssen, cutems93, pytho...@python.org
git or hg, but git is the most popular and very easy to learn. It's also great for distributed development.


On Wed, Jun 12, 2013 at 7:36 PM, Mark Janssen <dreamin...@gmail.com> wrote:
>> I am looking for an appropriate version control software for python development, and need professionals' help to make a good decision. Currently I am considering four software: git, SVN, CVS, and Mercurial.
>
> I'm not very experienced, but I understand that SVN is good if you're
> hosting your own code base, and CVS is hardly used anymore as it
> doesn't support atomic commits (when having many developers work on
> the same code base). Git and hg have been vying for several years with
> no clear winner yet.


Chris Angelico

Jun 12, 2013, 8:04:14 PM
to pytho...@python.org
On Thu, Jun 13, 2013 at 9:27 AM, cutems93 <ms2...@cornell.edu> wrote:
> I am looking for an appropriate version control software for python development, and need professionals' help to make a good decision. Currently I am considering four software: git, SVN, CVS, and Mercurial. Of course, I already did some research on different characteristics of version software, but I concluded that listening to personal experiences and opinions from the professionals will help me a lot. What version control software do you like the most and why? What is the difference between git and Mercurial? Also, if anyone can help me by doing google-chat or skype, please let me know.

Don't touch CVS unless you absolutely have to. SVN is also distinctly
old now. The three most popular modern source control systems are git,
hg, and bzr (Bazaar). Of the three, I would remove Bazaar from
consideration unless you're posting to a Canonical repository;
Mercurial and git are superior, in my experience.

Between those two (hg and git), though, it's really hard to call. I'm
personally familiar with git, and it serves me well; others have the
same experience with hg. Either will do you fine. They have some
different features, eg git detects file moves after the event while hg
prefers to be told about them up-front, but for normal daily tasks,
either is fine. Pick based on which one other people near you are
familiar with, so that you can get help when things go wrong - for
instance, I would be utterly useless when it comes to hg (I can't even
make patch files, which I can do just fine with git).
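
The idea of detecting file moves after the event can be illustrated with a toy sketch. This is not git's actual algorithm; it assumes a plain line-based similarity score from Python's difflib, and the function name `detect_renames` and the 0.5 threshold are made up for illustration.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    `deleted` and `added` map path -> file content (str). A pair counts as
    a rename only when the similarity ratio reaches `threshold`, which is a
    hypothetical default and not taken from any real tool's configuration.
    """
    renames = {}
    unmatched = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in unmatched.items():
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()
            ).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames[old_path] = best_path
            del unmatched[best_path]
    return renames


# One file vanished and two files appeared between two snapshots.
before = {"util.py": "import os\n\ndef helper():\n    return os.getcwd()\n"}
after = {
    "helpers/util.py": "import os\n\ndef helper():\n    return os.getcwd()\n",
    "README": "Project notes\n",
}
print(detect_renames(before, after))
```

Nothing is recorded at the time of the move; the pairing is inferred later by comparing content. A system that is told about moves up front would store that mapping explicitly instead of inferring it.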

But above all, do use source control. The difference between that and
not is way WAY more than the difference between one system and another
:)

ChrisA

Tim Chase

Jun 12, 2013, 10:41:58 PM
to Chris Angelico, pytho...@python.org
[much of my reply echoes Chris's, but with elaboration]

On 2013-06-13 10:04, Chris Angelico wrote:
> On Thu, Jun 13, 2013 at 9:27 AM, cutems93 <ms2...@cornell.edu>
> wrote:
> > Currently I am considering four software: git, SVN,
> > CVS, and Mercurial.
>
> Don't touch CVS unless you absolutely have to. SVN is also
> distinctly old now.

SVN had its place, but branching/merging is a pain (well, branching is
pretty easy, it's the merging that hurts).

> Mercurial and git are superior, in my experience.
>
> Between those two (hg and git), though, it's really hard to call.
> I'm personally familiar with git, and it serves me well; others
> have the same experience with hg. Either will do you fine.

A few pros (+) and cons (-) from my experiences:

+hg: much easier to transition from CVS/SVN as the command-line
syntax/structure matches much more closely

-git: the command-line interface feels rather distant from the
CVS/SVN classics

+hg: better cross-platform (i.e., including Win32) support

-git: a bit persnickety on Win32

-hg: last I checked, can't do octopus merges (merges with more than
two parents)

+git: can do octopus merges

-/+ hg: certain power-user functionality is relegated to plugins that
you need to activate (though many come standard, you have to activate
them). This can be a plus if you don't want to have a foot-gun within
easy reach; this can be an annoyance if you regularly use those sorts
of tools appropriately (particularly the partial-commit that "git add
-p" provides)

+git: the internal data model is pretty simple making it easy to
understand where things stand and the status of various branches

+git: having multiple remotes and managing them feel a little easier
to me than with Mercurial (YMMV)

+hg: written in Python (with optional C component for some
CPU-intensive work, but can run without it if you don't have
compile-rights on a particular machine that does already have Python
installed)

-git: a hodge-podge of C, Perl, shell-scripts and other madness.
This is part of the Win32 ding above.

+hg: Python devs have chosen Mercurial as their VCS of choice

+hg: bitbucket hosting

+git: github, gitorious, bitbucket hosting

+git, +hg: both have lots of big-name projects using them

+git, +hg: both have reasonably painless ways of talking to
repositories of other flavors (git can talk to CVS/SVN/hg repos; hg
can talk to CVS/SVN/git repos)

+git, +hg: documentation on both is top-notch (git's available
documentation has radically improved since its grand suckage before
1.6; once 1.6 landed, git was far less user-hostile)
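
As a small illustration of the "simple internal data model" point in the list above: git stores file contents as content-addressed "blob" objects. The sketch below assumes the classic SHA-1 object format; the function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a file's content (SHA-1 format).

    The id is the SHA-1 of a short header ("blob <size>" plus a NUL byte)
    followed by the raw content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello world\n"))
```

Because the id depends only on the bytes, identical content always maps to the same object no matter which file name or branch it appears under.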


Given the choice, I eventually settled on git (after about 3-4
serious attempts to learn it, then giving up for a couple months and
retrying) unless I have to involve Win32 machines, as I like the
power it provides and how easy it is to understand in my head.
On Win32, I tend to bias towards Mercurial. There are still some
aspects of Mercurial's internal models that leave me scratching my
head and rummaging through the docs (public vs. private branches,
bookmarking, preferring cloning to make branches) and surrendering
occasionally on more obscure things I know that I *should* be able to
do. That said, if you just want solid VCS behavior and already know
CVS/SVN, Mercurial will give you an easier transition.

And I believe most of what can be said about Mercurial can also be
said about Bazaar (bzr), though it seems to have less mindshare,
except perhaps among Ubuntu developers, as it has tighter integration
with LaunchPad.

Fortunately, since git/hg/bzr are all free, you can download them all
and kick the tires to see which one fits YOU (the OP) best.

-tkc






Ben Finney

Jun 12, 2013, 10:30:40 PM
to pytho...@python.org
cutems93 <ms2...@cornell.edu> writes:

> I am looking for an appropriate version control software for python
> development, and need professionals' help to make a good decision.

> Currently I am considering four software: git, SVN, CVS, and
> Mercurial.

These days there is no good reason to use CVS nor Subversion for new
projects. They are not distributed (the D in DVCS), and they have
specific design flaws that often cause insidious problems with common
version control workflows. As a salient example, branching and merging
are so painful with these tools that many users have learned the
terrible habit of never doing it at all.

Bazaar, Git, and Mercurial are all excellent DVCS systems (and all have
excellent branching and merging support). For someone new to version
control, I would highly recommend Bazaar, or Mercurial if that's not an
option. I would not recommend Git for new work.

It helps that all of these are free software. Avoid proprietary tools
for development work, especially tools that control access to your data.

> What version control software do you like the most and why?

Bazaar. It has, in my experience, by far the easiest default workflow to
learn. It is also very flexible for the odd wrinkles in preferred
workflow that most beginners don't even know enough to realise they have.

(Examples of Bazaar features that make it IMO superior are: default to
view only the main-line revisions without the “merge noise” that would
happen with other VCSes; easily serve a branch from just about any
shared file storage; easily choose a centralised repository for
particular purposes without any other user needing to do anything
different).
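
A main-line-only view of history can be sketched as a walk that follows just the first parent of each revision. This is a toy model with hypothetical revision names, not any tool's real implementation; it assumes the first parent of a merge is the branch that was merged into.

```python
def mainline(commits, tip):
    """Follow only the first parent of each revision, starting at `tip`.

    `commits` maps a revision name to the list of its parent names. By the
    convention assumed here, the first parent of a merge is the main line.
    """
    history = []
    current = tip
    while current is not None:
        history.append(current)
        parents = commits[current]
        current = parents[0] if parents else None
    return history


# Hypothetical history: "feature-1" and "feature-2" were developed on a
# branch and brought into the main line by the revision named "merge".
commits = {
    "init": [],
    "main-1": ["init"],
    "feature-1": ["main-1"],
    "feature-2": ["feature-1"],
    "merge": ["main-1", "feature-2"],
    "main-2": ["merge"],
}
print(mainline(commits, "main-2"))  # → ['main-2', 'merge', 'main-1', 'init']
```

The feature revisions are still in the graph; they are simply not listed unless the viewer chooses to expand the merge.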

Mercurial is relatively easy to learn, and full-featured; it is somewhat
more restrictive than Bazaar but not enough to recommend against.


Git is hugely capable and is the most popular, but still has some
annoying restrictions (e.g. it can't hide merged revisions, encouraging
poor practice like re-writing history when merging a branch).

But my main reason to recommend against Git is that its native interface
is far too baroque: it exposes its innards and requires the user to know
a huge range of internal concepts to avoid making mistakes.

You should be wary of GitHub, a very popular Git hosting site. It uses
what amount to proprietary protocols, which encourage using GitHub's
specific interface instead of native Git for your operations and hide a
lot of the needless complexity; but this results in a VCS repository
that is difficult to use *without* being tied to that specific site,
killing one of the best reasons to use a DVCS in the first place.

Gitorious is a Git hosting site that does not have this problem, and may
for that reason be a good choice for hosting your Git repositories. It
is also based on free software (unlike GitHub), so if the service goes
away for any reason, anyone else can produce a functionally identical
service from the same server code. This makes it a better bet for
hosting your repositories.

Neither Mercurial nor Bazaar suffers from Git's baroque complexity, and
with Bazaar's command interface being IME the easiest and most intuitive
to teach, I would recommend Bazaar for any new VCS user.


A sad caveat, though: Bazaar suffers from a foolishly limited
development pool (Canonical are the main copyright holder, and, instead
of accepting contributions under the same license they grant to others,
they obstinately insist on having special exclusive powers over the
code). Also, Bazaar's early versions did not impress large projects like
Linux or Python; improvements have long since erased the reasons for
that, but too late for widespread popularity.

So Bazaar's popularity never gained as much as Git or Mercurial. Worse,
development of Bazaar appears to have stagnated at Canonical — and,
because they insisted on being in a privileged copyright position,
no-one else is in a good position to easily carry on development.

Bazaar is still my recommendation of primary VCS tool, for its
flexibility, speed, wealth of plug-ins, ability to view revision history
sensibly, and straightforward command interface. But you should go into
it aware that it may be a little more difficult to find fellow users of
Bazaar than of Mercurial.

--
\ “The lift is being fixed for the day. During that time we |
`\ regret that you will be unbearable.” —hotel, Bucharest |
_o__) |
Ben Finney

Tim Chase

Jun 12, 2013, 10:48:09 PM
to pytho...@python.org
On 2013-06-12 16:27, cutems93 wrote:
> I am looking for an appropriate version control software for python
> development, and need professionals' help to make a good decision.

While I'm generally a git user (see my other email), I'll also put in
a plug for Fossil <http://fossil-scm.org/> which has a single binary
(making it easily installed), as well as an integrated bug-tracker &
wiki, and can be dropped onto a server as a CGI program with almost no
effort. And its primary author, Richard Hipp, is famous for creating
SQLite, and for the rigorous testing that both tools undergo.

-tkc


Roy Smith

Jun 12, 2013, 10:51:51 PM
to
In article <98c13a55-dbf2-46a7...@googlegroups.com>,
cutems93 <ms2...@cornell.edu> wrote:

> I am looking for an appropriate version control software for python
> development, and need professionals' help to make a good decision. Currently
> I am considering four software: git, SVN, CVS, and Mercurial.

CVS is hopelessly obsolete. SVN pretty much the same.

Git and Mercurial are essentially identical in terms of features; which
you like is as much a matter of personal preference as anything else.
Pick one and learn it.

cutems93

Jun 13, 2013, 2:00:33 AM
to
Thank you everyone for such helpful responses! Actually, I have one more question. Does anybody have experience with closed source version control software? If so, why did you buy it instead of downloading open source software? Does closed source vcs have some benefits over open source in some part?

Thanks!
MinS

rusi

Jun 13, 2013, 2:43:46 AM
to
Not too many people who buy expensive software use it.
Those who use it, have usually not been party to buying it.

The first are usually called 'boss'.
The second, 'programmer' or some euphemism for that like 'software engineer'.

As to your question about vcs, there is also fossil:
http://www.fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.wiki

Serhiy Storchaka

Jun 13, 2013, 3:20:42 AM
to pytho...@python.org
13.06.13 05:41, Tim Chase wrote:
> -hg: last I checked, can't do octopus merges (merges with more than
> two parents)
>
> +git: can do octopus merges

Actually it is possible in Mercurial. I have just made a merge of two
files in the CPython test suite (http://bugs.python.org/issue18048).


Roy Smith

Jun 13, 2013, 7:08:37 AM
to
In article <2644d0de-9a81-41aa...@googlegroups.com>,
This really doesn't have anything to do with python. Someplace like
http://en.wikipedia.org/wiki/Comparison_of_version_control_systems would
be a good starting point for further research.

If I were to buy a closed-source VCS today, I would look at Perforce
(www.perforce.com). I used it for several years. For small teams, you
can download and use it for free, so you can play with it without
commitment.

Perforce tries to solve a somewhat larger problem than just version
control. They also do configuration management. You can set up a
config-spec which says, "Give me this bunch of files from branch A, that
bunch of files from branch B, and some third bunch of files which have
some specific tag. And, while you're at it, remap the path names so the
directory structure looks like I want it to".
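
The path-remapping part of such a config-spec can be sketched as a list of prefix rules. This is a toy model and not Perforce's actual syntax or matching rules; the paths, the function name `map_to_workspace`, and the "first matching rule wins" behaviour are all assumptions made for illustration, and selecting files by tag is left out.

```python
def map_to_workspace(repo_path, view):
    """Return the local path for `repo_path`, or None if it is not mapped.

    `view` is an ordered list of (repo_prefix, local_prefix) pairs. The
    first matching prefix wins, which is a simplification chosen for this
    sketch rather than the behaviour of any particular product.
    """
    for repo_prefix, local_prefix in view:
        if repo_path.startswith(repo_prefix):
            return local_prefix + repo_path[len(repo_prefix):]
    return None


# Hypothetical view: server code from branch A, client code from branch B,
# both remapped into a flatter local directory layout.
view = [
    ("branchA/server/", "work/server/"),
    ("branchB/client/", "work/client/"),
]
print(map_to_workspace("branchA/server/main.py", view))   # → work/server/main.py
print(map_to_workspace("branchC/docs/readme.txt", view))  # → None
```

Files that match no rule are never fetched, which is how a developer can work in a very large shared repository without downloading all of it.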

This configuration management can be a powerful tool when working on a
huge project. We threw *everything* into our p4 repo, including all
the compilers, development toolchains, and pre-built binaries for all
the third-party libraries we used. We also used a single repo shared by
all the development groups (many 100's of developers on three
continents). I would never want to do that in a system like git or hg;
every developer would have to drag down 100's of GB of crap they didn't
need. With p4, we could build people config-specs so they got just the
parts they needed.

It also has a bit of a steep learning curve. Only a few
people were trusted to do things like build config-specs and create
shared branches.

As a company, Perforce is a dream to work with. Their tech support was
pretty awesome. I would shoot off an email to sup...@perforce.com, and
I don't think it ever took more than 5 or 10 minutes for me to get a
response back from somebody. And that somebody would inevitably be
somebody who knew enough to solve my problem, not just some first-line
support drone.

The costs aren't outrageous, either. The pricing is a little
complicated (initial license, annual renewal, various support options,
all of them on a sliding scale based on quantity). I seem to remember it
working out to about $100/developer/year for us, but we were buying in
fairly large quantities.

MRAB

Jun 13, 2013, 7:26:04 AM
to pytho...@python.org
I've used Microsoft SourceSafe. I didn't like it (does anyone? :-)).

rusi

Jun 13, 2013, 7:54:27 AM
to
On Jun 13, 7:30 am, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
>
> You should be wary of GitHub, a very popular Git hosting site. It uses
> what amount to proprietary protocols, which encourage using GitHub's
> specific interface instead of native Git for your operations and hide a
> lot of the needless complexity; but this results in a VCS repository
> that is difficult to use *without* being tied to that specific site,
> killing one of the best reasons to use a DVCS in the first place.

bitbucket -- originally only Hg based -- now supports Hg or git.
And for small private (non open source) repos it's more affordable:
http://tilomitra.com/bitbucket-vs-github/

Tim Chase

Jun 13, 2013, 8:34:53 AM
to Serhiy Storchaka, pytho...@python.org
On 2013-06-13 10:20, Serhiy Storchaka wrote:
> 13.06.13 05:41, Tim Chase wrote:
> > -hg: last I checked, can't do octopus merges (merges with more
> > than two parents)
> >
> > +git: can do octopus merges
>
> Actually it is possible in Mercurial.

Okay, then that moots this pro/con pair. I seem to recall that at
one point in history, Mercurial required you to do pairwise merges
rather than letting you merge multiple branches in one pass.

-tkc



Rui Maciel

Jun 13, 2013, 8:43:58 AM
to
Roy Smith wrote:

> In article <98c13a55-dbf2-46a7...@googlegroups.com>,
> cutems93 <ms2...@cornell.edu> wrote:
>
>> I am looking for an appropriate version control software for python
>> development, and need professionals' help to make a good decision.
>> Currently I am considering four software: git, SVN, CVS, and Mercurial.
>
> CVS is hopelessly obsolete. SVN pretty much the same.

I would say that SVN does have its uses, but managing software repositories
isn't one of them due to the wealth of available alternatives out there
which are far better than it.


> Git and Mercurial are essentially identical in terms of features; which
> you like is as much a matter of personal preference as anything else.
> Pick one and learn it.

I agree, but there is a feature Git provides right out of the box which is
extremely useful but which Mercurial supports only as a non-standard module:
the git stash feature.


Rui Maciel

Roy Smith

Jun 13, 2013, 8:52:18 AM
to
In article <mailman.3185.1371126...@python.org>,
So, I guess the next question is, why would you *want* to merge
multiple branches in one pass? What's the use case? I've been using
VCSs for a long time (I've used RCS, CVS, ClearCase, SVN (briefly),
Perforce, Git, and hg). I can't ever remember a time when I've wanted
to do such a thing. Maybe it's the kind of thing that makes sense on a
huge distributed project with hundreds of people committing patches
willy-nilly?

How would hg even represent such a multi-way merge? Doesn't every
revision have exactly one or two parents?
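
The representation question can be sketched with a toy revision graph. Git allows a commit to have more than two parents; in a model where a revision has at most two parents, the same end result has to be recorded as a chain of pairwise merges. The class and function names below are hypothetical and do not correspond to either tool's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    name: str
    parents: tuple = ()


def octopus_merge(name, heads):
    """One merge revision that lists every head as a parent."""
    return Revision(name, tuple(heads))


def pairwise_merge(name, heads):
    """Fold the heads left to right using two-parent merges only.

    Intermediate merges get hypothetical "<name>-stepN" labels; the final
    merge carries the requested name.
    """
    heads = list(heads)
    merged = heads[0]
    for step, head in enumerate(heads[1:], start=1):
        is_last = step == len(heads) - 1
        label = name if is_last else f"{name}-step{step}"
        merged = Revision(label, (merged, head))
    return merged


base = Revision("base")
a, b, c = (Revision(n, (base,)) for n in "abc")

octo = octopus_merge("merge", [a, b, c])
pair = pairwise_merge("merge", [a, b, c])
print(len(octo.parents), len(pair.parents))  # → 3 2
print(pair.parents[0].name)                  # → merge-step1
```

Both graphs end up containing the same three lines of development; the pairwise form just needs one extra intermediate revision per additional branch.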

Grant Edwards

Jun 13, 2013, 1:06:59 PM
to
On 2013-06-13, Ben Finney <ben+p...@benfinney.id.au> wrote:
> cutems93 <ms2...@cornell.edu> writes:
>
>> I am looking for an appropriate version control software for python
>> development, and need professionals' help to make a good decision.
>
>> Currently I am considering four software: git, SVN, CVS, and
>> Mercurial.
>
> These days there is no good reason to use CVS nor Subversion for new
> projects. They are not distributed (the D in DVCS), and they have
> specific design flaws that often cause insidious problems with common
> version control workflows. As a salient example, branching and merging
> are so painful with these tools that many users have learned the
> terrible habit of never doing it at all.

I agree that branch/merge handling in svn is primitive compared to git
(haven't used hg enough to comment).

The last time we made the choice (4-5 years ago), Windows support for
git, bzr, and hg was definitely lacking compared to svn. The lack of
something like tortoisesvn for hg/git/bzr was a killer. It looks like
the situation has improved since then, but I'd be curious to hear from
people who do their development on Windows.

--
Grant Edwards grant.b.edwards Yow! I wonder if there's
at anything GOOD on tonight?
gmail.com

Chris Angelico

Jun 13, 2013, 5:26:17 PM
to pytho...@python.org
On Fri, Jun 14, 2013 at 3:06 AM, Grant Edwards <inv...@invalid.invalid> wrote:
> The last time we made the choice (4-5 years ago), Windows support for
> git, bzr, and hg was definitely lacking compared to svn. The lack of
> something like tortoisesvn for hg/git/bzr was a killer. It looks like
> the situation has improved since then, but I'd be curious to hear from
> people who do their development on Windows.

I do almost exclusively Linux dev, but occasionally nip onto Windows
for one reason or another (possibly inside a virtual machine). It's
possible to get git for Windows, including gitk and 'git gui' (not
sure about any other graphical tools, they're the only two I use), but
the most convenient way to use them is from a ported bash.
Fortunately, the installer will provide all of that, putting a 'Git
Bash' entry into the Start menu, and for someone who's come from Linux
anyway, working in bash is quite welcome.

ChrisA

Grant Edwards

Jun 13, 2013, 5:53:34 PM
to
Unfortunately, something that requires typing commands would not fly.
I mostly use svn via command line and sometimes via meld, but for some
others (even one Linux developer), if it can't be done entirely
from a GUI, then it isn't going to get done.

If it wasn't for Cygwin, I'd never be able to accomplish much of
anything in Windows. :)

--
Grant Edwards grant.b.edwards Yow! Oh my GOD -- the
at SUN just fell into YANKEE
gmail.com STADIUM!!

Chris Angelico

Jun 13, 2013, 5:59:42 PM
to pytho...@python.org
On Fri, Jun 14, 2013 at 7:53 AM, Grant Edwards <inv...@invalid.invalid> wrote:
> On 2013-06-13, Chris Angelico <ros...@gmail.com> wrote:
>> On Fri, Jun 14, 2013 at 3:06 AM, Grant Edwards <inv...@invalid.invalid> wrote:
>> I do almost exclusively Linux dev, but occasionally nip onto Windows
>> for one reason or another (possibly inside a virtual machine). It's
>> possible to get git for Windows, including gitk and 'git gui' (not
>> sure about any other graphical tools, they're the only two I use)
>
> Unfortunately, something that requires typing commands would not fly.
> I mostly use svn via command line and sometimes via meld, but for some
> others (even one Linux developer), if it can't be done entirely
> from a GUI, then it isn't going to get done.
>
> If it wasn't for Cygwin, I'd never be able to accomplish much of
> anything in Windows. :)

Check out 'git gui' then - and in the Windows build, that's in the
Start menu directly. I usually use git gui only for partial commits
(it's more convenient than 'git add -p' when the parts to commit and
the parts to not-commit are right next to each other), but it can be
your full console. For those who like the graphical things in life,
it's a good choice.

That and gitk for viewing the repo. I use gitk *all the time*, at work
and on my own projects, because it is excellent. (Actually I use a
minorly-patched gitk; must remember to submit the patch upstream some
day.)

ChrisA

Fábio Santos

Jun 13, 2013, 6:15:19 PM
to Chris Angelico, pytho...@python.org


On 13 Jun 2013 22:34, "Chris Angelico" <ros...@gmail.com> wrote:
> [...]


> It's
> possible to get git for Windows, including gitk and 'git gui' (not

> sure about any other graphical tools, they're the only two I use), but
> the most convenient way to use them is from a ported bash.

I must disagree. I used git a lot on Windows this past year, on a Console shell (which is basically a CMD.EXE shell with tabs and appropriate select/copy/paste) and it was quite useful.

I must say, though, that I wasn't doing any merges and such. I was just committing, pushing and diffing to check what I'd done.

I used gitk and the git commands. You can't "git diff" or "git show" or "git log" because paging will suck terribly. But gitk was a nice substitute for all that.

YMMV 

Chris Angelico

Jun 13, 2013, 6:17:24 PM
to pytho...@python.org
On Fri, Jun 14, 2013 at 8:15 AM, Fábio Santos <fabiosa...@gmail.com> wrote:
> I must disagree. I used git a lot on Windows this past year, on a Console
> shell (which is basically a CMD.EXE shell with tabs and appropriate
> select/copy/paste) and it was quite useful.

Maybe that's changed since the last time I installed it, then. Though
bash is still preferable to me, since that's what I use on Linux.

ChrisA

Zero Piraeus

Jun 13, 2013, 6:20:40 PM
to pytho...@python.org
:

On 13 June 2013 17:53, Grant Edwards <inv...@invalid.invalid> wrote:
>
> Unfortunately, something that requires typing commands would not fly.

I haven't used it (very rarely use GUI dev tools), but Tortoise Hg
<http://tortoisehg.bitbucket.org/> seems to have a decent reputation
for Mercurial (and is at least somewhat cross-platform).

-[]z.

Benjamin Kaplan

Jun 13, 2013, 6:24:23 PM
to Python List

There's a TortoiseHg now that works well. http://tortoisehg.bitbucket.org

I haven't used it very much, but GitHub has released a git client for Windows. The underlying library is the same one Microsoft uses for the Visual Studio git integration, so I assume it's fairly robust at this point.
http://windows.github.com

Neil Hodgson

Jun 13, 2013, 6:53:49 PM
to
Grant Edwards:

> The last time we made the choice (4-5 years ago), Windows support for
> git, bzr, and hg was definitely lacking compared to svn. The lack of
> something like tortoisesvn for hg/git/bzr was a killer. It looks like
> the situation has improved since then, but I'd be curious to hear from
> people who do their development on Windows.

GUIs for Hg/Git are now much more usable. On Windows, OS X, and
Linux my GUI/command line use split is about 80/20.

For Hg, TortoiseHg is quite good on Windows and Linux and so is
SourceTree on OS X. I don't use Git as much but SourceTree works well on
OS X.

SourceTree is in beta on Windows and doesn't yet support Hg there.

http://tortoisehg.bitbucket.org/
http://www.sourcetreeapp.com/

Neil

Terry Reedy

Jun 13, 2013, 8:09:12 PM
to pytho...@python.org
On 6/13/2013 6:20 PM, Zero Piraeus wrote:
> :
>
> On 13 June 2013 17:53, Grant Edwards <inv...@invalid.invalid> wrote:
>>
>> Unfortunately, something that requires typing commands would not fly.
>
> I haven't used it (very rarely use GUI dev tools), but Tortoise Hg
> <http://tortoisehg.bitbucket.org/> seems to have a decent reputation
> for Mercurial (and is at least somewhat cross-platform).

I use the tortoisehg context menus and HgWorkbench (gui access) and am
mostly happy with it.


--
Terry Jan Reedy

Anssi Saari

Jun 14, 2013, 8:06:39 AM
to
cutems93 <ms2...@cornell.edu> writes:

> Thank you everyone for such helpful responses! Actually, I have one more question. Does anybody have experience with closed source version control software? If so, why did you buy it instead of downloading open source software? Does closed source vcs have some benefits over open source in some part?

I have some experience with ClearCase. I don't know why anyone would buy
it since it's bloated and slow and hard to use and likes to take over
your computer. I was very happy to dump it when my team was allowed to
use whatever we wanted but then we were not doing software either.

ClearCase is also admin heavy for the above reasons. I guess big
businesses buy things like that because other big businesses buy things
like that. Presumably they keep it because it's cheaper to pay
maintenance than move all source to some other system.

Now granted, Linux development went to the commercial BitKeeper for a while
since Linus Torvalds found it superior to CVS sometime over a decade
ago. When the agreement ended, Torvalds himself developed Git to be what
he needed. Other projects sprang up around the same time to fill that role,
including at least Mercurial, if Wikipedia is to be believed.

Oh, as far as I know, commercial software vendors always ban their
customers from publishing any kinds of benchmarks or other comparisons
so it's unlikely you can find anything concrete for your commercial
vs. free choice.

Roy Smith

Jun 14, 2013, 8:32:00 AM
to
In article <vg3obb8...@coffee.modeemi.fi>, Anssi Saari <a...@sci.fi>
wrote:

> I have some experience with ClearCase. I don't know why anyone would buy
> it since it's bloated and slow and hard to use and likes to take over
> your computer.

ClearCase was the right solution to certain specific problems which
existed 20 years ago. It does have a couple of cool features.

1) Every revision of every file exists simultaneously in the file system
namespace (CC exports its repo as a quasi-NFS file system). That means
you can look at every revision with all your normal command-line tools
(diff, grep, whatever).

2) It ships with an integrated build tool which can automatically learn
your dependency graph. This is paired with a feature called "winking
in". Let's say I'm building a humungous C++ project which takes hours
to compile. And I'm part of a team of 50 developers, all working on the
same code.

If I need foo.o, and some other developer has already compiled a foo.o
with exactly the same dependency graph (including what versions of the
toolchain and option flags), I just instantly and transparently get a
copy of their file instead of having to build it myself. This can
potentially save a huge amount of build time.
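
The "winking in" idea can be sketched as a shared cache keyed by everything that went into a build product. This is a toy model, not ClearCase's actual mechanism; the class name `DerivedObjectCache`, the key layout, and the example versions and flags are all hypothetical.

```python
import hashlib


class DerivedObjectCache:
    """Share build outputs between developers when all inputs match."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(source_versions, toolchain, flags):
        """Hash the file versions, toolchain, and option flags together."""
        h = hashlib.sha256()
        for path, version in sorted(source_versions.items()):
            h.update(f"{path}@{version}\n".encode())
        h.update(f"toolchain={toolchain}\n".encode())
        h.update(("flags=" + " ".join(flags) + "\n").encode())
        return h.hexdigest()

    def build(self, source_versions, toolchain, flags, compile_fn):
        """Return (object, reused); compile only when no match exists."""
        k = self.key(source_versions, toolchain, flags)
        if k in self._store:
            return self._store[k], True
        obj = compile_fn()
        self._store[k] = obj
        return obj, False


cache = DerivedObjectCache()
deps = {"foo.c": "7", "foo.h": "3"}

# First developer compiles; second developer with identical inputs reuses it.
_, reused_first = cache.build(deps, "cc-1.0", ["-O2"], lambda: b"foo.o bytes")
_, reused_second = cache.build(deps, "cc-1.0", ["-O2"], lambda: b"foo.o bytes")
# A different option flag changes the key, so this one is compiled again.
_, reused_debug = cache.build(deps, "cc-1.0", ["-g"], lambda: b"foo.o debug")
print(reused_first, reused_second, reused_debug)  # → False True False
```

The saving comes from the second call: the compile function is never invoked because an object with exactly the same inputs already exists.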

All that being said, it is, as Anssi points out, a horrible, bloated,
overpriced, complicated mess which requires teams of specially trained
ClearCase admins to run. In other words, it's exactly the sort of thing
big, stupid, Fortune-500 companies buy because the IBM salesperson plays
golf with the CIO.

Grant Edwards

Jun 14, 2013, 10:24:56 AM
to
On 2013-06-14, Roy Smith <r...@panix.com> wrote:

> All that being said, it is, as Anssi points out, a horrible, bloated,
> overpriced, complicated mess which requires teams of specially
> trained ClearCase admins to run. In other words, it's exactly the
> sort of thing big, stupid, Fortune-500 companies buy because the IBM
> salesperson plays golf with the CIO.

Years ago, I worked at one largish company where a couple of the
embedded development projects used ClearCase. The rest of us used CVS
or RCS or some other cheap commercial systems. Judging by those
results, ClearCase requires a full-time administrator for every 10 or
so users. The other systems seemed to require almost no regular
administration, and what was required was handled by the developers
themselves (maybe a couple hours per month). The cost of ClearCase
was also sky-high.

--
Grant Edwards               grant.b.edwards at gmail.com
Yow! VICARIOUSLY experience some reason to LIVE!!

Dave Angel

Jun 14, 2013, 4:55:20 PM
to pytho...@python.org
On 06/14/2013 10:24 AM, Grant Edwards wrote:
> On 2013-06-14, Roy Smith <r...@panix.com> wrote:
>
>> All that being said, it is, as Anssi points out, a horrible, bloated,
>> overpriced, complicated mess which requires teams of specially
>> trained ClearCase admins to run. In other words, it's exactly the
>> sort of thing big, stupid, Fortune-500 companies buy because the IBM
>> salesperson plays golf with the CIO.
>
> Years ago, I worked at one largish company where a couple of the
> embedded development projects used ClearCase. The rest of us used CVS
> or RCS or some other cheap commercial systems. Judging by those
> results, ClearCase requires a full-time administrator for every 10 or
> so users. The other systems seemed to require almost no regular
> administration, and what was required was handled by the developers
> themselves (maybe a couple hours per month). The cost of ClearCase
> was also sky-high.
>

If I remember rightly, it was about two-thousand dollars per seat. And
the people I saw using it were using XCOPY to copy the stuff they needed
onto their local drives, then disabling the ClearCase service so they
could get some real work done. Compiles were about 10x slower with the
service active.

Now that was on Windows NT, when Clearcase was first porting from Unix.
So perhaps things have improved.


--
DaveA

Tim Delaney

Jun 15, 2013, 1:39:48 AM
to Dave Angel, Python-List
On 15 June 2013 06:55, Dave Angel <da...@davea.name> wrote:
> On 06/14/2013 10:24 AM, Grant Edwards wrote:
>> On 2013-06-14, Roy Smith <r...@panix.com> wrote:
>>
>>> All that being said, it is, as Anssi points out, a horrible, bloated,
>>> overpriced, complicated mess which requires teams of specially
>>> trained ClearCase admins to run. In other words, it's exactly the
>>> sort of thing big, stupid, Fortune-500 companies buy because the IBM
>>> salesperson plays golf with the CIO.
>>
>> Years ago, I worked at one largish company where a couple of the
>> embedded development projects used ClearCase. The rest of us used CVS
>> or RCS or some other cheap commercial systems. Judging by those
>> results, ClearCase requires a full-time administrator for every 10 or
>> so users. The other systems seemed to require almost no regular
>> administration, and what was required was handled by the developers
>> themselves (maybe a couple hours per month). The cost of ClearCase
>> was also sky-high.
>
> If I remember rightly, it was about two-thousand dollars per seat. And the people I saw using it were using XCOPY to copy the stuff they needed onto their local drives, then disabling the ClearCase service so they could get some real work done. Compiles were about 10x slower with the service active.

I can absolutely confirm how much ClearCase slows things down. I completely refused to use dynamic views for several reasons - #1 being that if you lost your network connection you couldn't work at all, and #2 being how slow they were. Static views were slightly better as you could at least hijack files in that situation and keep working (and then be very very careful when you were back online).

And then of course there was ClearCase Remote Client. I was working from home much of the time, so I got to use CCRC. It worked kinda well enough, and in that situation was much better than the native client. Don't ever ever try to use ClearCase native over a non-LAN connection. I can't stress this enough. The ClearCase protocol is unbelievably noisy, even if using static views.

CCRC did have one major advantage over the native client though. I had the fun task when I moved my local team from CC to Mercurial of keeping the Mercurial and CC clients in sync. Turns out that CCRC was the best option, as I was able to parse its local state files and work out what timestamp ClearCase thought its files should be, set it appropriately from a Mercurial extension and convince CCRC that really, only these files have changed, not the thousand or so that just had their timestamp changed ... CCRC at least made that possible, even if it was a complete accident by the CCRC developers.

Tim Delaney

Chris Angelico

Jun 15, 2013, 1:53:42 AM
to pytho...@python.org
On Sat, Jun 15, 2013 at 3:39 PM, Tim Delaney
<timothy....@gmail.com> wrote:
> I can absolutely confirm how much ClearCase slows things down. I completely
> refused to use dynamic views for several reasons - #1 being that if you lost
> your network connection you couldn't work at all...

And that right there is why modern source control systems are
distributed, not centralized. It's so much easier with git; we lost
our central hub at one point, and another dev and I simply pulled from
each other for a bit until we got a new Scaphio online. With
centralized version control, that would have basically meant a
complete outage until the new box was up.

ChrisA

Roy Smith

Jun 15, 2013, 10:16:15 AM
to
In article <mailman.3359.1371275...@python.org>,
The advantage of DVCS is that everybody has a full copy of the repo.
The disadvantage of the DVCS is that everybody MUST have a full copy of the
repo. When a repo gets big, you may not want to pull all of that data
just to get the subtree you need.

Giorgos Tzampanakis

Jun 15, 2013, 11:29:27 AM
to
On 2013-06-15, Roy Smith wrote:

>> And that right there is why modern source control systems are
>> distributed, not centralized. It's so much easier with git; we lost
>> our central hub at one point, and another dev and I simply pulled from
>> each other for a bit until we got a new Scaphio online. With
>> centralized version control, that would have basically meant a
>> complete outage until the new box was up.
>>
>> ChrisA
>
> The advantage of DVCS is that everybody has a full copy of the repo.
> The disadvantage of the DVCS is that everybody MUST have a full copy of the
> repo. When a repo gets big, you may not want to pull all of that data
> just to get the subtree you need.

Also, is working without a connection to the server such a big issue? One
would expect that losing access to the central server would indicate
significant problems that would impact development anyway.

--
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/

Dan Sommers

Jun 15, 2013, 2:29:43 PM
to
On Sat, 15 Jun 2013 15:29:27 +0000, Giorgos Tzampanakis wrote:

> Also, is working without a connection to the server such a big issue?
> One would expect that losing access to the central server would
> indicate significant problems that would impact development anyway.

Everyone and every device is connected to the internet all the time, or
else the universe comes to an end.

Get off my lawn! ;-)

Being able to work remotely is a huge win. Anarchy is not. Somewhere
in between, reality sets in, and I can work appropriately for different
use cases.

Tim Delaney

Jun 15, 2013, 5:49:12 PM
to Giorgos Tzampanakis, Python-List
On 16 June 2013 01:29, Giorgos Tzampanakis <giorgos.t...@gmail.com> wrote:
> On 2013-06-15, Roy Smith wrote:
>
> Also, is working without a connection to the server such a big issue? One
> would expect that losing access to the central server would indicate
> significant problems that would impact development anyway.

I work almost 100% remotely (I chose to move back to a country town). Most of the time I have a good internet connection. But sometimes my clients are in other countries (I'm in Australia, my current client is in the US) and the VPN is slow or doesn't work (heatwaves have taken down their systems a few times). Sometimes I'm on a train going to Sydney and mobile internet is pretty patchy much of the way. Sometimes my internet connection dies - we had a case where someone put a backhoe through the backhaul and my backup mobile internet was also useless.

But so long as at some point I can sync the repositories, I can work away (on things that are not dependent on something new from upstream).

Tim Delaney 

Chris Angelico

Jun 15, 2013, 7:01:13 PM
to pytho...@python.org
On Sun, Jun 16, 2013 at 4:29 AM, Dan Sommers <d...@tombstonezero.net> wrote:
> On Sat, 15 Jun 2013 15:29:27 +0000, Giorgos Tzampanakis wrote:
>
>> Also, is working without a connection to the server such a big issue?
>> One would expect that losing access to the central server would
>> indicate significant problems that would impact development anyway.
>
> Everyone and every device is connected to the internet all the time, or
> else the universe comes to an end.
>
> Get off my lawn! ;-)

So some of us think that version control is a single-player game, but
CVS-Box One thinks always-on gaming is a reasonable thing?

*ducks*

ChrisA

Chris Angelico

Jun 15, 2013, 7:14:34 PM
to pytho...@python.org
On Sun, Jun 16, 2013 at 12:16 AM, Roy Smith <r...@panix.com> wrote:
> The advantage of DVCS is that everybody has a full copy of the repo.
> The disadvantage of the DVCS is that everybody MUST have a full copy of the
> repo. When a repo gets big, you may not want to pull all of that data
> just to get the subtree you need.

Yeah, and depending on size, that can be a major problem. While git
_will_ let you make a shallow clone, it won't let you push from that,
so it's good only for read-only repositories (we use git to manage
software deployments at work - shallow clones are perfect) or for
working with patch files.

Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
Mercurial compress its content? A tar.gz of each comes down, but only
to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
already compressed. But 200MB for cpython seems like a lot.

Anyway, this problem is a good reason for dividing a repository up
into logically-separate parts. If you'll often want only one subtree,
maybe that shouldn't be a subtree of a monolithic repository.

ChrisA

rusi

Jun 15, 2013, 11:55:11 PM
to
On Jun 16, 4:14 am, Chris Angelico <ros...@gmail.com> wrote:
> On Sun, Jun 16, 2013 at 12:16 AM, Roy Smith <r...@panix.com> wrote:
> > The advantage of DVCS is that everybody has a full copy of the repo.
> > The disadvantage of the DVCS is that everybody MUST have a full copy of the
> > repo.  When a repo gets big, you may not want to pull all of that data
> > just to get the subtree you need.
>
> Yeah, and depending on size, that can be a major problem. While git
> _will_ let you make a shallow clone, it won't let you push from that,
> so it's good only for read-only repositories (we use git to manage
> software deployments at work - shallow clones are perfect) or for
> working with patch files.
>
> Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
> Mercurial compress its content? A tar.gz of each comes down, but only
> to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
> already compressed. But 200MB for cpython seems like a lot.

[I am assuming that you have run "git gc --aggressive" before giving
those figures]

Your data would tell me that python is about twice as large a project
as pike in terms of number of commits. Isn't this a natural conclusion?

Chris Angelico

Jun 16, 2013, 12:13:13 AM
to pytho...@python.org
On Sun, Jun 16, 2013 at 1:55 PM, rusi <rusto...@gmail.com> wrote:
> On Jun 16, 4:14 am, Chris Angelico <ros...@gmail.com> wrote:
>> On Sun, Jun 16, 2013 at 12:16 AM, Roy Smith <r...@panix.com> wrote:
>> > The advantage of DVCS is that everybody has a full copy of the repo.
>> > The disadvantage of the DVCS is that everybody MUST have a full copy of the
>> > repo. When a repo gets big, you may not want to pull all of that data
>> > just to get the subtree you need.
>>
>> Yeah, and depending on size, that can be a major problem. While git
>> _will_ let you make a shallow clone, it won't let you push from that,
>> so it's good only for read-only repositories (we use git to manage
>> software deployments at work - shallow clones are perfect) or for
>> working with patch files.
>>
>> Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
>> Mercurial compress its content? A tar.gz of each comes down, but only
>> to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
>> already compressed. But 200MB for cpython seems like a lot.
>
> [I am assuming that you have run "git gc --aggressive" before giving
> those figures]

They're both clones done for the purpose of building, so I hadn't run
any sort of garbage collect.

> Your data would tell me that python is about twice as large a project
> as pike in terms of number of commits. Isn't this a natural conclusion?

I didn't think there would be that much difference, tbh. Mainly, I'm
just seeing cpython as not being 200MB of history, or so I'd thought.
Pike has ~30K commits (based on 'git log --oneline|wc -l'); CPython
has roughly 80K (based on 'hg log|grep changeset|wc -l' - there's
likely an easier way but I don't know Mercurial). So yeah, okay, it's
been doing more. But I still don't see 200MB in that. Seems a lot of
content.

ChrisA

Steven D'Aprano

Jun 16, 2013, 1:20:29 AM
to
On Sun, 16 Jun 2013 14:13:13 +1000, Chris Angelico wrote:

> I didn't think there would be that much difference, tbh. Mainly, I'm
> just seeing cpython as not being 200MB of history, or so I'd thought.
> Pike has ~30K commits (based on 'git log --oneline|wc -l'); CPython has
> roughly 80K (based on 'hg log|grep changeset|wc -l' - there's likely an
> easier way but I don't know Mercurial). So yeah, okay, it's been doing
> more. But I still don't see 200MB in that. Seems a lot of content.

If you're bringing in the *entire* CPython code base, as shown here:

http://hg.python.org/

keep in mind that it includes the equivalent of four independent
implementations:

- CPython 2.x
- CPython 3.x
- Stackless
- Jython


plus various other bits and pieces.


Plus, no offence intended at Pike which I'm sure is an awesome language,
but it may not have quite as much active development as Python... as you
point out yourself, there are nearly three times as many commits to
CPython as to Pike, which coincidentally (or not) corresponds to the
CPython repo being nearly three times as large as the Pike repo.



--
Steven

Chris Angelico

Jun 16, 2013, 1:29:34 AM
to pytho...@python.org
On Sun, Jun 16, 2013 at 3:20 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Sun, 16 Jun 2013 14:13:13 +1000, Chris Angelico wrote:
>
>> I didn't think there would be that much difference, tbh. Mainly, I'm
>> just seeing cpython as not being 200MB of history, or so I'd thought.
>> Pike has ~30K commits (based on 'git log --oneline|wc -l'); CPython has
>> roughly 80K (based on 'hg log|grep changeset|wc -l' - there's likely an
>> easier way but I don't know Mercurial). So yeah, okay, it's been doing
>> more. But I still don't see 200MB in that. Seems a lot of content.
>
> If you're bringing in the *entire* CPython code base, as shown here:
>
> http://hg.python.org/
>
> keep in mind that it includes the equivalent of four independent
> implementations:
>
> - CPython 2.x
> - CPython 3.x
> - Stackless
> - Jython

Hrm. Why are there other Pythons in the cpython repository? Yes,
CPython 2.x and 3.x, but why the other two?

> Plus, no offence intended at Pike which I'm sure is an awesome language,
> but it may not have quite as much active development as Python... as you
> point out yourself, there are nearly three times as many commits to
> CPython as to Pike, which coincidentally (or not) corresponds to the
> CPython repo being nearly three times as large as the Pike repo.

Yeah. Actually, I suspect that what's going on here, and what led to
my confusion, is that Pike wasn't always done using git, so quite a
few of the earlier versions simply aren't here. So it's an error in my
perceptions rather than any real difference.

However, comparisons aside, 200MB is still a fair bit to fetch before
doing anything with Python. Does Mercurial have any equivalent of
git's shallow clone feature?

ChrisA

Terry Reedy

Jun 16, 2013, 5:15:43 AM
to pytho...@python.org
On 6/16/2013 1:29 AM, Chris Angelico wrote:
> On Sun, Jun 16, 2013 at 3:20 PM, Steven D'Aprano

>> If you're bringing in the *entire* CPython code base, as shown here:
>>
>> http://hg.python.org/

This is the python.org collection of repositories, not just cpython.

>> keep in mind that it includes the equivalent of four independent
>> implementations:
>>
>> - CPython 2.x
>> - CPython 3.x

>> - Stackless
>> - Jython
>
> Hrm. Why are there other Pythons in the cpython repository?

There are not. The cpython repository
http://hg.python.org/cpython/
only contains cpython. As I write, the last revision is 84110. Windows
says that my cpython clone has about 1400 folders, 15000 files, and 500
million bytes.

--
Terry Jan Reedy

Chris Angelico

Jun 16, 2013, 5:51:51 AM
to pytho...@python.org
On Sun, Jun 16, 2013 at 7:15 PM, Terry Reedy <tjr...@udel.edu> wrote:
> On 6/16/2013 1:29 AM, Chris Angelico wrote:
>>
>> On Sun, Jun 16, 2013 at 3:20 PM, Steven D'Aprano
>>> keep in mind that it includes the equivalent of four independent
>>> implementations:
>>>
>>> - CPython 2.x
>>> - CPython 3.x
>
>
>>> - Stackless
>>> - Jython
>>
>>
>> Hrm. Why are there other Pythons in the cpython repository?
>
>
> There are not. The cpython repository
> http://hg.python.org/cpython/
> only contains cpython. As I write, the last revision is 84110. Windows says
> that my cpython clone has about 1400 folders, 15000 files, and 500 million
> bytes

Ah, well it's this one that I have. So it should have only CPython in it.

ChrisA

Chris “Kwpolska” Warrick

Jun 16, 2013, 9:30:05 AM
to Chris Angelico, pytho...@python.org
On Sun, Jun 16, 2013 at 1:14 AM, Chris Angelico <ros...@gmail.com> wrote:
> Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
> Mercurial compress its content? A tar.gz of each comes down, but only
> to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
> already compressed. But 200MB for cpython seems like a lot.

Next time, do a more fair comparison.

I created an empty git and hg repository, and created a file promptly
named “file” with DIGIT ONE (0x31; UTF-8/ASCII–encoded) and committed
it with “c1” as the message, then I turned it into “12” and committed
as “c2” and did this one more time, making the file “123” at commit
named “c3”.

[kwpolska@kwpolska-lin .hg@default]% cat * */* */*/* 2>/dev/null | wc -c
1481
[kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* 2>/dev/null | wc -c
16860 ← WRONG!

There is just one problem with this: an empty git repository starts at
15216 bytes, due to some sample hooks. Let’s remove them and try
again:

[kwpolska@kwpolska-lin .git@master]% rm hooks/*
[kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* */*/*/*/* 2>/dev/null | wc -c
2499

which is a much more sane number. This includes a config file (in the
ini/configparser format) and such. According to my maths skills (or
rather zsh’s skills), new commits are responsible for 1644 bytes in
the git repo and 1391 bytes in the hg repo.

(I’m using wc -c to count the bytes in all files there are. du is
inaccurate with files smaller than 4096 bytes.)
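
The shell globs above only reach a fixed number of directory levels, so a recursive walk is a more robust way to get the same kind of byte count. A minimal Python sketch, assuming you just want the sum of file sizes as "wc -c" would report them (the function name `total_bytes` is made up for illustration, and symlinks are skipped rather than followed):

```python
import os


def total_bytes(root):
    """Sum the sizes in bytes of all regular files under root, at any depth."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a link's target is not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling `total_bytes(".git")` or `total_bytes(".hg")` from the top of a working copy would give a figure comparable to the "cat ... | wc -c" pipelines, without having to guess how deep the directory tree goes.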

--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail | always bottom-post
http://asciiribbon.org | http://caliburn.nl/topposting.html

Roy Smith

Jun 16, 2013, 9:50:19 AM
to
In article <mailman.3442.1371389...@python.org>,
Chris “Kwpolska” Warrick <kwpo...@gmail.com> wrote:

> (I’m using wc -c to count the bytes in all files there are. du is
> inaccurate with files smaller than 4096 bytes.)

It's not that du is not accurate, it's that it's measuring something
different. It's measuring how much disk space the file is using. For
most files, that's the number of characters in the file rounded up to a
full block. For large files, I believe it also includes the overhead of
indirect blocks or extent trees. And, finally, for sparse files, it
takes into account that some logical blocks in the file may not be
mapped to any physical storage.

So, whether you want to use "du" or "wc -c" depends on what you're
trying to measure. If you want to know how much disk space you're
using, du is the right tool. If you want to know how much data will be
transmitted if the file is serialized (i.e. packed in a tarball or sent
via a "{hg,git} clone" operation), then "wc -c" is what you want.

All that being said, for the vast majority of cases (and I would be
astonished if this was not true for any real-life vcs repo), the
difference between what wc and du tell you is not worth worrying about.
And du is going to be a heck of a lot faster.
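
The two measurements can be compared directly from Python. This is a rough sketch under a few assumptions: it is Unix-only (`st_blocks` does not exist on Windows), it relies on `st_blocks` being counted in 512-byte units as the Python documentation describes, and the helper names are invented for this example. The first function is just the "rounded up to a full block" arithmetic; the second asks the filesystem what it really allocated, which can differ for sparse files or files with indirect blocks.

```python
import os


def round_up_to_block(size, block_size=4096):
    """Smallest multiple of block_size that can hold `size` bytes."""
    return -(-size // block_size) * block_size


def apparent_and_allocated(path):
    """Return (bytes as 'wc -c' counts them, bytes allocated on disk)."""
    st = os.stat(path)
    # st_size is the logical length; st_blocks counts 512-byte units.
    return st.st_size, st.st_blocks * 512
```

For a one-byte file on a filesystem with 4096-byte blocks, `round_up_to_block(1)` gives 4096, which is why per-file disk usage looks so different from the byte count for tiny files, and why the gap stops mattering once files are large.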

Lele Gaifax

Jun 16, 2013, 11:48:31 AM
to pytho...@python.org
Roy Smith <r...@panix.com> writes:

> In article <mailman.3442.1371389...@python.org>,
> Chris “Kwpolska” Warrick <kwpo...@gmail.com> wrote:
>
>> (I’m using wc -c to count the bytes in all files there are. du is
>> inaccurate with files smaller than 4096 bytes.)
>
> It's not that du is not accurate, it's that it's measuring something
> different. It's measuring how much disk space the file is using. For
> most files, that's the number of characters in the file rounded up to a
> full block.

I think “du -c” emits a number very close to “wc -c”.

Jason Swails

Jun 16, 2013, 12:39:30 PM
to Chris “Kwpolska” Warrick, python list
This is not a fair comparison, either.  If we want to do a fair comparison pertinent to this discussion, let's convert the cpython mercurial repository into a git repository and allow the git repo to repack the diffs the way it deems fit.

I'm using the git-remote-hg.py script [https://github.com/felipec/git/blob/fc/master/contrib/remote-helpers/git-remote-hg.py] to clone a mercurial repo into a native git repo. Then, in one of the rare cases where it is warranted, I ran git gc --aggressive. [1]

The result:

Git:
cpython_git/.git $ du -h --max-depth=1
40K ./hooks
145M ./objects
20K ./logs
24K ./refs
24K ./info
146M .

Mercurial:
cpython/.hg $ du -h --max-depth=1
209M ./store
20K ./cache
209M .


And to help illustrate the equivalence of the two repositories:

Git:

cpython_git $ git log | head; git log | tail

commit 78f82bde04f8b3832f3cb6725c4bd9c8d705d13b
Author: Brett Cannon <br...@python.org>
Date:   Sat Jun 15 23:24:11 2013 -0400

    Make test_builtin work when executed directly

commit a7b16f8188a16905bbc1d49fe6fd940078dd1f3d
Merge: 346494a af14b7c
Author: Gregory P. Smith <gr...@krypto.org>
Date:   Sat Jun 15 18:14:56 2013 -0700
Author: Guido van Rossum <gu...@python.org>
Date:   Mon Sep 10 11:15:23 1990 +0000

    Warning about incompleteness.

commit b5e5004ae8f54d7d5ddfa0688fc8385cafde0e63
Author: Guido van Rossum <gu...@python.org>
Date:   Thu Aug 9 14:25:15 1990 +0000

    Initial revision

Mercurial:

cpython $ hg log | head; hg log | tail

changeset:   84163:5b90da280515
bookmark:    master
tag:         tip
user:        Brett Cannon <br...@python.org>
date:        Sat Jun 15 23:24:11 2013 -0400
summary:     Make test_builtin work when executed directly

changeset:   84162:7dee56b6ff34
parent:      84159:5e8b377942f7
parent:      84161:7e06a99bb821
user:        Guido van Rossum <gu...@python.org>
date:        Mon Sep 10 11:15:23 1990 +0000
summary:     Warning about incompleteness.

changeset:   0:3cd033e6b530
branch:      legacy-trunk
user:        Guido van Rossum <gu...@python.org>
date:        Thu Aug 09 14:25:15 1990 +0000
summary:     Initial revision

They both appear to have the same history.  In this particular case, it seems that git does a better job in terms of space management, probably due to the fact that it doesn't store duplicate copies of identical source code that appears in different files (it tracks content, not files).

That being said, from what I've read both git and mercurial have their advantages, both in the performance arena and the features/usability arena (I only know how to really use git).  I'd certainly take a DVCS over a centralized model any day.

All the best,
Jason

[1] I know I just posted in this thread about --aggressive being bad, but the packing from the translation was horrible --> the translated git repo was ~2 GB in size.  An `aggressive' repacking was necessary to allow git to decide how to pack the diffs.

Terry Reedy

Jun 16, 2013, 1:02:42 PM
to pytho...@python.org
On 6/16/2013 11:48 AM, Lele Gaifax wrote:
> Roy Smith <r...@panix.com> writes:
>
>> In article <mailman.3442.1371389...@python.org>,
>> Chris “Kwpolska” Warrick <kwpo...@gmail.com> wrote:
>>
>>> (I’m using wc -c to count the bytes in all files there are. du is
>>> inaccurate with files smaller than 4096 bytes.)
>>
>> It's not that du is not accurate, it's that it's measuring something
>> different. It's measuring how much disk space the file is using. For
>> most files, that's the number of characters in the file rounded up to a
>> full block.
>
> I think “du -c” emits a number very close to “wc -c”.

In Windows Explorer, the Properties box displays both the Size and 'Size
on disk', in both (KB or MB) and bytes. The block size for the disk I am
looking at is 4KB, so the Size on disk in KB is a multiple of that.

--
Terry Jan Reedy

