[PATCH 0 of 3] Author file + branch bookmark suffix proposals

Mike Bayer

Dec 19, 2011, 12:19:38 PM
to hg-...@googlegroups.com, mik...@zzzcomputing.com
Hi -

This patch series contains the previously mentioned "authors file" change,
as well as a potential enhancement to the translation of hg bookmarks
to git branches.

These changes are motivated by my evaluation of a potential move
of my projects (SQLAlchemy, Mako Templates for Python) over to GitHub.

The two features are as follows:

1. An "authors file" option, specified as "authors=<somefile>" underneath
[git]. This file is consulted for translations when converting a Mercurial
author name into a Git author name. In particular, lots of commits in
SQLAlchemy/Mako only have usernames, which show up on GitHub very unpleasantly
as "invalid email address". This mapping allows them to be displayed correctly.
This is an issue I see raised a lot, and I think it is an easy feature to add.

2. A "branch/bookmark suffix". This feature is probably more controversial
than the authors file. SQLAlchemy's hg repo uses named branches for maintenance
series, such as the rel_0_5 and rel_0_6 branches. hg-git can't create a
git repo with these branch names, as only bookmark names can be used, and it
is not possible in Mercurial to replace the branch names with same-named
bookmarks. As I really don't want these names to change (sorry, I'm very
picky), the "branch_bookmark_suffix" setting under the [git] section
specifies a suffix that is stripped as hg bookmarks are converted to
branches on the git side. On the git->hg side, the suffix is re-applied
to a bookmark name if the incoming name matches an existing named
branch.

The branch bookmark suffix appears to work, though I'm not planning on having
a lot of cross-motion between the two systems; we'll probably move to git and
that will be mostly it.

I'm sending these changes both to ask for comments or feedback
(such as, "this isn't going to work because you're forgetting...."), and
to see whether either feature might be useful for hg-git itself.

Thanks, and thanks for the useful product!

- mike

Mike Bayer

Dec 19, 2011, 12:19:40 PM
to hg-...@googlegroups.com, mik...@zzzcomputing.com
# HG changeset patch
# User Mike Bayer <mik...@zzzcomputing.com>
# Date 1324252456 18000
# Node ID 1032f0e6de584cc5de593d13b420d88e78180182
# Parent 22006dd106fcbc68809e764b37a62be409b9a418
- add "branch_bookmark_suffix" parameter. this allows bookmarks
that mimic a branch name to be maintained on the git side without
a particular suffix - e.g. if the hg repo had a branch "release_05",
and a bookmark created onto it "release_05_bookmark", the branch on the
git side would be named "release_05". When pulling branches back from
git, if an hg named branch of that name exists, the suffix is appended
back onto the name before creating a bookmark on the hg side.

This is strictly so that a git repo can be generated that has the
same "branch names" as an older hg repo that has named branches, and
has had bookmarks added in to mirror the branch names.
This is given the restrictions that
A. hg named branches can never be renamed and B. hg-git only supports
hg bookmarks, not branches

diff -r 22006dd106fc -r 1032f0e6de58 hggit/git_handler.py
--- a/hggit/git_handler.py Sun Dec 18 17:27:46 2011 -0500
+++ b/hggit/git_handler.py Sun Dec 18 18:54:16 2011 -0500
@@ -83,6 +83,8 @@
         self.init_author_file()
         self.paths = ui.configitems('paths')

+        self.branch_bookmark_suffix = ui.config('git', 'branch_bookmark_suffix')
+
         self.load_map()
         self.load_tags()

@@ -738,7 +740,10 @@
         for rev in revs:
             ctx = self.repo[rev]
             if getattr(ctx, 'bookmarks', None):
-                labels = lambda c: ctx.tags() + ctx.bookmarks()
+                labels = lambda c: ctx.tags() + [
+                    fltr for fltr, bm
+                    in self._filter_for_bookmarks(ctx.bookmarks())
+                ]
             else:
                 labels = lambda c: ctx.tags()
             prep = lambda itr: [i.replace(' ', '_') for i in itr]
@@ -856,13 +861,25 @@
                 self.git.refs['refs/tags/' + tag] = self.map_git_get(hex(sha))
                 self.tags[tag] = hex(sha)

+    def _filter_for_bookmarks(self, bms):
+        if not self.branch_bookmark_suffix:
+            return [(bm, bm) for bm in bms]
+        else:
+            def _filter_bm(bm):
+                if bm.endswith(self.branch_bookmark_suffix):
+                    return bm[0:-(len(self.branch_bookmark_suffix))]
+                else:
+                    return bm
+            return [(_filter_bm(bm), bm) for bm in bms]
+
     def local_heads(self):
         try:
             if getattr(bookmarks, 'parse', None):
                 bms = bookmarks.parse(self.repo)
             else:
                 bms = self.repo._bookmarks
-            return dict([(bm, hex(bms[bm])) for bm in bms])
+            return dict([(filtered_bm, hex(bms[bm])) for
+                         filtered_bm, bm in self._filter_for_bookmarks(bms)])
         except AttributeError: #pragma: no cover
             return {}

@@ -903,6 +920,7 @@
                 bms = bookmarks.parse(self.repo)
             else:
                 bms = self.repo._bookmarks
+
             heads = dict([(ref[11:],refs[ref]) for ref in refs
                           if ref.startswith('refs/heads/')])

@@ -920,6 +938,22 @@
                     if bm.ancestor(self.repo[hgsha]) == bm:
                         # fast forward
                         bms[head] = hgsha
+
+            # if there's a branch bookmark suffix,
+            # then add it on to all bookmark names
+            # that would otherwise conflict with a branch
+            # name
+            if self.branch_bookmark_suffix:
+                real_branch_names = self.repo.branchmap()
+                bms = dict(
+                    (
+                        bm_name + self.branch_bookmark_suffix
+                        if bm_name in real_branch_names
+                        else bm_name,
+                        bms[bm_name]
+                    )
+                    for bm_name in bms
+                )
             if heads:
                 if oldbm:
                     bookmarks.write(self.repo, bms)
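
For readers following the diff, the round trip it describes can be sketched outside of hg-git as two small functions. This is a simplified illustration, not the hg-git code itself: `strip_suffix`, `apply_suffix` and the sample names are hypothetical, a plain set stands in for `self.repo.branchmap()`, and it is written for Python 3.

```python
def strip_suffix(bookmark, suffix):
    """hg -> git: drop the configured suffix from a bookmark name, if present."""
    if suffix and bookmark.endswith(suffix):
        return bookmark[:-len(suffix)]
    return bookmark


def apply_suffix(git_branch, suffix, named_branches):
    """git -> hg: re-add the suffix only when the incoming name
    collides with an existing hg named branch."""
    if suffix and git_branch in named_branches:
        return git_branch + suffix
    return git_branch


suffix = "_bookmark"
# Hypothetical stand-in for the repo's named branches.
named_branches = {"default", "rel_0_5", "rel_0_6"}

# Outgoing: bookmark names become git branch names.
git_names = [strip_suffix(bm, suffix)
             for bm in ["rel_0_5_bookmark", "rel_0_6_bookmark", "feature_x"]]

# Incoming: git branch names become bookmark names again.
hg_names = [apply_suffix(name, suffix, named_branches)
            for name in git_names + ["rel_0_7"]]
```

Only names that match a named branch get the suffix back, so a branch created on the git side (`rel_0_7` here) arrives in hg as a plain bookmark.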

Mike Bayer

Dec 19, 2011, 12:19:39 PM
to hg-...@googlegroups.com, mik...@zzzcomputing.com
# HG changeset patch
# User Mike Bayer <mik...@zzzcomputing.com>
# Date 1324247266 18000
# Node ID 22006dd106fcbc68809e764b37a62be409b9a418
# Parent 32afa497834da2a702f14ff1100acfea3cd19cdc
- add an "authors" option to [git]. Specifies the name
of a text file which contains hg username/email->git username/email translations, one
per line. Each line is delimited on a tab character. Is most
useful when converting a Mercurial repository where not all usernames
have email addresses.

diff -r 32afa497834d -r 22006dd106fc hggit/git_handler.py
--- a/hggit/git_handler.py Fri Nov 11 16:25:56 2011 +0100
+++ b/hggit/git_handler.py Sun Dec 18 17:27:46 2011 -0500
@@ -80,6 +80,7 @@
         else:
             self.gitdir = self.repo.join('git')

+        self.init_author_file()
         self.paths = ui.configitems('paths')

         self.load_map()
@@ -93,6 +94,18 @@
             os.mkdir(self.gitdir)
             self.git = Repo.init_bare(self.gitdir)

+    def init_author_file(self):
+        self.author_map = {}
+        if self.ui.config('git', 'authors'):
+            with open(self.repo.wjoin(
+                self.ui.config('git', 'authors')
+            )) as f:
+                for line in f:
+                    if "\t" not in line:
+                        continue
+                    from_, to = line.rstrip("\n").split("\t", 1)
+                    self.author_map[from_] = to
+
     ## FILE LOAD AND SAVE METHODS

     def map_set(self, gitsha, hgsha):
@@ -356,6 +369,10 @@
         # hg authors might not have emails
         author = ctx.user()

+        # see if a translation exists
+        if author in self.author_map:
+            author = self.author_map[author]
+
         # check for git author pattern compliance
         regex = re.compile('^(.*?) ?\<(.*?)(?:\>(.*))?$')
         a = regex.match(author)
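
As a standalone illustration of the tab-delimited lookup this patch adds, here is a minimal sketch in Python 3. It is not the hg-git implementation: `load_author_map` and `translate_author` are hypothetical helper names, the file is replaced by an in-memory buffer, and the `example.com` address is made up for the example.

```python
import io


def load_author_map(lines):
    """Build {hg author: git author} from tab-delimited lines."""
    author_map = {}
    for line in lines:
        line = line.rstrip("\r\n")          # drop the line terminator
        if "\t" not in line:
            continue                        # skip lines without a mapping
        hg_name, git_name = line.split("\t", 1)  # split on the first tab only
        author_map[hg_name] = git_name
    return author_map


def translate_author(author, author_map):
    """Return the mapped git author, or the original name if unmapped."""
    return author_map.get(author, author)


# Hypothetical file contents: one mapping, one line that is ignored.
sample = io.StringIO("johnny\tJohn Smith <jsmith@example.com>\nno tab here\n")
author_map = load_author_map(sample)
```

Authors without an entry pass through unchanged, so the existing handling for names without an email address still applies to them.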

Mike Bayer

Dec 19, 2011, 12:19:41 PM
to hg-...@googlegroups.com, mik...@zzzcomputing.com
# HG changeset patch
# User Mike Bayer <mik...@zzzcomputing.com>
# Date 1324313330 18000
# Node ID 1bce24c5534ff6d505de3e306d550cbcd1bde158
# Parent 1032f0e6de584cc5de593d13b420d88e78180182
document authors and branch_bookmark_suffix

diff -r 1032f0e6de58 -r 1bce24c5534f README.md
--- a/README.md Sun Dec 18 18:54:16 2011 -0500
+++ b/README.md Mon Dec 19 11:48:50 2011 -0500
@@ -149,3 +149,63 @@

     [git]
     intree = True
+
+git.authors
+-----------
+
+Git uses a strict convention for "author names" when representing changesets,
+using the form `[realname] [email address]`. Mercurial encourages this
+convention as well but is not as strict, so it's not uncommon for a Mercurial
+repo to have authors listed as simple usernames. hg-git by default will
+translate such names using the email address `none@none`, which then shows up
+unpleasantly on GitHub as "illegal email address".
+
+The `git.authors` option provides for an "authors translation file" that will
+be used during outgoing transfers from mercurial to git only, by modifying
+`hgrc` as such:
+
+    [git]
+    authors = authors.txt
+
+Where `authors.txt` is the name of a text file containing author name translations,
+one per line, with the Mercurial username and the Git name separated by a tab:
+
+    johnny	John Smith <jsm...@foo.com>
+    dougie	Doug Johnson <dou...@bar.com>
+
+It should be noted that **this translation is on the hg->git side only**. Changesets
+coming from Git back to Mercurial will not translate back into hg usernames, so
+it's best that the same username/email combination be used on both the hg and git sides;
+the author file is mostly useful for translating legacy changesets.
+
+git.branch_bookmark_suffix
+---------------------------
+
+hg-git currently does not recognize Mercurial named branches; it only supports Mercurial
+bookmarks. Therefore, when translating an hg repo over to git, you typically need
+to create bookmarks to mirror all the named branches that you'd like to see transferred
+over to git. The major caveat with this is that you can't use the same name for your
+bookmark as that of the named branch, and furthermore there's no feasible way to rename
+a branch in Mercurial. For the use case where one would like to transfer an hg
+repo over to git, and maintain the same named branches as are present on the hg side,
+the `branch_bookmark_suffix` might be all that's needed. This option specifies a string
+"suffix" that will be recognized on each bookmark name, and stripped off as the
+bookmark is translated to a git branch:
+
+    [git]
+    branch_bookmark_suffix=_bookmark
+
+Above, if an hg repo had a named branch called `release_6_maintenance`, you could
+then link it to a bookmark called `release_6_maintenance_bookmark`. hg-git will then
+strip off the `_bookmark` suffix from this bookmark name, and create a git branch
+called `release_6_maintenance`. When pulling back from git to hg, the `_bookmark`
+suffix is then applied back, if and only if an hg named branch of that name exists.
+E.g., when changes to the `release_6_maintenance` branch are checked into git, these
+will be placed into the `release_6_maintenance_bookmark` bookmark on hg. But if a
+new branch called `release_7_maintenance` were pulled over to hg, and there was
+not a `release_7_maintenance` named branch already, the bookmark will be named
+`release_7_maintenance` with no usage of the suffix.
+
+The `branch_bookmark_suffix` option is, like the `authors` option, intended for
+migrating legacy hg named branches. Going forward, an hg repo that is to
+be linked with a git repo should only use bookmarks for named branching.
\ No newline at end of file

Augie Fackler

Dec 23, 2011, 10:22:04 PM
to hg-...@googlegroups.com, mik...@zzzcomputing.com
On Dec 19, 2011, at 11:19 AM, Mike Bayer wrote:
>
> # HG changeset patch
> # User Mike Bayer <mik...@zzzcomputing.com>
> # Date 1324247266 18000
> # Node ID 22006dd106fcbc68809e764b37a62be409b9a418
> # Parent 32afa497834da2a702f14ff1100acfea3cd19cdc
> - add an "authors" option to [git]. Specifies the name
> of a text file which contains hg username/email->git username/email translations, one
> per line. Each line is delimited on a tab character. Is most
> useful when converting a Mercurial repository where not all usernames
> have email addresses.

hgsubversion and convert already have a common format for this. Can I persuade you to use that format instead of defining a new one?

augie% hg help convert | grep authormap --after 10 | head -n 9
The authormap is a simple text file that maps each source commit author to
a destination commit author. It is handy for source SCMs that use unix
logins to identify authors (eg: CVS). One line per author mapping and the
line format is:

source author = destination author

Empty lines and lines starting with a "#" are ignored.


Other than that complaint, I like this patch.
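
For comparison with the tab-delimited format, the authormap format quoted above could be read along these lines. This is a rough Python 3 sketch under stated assumptions, not the code used by `hg convert` or hgsubversion: `parse_authormap` is a hypothetical name, the sample addresses are made up, and silently skipping lines without an `=` is an assumption of this example.

```python
def parse_authormap(text):
    """Parse 'source author = destination author' lines into a dict.

    Empty lines and lines starting with '#' are ignored, as described
    in the quoted help text.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            continue  # assumption: skip malformed lines rather than fail
        source, destination = line.split("=", 1)
        mapping[source.strip()] = destination.strip()
    return mapping


# Hypothetical authormap contents.
sample = """\
# legacy usernames

johnny = John Smith <jsmith@example.com>
dougie=Doug Johnson <doug@example.com>
"""
authors = parse_authormap(sample)
```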

Augie Fackler

Dec 23, 2011, 10:23:17 PM
to hg-...@googlegroups.com
On Dec 19, 2011, at 11:19 AM, Mike Bayer wrote:
>


Queued this one, thanks.

Augie Fackler

Dec 23, 2011, 10:25:25 PM
to hg-...@googlegroups.com, mik...@zzzcomputing.com
On Dec 19, 2011, at 11:19 AM, Mike Bayer wrote:
>


Good docs, but I obviously can't queue it as-is. If need be, we can split it, but hopefully you're amenable to my request on patch 1 and we can just take the series as-is at that point.

Thanks!

Augie Fackler

Dec 23, 2011, 10:25:56 PM
to hg-...@googlegroups.com, mik...@zzzcomputing.com
On Dec 19, 2011, at 11:19 AM, Mike Bayer wrote:
>

Looks pretty good, thanks. I had a couple of comments, and then I'll push the lot.

Michael Bayer

Dec 24, 2011, 2:20:24 AM
to Augie Fackler, hg-...@googlegroups.com

On Dec 23, 2011, at 10:22 PM, Augie Fackler wrote:

> On Dec 19, 2011, at 11:19 AM, Mike Bayer wrote:
>>
>> # HG changeset patch
>> # User Mike Bayer <mik...@zzzcomputing.com>
>> # Date 1324247266 18000
>> # Node ID 22006dd106fcbc68809e764b37a62be409b9a418
>> # Parent 32afa497834da2a702f14ff1100acfea3cd19cdc
>> - add an "authors" option to [git]. Specifies the name
>> of a text file which contains hg username/email->git username/email translations, one
>> per line. Each line is delimited on a tab character. Is most
>> useful when converting a Mercurial repository where not all usernames
>> have email addresses.
>
> hgsubversion and convert already have a common format for this. Can I persuade you to use that format instead of defining a new one?
>
> augie% hg help convert | grep authormap --after 10 | head -n 9
> The authormap is a simple text file that maps each source commit author to
> a destination commit author. It is handy for source SCMs that use unix
> logins to identify authors (eg: CVS). One line per author mapping and the
> line format is:
>
> source author = destination author
>
> Empty lines and lines starting with a "#" are ignored.
>
>
> Other than that complaint, I like this patch.

Sure, the format is absolutely fine. Is it still OK for the name of the "authors" file to be set inside hgrc, or should this be a command-line option (-A?)?


Augie Fackler

Dec 24, 2011, 10:16:26 AM
to Michael Bayer, hg-...@googlegroups.com

Both seem reasonable to me; that's what we've got on hgsubversion. I'm fine with just an option in hgrc though.
