How are Change-Ids generated?


Olivier Croquette

Dec 18, 2013, 10:25:54 AM
to repo-d...@googlegroups.com
Hi,

I am writing a commit-msg hook in Groovy that does some useful stuff for our team. Now I need to integrate the generation of the "Change-Id" into it. I have 2 options:
either call the "official" shell-based commit-msg hook from Groovy, or generate the change-id directly from Groovy.

Doing it in Groovy is more elegant, so it is my preferred solution. However, it's not trivial to reverse-engineer the existing shell script, and I didn't find any tests to validate a potential new implementation either.

So here come some questions:
1) is the Change-Id generation process documented in any other form than a shell script?
2) are some other implementations available?
3) couldn't the change-id just be generated randomly? would it break anything? will kittens be killed?

Thanks in advance for your help!

Olivier

Shawn Pearce

Dec 18, 2013, 10:33:11 AM
to Olivier Croquette, repo-discuss
On Wed, Dec 18, 2013 at 7:25 AM, Olivier Croquette <ocroq...@free.fr> wrote:
>
> Hi,
>
> I am writing a commit-msg hook in Groovy that does some useful stuff for our team. Now I need to integrate the generation of the "Change-Id" to it. I have 2 options:
> either call the "official" shell-based commit-msg hook from Groovy, or generate the change-id directly from Groovy.
>
> Doing it in Groovy is more elegant, therefore this is my preferred solution, however, it's not trivial to reverse-engineer the existing shell script. II didn't find either any tests to validate a potential new implementation.
>
> So here are comes questions:
> 1) is the Change-Id generation process documented in any other form than a shell script?
> 2) are some other implementations available ?

JGit has one, see
https://eclipse.googlesource.com/jgit/jgit/+/HEAD/org.eclipse.jgit/src/org/eclipse/jgit/util/ChangeIdUtil.java

> 3) couldn't the change-id just be generated randomly ? would it break anything ? will kittens be killed ?

Please don't do it randomly. We use SHA-1 here to give us
pseudo-random values that are unlikely to have collisions, so long as
Git commits in that same repository won't have collisions.
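
For readers who want to see the idea in code, here is a minimal sketch of deriving a deterministic Change-Id by hashing commit metadata with SHA-1. It is not a port of the official commit-msg hook or of JGit's ChangeIdUtil: the function name `sketch_change_id`, the exact layout of the hashed text, and the sample values are all assumptions made for illustration, so the IDs it produces should not be expected to match those of the real tools.

```python
import hashlib


def sketch_change_id(tree, first_parent, author, committer, message):
    """Return 'I' + 40 hex digits derived from commit metadata (illustrative layout)."""
    lines = [f"tree {tree}"]
    if first_parent:  # a root commit has no parent
        lines.append(f"parent {first_parent}")
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    payload = ("\n".join(lines) + "\n\n" + message.strip() + "\n").encode("utf-8")
    return "I" + hashlib.sha1(payload).hexdigest()


# Hypothetical sample values, not taken from a real repository.
base = dict(
    tree="a" * 40,
    author="A U Thor <author@example.com> 1387440000 +0100",
    committer="A U Thor <author@example.com> 1387440000 +0100",
    message="Fix the frobnicator\n",
)
id_on_p1 = sketch_change_id(first_parent="1" * 40, **base)
id_on_p2 = sketch_change_id(first_parent="2" * 40, **base)
print(id_on_p1 != id_on_p2)  # same content on a different first parent gets a different ID
```

Because every input is part of the hashed text, two commits only share an ID in this sketch when tree, first parent, identities, and message are all identical.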

Olivier Croquette

Dec 18, 2013, 11:20:05 AM
to repo-d...@googlegroups.com, Olivier Croquette
On Wednesday, December 18, 2013 4:33:11 PM UTC+1, Shawn Pearce wrote:
> > 2) are some other implementations available ?
>
> JGit has one, see
> https://eclipse.googlesource.com/jgit/jgit/+/HEAD/org.eclipse.jgit/src/org/eclipse/jgit/util/ChangeIdUtil.java

Thanks, Shawn, that's very useful.
 

> > 3) couldn't the change-id just be generated randomly ? would it break anything ? will kittens be killed ?
>
> Please don't do it randomly. We use SHA-1 here to give us a
> pseudo-random values that are unlikely to have collisions, so long as
> Git commits in that same repository won't have collisions.

The statement about the collisions is more an argument for not using the metadata, because there are corner cases producing collisions.
For instance, when you have just branched off "b" from "master", and you commit the same tree for both "b" and "master" locally, you will end up with the same Change-Id.

Anyway, if it's just about getting a pseudo-random value, why should it be based on real metadata?
What would concretely happen if it weren't? I am just curious.

Olivier


Shawn Pearce

Dec 18, 2013, 11:25:38 AM
to Olivier Croquette, repo-discuss
On Wed, Dec 18, 2013 at 8:20 AM, Olivier Croquette <ocroq...@free.fr> wrote:
> On Wednesday, December 18, 2013 4:33:11 PM UTC+1, Shawn Pearce wrote:
>>
>> > 2) are some other implementations available ?
>>
>> JGit has one, see
>>
>> https://eclipse.googlesource.com/jgit/jgit/+/HEAD/org.eclipse.jgit/src/org/eclipse/jgit/util/ChangeIdUtil.java
>
>
> Thanks, Shawn, that's very useful.
>
>>
>>
>> > 3) couldn't the change-id just be generated randomly ? would it break
>> > anything ? will kittens be killed ?
>>
>> Please don't do it randomly. We use SHA-1 here to give us a
>> pseudo-random values that are unlikely to have collisions, so long as
>> Git commits in that same repository won't have collisions.
>
>
> The statement about the collisions is more an argument for not using the
> metadata, because there are corner cases producing collisions.
> For instance, when you have just branched off "b" from "master", and you
> commit the same tree for both "b" and "master" locally, you will end up with
> the same Change-Id.

If the first parent differs, it's a different Change-Id.

If the first parent is the same, sure, it's the same Change-Id. But at
that point it's basically the same change, on the same parent, so sure,
same Change-Id.

Gerrit uses a triplet of (project, branch, Change-Id) to name a
change. So same change on different branches can use the same
Change-Id.
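
To make the triplet concrete, here is a toy in-memory model of naming changes by (project, branch, Change-Id). It only illustrates the keying rule described above and is not Gerrit's actual data model; the `changes` index, the `push` function, the project and branch names, and the placeholder IDs are all hypothetical.

```python
from collections import defaultdict

# Toy index: (project, branch, change_id) -> list of patch-set commit IDs.
changes = defaultdict(list)


def push(project, branch, change_id, commit_sha):
    """Record a pushed commit; report whether it opened a change or extended one."""
    key = (project, branch, change_id)
    status = "new patch set" if key in changes else "new change"
    changes[key].append(commit_sha)
    return status


cid = "I" + "0" * 40  # placeholder Change-Id
print(push("demo", "master", cid, "c1"))  # first sighting on master
print(push("demo", "stable", cid, "c2"))  # same Change-Id, other branch
print(push("demo", "master", cid, "c3"))  # same triplet again
```

In this model the same Change-Id on two branches yields two separate changes, while a second push with an identical triplet is attached to the existing change.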

> Anyway, if it's just about to get a pseudo-random value, why should it be
> based on some real metadata ?

One could say the same thing about commit names in Git. Instead of
SHA-1s maybe they should just be randomly generated numbers...

Jonathan Nieder

Dec 18, 2013, 11:31:25 AM
to Shawn Pearce, Olivier Croquette, repo-discuss
I guess generating the change-ids deterministically might be handy for
reproducible tests in some cases, and it's also good protection against
the case of a bad random seed. Otherwise, I have the same question
as Olivier --- what would be the harm in a truly random change-id?

Commit names are special because there is no central server giving
meaning to them and because people rely on the cryptographic property
that a commit uniquely identifies the history that precedes it (when
using signed tags, for example). Change-ids can't have that property
because they are maintained when making a new version of the same
change.

Shawn Pearce

Dec 18, 2013, 11:43:35 AM
to Jonathan Nieder, Olivier Croquette, repo-discuss
On Wed, Dec 18, 2013 at 8:31 AM, Jonathan Nieder <j...@google.com> wrote:
> Shawn Pearce wrote:
>> On Wed, Dec 18, 2013 at 8:20 AM, Olivier Croquette <ocroq...@free.fr> wrote:
>
>>> Anyway, if it's just about to get a pseudo-random value, why should it be
>>> based on some real metadata ?
>>
>> One could say the same thing about commit names in Git. Instead of
>> SHA-1s maybe they should just be randomly generated numbers...
>
> I guess generating the change-ids deterministically might be handy for
> reproducible tests in some case, and it's also good protection against
> the case of a no good random seed. Otherwise, I have the same question
> as Olivier --- what would be the harm in a truly random change-id?

Yes, I think you guys are correct. A random Change-Id is just as good
as any other Change-Id as far as Gerrit is concerned. A bad random
seeding process could lead two clients to produce the same Change-Id
at different times and have a conflict in the Gerrit server. We
currently bet that the same user name, same user email, with the same
commit message, and same file contents, and same first parent
commit... won't ever happen unless the change is in fact identical. We
also bet someone won't forge another person's identity unless well,
they are trying to forge their identity, so we don't need to worry
about getting unique random sequences between users.

A proper UUID would work just as well for Change-Id as Change-Id does.
A badly created random string wouldn't.
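
As a rough illustration of the "proper random" option, the sketch below builds a Change-Id-shaped string ("I" followed by 40 hexadecimal digits, the shape of the Change-Id lines quoted in this thread) from the operating system's cryptographic random source. The helper name `random_change_id` and the validation regex are assumptions for illustration; whether a given Gerrit version accepts other formats is not covered here.

```python
import re
import secrets

# Shape of the Change-Id values shown in this thread: 'I' + 40 lowercase hex digits.
CHANGE_ID_RE = re.compile(r"^I[0-9a-f]{40}$")


def random_change_id():
    """Return 'I' + 40 hex digits drawn from the OS cryptographic random source."""
    return "I" + secrets.token_hex(20)  # 20 random bytes -> 40 hex characters


new_id = random_change_id()
print(new_id, bool(CHANGE_ID_RE.match(new_id)))
```

Using `secrets` rather than a manually seeded generator sidesteps the bad-seeding problem mentioned above, because no seed is chosen by the caller.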

> Commit names are special because there is no central server giving
> meaning to them and because people rely on the cryptographic property
> that a commit uniquely identifies the history that precedes it (when
> using signed tags, for example). Change-ids can't have that property
> because they are maintained when making a new version of the same
> change.

Well, commits could have been randomly named with a SHA-1 hash of the
data wrapped around them. But I see your point.

Olivier Croquette

Dec 18, 2013, 11:44:03 AM
to repo-d...@googlegroups.com, Olivier Croquette
On Wednesday, December 18, 2013 5:25:38 PM UTC+1, Shawn Pearce wrote:
> If the first parent is the same, sure, its the same Change-Id. But at
> that point its basically the same change, on the same parent, so sure,
> same Change-Id.

Well, from a user perspective, it's not the same change. Or better said: the user will wonder why Gerrit thinks it's the same change when he tries to push it to different branches or repositories on the server. He will then have to change something (like the commit message) and push again. It's a corner case for sure, but it's not nice.
 
> Gerrit uses a triplet of (project, branch, Change-Id) to name a
> change. So same change on different branches can use the same
> Change-Id.

I am not sure what you mean. If the commit has a "Change-Id" line, and Gerrit finds a corresponding change in its database, it will always attach the commit to the existing change (rejecting it if it's the same commit ID), AFAIK.

> > Anyway, if it's just about to get a pseudo-random value, why should it be
> > based on some real metadata ?
>
> One could say the same thing about commit names in Git. Instead of
> SHA-1s maybe they should just be randomly generated numbers...

Well, Git is a distributed VCS, and the commit id describes a snapshot of content, so it's understandable that it is deterministic.
Gerrit uses the Change-Id to group the patch sets that belong together. There is no added value in it being deterministic; the only desired property is the lack of collisions.

I see the "nice-to-have" argument to use the same concept in Gerrit as in Git though.

Shawn Pearce

Dec 18, 2013, 11:50:21 AM
to Olivier Croquette, repo-discuss
On Wed, Dec 18, 2013 at 8:44 AM, Olivier Croquette <ocroq...@free.fr> wrote:
> On Wednesday, December 18, 2013 5:25:38 PM UTC+1, Shawn Pearce wrote:
>>
>> If the first parent is the same, sure, its the same Change-Id. But at
>> that point its basically the same change, on the same parent, so sure,
>> same Change-Id.
>
>
> Well, from a user perspective, it's not the same change. Or better said: the
> user will wonder why Gerrit thinks it's the same change when he tries to
> push it to different branches or repositories on the server. He will then
> have to change something (like the commit message) and push again. It's a
> corner case for sure, but it's not nice.

You obviously didn't read or understand my message.

>> Gerrit uses a triplet of (project, branch, Change-Id) to name a
>> change. So same change on different branches can use the same
>> Change-Id.

^^

> I am not sure what you mean. If the commit has a "Change-Id" line, and
> Gerrit finds a corresponding change in its database, it will always attach
> the commit to the existing change (rejecting it if it's the same commit ID),
> AFAIK.

I'm tired of this discussion. If you think you have a more reliable
Change-Id generator that is simpler to implement than the one Gerrit
and JGit use, apparently you can do so, because we have concluded the
string doesn't matter.

Thomas Swindells (tswindel)

Dec 18, 2013, 11:53:05 AM
to Shawn Pearce, Olivier Croquette, repo-discuss
[Thomas Swindells] From memory, hasn't the behaviour changed/been fixed over different versions of Gerrit?
I thought at one point it was overly strict about how change-ids could be re-used, as the poster states.

Shawn Pearce

Dec 18, 2013, 12:10:49 PM
to Thomas Swindells (tswindel), Olivier Croquette, repo-discuss
Years ago they were unique either to the server or to the project, I
don't remember which. We realized this was a mistake and relaxed it to
also include the branch name. Cherry picks across branches were
reusing the existing Change-Id and users didn't want to run `git
commit --amend` just to delete the Change-Id line and have a new one
created.

So if you are still using a version of Gerrit that doesn't include the
branch name when searching for matching changes by Change-Id, I am
very sorry for you. That is a horribly ancient version and we have
fixed quite a few bugs and usability issues in the 3 or 4 years since
that version was released.

Olivier Croquette

Dec 19, 2013, 4:34:28 AM
to repo-d...@googlegroups.com, Thomas Swindells (tswindel), Olivier Croquette
OK, sorry, I was not aware that Change-Id's can now be reused for different repositories or branches. I based my assumptions on my experience with previous versions and the current documentation, which are both outdated. The fact that Change-Id's are not unique anymore is also kind of counter-intuitive.

About the documentation, I propose the following change to bring it up-to-date:
https://gerrit-review.googlesource.com/#/c/53330/

I didn't generate the HTML yet because I don't know how to do it, but comments on the content are welcome.

Does anyone know when this change was introduced? I can also add this information to the documentation. I couldn't find it after a quick check of the release notes of the major releases.

About the use case "Pushing the same change for different branches": it's OK from a Change-Id perspective, but the following check still gets in the way:
 ! [remote rejected] HEAD -> refs/publish/master2 (no new changes)
error: failed to push some refs to 'ssh://gerrit/repository'

This check should also be made branch-specific, shouldn't it?

Here is a transcript of the use case:
$ git log -1
commit c09a04b49cfb83801e64e1c142e51457c4328560
Author: Olivier Croquette <ocroq...@free.fr>
Date:   Thu Dec 19 09:18:54 2013 +0100

    Testing multiple uploads with the same Change-Id

    Change-Id: I6ae866873d590c61d3f921c5a633ba24fdf93ce9

$ git push origin HEAD:refs/publish/master
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 369 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2)
remote: Processing changes: new: 1, refs: 1, done
remote:
remote: New Changes:
remote:   https://gerrit/59
remote:
To ssh://gerrit/repository
 * [new branch]      HEAD -> refs/publish/master

$ git push origin HEAD:refs/publish/master2
Total 0 (delta 0), reused 0 (delta 0)
remote: Processing changes: refs: 1, done
To ssh://gerrit/repository
 ! [remote rejected] HEAD -> refs/publish/master2 (no new changes)
error: failed to push some refs to 'ssh://gerrit/repository'

$ git commit --amend

$ git log -1
commit 94dc6b16e048919e58f1e19696b882a4789b9bdf
Author: Olivier Croquette <ocroq...@free.fr>
Date:   Thu Dec 19 09:18:54 2013 +0100

    Testing multiple uploads with the same Change-Id

    Dummy line to get a new commit id

    Change-Id: I6ae866873d590c61d3f921c5a633ba24fdf93ce9

$ git push origin HEAD:refs/publish/master2
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 395 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2)
remote: Processing changes: new: 1, refs: 1, done
remote:
remote: New Changes:
remote:   https://gerrit/60
remote:
To ssh://gerrit/repository
 * [new branch]      HEAD -> refs/publish/master2
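
The transcript is consistent with a server-side check on the commit ID that ignores the target branch. The toy model below only mirrors what the transcript shows; how Gerrit implements the check internally is not described in this thread, and the `seen_commits` set and `push` function are hypothetical names.

```python
seen_commits = set()  # commit IDs the toy server already knows, across all branches


def push(branch, commit_sha):
    """Reject a commit the server has already seen, whatever the target branch."""
    if commit_sha in seen_commits:
        return f"! [remote rejected] HEAD -> refs/publish/{branch} (no new changes)"
    seen_commits.add(commit_sha)
    return f"* [new branch] HEAD -> refs/publish/{branch}"


original = "c09a04b49cfb83801e64e1c142e51457c4328560"  # commit ID before the amend
amended = "94dc6b16e048919e58f1e19696b882a4789b9bdf"   # commit ID after the amend
for branch, sha in [("master", original), ("master2", original), ("master2", amended)]:
    print(push(branch, sha))
```

Under this model the second push fails only because the commit ID is already known, which is why amending the commit (keeping the same Change-Id) makes the third push succeed.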






David Pursehouse

Dec 19, 2013, 4:42:41 AM
to Olivier Croquette, repo-d...@googlegroups.com, Thomas Swindells (tswindel)
On 12/19/2013 06:34 PM, Olivier Croquette wrote:
> OK, sorry, I was not aware that Change-Id's can now be reused for
> different repositories or branches. I based my assumptions on my
> experience with previous versions and the current documentation, which
> are both outdated. The fact that Change-Id's are not unique anymore is
> also kind of counter-intuitive.
>
> About the documentation, I propose the following change to bring it
> up-to-date:
> https://gerrit-review.googlesource.com/#/c/53330/
>
This change is not visible to me. Have you uploaded it as a draft?

> I didn't generate the HTML yet because I don't know how to do it, but
> comments on the content are welcome.
>
To build only the docs:

$ buck build docs

or to build Gerrit including the docs:

$ buck build withdocs

Check the buck documentation (dev-buck.txt) for more info.

> Does anyone know when this change has been introduced ? I can also add
> this information to the documentation. I couldn't find it after a quick
> check in the release notes of the major releases.
>

Possibly issue 635 [1] which was in Gerrit 2.1.7

[1] https://code.google.com/p/gerrit/issues/detail?id=635

Olivier Croquette

Dec 19, 2013, 4:54:34 AM
to repo-d...@googlegroups.com, Olivier Croquette, Thomas Swindells (tswindel)
On Thursday, December 19, 2013 10:42:41 AM UTC+1, David Pursehouse wrote:
> About the documentation, I propose the following change to bring it
> up-to-date:
> https://gerrit-review.googlesource.com/#/c/53330/
> This change is not visible to me. Have you uploaded it as a draft?

Yes, sorry. It's fixed.

> To build only the docs:

Thanks!
