changeset-chooser: a prototype tool that allows you to create code review requests after push

51 views
Skip to first unread message

George Miroshnykov

unread,
Mar 4, 2014, 12:37:08 PM3/4/14
to mozilla-c...@googlegroups.com, Ehsan Akhgari, Taras Glek
Hey everyone,

As we've discussed before on IRC, I've been working on a service that should allow users to create code review requests based on the changesets pushed to the special "review" repository.

For now, the workflow looks like this:
$ hg push -f review
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

You can see the UI by opening the link above.
I've already created a review request for that changeset, so you'll have to push your own changesets to play around with it.

Ehsan should already have access to this repository, here's the steps for anyone interested:

1. Send me your GitHub username (or just some login / SSH public key).
2. Once you have an account, clone the playground repository:


3. Make some changes, commit and push (force push?).
4. After push, you should see the URL to changeset-chooser, open it.
5. Choose the changesets to create review requests by clicking the corresponding table rows.
6. Click the "Create review request" button.
7. You should see the links to review requests in the rightmost column.

The code of the changeset-chooser is available here:

I've made this prototype as a separate tool using Node.js, but it should be
trivial to port it to Django and/or integrate into RB as soon as we're
comfortable with the general approach.

So the next roadblock I have is a problem with creating squashed review requests.
I couldn't find a way to squash arbitrary changesets without touching files
in the working repository.

But that would require some kind of exclusive lock on the repository
(so no simultaneous pushes from different users)
and would generally complicate things a lot.

Any suggestions on how to approach this are much appreciated.

Thanks,
George

Gregory Szorc

unread,
Mar 4, 2014, 1:57:37 PM3/4/14
to George Miroshnykov, mozilla-c...@googlegroups.com, Ehsan Akhgari, Taras Glek
On 3/4/14, 9:37 AM, George Miroshnykov wrote:
> Hey everyone,
>
> As we've discussed before on IRC, I've been working on a service that
> should allow users to create code review requests based on the
> changesets pushed to the special "review" repository.
>
> For now, the workflow looks like this:
> $ hg push -f review
> pushing to ssh://review.infinity.com.ua//var/hg/playground-central
> searching for changes
> remote: adding changesets
> remote: adding manifests
> remote: adding file changes
> remote: added 1 changesets with 1 changes to 1 files
> remote: choose changesets to review:
> http://review.infinity.com.ua:5000/?dbb8d013a3f3c32f0bdd4373e128f8cab11c9cd2
>
> You can see the UI by opening the link above.
> I've already created a review request for that changeset, so you'll have
> to push your own changesets to play around with it.
>
> Ehsan should already have access to this repository, here's the steps
> for anyone interested:
>
> 1. Send me your GitHub username (or just some login / SSH public key).

IT was able to provide me a dump of all SSH keys for Mozillians in bug
928461. If you file a bug, they can probably do the same for you.

> 2. Once you have an account, clone the playground repository:
>
> hg clone ssh://user...@review.infinity.com.ua//var/hg/playground-central
>
> 3. Make some changes, commit and push (force push?).

FWIW, with Mercurial 2.9 and later, if you push a bookmark (hg push -B),
you don't need push -f.

If you are writing a Mercurial extension that does the pushing, you can
pass an argument to localrepo.push so the new heads warning/push
requirement isn't an issue.

> 4. After push, you should see the URL to changeset-chooser, open it.
> 5. Choose the changesets to create review requests by clicking the
> corresponding table rows.
> 6. Click the "Create review request" button.

Hopefully we can eventually do this client-side in the Mercurial client.
Please ensure there is an API for this so users don't need to open a
browser to initiate code review.

> 7. You should see the links to review requests in the rightmost column.
>
> The code of the changeset-chooser is available here:
> https://github.com/laggyluke/changeset-chooser
>
> I've made this prototype as a separate tool using Node.js, but it should be
> trivial to port it to Django and/or integrate into RB as soon as we're
> comfortable with the general approach.
>
> So the next roadblock I have is a problem with creating squashed review
> requests.
> I couldn't find a way to squash arbitrary changesets without touching files
> in the working repository.
>
> But that would require some kind of exclusive lock on the repository
> (so no simultaneous pushes from different users)
> and would generally complicate things a lot.

There is already a lock acquired on the repo during many parts of push.
I wouldn't be too worried about multiple clients racing for a lock - not
with our current load anyway.

That being said, you imply we're manipulating files in the working copy
(as opposed to the usually hidden store). We should not need to maintain
a checkout/working copy on the server. That will be problematic from a
performance perspective. Although, SSDs and Mercurial 2.6+ should go a
long way. We should be able to perform this squashing with Mercurial
APIs without touching the working directory. If we can't, long term we
should patch Mercurial to make that possible.

Could we offload the squashing to the client? We could have the client
generate a temporary squashed commit, push it, then strip it from history.

Taras Glek

unread,
Mar 4, 2014, 2:00:29 PM3/4/14
to Gregory Szorc, George Miroshnykov, mozilla-c...@googlegroups.com, Ehsan Akhgari
Greg, we are trying to not change developer hg workflow for v1. George started out with bookmarks, but we backtracked to this approach to let people continue using the currently prevalent mq workflow.

I'm 100% for considering hg extensions, etc in a followup, but for now we'd like to get something that requires minimum changes.

Taras

Tuesday, March 4, 2014 10:57
Tuesday, March 4, 2014 9:37
Hey everyone,

As we've discussed before on IRC, I've been working on a service that should allow users to create code review requests based on the changesets pushed to the special "review" repository.

For now, the workflow looks like this:
$ hg push -f review
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

You can see the UI by opening the link above.
I've already created a review request for that changeset, so you'll have to push your own changesets to play around with it.

Ehsan should already have access to this repository, here's the steps for anyone interested:

1. Send me your GitHub username (or just some login / SSH public key).
2. Once you have an account, clone the playground repository:


3. Make some changes, commit and push (force push?).
4. After push, you should see the URL to changeset-chooser, open it.
5. Choose the changesets to create review requests by clicking the corresponding table rows.
6. Click the "Create review request" button.
7. You should see the links to review requests in the rightmost column.

The code of the changeset-chooser is available here:

I've made this prototype as a separate tool using Node.js, but it should be
trivial to port it to Django and/or integrate into RB as soon as we're
comfortable with the general approach.

So the next roadblock I have is a problem with creating squashed review requests.
I couldn't find a way to squash arbitrary changesets without touching files
in the working repository.

But that would require some kind of exclusive lock on the repository
(so no simultaneous pushes from different users)
and would generally complicate things a lot.

Gregory Szorc

unread,
Mar 4, 2014, 2:16:23 PM3/4/14
to Taras Glek, Gregory Szorc, George Miroshnykov, mozilla-c...@googlegroups.com, Ehsan Akhgari
On 3/4/14, 11:00 AM, Taras Glek wrote:
> Greg, we are trying to not change developer hg workflow for v1. George
> started out with bookmarks, but we backtracked to this approach to let
> people continue using the currently prevalent mq workflow.
>
> I'm 100% for considering hg extensions, etc in a followup, but for now
> we'd like to get something that requires minimum changes.

This is acceptable.

I just want to ensure there is a path forward to modern Mercurial
workflows and for using a Mercurial extension to make the process
streamlined.

FWIW, I do worry that the current proposed workflow (push -f + open web
browser) is *more* work than my current workflow of `hg bzexport -e` or
`mach rbt post`. Some may complain about swallowing that pill.

Ehsan Akhgari

unread,
Mar 4, 2014, 2:40:45 PM3/4/14
to George Miroshnykov, mozilla-c...@googlegroups.com, Taras Glek
On Tue, Mar 4, 2014 at 12:37 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:
Hey everyone,

As we've discussed before on IRC, I've been working on a service that should allow users to create code review requests based on the changesets pushed to the special "review" repository.

For now, the workflow looks like this:
$ hg push -f review
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

You can see the UI by opening the link above.
I've already created a review request for that changeset, so you'll have to push your own changesets to play around with it.

Thanks, this is great!
 
Ehsan should already have access to this repository, here's the steps for anyone interested:

1. Send me your GitHub username (or just some login / SSH public key).
2. Once you have an account, clone the playground repository:


3. Make some changes, commit and push (force push?).
4. After push, you should see the URL to changeset-chooser, open it.
5. Choose the changesets to create review requests by clicking the corresponding table rows.
6. Click the "Create review request" button.
7. You should see the links to review requests in the rightmost column.

One question: have you thought how this would map into choosing a review requestee?  It would be nice if we could do both in one go.
 
The code of the changeset-chooser is available here:

I've made this prototype as a separate tool using Node.js, but it should be
trivial to port it to Django and/or integrate into RB as soon as we're
comfortable with the general approach.

So the next roadblock I have is a problem with creating squashed review requests.
I couldn't find a way to squash arbitrary changesets without touching files
in the working repository.

But that would require some kind of exclusive lock on the repository
(so no simultaneous pushes from different users)
and would generally complicate things a lot.

Any suggestions on how to approach this are much appreciated.

You shouldn't need to use the working directory at all.  Can you please explain what you're trying to do?

Cheers,

George Miroshnykov

unread,
Mar 4, 2014, 4:19:02 PM3/4/14
to Gregory Szorc, mozilla-c...@googlegroups.com, Taras Glek, Ehsan Akhgari
On March 4, 2014 at 20:57:39 , Gregory Szorc (gsz...@mozilla.com) wrote:
On 3/4/14, 9:37 AM, George Miroshnykov wrote: 
> Hey everyone, 
> 
> As we've discussed before on IRC, I've been working on a service that 
> should allow users to create code review requests based on the 
> changesets pushed to the special "review" repository. 
> 
> For now, the workflow looks like this: 
> $ hg push -f review 
> pushing to ssh://review.infinity.com.ua//var/hg/playground-central 
> searching for changes 
> remote: adding changesets 
> remote: adding manifests 
> remote: adding file changes 
> remote: added 1 changesets with 1 changes to 1 files 
> remote: choose changesets to review: 
> http://review.infinity.com.ua:5000/?dbb8d013a3f3c32f0bdd4373e128f8cab11c9cd2 
> 
> You can see the UI by opening the link above. 
> I've already created a review request for that changeset, so you'll have 
> to push your own changesets to play around with it. 
> 
> Ehsan should already have access to this repository, here's the steps 
> for anyone interested: 
> 
> 1. Send me your GitHub username (or just some login / SSH public key). 

IT was able to provide me a dump of all SSH keys for Mozillians in bug 
928461. If you file a bug, they can probably do the same for you. 

Good idea, but is this the right bug number?
That bug was about adding a new SSH key, not providing any key dumps.

Actually, it would be great to have a place where you can grab all up-to-date public keys.

For now, I’ve added user “gps” with keys from here:
https://github.com/indygreg.keys

In case you need it, you have sudo access on that box.


> 2. Once you have an account, clone the playground repository: 
> 
> hg clone ssh://user...@review.infinity.com.ua//var/hg/playground-central 
> 
> 3. Make some changes, commit and push (force push?). 

FWIW, with Mercurial 2.9 and later, if you push a bookmark (hg push -B), 
you don't need push -f. 

If you are writing a Mercurial extension that does the pushing, you can 
pass an argument to localrepo.push so the new heads warning/push 
requirement isn't an issue.

> 4. After push, you should see the URL to changeset-chooser, open it. 
> 5. Choose the changesets to create review requests by clicking the 
> corresponding table rows. 
> 6. Click the "Create review request" button. 

Hopefully we can eventually do this client-side in the Mercurial client. 
Please ensure there is an API for this so users don't need to open a 
browser to initiate code review. 

Yup, that was the plan from day one.
The current UI is just a static HTML talking to that API.

Here’s the draft of the API:
https://github.com/laggyluke/changeset-chooser#api

There’s no auth yet, because the repo is public anyway.


> 7. You should see the links to review requests in the rightmost column. 
> 
> The code of the changeset-chooser is available here: 
> https://github.com/laggyluke/changeset-chooser 
> 
> I've made this prototype as a separate tool using Node.js, but it should be 
> trivial to port it to Django and/or integrate into RB as soon as we're 
> comfortable with the general approach. 
> 
> So the next roadblock I have is a problem with creating squashed review 
> requests. 
> I couldn't find a way to squash arbitrary changesets without touching files 
> in the working repository. 
> 
> But that would require some kind of exclusive lock on the repository 
> (so no simultaneous pushes from different users) 
> and would generally complicate things a lot. 

There is already a lock acquired on the repo during many parts of push. 
I wouldn't be too worried about multiple clients racing for a lock - not 
with our current load anyway. 

That being said, you imply we're manipulating files in the working copy 
(as opposed to the usually hidden store). We should not need to maintain 
a checkout/working copy on the server. That will be problematic from a 
performance perspective. Although, SSDs and Mercurial 2.6+ should go a 
long way. We should be able to perform this squashing with Mercurial 
APIs without touching the working directory. If we can't, long term we 
should patch Mercurial to make that possible. 

The current implementation talks to a local repo using the standard Mercurial CLI client.
I guess this is not a strict requirement, but it was easier to get this off the ground like that.

What we need is basically three commands:

1. Get the list of changesets we’re interested in.
Which changesets exactly we’re interested in is a tricky question in and of itself.
For now, we assume we’re interested in everything we’ve just pushed.
That means grabbing everything from the first pushed changeset to the tip:
hg log --rev <first-pushed-changeset>:tip

2. Get the diff of the individual changeset.
This is simple: hg diff --rev <rev>

3. Get the squashed diff of all the changesets that user have picked.
That’s what I’m struggling with.
So let’s say user has just pushed the following changesets:
A -> B -> C -> D
And he picked B and D for review.
We need to squash them somehow and I don’t know how to do this, apart from:
1. revert to A
2. apply B
3. apply D
4. get the diff
5. discard everything and revert to D

And we clearly need exclusive access to the (local?) repository to do this.

Could we offload the squashing to the client? We could have the client 
generate a temporary squashed commit, push it, then strip it from history. 

I guess we can, but it would probably require the same effort from client: the working copy should not have any local modifications etc.

I was hoping to find some less intrusive solutions, while keeping this approach as a last resort.

Thanks,
George

George Miroshnykov

unread,
Mar 4, 2014, 4:30:37 PM3/4/14
to Ehsan Akhgari, mozilla-c...@googlegroups.com, Taras Glek

I think this is something for us to discuss.
Do you want the user to be able to choose reviewer on changeset-chooser side instead of RB side?
We can explicitly tell RB to use particular reviewer or review group via its API.

 
The code of the changeset-chooser is available here:

I've made this prototype as a separate tool using Node.js, but it should be
trivial to port it to Django and/or integrate into RB as soon as we're
comfortable with the general approach.

So the next roadblock I have is a problem with creating squashed review requests.
I couldn't find a way to squash arbitrary changesets without touching files
in the working repository.

But that would require some kind of exclusive lock on the repository
(so no simultaneous pushes from different users)
and would generally complicate things a lot.

Any suggestions on how to approach this are much appreciated.

You shouldn't need to use the working directory at all.  Can you please explain what you're trying to do?

Please see my reply to Gregory.

Also, I’ve just found this, maybe we could use it:
http://cyberelk.net/tim/patchutils/man/rn01re02.html

Thanks,
George

Gregory Szorc

unread,
Mar 4, 2014, 4:39:02 PM3/4/14
to George Miroshnykov, mozilla-c...@googlegroups.com, Taras Glek, Ehsan Akhgari
Fail. It's bug 969515.

I had a feeling that if enough people request this data that IT will make it available through some automated means. A problem is they are treating this data as PII, so we can't just expose it to the world.
Yes, it is a tricky question.

The mechanism of only examining pushed changesets is somewhat flawed in that it will fail to detect shared ancestor changesets that didn't change between pushes. e.g. if I have a 10 part series consisting of changesets 1..10, if I push 1..10, then make changes to 5, the second push will only contain 5..10. If you rebase, you'll likely repush 1..10.

I had a mechanism for dealing with this in my experimental extension:

1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
2) Find the distance (in terms of changeset count) between the pushed head and the head in question
3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head

This mechanism will correctly pick up partial pushes.

> 2. Get the diff of the individual changeset.
> This is simple: hg diff --rev <rev>

You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.

> 3. Get the squashed diff of all the changesets that user have picked.
> That’s what I’m struggling with.
> So let’s say user has just pushed the following changesets:
> A -> B -> C -> D
> And he picked B and D for review.
> We need to squash them somehow and I don’t know how to do this, apart from:
> 1. revert to A
> 2. apply B
> 3. apply D
> 4. get the diff
> 5. discard everything and revert to D

Why would we ever perform a review without C applied? That seems... wrong. IMO if we want to skip changesets, they should be at the beginning or end (we'd change the range of applicable commits). Cherry picking changesets just seems like a recipe for disaster.

George Miroshnykov

unread,
Mar 4, 2014, 5:18:01 PM3/4/14
to Gregory Szorc, Taras Glek, mozilla-c...@googlegroups.com, Ehsan Akhgari

"You are not authorized to access bug #969515.” :(


I had a feeling that if enough people request this data that IT will make it available through some automated means. A problem is they are treating this data as PII, so we can't just expose it to the world. 

Unfortunately I’m not a Mozillian, so if public SSH keys are not considered… err… public :), I’m not sure I’m allowed to request them.
But I guess we already have most of the accounts set up on that box, so this is not a problem.

My bad - first time I couldn’t get this logic even after looking at the code.
Is this the approach you’re using now? Any false-positives or something?

I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.


This mechanism will correctly pick up partial pushes. 

> 2. Get the diff of the individual changeset. 
> This is simple: hg diff --rev <rev> 

You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests. 

Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.

I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
But it’s probably a good opportunity to raise a number of related questions:

1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?

3. There is a command server thing I was planning to use:
http://mercurial.selenic.com/wiki/CommandServer
But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
So while it avoid possible licensing issues, it is still of no use to us.


> 3. Get the squashed diff of all the changesets that user have picked. 
> That’s what I’m struggling with. 
> So let’s say user has just pushed the following changesets: 
> A -> B -> C -> D 
> And he picked B and D for review. 
> We need to squash them somehow and I don’t know how to do this, apart from: 
> 1. revert to A 
> 2. apply B 
> 3. apply D 
> 4. get the diff 
> 5. discard everything and revert to D 

Why would we ever perform a review without C applied? That seems... wrong. IMO if we want to skip changesets, they should be at the beginning or end (we'd change the range of applicable commits). Cherry picking changesets just seems like a recipe for disaster. 

This does look bad, we just tried to cover some possible corner cases.

Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…

Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.

Thanks,
George

Ehsan Akhgari

unread,
Mar 4, 2014, 5:55:41 PM3/4/14
to George Miroshnykov, Gregory Szorc, Taras Glek, mozilla-c...@googlegroups.com
Let's not worry about authentication for now, that is not a tricky problem to solve, and the hg repo for this will most likely be hosted by Mozilla so we can just use our existing LDAP auth for this.

I had a feeling that if enough people request this data that IT will make it available through some automated means. A problem is they are treating this data as PII, so we can't just expose it to the world. 

Unfortunately I’m not a Mozillian, so if public SSH keys are not considered… err… public :), I’m not sure I’m allowed to request them.
But I guess we already have most of the accounts set up on that box, so this is not a problem.

Again, let's not worry about auth for now.
I've already explained this before on IRC but let me repeat the gist here.  Please remember that we're interested in the contents of the push in terms of a review.  The only reasons why something should already be on the other side is:

1. It's something that comes from mozilla-central or the like, which has been pushed by somebody else to the shared repo before.  In this case, we don't need to worry about that commit because it's already reviewed given that it's coming from m-c.

2. It's something in your local branch which has not changed since the last time you pushed your local branch, which means that it's previously under review and you don't need to re-request review on the same commit.

Note that in addition, your push may contain other commits coming from m-c/etc. which you are not interested to ask review on, in which case you just ignore those commits, and they end up being in the #1 category above the next time someone pushes their branch containing that commit.
 
I had a mechanism for dealing with this in my experimental extension: 

1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default. 
2) Find the distance (in terms of changeset count) between the pushed head and the head in question 
3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head 
Please see above, we don't need any of this complexity for the purposes of this tool.  Which is also why we won't need to deal with bookmarks, or try to sync up with the upstream repo(s).

My bad - first time I couldn’t get this logic even after looking at the code.
Is this the approach you’re using now? Any false-positives or something?

George, I thought we already agreed as of last week that the above is not a problem for us.  What changed since then?

I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.

None of the workflow I laid out above is Mozilla specific unless I'm missing something.  (And it would be fine if it were -- just trying to be clear!)

This mechanism will correctly pick up partial pushes. 

> 2. Get the diff of the individual changeset. 
> This is simple: hg diff --rev <rev>

I think that is what's causing you to require a working directory, since that is asking Mercurial to generate a diff between <rev> and the working directory.  You probably want to specify two revisions there.

As a hint for the future, try running this command in the mercurial repo on the server:

hg update 0

This command basically updates to revision 0, which is fancy talk for wiping out the entire working directory.  That way, if you end up relying on the working directory accidentally, you'll know very fast!
 
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests. 

Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.

I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.

Agreed, let's focus on correctness first, and then on performance.

But it’s probably a good opportunity to raise a number of related questions:

1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

No, we can choose whatever license we like.  Our RB changes will probably be MIT.  But where is this FAQ BTW? 

2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?

 I actually think that we should start off with the CLI interface, and then measure to see if it's fast enough for us.  I don't expect this tool to hit a ton of hg operations so the CLI interface may be enough.

What we should pay attention to is to not make choices which make the usage of the API unnecessarily difficult, but since the layer which talks to hg will be very small, I'm not too worried about that.

3. There is a command server thing I was planning to use:
http://mercurial.selenic.com/wiki/CommandServer
But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
So while it avoid possible licensing issues, it is still of no use to us.

There are no licensing issues we should be worried about here.  GPL is fine if that's what we have to use, although I generally prefer a more permissive license.

Also, please do not use CommandServer or anything fancy like that for now.  Let's keep focused on getting the basics right.

> 3. Get the squashed diff of all the changesets that user have picked. 
> That’s what I’m struggling with. 
> So let’s say user has just pushed the following changesets: 
> A -> B -> C -> D 
> And he picked B and D for review. 
> We need to squash them somehow and I don’t know how to do this, apart from: 
> 1. revert to A 
> 2. apply B 
> 3. apply D 
> 4. get the diff 
> 5. discard everything and revert to D 

Why would we ever perform a review without C applied? That seems... wrong. IMO if we want to skip changesets, they should be at the beginning or end (we'd change the range of applicable commits). Cherry picking changesets just seems like a recipe for disaster. 

This does look bad, we just tried to cover some possible corner cases.

Hmm, so this is actually quite a bad limitation.  The case which I'm worried about is for someone to have a branch like this:

* Bug 555555 - Part 1: build the infra
* Bug 666666 - Small unrelated fix
* Bug 555555 - Part 2: use the infra and be awesome

There is no automatic, error proof way of reordering the commits here which is guaranteed to succeed.  So let's not try to fix that problem.

Can you please talk to the #reviewboard folks to see what the squashed diff is used for?  And explain to them why that is unacceptable for our use case and perhaps ask to see if there is a workaround?

Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…

Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.

Even that won't work.  My suggestion is for now to try to see if there is a workaround for this RB limitation.  I mean, I imagine that the sqaushed diff is not used for anything since we don't perform reviews on the squashed RB "review" objects...
 
Thanks!
Ehsan

Ehsan Akhgari

unread,
Mar 4, 2014, 5:57:07 PM3/4/14
to George Miroshnykov, mozilla-c...@googlegroups.com, Taras Glek
Yes, I think it would be very nice to set the reviewer in the same tool, so that you don't have multiple hoops to jump around for asking review.

gregor...@gmail.com

unread,
Mar 4, 2014, 6:04:29 PM3/4/14
to mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek, Ehsan Akhgari
On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote:

> I had a mechanism for dealing with this in my experimental extension: 
>
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default. 
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question 
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head 
> My bad - first time I couldn’t get this logic even after looking at the code.
> Is this the approach you’re using now? Any false-positives or something?
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.

I just thought of a more generic and possibly better way to do this: phases.

Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases

If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g.

$ hg log -r 'draft() & ancestors(tip)'

Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there.

> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests. 
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
> But it’s probably a good opportunity to raise a number of related questions:
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

"It depends" and "ask a license geek" (like Gerv).

> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?

If you have comprehensive test coverage it shouldn't be a problem.

> 3. There is a command server thing I was planning to use:
> http://mercurial.selenic.com/wiki/CommandServer
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
> So while it avoid possible licensing issues, it is still of no use to us.

Yes, python-hglib uses the command server.

The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository.

Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements.

Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing.

tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it.

Ehsan Akhgari

unread,
Mar 4, 2014, 6:22:55 PM3/4/14
to gregor...@gmail.com, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek
On Tue, Mar 4, 2014 at 6:04 PM, <gregor...@gmail.com> wrote:
On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote:

> I had a mechanism for dealing with this in my experimental extension: 
>
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default. 
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question 
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head 
> My bad - first time I couldn’t get this logic even after looking at the code.
> Is this the approach you’re using now? Any false-positives or something?
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.

I just thought of a more generic and possibly better way to do this: phases.

Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases

If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g.

$ hg log -r 'draft() & ancestors(tip)'

Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there.

Not sure if you saw my other reply to the thread, but this is not necessary, please see that email.
 
> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests. 
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
> But it’s probably a good opportunity to raise a number of related questions:
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

"It depends" and "ask a license geek" (like Gerv).

> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?

If you have comprehensive test coverage it shouldn't be a problem.

> 3. There is a command server thing I was planning to use:
> http://mercurial.selenic.com/wiki/CommandServer
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
> So while it avoid possible licensing issues, it is still of no use to us.

Yes, python-hglib uses the command server.

The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository.

Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements.

Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing.

tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it.

Hmm, ok so based on the above I definitely prefer the command server over directly using the API.  But I still think we should use the CLI interface for now while we get everything working and then we can start worrying about this if the performance of the CLI layer proves to be insufficient.

Cheers,
Ehsan
Message has been deleted

Ehsan Akhgari

unread,
Mar 4, 2014, 7:24:41 PM3/4/14
to gregory.szorc, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek
The current code is node.js, so python-hglib is not an option.  I really really would like to urge everyone to try to stay on the same page, and focus on correctness now, while keeping perf in mind.  Also please remember that this will be a very small piece of code, there won't be too much technical debt.

Basically I'd rather keep our attention on fixing things like the workflow so that we're sure we're building the right tool before we think about optimizing it.

Hope this makes sense.

Cheers,



On Tue, Mar 4, 2014 at 7:11 PM, <gregor...@gmail.com> wrote:
On Tuesday, March 4, 2014 3:22:55 PM UTC-8, Ehsan Akhgari wrote:
> Hmm, ok so based on the above I definitely prefer the command server over directly using the API.  But I still think we should use the CLI interface for now while we get everything working and then we can start worrying about this if the performance of the CLI layer proves to be insufficient.

Or we can just use the command server now.

The command server will execute things faster and it avoids technical debt. Furthermore, if you use an existing library like python-hglib, it's already solved the technical problem of parsing output into data structures, which saves you even more time now.

gregor...@gmail.com

unread,
Mar 4, 2014, 10:07:12 PM3/4/14
to mozilla-c...@googlegroups.com, gregory.szorc, Gregory Szorc, Taras Glek, eh...@mozilla.com
On Tuesday, March 4, 2014 4:24:41 PM UTC-8, Ehsan Akhgari wrote:
> The current code is node.js, so python-hglib is not an option.  I really really would like to urge everyone to try to stay on the same page, and focus on correctness now, while keeping perf in mind.  Also please remember that this will be a very small piece of code, there won't be too much technical debt.
>
> Basically I'd rather keep our attention on fixing things like the workflow so that we're sure we're building the right tool before we think about optimizing it.

I'm more worried that slurping multiple `hg` processes via Node.js won't scale to the desired audience whereas the command server will.

If this thing can't scale at launch time, we've failed. If we're not going to use the command server, I insist we at least run some simple load tests ASAP to avoid roll-out delays.

Ehsan Akhgari

unread,
Mar 4, 2014, 10:12:19 PM3/4/14
to Gregory Szorc, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek
Greg, we're not close to launching this.  This project literally got started last week, and we're currently trying to focus on the basics.  I appreciate you trying to ensure that we keep the bigger picture in mind, and it's definitely helpful to have you point this out when we forget, but I'm not saying that we should not use the command server.  All I'm saying is that we should not switch to using it _now_.  The current code may even need to be rewritten in python if we want it to be integrated with RB.  I think worrying about the performance of the code in its *current* shape is a mistake.

Does this address your concern?

Cheers,

--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

gregor...@gmail.com

unread,
Mar 4, 2014, 10:49:25 PM3/4/14
to mozilla-c...@googlegroups.com, Gregory Szorc, Gregory Szorc, Taras Glek, eh...@mozilla.com
On Tuesday, March 4, 2014 7:12:19 PM UTC-8, Ehsan Akhgari wrote:
> Greg, we're not close to launching this.  This project literally got started last week, and we're currently trying to focus on the basics.  I appreciate you trying to ensure that we keep the bigger picture in mind, and it's definitely helpful to have you point this out when we forget, but I'm not saying that we should not use the command server.  All I'm saying is that we should not switch to using it _now_.  The current code may even need to be rewritten in python if we want it to be integrated with RB.  I think worrying about the performance of the code in its *current* shape is a mistake.
>
> Does this address your concern?

It's not clear to me why we're writing code that won't ship.

It's not clear to me because the goals/phases of this project have not been explained to me. It sounded like we were trying to build something that would facilitate push-based, ReviewBoard-based code review to replace Bugzilla's Splinter and that we are on an aggressive timeline to have that rolled out. It sounds like part of that is incorrect.

Could you please state the goals and deliverables of this project so I can restrain the horse the next time it wants to leave the stable?

Ehsan Akhgari

unread,
Mar 4, 2014, 11:03:16 PM3/4/14
to Gregory Szorc, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek
On Tue, Mar 4, 2014 at 10:49 PM, <gregor...@gmail.com> wrote:
On Tuesday, March 4, 2014 7:12:19 PM UTC-8, Ehsan Akhgari wrote:
> Greg, we're not close to launching this.  This project literally got started last week, and we're currently trying to focus on the basics.  I appreciate you trying to ensure that we keep the bigger picture in mind, and it's definitely helpful to have you point this out when we forget, but I'm not saying that we should not use the command server.  All I'm saying is that we should not switch to using it _now_.  The current code may even need to be rewritten in python if we want it to be integrated with RB.  I think worrying about the performance of the code in its *current* shape is a mistake.
>
> Does this address your concern?

It's not clear to me why we're writing code that won't ship.

That's how we build a lot of our internal tools.  We try out ideas and try to focus on correctness/speed of development at first, then try out the implemented base version of the idea, fix the problems we find out and iterate on it.  At some point we determine that it's ready for general use, and we announce it to the public.  And sometimes we end up adding essential features after people have started using the initial version.  I've been involved in several such projects here at Mozilla, and this idea has proven great every single time, examples include TBPL, Bugzilla Tweaks, DXR, The Gecko Profiler, etc.  I'm working past on my previous experience.
 
It's not clear to me because the goals/phases of this project have not been explained to me. It sounded like we were trying to build something that would facilitate push-based, ReviewBoard-based code review to replace Bugzilla's Splinter and that we are on an aggressive timeline to have that rolled out. It sounds like part of that is incorrect.

Nope, that is all correct.
 
Could you please state the goals and deliverables of this project so I can restrain the horse the next time it wants to leave the stable?

You've done a good job of listing the goals and the deliverable above.  There is no disagreement about that.  And I don't believe that the goals have ever been mis-communicated to anybody.

I hope that it's clear that there are various ways of writing software, and one of them is doing things in small iterations, focusing on the basics at first, and getting feedback from the previous cycle to the next one.  That tends to be my school of thought.  If you don't agree, then I think that is just a difference of opinion.  But even if that is the case I do have to continue to have your input here, it is greatly appreciated.  :-)

George Miroshnykov

unread,
Mar 5, 2014, 8:15:42 AM3/5/14
to Ehsan Akhgari, Gregory Szorc, Taras Glek, mozilla-c...@googlegroups.com

Got it.

Nothing changed. The current implementation sticks to what we’ve agreed upon: we only consider changesets that we’ve just pushed.
I’m just looking at other options, because this simple approach may not be the most convenient for our users.


I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.

None of the workflow I laid out above is Mozilla specific unless I'm missing something.  (And it would be fine if it were -- just trying to be clear!)

I was referring to Greg’s approach as being Mozilla-specific (unified repo, searching for a shortest path to multiple heads etc.).
Our current approach of only looking at the pushed changesets may be too simple, but yet generic.



This mechanism will correctly pick up partial pushes. 

> 2. Get the diff of the individual changeset. 
> This is simple: hg diff --rev <rev>

I think that is what's causing you to require a working directory, since that is asking Mercurial to generate a diff between <rev> and the working directory.  You probably want to specify two revisions there.

Sorry, wrong command.
I’m actually using `hg export --git --rev <rev>` which should do the right thing.

As a hint for the future, try running this command in the mercurial repo on the server:

hg update 0

This command basically updates to revision 0, which is fancy talk for wiping out the entire working directory.  That way, if you end up relying on the working directory accidentally, you'll know very fast!

That’s a neat trick, thanks!


 
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests. 

Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.

I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.

Agreed, let's focus on correctness first, and then on performance.

But it’s probably a good opportunity to raise a number of related questions:

1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

No, we can choose whatever license we like.  Our RB changes will probably be MIT.  But where is this FAQ BTW? 

Sorry, it wasn’t in the FAQ, it’s here:
http://mercurial.selenic.com/wiki/MercurialApi
"Using this API is a strong indication that you're creating a "derived work" subject to the GPL. Before going any further, read the License page."

Technically we don’t need a squashed diff at all, see below.


Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…

Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.

Even that won't work.  My suggestion is for now to try to see if there is a workaround for this RB limitation.  I mean, I imagine that the sqaushed diff is not used for anything since we don't perform reviews on the squashed RB "review" objects...

RB itself doesn’t need a squashed diff at all.
I believe `rb-repo` project used squashed diffs as a convenience for reviewer: you can either look at each individual diff or you can look at a single squashed diff with all the changes combined.
We already have the first part, now I’m looking for a way to implement the second one, if that’s what we need.

Thanks,
George


 
Thanks!
Ehsan

George Miroshnykov

unread,
Mar 5, 2014, 8:46:13 AM3/5/14
to gregor...@gmail.com, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek, Ehsan Akhgari

On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:

On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote: 

> I had a mechanism for dealing with this in my experimental extension:  
> 
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.  
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question  
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head  
> My bad - first time I couldn’t get this logic even after looking at the code. 
> Is this the approach you’re using now? Any false-positives or something? 
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow. 

I just thought of a more generic and possibly better way to do this: phases. 

Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases 

If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g. 

$ hg log -r 'draft() & ancestors(tip)' 

Wow, this is cool!
I’ll read it up, thanks.

Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there. 

Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)
Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.


> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.  
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns. 
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker. 
> But it’s probably a good opportunity to raise a number of related questions: 
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL. 
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us? 

"It depends" and "ask a license geek" (like Gerv). 

> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly. 
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it? 

If you have comprehensive test coverage it shouldn't be a problem. 


> 3. There is a command server thing I was planning to use: 
> http://mercurial.selenic.com/wiki/CommandServer 
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”. 
> So while it avoid possible licensing issues, it is still of no use to us. 

Yes, python-hglib uses the command server. 

The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository. 

Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements. 

Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing. 

tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it. 

As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.

Thanks,
George

Taras Glek

unread,
Mar 5, 2014, 12:37:41 PM3/5/14
to George Miroshnykov, Ehsan Akhgari, Gregory Szorc, mozilla-c...@googlegroups.com


Wednesday, March 5, 2014 5:15

On March 5, 2014 at 12:56:24 AM, Ehsan Akhgari (eh...@mozilla.com) wrote:

Got it.

Nothing changed. The current implementation sticks to what we’ve agreed upon: we only consider changesets that we’ve just pushed.
I’m just looking at other options, because this simple approach may not be the most convenient for our users.

I was referring to Greg’s approach as being Mozilla-specific (unified repo, searching for a shortest path to multiple heads etc.).


Our current approach of only looking at the pushed changesets may be too simple, but yet generic.

Sorry, wrong command.


I’m actually using `hg export --git --rev <rev>` which should do the right thing.

That’s a neat trick, thanks!

Sorry, it wasn’t in the FAQ, it’s here:


http://mercurial.selenic.com/wiki/MercurialApi
"Using this API is a strong indication that you're creating a "derived work" subject to the GPL. Before going any further, read the License page."

Technically we don’t need a squashed diff at all, see below.

RB itself doesn’t need a squashed diff at all.


I believe `rb-repo` project used squashed diffs as a convenience for reviewer: you can either look at each individual diff or you can look at a single squashed diff with all the changes combined.
We already have the first part, now I’m looking for a way to implement the second one, if that’s what we need.

We don't need this squashed view. Since everything is kept in hg, this can be implemented via some hg web view later.

Taras

Thanks,
George

Tuesday, March 4, 2014 14:55

I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.

None of the workflow I laid out above is Mozilla specific unless I'm missing something.  (And it would be fine if it were -- just trying to be clear!)
This mechanism will correctly pick up partial pushes. 

> 2. Get the diff of the individual changeset. 
> This is simple: hg diff --rev <rev>
I think that is what's causing you to require a working directory, since that is asking Mercurial to generate a diff between <rev> and the working directory.  You probably want to specify two revisions there.

As a hint for the future, try running this command in the mercurial repo on the server:

hg update 0

This command basically updates to revision 0, which is fancy talk for wiping out the entire working directory.  That way, if you end up relying on the working directory accidentally, you'll know very fast!
 
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests. 

Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.

I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.

Agreed, let's focus on correctness first, and then on performance.

But it’s probably a good opportunity to raise a number of related questions:

1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

No, we can choose whatever license we like.  Our RB changes will probably be MIT.  But where is this FAQ BTW? 

2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.

Let’s say you clone m-c, commit something, pull m-c, commit something again etc.


This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…

Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.

Even that won't work.  My suggestion is for now to try to see if there is a workaround for this RB limitation.  I mean, I imagine that the sqaushed diff is not used for anything since we don't perform reviews on the squashed RB "review" objects...
 
Thanks!
Ehsan
Tuesday, March 4, 2014 14:18

On March 4, 2014 at 23:39:03 , Gregory Szorc (gsz...@mozilla.com) wrote:

"You are not authorized to access bug #969515.” :(

Unfortunately I’m not a Mozillian, so if public SSH keys are not considered… err… public :), I’m not sure I’m allowed to request them.


But I guess we already have most of the accounts set up on that box, so this is not a problem.

My bad - first time I couldn’t get this logic even after looking at the code.


Is this the approach you’re using now? Any false-positives or something?

I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.

Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.

I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.

But it’s probably a good opportunity to raise a number of related questions:

1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.


I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?

3. There is a command server thing I was planning to use:


http://mercurial.selenic.com/wiki/CommandServer
But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
So while it avoid possible licensing issues, it is still of no use to us.

This does look bad, we just tried to cover some possible corner cases.

Let’s say you clone m-c, commit something, pull m-c, commit something again etc.


This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…

Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.

Thanks,
George

--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Tuesday, March 4, 2014 13:39
Fail. It's bug 969515.

I had a feeling that if enough people request this data that IT will make it available through some automated means. A problem is they are treating this data as PII, so we can't just expose it to the world.

Yes, it is a tricky question.

The mechanism of only examining pushed changesets is somewhat flawed in that it will fail to detect shared ancestor changesets that didn't change between pushes. e.g. if I have a 10 part series consisting of changesets 1..10, if I push 1..10, then make changes to 5, the second push will only contain 5..10. If you rebase, you'll likely repush 1..10.

I had a mechanism for dealing with this in my experimental extension:

1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
2) Find the distance (in terms of changeset count) between the pushed head and the head in question
3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head

This mechanism will correctly pick up partial pushes.

2. Get the diff of the individual changeset.
This is simple: hg diff --rev <rev>
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.

3. Get the squashed diff of all the changesets that user have picked.
That’s what I’m struggling with.
So let’s say user has just pushed the following changesets:
A -> B -> C -> D
And he picked B and D for review.
We need to squash them somehow and I don’t know how to do this, apart from:
1. revert to A
2. apply B
3. apply D
4. get the diff
5. discard everything and revert to D
Why would we ever perform a review without C applied? That seems... wrong. IMO if we want to skip changesets, they should be at the beginning or end (we'd change the range of applicable commits). Cherry picking changesets just seems like a recipe for disaster.
Tuesday, March 4, 2014 13:19
On March 4, 2014 at 20:57:39 , Gregory Szorc (gsz...@mozilla.com) wrote:

Good idea, but is this the right bug number?
That bug was about adding a new SSH key, not providing any key dumps.

Actually, it would be great to have a place where you can grab all up-to-date public keys.

For now, I’ve added user “gps” with keys from here:
https://github.com/indygreg.keys

In case you need it, you have sudo access on that box.

Yup, that was the plan from day one.


The current UI is just a static HTML talking to that API.

Here’s the draft of the API:
https://github.com/laggyluke/changeset-chooser#api

There’s no auth yet, because the repo is public anyway.

The current implementation talks to a local repo using the standard Mercurial CLI client.


I guess this is not a strict requirement, but it was easier to get this off the ground like that.

What we need is basically three commands:

1. Get the list of changesets we’re interested in.
Which changesets exactly we’re interested in is a tricky question in and of itself.
For now, we assume we’re interested in everything we’ve just pushed.
That means grabbing everything from the first pushed changeset to the tip:
hg log --rev <first-pushed-changeset>:tip

2. Get the diff of the individual changeset.


This is simple: hg diff --rev <rev>

3. Get the squashed diff of all the changesets that user have picked.


That’s what I’m struggling with.
So let’s say user has just pushed the following changesets:
A -> B -> C -> D
And he picked B and D for review.
We need to squash them somehow and I don’t know how to do this, apart from:
1. revert to A
2. apply B
3. apply D
4. get the diff
5. discard everything and revert to D

And we clearly need exclusive access to the (local?) repository to do this.

I guess we can, but it would probably require the same effort from client: the working copy should not have any local modifications etc.

Ehsan Akhgari

unread,
Mar 5, 2014, 2:22:21 PM3/5/14
to George Miroshnykov, Gregory Szorc, Taras Glek, mozilla-c...@googlegroups.com
On Wed, Mar 5, 2014 at 8:15 AM, George Miroshnykov <gmiros...@rebbix.com> wrote:
I had a mechanism for dealing with this in my experimental extension: 

1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default. 
2) Find the distance (in terms of changeset count) between the pushed head and the head in question 
3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head 
Please see above, we don't need any of this complexity for the purposes of this tool.  Which is also why we won't need to deal with bookmarks, or try to sync up with the upstream repo(s).

My bad - first time I couldn’t get this logic even after looking at the code.
Is this the approach you’re using now? Any false-positives or something?

George, I thought we already agreed as of last week that the above is not a problem for us.  What changed since then?

Nothing changed. The current implementation sticks to what we’ve agreed upon: we only consider changesets that we’ve just pushed.
I’m just looking at other options, because this simple approach may not be the most convenient for our users.

OK, fair enough.
 

I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.

None of the workflow I laid out above is Mozilla specific unless I'm missing something.  (And it would be fine if it were -- just trying to be clear!)

I was referring to Greg’s approach as being Mozilla-specific (unified repo, searching for a shortest path to multiple heads etc.).
Our current approach of only looking at the pushed changesets may be too simple, but yet generic.

I hope that it's simple enough.  :-)


This mechanism will correctly pick up partial pushes. 

> 2. Get the diff of the individual changeset. 
> This is simple: hg diff --rev <rev>

I think that is what's causing you to require a working directory, since that is asking Mercurial to generate a diff between <rev> and the working directory.  You probably want to specify two revisions there.

Sorry, wrong command.
I’m actually using `hg export --git --rev <rev>` which should do the right thing.

Yeah, that shouldn't need to touch the working directory.

 
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests. 

Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.

I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.

Agreed, let's focus on correctness first, and then on performance.

But it’s probably a good opportunity to raise a number of related questions:

1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?

No, we can choose whatever license we like.  Our RB changes will probably be MIT.  But where is this FAQ BTW? 

Sorry, it wasn’t in the FAQ, it’s here:
http://mercurial.selenic.com/wiki/MercurialApi
"Using this API is a strong indication that you're creating a "derived work" subject to the GPL. Before going any further, read the License page."

I see.  Thanks.


Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…

Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.

Even that won't work.  My suggestion is for now to try to see if there is a workaround for this RB limitation.  I mean, I imagine that the sqaushed diff is not used for anything since we don't perform reviews on the squashed RB "review" objects...

RB itself doesn’t need a squashed diff at all.
I believe `rb-repo` project used squashed diffs as a convenience for reviewer: you can either look at each individual diff or you can look at a single squashed diff with all the changes combined.
We already have the first part, now I’m looking for a way to implement the second one, if that’s what we need.

So, the reviewer wants to review what the author has submitted, which is their local branch.  We're assuming that we trust the author of the changes to understand what they're doing.  With that, we should assume that what the author puts in the branch is sane, so for viewing a sqashed diff, including the changes in the commits in the middle should be fine.


Cheers,
Ehsan

Ehsan Akhgari

unread,
Mar 5, 2014, 2:32:32 PM3/5/14
to George Miroshnykov, Gregory Szorc, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek
On Wed, Mar 5, 2014 at 8:46 AM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:

On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote: 

> I had a mechanism for dealing with this in my experimental extension:  
> 
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.  
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question  
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head  
> My bad - first time I couldn’t get this logic even after looking at the code. 
> Is this the approach you’re using now? Any false-positives or something? 
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow. 

I just thought of a more generic and possibly better way to do this: phases. 

Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases 

If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g. 

$ hg log -r 'draft() & ancestors(tip)' 

Wow, this is cool!
I’ll read it up, thanks.

Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there. 

Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)

You're not the first one!  :-)  Please keep asking questions in the future, and I and others here will be happy to help.

Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.

Yeah, one of the reasons why I came up with the design which I have proposed is that is really sticks to the absolute minimum, and doesn't make any assumptions on people's workflow.  That will give us a flexible solution which will stand robust in the face of people changing their workflows.  I'm still convinced that we can make the simple design work well.

(FWIW in case I have not mentioned it earlier, that design is based on how gerrit works, any smartness should be attributed to those folks, not myself!)


> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.  
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns. 
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker. 
> But it’s probably a good opportunity to raise a number of related questions: 
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL. 
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us? 

"It depends" and "ask a license geek" (like Gerv). 

> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly. 
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it? 

If you have comprehensive test coverage it shouldn't be a problem. 


> 3. There is a command server thing I was planning to use: 
> http://mercurial.selenic.com/wiki/CommandServer 
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”. 
> So while it avoid possible licensing issues, it is still of no use to us. 

Yes, python-hglib uses the command server. 

The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository. 

Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements. 

Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing. 

tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it. 

As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.

Sounds good.

More about the squashed diff issue, please see what Taras suggested.  Basically, if you can, just ignore the squashed view.  It's not important as long as it doesn't stand in your way to make further progress.
 
Cheers,
Ehsan

George Miroshnykov

unread,
Mar 5, 2014, 2:58:30 PM3/5/14
to Ehsan Akhgari, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com

On March 5, 2014 at 21:33:16 , Ehsan Akhgari (eh...@mozilla.com) wrote:


On Wed, Mar 5, 2014 at 8:46 AM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:

On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote: 

> I had a mechanism for dealing with this in my experimental extension:  
> 
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.  
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question  
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head  
> My bad - first time I couldn’t get this logic even after looking at the code. 
> Is this the approach you’re using now? Any false-positives or something? 
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow. 

I just thought of a more generic and possibly better way to do this: phases. 

Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases 

If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g. 

$ hg log -r 'draft() & ancestors(tip)' 

Wow, this is cool!
I’ll read it up, thanks.

Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there. 

Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)

You're not the first one!  :-)  Please keep asking questions in the future, and I and others here will be happy to help.

Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.

Yeah, one of the reasons why I came up with the design which I have proposed is that is really sticks to the absolute minimum, and doesn't make any assumptions on people's workflow.  That will give us a flexible solution which will stand robust in the face of people changing their workflows.  I'm still convinced that we can make the simple design work well.

(FWIW in case I have not mentioned it earlier, that design is based on how gerrit works, any smartness should be attributed to those folks, not myself!)

I was more concerned that the lowest common denominator may be so low that we won’t actually solve the problem.
But again, I’m watching from the sidelines, so if you say that our current approach is good enough, let’s stick to it.

I haven’t worked with gerrit before, I’ll look into it in my spare time.



> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.  
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns. 
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker. 
> But it’s probably a good opportunity to raise a number of related questions: 
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL. 
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us? 

"It depends" and "ask a license geek" (like Gerv). 

> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly. 
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it? 

If you have comprehensive test coverage it shouldn't be a problem. 


> 3. There is a command server thing I was planning to use: 
> http://mercurial.selenic.com/wiki/CommandServer 
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”. 
> So while it avoid possible licensing issues, it is still of no use to us. 

Yes, python-hglib uses the command server. 

The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository. 

Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements. 

Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing. 

tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it. 

As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.

Sounds good.

More about the squashed diff issue, please see what Taras suggested.  Basically, if you can, just ignore the squashed view.  It's not important as long as it doesn't stand in your way to make further progress.

So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?

Ehsan Akhgari

unread,
Mar 5, 2014, 3:21:16 PM3/5/14
to George Miroshnykov, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com
OK sounds good!
Choosing the reviewer.  Also, did you get a chance to try out the push to bug workflow as well?

Gregory Szorc

unread,
Mar 5, 2014, 3:28:46 PM3/5/14
to George Miroshnykov, Ehsan Akhgari, Taras Glek, mozilla-c...@googlegroups.com
> As a hint for the future, try running this command in the mercurial repo on
> the server:
>
> hg update 0
>
> This command basically updates to revision 0, which is fancy talk for wiping
> out the entire working directory.  That way, if you end up relying on the
> working directory accidentally, you'll know very fast!
> That’s a neat trick, thanks!

Revision 0 is the first commit in a repo and it is not empty!

You want to use `hg update null` to clear the working directory.

Ehsan Akhgari

unread,
Mar 5, 2014, 3:30:58 PM3/5/14
to Gregory Szorc, George Miroshnykov, Taras Glek, mozilla-c...@googlegroups.com
Ah, you're right, thanks for correcting me.  I keep making this mistake because iirc our revision 0 in m-c just contains a .hgignore...

George Miroshnykov

unread,
Mar 5, 2014, 3:47:33 PM3/5/14
to Ehsan Akhgari, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com

You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?

I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.

So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

Thanks,
George

Taras Glek

unread,
Mar 5, 2014, 3:50:47 PM3/5/14
to George Miroshnykov, Ehsan Akhgari, Gregory Szorc, mozilla-c...@googlegroups.com


Wednesday, March 5, 2014 12:47
Note, you should not be able to submit a review request without specifying a bug#.

Not sure how much of this reviewboard does for you, but bz integration is probably a good next area to poke at:
* add an attachment with a link to review in RB
* post comment with a summary of hg comments as a comment into BZ

Thanks,
George
Wednesday, March 5, 2014 12:21
On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

So if we ignore squashed diffs, what is still missing?


The only thing on my mind is assigning the reviewer.
What else should we implement?

Choosing the reviewer.  Also, did you get a chance to try out the push to bug workflow as well?
Wednesday, March 5, 2014 11:58

On March 5, 2014 at 21:33:16 , Ehsan Akhgari (eh...@mozilla.com) wrote:

I was more concerned that the lowest common denominator may be so low that we won’t actually solve the problem.
But again, I’m watching from the sidelines, so if you say that our current approach is good enough, let’s stick to it.

I haven’t worked with gerrit before, I’ll look into it in my spare time.

So if we ignore squashed diffs, what is still missing?


The only thing on my mind is assigning the reviewer.
What else should we implement?

Wednesday, March 5, 2014 11:32

On Wed, Mar 5, 2014 at 8:46 AM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:

On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote: 

> I had a mechanism for dealing with this in my experimental extension:  
> 
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.  
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question  
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head  
> My bad - first time I couldn’t get this logic even after looking at the code. 
> Is this the approach you’re using now? Any false-positives or something? 
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow. 

I just thought of a more generic and possibly better way to do this: phases. 

Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases 

If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g. 

$ hg log -r 'draft() & ancestors(tip)' 

Wow, this is cool!
I’ll read it up, thanks.

Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there. 

Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)

You're not the first one!  :-)  Please keep asking questions in the future, and I and others here will be happy to help.

Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.

Yeah, one of the reasons why I came up with the design which I have proposed is that is really sticks to the absolute minimum, and doesn't make any assumptions on people's workflow.  That will give us a flexible solution which will stand robust in the face of people changing their workflows.  I'm still convinced that we can make the simple design work well.

(FWIW in case I have not mentioned it earlier, that design is based on how gerrit works, any smartness should be attributed to those folks, not myself!)
> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.  
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns. 
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker. 
> But it’s probably a good opportunity to raise a number of related questions: 
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL. 
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us? 

"It depends" and "ask a license geek" (like Gerv). 

> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly. 
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it? 

If you have comprehensive test coverage it shouldn't be a problem. 


> 3. There is a command server thing I was planning to use: 
> http://mercurial.selenic.com/wiki/CommandServer 
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”. 
> So while it avoid possible licensing issues, it is still of no use to us. 

Yes, python-hglib uses the command server. 

The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository. 

Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements. 

Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing. 

tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it. 

As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.

Sounds good.

More about the squashed diff issue, please see what Taras suggested.  Basically, if you can, just ignore the squashed view.  It's not important as long as it doesn't stand in your way to make further progress.
 
Cheers,
Ehsan
Wednesday, March 5, 2014 5:46

On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:

Wow, this is cool!
I’ll read it up, thanks.

Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)


Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.

As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.


Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.

Thanks,
George

Ehsan Akhgari

unread,
Mar 5, 2014, 4:00:26 PM3/5/14
to George Miroshnykov, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com
On Wed, Mar 5, 2014 at 3:47 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

So if we ignore squashed diffs, what is still missing?


The only thing on my mind is assigning the reviewer.
What else should we implement?

Choosing the reviewer.  Also, did you get a chance to try out the push to bug workflow as well?

You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?

I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.


Remember the case of:

Bug 555555 Part 1
Bug 555555 Part 2
Bug 666666

and pushing the same branch once to bug 555555 and once to bug 666666, like we discussed before.

So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

What Taras said, but please try to work on the push to bug workflow first!

Thanks,
Ehsan

George Miroshnykov

unread,
Mar 5, 2014, 4:45:07 PM3/5/14
to Ehsan Akhgari, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com

On March 5, 2014 at 23:01:09 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 3:47 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

So if we ignore squashed diffs, what is still missing?


The only thing on my mind is assigning the reviewer.
What else should we implement?
Choosing the reviewer.  Also, did you get a chance to try out the push to bug workflow as well?

You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?

I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.


Remember the case of:

Bug 555555 Part 1
Bug 555555 Part 2
Bug 666666

and pushing the same branch once to bug 555555 and once to bug 666666, like we discussed before.

Yeah, as we’ve discussed before, the second push won’t actually push anything and won’t trigger the hook, so we can’t do that.

How about this: we can add a separate “bug number” column to the HTML UI and possibly pre-populate it from commit messages.
This way you’ll be able to create review requests associated with different bugs after a single push.

Also, maybe there’s another way we haven’t considered?


So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

What Taras said, but please try to work on the push to bug workflow first!

Got it.

So right now we’re creating one review request per changeset.
Each review request (and, consequently, changeset) will have it’s own attached link.
If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?

Ehsan Akhgari

unread,
Mar 5, 2014, 4:55:21 PM3/5/14
to George Miroshnykov, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com
On Wed, Mar 5, 2014 at 4:45 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 23:01:09 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 3:47 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

So if we ignore squashed diffs, what is still missing?


The only thing on my mind is assigning the reviewer.
What else should we implement?
Choosing the reviewer.  Also, did you get a chance to try out the push to bug workflow as well?

You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?

I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.


Remember the case of:

Bug 555555 Part 1
Bug 555555 Part 2
Bug 666666

and pushing the same branch once to bug 555555 and once to bug 666666, like we discussed before.

Yeah, as we’ve discussed before, the second push won’t actually push anything and won’t trigger the hook, so we can’t do that.

How about this: we can add a separate “bug number” column to the HTML UI and possibly pre-populate it from commit messages.
This way you’ll be able to create review requests associated with different bugs after a single push.

Also, maybe there’s another way we haven’t considered?

What we talked about last time was remembering which changesets were pushed as part of which bug, and when you receive an empty push the second time with a different bug number, reassigning the unused changesets from the last time in the chooser UI.  Does that make sense?


So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

What Taras said, but please try to work on the push to bug workflow first!

Got it.

So right now we’re creating one review request per changeset.

So we're not getting one of those squashed requests?  That sucks, since I wast to be able to have a permalink for the reviews on a bug.

Each review request (and, consequently, changeset) will have it’s own attached link.

That's not great...

If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?

Hmm, I think just one, in the interest of less comment spam.  Please look into the RB/Bugzilla integration to see what it offers.  Also our Bugzilla team at Mozilla has been doing some work here as well, please see https://bugzilla.mozilla.org/show_bug.cgi?id=515210.  We might want to sync up with them.

Cheers,
Ehsan

Gregory Szorc

unread,
Mar 5, 2014, 4:57:20 PM3/5/14
to George Miroshnykov, mozilla-c...@googlegroups.com, Taras Glek, Ehsan Akhgari
> Yeah, but the problem I have with Mercurial right now is that there are so
> many features and extensions that it’s hard to wrap the head around it :)
> Judging from my Git perspective, this leads to workflow fragmentation when
> different developers may have different (or even incompatible) preferences
> for their workflow.
> I don’t think we can simply “fix” the code review process without introducing
> some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but
> I hope I’m wrong here.

mq or bookmark based workflow should all be compatible. At the end of the day, the client is pushing a new head to a remote repo.

The main problem I can think of is how the server will tie multiple pushes together. There are a number of options:

a) Apply a heuristic and guess. e.g. compare the first bug number referenced in the tip-most commit message. There's lots of opportunity for bikeshedding and iterative improvements here. This is a hard problem!

b) Make clients specify an identifier. If you want to update an existing review, you push to the same identifier or the review system assumes separate reviews. We can use bookmarks and branch names here. For clients pushing with an extension, we can have the client upload metadata as part of the push. This includes the mq patch name.

c) Use changeset evolution and obsolescence markers. Mercurial will tell us which changesets are logical successors of others even though they don't obviously chain through the DAG. This will likely be the best long-term solution, as it requires the least involvement from users. But it's likely 6-12 months out.

Gregory Szorc

unread,
Mar 5, 2014, 5:02:33 PM3/5/14
to George Miroshnykov, Ehsan Akhgari, Taras Glek, mozilla-c...@googlegroups.com
> Choosing the reviewer.  Also, did you get a chance to try out the push to bug
> workflow as well?
> You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?
>
> I haven’t tried it yet, but I’m sure it will be trivial to implement.
> The changegroup hook will simply append the value of our env var to the
> changeset-chooser URL.
> The HTML UI will take the value from the URL and pass it over to the HTTP API
> call that will assign the corresponding bug number via Review Board API.

I don't think you'll need to mess around with --remotecmd. Let's require a bug # in commit messages and remove the need for guessing games.

Ehsan Akhgari

unread,
Mar 5, 2014, 5:05:49 PM3/5/14
to Gregory Szorc, George Miroshnykov, Taras Glek, mozilla-c...@googlegroups.com
The problem with requiring a bug number is that people don't always do that, and I'm interested in supporting as many different workflows as possible.

My idea was basically the (b) option you listed.  I think we should keep the API on the server side dumb, and don't try to make any guesses inside it.  On the client side, we can definitely do a lot of smarts later on, by looking at the commit messages and pre-fill what gets delivered to the server based on the commit message if it's available, otherwise prompting the user as needed.  What do you think about that?

(c) is obviously not an option for us, as it's not something that we can use right now.

Gregory Szorc

unread,
Mar 5, 2014, 5:12:34 PM3/5/14
to Ehsan Akhgari, George Miroshnykov, Taras Glek, mozilla-c...@googlegroups.com
> The problem with requiring a bug number is that people don't always do
> that, and I'm interested in supporting as many different workflows as
> possible.
>
> My idea was basically the (b) option you listed. I think we should keep
> the API on the server side dumb, and don't try to make any guesses inside
> it. On the client side, we can definitely do a lot of smarts later on, by
> looking at the commit messages and pre-fill what gets delivered to the
> server based on the commit message if it's available, otherwise prompting
> the user as needed. What do you think about that?

It's a start. And that's all we need to do.

This leads me to another idea: patch first, bug later. There are a number of times where I want to make a minor change. I just want to code the patch first and deal with Bugzilla later. hg bzexport allows me to do this, although the UX could use some work. It would be rad if this tool allowed us to turn a bug-free push into a new bug automagically. Non-launch requirement though.

Do we have an etherpad or something listing all features and their respective priorities?

George Miroshnykov

unread,
Mar 5, 2014, 5:48:12 PM3/5/14
to Ehsan Akhgari, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com

On March 5, 2014 at 23:56:03 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 4:45 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 23:01:09 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 3:47 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:

On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:

So if we ignore squashed diffs, what is still missing?


The only thing on my mind is assigning the reviewer.
What else should we implement?
Choosing the reviewer.  Also, did you get a chance to try out the push to bug workflow as well?

You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?

I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.


Remember the case of:

Bug 555555 Part 1
Bug 555555 Part 2
Bug 666666

and pushing the same branch once to bug 555555 and once to bug 666666, like we discussed before.

Yeah, as we’ve discussed before, the second push won’t actually push anything and won’t trigger the hook, so we can’t do that.

How about this: we can add a separate “bug number” column to the HTML UI and possibly pre-populate it from commit messages.
This way you’ll be able to create review requests associated with different bugs after a single push.

Also, maybe there’s another way we haven’t considered?

What we talked about last time was remembering which changesets were pushed as part of which bug, and when you receive an empty push the second time with a different bug number, reassigning the unused changesets from the last time in the chooser UI.  Does that make sense?

Let’s make sure I got it right:

  1. User does a `hg push` (or some equivalent command that triggers our hook and passes over the bug number)
  2. We authenticate him and get the list of his previously pushed changesets that don’t yet have review requests
  3. We combine that list with the changesets that he had just pushed
  4. We show the combined list of changesets in the UI
  5. When the review request is created, we scratch corresponding changeset off the list
  6. We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Did I miss anything?



So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

What Taras said, but please try to work on the push to bug workflow first!

Got it.

So right now we’re creating one review request per changeset.

So we're not getting one of those squashed requests?  That sucks, since I wast to be able to have a permalink for the reviews on a bug.

I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.
Any suggestions how to do this, preferably without messing with the working directory?


Each review request (and, consequently, changeset) will have it’s own attached link.

That's not great...

The squashed review request may fix this.


If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?

Hmm, I think just one, in the interest of less comment spam.  Please look into the RB/Bugzilla integration to see what it offers.  Also our Bugzilla team at Mozilla has been doing some work here as well, please see https://bugzilla.mozilla.org/show_bug.cgi?id=515210.  We might want to sync up with them.

Okay, got it.

George Miroshnykov

unread,
Mar 5, 2014, 5:53:53 PM3/5/14
to Gregory Szorc, Ehsan Akhgari, Taras Glek, mozilla-c...@googlegroups.com

I believe we have this:
https://github.com/ehsan/reviewtool

But it needs an update.

George Miroshnykov

unread,
Mar 6, 2014, 2:44:29 PM3/6/14
to mozilla-c...@googlegroups.com, Ehsan Akhgari, Gregory Szorc, Taras Glek
Hey everyone,

I’ve added the ability to select a reviewer:
https://cloudup.com/clRexrvtyK3
Sorry for the awful UI: apparently it’s not that easy to integrate Twitter’s typeahead.js into Twitter’s Bootstrap :)
I’ll fix the UI tomorrow, but I wanted to let you know that the functionality (and corresponding HTTP API) is already in place.

The usernames are provided by Review Board API, so later we’ll have to configure RB to use Mozilla’s LDAP or something like that.

Ehsan, could you please confirm that I understood our next step.
We’re going to start managing some state in changeset-chooser like this:
  1. User does a `hg push` (or some equivalent command that triggers our hook and passes over the bug number)
  2. We authenticate him and get the list of his previously pushed changesets that don’t yet have review requests
  3. We combine that list with the changesets that he had just pushed
  4. We show the combined list of changesets in the UI
  5. When the review request is created, we scratch corresponding changeset off the list
  6. We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Is that right? Am I missing something?

Other TODOs:
* squashed review requests
* Bugzilla integration

Thanks,
George

Ehsan Akhgari

unread,
Mar 6, 2014, 3:08:11 PM3/6/14
to Gregory Szorc, George Miroshnykov, Taras Glek, mozilla-c...@googlegroups.com
I think if we finish up what we're working on here, doing that on the client side would be easy.  You'd call into the API to file a new bug, and then you'll grab the bug number and push to it.
 
Do we have an etherpad or something listing all features and their respective priorities?

Not that I know of.  It's mostly in my head :/  I haven't had enough time to distill everything I have there in a good place yet.

Cheers,
Ehsan

Ehsan Akhgari

unread,
Mar 6, 2014, 3:18:09 PM3/6/14
to George Miroshnykov, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com
Well, we have the tip changeset of the tree they're pushing I assume in the hook, so we don't need to look at all of the changesets I hope.
  1. We combine that list with the changesets that he had just pushed
  2. We show the combined list of changesets in the UI
  3. When the review request is created, we scratch corresponding changeset off the list
  4. We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Did I miss anything?

Yeah I think something like that.  Please also pay attention to the use case of pushing the exact same branch twice in order to submit the same branch into two bugs for review.



So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

What Taras said, but please try to work on the push to bug workflow first!

Got it.

So right now we’re creating one review request per changeset.

So we're not getting one of those squashed requests?  That sucks, since I wast to be able to have a permalink for the reviews on a bug.

I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.

Which conflicts do you mean?

Any suggestions how to do this, preferably without messing with the working directory?

I'm not clear on what you mean to be honest.


Each review request (and, consequently, changeset) will have it’s own attached link.

That's not great...

The squashed review request may fix this.


If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?

Hmm, I think just one, in the interest of less comment spam.  Please look into the RB/Bugzilla integration to see what it offers.  Also our Bugzilla team at Mozilla has been doing some work here as well, please see https://bugzilla.mozilla.org/show_bug.cgi?id=515210.  We might want to sync up with them.

Okay, got it.

--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Thanks and sorry for the slow response.  FWIW please ping me on IRC if you think you're spending too much time waiting on me to reply.  I'll do my best to respond quickly but I'm very overloaded and sometimes like this I fail to respond in ~24 hours which is unacceptable!

Cheers,
Ehsan

Ehsan Akhgari

unread,
Mar 6, 2014, 3:22:08 PM3/6/14
to George Miroshnykov, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek
Thanks George, this is nice!

FWIW the users come from Bugzilla.  Please ping either mcote, dkl or glob on #bmo and they can tell you the API you should use.

Please see my other email for the rest, and let me know if there is anything that is not clear.

Thanks!

--

George Miroshnykov

unread,
Mar 6, 2014, 4:41:52 PM3/6/14
to Ehsan Akhgari, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com

What I meant was that we have to show changesets for both previous and current pushes in case that current push will be empty.
Or should we simply have a logic like “if the current push is empty, show changesets from previous push”?


  1. We combine that list with the changesets that he had just pushed
  2. We show the combined list of changesets in the UI
  3. When the review request is created, we scratch corresponding changeset off the list
  4. We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Did I miss anything?

Yeah I think something like that.  Please also pay attention to the use case of pushing the exact same branch twice in order to submit the same branch into two bugs for review.

I believe we can’t trigger hooks by pushing the exact same branch twice, because there are no outgoing changesets at all, but I’ll have to double-check that.
For now I suggest we assume “push or equivalent command that could trigger the hook” every time we use just “push”.
It can be a command provided by our custom Mercurial extension or something like that.




So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

What Taras said, but please try to work on the push to bug workflow first!

Got it.

So right now we’re creating one review request per changeset.

So we're not getting one of those squashed requests?  That sucks, since I wast to be able to have a permalink for the reviews on a bug.

I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.

Which conflicts do you mean?

I mean that if you have a history like this:


A -> B -> C

and you pick only A and C, then try to squash them, it may result in conflict because C may rely on changes from B.

That is what I’m trying to detect / work around.

We had a brief discussion with Taras and while we’re still not sure how to approach this problem, we can skip the squashed diff for now.
Instead, we can generate single parent review request with no diff attached.
We can then attach a link to this review request to Bugzilla bug.

If more review requests are later created for this bug, they will be added as children of the corresponding parent review request.
My description is probably confusing, but this is basically a squashed review request without the actual diff.

If it’s not good enough, we can later figure out how to generate the diff to attach to this parent review request.


Any suggestions how to do this, preferably without messing with the working directory?

I'm not clear on what you mean to be honest.


Each review request (and, consequently, changeset) will have it’s own attached link.

That's not great...

The squashed review request may fix this.


If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?

Hmm, I think just one, in the interest of less comment spam.  Please look into the RB/Bugzilla integration to see what it offers.  Also our Bugzilla team at Mozilla has been doing some work here as well, please see https://bugzilla.mozilla.org/show_bug.cgi?id=515210.  We might want to sync up with them.

Okay, got it.

--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Thanks and sorry for the slow response.  FWIW please ping me on IRC if you think you're spending too much time waiting on me to reply.  I'll do my best to respond quickly but I'm very overloaded and sometimes like this I fail to respond in ~24 hours which is unacceptable!

Sure, no worries!

Thanks,
George

George Miroshnykov

unread,
Mar 6, 2014, 4:48:49 PM3/6/14
to Ehsan Akhgari, Gregory Szorc, mozilla-c...@googlegroups.com, Taras Glek

On March 6, 2014 at 22:22:50 , Ehsan Akhgari (eh...@mozilla.com) wrote:

Thanks George, this is nice!

FWIW the users come from Bugzilla.  Please ping either mcote, dkl or glob on #bmo and they can tell you the API you should use.

Yes, eventually they'll come from Bugzilla, but changeset-chooser shouldn’t probably know about it.
We can’t simply take Bugzilla user and assign him as a reviewer inside RB - we need to have a user on RB side.
Right now I’m using the RB API to get the list of RB users that can be assigned as reviewers.
Syncing Bugzilla and RB users should be handled as part of RB <=> Bugzilla integration, so it’s probably out of scope of changeset-chooser.

Also, we should probably come up with a more creative name :)

Thanks,
George

George Miroshnykov

unread,
Mar 7, 2014, 7:33:16 AM3/7/14
to mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek, Ehsan Akhgari

Hey everyone,

I've added a way to specify a bug number during hg push. Unfortunately I couldn't use --remotecmd for this because it doesn't allow any arguments.

Instead, for now we'll have to do this:

BUG=31337 hg push -e 'ssh -o SendEnv=BUG'

The output looks like this:

pushing to ssh://review.infinity.com.ua//var/hg/playground-central
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: choose changesets to review: http://review.infinity.com.ua:5000/?changeset=b43890fa6f7411c1f3f31bd2c701c1656892d608&bug=31337

You can see both changeset and bug number in the link above.

HTML UI retrieves the corresponding bug summary via Bugzilla REST API and displays it at the top of the page.

Obviously, all new review requests are associated with the given bug number. I've configured everything to point to bmo (just for testing), so we can see that links actually work.

Here's the review request I've created from the above link: http://review.infinity.com.ua/r/11/diff/

Thanks, George

Ehsan Akhgari

unread,
Mar 10, 2014, 1:47:24 PM3/10/14
to George Miroshnykov, Taras Glek, Gregory Szorc, mozilla-c...@googlegroups.com
More like "if the current push is empty and we have a record of the same user pushing the same tip before for a different bug, take a look at all of the changesets pushed the last time, filter out the one's that were set for review at that time, and present the reset in the changeset chooser UI to create a new review request based on for the new bug."
 

  1. We combine that list with the changesets that he had just pushed
  2. We show the combined list of changesets in the UI
  3. When the review request is created, we scratch corresponding changeset off the list
  4. We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Did I miss anything?

Yeah I think something like that.  Please also pay attention to the use case of pushing the exact same branch twice in order to submit the same branch into two bugs for review.

I believe we can’t trigger hooks by pushing the exact same branch twice, because there are no outgoing changesets at all, but I’ll have to double-check that.
For now I suggest we assume “push or equivalent command that could trigger the hook” every time we use just “push”.
It can be a command provided by our custom Mercurial extension or something like that.

So when you push to either an hg or a git server, at some point the decision on what things are missing on the server that the client has locally is made, and then that stuff is pushed to the other side.  We need to hook into the hg (and in the future the git) server before that time, otherwise the push will simply terminate without pushing any new commits to the other side.  Can you please research what the available options are?  So far, I've been assuming that there is either some kind of a hook which is invoked at that time or that we can make one.

Note that I do want us to go through as much of the usual push command as possible so that we don't have to reinvent its logic.




So the TODO list looks like this:

  1. Allow choosing the reviewer
  2. Associate review requests with a bug number, if provided
Anything else we should try out?

What Taras said, but please try to work on the push to bug workflow first!

Got it.

So right now we’re creating one review request per changeset.

So we're not getting one of those squashed requests?  That sucks, since I wast to be able to have a permalink for the reviews on a bug.

I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.

Which conflicts do you mean?

I mean that if you have a history like this:


A -> B -> C
and you pick only A and C, then try to squash them, it may result in conflict because C may rely on changes from B.

That is what I’m trying to detect / work around.

There is no good solution there, so we shouldn't try to solve that problem!  It's best to try to work around it somehow.

We had a brief discussion with Taras and while we’re still not sure how to approach this problem, we can skip the squashed diff for now.
Instead, we can generate single parent review request with no diff attached.
We can then attach a link to this review request to Bugzilla bug.

Yeah, my understanding was that the squashed diff in the squashed review request is not used for anything inside ReviewBoard, so I suggested to simply generate a A:C diff in the example above and be done with it.  Or we could do what Taras suggests and not generate a squashed thing at all.  The important thing is to ensure that it won't be used for anything (because it will be wrong no matter which one of these suggestions we pick).  And once we ensure that, by definition it doesn't matter what's in the diff any more!

If more review requests are later created for this bug, they will be added as children of the corresponding parent review request.

Ensuring the above, this should be fine too.

Cheers,
Ehsan

Ehsan Akhgari

unread,
Mar 10, 2014, 1:49:27 PM3/10/14
to George Miroshnykov, Gregory Szorc, mozilla-c...@googlegroups.com, Taras Glek
Agreed.  But I'd still like to understand what our plans are here.  I asked https://bugzilla.mozilla.org/show_bug.cgi?id=515210#c8 on the bug, please make sure you're CCed there so that you will be in the loop!

Also, we should probably come up with a more creative name :)

Suggestions welcome!  I'm terrible at naming things, and don't really care about the names all that much.  :-)

Cheers,
Ehsan

Ehsan Akhgari

unread,
Mar 10, 2014, 2:00:17 PM3/10/14
to George Miroshnykov, mozilla-c...@googlegroups.com, Gregory Szorc, Taras Glek
On Fri, Mar 7, 2014 at 7:33 AM, George Miroshnykov <gmiros...@rebbix.com> wrote:

Hey everyone,

I've added a way to specify a bug number during hg push. Unfortunately I couldn't use --remotecmd for this because it doesn't allow any arguments.

Instead, for now we'll have to do this:

BUG=31337 hg push -e 'ssh -o SendEnv=BUG'

I think this command would so just fine, since we'll be hiding this behind a script of some sort.  Thankfully things will at least be simple for the git version once we work on it.

The output looks like this:

pushing to ssh://review.infinity.com.ua//var/hg/playground-central
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: choose changesets to review: http://review.infinity.com.ua:5000/?changeset=b43890fa6f7411c1f3f31bd2c701c1656892d608&bug=31337

This looks good!

You can see both changeset and bug number in the link above.

HTML UI retrieves the corresponding bug summary via Bugzilla REST API and displays it at the top of the page.

Obviously, all new review requests are associated with the given bug number. I've configured everything to point to bmo (just for testing), so we can see that links actually work.

Here's the review request I've created from the above link: http://review.infinity.com.ua/r/11/diff/

This all looks great!  Thanks a lot.

Have you made any progress on the multi-bug branch use case yet?  I'm eagerly awaiting some results there.

Also, I'd like to spend some time now going through your code.  Where should I be looking?

Cheers,
Ehsan

George Miroshnykov

unread,
Mar 11, 2014, 11:57:21 AM3/11/14
to Ehsan Akhgari, Gregory Szorc, mozilla-c...@googlegroups.com, Taras Glek

I’ve just finished it - you can try it out now.
After you push the “create review request” button, you should get a link to a parent review request.
This review request is associated with a bug number (instead of changeset ID), so it can be linked to Bugzilla.
If you later push more changesets for the same bug number, they should be added to that parent review request, as you would expect.

As we’ve discussed before, second push of the same chanesets won’t trigger the hook, but for now you can simply change the bug number in the URL.

Also, I'd like to spend some time now going through your code.  Where should I be looking?

Sure, here it is:
https://github.com/laggyluke/changeset-chooser
It’s not production quality yet, but should be at least somewhat readable :)
The most interesting part is this:
https://github.com/laggyluke/changeset-chooser/blob/master/lib/routes.js

Thanks,
George

Ehsan Akhgari

unread,
Mar 13, 2014, 11:05:28 AM3/13/14
to George Miroshnykov, Gregory Szorc, mozilla-c...@googlegroups.com, Taras Glek
So I tried this, and there is something strange happening here.  I first pushed a review request for bug 123 and then tried to push another one for bug 456 but it didn't work because of the problem below.  Then I tried to push a part2 for bug 123 and I got http://review.infinity.com.ua/r/16/.  It seems like review 15 and 17 are treated differently by RB somehow.

As we’ve discussed before, second push of the same chanesets won’t trigger the hook, but for now you can simply change the bug number in the URL.

Do you have a solution for this yet?  We definitely need to figure out a way to solve this, otherwise this workflow won't work.
Also, I'd like to spend some time now going through your code.  Where should I be looking?

Sure, here it is:
https://github.com/laggyluke/changeset-chooser
It’s not production quality yet, but should be at least somewhat readable :)
The most interesting part is this:
https://github.com/laggyluke/changeset-chooser/blob/master/lib/routes.js

Thanks!  I am unfortunately extremely busy with some stuff that came up recently so I'll try to do this either this weekend or next week.

Cheers,
Ehsan

George Miroshnykov

unread,
Mar 14, 2014, 6:48:03 AM3/14/14
to Ehsan Akhgari, Gregory Szorc, mozilla-c...@googlegroups.com, Taras Glek

Ah, I forgot that special case again.
Your first push for 123 introduced just a single changeset, so it should have created one review request.
Instead, it created two: a child review request with a diff (15) and a parent review request without a diff (16).
Then you pushed another changeset for 123 and that created another child review request (17) and associated it with 16.

Here we have a mix of technical and logical issues.

Technical issue is that AFAIK Review Board API won’t let us search review requests by bug number:
https://code.google.com/p/reviewboard/issues/detail?id=1398

So I had to abuse the ‘commit’ field of review request like this:

  • children review requests have their true commit ID in the “commit_id” field
  • parent review requests have corresponding Bugzilla bug number in the “commit_id” field
The original solution for git (rb-repo) did something similar and saved the branch name in the “commit_id” field, see:
https://github.com/mikeconley/rb-repo/blob/master/post-receive#L206

I’ll try to keep it short, but here’s where the problems start.
On the first push to bug 123, we create a single review request with commit_id set to a true commit id.
The bug number is saved in the bugs_closed field as well.
We take the RB link (e.g. /r/1) and attach it to corresponding Bugzilla bug.
Next, we push the second commit for bug 123.
At this point we have to check if there are any previous commits / review requests created for bug 123.
There’s no easy way for us to do it right now, but let’s imagine we extend RB API to allow it.
At this point we find out that we have two commits for bug 123, so we have two options:
  1. Create new review request for the second commit (/r/2) and a separate parent review request for both commits (/r/3). We then have to update Bugzilla bug and change the review request link from /r/1 to /r/3.
  2. Transform the initial review request /r/1 into a parent review request (in order not to change the link in Bugzilla) and create two new review requests for each commit. This means we’ll loose all the review remarks that were already person on /r/1, because AFAIK we can’t simply move them to the newly created children review requests.
Personally I don’t like any of those options.
We need a concept of a branch on RB side that would group individual review requests or commits.
AFAIK this feature is planned, but not yet available in RB.


As we’ve discussed before, second push of the same chanesets won’t trigger the hook, but for now you can simply change the bug number in the URL.

Do you have a solution for this yet?  We definitely need to figure out a way to solve this, otherwise this workflow won't work.

Taking a step back, I see a couple of options here:

  1. Require bug numbers in commit messages. As we’ve discussed before, this requirement is unrealistic.
  2. Allow manually specifying bug numbers in the HTML UI. You’ll have a text field next to each changeset where you can manually put the bug number for that changeset.
  3. The combination of 1 and 2: pre-populate the text fields with bug numbers extracted from the corresponding commit messages, when available.
  4. Use Mercurial bookmarks. Putting the bug number into bookmark name would make it available to the Mercurial server hook.
  5. Create an external tool (Mercurial extension or mach command) that will take the bug number from the command line and invoke “hg push” with the correct env var. The downside is that you need to install some extra tool on each client.
  6. Switch to git and use branch names :)
Again, I don’t like any of these approaches, but 3 looks most realistic.
As usual, there’s a chance I’m missing something.

This question is really related with my intent to gather more requirements, so I hope I’ll have some kind of list today and the picture will be more clear.

Thanks,
George

Reply all
Reply to author
Forward
0 new messages