Hey everyone,As we've discussed before on IRC, I've been working on a service that should allow users to create code review requests based on the changesets pushed to the special "review" repository.For now, the workflow looks like this:$ hg push -f reviewpushing to ssh://review.infinity.com.ua//var/hg/playground-centralsearching for changesremote: adding changesetsremote: adding manifestsremote: adding file changesremote: added 1 changesets with 1 changes to 1 filesremote: choose changesets to review: http://review.infinity.com.ua:5000/?dbb8d013a3f3c32f0bdd4373e128f8cab11c9cd2You can see the UI by opening the link above.I've already created a review request for that changeset, so you'll have to push your own changesets to play around with it.Ehsan should already have access to this repository, here's the steps for anyone interested:1. Send me your GitHub username (or just some login / SSH public key).
2. Once you have an account, clone the playground repository:3. Make some changes, commit and push (force push?).
4. After push, you should see the URL to changeset-chooser, open it.5. Choose the changesets to create review requests by clicking the corresponding table rows.6. Click the "Create review request" button.
7. You should see the links to review requests in the rightmost column.The code of the changeset-chooser is available here:I've made this prototype as a separate tool using Node.js, but it should betrivial to port it to Django and/or integrate into RB as soon as we'recomfortable with the general approach.So the next roadblock I have is a problem with creating squashed review requests.I couldn't find a way to squash arbitrary changesets without touching filesin the working repository.But that would require some kind of exclusive lock on the repository(so no simultaneous pushes from different users)and would generally complicate things a lot.
Hey everyone,As we've discussed before on IRC, I've been working on a service that should allow users to create code review requests based on the changesets pushed to the special "review" repository.For now, the workflow looks like this:$ hg push -f reviewpushing to ssh://review.infinity.com.ua//var/hg/playground-centralsearching for changesremote: adding changesetsremote: adding manifestsremote: adding file changesremote: added 1 changesets with 1 changes to 1 filesremote: choose changesets to review: http://review.infinity.com.ua:5000/?dbb8d013a3f3c32f0bdd4373e128f8cab11c9cd2You can see the UI by opening the link above.I've already created a review request for that changeset, so you'll have to push your own changesets to play around with it.
Ehsan should already have access to this repository, here's the steps for anyone interested:1. Send me your GitHub username (or just some login / SSH public key).2. Once you have an account, clone the playground repository:hg clone ssh://user...@review.infinity.com.ua//var/hg/playground-central3. Make some changes, commit and push (force push?).4. After push, you should see the URL to changeset-chooser, open it.5. Choose the changesets to create review requests by clicking the corresponding table rows.6. Click the "Create review request" button.7. You should see the links to review requests in the rightmost column.
The code of the changeset-chooser is available here:I've made this prototype as a separate tool using Node.js, but it should betrivial to port it to Django and/or integrate into RB as soon as we'recomfortable with the general approach.So the next roadblock I have is a problem with creating squashed review requests.I couldn't find a way to squash arbitrary changesets without touching filesin the working repository.But that would require some kind of exclusive lock on the repository(so no simultaneous pushes from different users)and would generally complicate things a lot.Any suggestions on how to approach this are much appreciated.
On 3/4/14, 9:37 AM, George Miroshnykov wrote:
> Hey everyone,
>
> As we've discussed before on IRC, I've been working on a service that
> should allow users to create code review requests based on the
> changesets pushed to the special "review" repository.
>
> For now, the workflow looks like this:
> $ hg push -f review
> pushing to ssh://review.infinity.com.ua//var/hg/playground-central
> searching for changes
> remote: adding changesets
> remote: adding manifests
> remote: adding file changes
> remote: added 1 changesets with 1 changes to 1 files
> remote: choose changesets to review:
> http://review.infinity.com.ua:5000/?dbb8d013a3f3c32f0bdd4373e128f8cab11c9cd2
>
> You can see the UI by opening the link above.
> I've already created a review request for that changeset, so you'll have
> to push your own changesets to play around with it.
>
> Ehsan should already have access to this repository, here's the steps
> for anyone interested:
>
> 1. Send me your GitHub username (or just some login / SSH public key).
IT was able to provide me a dump of all SSH keys for Mozillians in bug
928461. If you file a bug, they can probably do the same for you.
Good idea, but is this the right bug number?
That bug was about adding a new SSH key, not providing any key dumps.
Actually, it would be great to have a place where you can grab all up-to-date public keys.
For now, I’ve added user “gps” with keys from here:
https://github.com/indygreg.keys
In case you need it, you have sudo access on that box.
> 2. Once you have an account, clone the playground repository:
>
> hg clone ssh://user...@review.infinity.com.ua//var/hg/playground-central
>
> 3. Make some changes, commit and push (force push?).
FWIW, with Mercurial 2.9 and later, if you push a bookmark (hg push -B),
you don't need push -f.
If you are writing a Mercurial extension that does the pushing, you can
pass an argument to localrepo.push so the new heads warning/push
requirement isn't an issue.
> 4. After push, you should see the URL to changeset-chooser, open it.
> 5. Choose the changesets to create review requests by clicking the
> corresponding table rows.
> 6. Click the "Create review request" button.
Hopefully we can eventually do this client-side in the Mercurial client.
Please ensure there is an API for this so users don't need to open a
browser to initiate code review.
Yup, that was the plan from day one.
The current UI is just a static HTML talking to that API.
Here’s the draft of the API:
https://github.com/laggyluke/changeset-chooser#api
There’s no auth yet, because the repo is public anyway.
> 7. You should see the links to review requests in the rightmost column.
>
> The code of the changeset-chooser is available here:
> https://github.com/laggyluke/changeset-chooser
>
> I've made this prototype as a separate tool using Node.js, but it should be
> trivial to port it to Django and/or integrate into RB as soon as we're
> comfortable with the general approach.
>
> So the next roadblock I have is a problem with creating squashed review
> requests.
> I couldn't find a way to squash arbitrary changesets without touching files
> in the working repository.
>
> But that would require some kind of exclusive lock on the repository
> (so no simultaneous pushes from different users)
> and would generally complicate things a lot.
There is already a lock acquired on the repo during many parts of push.
I wouldn't be too worried about multiple clients racing for a lock - not
with our current load anyway.
That being said, you imply we're manipulating files in the working copy
(as opposed to the usually hidden store). We should not need to maintain
a checkout/working copy on the server. That will be problematic from a
performance perspective. Although, SSDs and Mercurial 2.6+ should go a
long way. We should be able to perform this squashing with Mercurial
APIs without touching the working directory. If we can't, long term we
should patch Mercurial to make that possible.
The current implementation talks to a local repo using the standard Mercurial CLI client.
I guess this is not a strict requirement, but it was easier to get this off the ground like that.
What we need is basically three commands:
1. Get the list of changesets we’re interested in.
Which changesets exactly we’re interested in is a tricky question in and of itself.
For now, we assume we’re interested in everything we’ve just pushed.
That means grabbing everything from the first pushed changeset to the tip:
hg log --rev <first-pushed-changeset>:tip
2. Get the diff of the individual changeset.
This is simple: hg diff --rev <rev>
3. Get the squashed diff of all the changesets that user have picked.
That’s what I’m struggling with.
So let’s say user has just pushed the following changesets:
A -> B -> C -> D
And he picked B and D for review.
We need to squash them somehow and I don’t know how to do this, apart from:
1. revert to A
2. apply B
3. apply D
4. get the diff
5. discard everything and revert to D
And we clearly need exclusive access to the (local?) repository to do this.
Could we offload the squashing to the client? We could have the client
generate a temporary squashed commit, push it, then strip it from history.
I guess we can, but it would probably require the same effort from client: the working copy should not have any local modifications etc.
I was hoping to find some less intrusive solutions, while keeping this approach as a last resort.
Thanks,
George
I think this is something for us to discuss.
Do you want the user to be able to choose reviewer on changeset-chooser side instead of RB side?
We can explicitly tell RB to use particular reviewer or review group via its API.
The code of the changeset-chooser is available here:I've made this prototype as a separate tool using Node.js, but it should betrivial to port it to Django and/or integrate into RB as soon as we'recomfortable with the general approach.So the next roadblock I have is a problem with creating squashed review requests.I couldn't find a way to squash arbitrary changesets without touching filesin the working repository.But that would require some kind of exclusive lock on the repository(so no simultaneous pushes from different users)and would generally complicate things a lot.Any suggestions on how to approach this are much appreciated.You shouldn't need to use the working directory at all. Can you please explain what you're trying to do?
Please see my reply to Gregory.
Also, I’ve just found this, maybe we could use it:
http://cyberelk.net/tim/patchutils/man/rn01re02.html
Thanks,
George
"You are not authorized to access bug #969515.” :(
I had a feeling that if enough people request this data that IT will make it available through some automated means. A problem is they are treating this data as PII, so we can't just expose it to the world.
Unfortunately I’m not a Mozillian, so if public SSH keys are not considered… err… public :), I’m not sure I’m allowed to request them.
But I guess we already have most of the accounts set up on that box, so this is not a problem.
My bad - first time I couldn’t get this logic even after looking at the code.
Is this the approach you’re using now? Any false-positives or something?
I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
This mechanism will correctly pick up partial pushes.
> 2. Get the diff of the individual changeset.
> This is simple: hg diff --rev <rev>
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.
Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
But it’s probably a good opportunity to raise a number of related questions:
1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
3. There is a command server thing I was planning to use:
http://mercurial.selenic.com/wiki/CommandServer
But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
So while it avoid possible licensing issues, it is still of no use to us.
> 3. Get the squashed diff of all the changesets that user have picked.
> That’s what I’m struggling with.
> So let’s say user has just pushed the following changesets:
> A -> B -> C -> D
> And he picked B and D for review.
> We need to squash them somehow and I don’t know how to do this, apart from:
> 1. revert to A
> 2. apply B
> 3. apply D
> 4. get the diff
> 5. discard everything and revert to D
Why would we ever perform a review without C applied? That seems... wrong. IMO if we want to skip changesets, they should be at the beginning or end (we'd change the range of applicable commits). Cherry picking changesets just seems like a recipe for disaster.
This does look bad, we just tried to cover some possible corner cases.
Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…
Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.
Thanks,
George
I had a feeling that if enough people request this data that IT will make it available through some automated means. A problem is they are treating this data as PII, so we can't just expose it to the world.Unfortunately I’m not a Mozillian, so if public SSH keys are not considered… err… public :), I’m not sure I’m allowed to request them.
But I guess we already have most of the accounts set up on that box, so this is not a problem.
I had a mechanism for dealing with this in my experimental extension:
1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
2) Find the distance (in terms of changeset count) between the pushed head and the head in question
3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head
My bad - first time I couldn’t get this logic even after looking at the code.
Is this the approach you’re using now? Any false-positives or something?
I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
This mechanism will correctly pick up partial pushes.
> 2. Get the diff of the individual changeset.
> This is simple: hg diff --rev <rev>
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
But it’s probably a good opportunity to raise a number of related questions:
1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
3. There is a command server thing I was planning to use:
http://mercurial.selenic.com/wiki/CommandServer
But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
So while it avoid possible licensing issues, it is still of no use to us.
> 3. Get the squashed diff of all the changesets that user have picked.
> That’s what I’m struggling with.
> So let’s say user has just pushed the following changesets:
> A -> B -> C -> D
> And he picked B and D for review.
> We need to squash them somehow and I don’t know how to do this, apart from:
> 1. revert to A
> 2. apply B
> 3. apply D
> 4. get the diff
> 5. discard everything and revert to D
Why would we ever perform a review without C applied? That seems... wrong. IMO if we want to skip changesets, they should be at the beginning or end (we'd change the range of applicable commits). Cherry picking changesets just seems like a recipe for disaster.This does look bad, we just tried to cover some possible corner cases.
Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.
On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote:I just thought of a more generic and possibly better way to do this: phases.
> I had a mechanism for dealing with this in my experimental extension:
>
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head
> My bad - first time I couldn’t get this logic even after looking at the code.
> Is this the approach you’re using now? Any false-positives or something?
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases
If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g.
$ hg log -r 'draft() & ancestors(tip)'
Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there.
> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests."It depends" and "ask a license geek" (like Gerv).
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
> But it’s probably a good opportunity to raise a number of related questions:
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
If you have comprehensive test coverage it shouldn't be a problem.
> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
Yes, python-hglib uses the command server.
> 3. There is a command server thing I was planning to use:
> http://mercurial.selenic.com/wiki/CommandServer
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
> So while it avoid possible licensing issues, it is still of no use to us.
The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository.
Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements.
Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing.
tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it.
On Tuesday, March 4, 2014 3:22:55 PM UTC-8, Ehsan Akhgari wrote:
> Hmm, ok so based on the above I definitely prefer the command server over directly using the API. But I still think we should use the CLI interface for now while we get everything working and then we can start worrying about this if the performance of the CLI layer proves to be insufficient.
Or we can just use the command server now.
The command server will execute things faster and it avoids technical debt. Furthermore, if you use an existing library like python-hglib, it's already solved the technical problem of parsing output into data structures, which saves you even more time now.
--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
On Tuesday, March 4, 2014 7:12:19 PM UTC-8, Ehsan Akhgari wrote:It's not clear to me why we're writing code that won't ship.
> Greg, we're not close to launching this. This project literally got started last week, and we're currently trying to focus on the basics. I appreciate you trying to ensure that we keep the bigger picture in mind, and it's definitely helpful to have you point this out when we forget, but I'm not saying that we should not use the command server. All I'm saying is that we should not switch to using it _now_. The current code may even need to be rewritten in python if we want it to be integrated with RB. I think worrying about the performance of the code in its *current* shape is a mistake.
>
> Does this address your concern?
It's not clear to me because the goals/phases of this project have not been explained to me. It sounded like we were trying to build something that would facilitate push-based, ReviewBoard-based code review to replace Bugzilla's Splinter and that we are on an aggressive timeline to have that rolled out. It sounds like part of that is incorrect.
Could you please state the goals and deliverables of this project so I can restrain the horse the next time it wants to leave the stable?
Got it.
Nothing changed. The current implementation sticks to what we’ve agreed upon: we only consider changesets that we’ve just pushed.
I’m just looking at other options, because this simple approach may not be the most convenient for our users.
I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.None of the workflow I laid out above is Mozilla specific unless I'm missing something. (And it would be fine if it were -- just trying to be clear!)
I was referring to Greg’s approach as being Mozilla-specific (unified repo, searching for a shortest path to multiple heads etc.).
Our current approach of only looking at the pushed changesets may be too simple, but yet generic.
This mechanism will correctly pick up partial pushes.
> 2. Get the diff of the individual changeset.
> This is simple: hg diff --rev <rev>I think that is what's causing you to require a working directory, since that is asking Mercurial to generate a diff between <rev> and the working directory. You probably want to specify two revisions there.
Sorry, wrong command.
I’m actually using `hg export --git --rev <rev>` which should do the right thing.
As a hint for the future, try running this command in the mercurial repo on the server:hg update 0This command basically updates to revision 0, which is fancy talk for wiping out the entire working directory. That way, if you end up relying on the working directory accidentally, you'll know very fast!
That’s a neat trick, thanks!
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
Agreed, let's focus on correctness first, and then on performance.But it’s probably a good opportunity to raise a number of related questions:
1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?No, we can choose whatever license we like. Our RB changes will probably be MIT. But where is this FAQ BTW?
Sorry, it wasn’t in the FAQ, it’s here:
http://mercurial.selenic.com/wiki/MercurialApi
"Using this API is a strong indication that you're creating a "derived work" subject to the GPL. Before going any further, read the License page."
Technically we don’t need a squashed diff at all, see below.
Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.Even that won't work. My suggestion is for now to try to see if there is a workaround for this RB limitation. I mean, I imagine that the sqaushed diff is not used for anything since we don't perform reviews on the squashed RB "review" objects...
RB itself doesn’t need a squashed diff at all.
I believe `rb-repo` project used squashed diffs as a convenience for reviewer: you can either look at each individual diff or you can look at a single squashed diff with all the changes combined.
We already have the first part, now I’m looking for a way to implement the second one, if that’s what we need.
Thanks,
George
Thanks!
Ehsan
On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:
On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote:
> I had a mechanism for dealing with this in my experimental extension:
>
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head
> My bad - first time I couldn’t get this logic even after looking at the code.
> Is this the approach you’re using now? Any false-positives or something?
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
I just thought of a more generic and possibly better way to do this: phases.
Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases
If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g.
$ hg log -r 'draft() & ancestors(tip)'
Wow, this is cool!
I’ll read it up, thanks.
Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there.
Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)
Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.
> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
> But it’s probably a good opportunity to raise a number of related questions:
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
"It depends" and "ask a license geek" (like Gerv).
> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
If you have comprehensive test coverage it shouldn't be a problem.
> 3. There is a command server thing I was planning to use:
> http://mercurial.selenic.com/wiki/CommandServer
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
> So while it avoid possible licensing issues, it is still of no use to us.
Yes, python-hglib uses the command server.
The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository.
Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements.
Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing.
tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it.
As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.
Thanks,
George
On March 5, 2014 at 12:56:24 AM, Ehsan Akhgari (eh...@mozilla.com) wrote:
Got it.
Nothing changed. The current implementation sticks to what we’ve agreed upon: we only consider changesets that we’ve just pushed.
I’m just looking at other options, because this simple approach may not be the most convenient for our users.
I was referring to Greg’s approach as being Mozilla-specific (unified repo, searching for a shortest path to multiple heads etc.).
Our current approach of only looking at the pushed changesets may be too simple, but yet generic.
Sorry, wrong command.
I’m actually using `hg export --git --rev <rev>` which should do the right thing.
That’s a neat trick, thanks!
Sorry, it wasn’t in the FAQ, it’s here:
http://mercurial.selenic.com/wiki/MercurialApi
"Using this API is a strong indication that you're creating a "derived work" subject to the GPL. Before going any further, read the License page."
Technically we don’t need a squashed diff at all, see below.
RB itself doesn’t need a squashed diff at all.
I believe `rb-repo` project used squashed diffs as a convenience for reviewer: you can either look at each individual diff or you can look at a single squashed diff with all the changes combined.
We already have the first part, now I’m looking for a way to implement the second one, if that’s what we need.
We don't need this squashed view. Since everything is kept in hg, this can be implemented via some hg web view later.
Thanks,
George
I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
None of the workflow I laid out above is Mozilla specific unless I'm missing something. (And it would be fine if it were -- just trying to be clear!)
This mechanism will correctly pick up partial pushes.
> 2. Get the diff of the individual changeset.
> This is simple: hg diff --rev <rev>
I think that is what's causing you to require a working directory, since that is asking Mercurial to generate a diff between <rev> and the working directory. You probably want to specify two revisions there.
As a hint for the future, try running this command in the mercurial repo on the server:
hg update 0This command basically updates to revision 0, which is fancy talk for wiping out the entire working directory. That way, if you end up relying on the working directory accidentally, you'll know very fast!
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.
Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
Agreed, let's focus on correctness first, and then on performance.But it’s probably a good opportunity to raise a number of related questions:
1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?No, we can choose whatever license we like. Our RB changes will probably be MIT. But where is this FAQ BTW?
2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…
Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.
Even that won't work. My suggestion is for now to try to see if there is a workaround for this RB limitation. I mean, I imagine that the sqaushed diff is not used for anything since we don't perform reviews on the squashed RB "review" objects...
Thanks!
Ehsan
On March 4, 2014 at 23:39:03 , Gregory Szorc (gsz...@mozilla.com) wrote:
"You are not authorized to access bug #969515.” :(
Unfortunately I’m not a Mozillian, so if public SSH keys are not considered… err… public :), I’m not sure I’m allowed to request them.
But I guess we already have most of the accounts set up on that box, so this is not a problem.
My bad - first time I couldn’t get this logic even after looking at the code.
Is this the approach you’re using now? Any false-positives or something?
I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
But it’s probably a good opportunity to raise a number of related questions:
1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
3. There is a command server thing I was planning to use:
http://mercurial.selenic.com/wiki/CommandServer
But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
So while it avoid possible licensing issues, it is still of no use to us.
This does look bad, we just tried to cover some possible corner cases.
Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…
Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.
Thanks,
George
--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Fail. It's bug 969515. I had a feeling that if enough people request this data that IT will make it available through some automated means. A problem is they are treating this data as PII, so we can't just expose it to the world.
Yes, it is a tricky question. The mechanism of only examining pushed changesets is somewhat flawed in that it will fail to detect shared ancestor changesets that didn't change between pushes. e.g. if I have a 10 part series consisting of changesets 1..10, if I push 1..10, then make changes to 5, the second push will only contain 5..10. If you rebase, you'll likely repush 1..10. I had a mechanism for dealing with this in my experimental extension: 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default. 2) Find the distance (in terms of changeset count) between the pushed head and the head in question 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head This mechanism will correctly pick up partial pushes.
2. Get the diff of the individual changeset. This is simple: hg diff --rev <rev>
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.
3. Get the squashed diff of all the changesets that user have picked. That’s what I’m struggling with. So let’s say user has just pushed the following changesets: A -> B -> C -> D And he picked B and D for review. We need to squash them somehow and I don’t know how to do this, apart from: 1. revert to A 2. apply B 3. apply D 4. get the diff 5. discard everything and revert to DWhy would we ever perform a review without C applied? That seems... wrong. IMO if we want to skip changesets, they should be at the beginning or end (we'd change the range of applicable commits). Cherry picking changesets just seems like a recipe for disaster.
Good idea, but is this the right bug number?
That bug was about adding a new SSH key, not providing any key dumps.Actually, it would be great to have a place where you can grab all up-to-date public keys.
For now, I’ve added user “gps” with keys from here:
https://github.com/indygreg.keys
In case you need it, you have sudo access on that box.
Yup, that was the plan from day one.
The current UI is just a static HTML talking to that API.
Here’s the draft of the API:
https://github.com/laggyluke/changeset-chooser#apiThere’s no auth yet, because the repo is public anyway.
The current implementation talks to a local repo using the standard Mercurial CLI client.
I guess this is not a strict requirement, but it was easier to get this off the ground like that.
What we need is basically three commands:
1. Get the list of changesets we’re interested in.
Which changesets exactly we’re interested in is a tricky question in and of itself.
For now, we assume we’re interested in everything we’ve just pushed.
That means grabbing everything from the first pushed changeset to the tip:
hg log --rev <first-pushed-changeset>:tip
2. Get the diff of the individual changeset.
This is simple: hg diff --rev <rev>
3. Get the squashed diff of all the changesets that user have picked.
That’s what I’m struggling with.
So let’s say user has just pushed the following changesets:
A -> B -> C -> D
And he picked B and D for review.
We need to squash them somehow and I don’t know how to do this, apart from:
1. revert to A
2. apply B
3. apply D
4. get the diff
5. discard everything and revert to D
And we clearly need exclusive access to the (local?) repository to do this.
I guess we can, but it would probably require the same effort from client: the working copy should not have any local modifications etc.
I had a mechanism for dealing with this in my experimental extension:
1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
2) Find the distance (in terms of changeset count) between the pushed head and the head in question
3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push headPlease see above, we don't need any of this complexity for the purposes of this tool. Which is also why we won't need to deal with bookmarks, or try to sync up with the upstream repo(s).
My bad - first time I couldn’t get this logic even after looking at the code.
Is this the approach you’re using now? Any false-positives or something?George, I thought we already agreed as of last week that the above is not a problem for us. What changed since then?Nothing changed. The current implementation sticks to what we’ve agreed upon: we only consider changesets that we’ve just pushed.
I’m just looking at other options, because this simple approach may not be the most convenient for our users.
I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
None of the workflow I laid out above is Mozilla specific unless I'm missing something. (And it would be fine if it were -- just trying to be clear!)I was referring to Greg’s approach as being Mozilla-specific (unified repo, searching for a shortest path to multiple heads etc.).
Our current approach of only looking at the pushed changesets may be too simple, but yet generic.
This mechanism will correctly pick up partial pushes.
> 2. Get the diff of the individual changeset.
> This is simple: hg diff --rev <rev>I think that is what's causing you to require a working directory, since that is asking Mercurial to generate a diff between <rev> and the working directory. You probably want to specify two revisions there.Sorry, wrong command.
I’m actually using `hg export --git --rev <rev>` which should do the right thing.
You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
Agreed, let's focus on correctness first, and then on performance.But it’s probably a good opportunity to raise a number of related questions:
1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?No, we can choose whatever license we like. Our RB changes will probably be MIT. But where is this FAQ BTW?Sorry, it wasn’t in the FAQ, it’s here:
http://mercurial.selenic.com/wiki/MercurialApi
"Using this API is a strong indication that you're creating a "derived work" subject to the GPL. Before going any further, read the License page."
Let’s say you clone m-c, commit something, pull m-c, commit something again etc.
This way you end up with your own commits mixed up with upstream commits from m-c.
Hm, I’m not even sure how the algorithm that you’ve described above would deal with this.
But I’m not sure how ofter this actually happens either…Right now I really don’t see a clean solution apart from emulating git’s approach by having a “master” bookmark and comparing our pushes against it.
Though I may be wrong as I’m still new to Mercurial.Even that won't work. My suggestion is for now to try to see if there is a workaround for this RB limitation. I mean, I imagine that the sqaushed diff is not used for anything since we don't perform reviews on the squashed RB "review" objects...RB itself doesn’t need a squashed diff at all.
I believe `rb-repo` project used squashed diffs as a convenience for reviewer: you can either look at each individual diff or you can look at a single squashed diff with all the changes combined.
We already have the first part, now I’m looking for a way to implement the second one, if that’s what we need.
On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:
On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote:
> I had a mechanism for dealing with this in my experimental extension:
>
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head
> My bad - first time I couldn’t get this logic even after looking at the code.
> Is this the approach you’re using now? Any false-positives or something?
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
I just thought of a more generic and possibly better way to do this: phases.
Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases
If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g.
$ hg log -r 'draft() & ancestors(tip)'Wow, this is cool!
I’ll read it up, thanks.Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there.Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)
Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.
> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
> But it’s probably a good opportunity to raise a number of related questions:
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
"It depends" and "ask a license geek" (like Gerv).
> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
If you have comprehensive test coverage it shouldn't be a problem.
> 3. There is a command server thing I was planning to use:
> http://mercurial.selenic.com/wiki/CommandServer
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
> So while it avoid possible licensing issues, it is still of no use to us.
Yes, python-hglib uses the command server.
The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository.
Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements.
Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing.
tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it.As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.
On March 5, 2014 at 21:33:16 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 8:46 AM, George Miroshnykov <gmiros...@rebbix.com> wrote:On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:
On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote:
> I had a mechanism for dealing with this in my experimental extension:
>
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head
> My bad - first time I couldn’t get this logic even after looking at the code.
> Is this the approach you’re using now? Any false-positives or something?
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
I just thought of a more generic and possibly better way to do this: phases.
Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases
If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g.
$ hg log -r 'draft() & ancestors(tip)'Wow, this is cool!
I’ll read it up, thanks.Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there.Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)
You're not the first one! :-) Please keep asking questions in the future, and I and others here will be happy to help.Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.Yeah, one of the reasons why I came up with the design which I have proposed is that is really sticks to the absolute minimum, and doesn't make any assumptions on people's workflow. That will give us a flexible solution which will stand robust in the face of people changing their workflows. I'm still convinced that we can make the simple design work well.(FWIW in case I have not mentioned it earlier, that design is based on how gerrit works, any smartness should be attributed to those folks, not myself!)
I was more concerned that the lowest common denominator may be so low that we won’t actually solve the problem.
But again, I’m watching from the sidelines, so if you say that our current approach is good enough, let’s stick to it.
I haven’t worked with gerrit before, I’ll look into it in my spare time.
> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
> But it’s probably a good opportunity to raise a number of related questions:
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
"It depends" and "ask a license geek" (like Gerv).
> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
If you have comprehensive test coverage it shouldn't be a problem.
> 3. There is a command server thing I was planning to use:
> http://mercurial.selenic.com/wiki/CommandServer
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
> So while it avoid possible licensing issues, it is still of no use to us.
Yes, python-hglib uses the command server.
The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository.
Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements.
Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing.
tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it.As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.Sounds good.More about the squashed diff issue, please see what Taras suggested. Basically, if you can, just ignore the squashed view. It's not important as long as it doesn't stand in your way to make further progress.
So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?
You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?
I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.
So the TODO list looks like this:
Thanks,
George
On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:
So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?
Choosing the reviewer. Also, did you get a chance to try out the push to bug workflow as well?
On March 5, 2014 at 21:33:16 , Ehsan Akhgari (eh...@mozilla.com) wrote:
I was more concerned that the lowest common denominator may be so low that we won’t actually solve the problem.
But again, I’m watching from the sidelines, so if you say that our current approach is good enough, let’s stick to it.I haven’t worked with gerrit before, I’ll look into it in my spare time.
So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?
On Wed, Mar 5, 2014 at 8:46 AM, George Miroshnykov <gmiros...@rebbix.com> wrote:
On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:
On Tuesday, March 4, 2014 2:18:01 PM UTC-8, George Miroshnykov wrote:
> I had a mechanism for dealing with this in my experimental extension:
>
> 1) Iterate over all heads for the major landing trees (central, inbound, fx-team, etc). You can use https://hg.stage.mozaws.net/gecko/ and examine all bookmarks matching the pattern */default.
> 2) Find the distance (in terms of changeset count) between the pushed head and the head in question
> 3) Assume the review consists of all the changesets between the landing head with the shortest distance to the push head
> My bad - first time I couldn’t get this logic even after looking at the code.
> Is this the approach you’re using now? Any false-positives or something?
> I was trying to generalise this, so it wouldn’t be Mozilla-specific, but I guess we can’t do that without changing everyone’s workflow.
I just thought of a more generic and possibly better way to do this: phases.
Mercurial keeps track of which changesets have been "published" and makes them a lot harder to accidentally edit. This is called phases. See http://mercurial.selenic.com/wiki/Phases
If the review repository pulled in {central, inbound, fx-team, etc} (via a CRON on the server) and the review repository was marked as non-publishing (it should be), the server could deduce review-relevant changesets by finding all non-public ancestors of a given changeset. e.g.
$ hg log -r 'draft() & ancestors(tip)'Wow, this is cool!
I’ll read it up, thanks.Unfortunately, Git doesn't track which commits have been pushed, so you'll have to resort to your original idea or DAG walking there.Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)
You're not the first one! :-) Please keep asking questions in the future, and I and others here will be happy to help.
Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.
Yeah, one of the reasons why I came up with the design which I have proposed is that is really sticks to the absolute minimum, and doesn't make any assumptions on people's workflow. That will give us a flexible solution which will stand robust in the face of people changing their workflows. I'm still convinced that we can make the simple design work well.
(FWIW in case I have not mentioned it earlier, that design is based on how gerrit works, any smartness should be attributed to those folks, not myself!)
> You should write some Python and talk to the Mercurial API directly (preferred) or use python-hglib to open a persistent process. The reason is performance: opening the repo multiple times for a repo the size of mozilla-central has a *lot* of overhead, especially when you need to resolve manifests.
> Yes, we should definitely look into performance as soon as we have at least somewhat clear vision of usage patterns.
> I didn’t start with Mercurial API because I wanted to bootstrap the whole thing quicker.
> But it’s probably a good opportunity to raise a number of related questions:
> 1. Mercurial’s FAQ mentions that basically any code that uses Mercurial API should also be released under GPL.
> Personally I don’t have a preference, but AFAIK Mozilla uses it’s own GPL-incompatible license - is this a problem for us?
"It depends" and "ask a license geek" (like Gerv).
> 2. Again, as per FAQ, Mercurial API is considered unstable - future versions can change incompatibly.
> I’m new to Mercurial, so I don’t know how big is the risk - should we worry about it?
If you have comprehensive test coverage it shouldn't be a problem.
> 3. There is a command server thing I was planning to use:
> http://mercurial.selenic.com/wiki/CommandServer
> But I’ve just read on that page that there is a “significant performance overhead for launching Mercurial repeatedly”.
> So while it avoid possible licensing issues, it is still of no use to us.
Yes, python-hglib uses the command server.
The command server will open a pipe to a persistent Mercurial process for the lifetime of the command server's client. You incur the overhead once per client open, not per request to the command server. Contrast with running `hg`, which incurs the overhead for every new process. Contrast with talking to Mercurial directly through the API, which incurs the overhead roughly every time you open a repository.
Mercurial native and command server are in the same performance range. Command server is slower because there is string parsing and wire transfer overhead. But command server does allow you to circumvent GPL requirements.
Also, if you want to decrease GPL surface area, you can write a bare bones GPL-licensed Mercurial extension that provides a new command and then use the command server to run that command, bypassing the GPL. The GPL (and people's interpretations of it) can be a silly thing.
tl;dr use the command server. Use python-hglib from PyPI (or some other existing client API) for talking to it.
As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.
Sounds good.More about the squashed diff issue, please see what Taras suggested. Basically, if you can, just ignore the squashed view. It's not important as long as it doesn't stand in your way to make further progress.
Cheers,
Ehsan
On March 5, 2014 at 1:04:30 AM, gregor...@gmail.com (gregor...@gmail.com) wrote:
Wow, this is cool!
I’ll read it up, thanks.
Yeah, but the problem I have with Mercurial right now is that there are so many features and extensions that it’s hard to wrap the head around it :)
Judging from my Git perspective, this leads to workflow fragmentation when different developers may have different (or even incompatible) preferences for their workflow.
I don’t think we can simply “fix” the code review process without introducing some stricter workflow guidelines (e.g. mandatory bookmarks) at Mozilla, but I hope I’m wrong here.
As Ehsan suggested, I’ll keep the internal API as abstract as possible, so we can replace child processes with command server or even some REST API that provides an interface for the repository.
Once we know what we’re doing (feature-wise), we’ll quickly figure out how we’re doing it - that will also mean some real benchmarks.
Thanks,
George
On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:
So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?Choosing the reviewer. Also, did you get a chance to try out the push to bug workflow as well?You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?
I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.
So the TODO list looks like this:
- Allow choosing the reviewer
- Associate review requests with a bug number, if provided
Anything else we should try out?
On March 5, 2014 at 23:01:09 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 3:47 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?Choosing the reviewer. Also, did you get a chance to try out the push to bug workflow as well?You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?
I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.Remember the case of:Bug 555555 Part 1Bug 555555 Part 2Bug 666666and pushing the same branch once to bug 555555 and once to bug 666666, like we discussed before.
Yeah, as we’ve discussed before, the second push won’t actually push anything and won’t trigger the hook, so we can’t do that.
How about this: we can add a separate “bug number” column to the HTML UI and possibly pre-populate it from commit messages.
This way you’ll be able to create review requests associated with different bugs after a single push.
Also, maybe there’s another way we haven’t considered?
So the TODO list looks like this:
- Allow choosing the reviewer
- Associate review requests with a bug number, if provided
Anything else we should try out?What Taras said, but please try to work on the push to bug workflow first!
Got it.
So right now we’re creating one review request per changeset.
Each review request (and, consequently, changeset) will have it’s own attached link.
If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?
On March 5, 2014 at 23:01:09 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 3:47 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:
On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?Choosing the reviewer. Also, did you get a chance to try out the push to bug workflow as well?You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?
I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.Remember the case of:
Bug 555555 Part 1Bug 555555 Part 2Bug 666666and pushing the same branch once to bug 555555 and once to bug 666666, like we discussed before.Yeah, as we’ve discussed before, the second push won’t actually push anything and won’t trigger the hook, so we can’t do that.
How about this: we can add a separate “bug number” column to the HTML UI and possibly pre-populate it from commit messages.
This way you’ll be able to create review requests associated with different bugs after a single push.Also, maybe there’s another way we haven’t considered?
So the TODO list looks like this:
- Allow choosing the reviewer
- Associate review requests with a bug number, if provided
Anything else we should try out?What Taras said, but please try to work on the push to bug workflow first!Got it.
So right now we’re creating one review request per changeset.
Each review request (and, consequently, changeset) will have it’s own attached link.
If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?
On March 5, 2014 at 23:56:03 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 4:45 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:On March 5, 2014 at 23:01:09 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 3:47 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:On March 5, 2014 at 22:21:59 , Ehsan Akhgari (eh...@mozilla.com) wrote:
On Wed, Mar 5, 2014 at 2:58 PM, George Miroshnykov <gmiros...@rebbix.com> wrote:So if we ignore squashed diffs, what is still missing?
The only thing on my mind is assigning the reviewer.
What else should we implement?Choosing the reviewer. Also, did you get a chance to try out the push to bug workflow as well?You mean the "--remotecmd='export BUG_NUMBER=123456; hg’” thing?
I haven’t tried it yet, but I’m sure it will be trivial to implement.
The changegroup hook will simply append the value of our env var to the changeset-chooser URL.
The HTML UI will take the value from the URL and pass it over to the HTTP API call that will assign the corresponding bug number via Review Board API.Remember the case of:Bug 555555 Part 1Bug 555555 Part 2Bug 666666and pushing the same branch once to bug 555555 and once to bug 666666, like we discussed before.Yeah, as we’ve discussed before, the second push won’t actually push anything and won’t trigger the hook, so we can’t do that.
How about this: we can add a separate “bug number” column to the HTML UI and possibly pre-populate it from commit messages.
This way you’ll be able to create review requests associated with different bugs after a single push.Also, maybe there’s another way we haven’t considered?
What we talked about last time was remembering which changesets were pushed as part of which bug, and when you receive an empty push the second time with a different bug number, reassigning the unused changesets from the last time in the chooser UI. Does that make sense?
Let’s make sure I got it right:
So the TODO list looks like this:
- Allow choosing the reviewer
- Associate review requests with a bug number, if provided
Anything else we should try out?What Taras said, but please try to work on the push to bug workflow first!Got it.
So right now we’re creating one review request per changeset.
So we're not getting one of those squashed requests? That sucks, since I wast to be able to have a permalink for the reviews on a bug.
I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.
Any suggestions how to do this, preferably without messing with the working directory?
Each review request (and, consequently, changeset) will have it’s own attached link.That's not great...
The squashed review request may fix this.
If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?Hmm, I think just one, in the interest of less comment spam. Please look into the RB/Bugzilla integration to see what it offers. Also our Bugzilla team at Mozilla has been doing some work here as well, please see https://bugzilla.mozilla.org/show_bug.cgi?id=515210. We might want to sync up with them.
Okay, got it.
Do we have an etherpad or something listing all features and their respective priorities?
- We combine that list with the changesets that he had just pushed
- We show the combined list of changesets in the UI
- When the review request is created, we scratch corresponding changeset off the list
- We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Did I miss anything?
So the TODO list looks like this:
- Allow choosing the reviewer
- Associate review requests with a bug number, if provided
Anything else we should try out?What Taras said, but please try to work on the push to bug workflow first!Got it.
So right now we’re creating one review request per changeset.
So we're not getting one of those squashed requests? That sucks, since I wast to be able to have a permalink for the reviews on a bug.I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.
Any suggestions how to do this, preferably without messing with the working directory?
Each review request (and, consequently, changeset) will have it’s own attached link.That's not great...The squashed review request may fix this.
If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?Hmm, I think just one, in the interest of less comment spam. Please look into the RB/Bugzilla integration to see what it offers. Also our Bugzilla team at Mozilla has been doing some work here as well, please see https://bugzilla.mozilla.org/show_bug.cgi?id=515210. We might want to sync up with them.Okay, got it.
--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
What I meant was that we have to show changesets for both previous and current pushes in case that current push will be empty.
Or should we simply have a logic like “if the current push is empty, show changesets from previous push”?
We combine that list with the changesets that he had just pushed- We show the combined list of changesets in the UI
- When the review request is created, we scratch corresponding changeset off the list
- We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Did I miss anything?Yeah I think something like that. Please also pay attention to the use case of pushing the exact same branch twice in order to submit the same branch into two bugs for review.
I believe we can’t trigger hooks by pushing the exact same branch twice, because there are no outgoing changesets at all, but I’ll have to double-check that.
For now I suggest we assume “push or equivalent command that could trigger the hook” every time we use just “push”.
It can be a command provided by our custom Mercurial extension or something like that.
So the TODO list looks like this:
- Allow choosing the reviewer
- Associate review requests with a bug number, if provided
Anything else we should try out?What Taras said, but please try to work on the push to bug workflow first!Got it.
So right now we’re creating one review request per changeset.
So we're not getting one of those squashed requests? That sucks, since I wast to be able to have a permalink for the reviews on a bug.I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.Which conflicts do you mean?
I mean that if you have a history like this:
A -> B -> C
and you pick only A and C, then try to squash them, it may result in conflict because C may rely on changes from B.
That is what I’m trying to detect / work around.
We had a brief discussion with Taras and while we’re still not sure how to approach this problem, we can skip the squashed diff for now.
Instead, we can generate single parent review request with no diff attached.
We can then attach a link to this review request to Bugzilla bug.
If more review requests are later created for this bug, they will be added as children of the corresponding parent review request.
My description is probably confusing, but this is basically a squashed review request without the actual diff.
If it’s not good enough, we can later figure out how to generate the diff to attach to this parent review request.
Any suggestions how to do this, preferably without messing with the working directory?I'm not clear on what you mean to be honest.
Each review request (and, consequently, changeset) will have it’s own attached link.That's not great...The squashed review request may fix this.
If there are multiple changesets associated with the same bug, should we create one bz comment per changeset or one bz comment per push?Hmm, I think just one, in the interest of less comment spam. Please look into the RB/Bugzilla integration to see what it offers. Also our Bugzilla team at Mozilla has been doing some work here as well, please see https://bugzilla.mozilla.org/show_bug.cgi?id=515210. We might want to sync up with them.Okay, got it.
--
You received this message because you are subscribed to the Google Groups "mozilla-code-review" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mozilla-code-re...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.Thanks and sorry for the slow response. FWIW please ping me on IRC if you think you're spending too much time waiting on me to reply. I'll do my best to respond quickly but I'm very overloaded and sometimes like this I fail to respond in ~24 hours which is unacceptable!
Sure, no worries!
Thanks,
George
On March 6, 2014 at 22:22:50 , Ehsan Akhgari (eh...@mozilla.com) wrote:
Thanks George, this is nice!FWIW the users come from Bugzilla. Please ping either mcote, dkl or glob on #bmo and they can tell you the API you should use.
Yes, eventually they'll come from Bugzilla, but changeset-chooser shouldn’t probably know about it.
We can’t simply take Bugzilla user and assign him as a reviewer inside RB - we need to have a user on RB side.
Right now I’m using the RB API to get the list of RB users that can be assigned as reviewers.
Syncing Bugzilla and RB users should be handled as part of RB <=> Bugzilla integration, so it’s probably out of scope of changeset-chooser.
Also, we should probably come up with a more creative name :)
Hey everyone,
I've added a way to specify a bug number during hg push
.
Unfortunately I couldn't use --remotecmd
for this because it doesn't allow any arguments.
Instead, for now we'll have to do this:
BUG=31337 hg push -e 'ssh -o SendEnv=BUG'
The output looks like this:
pushing to ssh://review.infinity.com.ua//var/hg/playground-central
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: choose changesets to review: http://review.infinity.com.ua:5000/?changeset=b43890fa6f7411c1f3f31bd2c701c1656892d608&bug=31337
You can see both changeset and bug number in the link above.
HTML UI retrieves the corresponding bug summary via Bugzilla REST API and displays it at the top of the page.
Obviously, all new review requests are associated with the given bug number. I've configured everything to point to bmo (just for testing), so we can see that links actually work.
Here's the review request I've created from the above link: http://review.infinity.com.ua/r/11/diff/
Thanks, George
We combine that list with the changesets that he had just pushed- We show the combined list of changesets in the UI
- When the review request is created, we scratch corresponding changeset off the list
- We can optionally provide a “discard” button that scratches off the changeset without creating a review request
Did I miss anything?Yeah I think something like that. Please also pay attention to the use case of pushing the exact same branch twice in order to submit the same branch into two bugs for review.I believe we can’t trigger hooks by pushing the exact same branch twice, because there are no outgoing changesets at all, but I’ll have to double-check that.
For now I suggest we assume “push or equivalent command that could trigger the hook” every time we use just “push”.
It can be a command provided by our custom Mercurial extension or something like that.
So the TODO list looks like this:
- Allow choosing the reviewer
- Associate review requests with a bug number, if provided
Anything else we should try out?What Taras said, but please try to work on the push to bug workflow first!Got it.
So right now we’re creating one review request per changeset.
So we're not getting one of those squashed requests? That sucks, since I wast to be able to have a permalink for the reviews on a bug.I believe that’s why we needed squashed requests in the first place.
But if we allow to pick arbitrary changesets, we need a way to handle conflicts as well.
Which conflicts do you mean?I mean that if you have a history like this:
and you pick only A and C, then try to squash them, it may result in conflict because C may rely on changes from B.
A -> B -> C
That is what I’m trying to detect / work around.
We had a brief discussion with Taras and while we’re still not sure how to approach this problem, we can skip the squashed diff for now.
Instead, we can generate single parent review request with no diff attached.
We can then attach a link to this review request to Bugzilla bug.
If more review requests are later created for this bug, they will be added as children of the corresponding parent review request.
Also, we should probably come up with a more creative name :)
Hey everyone,
I've added a way to specify a bug number during
hg push
. Unfortunately I couldn't use--remotecmd
for this because it doesn't allow any arguments.Instead, for now we'll have to do this:
BUG=31337 hg push -e 'ssh -o SendEnv=BUG'
The output looks like this:
pushing to ssh://review.infinity.com.ua//var/hg/playground-central searching for changes remote: adding changesets remote: adding manifests remote: adding file changes remote: added 1 changesets with 1 changes to 1 files remote: choose changesets to review: http://review.infinity.com.ua:5000/?changeset=b43890fa6f7411c1f3f31bd2c701c1656892d608&bug=31337
You can see both changeset and bug number in the link above.
HTML UI retrieves the corresponding bug summary via Bugzilla REST API and displays it at the top of the page.
Obviously, all new review requests are associated with the given bug number. I've configured everything to point to bmo (just for testing), so we can see that links actually work.
Here's the review request I've created from the above link: http://review.infinity.com.ua/r/11/diff/
I’ve just finished it - you can try it out now.
After you push the “create review request” button, you should get a link to a parent review request.
This review request is associated with a bug number (instead of changeset ID), so it can be linked to Bugzilla.
If you later push more changesets for the same bug number, they should be added to that parent review request, as you would expect.
As we’ve discussed before, second push of the same chanesets won’t trigger the hook, but for now you can simply change the bug number in the URL.
Also, I'd like to spend some time now going through your code. Where should I be looking?
Sure, here it is:
https://github.com/laggyluke/changeset-chooser
It’s not production quality yet, but should be at least somewhat readable :)
The most interesting part is this:
https://github.com/laggyluke/changeset-chooser/blob/master/lib/routes.js
Thanks,
George
As we’ve discussed before, second push of the same chanesets won’t trigger the hook, but for now you can simply change the bug number in the URL.
Also, I'd like to spend some time now going through your code. Where should I be looking?Sure, here it is:
https://github.com/laggyluke/changeset-chooser
It’s not production quality yet, but should be at least somewhat readable :)
The most interesting part is this:
https://github.com/laggyluke/changeset-chooser/blob/master/lib/routes.js
Ah, I forgot that special case again.
Your first push for 123 introduced just a single changeset, so it should have created one review request.
Instead, it created two: a child review request with a diff (15) and a parent review request without a diff (16).
Then you pushed another changeset for 123 and that created another child review request (17) and associated it with 16.
Here we have a mix of technical and logical issues.
Technical issue is that AFAIK Review Board API won’t let us search review requests by bug number:
https://code.google.com/p/reviewboard/issues/detail?id=1398
So I had to abuse the ‘commit’ field of review request like this:
As we’ve discussed before, second push of the same chanesets won’t trigger the hook, but for now you can simply change the bug number in the URL.Do you have a solution for this yet? We definitely need to figure out a way to solve this, otherwise this workflow won't work.
Taking a step back, I see a couple of options here: