[DISCUSS] merge/commit flow for committers

198 views
Skip to first unread message

Jason Plurad

unread,
Jul 12, 2017, 3:09:12 PM7/12/17
to JanusGraph developer list
I came across PR-393 the other day as I was looking through the latest commits.

> A number of commits in 0.1 are missing in master. @srosenthal Thanks for catching.

Many thanks to everybody that helped identify and fix that situation!

However I was left wondering a bunch of things:
1. Where would I find the conversation that triggered the PR?
2. How did we miss merging commits into master?
3. How are we tracking which issues get merged into which branches?
4. Do we have the merge/commit process sufficiently documented?

We have this document on pull requests http://docs.janusgraph.org/latest/pull-requests.html but it doesn't seem to cover anything about how to handle merges for multiple branches. For example, TinkerPop is a bit more clear on how it handles this http://tinkerpop.apache.org/docs/3.2.5/dev/developer/#branches

There's some good discussion in PR-393. If we reached a consensus on what the right strategy is, feel free to chime in here amcp, jerryjch, sjudeng, srosenthal, and others so we can get it documented and publicized more widely.

-- Jason

sjudeng

unread,
Jul 12, 2017, 9:22:51 PM7/12/17
to JanusGraph developer list
Based on our trial with the approach of merging into master and cherry-picking commits into release branches, I'd like to suggest we instead follow the TinkerPop process and merge features into relevant release branches and then merge release branches into master. I'd suggest we make this change after releasing 0.2.0, depending on the separate decision on how branching occurs after that (e.g. if we created a 0.2 branch and bumped master to 0.3.0-SNAPSHOT).

Some of the problems with the current approach have come up in the past weeks, including updates merged into 0.1 but not master (Jason this comment is what started it). There have also been cases where contributors are trying to solve this disconnect on their own by submitting multiple PRs for the same feature, one for the release branch and one for master. This makes it more difficult for us as reviewers and also makes it hard to track all interactions on an issue because comments are spread across multiple PRs.

Irving Duran

unread,
Jul 13, 2017, 1:04:59 PM7/13/17
to sjudeng, JanusGraph developer list
I like this approach. Does anybody know if this will be a problem for Travis-CI to pick up as of which branch the testing will be done?


Thank You,

Irving Duran

--
You received this message because you are subscribed to the Google Groups "JanusGraph developer list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ted Wilmes

unread,
Jul 13, 2017, 5:22:57 PM7/13/17
to JanusGraph developer list, sju...@gmail.com
I think the TinkerPop way is a good way to go. It's straightforward and has worked well on that project.

--Ted

Jerry He

unread,
Jul 13, 2017, 7:54:18 PM7/13/17
to JanusGraph developer list
I recall we had previous discussed this topic, and had a consensus.

Is it not working for us and we need to change direction? 

By looking at the history,  the 3 or so PRs that were missed in master were during the transition time of the previous discussion. Things after that are working as expected. 
But surely we all want to remind ourselves to follow so when we commit.

I see there is no differences in term of the work involved for the different approaches.

We all have to:
1. Decide which version  the change will go into.  
2. After committing to one branch, merge or PR (and resolving conflicts) to the other branches.

The current approach (as agreed upon in the previous thread) is default to master.  Then selectively go to the other branches, which will be done at the same time.
The alternative is to go to the other branches, then merge  (resolving conflicts) to master.  If we do branch merge, then we have the question of how often we do it,  For each commit, or once a while?

Branches will diverge, and can significantly so after a while.  Then there is a question on who is better to do the merge/resolving conflicts.  I think it belongs to the originator of the change, not the committer.  (It could be the same person.)

Also, as we go, the changes that need to go to the other branches should diminish significantly, and we should do it purposefully to reduce our maintenance cost and encourage upgrade.

It does not seem things are broken, or there is an overwhelming advantage of the alternative so that we need to change.


Thanks.

Jerry


Thank You,

Irving Duran

To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.

Jason Plurad

unread,
Jul 14, 2017, 1:40:02 PM7/14/17
to JanusGraph developer list
That's right, Jerry. We had consensus. I wanted to revisit it in case it is not working out for us. I'm not certain it is, but we should have the conversation on the dev list, not buried in various PRs.

My main point is I don't want to worry about fixes missing from the right branches. If it was just that one PR, then maybe I don't need to be so worried.

This could be as simple as the approving committer putting a closing comment saying that it needs to be merged/cherry-picked into the other branch.

-- Jason

sjudeng

unread,
Jul 14, 2017, 9:34:38 PM7/14/17
to JanusGraph developer list
At the time of our previous discussion on this we were blocking merge of all PRs until resolution, so I know for my part it seemed reasonable to try an approach and move forward. Similarly could we try the alternative merge strategy for the next round of development (e.g. 0.2 release branch) and see if it works better for us? For one it might make sense to align with TinkerPop on this since many are members of both communities. I agree it does depend on how active development is on release branches. Soon I expect we'll have a 0.2 release branch (tied to TinkerPop-3.2/Spark-1.6) and master (tied to TinkerPop-3.3/Spark-2.1). I had been thinking we would be maintaining both of these in parallel to a much greater extent than we tried to do with 0.1/master, where 0.1 mostly only got bug fixes. It seems like the proposed merge strategy would scale better when many PRs (with potentially multiple commits) will go to both branches. I do like your point regarding the contributor resolving conflicts in both branches. But it also seems odd to me to have duplicate PRs with the same feature going into different branches. Is this common in other projects?

Jerry He

unread,
Jul 15, 2017, 3:48:06 PM7/15/17
to JanusGraph developer list

Do we allow direct but protected push to the main repo?  If not, PR will be needed to push to each branch in all approaches.

The work is the same.  There is no easy way out.  The steps may be slightly different.  The goal is the same too, to let the change go to appropriate branches and maintain a relatively clean commit history.


Let’s look at two examples, Spark and TinkerPop, both of which use the PR model, from my understanding. Please correct me if wrong.


Spark flow:

1. JIRA

2. Pull request against master (contributor)

3. Merge PR (committer)

4. Cherry-pick the commit to an old branch if needed, direct push, no PR (committer)

    In rare cases and if the conflicts are non-trivial, the committer will request a PR from the contributor against the old branch and then merge.


TinkerPop flow:

1. JIRA

2. Pull request against master or an old branch (contributor)

3. Merge PR (committer)

4. Merge branch from an old branch that has the PR to master, direct push, no PR (committer).


Again, the work is similar. There is no big difference. 

Spark has additional requirements on the PR and commit message format and hides all pure ‘merge commit’. 


Thanks,


Jerry

sjudeng

unread,
Jul 15, 2017, 5:37:42 PM7/15/17
to JanusGraph developer list
Thanks for your summary, it looks looks good and I agree the approaches are very similar. Given this it might just come down to what our preference is as a community. Unlike before we now have time to hopefully come to a stronger consensus. I think we just want to decide either way by the 0.2.0 release.

Few questions about the Spark flow:

1. When cherry-picking what if the commit isn't the most recent, do you need to identify the commit hash (or should you always cherry-pick by hash)?
2. In the case of multiple commits in a PR do you cherry-pick individually or can you cherry pick the merge commit?
3. If a committer wants to check whether master is up-to-date with a release branch how is this done?

With the TinkerPop flow the merge into master (and the check whether master is up-to-date) is always just "git merge release-branch". I like the simplicity, especially for checking whether master is up-to-date. Right now if I want to confirm all commits in 0.1 are in master I try "git log origin/0.1 ^origin/master". But this shows me 29 commits (16 if I add "--no-merges"). Is this expected or did we do something wrong (or is there a better git call for this)? If we followed the TinkerPop flow I could do a "git merge release-branch" and get a satisfying "Already up-to-date" response from git.

sjudeng

unread,
Aug 13, 2017, 5:05:15 PM8/13/17
to JanusGraph developer list
Jerry, Would you be okay trying out the TinkerPop flow for a period or do you prefer the Spark flow? If it's the Spark flow what are your (or anyone elses) thoughts on my previous questions/concerns and/or what are the concerns with the TinkerPop flow?

With the approaching release of TinkerPop 3.3.0 I think we should address this if our intention is to maintain both 0.2 and master branches to follow TinkerPop 3.2.x and 3.3.x, respectively.

Jerry He

unread,
Aug 14, 2017, 1:15:02 AM8/14/17
to JanusGraph developer list
Thanks for bringing up this thread again, and I apologize for not answering earlier.  I was trying to wait for others to chime in, but it eventually escaped me.

Some answers to the earlier questions first.

1. When cherry-picking what if the commit isn't the most recent, do you need to identify the commit hash (or should you always cherry-pick by hash)?
Yes, need the commit hash.

2. In the case of multiple commits in a PR do you cherry-pick individually or can you cherry pick the merge commit?
There is always one commit per PR.

3. If a committer wants to check whether master is up-to-date with a release branch how is this done?
The master is the main branch and always first to consider (unless in special cases where it does not apply).  So there is no worry if it is up-to-date.

I think my questions and concerns are here:
1.  Do we allow command line direct push of commits to the main repo?
2.  In the commit history, the less the 'merge' commits showing up, the better.  For example, for one real commit of a code change, we could have 3 commits showing up in history:
  - commit for the code change
  - 'merge' commit for the PR from the user branch.
  - the branch merge commit.
I think it is truly confusing and pollutes the history.
3.  Also I think the master branching should still be denoted as the main dev branch, even if we have future 0.2 and master branches to follow TP 3.2 and 3.3 respectively.
     There may be more activities on the 0.2 branch, but master is still the main dev branch.

I am very much open to anything that comes up nice and clean.

Thanks,

Jerry

sjudeng

unread,
Aug 16, 2017, 11:55:23 PM8/16/17
to JanusGraph developer list
Jerry, Thank you for continuing to discuss this. It's hard to find consensus when it comes to whether to follow a merge or rebase workflow.

I do think the TinkerPop flow puts less burden on committers for day-to-day reviews and merges because as long as the PR is targeted to the lowest applicable release branch the merge up to master (or other release branches) can happen any time later, potentially by another committer, with equal ease. I think validating the branch target for new PRs would be easy for us to implement and monitor.

With the Spark flow the committer would really need to, immediately after merging into master, create a new branch off the relevant release branch, cherry-pick the commit from master into same, and then create a new PR (possibly repeating the process in the case of multiple release branches). It would be possible to do it later but I think it would be easy to lose track of the commit. For example if a month passed or another committer wanted to see if any commits in master were missing in the release branch all commits would need to be scanned.

It seems like while the Spark flow may result in a cleaner repository in terms of removing merge commits it does require more diligence by the repository maintainer and I think it would be better in an environment where a single person is responsible for managing the repository, probably with the ability to push directly to branches.

Currently we're almost all merging rather than rebasing PRs. As a compromise we could consider instead rebasing PRs. This way we'd remove the merge commit into the release branch, which would balance out the eventual additional merge commit (which might actually include multiple PRs/commits) from the release branch into master.

sjudeng

unread,
Aug 23, 2017, 7:54:49 PM8/23/17
to JanusGraph developers
It looks like we're getting pretty close to the release of TinkerPop 3.3.0, do we have a path forward here? I think Jerry and I have made some good points for and against both of the proposed approaches. Irving and Ted, there has been a lot more discussion since you chimed in have you changed your opinions? Any other committers or contributors have a strong preference between the proposed merge/rebase approaches?


On Wednesday, July 12, 2017 at 2:09:12 PM UTC-5, Jason Plurad wrote:

Robert Dale

unread,
Nov 5, 2017, 10:36:04 AM11/5/17
to JanusGraph developers
I'm for bug fixes committed to 0.2 branch, merge to master.

Robert Dale

unread,
Nov 5, 2017, 10:39:58 AM11/5/17
to JanusGraph developers
BTW, this decision is holding up at least three PRs now.

Jerry He

unread,
Nov 5, 2017, 9:35:35 PM11/5/17
to JanusGraph developers
First of all, I don't think the contributions to the project can be blocked or hijacked by the discussion here.  This is a community project. 

With people from different background and experience, there can be difficulty reaching consensus, which is perfectly ok. 
Also, this is a new project, and It is ok that we don't dictate a fixed policy at the moment and be flexible as we search for a good one.  
If you want to do a branch merge, do it with clear PR and commits and demonstrate the benefit of this flow. 
I am happy to be swayed either way.

Thanks,

Jerry

Jason Plurad

unread,
Nov 14, 2017, 5:31:11 PM11/14/17
to JanusGraph developers
This discussion should not have blocked the incoming pull requests from getting merged, but it appears that has happened.

1. Based on this thread, we have more people in favor of the TinkerPop approach to fixes. Fix goes into maintenance branch then gets merged into master. When a PR comes in, the contributor may have targeted master. If a committer determines that it should go into a maintenance branch, he/she should comment on the PR and request that the contributor opens the request against the specific maintenance branch. This creates some extra churn for the contributor, but it's beneficial for any contributors that plan to work with TinkerPop since the flow would be the same.

Jerry mentioned a few times that he's open to either approach. Unless there are any final objections, it sounds like we have a consensus on changing approaches. Let's get some +1s on this thread to confirm this and then get it written up in the docs.

2. There was a mention on whether Merge should be a merge commit or a rebase. Right now, GitHub exposes both through the UI with merge commit being the default. There wasn't much discussion on this so the merge action is left up to the committer. I don't have a strong opinion on making this more formal, so I'd suggest we leave it out for now.

3. Going back to my initial email on this thread, my main concern was making sure that we appropriately track issues so that we can easily understand where specific fixes land. We don't want to run into the situation again where a fix went into a maintenance branch but did not get merged into master. During the 0.2 release, I used the Milestone to tag the issues that were associated with it. Milestones can also be used on PRs, so if committers use these as they merge the PRs, it'll be easy to keep track things that get put in multiple branches.

-- Jason

Robert Dale

unread,
Nov 16, 2017, 7:36:43 AM11/16/17
to Jason Plurad, JanusGraph developers
1.  +1. I would like to add that maintenance branch 'fixes' should include non-breaking enhancements. E.g. https://github.com/JanusGraph/janusgraph/pull/682  ,  https://github.com/JanusGraph/janusgraph/pull/689  . That is to say we should target the lowest branch possible.  

2. I think 'rebase and merge' is the default when possible. I think 'merge' is selected when there would be issues doing a rebase.  In our case, I think we should always be able to do rebase and merge since we require that the PR is rebased and squashed.

3. I started tagging items with Milestones then I realized I was tagging both PRs and issues.  Should there be a preference for one or the other or both? 


Robert Dale

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/6fdac80a-63a6-49a0-915d-e6b562b61277%40googlegroups.com.

Jason Plurad

unread,
Nov 16, 2017, 12:01:30 PM11/16/17
to JanusGraph developers
1. Thanks for the +1. We had been backporting doc fixes also as an example of "non-breaking enhancements".

2. Rebase merging can be set as the default. We can remove the other 2 options (merge commits, squash merging).

3. Assigning milestones on the issue is most important. Then at least you could scroll through and see the PRs merged against it. Assigning milestones on the PR is perhaps a bit redundant, but I don't see any harm in sticking the milestone on there too. It would give you the ability to filter on it from the GitHub PR search box-- is:pr is:closed milestone:"Release v0.2.1"


On Thursday, November 16, 2017 at 7:36:43 AM UTC-5, Robert Dale wrote:
1.  +1. I would like to add that maintenance branch 'fixes' should include non-breaking enhancements. E.g. https://github.com/JanusGraph/janusgraph/pull/682  ,  https://github.com/JanusGraph/janusgraph/pull/689  . That is to say we should target the lowest branch possible.  

2. I think 'rebase and merge' is the default when possible. I think 'merge' is selected when there would be issues doing a rebase.  In our case, I think we should always be able to do rebase and merge since we require that the PR is rebased and squashed.

3. I started tagging items with Milestones then I realized I was tagging both PRs and issues.  Should there be a preference for one or the other or both? 


Robert Dale

Robert Dale

unread,
Nov 17, 2017, 5:46:58 AM11/17/17
to Jason Plurad, JanusGraph developers
It appears that issues can be assigned to only one Milestone.  So perhaps it makes sense to put the issue on the lowest Milestone and then put PRs on actual Milestones.

Robert Dale

sjudeng

unread,
Nov 18, 2017, 1:44:33 PM11/18/17
to JanusGraph developers
When doing a "rebase and merge" does the original PR author still receive full credit in GitHub contribution statistics?


On Wednesday, July 12, 2017 at 2:09:12 PM UTC-5, Jason Plurad wrote:

Jason Plurad

unread,
Nov 18, 2017, 2:21:58 PM11/18/17
to JanusGraph developers
+1 on setting lowest milestone on the issue, and assigning milestone on individual PRs.

For "rebase and merge", GitHub does retain the original PR author. A couple references:

"Rebases automatically set the committer of the rebased commits to the current user, while keeping authorship information intact." If there are rebase conflicts, a committer should request that the author rebases on their branch, resolve it, and then force push.

Ted Wilmes

unread,
Dec 10, 2017, 4:17:54 PM12/10/17
to JanusGraph developers
A belated +1 from me on point #1 and I think the milestone hygiene makes sense. 

Makes me wonder if we should also require folks to add an entry to the CHANGELOG for every contribution so it gets built out as things are merged. In the past we've pointed users to the milestone, which is good, but it would be nice to start including a more detailed changelog in the release artifact. I'll start that as a separate discussion though to gather opinions.

Since master is protected, after we merge a fix into the maintenance branch, should our process be to also open a PR for that maintenance branch on the main repo to master? Alternatively, I wouldn't be opposed to un-protecting master so that whoever performed the merge could just go ahead and get it into master without additional review.

--Ted
Reply all
Reply to author
Forward
0 new messages