Fwd: Improvements to TensorFlow PR merge workflow

240 views
Skip to first unread message

Edd Wilder-James

unread,
Jun 29, 2018, 9:02:58 PM6/29/18
to devel...@tensorflow.org
Resending as this didn't make it through the first time. 

---------- Forwarded message ---------
From: 'Yifei Feng' via TensorFlow Announcements <anno...@tensorflow.org>
Date: Fri, Jun 29, 2018, 5:01 PM
Subject: Improvements to TensorFlow PR merge workflow
To: <anno...@tensorflow.org>, <devel...@tensorflow.org>


Hi all,

tldr; we are changing our internal GitHub code sync process. Expect a delay in merging pull requests on GitHub tensorflow/tensorflow master branch for the next week or two.

TensorFlow is switching to a new workflow to enable us to update GitHub more frequently, and remove some of the friction from merging PRs. Currently, changes to TensorFlow can be submitted both to our internal repository in Google, and on GitHub. This can cause long delays between when changes are submitted in one source to when they show up at another, due to merge conflicts. Switching to using a single source-of-truth will allow us to automate the code sync process, and push the latest TensorFlow changes to GitHub as soon as they are available.

There will be minimal changes to how you interact with GitHub: this is an internal improvement to help us be more efficient, avoid complex merge conflicts, and publish commits more quickly.

How will this change affect you:
  • In order to switch to the new workflow, we will temporarily stop merging pull requests to the GitHub master branch, starting from EOD 07/03 for about a week.
  • Meanwhile, pull request reviews and CI will not be affected.
  • Once the new workflow is live, we will resume merging pull requests (and catch up with the backlog). 
  • With the new workflow, once your pull request has been approved and passed tests, you will see a `ready to pull` label applied to your change. This means we are working on getting your pull request submitted to our internal repository. After the change has been submitted internally, your pull request will be merged automatically on GitHub.


Cheers,
Yifei


Bhack

unread,
Jun 29, 2018, 10:26:49 PM6/29/18
to TensorFlow Developers
So to summarize:

- The new unique source of truth will be the internal hidden repository.

- PR/WIP will be still visible only for third party contributors as now.

- The approval/"ready to pull" timing will establish who will have the burden of rebasing and resolving conflicts: google internally vs third party contributor in public PR.

Is this right?

clayne.b...@intel.com

unread,
Jul 2, 2018, 12:42:54 PM7/2/18
to TensorFlow Developers
So is the July 12 branch date for 1.10 still on?

Any update on a 1.9 release?

Clayne

Martin Wicke

unread,
Jul 2, 2018, 12:58:47 PM7/2/18
to Bhack, TensorFlow Developers
Instead of regularly merging commits from two repositories, the new process will keep them tightly in sync, so that commits appear in both effectively simultaneously. We cannot use Github's tooling to achieve this, so we will merge using our internal tooling (which also merges the PR on GitHub). 

Google internal commits will show up sooner in this new model (as soon as they're committed), but not before they are committed. 

PRs have to be rebased against master as before. Once a PR is "ready to pull", it does undergo additional testing (we sometimes do this today, but not for every PR). Sometimes those tests will uncover additional problems, and the reviewer may request additional changes. But as today, those changes will generally not be for conflict resolution. 

Martin

--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/developers/a8679756-4443-4953-80ab-2abc7c6be7f7%40tensorflow.org.

Edd Wilder-James

unread,
Jul 2, 2018, 1:15:50 PM7/2/18
to clayne.b...@intel.com, devel...@tensorflow.org
Yes, the July 12 branch date for 1.10 is still current. 

I expect 1.9 RC2 to be released this week. As per the usual practice, if there aren't any blocking bugs found in RC2, then 1.9 final will be released 7 days later.

Bhack

unread,
Jul 2, 2018, 1:21:39 PM7/2/18
to TensorFlow Developers, s.fa...@gmail.com
I meant:

If the only new source of truth will be an internal repository that we cannot see how we can be sure that:

* You will not procastinate to label a ready PR as "ready to pull" just cause your internal code, that we will not see, is conflicting with this potential PR and so you will try to public push your internal change on master before labeling a Public PR to the burden the charge of rebasing to the author of the external contributor.
* That there will be a minimal respect of the time of the arrival of the code (when it is QA ready) cause we only know when WIP started in public PR but not in the internal side (the new source of truth).

I understand the main motivation of your new approach, but I think that there is a regression from the community transparency point of view.

Just my 2c.

Edd Wilder-James

unread,
Jul 2, 2018, 2:15:53 PM7/2/18
to Stefano Fabri, devel...@tensorflow.org
First up, we care a lot about community transparency. When this change was first proposed, I asked in depth to ensure it didn't jeopardize the openness of the project. 

This change is a good one: GitHub will now be much more up to date, and reduce the amount of work that contributors need to do. It also frees up team members from tedious labor, so more cycles are available for working with PRs.

Please do not start from the position of assumed negative motives. There are lots of ways in which we'd like to be better, naturally, and they are easy to find, but often hard to fix. There's a team of people who work really hard to keep TensorFlow a viable open source project, and they have the best intent. 

To speak to your particular points:

On Mon, Jul 2, 2018 at 10:21 AM Bhack <s.fa...@gmail.com> wrote:
If the only new source of truth will be an internal repository that we cannot see how we can be sure that:

* You will not procastinate to label a ready PR as "ready to pull" just cause your internal code, that we will not see, is conflicting with this potential PR and so you will try to public push your internal change on master before labeling a Public PR to the burden the charge of rebasing to the author of the external contributor.

If you suspect we would deliberately behave like this, you might misunderstand our motivation.

Let's start from the assumption that we all want to collaborate constructively, and we all operate with incomplete information about each other. For a community to function, we should seek to be generous and start from a place of assumed good intent.
 
* That there will be a minimal respect of the time of the arrival of the code (when it is QA ready) cause we only know when WIP started in public PR but not in the internal side (the new source of truth).

We expect the average time until a commit is visible to the public to fall. This sync change makes no difference to how things were before, in terms of whether work started as a GitHub PR or elsewhere.

Here's the main change that you'll see. Currently, code is synced over to GitHub periodically, sometimes with a delay of days. Now, it'll happen automatically in response to being committed into our internal source control. So, it's the opposite of what you are worrying about: you'll see more of what's going on.

In my reply to you in our other thread I describe some of the ways in which we're making work-in-progress more obvious, and I also responded there to your desire to make it more visible.

I understand the main motivation of your new approach, but I think that there is a regression from the community transparency point of view.

I think we disagree here: the community will have more insight into the development process, seeing commits as they happen (not only after regular syncs).

This thread has made me realize we can do a better job of describing our workflows and processes: there's a serious and sophisticated investment made by the TensorFlow team in ensuing this is a viable open source project.

I understand your frustrations, and am determined to work with you and every contributor to collectively improve the project, but I must ask you too to not rush to assume negative motives.

thanks!

-- Edd

 

Just my 2c.



Il giorno lunedì 2 luglio 2018 18:58:47 UTC+2, Martin Wicke ha scritto:
Instead of regularly merging commits from two repositories, the new process will keep them tightly in sync, so that commits appear in both effectively simultaneously. We cannot use Github's tooling to achieve this, so we will merge using our internal tooling (which also merges the PR on GitHub). 

Google internal commits will show up sooner in this new model (as soon as they're committed), but not before they are committed. 

PRs have to be rebased against master as before. Once a PR is "ready to pull", it does undergo additional testing (we sometimes do this today, but not for every PR). Sometimes those tests will uncover additional problems, and the reviewer may request additional changes. But as today, those changes will generally not be for conflict resolution. 

Martin

On Fri, Jun 29, 2018, 19:26 Bhack <s.fa...@gmail.com> wrote:
So to summarize:

- The new unique source of truth will be the internal hidden repository.

- PR/WIP will be still visible only for third party contributors as now.

- The approval/"ready to pull" timing will establish who will have the burden of rebasing and resolving conflicts: google internally vs third party contributor in public PR.

Is this right?


--
Edd Wilder-James, Open Source Strategy at Google

Adam Bouhenguel

unread,
Jul 2, 2018, 2:38:53 PM7/2/18
to Edd Wilder-James, Stefano Fabri, devel...@tensorflow.org
I'm glad to see this conversation happening. TensorFlow is an important steward of the ML community.

Hope these are just the first of many conversations that surface the tensions that we as a community are feeling as well as the efforts the TF core team is actively making.


--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.

Bhack

unread,
Jul 2, 2018, 3:52:57 PM7/2/18
to TensorFlow Developers, s.fa...@gmail.com
Hi Edd,
I don't assume negative motives, but some of these observation are caused by the asymmetry of the setup.

It is really hard now to imagine that some strong/valid community member, if not employed by Google, could become a community PRs gatekeeper and flag some PR "ready to pull" in the future.
I think this is true also if we suppose to require a final confirmation by someone from TF google team for the final internal pull.

I clearly don't see any possible path, with this setup, to achieve this.

So the infos are highly unbalanced:

- Community will never have community gatekeeper for PRs
- We all operate with incomplete information but the community work need to be public before having an occasion to be merged. So on our side it is mandatory.

Having both of this two levers in Google hand it will never let the community to scale especially for a project with 100k stars and near 65k forks.

I think that there is a need of a process to let community members to grow up and become PR gatekeeper and to flag PR "ready to pull", so at least we could see what happens between the "ready to pull", Google final PR approval and internal pushes to master.

As PRs will be no more directly merged in github I think that this solution could be a very honest compromise.

For sure you will be faster than now with your new workflow but it will be not enough for the growing popularity of the framework if you don't let the community to scale on "limited" gate-keeping roles.

I really hope to be constructive

Thanks

P.s. It is a bias to assume that the community priorities are the same of Google. So we need to be fair on this topic and let community PR go at ideal speed like internal pushes.

I just want to improve the process that of course is try to migrate from "somewhere" towards a true opensource project with a real community role.

Thanks



Thanks

Martin Wicke

unread,
Jul 2, 2018, 4:39:32 PM7/2/18
to s.fa...@gmail.com, TensorFlow Developers
I think there's a misunderstanding here: 

All community members who have been given write access, whether working for Google or not, are able to approve PRs (in review), and they can apply labels. This will not change: it includes the "ready to pull" label, which will trigger somewhat automated but sadly not fully automatic processes. 

This internal merge process does include some additional tests, and in some cases we may have to refer the PR back to the author when those fail. Our experience to date shows that such failures are rare, and will be dramatically reduced by this change (since it improves synchronization).

We absolutely intend for members of the community to be able to approve PRs, and this change is a further step in that direction. There is no additional review after "ready to pull" has been applied, the reviewer (whether at Google or not) has the final word. This new process does not change that.

Martin

--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.

Bhack

unread,
Jul 2, 2018, 5:27:08 PM7/2/18
to TensorFlow Developers, s.fa...@gmail.com
Martin,
I was clearly referring to the status quo of both the internal change control and full gate-keeping owned by TF (Google).

This is the first time that I hear from TF team that you "intend for members of the community to be able to approve PRs".

As I see this specific point a little bit far probably we could simply find a more near intermediate step and give "Apply labels and milestones"[¹] also to some estimated member of the community.

I hope that after 3 years of opensource history you have not employed all the valid contributors :)

Jason Zaman

unread,
Jul 3, 2018, 5:49:42 AM7/3/18
to Edd Wilder-James, devel...@tensorflow.org
Hey,

I want to start off saying I support fixing the workflow, the current
one sucks and clearly doesnt work. I have some questions or concerns and
I want to make sure the new workflow will end up better. I dont really
understand why the weirdness in the current workflow is happening so
this may not be so clearly explained :(.

I'm also fine that your "main" repo will be internal and hopefully after
this initial hurdle things should be close enough to instant between the
two so the fact that master is internal is mostly irrelevant. Maybe one
day there could be only one public repo or at least the github one would
be the master but one step at a time :). We do a similar thing for
gentoo's main repo, github is a mirror of git.gentoo.org and commits
show up there within seconds.

My problem with the current workflow is keeping track of and finding my
own commits in TF has been really hard, they appear to get applied and
then un-applied and then re-applied then squashed then suddenly there is
one massive commit and files mostly appear to have what they should but
the git history is completely worthless. I have one example below but
I've seen it a few times so it seems like this is some issue with the
workflow and the bots not a one-off mistake.

Also, the /releases page on github has: "This release contains
contributions from many people at Google, as well as:" and then a list
of names. The names appear to not be updated with releases, maybe its
because the bot is doing weird things so its hard to find? Ctrl+f jason
doenst find me at least.

Not sure if I've explained this clearly, but my questions are mostly:
(1) will this weird back and forth thing go away?
(2) will the squashing random unrelated commits together also go away?

Here is one of the weirder examples:
https://github.com/tensorflow/tensorflow/commit/1039ff9ee8c8c7ed09f9bb106131a50285866dd4
When I made the PR it was two separate commits, one of the BUILD file
and one for configure right after it. The commit that ended up in the
final repo is squashed together tho.

If you checkout the github repo and run:
$ git log -p -- tensorflow/BUILD
then scroll down a ways, you'll see that the patch gets added and
removed like 3 times.

https://github.com/tensorflow/tensorflow/commit/e80732c9895d1283af9b98d6277ad1a1015e2e9a#diff-ba5b4c00a780208294e782e644856f1a
commit e80732c9895d1283af9b98d6277ad1a1015e2e9a
Author: Akshay Modi <nares...@google.com>
Date: Tue Jun 19 00:57:19 2018

Merge changes from github.

PiperOrigin-RevId: 201011811

this commit above changes 232 files, 3433 additions, 909 deletions,
which is *really* weird.

https://github.com/tensorflow/tensorflow/commit/148b4381fd0259cae441e459ec8ebe2c5d557722#diff-ba5b4c00a780208294e782e644856f1a
commit 148b4381fd0259cae441e459ec8ebe2c5d557722
Author: Akshay Modi <nares...@google.com>
Date: Tue Jun 19 02:48:36 2018

Automated g4 rollback of changelist 201011811

PiperOrigin-RevId: 201033171

https://github.com/tensorflow/tensorflow/commit/6070ae0e148f50dbc8f36e1654f0a3f53b8b067e#diff-ba5b4c00a780208294e782e644856f1a
commit 6070ae0e148f50dbc8f36e1654f0a3f53b8b067e
Author: Akshay Modi <nares...@google.com>
Date: Tue Jun 19 12:00:34 2018

Merge changes from github.

PiperOrigin-RevId: 201110240

git blame is pretty useless too:
https://github.com/tensorflow/tensorflow/blame/master/configure.py#L1414
That line should blame me. If there is any issue with the line it should
be easy for someone to find out where it came from and if it breaks
something they should be able to ask me, thats not really possible here.

Thanks,
Jason

On Fri, Jun 29, 2018 at 06:02:45PM -0700, 'Edd Wilder-James' via TensorFlow Developers wrote:
> Resending as this didn't make it through the first time.
>
> ---------- Forwarded message ---------
> From: 'Yifei Feng' via TensorFlow Announcements <anno...@tensorflow.org>
> Date: Fri, Jun 29, 2018, 5:01 PM
> Subject: Improvements to TensorFlow PR merge workflow
> To: <anno...@tensorflow.org>, <devel...@tensorflow.org>
>
>
> Hi all,
>
> *tldr;* *we are changing our internal GitHub code sync process. Expect a
> delay in merging pull requests on GitHub tensorflow/tensorflow master
> branch for the next week or two.*
>
> TensorFlow is switching to a new workflow to enable us to update GitHub
> more frequently, and remove some of the friction from merging PRs.
> Currently, changes to TensorFlow can be submitted both to our internal
> repository in Google, and on GitHub. This can cause long delays between
> when changes are submitted in one source to when they show up at another,
> due to merge conflicts. Switching to using a single source-of-truth will
> allow us to automate the code sync process, and push the latest TensorFlow
> changes to GitHub as soon as they are available.
>
> There will be minimal changes to how you interact with GitHub: this is an
> internal improvement to help us be more efficient, avoid complex merge
> conflicts, and publish commits more quickly.
>
> How will this change affect you:
>
> - In order to switch to the new workflow, we will temporarily stop
> merging pull requests to the GitHub master branch, *starting from EOD
> 07/03 for about a week*.
> - Meanwhile, pull request reviews and CI will not be affected.
> - Once the new workflow is live, we will resume merging pull requests
> (and catch up with the backlog).
> - With the new workflow, once your pull request has been approved and
> passed tests, you will see a `ready to pull` label applied to your change.
> This means we are working on getting your pull request submitted to our
> internal repository. After the change has been submitted internally, your
> pull request will be merged automatically on GitHub.
>
>
>
> Cheers,
> Yifei
>
> --
> You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
> Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.
> To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/developers/CALW3V4fta1D%2BP8%3Drz1PcyULKOq4ToZgY7UrX1VBBbVn_6%3DobSw%40mail.gmail.com.

颜发才(Yan Facai)

unread,
Jul 3, 2018, 7:36:23 AM7/3/18
to Martin Wicke, Edd Wilder-James, TensorFlow Developers, Bhack
Hi,
If I understand it correctly, the new workflow is to make Github become the mirror of Google internal repository, right?

It sounds good (at least to me) if only the synchronization could be instant enough. However I have the same question as Bhack about merge conflict:
If the PR of contributor is conflicting with Google internal repository, how to resolve the merge conflict?

Thanks.




To unsubscribe from this group and stop receiving emails from it, send an email to developers+unsubscribe@tensorflow.org.

Yifei Feng

unread,
Jul 3, 2018, 11:00:53 PM7/3/18
to 颜发才(Yan Facai), Jason Zaman, Martin Wicke, Edd Wilder-James, TensorFlow Developers, Bhack
Thanks for the detailed example, Jason. The new workflow is aimed to address the two issues you brought up. With all changes submitted at one place, there will no longer be needs to go back and forth to squash commits and sync the two repositories. Tools like `git blame` will work much better as well.

Facai, your understanding is correct. For conflict resolving, there shouldn't be much difference from today when two PRs conflict with each other. 

Thanks,
Yifei

--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.

颜发才(Yan Facai)

unread,
Jul 4, 2018, 8:10:41 AM7/4/18
to Yifei Feng, Martin Wicke, Edd Wilder-James, TensorFlow Developers
Thanks for your prompt response, Yifei.



>>> For conflict resolving, there shouldn't be much difference from today when two PRs conflict with each other.

I don't think so. Because contributors cannot fetch the maser branch from Google, I suppose we can't do anything but wait.


Would the synchronization be automatic and periodic? If yes, how frequent?

Thanks.





To unsubscribe from this group and stop receiving emails from it, send an email to developers+unsubscribe@tensorflow.org.

--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+unsubscribe@tensorflow.org.

Yifei Feng

unread,
Jul 4, 2018, 12:27:25 PM7/4/18
to 颜发才(Yan Facai), Martin Wicke, Edd Wilder-James, TensorFlow Developers
The synchronization will be automatic and happen as soon as each change is submitted internally. Therefore, the GitHub master branch will be tightly synced with the internal repo.


Thanks,
Yifei

--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.

颜发才(Yan Facai)

unread,
Jul 4, 2018, 6:40:17 PM7/4/18
to Yifei Feng, Martin Wicke, Edd Wilder-James, TensorFlow Developers
Cool, many thanks! 

Bhack

unread,
Jul 10, 2018, 2:52:06 PM7/10/18
to TensorFlow Developers, yif...@google.com, wi...@google.com, e...@google.com
The new workflow seems in place.

Yifei Feng

unread,
Jul 10, 2018, 10:26:19 PM7/10/18
to Bhack, TensorFlow Developers, wi...@google.com, e...@google.com
Yes we have started the new workflow and resumed PR merging. Thanks for all your patience!

Bhack

unread,
Jul 12, 2018, 6:05:20 PM7/12/18
to TensorFlow Developers, s.fa...@gmail.com, wi...@google.com, e...@google.com
Why some commits[1] are still authored and pushed by tensorflower-gardener?

If you are interested there are some PR related stats[2].


Martin Wicke

unread,
Jul 12, 2018, 6:18:40 PM7/12/18
to Bhack, TensorFlow Developers, Edd Wilder-James
The commits authored by tensorflower-gardener are from anonymous people at Google who may not have a github account.

Robin Richtsfeld

unread,
Jul 13, 2018, 10:31:56 AM7/13/18
to TensorFlow Developers, s.fa...@gmail.com, e...@google.com

The commits authored by tensorflower-gardener are from anonymous people at Google who may not have a github account.

It doesn't matter if there's a GitHub account is associated with the commit authors email.
Git just asks for name (and email). In theory, any Googler should have both.

Gunhan Gulsoy

unread,
Jul 14, 2018, 12:38:09 AM7/14/18
to robin.ri...@gmail.com, devel...@tensorflow.org, s.fa...@gmail.com, Edd Wilder-James
Many people, for various reasons, may not want their emails or names listed publicly.
We just respect their choices.

--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.

faca...@gmail.com

unread,
Aug 15, 2018, 1:16:28 AM8/15/18
to TensorFlow Developers, robin.ri...@gmail.com, s.fa...@gmail.com, e...@google.com
In new workflow, PR seems to be merged (rather than squashed) into master branch in Google internal repository, right?

I think it is better to use squash and merge to create a more streamlined Git history.
Take Apache Spark as an example, If I remember correctly,  its CI robot always squash and merge new commits, and then close the corresponding PR (don't merge it again).
Does the workflow make sense for TensorFlow?

Gunhan Gulsoy

unread,
Aug 15, 2018, 1:50:28 AM8/15/18
to faca...@gmail.com, devel...@tensorflow.org, Robin Richtsfeld, Stefano Fabri, Edd Wilder-James
I think we squash every PR into a single commit, then mark the PR as merged.
It is just like clicking "Squash an Merge" on github.

Jason Zaman

unread,
Aug 15, 2018, 1:57:35 AM8/15/18
to faca...@gmail.com, TensorFlow Developers, robin.ri...@gmail.com, s.fa...@gmail.com, e...@google.com
I disagree, squashing is bad. I do like a clean history but that is
accomplished with rebasing which is unrelated to squashing. In the main
Gentoo tree we rebase commits on top of master then commit but all
commits should be separate not squashed into one. Squashing makes it
very hard to find things later and was one of my main issues with the
old workflow. If someone pushes many commits that are all messy they
should be asked to fix it and squash and rebase into a set of nice
atomic commits to make things nice on their own. Forcibly squashing
everything is bad.

-- Jason
> >> email to developers+...@tensorflow.org <javascript:>.
> >> <https://groups.google.com/a/tensorflow.org/d/msgid/developers/40b57af4-2fd7-4763-8a99-0479c3709914%40tensorflow.org?utm_medium=email&utm_source=footer>
> >> .
> >>
> >
>
> --
> You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
> Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.
> To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/developers/4a87c8cd-e03d-4887-b6cb-1133adc7e100%40tensorflow.org.

Gunhan Gulsoy

unread,
Aug 15, 2018, 2:13:28 AM8/15/18
to ja...@perfinion.com, Yifei Feng, faca...@gmail.com, devel...@tensorflow.org, Robin Richtsfeld, Stefano Fabri, Edd Wilder-James
I will let Yifei clarify, but what we do is, each pull request becomes a single commit.
We think this is actually cleaner, as we do not need to see single commits that are responses to reviewer comments.
In a few larger PRs I worked on, while as a whole pull request the change is meaningful, if we were to split the commits we would have very very broken points in our commit history.
Also, if we detect later that some PRs are causing problems, rolling back single commits is easier.

Jason Zaman

unread,
Aug 15, 2018, 2:31:50 AM8/15/18
to Gunhan Gulsoy, Yifei Feng, faca...@gmail.com, devel...@tensorflow.org, Robin Richtsfeld, Stefano Fabri, Edd Wilder-James
Yes, I agree if a PR with a small fix is opened, then a review says
something should be changed, the original commit should be amended (ie
git commit --amend). For some of my bigger PRs tho, the changes are
quite involved so I split them into several commits in a row so its
easier to follow changes. Those kinds of PRs should not be squashed
since sometimes reverting part of the series is needed and following in
general is more harder.

Squashing and/or commit --amend'ing is definitely important and commits
like "Fix bug in foo", "typo", "fix typo again" should definitely be
squashed. I only object to squashing bigger series the same way. I think
the way its done now is pretty good. Squashing a little more the "fix
typo" kinds of fixups would be good, I just dont want *everything*
squashed.

I think the earlier issue about clean history is more about merging vs
rebasing tho. The problem with merging is if you run 'git log
--oneline --graph' it gets a bit verbose. git rebase instead of git
merge cleans that up tho.

-- Jason
> > https://groups.google.com/a/tensorflow.org/d/msgid/developers/20180815055731.GB18057%40baraddur.perfinion.com
> > .
> >

颜发才(Yan Facai)

unread,
Aug 15, 2018, 6:13:48 AM8/15/18
to gu...@google.com, TensorFlow Developers, robin.ri...@gmail.com, Bhack, Edd Wilder-James
Hi, gunan.
I'm afraid not.
Take [PR 21428](https://github.com/tensorflow/tensorflow/pull/21428) as an example,  I think its [final commit 08719024](https://github.com/tensorflow/tensorflow/commit/087190246b95dc4c188f630ca90880a12e39b557)  is a "merge commit". Correct me if I'm wrong.

commit 087190246b95dc4c188f630ca90880a12e39b557
Merge: 1a22b0b982 a170aef5b6
Author: TensorFlower Gardener <gard...@tensorflow.org>
Date:   Sun Aug 12 16:10:42 2018 -0700

    Merge pull request #21428 from facaiy:BUG/clip_by_global_norm_with_inf

    PiperOrigin-RevId: 208412584


Yifei Feng

unread,
Aug 15, 2018, 2:00:32 PM8/15/18
to 颜发才(Yan Facai), gu...@google.com, TensorFlow Developers, robin.ri...@gmail.com, Bhack, Edd Wilder-James
We currently push a commit that merges the PR to master. In addition to the reason Jason mentioned, we also have seen large PRs with commits from more than a single authors, in which case we would like to credit the commits to the corresponding authors. 

Yong Tang

unread,
Aug 16, 2018, 10:12:20 AM8/16/18
to TensorFlow Developers, faca...@gmail.com, gu...@google.com, robin.ri...@gmail.com, s.fa...@gmail.com, e...@google.com
In addition to `git commit --amend -s` mentioned by Jason, `git rebase -i HEAD~3` could also be used to edit the past 3 git commit history (3 or other number) so that you have a chance to squash the commits.

In the docker project we maintain, we typically ask the contributors to squash the commits of the PR at the end, after two LGTMs have been given and before the merge button is pushed. Though we also could merge multiple commits in one PR if each commit consists of a complete story, or commits are done by different contributors.

I think the reviewers could comment to remind the PR authors to squash the commits if the history is too verbose.

颜发才(Yan Facai)

unread,
Aug 16, 2018, 7:03:04 PM8/16/18
to yong...@hotmail.com, TensorFlow Developers, Gunhan Gulsoy, Robin Richtsfeld, Bhack, Edd Wilder-James
Because `git rebase` will discard old commits which might have been linked to comments of reviewer, I think it will make a mess of review process in Github.
Reviewers could always ask contributors to squash the commits of the PR at the end, as suggested by Yong Tang. However I'm afraid that the solution don't solve the reviewing problem as mentioned above, and it also become a burden on contributors.

So far as I know, 'Squash and merge' option might be the best solution provided by Github. The work should be done by robots instead of contributors.


Martin Wicke

unread,
Aug 16, 2018, 9:20:00 PM8/16/18
to 颜发才(Yan Facai), yong...@hotmail.com, TensorFlow Developers, Gunhan Gulsoy, Robin Richtsfeld, Bhack, Edd Wilder-James
Sadly that option isn't available to us (at least not yet) in our merge tooling. It's strictly merging for now. 

Having done the "ask contributors to squash" before that was an option on GitHub, I'd say that pretty much doesn't work. Most people doesn't have the mastery of git that that sadly requires.

颜发才(Yan Facai)

unread,
Aug 16, 2018, 10:21:40 PM8/16/18
to Martin Wicke, yong...@hotmail.com, TensorFlow Developers, Gunhan Gulsoy, Robin Richtsfeld, Bhack, Edd Wilder-James
As for merge tooling, `git merge --squash` has supported "squash and merge" strategy, unfortunately, it also forces maintainer to recreate a new commit (might loss the information of contributor).

faca...@gmail.com

unread,
Sep 18, 2018, 2:28:08 AM9/18/18
to TensorFlow Developers
Hi, I think that [contributing guidelines](https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md#contributing-code) is outdated about merge workflow, could we update it? New contributor might be confused about what's the role of read-to-pull label.

Yifei Feng

unread,
Sep 18, 2018, 3:21:04 AM9/18/18
to faca...@gmail.com, TensorFlow Developers
Thanks for pointing it out, and absolutely! https://github.com/tensorflow/tensorflow/pull/22340 sent to update the guide.

Reply all
Reply to author
Forward
0 new messages