What's the state of the incompatible changes policy?


Lukács T. Berki

Aug 21, 2018, 8:13:24 AM
to bazel-dev, Marcel Hlopko, Laurent Le Brun, Rosica Dejanovska, Dmitry Babkin, Dmitry Lomov
Hey there,

I got word of the rollback of Dmitry B's //tools/defaults change which made me aware that there isn't a lot of consensus on the incompatible changes policy.

It turns out we have had the incompatible changes policy for Skylark for a long time, but we don't officially promise the same for Bazel in general, which is trouble. In particular:
  1. Our users are unhappy because we break their code.
  2. We are unhappy because some changes are rolled back because they break things of various kinds.
  3. We are unhappy because some people hold themselves to this policy and some people don't, which results in the "privatizing gains, socializing losses" antipattern.
  4. The rollbacks are not consistent; sometimes forward fixes are okay, sometimes not, depending on who complains more loudly.
  5. There is a fear that our velocity will take a hit. In particular, it can take three months until you can remove code, even though no one actually depends on it (the best case is two months):
    1. You submit the code on Jan 2. Unfortunately, the release was cut on Jan 1.
    2. Next release is cut on Feb 2. The release is published on March 1, which is when your flag reaches our users.
    3. The April 1 release will default to the new value of the flag
    4. You remove code on April 2
So, what's the plan? I'd like at least for the changes currently in the pipeline not to be held to this standard (since they were started without knowing about it, and while some of them have flags, those don't start with --incompatible_*), and I maintain that waiting three months to be able to remove a code path will crater our velocity.


--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Marcel Hlopko

Aug 21, 2018, 8:21:49 AM
to Lukács T. Berki, bazel-dev, Laurent Le Brun, Rosica Dejanovska, Dmitry Babkin, Dmitry Lomov
I don't know where to share this, so I'll share it here, fyi.

m3b@ writes:

Here's what I've been able to estimate. I apologise that the data are so soft.

Summary:

I estimate the cost per developer of a bazel incompatibility to be between
5 minutes (in an almost perfect case) and maybe more than an hour,
depending on how confusing the resulting error messages are
and how familiar the developer is with the bazel upgrade process.

I estimate the number of developers affected by such a change would
be measured in thousands, for tensorflow alone.  It might be
higher---the data I have are annoyingly hard to interpret.

So I'd guess the wasted developer time from an incompatible bazel change might
vary anywhere from a man month to several man years.
Alas, it's hard to be more precise with the data I currently have.
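The summary above can be sanity-checked against the figures given later in this message (over 13 thousand "unique cloners", 5 minutes to an hour each). A back-of-the-envelope sketch in Python; the 160-working-hours-per-person-month conversion is my own assumption, not m3b@'s:

```python
# Back-of-the-envelope check of the estimate above, using figures from this
# message: ~13,000 affected developers, 5-60 minutes each. The
# 160-working-hours-per-person-month conversion is an assumption.
HOURS_PER_PERSON_MONTH = 160

def wasted_person_months(developers, minutes_each):
    """Total wasted effort in person-months."""
    return developers * minutes_each / 60 / HOURS_PER_PERSON_MONTH

low = wasted_person_months(13_000, 5)        # almost-perfect case
high = wasted_person_months(13_000, 60)      # confusing-error-message case

print(f"low:  {low:.1f} person-months")      # ~6.8 person-months
print(f"high: {high / 12:.1f} person-years") # ~6.8 person-years
```

With these inputs the range comes out at roughly seven person-months to seven person-years, broadly consistent with the "man month to several man years" span above once the uncertainty in the developer count is factored in.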


Because the cost is spread thinly over a large number of developers,
there will be few strident complaints, if any.
Rather, the reputation of Google, tensorflow
and bazel will take a small hit in the minds of a significant number of people,
as they grumble to one another privately about the software's stability.
I suspect that cost in reputation makes it worthwhile to spend some time
on ensuring that incompatible changes happen only rarely,
even if bazel is officially still only "beta" software.

  ------------

Details on which I base my estimates:

A few weeks ago, we had to upgrade bazel on every
machine that builds new releases of tensorflow.
That was due to a change to allow the bazel
build to work on Windows.  It somehow affected Linux and
MacOS builds too, and it required a more recent bazel release than most of us
were using.

The error message was not as obvious as one might have liked:
     ERROR: .../tensorflow/third_party/gpus/cuda_configure.bzl:117:1:
file '@bazel_tools//tools/cpp:windows_cc_configure.bzl' does not
contain symbol 'setup_vc_env_vars'
     ERROR: error loading package '': Extension file
'third_party/gpus/cuda_configure.bzl' has errors
it certainly confused me.  I think I flailed for tens of minutes looking at
change histories, wondering what I'd done wrong.  (Remember that bazel is often
just one part of a complex process, and that build procedures often
update source to the latest version without warning.  I was invoking bazel
via something like that.)
Eventually, I got lucky: I mentioned my frustration to someone,
who said "Try upgrading bazel."
Then I had to look up the upgrade process, follow it, and then test
that it had worked.
That probably took a further 5-10 minutes because I hadn't done it in a while.

Notice that that's not all, because I'm not yet finished with the upgrades.
I can think of at least four more machines I must deal with.
I just timed myself performing the upgrade on one of them.
It took me 3 minutes, even having had the experience of having done it recently,
and without having to spend any time on diagnosis, and without counting the cost
of having at least one previous invocation fail.


To estimate the number of affected developers, I looked at this page:
    https://github.com/tensorflow/tensorflow/graphs/traffic
It says that there were over 13 thousand "unique cloners" in the
last couple of weeks.
(The number of "clones" was more than 3 times higher.)
I am not sure how to map "unique cloners"
in 2 weeks to people who will upgrade tensorflow, and therefore have
needed to upgrade bazel.
But given that bazel is the primary build mechanism, it would not surprise
me if thousands of developers were involved (or will be eventually).
--
Marcel Hlopko | Software Engineer | hlo...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Laurent Le Brun

Aug 21, 2018, 8:59:33 AM
to Marcel Hlopko, Lukács T. Berki, bazel-dev, Rosica Dejanovska, Dmitry Babkin, Dmitry Lomov
For Starlark language/API, we follow this process:
https://docs.bazel.build/versions/master/skylark/backward-compatibility.html
and we try to go slowly.
- We first update the documentation to mark something as deprecated.
- When a flag is introduced, we wait multiple releases before we switch it.
- We announce changes in the release announcement.
- We wait again multiple releases before removing the flag.
- We try to provide a good error message. We often keep special-cases
in the code just to have a good error message (it generally mentions
the incompatible flag to switch).
- We try to do this, even when we just fix a bug (e.g. we rolled back
the fix for https://github.com/bazelbuild/bazel/issues/5709 because it
could possibly break users, even if we haven't found any code that
would break).

In my opinion, this is not enough and we should try to do more.
- We are working on better tooling
(https://github.com/bazelbuild/buildtools/issues/341). We should
provide a tool to automatically fix user code, at least for the simple
cases.
- I would like to give users more visibility about future breaking
changes. I would like to set a version number to each incompatible
flag, just to say: "This flag will switch to True with Bazel 0.19".
Once we have that, users may test their code with
--all_incompatible_changes=0.19 and know what will break with Bazel
0.19. This is useful because we sometimes introduce flags long in
advance, and users are not required to update their code right now.
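Laurent's versioned-flag idea could be sketched like this (plain Python; the flag names, the version table, and the very existence of a versioned --all_incompatible_changes are all hypothetical here, not current Bazel behavior):

```python
# Hypothetical sketch of the proposal: each incompatible flag is annotated
# with the Bazel version at which it will flip to True, and
# --all_incompatible_changes=0.19 enables every flag flipping by 0.19.
# Flag names and versions are invented for illustration.

FLIP_VERSIONS = {
    "--incompatible_foo": (0, 18),
    "--incompatible_bar": (0, 19),
    "--incompatible_baz": (0, 21),
}

def parse_version(text):
    """'0.19' -> (0, 19), so versions compare correctly as tuples."""
    return tuple(int(part) for part in text.split("."))

def flags_flipping_by(version_text):
    """Flags a user should test with to preview the given release."""
    target = parse_version(version_text)
    return sorted(f for f, v in FLIP_VERSIONS.items() if v <= target)

print(flags_flipping_by("0.19"))
# ['--incompatible_bar', '--incompatible_foo']
```

The tuple comparison matters: "0.19" must compare as a version, not as a string, so that 0.9 does not sort after 0.19.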

I understand that this workflow might not work well in all areas of
Bazel. At the minimum, we can aim for:
- Each breaking change should be discussed in a design document. We
can discuss the rollout plan on a case-by-case basis.
- A change in Bazel should never break a project tested on Bazel CI.

We won't be able to release Bazel 1.0 if we're not able to
guarantee stability to users.
--
Laurent

Lukács T. Berki

Aug 21, 2018, 10:04:14 AM
to Laurent Le Brun, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Babkin, Dmitry Lomov
I think it's the classical stability/velocity trade-off with the added twist that we can't know how disruptive a particular change is because we don't have a global view of the world (unlike at Google).

 
> - Each breaking change should be discussed in a design document. We
> can discuss the rollout plan on a case-by-case basis.
...or maybe in an e-mail thread for simplicity?
 
> - A change in Bazel should never break a project tested on Bazel CI.
+1
 

> We won't be able to release Bazel 1.0 if we're not able to
> guarantee stability to users.
What is the plan for when we get there? I suppose the trade-off between stability and velocity will then shift towards stability?

Lukács T. Berki

Aug 21, 2018, 10:07:40 AM
to Marcel Hlopko, bazel-dev, Laurent Le Brun, Rosica Dejanovska, Dmitry Babkin, Dmitry Lomov

At Google, we solve this by asking the author of the incompatible change to update all code. That's very unpleasant (but doable) at Google, but impossible with Bazel because we can't access all the source code that's compiled with Bazel.

We also have the advantage of having one single HEAD, and therefore of not having to worry about updating the version of a dependent library only so that it works with Bazel (and pulling in other, unrelated, possibly breaking changes with it).

Of course, we can always keep incompatible flags alive for a long time, but the more flags we have, the more difficult it is to maintain Bazel, which is (again) a velocity hit.

Dmitry Babkin

Aug 22, 2018, 2:14:06 PM
to Laurent Le Brun, Marcel Hlopko, Lukács T. Berki, baze...@googlegroups.com, Rosica Dejanovska, Dmitry Lomov
> - We wait again multiple releases before removing the flag.
I think it's better to make the commitment in terms of time instead of a number of releases.
For instance: after the 1st of November this flag will be removed.

Dmitry Babkin

Aug 22, 2018, 2:16:59 PM
to Lukács T. Berki, Laurent Le Brun, Marcel Hlopko, baze...@googlegroups.com, Rosica Dejanovska, Dmitry Lomov
> - A change in Bazel should never break a project tested on Bazel CI.
We do not have access to the code tested by Bazel CI, do we?
How can we make the owners of this code do the changes?

Laurent Le Brun

Aug 22, 2018, 2:32:14 PM
to Dmitry Babkin, Lukács T. Berki, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
You have read access to the projects tested by the CI (most of them
are on GitHub, I think).

I think it should go both ways:
- Changes in Bazel are not allowed to break projects tested (without
flags) on the CI.
- If a project cannot build with "bazel build
--all_incompatible_changes", the owners have to fix it quickly (or
report why it's difficult or not feasible).

So you first add your incompatible flag. Then the owners will fix
their code. Then you switch the flag (it shouldn't break anything).

If a project is abandoned or the owners are unresponsive, we may drop
them from the CI. If a fix is not trivial, we should give owners more
time.

The goal for the Bazel team is to minimize the amount of work they ask
of their users. It's beta and there are a lot of legacy issues to fix,
so some amount of breaking change makes sense. But really, users are
not going to be happy if they are required to update their code every
month.

Laurent Le Brun

Aug 22, 2018, 2:34:19 PM
to Dmitry Babkin, Marcel Hlopko, Lukács T. Berki, bazel-dev, Rosica Dejanovska, Dmitry Lomov
> I think better to make commitment in terms of time instead of number of releases.

Releases are time-based, so it's kind of the same. The date you submit
your change is not meaningful for the users - they will care the day
their Bazel binary contains your change.
--
Laurent

Tony Aiuto

Aug 22, 2018, 3:34:22 PM
to Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, Lukács T. Berki, baze...@googlegroups.com, ros...@google.com, Dmitry Lomov
On Wed, Aug 22, 2018 at 2:34 PM 'Laurent Le Brun' via bazel-dev <baze...@googlegroups.com> wrote:
> I think better to make commitment in terms of time instead of number of releases.

Releases are time-based, so it's kind of the same. The date you submit
your change is not meaningful for the users - they will care the day
their Bazel binary contains your change.

On Wed, 22 Aug 2018 at 20:14, Dmitry Babkin <dba...@google.com> wrote:
>
> - We wait again multiple releases before removing the flag.
> I think better to make commitment in terms of time instead of number of releases.
> For instance: after 1st of November this flag will be removed.

Date-based commitments are going to be very hard to manage unless developers can perfectly
align the date of commit with the date of release. Better to do it with versions:
  • 0.17.0 is released
  • developer works on a breaking change
  • submits new flag which is off by default with code to do it.
  • deprecation warning says "this flag will flip in 0.20"
  • time passes
  • 0.18.0 gets cut with change but flag keeps it off.
  • time passes
  • 0.19  gets released
  • The day after 0.19 gets released, developer flips the flag and commits that
Now we are not coupled to specific dates and the developer easily knows that they
are providing the requisite number of cycles.

Note that I did say "deprecation warning". I strongly believe that we can and should
provide console spewed deprecation warnings for any change requiring a migration.
This is especially true as the community gets larger and useful rule sets produced
outside of the Bazel core team gain traction. I never want to see this scenario:
  • Bazel 1.n ships with a flagged incompatible change that will go live in 1.n+2.
  • This change breaks rules_foo, where rules_foo has widespread use, but is
    not created by the Bazel team.
  • The rules_foo team notices the change in 1.n and updates their rules.
  • A casual Bazel user happens to use rules_foo. They update Bazel periodically
    but forget to update rules_foo. If there is no deprecation warning, this situation
    can last for a while.
  • A few Bazel releases happen, rules_foo now breaks the user, who had no
    prior warning. In their minds, this is Bazel's fault.
With a warning, the user may at least see that something is going to change.
If done right, the warning message mentions the rule which invoked the deprecated
usage, so the user knows it is rules_foo. Then they have a place to look for more
information. In this scenario, both Bazel and rules_foo have done the right thing
for the user.

 
>
> On Tue, Aug 21, 2018 at 2:59 PM Laurent Le Brun <laur...@google.com> wrote:
>>
>> In my opinion, this is not enough and we should try to do more.
>>  - We are working on better tooling
>> (https://github.com/bazelbuild/buildtools/issues/341). We should
>> provide a tool to automatically fix user code, at least for the simple
>> cases.

+1. For breaking changes, we should always provide as much automation
scripting as possible for users to automatically update their code base.
This goes beyond linting as in issue 341. We would need to provide scripts
for specific method and attribute renames.

 
--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To post to this group, send email to baze...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CAFtxR%2BksMsNXicTr0j24fq3q%2BRj59cqa3eZair_gP%3Dc0DLHnjg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jon Brandvein

Aug 22, 2018, 4:50:25 PM
to bazel-dev
(Emphasis mine)

> I strongly believe that we can and should provide console spewed deprecation warnings for any change requiring a migration.

We (the Starlark team and maybe others) have strong opinions against polluting the console with arbitrary warnings, including deprecation warnings. The rationale is that these warnings are not actionable by the majority of users who see them, and, taken together, they drown out the useful information that a user is actually interested in. Instead of "pushing" warnings to users, you can get similar value by allowing users to "pull" warnings by running Bazel with a verbose flag, running a linter, etc. That is why we have --all_incompatible_changes instead of a laundry list of warnings.

For the scenario you highlight, where a user is broken by upgrading Bazel without upgrading their dependencies, this is accommodated by having a release cycle where the flag defaults to true before you remove the flag entirely. Then the user has an escape hatch. Most of our changes are straightforward enough that we can point to the relevant flag in the error message.

Tony Aiuto

Aug 22, 2018, 4:57:04 PM
to Jon Brandvein, bazel-dev
I am well aware I am in the minority on this. That means I have to present really good reasons why my position is the right one. :-)

Lukács T. Berki

Aug 23, 2018, 9:58:15 AM
to Tony Aiuto, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
@Tony: What's your opinion on the fact that removing old functionality would then take *at least* two months even if it's actually unused?


Jon Brandvein

Aug 23, 2018, 10:27:03 AM
to Lukács T. Berki, Tony Aiuto, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, baze...@googlegroups.com, Rosica Dejanovska, Dmitry Lomov
FWIW, I suspect a lot of the pain of slower velocity will be felt much more in work on rule logic, as compared with work on Starlark or core logic. (The differing views within the team seem to at least partially fall along those boundaries.) Maybe that's an argument for applying radical stability to Bazel itself before its native / "blessed" rule sets. But that doesn't solve the underlying problem, that we're still breaking users every release.


Tony Aiuto

Aug 23, 2018, 2:55:58 PM
to Jon Brandvein, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Thu, Aug 23, 2018 at 10:27 AM Jon Brandvein <bran...@google.com> wrote:
> FWIW, I suspect a lot of the pain of slower velocity will be felt much more in work on rule logic, as compared with work on Starlark or core logic. (The differing views within the team seem to at least partially fall along those boundaries.)

Yes. I feel the same. There is more ability to get "creative" in the rules than in Bazel itself. When rules depend on rules you get N^2 creativity.

> Maybe that's an argument for applying radical stability to Bazel itself before its native / "blessed" rule sets. But that doesn't solve the underlying problem, that we're still breaking users every release.

When I put on my "user" hat, I see no distinction between Bazel's native and blessed rule sets. It is all one product - especially if the same people answer my questions on StackOverflow  no matter what part of Bazel, Starlark, or Rules I ask about. As contributors it is easier for us to see the distinction. Breaking casual users is what I care most about avoiding. Contributors know they are in for a bumpier ride.
 

On Thu, Aug 23, 2018 at 9:58 AM 'Lukács T. Berki' via bazel-dev <baze...@googlegroups.com> wrote:
> @Tony: What's your opinion on the fact that removing old functionality would then take *at least* two months even if it's actually unused?

"Even if it is actually unused" is the part that I think is hard to quantify. Within any one organization (e.g. Google) you can tell whether you use some bit of obsolete functionality. But it is, for the most part, impossible to know if it is used by some other organization that you (a Bazel contributor) do not work for. So, I think the question is how we trade off breaking-change velocity against precision of knowing the potential damage. Since we really can't know the potential damage to arbitrary users, we can use pre-announcing the damage as a proxy. If we can give users warning that something is coming, they can take steps to mitigate it.

If we absolutely know that a feature cannot be in use (maybe because using it produces broken output), then perhaps all we need to do is announce the removal in the release notes.

If we are a little less certain, perhaps a warning in a user news group (presuming we can convince users to sign up for it), as soon as the code is checked in but before the release goes out. I believe this has to be through a non-source channel, because most users will not need, nor want, to follow GitHub. They are going to get binary distributions and ask questions on Stack Overflow.

The hard case is where we are even less certain people are not using the feature. Consider a case where someone just refactored rules_foo to stop using an obscure feature X of java_library. We may feel confident that X is not used any more, because we cannot find it anywhere in bazelbuild/*. But if rules_foo was a pretty good example rule set, I would bet that some user in the wild has copied the same use of X and has it preserved in their private code repo. The two-release-cycle rule protects those users. I think we have to pay the velocity tax for this case.

This line of reasoning is why I like visible deprecation warnings. They tell a user that the next release will break them. In the (hopefully) rare case they used the obscure rule X, they will get a chance to refactor their code before the Bazel update. If we provide a migration script for them, even better! I suspect that the more common case is that they will have to remember that when they upgrade to Bazel n+1, they should update their third party rule sets as well. I believe in the latter scenario because that is the successful end game - that there are truly third party Bazel rule sets, produced by communities external to BazelBuild, having their own release cycles and probably not in the Bazel CI system. If we are doing well, we will not be able to know all the "creative" ways Bazel is being used.

Jon Brandvein

Aug 23, 2018, 3:16:10 PM
to Tony Aiuto, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, baze...@googlegroups.com, Rosica Dejanovska, Dmitry Lomov
> This line of reasoning is why I like visible deprecation warnings. They tell a user that the next release will break them.

So does a linter, or running with --all_incompatible_changes. The user you're protecting is the user who declines to do those things. Rather than advocate to that user by dumping on their console, we could advocate to them via friendly reminders on the release page and in the blog post ("please run with --all_incompatible_changes to ensure...").

> I suspect that the more common case is that they will have to remember that when they upgrade to Bazel n+1, they should update their third party rule sets as well.

Probably. But here's my attempt at finding an overly reductive dichotomy out of this: If the ruleset is poorly maintained, then there's no upgrade to speak of. If the ruleset is well maintained, then the maintainer could be trusted to annotate the old version as incompatible with newer Bazels, and the user gets a nice friendly failure message. (We may need some utils in a standard .bzl file to assist with this.)
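The "utils in a standard .bzl file" mentioned here could be as simple as a minimum-version check. A minimal sketch in plain Python (real Starlark code would read native.bazel_version and call fail() instead of returning a message; the names below are invented for illustration, though bazel-skylib's versions.check is in this spirit):

```python
# Plain-Python sketch of a minimum-Bazel-version check a ruleset could ship
# in a shared .bzl file. In actual Starlark it would read native.bazel_version
# and call fail() with the message; the names here are hypothetical.

def parse_version(text):
    """'0.17.1' -> (0, 17, 1); ignores release-candidate suffixes like 'rc2'."""
    return tuple(int(part.split("rc")[0]) for part in text.split("."))

def check_bazel_version(current, minimum, ruleset="rules_foo"):
    """Return an error message if `current` is too old, else None."""
    if parse_version(current) < parse_version(minimum):
        return ("%s requires Bazel %s or later, but you are running %s; "
                "please upgrade Bazel or pin an older %s release."
                % (ruleset, minimum, current, ruleset))
    return None

print(check_bazel_version("0.16.0", "0.17.1"))  # prints the upgrade message
print(check_bazel_version("0.18.0", "0.17.1"))  # None
```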

But again, assuming that the user is updating frequently enough to catch the flag in a release after it's been defaulted to true, and before it's been removed, why do we need anything special? The user can just opt-out of the flag. The fail-by-default-then-opt-out workflow serves as their warning.

Tony Aiuto

Aug 23, 2018, 4:06:31 PM
to Jon Brandvein, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Thu, Aug 23, 2018 at 3:16 PM Jon Brandvein <bran...@google.com> wrote:
> > This line of reasoning is why I like visible deprecation warnings. They tell a user that the next release will break them.
>
> So does a linter, or running with --all_incompatible_changes. The user you're protecting is the user who declines to do those things. Rather than advocate to that user by dumping on their console, we could advocate to them via friendly reminders on the release page and in the blog post ("please run with --all_incompatible_changes to ensure...").

We can't expect users to lint the transitive closure of .bzl files they happen to use. People generally lint on edit.
Lint is for rule developers, not most end users.

Nor do I believe it is reasonable to expect people to flip --all_incompatible_changes. They are either going to
have it off all the time, and thus not see the looming breakage, or they will have it on all the time, and we
will break them as soon as they try to update. Then they grumble, turn it off, fix the problem, and
flip it back on. There are two problems with that:
1. They got a failure rather than a warning.
2. We just shortened the time they have for mitigation. With a deprecation warning you know when
    you upgrade to version N that you will break at version N+1.  Without the warnings, you find out
    that you broke at version N+1 when you try to update to version N+1.
 

> I suspect that the more common case is that they will have to remember that when they upgrade to Bazel n+1, they should update their third party rule sets as well.

> Probably. But here's my attempt at finding an overly reductive dichotomy out of this: If the ruleset is poorly maintained, then there's no upgrade to speak of.

Exactly. There will be no upgrade from above, so we want to give the end user as much lead time as possible.
 
> If the ruleset is well maintained, then the maintainer could be trusted to annotate the old version as incompatible with newer Bazels, and the user gets a nice friendly failure message. (We may need some utils in a standard .bzl file to assist with this.)

Where we differ on this is that I believe there is no such thing as a "friendly" failure message.  A failure stops my work right now and I have to take action to get going again. It is hostile to me at that instant.
 
> But again, assuming that the user is updating frequently enough to catch the flag in a release after it's been defaulted to true, and before it's been removed, why do we need anything special? The user can just opt-out of the flag. The fail-by-default-then-opt-out workflow serves as their warning.

I think we are talking about slightly different cases here. I may be putting words in his mouth, but Lukács was concerned with keeping velocity high w.r.t. removing things that he knows are unused. So, I am assuming that he wants to condense the cycle for turning something off to something like:

Release N
  • Implement new (replacement) feature.
  • Add flag that keeps old feature on. Flag is on by default.
  • [aiuto addition] Add deprecation warning on old feature.
  • Maybe convert all Bazelbuild rules to new feature. Else do in release N+1
Release N+1
  • Convert rules if not done in Release N
  • Remove old feature code and the flag.

Austin Schuh

Aug 23, 2018, 4:52:32 PM
to Jon Brandvein, Tony Aiuto, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, baze...@googlegroups.com, Rosica Dejanovska, Dmitry Lomov
I'm enjoying watching this discussion.  Thanks for having it on a public list!

On Thu, Aug 23, 2018 at 12:16 PM 'Jon Brandvein' via bazel-dev <baze...@googlegroups.com> wrote:
> I suspect that the more common case is that they will have to remember that when they upgrade to Bazel n+1, they should update their third party rule sets as well.

> Probably. But here's my attempt at finding an overly reductive dichotomy out of this: If the ruleset is poorly maintained, then there's no upgrade to speak of. If the ruleset is well maintained, then the maintainer could be trusted to annotate the old version as incompatible with newer Bazels, and the user gets a nice friendly failure message. (We may need some utils in a standard .bzl file to assist with this.)

And it can be a lot of work to upgrade third_party rulesets.  rules_closure can require us to modify a lot of javascript to remove linter warnings.  rules_go requires us to port our changes to get hermetic compilers working back onto the new version.  Upstream does not yet support our use case, and I'm not sure some of the flags we need are exposed without the bazel patches we re-apply every release.  The last upgrade took 2 weeks.

> But again, assuming that the user is updating frequently enough to catch the flag in a release after it's been defaulted to true, and before it's been removed, why do we need anything special? The user can just opt-out of the flag. The fail-by-default-then-opt-out workflow serves as their warning.

-Werror -Wextra in gcc is similar to this.  We turn those flags on by default for all code, and then do the -Wno-crazy-warning-flag for the ones which conflict with code we don't/can't update.  That would be pretty visible, but it gives people hooks to delay a fix.  That assumes a more savvy user, though, who cares about the build and isn't just trying to build tensorflow, for whom this is just another blocker.
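That gcc pattern maps naturally onto a .bazelrc. A sketch (the blanket flag --all_incompatible_changes is real at this point; the specific opt-out flag name is invented, relying on the convention that boolean Bazel flags can generally be negated with a "no" prefix):

```
# .bazelrc sketch: opt in to all incompatible changes by default,
# then opt out of the ones we can't fix yet (flag name invented).
build --all_incompatible_changes
build --noincompatible_example_breaking_change
```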

I want our code base to be on more modern bazel.  I'd ideally like to see us update every release.

With skylark rules, the developer gets to pick when to update, or if to update.  With built-in rules, we don't get a choice and need to migrate.  This has in the past blocked bazel upgrades for months for us.

Austin

Jon Brandvein

Aug 23, 2018, 5:09:33 PM
to Tony Aiuto, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, baze...@googlegroups.com, Rosica Dejanovska, Dmitry Lomov
> We can't expect users to lint the transitive closure of .bzl files they happen to use.

Agreed.

> With a deprecation warning you know when you upgrade to version N that you
> will break at version N+1.  Without the warnings, you find out that you
> broke at version N+1 when you try to update to version N+1.

Your timescale is a bit compressed: you have the breakage occurring just one release after it's introduced. With --incompatible_* flags, we (at worst) introduce the flag at N, flip it at N+1, and remove it at N+2.

Note that I'm not considering a failure under --all_incompatible_changes as breaking your workflow, because the idea is to use that flag to test for readiness to upgrade, perhaps in CI. Normal dev builds don't need to use it, and if they do then we can assume they know how to opt out.

So to compare the strategies:

1. At version N you print a warning or observe an --all_incompatible_changes failure (if you are testing that flag)
2. At N+1 the flag flips and you have a soft breakage
3. At N+2 the flag is gone and you have a hard breakage

The difference between the two is step 1: Do you push warnings to the user (whether they want console spam or not), or do you expect them (or their CI) to ask for warnings with a flag? Either way, if they miss it they still have time to recover in step 2. You say warnings will avoid the unpleasantness of the soft-breakage, but there's a tragedy of the commons if everyone (especially transitive dependencies) pollutes the console with unactionable garbage.

Warnings do have one advantage, though: you can get more than one of them in a single invocation. We console-protectionists would counter that the answer to this is either a linter or opt-in warnings.

> Lukács was concerned with keeping velocity high w.r.t. removing things that he knows are unused.

I'm not sure about the interaction of my anti-warning crusade with this higher-velocity approach. But regardless, I'm hoping we can give users due notice without making a bad impression on their console by default.

Lukács T. Berki

Aug 24, 2018, 3:50:10 AM
to Jon Brandvein, Tony Aiuto, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
No, the difference is not only step 1. The two options are:
1. version N has a warning and a flag you can use to test the incompatible change
2. version N+1 flips the flag and you have a soft breakage
3. version N+2 removes the flag and you have a hard breakage

OR:
1. version N has a warning and a flag you can use to test the incompatible change
2. version N+1 removes the flag and you have a hard breakage

That is, there is no intermediate "soft breakage state".

Letting my inner cynic surface, I think people are quite inured to warnings so they don't help a ton, but I don't have a very strong opinion against warnings either. We could make it so that the warning message also tells which repositories make use of the obsolete feature and say something like "repository @rules_bf uses obsolete feature 'fraboozle'. Please either update your version of that repository or contact their developers so that they keep working in the next release". That's at least actionable.

In a sense, what we have here is a specific instance of the "how to update your transitive dependencies" problem. At Google, we just solve this by keeping everything at HEAD, but that doesn't fly with Bazel. I think in order to reduce fragmentation in the ecosystem, we should make an effort to keep everyone as close to the latest release of Bazel and the latest release of rule sets as possible. Of course, that directly contradicts our desire to make changes to Bazel :(

And yes, a lot of the grief stems from the fact that we cannot know what features are being used. There is, of course, the siren song of on-by-default telemetry so that we do know what features are used, but that's a very significant change and it wouldn't be perfect either, because not all Bazel invocations are done on computers connected to the Internet... maybe we can make a rule that if you don't want to be broken by a Bazel release, you need to do some things, for example:
  1. Run your builds with release candidates and if you don't report a breakage with a release candidate, the breakage is deemed to not be release-breaking
  2. Run builds with the set of incompatible flags we publish; if the next release flips an incompatible flag and breaks you, and you didn't notify us (or even just didn't update your code), you're on your own
  3. Submit some code to ci.bazel.build that exercises the features you use and thus we get automatic advance warning (that can easily degrade into us running the Continuous Integration For The World, though, which I don't want to sign up for)
This won't change a lot when we reach 1.0, either. We may need to make stronger guarantees about not breaking people, but we'll still need to be able to make incompatible changes in some way.







--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33 | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891


Laurent Le Brun

Aug 24, 2018, 8:07:13 AM
to Lukács T. Berki, Jon Brandvein, Tony Aiuto, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
An issue with warnings on the console output is that they are spammy.
On a project with multiple contributors and multiple dependencies, a
warning is often not relevant to you. Most of the time, the warning
should be addressed by another person. So you'll stop paying attention
to the output (imagine if each time you build your project, 10-20
lines of noise appear). If your change introduces a new warning,
chances are you won't even notice it.

To make things work, there are many features we may want:
- flag to fail the build if there's a warning (like -Werror)
- flag to turn off specific warnings
- flag to filter the warnings we display based on their location
- some cleverness to avoid flooding the console. If a piece of code
is executed multiple times, it should show at most one warning. It's
not clear how to do it exactly, for example if a macro is used in many
packages.

But I'd like to point out we can't always use warnings. For example,
we're updating the "range" function. Instead of allocating the whole
list, it returns a lazy object. In most cases, users won't see any
difference, but it can cause subtle changes in the rest of the code
(e.g. if you convert the object to string). We're using an
--incompatible flag for this. But I don't see how to use a warning.

Similarly, filtering warnings is tricky because we often can't know
who is responsible for it. Should it be fixed inside a macro, or in
the BUILD file that calls it?

C++ compilers have a --std flag to select the C++ version they
target. As I said before, I'd like to have something similar for
Bazel. If you run "bazel build --std=0.19", it will enable the
--incompatible flags we plan to enable with Bazel 0.19. This allows
users to test the compatibility of their code (in particular before
they update their Bazel binary). It also gives them more visibility
into future changes: some of the --incompatible flags are introduced
a long time before they are switched.

This kind of flag allows us to coordinate breaking changes. For
example, we can decide to synchronize the flag switches with specific
releases (in the future, only at major versions). Users will know what
to expect, and know which updates are supposed to be safe.
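Laurent's proposed flag could be sketched as a simple version-to-flags mapping. This is purely illustrative: the table, release numbers, and flag names below are invented for the sketch, not Bazel's real flag sets.

```python
# Hypothetical sketch of the proposed --std flag: each supported value
# expands to the --incompatible_* flags scheduled to flip in that
# release. The releases and flag names here are invented for
# illustration only.
INCOMPATIBLE_FLAGS_BY_RELEASE = {
    "0.18": ["--incompatible_example_foo"],
    "0.19": ["--incompatible_example_foo", "--incompatible_example_bar"],
}

def expand_std(version):
    """Return the incompatible flags that --std=<version> would enable."""
    if version not in INCOMPATIBLE_FLAGS_BY_RELEASE:
        raise ValueError("unknown --std value: %s" % version)
    return INCOMPATIBLE_FLAGS_BY_RELEASE[version]
```

Running with --std=0.19 before upgrading the binary would then tell you whether your code is ready for the 0.19 flag flips.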



--
Laurent

Tony Aiuto

Aug 24, 2018, 10:25:34 AM
to Laurent Le Brun, Lukács T. Berki, Jon Brandvein, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Fri, Aug 24, 2018 at 8:07 AM Laurent Le Brun <laur...@google.com> wrote:
An issue with warnings on the console output is that they are spammy.
On a project with multiple contributors and multiple dependencies, a
warning is often not relevant to you. Most of the time, the warning
should be addressed by another person. So you'll stop paying attention
to the output (imagine if each time you build your project, 10-20
lines of noise appear). If your change introduces a new warning,
chances are you won't even notice it.

I wonder if we are looking at users in different ways. You say "multiple contributors", which feels more like the point of view of one
of us. I very much believe we have to take the point of view of an end user who is not a contributor to Bazel. Their organization is the single
contributor to their code base, with their own organizational tolerance for warnings. Some people treat warnings as high-priority things to fix; others ignore them entirely.

To make things work, there are many features we may want:
 - flag to fail the build if there's a warning (like -Werror)
 - flag to turn off specific warnings

It is almost inevitable that we need this. While trying to make Google-style C++ code build into iOS apps, we regularly had to disable specific warnings for the Apple builds. Compilers have a history of producing warnings where some are important and others are noise. Careful teams treat all warnings as something to investigate. Some they decide are real errors and some are noise to globally filter out.
 
 - flag to filter the warnings we display based on their location

That would be very interesting. I suspect it is difficult. My two-second design for that is we add a Bazel command-line flag which specifies a Starlark error-handling function. It gets the error level, the message text, and a stack trace of the targets and rules that got us here. The system default would just print it, but a user could write their own to filter out specific annoying things in a third-party dependency that they need but that is a little sloppy. With luck, someone from the community will eventually contribute a much richer capability to filter with a specification rather than code.
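As a rough illustration of that two-second design (every name and signature here is hypothetical, not a real Bazel API), a handler-based filter might look like:

```python
# Hypothetical sketch of a user-supplied warning handler: it receives
# the level, the message, and a stack of the targets/rules that led
# here, and returns True if the warning should be shown.
def default_handler(level, message, stack):
    """The system default: print everything."""
    print("%s: %s (via %s)" % (level, message, " -> ".join(stack)))
    return True

def suppress_repo(noisy_repo, fallback=default_handler):
    """Build a handler that drops warnings coming from one sloppy
    third-party dependency and defers to the fallback otherwise."""
    def handler(level, message, stack):
        if any(frame.startswith(noisy_repo) for frame in stack):
            return False  # filtered: we need this repo but can't fix it
        return fallback(level, message, stack)
    return handler
```

A richer, declarative filter specification could later replace the hand-written function, as Tony suggests.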
 
 - some cleverness to avoid flooding the console. If a piece of code
is executed multiple times, it should show at most one warning. It's
not clear how to do it exactly, for example if a macro is used in many
packages.
I don't think we are talking about deprecation warnings for code that is used in many packages. This discussion started with how to quickly remove obsolete code. If the usage shows up often, then the code really is not obsolete. Any extra flooding helps warn that a real problem is looming.
 
But I'd like to point out we can't always use warnings. For example,
we're updating the "range" function. Instead of allocating the whole
list, it returns a lazy object. In most cases, users won't see any
difference, but it can cause subtle changes in the rest of the code
(e.g. if you convert the object to string). We're using an
--incompatible flag for this. But I don't see how to use a warning.

Yes.  Behavior change has to be slower and more careful, with multiple flag flips.
Deprecation is the case I have mainly been talking about.
Yes. That is exactly the kind of wording I expect. 
 
>
> In a sense, what we have here is a specific instance of the "how to update your transitive dependencies" problem. At Google, we just solve this by keeping everything at HEAD, but that doesn't fly with Bazel. I think in order to reduce fragmentation in the ecosystem, we should make an effort to keep everyone as close to the latest release of Bazel and the latest release of rule sets as possible. Of course, that directly contradicts our desire to make changes to Bazel :(
>
> And yes, a lot of the grief stems from the fact that we cannot know what features are being used. There is, of course, the siren song of on-by-default telemetry so that we do know what features are used, but that's a very significant change and it wouldn't be perfect either, because not all Bazel invocations are done on computers connected to the Internet... maybe we can make a rule that if you don't want to be broken by a Bazel release, you need to do some things, for example:

 
>
> Run your builds with release candidates and if you don't report a breakage with a release candidate, the breakage is deemed to not be release-breaking

Are our release candidates available long enough for people to do this? Realistically, large users might need a few weeks to a month to do a test.
 
> Run builds with the set of incompatible flags we publish and if the next release that flips an incompatible flag, breaks you, and you didn't notify us (or even just didn't update your code), you're on your own
> Submit some code to ci.bazel.build that exercises the features you use and thus we get automatic advance warning (that can easily degrade into us running the Continuous Integration For The World, though, which I don't want to sign up for)

Rule set developers might love this. I would not want to have to support helping large end users figure out how to get small test cases.
 
>
> This won't change a lot when we reach 1.0, either. We may need to make stronger guarantees about not breaking people, but we'll still need to be able to make incompatible changes in some way.

Yes. Which is why we probably need to come up with multiple policies so we can apply the best one for changes based on their degree of risk. 

Tony Aiuto

Aug 24, 2018, 10:47:02 AM
to Austin Schuh, Jon Brandvein, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Thu, Aug 23, 2018 at 4:52 PM Austin Schuh <austin...@gmail.com> wrote:
I'm enjoying watching this discussion.  Thanks for having it on a public list!

You are welcome. We are trying to change habits.
 

On Thu, Aug 23, 2018 at 12:16 PM 'Jon Brandvein' via bazel-dev <baze...@googlegroups.com> wrote:
> I suspect that the more common case is that they will have to remember that when they upgrade to Bazel n+1, they should update their third party rule sets as well.

Probably. But here's my attempt at finding an overly reductive dichotomy out of this: If the ruleset is poorly maintained, then there's no upgrade to speak of. If the ruleset is well maintained, then the maintainer could be trusted to annotate the old version as incompatible with newer Bazels, and the user gets a nice friendly failure message. (We may need some utils in a standard .bzl file to assist with this.)
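One possible shape for the .bzl utility Jon mentions, loosely modeled on a minimum-version check. The names are illustrative, and Starlark's builtin fail() is stubbed so the sketch runs standalone:

```python
# Hypothetical sketch of a helper a ruleset could call at load time to
# fail fast with a friendly message on a too-old Bazel. In real
# Starlark, fail() is a builtin; it is stubbed here for illustration.
def fail(msg):
    raise RuntimeError(msg)

def _parse(version):
    """Turn a release string like "0.16.1" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def check_min_bazel_version(current, minimum, ruleset):
    """Abort with an actionable message if `current` is too old."""
    if _parse(current) < _parse(minimum):
        fail("%s requires Bazel >= %s, but you are running %s; "
             "upgrade Bazel or pin an older release of %s."
             % (ruleset, minimum, current, ruleset))
```

A well-maintained ruleset could bump the minimum in each release, turning the user's confusing breakage into the "nice friendly failure message" described above.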

And it can be a lot of work to upgrade third_party rulesets.  rules_closure can require us to modify a lot of javascript to remove linter warnings.  rules_go requires us to port our changes to get hermetic compilers working back onto the new version.  Upstream does not yet support our use case, and I'm not sure some of the flags we need are exposed without the bazel patches we re-apply every release.  The last upgrade took 2 weeks.

Can you describe that experience more? The conversation is helped by having more user context.
Were you doing this as a developer of rules_closure, or a user of it?


But again, assuming that the user is updating frequently enough to catch the flag in a release after it's been defaulted to true, and before it's been removed, why do we need anything special? The user can just opt-out of the flag. The fail-by-default-then-opt-out workflow serves as their warning.

-Werror -Wextra in gcc is similar to this.  We turn those flags on by default for all code, and then do the -Wno-crazy-warning-flag for the ones which conflict with code we don't/can't update.  That would be pretty visible, but it gives people hooks to delay a fix.  That assumes a more savvy user, though, who cares about the build and isn't just trying to build tensorflow, for whom this is just another blocker.
Yes. That is what Laurent proposed too: selective warning suppression. You get a warning, and you can turn it off to avoid the noise. But if you turn it off and do not take action on what it warned you about, and you break in the next release, then shame on you.

I am still interested, however, in the attitudes of users towards that first notification. I find that sequence
  • update a tool
  • my build breaks
  • set a flag to suppress the breakage 
a horrible experience for me. It interrupts my excitement at getting a new version of the tool to play with. When I administered the tools repo for a few dozen developers and I introduced something that broke only a few of them, but not the rest, it was distinctly viewed as my fault by those who were broken. Maybe others weigh that cost less.


I want our code base to be on more modern bazel.  I'd ideally like to see us update every release.

With skylark rules, the developer gets to pick when to update, or if to update.  With built-in rules, we don't get a choice and need to migrate.  This has in the past blocked bazel upgrades for months for us.

Can you say more about that? How did you discover the Bazel upgrade which you could not do? Was it update, fail, rollback? Testing new release in a sandbox? Testing a pre-release? 
 

Austin

Laurent Le Brun

Aug 24, 2018, 11:38:52 AM
to Tony Aiuto, Lukács T. Berki, Jon Brandvein, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Fri, 24 Aug 2018 at 16:25, Tony Aiuto <ai...@google.com> wrote:
> On Fri, Aug 24, 2018 at 8:07 AM Laurent Le Brun <laur...@google.com> wrote:
>> An issue with warnings on the console output is that they are spammy.
>> On a project with multiple contributors and multiple dependencies, a
>> warning is often not relevant to you. Most of the time, the warning
>> should be addressed by another person. So you'll stop paying attention
>> to the output (imagine if each time you build your project, 10-20
>> lines of noise appear). If your change introduces a new warning,
>> chances are you won't even notice it.
>
> I wonder if we are looking at users in different ways. You say "multiple contributors", which feels more like the point of view of one
> of us. I very much believe we have to take the point of view of an end user who is not a contributor to Bazel. Their organization is the single
> contributor to their code base, with their own organizational tolerance for warnings. Some people treat warnings as high-priority things to fix; others ignore them entirely.

I'm confused, because I was talking about end users. In a typical
project or company, multiple people work on the same code base. But
not everyone is responsible for every line of code. So showing every
warning to everyone is not a good idea. On average, when Bazel would
print a warning, it would be almost always shown to the "wrong"
person, i.e. not the person who is going to fix it. So a user will
learn to ignore the output (including the one line in the middle that
they should ideally care about).

Tony Aiuto

Aug 24, 2018, 12:28:17 PM
to Laurent Le Brun, Lukács T. Berki, Jon Brandvein, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Fri, Aug 24, 2018 at 11:38 AM Laurent Le Brun <laur...@google.com> wrote:
On Fri, 24 Aug 2018 at 16:25, Tony Aiuto <ai...@google.com> wrote:
> On Fri, Aug 24, 2018 at 8:07 AM Laurent Le Brun <laur...@google.com> wrote:
>> An issue with warnings on the console output is that they are spammy.
>> On a project with multiple contributors and multiple dependencies, a
>> warning is often not relevant to you. Most of the time, the warning
>> should be addressed by another person. So you'll stop paying attention
>> to the output (imagine if each time you build your project, 10-20
>> lines of noise appear). If your change introduces a new warning,
>> chances are you won't even notice it.
>
> I wonder if we are looking at users in different ways. You say "multiple contributors", which feels more like the point of view of one
> of us. I very much believe we have to take the point of view of an end user who is not a contributor to Bazel. Their organization is the single
> contributor to their code base, with their own organizational tolerance for warnings. Some people treat warnings as high-priority things to fix; others ignore them entirely.

I'm confused, because I was talking about end users. In a typical
project or company, multiple people work on the same code base. But
not everyone is responsible for every line of code. So showing every
warning to everyone is not a good idea. On average, when Bazel would
print a warning, it would be almost always shown to the "wrong"
person, i.e. not the person who is going to fix it. So a user will
learn to ignore the output (including the one line in the middle that
they should ideally care about).

Yes, warnings will often be shown to the "wrong" person, but by showing them there is a chance that *someone* in the organization is the right person. By showing nothing, we drift towards hiding the problem longer than we need to, and shorten the time the organization has to fix it in advance.

Austin Schuh

Aug 24, 2018, 12:52:05 PM
to Tony Aiuto, Jon Brandvein, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Fri, Aug 24, 2018 at 7:47 AM Tony Aiuto <ai...@google.com> wrote:
On Thu, Aug 23, 2018 at 4:52 PM Austin Schuh <austin...@gmail.com> wrote:
On Thu, Aug 23, 2018 at 12:16 PM 'Jon Brandvein' via bazel-dev <baze...@googlegroups.com> wrote:
> I suspect that the more common case is that they will have to remember that when they upgrade to Bazel n+1, they should update their third party rule sets as well.

Probably. But here's my attempt at finding an overly reductive dichotomy out of this: If the ruleset is poorly maintained, then there's no upgrade to speak of. If the ruleset is well maintained, then the maintainer could be trusted to annotate the old version as incompatible with newer Bazels, and the user gets a nice friendly failure message. (We may need some utils in a standard .bzl file to assist with this.)

And it can be a lot of work to upgrade third_party rulesets.  rules_closure can require us to modify a lot of javascript to remove linter warnings.  rules_go requires us to port our changes to get hermetic compilers working back onto the new version.  Upstream does not yet support our use case, and I'm not sure some of the flags we need are exposed without the bazel patches we re-apply every release.  The last upgrade took 2 weeks.

Can you describe that experience more? The conversation is helped by having more user context.
Were you doing this as a developer of rules_closure, or a user of it?

For rules_closure, we are a user.  We have some javascript built with rules_closure.  It's code for debug webpages, so it's not mission critical code.  That means it gets a lot less TLC than some of our C++ code which is mission critical and has safety of life implications.

New rules_closure updates will bring in new rules_closure compilers.  Those compilers will catch new warnings in our code.  This puts the burden of updating code to fix the new warnings on our company's build and release folk rather than on the developers using rules_closure.  The last upgrade I did of the rules took 1-2 days.  Recent ones have been faster since the project seems to not be under active development.

For C++ changes which change the compiler flags, we need to re-certify our compilers.


But again, assuming that the user is updating frequently enough to catch the flag in a release after it's been defaulted to true, and before it's been removed, why do we need anything special? The user can just opt-out of the flag. The fail-by-default-then-opt-out workflow serves as their warning.

-Werror -Wextra in gcc is similar to this.  We turn those flags on by default for all code, and then do the -Wno-crazy-warning-flag for the ones which conflict with code we don't/can't update.  That would be pretty visible, but it gives people hooks to delay a fix.  That assumes a more savvy user, though, who cares about the build and isn't just trying to build tensorflow, for whom this is just another blocker.
Yes. That is what Laurent proposed too: selective warning suppression. You get a warning, and you can turn it off to avoid the noise. But if you turn it off and do not take action on what it warned you about, and you break in the next release, then shame on you.

I am still interested, however, in the attitudes of users towards that first notification. I find that sequence
  • update a tool
  • my build breaks
  • set a flag to suppress the breakage 
a horrible experience for me. It interrupts my excitement at getting a new version of the tool to play with. When I administered the tools repo for a few dozen developers and I introduced something that broke only a few of them, but not the rest, it was distinctly viewed as my fault by those who were broken. Maybe others weigh that cost less.

Let me be clear here.  We version Bazel with our repository.  That's the only way we can stay sane.  I have a requirement from ISO 26262 for automotive functional safety that we can build old versions of code reproducibly for many years.  So, when I say "update bazel", I mean update tools/bazel to point to a new version of bazel.  I *can't* submit that change and roll a new bazel out until our CI passes again (presubmit CI).  My experience is not that of a tensorflow user, but more of a corporate user.  I also volunteer with a high-school robotics team that adopts the same approach as work.

So, this process tends to look like this for C++:
  1) try to compile.
  2) get error.  Error says "-Wbad-bar-baz" somewhere in it.
  3) Add "-Wno-bad-bar-baz" to copts for the cc_library/binary.

With Bazel, if that approach were adopted, I would hope the experience would be similar.
  1) try to compile
  2) get error.  Error says "-Wbad-bar-baz" somewhere in it.  This is key because I know which flag is controlling the error.
  3a) add "build -Wno-bad-bar-baz" to //tools:bazel.rc and file a bug for someone to fix the problem.
      or
  3b) Fix the error.

Note: for us, this would be the build tools group, or developer wearing that hat.  A normal developer wouldn't see the error or be aware of this process.
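Step 3 of the C++ flow above, written as a BUILD-file fragment (a sketch only; "-Wbad-bar-baz" and the target names are the made-up examples from the steps above):

```python
# Hypothetical BUILD fragment for step 3: suppress the one new warning
# at the target that triggers it, and leave a breadcrumb to fix it.
cc_library(
    name = "legacy_driver",
    srcs = ["legacy_driver.cc"],
    copts = ["-Wno-bad-bar-baz"],  # TODO: fix the warning and remove
)
```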

As a tensorflow or other user, I'd want to grab a recent version of bazel and have everything just work.  In those cases, I don't think I'd really care about bazel version and really just want to know which version to install to make it work.  I think that's a distinctly different use case than a corporate developer/maintainer.
 
I want our code base to be on more modern bazel.  I'd ideally like to see us update every release.

With skylark rules, the developer gets to pick when to update, or if to update.  With built-in rules, we don't get a choice and need to migrate.  This has in the past blocked bazel upgrades for months for us.

Can you say more about that? How did you discover the Bazel upgrade which you could not do? Was it update, fail, rollback? Testing new release in a sandbox? Testing a pre-release? 

0.16.0 was one for us.  We grabbed the release, attempted to build our repo with it, and ran into Java problems pulling in host Java from broken machines.  Pulling in host dependencies violates our versioning policy above.  Since we version bazel with the repo, the engineer updating Bazel couldn't/didn't submit the change, and nobody across the org noticed.

Most breaking bazel changes are in Starlark.  That was more true in the earlier days than today.  We've had a py_binary breaking change in the past.  That was simple enough to fix that we just went and fixed all the call sites in the rollout commit.  Some are easiest to fix by upgrading the rules with the problem.  Others require us to patch the code or rules.

Austin

Tony Aiuto

Aug 24, 2018, 2:47:05 PM
to Austin Schuh, Jon Brandvein, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
Thanks for the info. 

On Fri, Aug 24, 2018 at 12:52 PM Austin Schuh <austin...@gmail.com> wrote:
On Fri, Aug 24, 2018 at 7:47 AM Tony Aiuto <ai...@google.com> wrote:
On Thu, Aug 23, 2018 at 4:52 PM Austin Schuh <austin...@gmail.com> wrote:
On Thu, Aug 23, 2018 at 12:16 PM 'Jon Brandvein' via bazel-dev <baze...@googlegroups.com> wrote:
> I suspect that the more common case is that they will have to remember that when they upgrade to Bazel n+1, they should update their third party rule sets as well.

Probably. But here's my attempt at finding an overly reductive dichotomy out of this: If the ruleset is poorly maintained, then there's no upgrade to speak of. If the ruleset is well maintained, then the maintainer could be trusted to annotate the old version as incompatible with newer Bazels, and the user gets a nice friendly failure message. (We may need some utils in a standard .bzl file to assist with this.)

And it can be a lot of work to upgrade third_party rulesets.  rules_closure can require us to modify a lot of javascript to remove linter warnings.  rules_go requires us to port our changes to get hermetic compilers working back onto the new version.  Upstream does not yet support our use case, and I'm not sure some of the flags we need are exposed without the bazel patches we re-apply every release.  The last upgrade took 2 weeks.

Can you describe that experience more? The conversation is helped by having more user context.
Were you doing this as a developer of rules_closure, or a user of it?

For rules_closure, we are a user.  We have some javascript built with rules_closure.  It's code for debug webpages, so it's not mission critical code.  That means it gets a lot less TLC than some of our C++ code which is mission critical and has safety of life implications.

New rules_closure updates will bring in new rules_closure compilers.  Those compilers will catch new warnings in our code.  This puts the burden of updating code to fix the new warnings on our company's build and release folk rather than on the developers using rules_closure.  The last upgrade I did of the rules took 1-2 days.  Recent ones have been faster since the project seems to not be under active development.

For C++ changes which change the compiler flags, we need to re-certify our compilers.

I'm guessing you keep those under revision control then too. If you are using or planning for remote build execution, you probably must be making your own containers with the specific compilers you have certified, right?



But again, assuming that the user is updating frequently enough to catch the flag in a release after it's been defaulted to true, and before it's been removed, why do we need anything special? The user can just opt-out of the flag. The fail-by-default-then-opt-out workflow serves as their warning.

-Werror -Wextra in gcc is similar to this.  We turn those flags on by default for all code, and then add -Wno-crazy-warning-flag for the ones that conflict with code we don't/can't update.  That would be pretty visible but would give people hooks to delay a fix.  That assumes a more savvy user, though, who cares about the build and isn't just trying to build TensorFlow with this as yet another blocker.
Yes. That is what Laurent proposed too - selective warning suppression. You get a warning, and you can turn it off to avoid the noise. But if you turn it off, don't act on what it warned you about, and then break in the next release, then shame on you.
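For concreteness, the opt-out workflow being described would amount to a line like this in a project's .bazelrc (the flag name below is made up for illustration; the real ones all start with --incompatible_):

```shell
# .bazelrc -- mirror of gcc's -Wno-... pattern: opt out of a breaking
# change until our code is migrated. The flag name is hypothetical.
build --incompatible_example_change=false

# TODO: migrate callers, then delete the line above before the release
# that removes the flag (and with it, the opt-out) entirely.
```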

I am still interested, however, in the attitudes of users towards that first notification. I find that sequence
  • update a tool
  • my build breaks
  • set a flag to suppress the breakage 
a horrible experience for me. It interrupts my excitement at getting a new version of the tool to play with. When I administered the tools repo for a few dozen developers and I introduced something that broke only a few of them, but not the rest, it was distinctly viewed as my fault by those who were broken. Maybe others weigh that cost less.

Let me be clear here.  We version Bazel with our repository.  That's the only way we can stay sane.  I have a requirement from ISO 26262 for automotive functional safety that we can build old versions of code reproducibly for many years.  So, when I say "update bazel", I mean update tools/bazel to point to a new version of Bazel.  I *can't* submit that change and roll a new Bazel out until our CI passes again (presubmit CI).  My experience is not that of a TensorFlow user, but of a corporate user.  I also volunteer with a high-school robotics team that adopts the same approach as work.

Understood. That is the style I expect many Bazel users to adopt. I've been there myself. A small tools group owns the upgrades of common stuff for the whole company. My next question is: what is "Bazel"? Do you view the update to Bazel and all the rule sets as one unit, or do you get more fine-grained than that? Do you try to update rules_closure ahead of an upcoming Bazel update, or is it an all-at-once certify-and-publish of the tool stack?

Lukács T. Berki

unread,
Aug 27, 2018, 4:08:57 AM8/27/18
to Tony Aiuto, Austin Schuh, Jon Brandvein, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
I think there are two, mostly independent, problems (not every pair of possible solutions makes sense): 
  1. How breaking changes to Bazel should be rolled out and how much work people can be expected to do to update to a newer Bazel version
  2. How people should be warned about these
I agree with Laurent that if we print warnings that are not actionable, our users will get inured to them. "Not actionable" in this context means not only "has no way to fix it" but also "is not willing to fix it" -- if my job is to add a feature or fix a bug in the actual software I'm writing and I'm in a hurry, I couldn't care less about whatever log spam Bazel throws at me. However, I also agree with Tony that there needs to be a way to warn people that a change is coming.

I posit that our users can be split into two groups: 
  1. One where there is someone whose job is to update Bazel and make sure that their code keeps working
  2. One where there are no such people and things are expected to "just work".
I don't think we should ignore the second group, but at the same time, Bazel still contains enough quirks and misfeatures we'd like to remove that we can't promise indefinite or even medium-term backward compatibility. So the best we can do is to make sure that it's easy for them to migrate to the first group.

The warning could be done by having a way to disable moribund features (we kinda already have this with --incompatible_*) and having a way of reporting where they are used (which seems feasible for *most* of these things), preferably also in sub-repositories so that you know whom to send pull requests to (or whom to bug to update their code).

But that doesn't answer the question of "how much incompatibility is okay between Bazel releases". Do y'all think we could get away with an "arbitrary number, as long as we give some warning" approach?

(Note that we already have a way to filter messages based on where they were emitted from: it's the --output_filter command line option.)
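For readers unfamiliar with it, --output_filter takes a regular expression matched against target labels; warnings from non-matching targets are suppressed. A sketch with placeholder package names:

```shell
# .bazelrc -- only surface build warnings from our own code; warnings
# coming from external repositories and third_party are filtered out.
# The regex is illustrative -- adjust it to your repository layout.
build --output_filter='^//(src|tools)[/:]'
```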


Austin Schuh

unread,
Aug 27, 2018, 2:35:52 PM8/27/18
to Tony Aiuto, Jon Brandvein, Lukács T. Berki, Laurent Le Brun, Dmitry Babkin, Marcel Hlopko, bazel-dev, Rosica Dejanovska, Dmitry Lomov
On Fri, Aug 24, 2018 at 11:47 AM Tony Aiuto <ai...@google.com> wrote:
Thanks for the info. 

On Fri, Aug 24, 2018 at 12:52 PM Austin Schuh <austin...@gmail.com> wrote:
For C++ changes which change the compiler flags, we need to re-certify our compilers.

I'm guessing you keep those under revision control then too. If you are using or planning for remote build execution, you probably must be making your own containers with the specific compilers you have certified, right?

We have a custom CROSSTOOL file.  The compiler binaries are fetched with new_http_repository.  This means that we use very little from the developer's workstation, and a worker container is pretty much empty (xz, 32-bit libc, small utilities like that). The former is more important.  We need to support developers building in the middle of nowhere (I'm testing today on a closed course in Texas, 1.5 hours away from the nearest major city).

 


But again, assuming that the user is updating frequently enough to catch the flag in a release after it's been defaulted to true, and before it's been removed, why do we need anything special? The user can just opt out of the flag. The fail-by-default-then-opt-out workflow serves as their warning.

-Werror -Wextra in gcc is similar to this.  We turn those flags on by default for all code, and then add -Wno-crazy-warning-flag for the ones that conflict with code we don't/can't update.  That would be pretty visible but would give people hooks to delay a fix.  That assumes a more savvy user, though, who cares about the build and isn't just trying to build TensorFlow with this as yet another blocker.
Yes. That is what Laurent proposed too - selective warning suppression. You get a warning, and you can turn it off to avoid the noise. But if you turn it off, don't act on what it warned you about, and then break in the next release, then shame on you.

I am still interested, however, in the attitudes of users towards that first notification. I find that sequence
  • update a tool
  • my build breaks
  • set a flag to suppress the breakage 
a horrible experience for me. It interrupts my excitement at getting a new version of the tool to play with. When I administered the tools repo for a few dozen developers and I introduced something that broke only a few of them, but not the rest, it was distinctly viewed as my fault by those who were broken. Maybe others weigh that cost less.

Let me be clear here.  We version Bazel with our repository.  That's the only way we can stay sane.  I have a requirement from ISO 26262 for automotive functional safety that we can build old versions of code reproducibly for many years.  So, when I say "update bazel", I mean update tools/bazel to point to a new version of Bazel.  I *can't* submit that change and roll a new Bazel out until our CI passes again (presubmit CI).  My experience is not that of a TensorFlow user, but of a corporate user.  I also volunteer with a high-school robotics team that adopts the same approach as work.

Understood. That is the style I expect many Bazel users to adopt. I've been there myself. A small tools group owns the upgrades of common stuff for the whole company. My next question is: what is "Bazel"? Do you view the update to Bazel and all the rule sets as one unit, or do you get more fine-grained than that? Do you try to update rules_closure ahead of an upcoming Bazel update, or is it an all-at-once certify-and-publish of the tool stack?

I view Bazel as the thing that runs starlark and orchestrates the build.  It's also linux_sandbox.  From the attitudes that y'all have communicated on the list over the last couple of years, it's legacy that things like cc_library and friends are still in that binary.

I try to only update the pieces which need updating.  Updating rules can require us to modify code to address changes with the rules, which takes time.  Our story there isn't great.  We also need to re-host everything internally (skylark -> starlark rename would have blown out our ability to do old builds, and any artifacts that get removed kill that as well.  We need to control our destiny there.).  That's expensive in terms of developer time.  It would be very helpful to have a way to re-host artifacts easily, quickly, and generically, but that's getting beyond the scope of this discussion.
 
Austin

ittai zeidman

unread,
Aug 28, 2018, 12:01:28 AM8/28/18
to bazel-dev
Hi,
I think this is a very important discussion.
I wanted to add that while I agree Bazel needs to advance itself, the current “incompatible” cost is too large.
We (rules_scala) had the incompatible flags turned on for a few months and just broke down from the number of changes and their state, so we disabled these tests.
Two main issues:
1. Handling other repositories' issues (I had to fix the protobuf build, which broke me).
2. JavaCommon added a deprecation but the replacement was broken. It took a while (at least one release, maybe two - I don't remember) until it was fixed, which also meant we needed to ignore it for now.

Lukács T. Berki

unread,
Aug 28, 2018, 3:59:37 AM8/28/18
to ittai zeidman, bazel-dev
On Tue, Aug 28, 2018 at 6:01 AM, ittai zeidman <itt...@gmail.com> wrote:
Hi,
I think this is a very important discussion.
Yep. We should have had it a long time ago, but better late than never.
 
I wanted to add that while I agree Bazel needs to advance itself, the current “incompatible” cost is too large.
I agree. Unfortunately, our initial release contained a lot of cruft that it really should not have :( One way to go forward is to identify functionality we want to remove before Bazel 1.0, add incompatible flags for them, flip them all, declare Bazel 1.0, then be very careful about making incompatible changes in the future. However, we *will* need to be able to make incompatible changes in the future, too, but maybe if our original sins are all atoned for, we can make the incompatible changes process much more conservative.


We (rules_scala) had the incompatible flags turned on for a few months and just broke down from the number of changes and their state, so we disabled these tests.
Two main issues:
1. Handling other repositories' issues (I had to fix the protobuf build, which broke me).
This, unfortunately, is an instance of the "I can easily fix my own code, but I cannot easily fix the code of my dependencies" problem. I think that the way around this is tools to make such fixes automatic or at least very easy and to have as few incompatible changes as we can get away with.
 
2. JavaCommon added a deprecation but the replacement was broken. It took a while (at least one release, maybe two - I don't remember) until it was fixed, which also meant we needed to ignore it for now.
I think this is a case for postponing incompatible flag flips if need be and not mindlessly doing them the release after the flag was introduced.
 


Lukács T. Berki

unread,
Aug 30, 2018, 11:25:27 AM8/30/18
to ittai zeidman, bazel-dev
Let's continue the discussion on this thread on bazel-discuss in a bit wider audience. Arguably, I should have started this thread there in the first place.