using a stateful compiler with Bazel


P. Oscar Boykin

Nov 24, 2015, 4:54:19 PM
to bazel-discuss
I want to use a compiler that keeps on-disk state from run to run (the zinc incremental compiler for Scala). The motivation is that the Scala compiler is very slow, so compiling largish targets can take 1-2 minutes.

I know the canonical answer to this is to make finer-grained targets, but to me this is not a complete solution. First, it is a bit painful to create one target for each item in the logical dependency graph just to speed up compilation. Second, it conflates the real logical blocks of your system, which people can understand, with the nodes in the strict dependency graph, which might be enormous. Taken to its logical conclusion, a 100-file project may have 100 targets, and I'll have to depend on some custom subset of these for each downstream project. Imagine if there were a separate jar for every class in Guava.

What I would like:
To support a compiler like zinc, I need access to the previous versions of two outputs. So, imagine here:


if we had a hypothetical skylark API that provided ctx.previous_outputs, which gives me the outputs from the most recent successful build of this target (which of course may be None), then I think we'd be set. I could call zinc, access the previous class files and analysis, and let zinc decide which class files need to be rebuilt, which can be reused, and which to delete, all of which it can already handle.

For zinc, I would also add a new output, the cache file that tracks dependencies, in addition to the usual jar of all the class files. If the sources and dependencies don't change, bazel would not run the rule. If the sources or the dependencies change, we run the rule again. In this way, previous_outputs would not be considered an input to the action; it's more like an "optional_input".

Adding this API, namely ctx.previous_outputs, seems in principle quite feasible. There may be other means of accomplishing the same thing.
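To make the idea concrete, here's a toy model (plain Python standing in for skylark; `compile_action` and the `(hash, artifact)` pairs are made up purely for illustration):

```python
# Hypothetical sketch, NOT a real Bazel/skylark API: an action that may
# consult the outputs of the previous successful build of the same target.
def compile_action(sources, previous_outputs=None):
    """Rebuild only sources whose content changed since the previous build.

    `previous_outputs` stands in for the proposed ctx.previous_outputs:
    a dict of {source_name: (source_hash, artifact)} from the last run,
    or None on a clean build.
    """
    previous_outputs = previous_outputs or {}
    outputs = {}
    recompiled = []
    for name, content in sources.items():
        cached = previous_outputs.get(name)
        if cached is not None and cached[0] == hash(content):
            outputs[name] = cached  # unchanged: reuse the old artifact
        else:
            outputs[name] = (hash(content), "classfile(%s)" % content)
            recompiled.append(name)
    return outputs, recompiled

# A clean build compiles everything; the next build touches only what changed.
srcs = {"A.scala": "class A", "B.scala": "class B"}
full, built1 = compile_action(srcs)
srcs["B.scala"] = "class B { def f = 1 }"
incr, built2 = compile_action(srcs, previous_outputs=full)
```

The compiler (zinc) would own the invalidation logic; bazel would only need to hand the previous outputs back in.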

Damien Martin-guillerez

Nov 27, 2015, 4:02:58 AM
to P. Oscar Boykin, bazel-discuss, phi...@google.com, mst...@google.com
+Philipp Wollermann +Michael Staib 

Can you use a persistent process for that? If so, the worker strategy could be a good fit for you.

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/5199222e-af2c-4d37-8fbe-08628837f5bb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

P. Oscar Boykin

Nov 30, 2015, 10:44:38 AM
to bazel-discuss, oscar....@gmail.com, phi...@google.com, mst...@google.com
A persistent process *in addition to* access to the previous outputs can speed zinc up, but that would be my second priority after just getting zinc to run at all.

If the sandboxing is weak (which I guess it is not on Linux), I could have zinc maintain its own state directory, but it seems access to the previous build artifacts would be the simplest solution I can see now.

Janak Ramakrishnan

Dec 6, 2015, 5:21:11 PM
to P. Oscar Boykin, bazel-discuss, Philipp Wollermann, Michael Staib
Adding a previous_outputs property would probably meet with a lot of resistance. First, it would be quite easy to accidentally break incremental correctness in rules with such a property. Second, Bazel deletes previous outputs before running an action. Changing that (to move them to a temp directory?) wouldn't be a trivial project. Unfortunately I don't have any constructive suggestions right now, except that you should be able to "union" a number of fine-grained targets into a single larger one that downstream projects depend on, so breaking your targets up into smaller ones may be the best solution. Only the larger one would be visible in your BUILD file, so the logical blocks of your project would still be preserved.

Kamal Marhubi

Dec 28, 2015, 12:01:48 PM
to Janak Ramakrishnan, P. Oscar Boykin, bazel-discuss, Philipp Wollermann, Michael Staib
Coming at this from another direction, what would Bazel need to best exploit incrementality in a compiler? The Rust folks are starting work on this, and I'm interested in seeing if Bazel can jump on to make use of it. A relevant part of the incremental compilation RFC describing the fine-grained dependency graph: https://github.com/rust-lang/rfcs/blob/master/text/1298-incremental-compilation.md#core-idea-a-fine-grained-dependency-graph

-Kamal


Lukács T. Berki

Jan 5, 2016, 5:59:33 AM
to Kamal Marhubi, Janak Ramakrishnan, P. Oscar Boykin, bazel-discuss, Philipp Wollermann, Michael Staib
I agree with Janak that this will be a hard fit with Blaze because it thinks of the world in a functional way. If I had to implement this, I'd implement it as a special ActionContext that has extra knowledge about how to run Rust actions.





--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Maximillianstr. 11-15 | 80539 München | Germany | Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle | Registergericht und -nummer: Hamburg, HRB 86891

P. Oscar Boykin

Jan 5, 2016, 10:13:42 AM
to Lukács T. Berki, Kamal Marhubi, Janak Ramakrishnan, bazel-discuss, Philipp Wollermann, Michael Staib
Just a note: there is nothing non-functional about seeing the previous outputs. It just means your compiler is a kind of fold rather than a map function. It would still be a pure function.

Standard compiler:
compile :: Sources -> Artifacts

Incremental compiler:
compile :: Maybe Artifacts -> Sources -> Artifacts

(Using Haskell notation here)
--
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

Kamal Marhubi

Jan 5, 2016, 11:00:18 AM
to P. Oscar Boykin, Lukács T. Berki, Janak Ramakrishnan, bazel-discuss, Philipp Wollermann, Michael Staib

Bazel would require that the incremental compilation output would be the same as the output starting from scratch. In this notation, I think bazel's requirement would be that given sources1 and sources2

compile Nothing sources2 == compile (Just (compile Nothing sources1)) sources2

which isn't automatically true. So it's possible for compile to be a pure function and still not satisfy proper incrementality, i.e., not give bit-for-bit identical outputs for the incremental build.
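As a toy model (nothing here is a real compiler; `sloppy_compile` is a pure function that still breaks the property):

```python
# Property from above: compiling from scratch and compiling on top of any
# previous output must produce bit-for-bit identical results.
def incrementality_holds(compile_fn, sources1, sources2):
    clean = compile_fn(None, sources2)
    incremental = compile_fn(compile_fn(None, sources1), sources2)
    return clean == incremental

# A well-behaved incremental compiler: the first argument may speed things
# up, but never changes the result.
def good_compile(previous, sources):
    return {name: "artifact:" + src for name, src in sources.items()}

# Still a pure function, but it leaks history into its output, so clean and
# incremental builds diverge.
def sloppy_compile(previous, sources):
    stamp = "+prev" if previous else ""
    return {name: "artifact:" + src + stamp for name, src in sources.items()}
```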

-Kamal

P. Oscar Boykin

Jan 5, 2016, 7:26:48 PM
to Kamal Marhubi, Lukács T. Berki, Janak Ramakrishnan, bazel-discuss, Philipp Wollermann, Michael Staib
Yes. And actually, you can imagine bazel providing easy tests for such an assertion so extension writers could verify this to be true.

Lukács T. Berki

Jan 7, 2016, 4:45:04 AM
to P. Oscar Boykin, Kamal Marhubi, Janak Ramakrishnan, bazel-discuss, Philipp Wollermann, Michael Staib
What you want is for the compiler to promise that no matter what the history is, the output will always be bit-by-bit the same; that is, the output of the compile function above is independent of its first argument. And it's hard to do that because stateful compilers generally have a large state space, and compilers are a complex kind of animal.



pa...@lucidchart.com

Jun 8, 2016, 4:57:06 PM
to bazel-discuss, oscar....@gmail.com, ka...@marhubi.com, jan...@google.com, phi...@google.com, mst...@google.com
> Unfortunately I don't have any constructive suggestions right now, except that...breaking your targets up into smaller ones may be the best solution.

Not only are very fine-grained dependencies unmanageable, they impose the additional restriction that the programmer find non-cyclical ways to divide the code. It can become an exercise that has no other purpose than to appease the build gods.

> if we had a hypothetical skylark API that provided ctx.previous_outputs which give me the outputs from the most recent successful build of this target, which of course may be None, then I think we'd be set.

This seems to be the only choice left. I know of no other.

> it would be quite easy to accidentally break incremental correctness in rules with such a property.

It's already an understanding among rule writers that it should be deterministic. E.g. don't depend on /dev/random. Also, don't depend on undefined behaviors.

E.g. https://github.com/google/closure-compiler/issues/438 Okay, that's not the best example because it was due to a different Java version, but I've seen *many* times where Google Closure Compiler in the past failed code on one machine but passed it on another machine with an identical Java version. (Maybe a race condition, maybe some other randomness introduced by the Java runtime.)

Point is, there's no magic force that can prevent a rule-writer from writing non-deterministic code.

---

It's actually rather ironic. The FAQ says:

> Bazel tries to minimize expensive compilation steps. If you are only using interpreted languages directly, such as JavaScript or Python, Bazel will likely not interest you.

Yet expensive compilation -- Scala, Rust, Haskell -- is exactly the case that Bazel makes hugely **slower**. I'd like to be able to support these incremental compilers, and any others.

Please don't make it {Fast, Correct} - Choose one

Philipp Wollermann

Jun 9, 2016, 6:51:49 AM
to pa...@lucidchart.com, bazel-discuss, ulf...@google.com, oscar....@gmail.com, ka...@marhubi.com, jan...@google.com, mst...@google.com
+Ulf Adams FYI, as we talked this morning about how / whether to support stateful compilers.

Ulf Adams

Jun 9, 2016, 7:55:35 AM
to Philipp Wollermann, Paul Draper, bazel-discuss, oscar....@gmail.com, Kamal Marhubi, Janak Ramakrishnan, Michael Staib
We have a well-defined API to support incremental compilers. The question is whether we need a second API to do so, and I'm rather unconvinced that we do.

There is an additional question of whether they can be enabled by default (and disabled on release builds?) or disabled by default and enabled on interactive builds, which is orthogonal to the first.

> Point is, there's no magic force that can prevent a rule-writer from writing non-deterministic code.

Determinism != Correctness. I'd define correctness roughly like this:

A build system is correct if the output of an incremental build is consistent with the output of a clean build from the same source, where consistency means that you could come up with a hypothetical sequence / timing of running the actions pertaining to the build to obtain that output.

The idea of that definition being that we can define correctness of a build system independently of the correctness or determinism of the individual tools. This is useful for two reasons: (1) there are scenarios in which build systems fail to produce correct output even if the tools are all fully correct and deterministic, and (2) otherwise you depend on all tools being correct and deterministic, which is obviously not the case. (Note that this precludes concurrent modifications to the file system, though we're also trying to ensure that Bazel always picks up changes made while the build is running in the next build at the latest.)

Consider the case of C++ compilation; according to this definition, it is incorrect to reuse a cached C++ compilation just because all the header files used during that compilation are unmodified. This is because of the header search path - you can add a header file to an earlier search path entry, which would not get detected by this policy. This can be fixed by also checking that certain header files are _not_ found in certain search path entries, and a correct build system will do so.
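A toy model of the hazard (the resolver, paths, and file names are all illustrative):

```python
# Resolving `#include "util.h"` against an ordered header search path.
# Checking only that the *found* header is unchanged misses the case
# where a new header appears earlier on the path and shadows it.
def resolve(include, search_path, filesystem):
    for directory in search_path:
        candidate = directory + "/" + include
        if candidate in filesystem:
            return candidate
    return None

path = ["src", "third_party"]
fs = {"third_party/util.h"}
first = resolve("util.h", path, fs)   # found in third_party

fs.add("src/util.h")                  # new file, earlier on the path
second = resolve("util.h", path, fs)  # now shadows the old header
```

Nothing about `third_party/util.h` changed between the two runs, yet the compilation result must change, which is why a correct build system also has to record where headers were *not* found.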

There may still be bugs in Bazel that cause it to be incorrect according to this definition; internally, we have managed to make Bazel "good enough" such that users (almost) never run "bazel clean", and that they file bugs if they encounter cases where a clean build produces different results from an incremental build. My list of known correctness issues is very short, and all of them are very rare, i.e., it's more common that the Jvm crashes due to a Hotspot bug than that Bazel produces incorrect output.

We also realize that, in some cases, relying on the correctness of tools rather than safe-guarding against their possible incorrectness can result in faster build times. That's why we added the worker API, which anyone can use. This allows you to make the tradeoff between performance and trust (in the correctness of the tools) yourself.

However, we will not enable workers for any tool by default if we have legitimate concerns about the correctness of the specific tool. For example, we've done (hundreds of?) thousands of builds with the incremental Java compiler, and we haven't encountered any correctness issues recently. However, back when I experimented with it, Javac was using second-precision timestamps to detect file changes, even on file systems that actually support sub-second timestamps, and this was causing issues in some scenarios. Before enabling incremental Java compilation by default, it would be good to check whether that specific issue was fixed or change the Java compiler to use an external source of change notifications, e.g., Bazel passes file content checksums to the worker which could be used instead of timestamps.
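Roughly the failure mode (a simplified model, not Javac's actual change-detection code):

```python
import hashlib

# Change detection keyed on second-precision mtimes misses a rewrite that
# lands within the same second; a content checksum catches it.
def changed_by_mtime(old_mtime, new_mtime):
    # Truncate to whole seconds, as second-precision timestamp checks do.
    return int(old_mtime) != int(new_mtime)

def changed_by_checksum(old_content, new_content):
    digest = lambda data: hashlib.sha256(data).hexdigest()
    return digest(old_content) != digest(new_content)
```

This is why feeding the worker an external source of change information, such as bazel's file content checksums, is safer than letting the tool trust timestamps.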

pa...@lucidchart.com

Jun 9, 2016, 10:16:18 AM
to bazel-discuss, phi...@google.com, pa...@lucidchart.com, oscar....@gmail.com, ka...@marhubi.com, jan...@google.com, mst...@google.com
In a way, Bazel is introducing functional programming to the imperative build world. But as with functional programming, there are times when imperative has too large of benefits (often performance related) to ignore.

So, (especially as a Scala programmer ;) I understand the conundrum.

> There is an additional question of whether they can be enabled by default (and disabled on release builds?) or disabled by default and enabled on interactive builds, which is orthogonal to the first.

I'm not sure what is meant by the first option. The second sounds great.

I have no problem with a CI server compiling in whole package units. I just can't ask devs to wait minutes in between their one-line changes.

> We have a well-defined API to support incremental compilers.

> For example, we've done (hundreds of?) thousands of builds with the incremental Java compiler

I wasn't aware you used an incremental Java compiler. Do you use Ant's, or Eclipse's, or one of your own design?

Can you point to me the code?

Java shares Scala's problem that dependencies can only be determined *after* compilation (or a step just as expensive). I.e. there isn't (and can't be) a `gcc -M` equivalent that can run beforehand.

So anyway, I'm very interested in seeing what has been done with Java.

pa...@lucidchart.com

Jun 9, 2016, 1:05:58 PM
to bazel-discuss, phi...@google.com, pa...@lucidchart.com, oscar....@gmail.com, ka...@marhubi.com, jan...@google.com, mst...@google.com
> For example, we've done (hundreds of?) thousands of builds with the incremental Java compiler

This says that Bazel does *not* use an incremental Java compiler.

https://groups.google.com/d/msg/bazel-discuss/TVAJeoL8xmI/SEDJsuwbBQAJ

P. Oscar Boykin

Jun 14, 2016, 12:20:04 AM
to pa...@lucidchart.com, bazel-discuss, Philipp Wollermann, Kamal Marhubi, Janak Ramakrishnan, Michael Staib
Out of curiosity, Paul, are you using https://github.com/bazelbuild/rules_scala?
That has been pretty usable for us (at Stripe) albeit so far on a smaller repo. We do have pretty fine grained targets.

I would like to find time to add support for bazel's worker processes. Here is a first stab:

which was quite good but had a few issues:
1. uses scala 2.10, and the rules currently only support scala 2.11
2. uses zinc, and Twitter had correctness issues using zinc with pants. I think we can get a lot of win just by instantiating a fresh standard compiler in a resident process, where the JIT has already had a chance to run on the compiler's code.

pa...@lucidchart.com

Jun 15, 2016, 12:25:36 AM
to bazel-discuss, pa...@lucidchart.com, phi...@google.com, ka...@marhubi.com, jan...@google.com, mst...@google.com
> Out of curiosity, Paul, are you using:
> https://github.com/bazelbuild/rules_scala
>
> That has been pretty usable for us (at Stripe) albeit so far on a smaller repo. We do have pretty fine grained targets.

No, honestly I haven't used Bazel for Scala at all.

I just know our compilation with sbt. Even with the "worker process" optimization that sbt essentially already does, a clean build on a respectable laptop can be a couple minutes for a single project.

We have several dozen projects with 1-30 source files, a couple dozen more with 31-200 source files, and several projects with 200-600 source files. The larger ones use the Play framework, which does a fair bit with implicits. Oh, and tests (specs2) are on top of those numbers, which use even more implicits.

Sure, we can start teasing the big projects apart in non-cyclical ways. I won't say that would be a bad thing. But it's just not something devs have to worry about now. Adopting Bazel today requires either refactoring tens of KLOCs of code, or incurring massive hits to build times. I can't sell that.

> I would like to find time to add support for bazel's worker processes.

Is a worker process not subjected to the same sandboxing requirements? Is that how you used zinc?

(Also, do you know much about fsc? Is that just a worker process?)

> I think we can get a lot of win just instantiating a new standard compiler but have the jit able to have run on the code.

This would have build times similar to `sbt compile clean compile`. Nice, but not great.

P. Oscar Boykin

Jun 15, 2016, 12:34:22 AM
to Paul Draper, bazel-discuss, Philipp Wollermann, Kamal Marhubi, Janak Ramakrishnan, Michael Staib
In bazel you will almost never do a clean compile in normal development, so that is one win. So, it is not the case that you would get times comparable to `compile clean compile`. Also, when upstream code only changes implementations, not signatures, often you don't need to rebuild downstream (and the scala rules support this). We have not seen any problems trusting bazel to do the right thing, so I don't think a clean compile is the right benchmark. Even our CI is caching now, and so small changes to the repo don't take more than a minute or two for CI to run.

As for factoring the build, we can possibly use zinc to automatically build the targets. As it does analysis of the code, it finds the connected components and can output them. You can imagine a scala migration tool that you could run that would output those targets, and then make combined targets that export all of the old ones. For new targets, you could just keep the targets as small as possible. Here is a related issue:

Worker processes are something bazel supports that is basically the same as fsc: you keep a process resident. To be safe with bazel, this should be stateless from run to run; that is the difference from zinc (sbt's incremental compiler), which is not stateless, and why a clean build is sometimes needed.
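To sketch the resident-but-stateless idea (this uses a made-up line-oriented JSON protocol for illustration, not bazel's actual worker protocol):

```python
import io
import json

def handle(request):
    # A real worker would invoke the compiler on the request's arguments;
    # crucially, nothing is carried over from one request to the next.
    return {"exit_code": 0,
            "output": "compiled %d sources" % len(request["srcs"])}

def worker_loop(stdin, stdout):
    # The process stays resident (so the JIT-warmed compiler stays hot),
    # reading one JSON request per line and writing one response per line.
    for line in stdin:
        response = handle(json.loads(line))
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()

# Simulate one round-trip with in-memory streams.
out = io.StringIO()
worker_loop(io.StringIO('{"srcs": ["A.scala", "B.scala"]}\n'), out)
```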



Paul Draper

Jun 15, 2016, 11:12:48 AM
to P. Oscar Boykin, bazel-discuss, Philipp Wollermann, Kamal Marhubi, Janak Ramakrishnan, Michael Staib
I shouldn't have said `sbt compile clean compile`. Rather, `sbt my-project/compile my-project/clean my-project/compile`.

The GitHub issue is a good idea. But I'm not sure how possible that is. That's why I would really love to see how the Java incremental compiler works (could work?) with Bazel.

Ulf said:

> We have a well-defined API to support incremental compilers.
> For example, we've done (hundreds of?) thousands of builds with the incremental Java compiler

But all I see in Bazel is a non-incremental Java compiler.

ia...@stripe.com

Jun 15, 2016, 12:00:40 PM
to bazel-discuss, oscar....@gmail.com, phi...@google.com, ka...@marhubi.com, jan...@google.com, mst...@google.com, pa...@lucidchart.com
Why do that clean in the middle? The goal of bazel, I had thought, is to never do it.

That said, there are a few issues with Scala compilation vs Java where you can get more cascading builds from small changes. Scala injects runtime annotations and length info (though this one you can disable with compiler options) into the class files, which impacts the ijar signature/caching of things.

Paul Draper

Jun 15, 2016, 12:52:44 PM
to ia...@stripe.com, bazel-discuss, P. Oscar Boykin, Philipp Wollermann, Kamal Marhubi, Janak Ramakrishnan, Michael Staib
I was trying to get an idea of what the performance would be like with a Bazel worker.

* The first compile was to simulate a warmed-up worker.
* The clean (scoped to just that project) was because bazel starts each project from a "clean" build, i.e. no prior state for that project.
* The time for the second compile would then be the time I could expect from a Bazel worker process.

Trying this on the largest one of our 85 projects. (Note: this project has dependencies on other projects, though they've already been compiled, which is the usual case.)

$ sbt chart-web/compile chart-web/clean chart-web/compile
[info] Loading project definition from /home/paul/lucid/main/project/project
[info] Loading project definition from /home/paul/lucid/main/project
[info] Set current project to main (in build file:/home/paul/lucid/main/)
[info] Updating {file:/home/paul/lucid/main/}chart-web...
[info] Done updating.
[info] Compiling 688 Scala sources and 13 Java sources to /home/paul/lucid/main/chart-web/target/scala-2.11/classes...
[success] Total time: 141 s, completed Jun 15, 2016 10:29:56 AM
[success] Total time: 1 s, completed Jun 15, 2016 10:29:57 AM
[info] Updating {file:/home/paul/lucid/main/}chart-web...
[info] Done updating.
[info] Compiling 688 Scala sources and 13 Java sources to /home/paul/lucid/main/chart-web/target/scala-2.11/classes...
[success] Total time: 133 s, completed Jun 15, 2016 10:32:10 AM

So 133s for a build of this project. In contrast, most one-file changes (no clean) using sbt/zinc take 2-8s.

Granted, this project is quite large. But I can see that if I split evenly in two, I could expect 65s build times. If I split it in four, 32s, etc.

---

FYI I'm interested in not just Scala, but rather *all* incremental compilers.

For example, https://github.com/google/closure-compiler is one of Google's more mature open-source projects. It has *734* Java sources, all grouped together and built with ant's incremental compiler.

Whether it's Rust, Scala, Java, Haskell, whatever, it'd be nice to figure out a solution that will make Bazel work well with most real-world projects.

pauld...@gmail.com

Dec 18, 2016, 11:30:22 PM
to bazel-discuss, ia...@stripe.com, oscar....@gmail.com, phi...@google.com, ka...@marhubi.com, jan...@google.com, mst...@google.com, pa...@lucidchart.com
Apologies for dredging this up, but I wanted to get clarification on two points:

> We have a well-defined API to support incremental compilers.

What is this API?

> For example, we've done (hundreds of?) thousands of builds with the incremental Java compiler, and we haven't encountered any correctness issues recently. However, back when I experimented with it, Javac was using second-precision timestamps to detect file changes, even on file systems that actually support sub-second timestamps, and this was causing issues in some scenarios. Before enabling incremental Java compilation by default, it would be good to check whether that specific issue was fixed or change the Java compiler to use an external source of change notifications, e.g., Bazel passes file content checksums to the worker which could be used instead of timestamps.

Everything I read says that Bazel has no Java incremental compilation, beyond individual libraries.

https://groups.google.com/forum/#!searchin/bazel-discuss/incremental$20java%7Csort:relevance/bazel-discuss/TVAJeoL8xmI/gM2yf3PEBAAJ
https://groups.google.com/forum/#!searchin/bazel-discuss/incremental$20java|sort:relevance/bazel-discuss/qr2wbm6pitg/Kk1xH6WXEwAJ

What's the story?

Philipp Wollermann

Dec 19, 2016, 12:49:24 AM
to pauld...@gmail.com, bazel-discuss, ulf...@google.com, ia...@stripe.com, oscar....@gmail.com, ka...@marhubi.com, jan...@google.com, mst...@google.com, pa...@lucidchart.com
+Ulf Adams Can you confirm what you wrote in an earlier mail on this thread about incremental compilers?

I'm not aware of any such support myself, besides allowing persistent workers to keep state in memory between work requests. This is proven to help with performance, for example by not having to parse unchanged input files over and over again.

The more traditional approach of feeding the output files of the last run of an action as additional inputs to the next run is something that AFAIK Bazel doesn't support.

There have been requests for that in the last years, but we've always been wary of adding this feature. We once had something like that for incremental linking, I think, but it was removed because it made the code very complex, hard to reason about correctness and didn't help much in the end.


Ulf Adams

Dec 20, 2016, 4:07:31 AM
to Philipp Wollermann, pauld...@gmail.com, bazel-discuss, ia...@stripe.com, P. Oscar Boykin, Kamal Marhubi, Janak Ramakrishnan, Michael Staib, Paul Draper

Apologies for the terminological confusion. The original prototype of Javac workers that I worked on was checking timestamps of source files, and I think it was caching parsed source files (ASTs?) as well, but it's possible that I misunderstood what Javac was doing internally or that that part got lost in the productionization of the original prototype.

In any case, workers are not sandboxed right now (right?), so they can technically squirrel files off to the side if they wanted to. Now, since we want to sandbox workers, maybe we should make that more explicit - one way or another?

That said: What are we trying to achieve here? Good performance for Scala builds? If that's the goal, then we should look at that more specifically to see what exactly is needed to make Scala builds perform well, and - in particular - set up at least one benchmark that we can measure and play with to see what the impact of the various options is. Ultimately, if making Scala compile fast requires making certain assumptions more explicit, or even providing a new API, then I'm all for it.

However, I'm against trying to come up with a framework if we haven't even looked at a specific use case or three (three would be better), or providing APIs without first collecting actual data that demonstrates the need for said APIs.
 

On Mon, Dec 19, 2016 at 5:30 AM <pauld...@gmail.com> wrote:
Apologies for dredging this up, but I wanted to get clarification on two points

> We have a well-defined API to support incremental compilers.

What is this API?

> For example, we've done (hundreds of?) thousands of builds with the incremental Java compiler, and we haven't encountered any correctness issues recently. However, back when I experimented with it, Javac was using second-precision timestamps to detect file changes, even on file systems that actually support sub-second timestamps, and this was causing issues in some scenarios. Before enabling incremental Java compilation by default, it would be good to check whether that specific issue was fixed or change the Java compiler to use an external source of change notifications, e.g., Bazel passes file content checksums to the worker which could be used instead of timestamps.

Everything I read says that Bazel has no Java incremental compilation, beyond individual libraries.

https://groups.google.com/forum/#!searchin/bazel-discuss/incremental$20java%7Csort:relevance/bazel-discuss/TVAJeoL8xmI/gM2yf3PEBAAJ
https://groups.google.com/forum/#!searchin/bazel-discuss/incremental$20java|sort:relevance/bazel-discuss/qr2wbm6pitg/Kk1xH6WXEwAJ

What's the story?
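On the checksum-vs-timestamp point quoted above: if Bazel passed file content digests to the worker, change detection would no longer depend on filesystem timestamp granularity at all. A small sketch of that idea (the digest maps stand in for whatever Bazel would actually pass; the function name is invented):

```python
def changed_files(prev_digests, cur_digests):
    """Detect changes by comparing content digests from the previous and
    current request, rather than mtimes; second-precision timestamps
    never enter the picture. Both arguments map path -> digest."""
    changed = set()
    for path, digest in cur_digests.items():
        if prev_digests.get(path) != digest:
            changed.add(path)      # new file, or content actually changed
    deleted = set(prev_digests) - set(cur_digests)
    return changed, deleted
```

Two saves within the same second, or a touch that preserves content, are handled correctly, which is exactly where mtime-based detection falls over.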


pauld...@gmail.com

Dec 22, 2016, 3:19:23 AM
to bazel-discuss, phi...@google.com, pauld...@gmail.com, ia...@stripe.com, oscar....@gmail.com, ka...@marhubi.com, jan...@google.com, mst...@google.com, pa...@lucidchart.com
> In any case, workers are not sandboxed right now (right?), so they can technically squirrel files off to the side if they wanted to. Now, since we want to sandbox workers, maybe we should make that more explicit - one way or another?

My current plan, FYI, was to disable sandboxing, run zinc (https://github.com/typesafehub/zinc), and squirrel away files. (Or I guess I could keep sandboxing and keep state in memory.)

> That said: What are we trying to achieve here? Good performance scala builds?

Yes. Specifically, good performance after making small changes and building locally -- i.e., the developer edit-compile-test cycle. CI builds can take longer; I don't care about those.

> If that's the goal, then we should look at that more specifically to see what exactly is needed to make scala builds perform well

FYI, there are two major projects that improve on scalac performance: fsc and zinc. fsc runs as a daemon and caches state (I'm not sure about the details). zinc monitors file/class dependencies during compilation and writes the details to files. The latter has a number of sophisticated optimizations, like optimistic compilation (recompile just the known changes, possibly adding to that set and starting again) and public-interface hashing (I think Buck does this too). It is the compiler used by sbt, the most popular build tool for Scala projects.
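The "optimistic compilation" loop can be sketched roughly like this (the helper names are invented; zinc's real algorithm tracks much finer-grained information):

```python
def incremental_recompile(initially_changed, compile_fn, dependents_of):
    """Recompile only the known-changed files; if compiling them changes
    their public interfaces, add their dependents to the set and repeat.

    compile_fn(files)  -> set of files whose public interface changed
    dependents_of(f)   -> iterable of files that depend on f
    """
    to_compile = set(initially_changed)
    compiled = set()
    while to_compile:
        interface_changed = compile_fn(to_compile)
        compiled |= to_compile
        invalidated = set()
        for f in interface_changed:
            invalidated |= set(dependents_of(f))
        to_compile = invalidated - compiled   # grow the set and go again
    return compiled
```

If no public interface changes, the loop stops after one round, which is why interface hashing and optimistic recompilation work so well together.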

Martin Odersky, the creator of Scala and scalac, provides some insights about Scala compile times: http://stackoverflow.com/questions/3490383/java-compile-speed-vs-scala-compile-speed/3612212#3612212

> and - in particular - set up at least one benchmark that we can measure and play with to see what the impact of the various options is. Ultimately, if making scala compile fast requires making certain assumptions more explicit, or even providing a new API, then I'm all for it.

> However, I'm against trying to come up with a framework if we haven't even looked at a specific use case or three (three would be better), or providing APIs without first collecting actual data that demonstrates the need for said APIs.

Very fair.

I could help here, given some more detail. A project that comes to mind is the Play framework (https://github.com/playframework/playframework). It's the most popular Scala web framework and includes Java support as well. It's already organized into a couple dozen projects and builds with sbt.

Are you looking to see what happens if we use Bazel on this project -- e.g., if we change one line, how long does compilation take?

---

While concrete examples are invaluable, I want to point out that this isn't an issue unique to Scala. Currently, Bazel has very good support for Google's languages, like Go, Java, and C/C++. These either compile rather quickly (Go, Java) or have easily determined per-file relationships (C/C++).

There are languages like Scala and Rust (and maybe Swift?) that take much longer to compile, mostly because of type inference. These already have stateful compilers that could greatly improve the usability of Bazel.

P. Oscar Boykin

Dec 22, 2016, 12:00:54 PM
to pauld...@gmail.com, bazel-discuss, phi...@google.com, ia...@stripe.com, ka...@marhubi.com, jan...@google.com, mst...@google.com, pa...@lucidchart.com
The current Scala rules do support the worker strategy, so performance should be as good as fsc ever was.

Zinc does have correctness issues. Pants uses zinc, and at Twitter we occasionally had to clear all the cached state. This was really confusing for junior developers: either you clear the cache on every failed compilation to be safe, or you need good intuition for what smells like a cache bug. I'm not anxious to return to that strategy.

The current Scala rules also do interface hashing, as you mention. The problem is that Scala adds quite a lot of metadata about Scala types to what is considered the interface. One major addition is inlining information, which includes the method length. This means almost any change will invalidate the hash.

Secondly, things that are "private" to Scala are often not private as far as Java is concerned, so changes to them still show up in the interface.

At Stripe, one approach that gives good performance is to have essentially one target per file and use workers. This gives under 1 second per small change in most cases. A second trick is to pass scalac the option to omit Scala inlining information and just let the JVM handle inlining:

-Yskip-inline-info-attribute

With this flag, many more changes leave the public interface unchanged.
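A toy model of why dropping that attribute helps (the dict-shaped "interface" here is invented for illustration; scalac's pickled signature data is far richer):

```python
import hashlib
import json

def interface_hash(methods, include_inline_info=True):
    """methods: name -> {"sig": ..., "body_len": ...}.

    With inlining info included, the method body length is effectively
    part of the 'interface', so any body edit changes the hash and
    invalidates all dependents. Without it, only signatures matter."""
    view = {}
    for name, m in methods.items():
        entry = {"sig": m["sig"]}
        if include_inline_info:
            entry["body_len"] = m["body_len"]  # stands in for inlining metadata
        view[name] = entry
    blob = json.dumps(view, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

Editing a method body changes the hash in the first mode but not the second, so far fewer downstream recompiles are triggered.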

If I were going to work on faster builds for scala for bazel I would work on three things:

0) Build a zinc rule that uses zinc to automatically create minimal build files for a directory. In this way, the user only manages the dependencies of one directory, but periodically regenerates static targets that are minimized. This may make the tiny-build-target approach more palatable.

1) A better implementation of interface hashing that understands more Scala and can ignore more changes than the default ijar does. This might be a change to ijar itself, so we don't need a separate tool, or it might be a fork.

2) A PR for Bazel so builds are not just a function from source to output, but also pass an optional state:

(Source, Option[State]) => (Output, State)

We would reset the state any time the dependencies changed, but not after a source change. This would let us use zinc, or something like it, while Bazel ensures the state is only local to a particular view of the dependencies; we trust the compiler to get it right when monitoring the set of source files.
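In Python terms, that reset rule might look like the following (all names here are invented; this is a model of the idea, not a proposed Bazel API):

```python
def build(sources, deps_fingerprint, prev, compile_fn):
    """Model of (Source, Option[State]) => (Output, State).

    prev is (deps_fingerprint, state) from the last run, or None. If the
    dependency fingerprint changed, the state is discarded and we do a
    clean compile; on a mere source change, the compiler may reuse it."""
    if prev is not None and prev[0] == deps_fingerprint:
        state = prev[1]   # same dependency view: trust the compiler's tracking
        mode = "incremental"
    else:
        state = {}        # deps changed (or first build): reset the state
        mode = "clean"
    output = compile_fn(sources, state, mode)
    return output, (deps_fingerprint, state)
```

The key invariant is that stale analysis can never survive a dependency change, so Bazel's correctness guarantees only have to trust the compiler about its own source set.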

As a side note, one of my colleagues wrote an excellent blog post about some proof-of-concept work he has been doing on making a faster Scala compiler:

https://medium.com/@gkossakowski/kentucky-mule-limits-of-scala-typechecking-speed-6a44bd520a2f#.70m4lwoiy

His work suggests there could still be a lot of fruit in optimizing scalac (though possibly through some serious refactors).

pauld...@gmail.com

Dec 22, 2016, 2:54:45 PM
to bazel-discuss, pauld...@gmail.com, phi...@google.com, ia...@stripe.com, ka...@marhubi.com, jan...@google.com, mst...@google.com, pa...@lucidchart.com
> 0) build a zinc rules that could use zinc to automatically create minimal build files for a directory. In this way, the user only manages he dependencies of one directory but then periodically regenerates static targets that are minimized. This may make the tiny build target approach more palatable.

This requires that you generate rules without cyclical dependencies, no? And the same applies to your approach at Stripe?

P. Oscar Boykin

Dec 22, 2016, 8:30:05 PM
to pauld...@gmail.com, bazel-discuss, phi...@google.com, ia...@stripe.com, ka...@marhubi.com, jan...@google.com, mst...@google.com, pa...@lucidchart.com
Yes. Everything with a cyclic dependency must be in a single target.
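Concretely, a rule generator would compute the strongly connected components of the file-level dependency graph and emit one target per component, since members of a cycle must be compiled together. A sketch using Tarjan's algorithm (not part of any existing rules):

```python
def strongly_connected_components(graph):
    """graph: node -> list of nodes it depends on. Returns a list of
    SCCs (sets of nodes); each SCC would become a single target."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs
```

Files A and B that depend on each other land in one component (one target), while a file that merely depends on them gets its own.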