pluto, a build system inspired by Shake

Franklin Chen

Nov 13, 2015, 1:29:31 PM
to Shake build system
I'm a Shake user. I thought I'd post a link to a paper about Pluto, a new build system inspired by Shake, for those who are interested in build systems: http://www.informatik.uni-marburg.de/~seba/publications/pluto-incremental-build.pdf

Luke

Dec 2, 2015, 5:20:08 PM
to Shake build system
Interesting concept. Thanks for posting. Something I noticed, though, is that the paper lists Shake as supporting only timestamping. Shake in fact supports hashing and file existence as well (I think these are the same as the stampers Pluto provides). I'm not sure whether Shake allows custom-defined stampers, though, which seems to be the deeper Pluto concept.

Neil Mitchell

Dec 3, 2015, 4:49:25 PM
to Luke, Shake build system
Thanks for the link. I had been reading that paper already, but only
got halfway through it, and was going to share my thoughts once I
finished it. That's unlikely to happen for a few weeks, so I'll share
some informal half-written notes here and do the full thing properly
later. As these are half-written notes, they should not be used to
cause offense, or for serious decision making, only for discussion
starting points.

Firstly, it's good to see people thinking about build systems and
writing papers; that way leads to better theory, and I'd love to steal
anything good from the paper for Shake :). Alas, I didn't find
anything that looks useful and suitable yet, but there are interesting
ideas.

By default Shake's file rule uses timestamps, but it can also be
configured to use timestamps plus hashes (with efficiency gains if the
timestamp hasn't changed), or hashes on input files only. Note that
their comments were correct when the Shake paper was written, although
even that paper said you could trivially add hashing on the outside,
so they were only ever somewhat accurate. Shake is parameterisable
(the file rule could be written on the outside), so you can have any
strategy you like. In practice, once you have timestamping and
unchanging files (so a rule can run without changing its output's
timestamp, which stops the rebuild propagating), you have the same
power as any hashing/equality scheme you like: just depend on a stamp
file instead of the real input. Consequently, the fact that Shake has
hashing built in is just a UI convenience, but a useful one. Once I
generalise string rules
(https://github.com/ndmitchell/shake/issues/192), you'll be able to
declare your own rules more easily, working with the traditional
need/want style but with a different equality. I doubt anyone will
want to, though, since a stamp/hash file is simpler and more convenient.
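
The stamp-file trick can be sketched in a few lines. This is a minimal Python model, not Shake code. It assumes a toy in-memory file system with a logical clock in place of real modtimes, and FS, update_stamp, build_output and build are hypothetical names. The output rule only ever compares timestamps. Because the stamp is an unchanging file, though, the rule reruns only when the input's hash changes.

```python
import hashlib


class FS:
    """Toy in-memory file system: path -> (contents, modtime from a logical clock)."""

    def __init__(self):
        self.files = {}
        self.clock = 0

    def write(self, path, contents):
        # Every write bumps the modtime, like touching a real file
        self.clock += 1
        self.files[path] = (contents, self.clock)

    def write_if_changed(self, path, contents):
        # "Unchanging file": keep the old modtime if the contents are identical
        old = self.files.get(path)
        if old is None or old[0] != contents:
            self.write(path, contents)

    def read(self, path):
        return self.files[path][0]

    def modtime(self, path):
        return self.files[path][1]


def update_stamp(fs, src, stamp):
    # Always runs, but only touches the stamp when the hash of src changes
    digest = hashlib.sha256(fs.read(src).encode()).hexdigest()
    fs.write_if_changed(stamp, digest)


def build_output(fs, stamp, src, out, runs):
    # Purely timestamp-based rule: it depends on the stamp, not on src itself
    if out not in fs.files or fs.modtime(stamp) > fs.modtime(out):
        runs.append(out)
        fs.write(out, fs.read(src).upper())


def build(fs, runs):
    update_stamp(fs, "a.txt", "a.stamp")
    build_output(fs, "a.stamp", "a.txt", "a.out", runs)


fs = FS()
runs = []
fs.write("a.txt", "hello")
build(fs, runs)              # first build: a.out is created
fs.write("a.txt", "hello")   # touched, same contents
build(fs, runs)              # stamp unchanged, so no rebuild
fs.write("a.txt", "world")
build(fs, runs)              # hash changed, so a.out is rebuilt
print(runs)                  # ['a.out', 'a.out']
```

Touching a.txt without changing its contents leaves a.stamp's timestamp alone, so a.out is not rebuilt. Editing a.txt rewrites the stamp and triggers exactly one rebuild.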

They discuss at length that their build system automatically rebuilds
if the rules change, but in my opinion that's inaccurate. If you put
each Shake rule in a separate file, and disallowed all abstraction
between rules, then Shake could have that property with the same
limitations/benefits (each rule would just need its own source file at
the top). In fact, I follow that strategy in key places where it seems
worthwhile (https://github.com/ndmitchell/shake/blob/master/src/Test/Docs.hs#L42).
The redo build system also takes that approach. As a general approach,
though, I consider it useless in practice: eliminating shared
abstractions and segregating rules destroys the build system, paying a
greater cost than the benefit gained. It treats whitespace changes in
the rules as rule-breaking changes, which is inevitable with this
approach, and horrible. It's basically a gentle refinement over
depending on the whole Makefile in every rule, although the fact that
it is automatic brings some benefit.
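
Here is a rough model of the "each rule in its own file" strategy. It assumes a hypothetical database that maps each target to a hash of its rule's literal text. The name stale_by_rule_text is illustrative only, not an API from Shake, redo or Pluto.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode()).hexdigest()


def stale_by_rule_text(db, target, rule_text):
    """Per-rule-file strategy: the target depends on the literal text of its rule.

    db maps each target to the digest of the rule text used for its last build.
    Returns True (and records the new digest) if the target must be rebuilt.
    """
    new = digest(rule_text)
    changed = db.get(target) != new
    db[target] = new
    return changed


db = {}
print(stale_by_rule_text(db, "a.o", "$(CC) -c a.c"))   # True: never built
print(stale_by_rule_text(db, "a.o", "$(CC) -c a.c"))   # False: text unchanged
print(stale_by_rule_text(db, "a.o", "$(CC)  -c a.c"))  # True: whitespace only
```

The third call reports a rebuild purely because of an extra space. Now suppose $(CC) were defined in a shared file that this rule only refers to. Changing that definition would leave the rule text and its hash untouched, so nothing would rebuild. That is why the approach has to forbid abstraction between rules.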

In contrast, Ninja, Fabricate and Tup all get auto rebuilding exactly
right. If the build changes in a meaningful way, they identify the
subset that changed, after seeing through all abstraction, and then
rebuild the right pieces. The trade-off for Ninja and Tup is that all
actions must be single-line shell commands. The trade-off for
Fabricate is that it's a forward build system, unlike every other
build system you know, with certain conceptual trade-offs. Well,
that's not true if you include the Forward build mode in Shake HEAD,
but I'll talk about that after the release ;) For info, Shake's Forward
mode will have auto rebuilding as required. Note that Make doesn't
track changes to the build system, but that's an implementation issue;
it certainly could.
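
The Ninja/Tup approach can be sketched by keying each target on its fully expanded command line. That way, a change made through a shared abstraction is still noticed, and only the affected targets rerun. This is a minimal Python model, not Ninja's actual log format. It assumes single-line command templates, with Python str.format placeholders standing in for build variables. plan_rebuild is a hypothetical name, and the sketch ignores input-file changes to focus on rule changes.

```python
import hashlib


def expand(template, variables):
    # Resolve shared abstractions (variables) into the concrete command line
    return template.format(**variables)


def plan_rebuild(log, rules, variables):
    """Return targets whose expanded command differs from the one recorded in log,
    updating log as if those commands had been rerun."""
    to_run = []
    for target, template in sorted(rules.items()):
        cmd = expand(template, variables)
        h = hashlib.sha256(cmd.encode()).hexdigest()
        if log.get(target) != h:
            to_run.append(target)
            log[target] = h
    return to_run


rules = {"a.o": "{cc} -c a.c", "b.o": "{cc} -c b.c", "gen.h": "python gen.py"}
log = {}
print(plan_rebuild(log, rules, {"cc": "gcc -O2"}))  # ['a.o', 'b.o', 'gen.h']
print(plan_rebuild(log, rules, {"cc": "gcc -O2"}))  # []
print(plan_rebuild(log, rules, {"cc": "gcc -O3"}))  # ['a.o', 'b.o']
```

Changing the shared cc variable reruns exactly the two compile commands that use it and leaves gen.h alone. This is the "seeing through all abstraction" behaviour, bought by restricting every action to a single command string.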

I think Section 3.1 indicates why stateful languages are problematic:
essentially, they recreate the build rules afresh for each file so they
don't inadvertently mutate state. I'm glad not to have that problem!

Their build rules seem tied to files, which is a shame, since with
their parametric Stamper they could have generalised them. Having
doesFileExist/getDirectoryFiles tracked, as Shake does, is convenient.
Alas, their lack of such tracking means that their build system doesn't
really guarantee clean builds.
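
Tracking these queries amounts to recording each question a rule asks of the file system, along with the answer, and re-asking those questions on the next build. Below is a minimal Python sketch that assumes a toy file system represented as a set of paths. Tracker, does_file_exist, get_directory_files and still_valid are illustrative names that loosely mirror Shake's doesFileExist and getDirectoryFiles. They are not Shake's implementation.

```python
from fnmatch import fnmatchcase


def _glob(files, pattern):
    return tuple(sorted(f for f in files if fnmatchcase(f, pattern)))


class Tracker:
    """Records each file-system query a rule makes, together with its answer."""

    def __init__(self, files):
        self.files = files      # toy file system: a set of paths
        self.queries = []       # (kind, argument, answer) triples

    def does_file_exist(self, path):
        answer = path in self.files
        self.queries.append(("exists", path, answer))
        return answer

    def get_directory_files(self, pattern):
        answer = _glob(self.files, pattern)
        self.queries.append(("glob", pattern, answer))
        return list(answer)


def still_valid(queries, files):
    """Replay recorded queries against the current files; any changed answer forces a rebuild."""
    for kind, arg, answer in queries:
        current = (arg in files) if kind == "exists" else _glob(files, arg)
        if current != answer:
            return False
    return True


files = {"a.c", "b.c", "notes.txt"}
t = Tracker(files)
print(t.get_directory_files("*.c"))             # ['a.c', 'b.c']
print(still_valid(t.queries, files))            # True
print(still_valid(t.queries, files | {"c.c"}))  # False: new source file
```

Without the recorded glob, adding c.c would leave the previous build looking up to date, which is exactly how an untracked directory listing breaks clean builds.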

Their Latex example shows that it can track dependencies that are only
discovered later, but they don't mention that this is equivalent to
Shake's needed (I just used a file, which cannot be a file generated by
the build system), as opposed to need (build it now, before I use it).
I suspect this means their Latex rules aren't generally correct, and
thus aren't compositional.
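
To make the need/needed distinction concrete, here is a toy Python model. It assumes a build in which some files have rules (generated files) and the rest are sources. ToyBuild is a hypothetical class, and the check in needed is a simplification of what Shake checks. need brings a generated file up to date before recording it. needed records a dependency on a file that was already read, which is only sound if the build won't regenerate that file afterwards.

```python
class ToyBuild:
    """Minimal model of the need/needed distinction (not Shake's implementation)."""

    def __init__(self, rules):
        self.rules = rules   # generated target -> function computing its contents
        self.built = {}      # generated targets already brought up to date
        self.deps = []       # dependencies recorded by the current rule

    def need(self, path):
        # Build first (if the build generates path), then record the dependency
        if path in self.rules and path not in self.built:
            self.built[path] = self.rules[path]()
        self.deps.append(path)

    def needed(self, path):
        # Record a dependency on a file that has already been used.
        # Only safe if the build will not (re)generate it afterwards.
        if path in self.rules and path not in self.built:
            raise RuntimeError(path + " is generated by the build; need it before use")
        self.deps.append(path)


b = ToyBuild({"gen.tex": lambda: "generated"})
b.needed("figure.pdf")   # fine: a source file, never generated
b.need("gen.tex")        # builds gen.tex, then records it
print(b.deps)            # ['figure.pdf', 'gen.tex']
```

Discovering dependencies after the fact is only sound for files like figure.pdf. Calling needed on a file that a rule still has to produce raises, which is the composition problem described above.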

Their notion of fixed points in the build system is interesting. As
far as I can tell, only Latex requires this crazy behaviour, and even
then, there's no guarantee that Latex reaches a fixed point. Latex is
uniquely weird in many respects, not least that by default its error
messages ask you on stdin for the fix. I think emulating this outside
the build system is much simpler, see
http://stackoverflow.com/questions/14622169/how-to-write-fixed-point-build-rules-in-shake-e-g-latex.
In practice, though, of the two build systems I've written that build
Latex, one always runs it 3 times (or 4 if there is a bibtex run), and
the other runs a custom tool in MiKTeX that does it the right number
of times but appears as a single command. If anyone has a compelling
example of baking it inside the build system, I'd be interested.
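
One way to emulate the fixed point outside the build system is to rerun until the .aux output stops changing, with a cap because convergence isn't guaranteed. The Python sketch below assumes a hypothetical step function that runs LaTeX once and returns the new .aux contents. fake_latex is a made-up stand-in. This is a generic loop, not the code from the linked Stack Overflow answer.

```python
_MISSING = object()  # sentinel so a first output of None is not mistaken for convergence


def run_to_fixed_point(step, max_runs=5):
    """Call step() until it returns the same output twice in a row.

    Returns (output, number_of_runs); raises if no fixed point is reached,
    because nothing guarantees that LaTeX converges.
    """
    previous = _MISSING
    for runs in range(1, max_runs + 1):
        current = step()
        if current == previous:
            return current, runs
        previous = current
    raise RuntimeError("no fixed point after %d runs" % max_runs)


aux = {"refs": 0}


def fake_latex():
    # Hypothetical stand-in: each run resolves one more level of references
    aux["refs"] = min(aux["refs"] + 1, 2)
    return aux["refs"]


print(run_to_fixed_point(fake_latex))  # (2, 3)
```

From the build system's point of view, this whole loop is just one rule. That keeps the build graph acyclic.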

I couldn't tell if they have unchanging files. It's not mentioned.

I challenge their claim to be the only precise Latex builder, since
you can replicate the same heuristics in Shake quite easily, and they
are still heuristics: they are "seeing inside" Bibtex files, but
ignoring things like irrelevant whitespace/command changes in .tex
files. It's a case of how far you want to go, so I don't think there
is any notion of precise once you start opening the black box that is
the bibtex/latex process.

They are single threaded. Ouch. I assume that's an implementation
limitation, since I can't see anything in their model that
necessitates it. Of course, multithreading in Haskell is
embarrassingly easy; sometimes I feel like it's cheating.

In summary, the things it adds over Shake are:

* Custom stampers, but Shake already supports that, albeit with a
more complex user interface if you genuinely want to use different
stamps per rule.

* Metadependencies, i.e. dependencies on the build rules themselves. A
definite weakness of Shake, and one I suggest is mostly tackled with
Metadata
(http://ndmitchell.com/downloads/slides-defining_your_own_build_system_with_shake-09_oct_2015.pdf).
I don't think their solution is the right one. I think the Fabricate
one is way nicer, which is why I'm adding it to Shake.

* Cycles: I'm not convinced it's useful to build them in, or that their
approach is safe, but they can be encoded outside if you want.

Thanks, Neil