binaries with different package lists

Evan Laforge

Apr 28, 2016, 10:24:33 PM
to Shake build system
I've been building my project with shake for a while. It works like this:
I have multiple HsBinary definitions, each of which just has a name and a
main module.
It gets the transitive imports, converts them to *.hs.o and needs them, and
then links. The *.hs.o just runs ghc, with some extra wrinkles like looking
for a *.hsc and needing the .hs file if so, etc. Both compiling and linking
use a single global list of packages that is enough to make all the binaries
happy. A cabal rule uses the global package list to emit a .cabal file, which
can then be used with cabal install --only-dependencies to get all the deps.
There are also rules to build haddock, collect *_test.hs and *_profile.hs
modules and build test binaries, etc.
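
For concreteness, here's a rough sketch of that setup; the names here
(HsBinary, globalPackages, oRule) are stand-ins, not my real definitions:

import Development.Shake
import Development.Shake.FilePath

data HsBinary = HsBinary
    { hsName :: FilePath -- ^ binary to produce
    , hsMain :: FilePath -- ^ main module, as a FilePath
    }

-- one list of packages, big enough to keep every binary happy
globalPackages :: [String]
globalPackages = ["base", "containers", "mtl"]

packageFlags :: [String]
packageFlags = concat [["-package", p] | p <- globalPackages]

-- the single o-rule: every module compiles against the global list
oRule :: Rules ()
oRule = "//*.hs.o" %> \out -> do
    let src = dropExtension out -- A/B.hs.o -> A/B.hs
    need [src]
    cmd_ "ghc -c" [src] packageFlags ["-o", out]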

This has all worked well enough for a long time, but I recently added some
binaries with some heavyweight package dependencies and I want to be able to
turn those off. It seems like this inevitably leads in the direction of
needing to declare library targets in addition to binary targets, because
now there isn't a one-size-fits-all package list. This seems to lead to a
fundamentally different design, in that I would have e.g.:

data Library = Library
{ libSrcs :: [FilePath]
, libPackages :: [Package]
, libDeps :: [Library]
} deriving (Show)

data Binary = Binary
{ binName :: FilePath
, binMain :: FilePath -- ^ main module, as a FilePath
, binDeps :: [Library] -- ^ link packages from these deps
}

Then instead of a single *.hs.o rule, I find the wanted Binary, then need its
libraries, which in turn make a specific o-rule for each of its sources, in
order to give it the right package list. The global things like tests and
haddock would take the union of the declared sources and their packages for the
enabled binaries.
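
As a sketch, assuming type Package = String, the per-library rules might
look something like this:

import Control.Monad (forM_)
import Data.List (nub)
import Development.Shake
import Development.Shake.FilePath

type Package = String

-- packages of this library plus everything it depends on, transitively
allPackages :: Library -> [Package]
allPackages lib = nub (libPackages lib ++ concatMap allPackages (libDeps lib))

-- one specific o-rule per source file, carrying the library's package list
libraryRules :: Library -> Rules ()
libraryRules lib = forM_ (libSrcs lib) $ \src ->
    (src <.> "o") %> \out -> do
        need [src]
        cmd_ "ghc -c" [src]
            (concat [["-package", p] | p <- allPackages lib])
            ["-o", out]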

Previously I would just add a source file and it would be compiled as soon as
someone imported it, but with this approach I'd now have to explicitly put that
source file in a Library target so I know what packages it needs. Of course I
can also infer that by looking at the imports, but it doesn't seem like such a
big convenience since I actually want to know via a build failure if a binary
imported the wrong thing and suddenly got a heavyweight new dependency.

I've been reluctant to take this step because it's just simpler (and maybe
faster?) to have a single o-rule and infer all dependencies. However, as far
as I can tell, the moment I want to not have a single global package list,
I'm basically forced into this approach, or a messier moral equivalent, like
inferring the package list based on the path.

It's also not lost on me that if I then put those targets in separate files
I wind up with something much like Blaze / Bazel... which is at least a proven
way to scale. And on the plus side I'd perhaps wind up with a reusable core.

The other way would be the "recursive shakefile" with independent shakefiles
for each library target, but surely it's better to express dependencies in
haskell and shake rather than haskell and shake, then fork()/exec(), then
haskell and shake again.

But it also makes me think that surely someone else has used shake to build
something without a global package list, and if so, did they wind up with
the explicit Library target approach? Something else entirely?

Thanks!

Neil Mitchell

Apr 29, 2016, 9:11:24 AM
to Evan Laforge, Shake build system
Hi Evan,

> It gets the transitive imports

How does it get the transitive imports? Does it `ghc -M` to produce a
Makefile? Or scan through looking for lines beginning with import?

> This has all worked well enough for a long time, but I recently added some
> binaries with some heavyweight package dependencies and I want to be able to
> turn those off. It seems like this inevitably leads in the direction of
> needing to declare library targets in addition to binary targets, because
> now there isn't a one-size-fits-all package list. This seems to lead to a
> fundamentally different design

Given it sounds like your existing design works well for you, I'd
consider what you can keep for now. The packages a file uses are
entirely driven by its imports. If you are using `ghc -M` there is a
flag -include-pkg-deps which will give you exactly what packages a
module imports. If you are using import scanning, you could have a
list of modules that imply the "heavy" modules - e.g. if something
imports Network.Wai.Handler.Warp it must depend on the warp package.
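
E.g. something like this, where the table itself is entirely made up:

import Data.List (isPrefixOf, nub)

-- modules whose import implies a heavyweight package; illustrative only
heavyModules :: [(String, String)] -- (module prefix, implied package)
heavyModules =
    [ ("Network.Wai.Handler.Warp", "warp")
    , ("Graphics.UI.Gtk", "gtk")
    ]

impliedPackages :: [String] -> [String]
impliedPackages imports =
    nub [pkg | (prefix, pkg) <- heavyModules
             , imp <- imports
             , prefix `isPrefixOf` imp]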

> Previously I would just add a source file and it would be compiled as soon as
> someone imported, but with this approach I'd now have to explicitly put that
> source file in a Library target so I know what packages it needs.

The imports already specify precisely what imports what, and hence which
packages are needed. If you have to list that information twice you end
up with lots of boilerplate. I usually infer what I can, which keeps
changes more localised.

> Of course I
> can also infer that by looking at the imports, but it doesn't seem like such a
> big convenience since I actually want to know via a build failure if a binary
> imported the wrong thing and suddenly got a heavyweight new dependency.

For each binary you could have a list of packages it uses, and then
use that explicitly when linking. That way, if individual files use more
packages than declared they still compile, but the binary fails at link time.
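
As a sketch, with a hypothetical per-binary package list:

import Development.Shake

-- link against only the declared packages; an undeclared heavyweight
-- dependency then surfaces as an undefined-symbol error at link time
linkRule :: FilePath -> [FilePath] -> [String] -> Rules ()
linkRule bin objs pkgs = bin %> \out -> do
    need objs
    cmd_ "ghc -hide-all-packages"
        (concat [["-package", p] | p <- pkgs])
        ["-o", out] objs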

> The other way would be the "recursive shakefile" with independent shakefiles
> for each library target, but surely it's better to express dependencies in
> haskell and shake rather than haskell and shake, then fork()/exec(), then
> haskell and shake again.

I wouldn't take that route, for all the reasons that Recursive Make is
Considered Harmful (see http://aegis.sourceforge.net/auug97.pdf). If
something can't see the dependencies it will inevitably go slower or
have the wrong parallelism.

> But it also makes me think that surely someone else has used shake to build
> something without a global package list, and if so, did they wind up with
> the explicit Library target approach? Something else entirely?

I've done this once. I went with the approach where each package has a
directory containing its source files, and there is metadata saying
which packages each package depends on. From looking up the directory
the source file is in you can construct the package list. It works
pretty well - no duplication, no unexpected dependencies.
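
Roughly, with a made-up metadata table:

import Data.List (isPrefixOf)

-- which packages each source directory may use; illustrative only
dirPackages :: [(FilePath, [String])]
dirPackages =
    [ ("Core",   ["base", "containers"])
    , ("Server", ["base", "containers", "warp"])
    ]

-- construct the package list for a file from the directory it lives in
packagesFor :: FilePath -> [String]
packagesFor src =
    concat [pkgs | (dir, pkgs) <- dirPackages, (dir ++ "/") `isPrefixOf` src]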

Thanks, Neil

Evan Laforge

May 2, 2016, 9:03:48 PM
to Neil Mitchell, Shake build system
Sorry about the late reply, I just wanted to make sure things would
actually work out.

On Fri, Apr 29, 2016 at 6:11 AM, Neil Mitchell <ndmit...@gmail.com> wrote:
> Hi Evan,
>
>> It gets the transitive imports
>
> How does it get the transitive imports? Does it `ghc -M` to produce a
> Makefile? Or scan through looking for lines beginning with import?

The latter. I don't remember whether, at the time, I thought it would be
faster than ghc -M or simply didn't know about it. Given a Map
ModuleName Package of course I can also get the packages required.
That map is available from ghc-pkg, though I might have to cache that,
and rebuild manually whenever the package situation changes. Though,
as I actually want the explicit list as discussed below, I don't think
I need to bother with all that.
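
For the record, if I did want that map, a sketch of one way to ask
ghc-pkg (one process per module, so it would want caching, e.g. via
Shake's newCache):

import Development.Shake

-- which package exposes a module? Nothing means it's a local module
packageOf :: String -> Action (Maybe String)
packageOf modname = do
    (Exit _, Stdout out) <- cmd "ghc-pkg --simple-output find-module" [modname]
    return $ case words out of
        [] -> Nothing
        pkg : _ -> Just pkg -- e.g. "containers-0.5.7.1"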

> Given it sounds like your existing design works well for you, I'd
> consider what you can keep for now. The packages a file uses are
> entirely driven by its imports. If you are using `ghc -M` there is a
> flag -include-pkg-deps which will give you exactly what packages a
> module imports. If you are using import scanning, you could have a
> list of modules that imply the "heavy" modules - e.g. if something
> imports Network.Wai.Handler.Warp it must depend on the warp package.

Finding the dependencies isn't the problem; it's that the o-rule needs
to know the packages. For that I wrote an inferPackages :: FilePath ->
[Package] which looks at the directory to figure out the packages.
This is what I was considering "morally equivalent" to a Library {
srcs = "A/**/*.hs", deps = [package1, package2, ...] }.

The problem is then the "global" targets, e.g. haddock generation.
Instead of just **/*.hs, I need only the source files that can be
compiled given the installed packages (not sure why haddock needs the
packages, but it does). This means starting with the enabled binaries
and chasing down their sources, which I guess I can also do perfectly
well with the import chaser.
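
The chaser itself can stay naive. A sketch, assuming module A.B.C lives
at A/B/C.hs, and treating any import that doesn't resolve to a local
file as coming from a package:

import Data.Maybe (mapMaybe)
import qualified Data.Set as Set
import Development.Shake

moduleToPath :: String -> FilePath
moduleToPath m = map (\c -> if c == '.' then '/' else c) m ++ ".hs"

-- naively scan "import [qualified] M ..." lines; comments can fool it
importsOf :: FilePath -> Action [String]
importsOf src = mapMaybe parse <$> readFileLines src
  where
    parse l = case words l of
        "import" : "qualified" : m : _ -> Just m
        "import" : m : _ -> Just m
        _ -> Nothing

-- transitive closure over local source files only
chase :: [FilePath] -> Action [FilePath]
chase = go Set.empty
  where
    go seen [] = return (Set.toList seen)
    go seen (f : fs)
        | f `Set.member` seen = go seen fs
        | otherwise = do
            exists <- doesFileExist f
            if not exists
                then go seen fs -- not local, so provided by a package
                else do
                    imps <- importsOf f
                    go (Set.insert f seen) (map moduleToPath imps ++ fs)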

So, I implemented that, and it seems like it works, just with the
tradeoff that it's a lot slower than just getting **/*.hs. But if
libraries are explicitly declared then I can replace that with
A/**/*.hs or whatever.

So it still seems like an explicit Library type is the cleanest
approach. With filename globbing for the srcs list, it doesn't really
duplicate any information I wouldn't have to write explicitly anyway,
since no matter what I need a way to map a source file to a package
list. Or, if I require srcs to be a directory containing all the *.hs
files underneath it, then I can generate haddock for, or collect tests
from, that library just with filename globbing.
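
With getDirectoryFiles that collection becomes cheap again. A sketch,
assuming libSrcs holds Shake glob patterns (spelled A//*.hs rather than
A/**/*.hs in Shake's FilePattern syntax):

import Data.List (nub)
import Development.Shake

-- collect haddock/test inputs from the libraries of the enabled
-- binaries, by globbing rather than chasing imports
enabledSources :: [Binary] -> Action [FilePath]
enabledSources bins =
    nub <$> getDirectoryFiles "" (concatMap libSrcs (concatMap binDeps bins))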

Perhaps if it works out nicely I'll try to extract the generalized
bits into a reusable framework.