How to batch expensive commands with shake?


ericw...@gmail.com

Sep 28, 2017, 3:11:55 PM
to Shake build system
Hi!

I am new to Shake and Haskell.

I have around 2000 source files, and each one is used, together with some other dependencies, to produce one result.

To process a single file, I can run one command, for example

process -O outdir a1.in

and it will produce a1.out in the directory outdir.

The command `process` is somewhat expensive, but can be batched.

By this I mean, I can run

process -O outdir <filelist>

to process several files at once, and it will be considerably faster
than starting `process` several times.

I would now like to batch together all files that might need rebuilding, since I know that most of the time all results will be needed in a build.

I thought about using `&?>` with a function that maps each aN.out to [ "a1.out" .. "a2000.out" ] and an action that calls need [ "a1.in" .. "a2000.in" ], but then I can only run all input files through `process`, which takes unnecessarily long.
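
A minimal sketch of the `&?>` approach described above, assuming the inputs a1.in .. a2000.in live in the current directory and the outputs go to outdir. The name allOuts is hypothetical. Because every output belongs to the same rule, a change to any single input reruns `process` on all inputs, which is the drawback mentioned.

```haskell
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical list of all outputs: outdir/a1.out .. outdir/a2000.out.
allOuts :: [FilePath]
allOuts = ["outdir" </> ("a" ++ show n) <.> "out" | n <- [1 .. 2000 :: Int]]

main :: IO ()
main = shakeArgs shakeOptions $ do
    want allOuts
    -- Every known output maps to the full list, so one rule builds them all.
    (\out -> if out `elem` allOuts then Just allOuts else Nothing) &?> \outs -> do
        let srcs = [takeFileName out -<.> "in" | out <- outs]
        need srcs
        -- Runs on all inputs, even if only one of them changed.
        cmd "process" "-O" "outdir" srcs
```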

Am I trying something stupid?

Is there a version of `need` that returns a list of those dependencies which were considered "out of date" (the rule for the dependency ran, or the source file has a different timestamp since the last run), so that I can implement my approach above? (Maybe using a custom rule type that uses something faster than a list.)
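
To make the idea concrete, here is a small model in plain Haskell that does not use Shake at all. It assumes a plain integer stands in for a file timestamp. The names Stamp, changedInputs and batchCommand are hypothetical. It only illustrates what "return the dependencies that changed, then process just those" would mean.

```haskell
-- | A plain integer stands in for a real modification time (assumption).
type Stamp = Int

-- | Hypothetical helper: given the stamps recorded by the previous build and
-- the stamps seen now, return the inputs that are new or whose stamp differs.
changedInputs :: [(FilePath, Stamp)] -> [(FilePath, Stamp)] -> [FilePath]
changedInputs previous current =
    [ file | (file, stamp) <- current, lookup file previous /= Just stamp ]

-- | Build one command line covering only the changed inputs,
-- or Nothing when there is nothing to do.
batchCommand :: [FilePath] -> Maybe [String]
batchCommand []    = Nothing
batchCommand files = Just (["process", "-O", "outdir"] ++ files)

main :: IO ()
main = do
    let previous = [("a1.in", 10), ("a2.in", 10), ("a3.in", 10)]
        current  = [("a1.in", 10), ("a2.in", 12), ("a3.in", 10), ("a4.in", 11)]
    -- prints: Just ["process","-O","outdir","a2.in","a4.in"]
    print (batchCommand (changedInputs previous current))
```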

Yours sincerely,

Eric

Neil Mitchell

Oct 1, 2017, 6:08:58 PM
to ericw...@gmail.com, Shake build system
Hi Eric,

For a new Haskell/Shake user you seem to have very quickly covered a
lot of ground that took everyone else a long time to get to! This
particular problem has been discussed on the issue tracker:

* https://github.com/ndmitchell/shake/issues/502
* https://github.com/ndmitchell/shake/issues/502#issuecomment-267415011
  - where we discuss bulk updates
* https://github.com/ndmitchell/shake/issues/502#issuecomment-269541442
  - where we basically suggest the version of need you describe

There's nothing available yet, but I'm hoping to get to it relatively soon.

Thanks, Neil

ericw...@gmail.com

Oct 2, 2017, 4:19:47 AM
to Shake build system
Hi Neil,

After some additional thought, it appears to me that this is more
complicated than I first thought.

The needHasChanged (to borrow the term from the issue) is only
one part of it. I did a prototype implementation of it for
source files with a custom builtin rule type. But it should
not be that complicated to add it to Shake for file rules in
general. My route of implementation would be:
1. Add a new rule type FileHasChangedQ -> FileHasChangedA.
2. In its builtin rule, delegate most of the work to
   ruleLint and ruleRun from Development.Shake.Internal.Rules.File
   and analyse the result of ruleRun.
3. Construct FileHasChangedA from the resulting FileA and the
   RunChanged value.
4. Provide the API call needHasChanged for it and export that.

If you want to consider it, I could try to implement that and send
you a pull request. But I am a Haskell beginner, so you would
certainly need to go over it.

For the specific use case of a batch process, a second part is needed.
The user rule implementing the batch process needs to know not only
which dependencies are considered out of date, but also which targets
are considered inconsistent and need to be rebuilt too. In my opinion,
Shake's property of checking for tampered-with targets is a pretty nice
one, and I would not like to give up on it. (And in my use case it is not
practical to check for this myself, as that would amount to rebuilding
each target and diffing the results.)

I would suggest some variant of the Files Rule here, where the user
rule also gets the list of inconsistent targets as a parameter,
if that is possible. (I will try to prototype that too, when I have
the time for it. Probably tomorrow, as it is a German holiday, or the
upcoming weekend.)

Yours sincerely,

Eric

PS: Is the Google group the right place for such a discussion, or should
I open a ticket and move the discussion to the appropriate ticket comments?

PPS: Thanks for your answer to my Stack Overflow question. (I am Krom there.)

Krom

Oct 6, 2017, 1:35:05 AM
to Shake build system
Hi!

With regard to needHasChanged, I reconsidered and now think that it is probably easiest to implement it directly in the existing File.hs and simply throw away the extra information where it is not needed, for example in "need" or "needBS".

Yours sincerely,

Eric

Neil Mitchell

Oct 14, 2017, 5:49:00 PM
to Krom, Shake build system
Hi Eric,

Thanks for the PR that you provided in https://github.com/ndmitchell/shake/pull/539

I met Eric at the Haskell eXchange yesterday and we came up with a design for a function, named batch, that might provide the necessary primitives. That's now in the repo, and has the docs/signature:

-- | Batch different outputs into a single 'Action', typically useful when a command has a high
-- startup cost - e.g. @apt-get install foo bar baz@ is a lot cheaper than three separate
-- calls to @apt-get install@. As an example, if we have a standard build rule:
--
-- @
-- \"*.out\" 'Development.Shake.%>' \\out -> do
--     'Development.Shake.need' [out '-<.>' \"in\"]
--     'Development.Shake.cmd' "build-multiple" [out '-<.>' \"in\"]
-- @
--
-- Assuming that @build-multiple@ can compile multiple files in a single run,
-- and that the cost of doing so is a lot less than running each individually,
-- we can write:
--
-- @
-- 'batch' 3 (\"*.out\" 'Development.Shake.%>')
--     (\\out -> do 'Development.Shake.need' [out '-<.>' \"in\"]; return out)
--     (\\outs -> 'Development.Shake.cmd' "build-multiple" [out '-<.>' \"in\" | out \<- outs])
-- @
--
-- In contrast to the normal call, we have specified a maximum batch size of 3,
-- an action to run on each output individually (typically all the 'need' dependencies),
-- and an action that runs on multiple files at once. If we were to require lots of
-- @*.out@ files, they would typically be built in batches of 3.
--
-- If Shake ever has nothing else to do it will run batches before they are at the maximum,
-- so you may see much smaller batches, especially at high parallelism settings.
batch
    :: Int
    -> ((a -> Action ()) -> Rules ())
    -> (a -> Action b)
    -> ([b] -> Action ())
    -> Rules ()
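
Applied to the `process` example from the start of this thread, a build script using batch might look like the sketch below. It assumes that the inputs aN.in sit in the current directory and that `process -O outdir <files>` writes outdir/aN.out for each input it is given. The helper inputFor, the batch size of 100 and the hard-coded target list are illustrative choices, not part of Shake.

```haskell
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical helper: map outdir/aN.out back to its input aN.in.
inputFor :: FilePath -> FilePath
inputFor out = takeFileName out -<.> "in"

main :: IO ()
main = shakeArgs shakeOptions $ do
    -- Illustrative target list; a real build would derive it from the inputs.
    want ["outdir" </> ("a" ++ show n) <.> "out" | n <- [1 .. 2000 :: Int]]

    -- At most 100 outputs per `process` call (arbitrary choice).
    batch 100 ("outdir/*.out" %>)
        -- Runs for each output Shake decides to rebuild: declare the
        -- dependency and hand the input file on to the batched action.
        (\out -> do
            let src = inputFor out
            need [src]
            return src)
        -- Runs once per batch with the inputs collected above.
        (\srcs -> cmd "process" "-O" "outdir" srcs)
```

Only outputs that Shake considers out of date reach the first action, so the batched command receives just the inputs that need reprocessing.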

In general, if you're talking about concrete development stuff, go to the bug tracker. The mailing list is a useful catch-all for anyone who doesn't know what the next steps are, but it tends to be used more rarely.

Thanks, Neil