I am working on an idea to use Shake as a system to manage and automate a computational project (as part of my PhD research). The current "state of the art" is to write one huge bash script that simulates a bunch of stuff; when something breaks, you do a bunch of commenting out and copy-pasting, and repeat. I am trying to move to a "descriptive" approach, where you describe how a particular thing is computed and what it depends on, and Shake figures out the rest.
For example:
I have to run 150 simulations based on model A0, whose results give me the update information needed to create a new model A1. I then repeat all 150 (or perhaps only some of them, depending on the update) to get A2, and so on.
Each simulation has its own small set of dependencies: some are files, but some are previous simulations.
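The dependency structure described above can be written down as plain data before worrying about how to execute it. The following is a minimal sketch in plain Haskell (no Shake involved); the `Key` type, the config file names and the rule that each source's run against A<n> also needs its run against A<n-1> (as one assumed example of a simulation depending on a previous simulation, e.g. a warm start) are all hypothetical.

```haskell
import qualified Data.Set as Set

-- Hypothetical key type: each value names one thing the build can produce.
data Key
  = InputFile FilePath -- a plain file on disk
  | Model Int          -- model A<n>
  | Sim Int Int        -- simulation of source i run against model A<n>
  deriving (Show, Eq, Ord)

-- Number of sources per iteration (150 in the example).
numSources :: Int
numSources = 150

-- Hypothetical per-source input file.
srcFile :: Int -> Key
srcFile i = InputFile ("source-" ++ show i ++ ".cfg")

-- Direct dependencies (assumed): A0 comes from an input file, A<n+1> needs
-- every simulation run against A<n>, and a simulation needs its model, its own
-- input file and, from the second iteration on, its previous run.
deps :: Key -> [Key]
deps (InputFile _) = []
deps (Model 0) = [InputFile "model-A0.cfg"]
deps (Model n) = [Sim (n - 1) i | i <- [1 .. numSources]]
deps (Sim 0 i) = [Model 0, srcFile i]
deps (Sim n i) = [Model n, srcFile i, Sim (n - 1) i]

-- Everything transitively required to produce a key (depth-first search).
closure :: Key -> Set.Set Key
closure k = go Set.empty [k]
  where
    go seen [] = seen
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise = go (Set.insert x seen) (deps x ++ xs)
```

For instance, `closure (Model 2)` contains all 300 simulations from the first two iterations plus the three models and 151 input files, which is exactly the set of things a build system would have to check before producing A2.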
I've been digging around in the source code and have been thinking about writing a new type of "need" called needCompute, which looks at a DB and checks to see if a certain thing has been computed yet. In general, I want a "need" method that doesn't look for a file, but rather the status of a particular computation. I imagine storing the status in some sort of persistent file or database, but I'd love to try to reuse what is already in Shake.
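A minimal sketch of the status-store idea, independent of Shake: recorded results live in a `Map` that is serialized with `show`/`read` to a plain text file, and a hypothetical `needComputeIO` runs an action only if its key has no recorded result yet. The file format and every name here are assumptions. For comparison, Shake's own oracle rules (`addOracle`) already key rules on Haskell values rather than on files, so they may be worth looking at before adding a new kind of rule.

```haskell
import qualified Data.Map.Strict as Map
import Control.Exception (evaluate)
import Data.Maybe (fromMaybe)
import System.IO.Error (catchIOError)
import Text.Read (readMaybe)

-- Hypothetical status database: computation name -> recorded result
-- (for example an output path or a checksum).
type StatusDB = Map.Map String String

-- Load the database; a missing or unreadable file means nothing is done yet.
loadDB :: FilePath -> IO StatusDB
loadDB path = load `catchIOError` \_ -> pure Map.empty
  where
    load = do
      txt <- readFile path
      _ <- evaluate (length txt) -- force the lazy read so the file is closed
      pure (fromMaybe Map.empty (readMaybe txt))

-- Save the whole database back to disk with 'show'.
saveDB :: FilePath -> StatusDB -> IO ()
saveDB path db = writeFile path (show db)

-- Hypothetical needCompute-style helper: return the recorded result if the
-- key is already done, otherwise run the action and record its result.
needComputeIO :: FilePath -> String -> IO String -> IO String
needComputeIO path key action = do
  db <- loadDB path
  case Map.lookup key db of
    Just result -> pure result
    Nothing -> do
      result <- action
      saveDB path (Map.insert key result db)
      pure result
```

Unlike Shake's database, this sketch only records that something finished; it does not notice when an input changes and a result should be recomputed.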
So for example:
------------------------------------------------
want [NextModel]
NextModel %> \x -> do
    -- Source 1, Source 2, etc are all simulations that need to be done
    needCompute $ map Source [1..10]
    -- updates model based on results from Source computations
    updateModel
Source %> \src_id -> do
    -- run simulation for src_id
    system' "./run_src" [show src_id]
--------------------------------------------------
In this example, the (%>) rule for Source "completes" each (Source x) computation by recording it in the database/file, and once they are all completed, the NextModel computation can proceed.
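The intended semantics of `needCompute` and the two rules can be modelled in a few lines of plain Haskell. This sketch is sequential, keeps the set of completed keys in memory (a real version would persist it between runs), has no cycle detection, and stubs out both `./run_src` and `updateModel`; `Env`, `rule` and `ranLog` are hypothetical names, not Shake API.

```haskell
import Data.IORef
import qualified Data.Set as Set

-- Hypothetical keys from the example.
data Key = NextModel | Source Int
  deriving (Show, Eq, Ord)

-- Hypothetical build state: completed keys (kept in memory here) and a log
-- of which rules actually ran, most recent first.
data Env = Env
  { completed :: IORef (Set.Set Key)
  , ranLog :: IORef [Key]
  }

-- Sketch of needCompute: for each key not yet complete, run its rule
-- (which may itself call needCompute) and then mark the key complete.
needCompute :: Env -> [Key] -> IO ()
needCompute env = mapM_ build
  where
    build k = do
      done <- readIORef (completed env)
      if k `Set.member` done
        then pure ()
        else do
          rule env k
          modifyIORef (completed env) (Set.insert k)
          modifyIORef (ranLog env) (k :)

-- The two rules from the example, with the real work stubbed out.
rule :: Env -> Key -> IO ()
rule env NextModel = needCompute env (map Source [1 .. 10]) -- then updateModel
rule _ (Source _) = pure () -- stand-in for: system' "./run_src" [show src_id]
```

Requesting `NextModel` runs `Source 1` to `Source 10` first and `NextModel` last; requesting it again runs nothing, because every key is already marked complete.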
I'm still working out a lot of the details here, especially how the type abstractions could work in a general way for different kinds of computation: specifically, how you would specify a type of computation (Source Int, Gradient [Source], ModelUpdate Gradient, etc.) while taking advantage of the type system's polymorphism.
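One way to attach a result type to each kind of computation is a GADT: each constructor fixes what it produces, and `Gradient` and `ModelUpdate` take other computations as arguments, mirroring `Gradient [Source]` and `ModelUpdate Gradient`. This is a minimal, unmemoized sketch with stand-in arithmetic; `SimResult`, `Grad`, `Model`, `Comp` and `compute` are hypothetical. Recent Shake versions take a related approach, associating a result type with each key type through the `RuleResult` type family.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical result types for each kind of computation.
newtype SimResult = SimResult Double deriving (Show, Eq)
newtype Grad = Grad [Double] deriving (Show, Eq)
newtype Model = Model [Double] deriving (Show, Eq)

-- Each constructor fixes the type of value it produces; Gradient and
-- ModelUpdate are built from other computations, so the types record
-- what each step depends on.
data Comp a where
  Source :: Int -> Comp SimResult
  Gradient :: [Comp SimResult] -> Comp Grad
  ModelUpdate :: Comp Grad -> Comp Model

-- Type-safe evaluation: the result type follows from the key's type.
-- The arithmetic stands in for real simulations, and nothing is cached.
compute :: Comp a -> a
compute (Source i) = SimResult (fromIntegral i / 2)
compute (Gradient srcs) = Grad [r | SimResult r <- map compute srcs]
compute (ModelUpdate grad) = let Grad g = compute grad in Model (map negate g)
```

With this encoding, `compute (ModelUpdate (Gradient [Source 1, Source 2]))` is statically known to return a `Model`, and passing a `Comp Grad` where a `Comp SimResult` is expected is a type error. Caching such keys is harder, because a store has to hold results of different types.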
cheers,
Max