When I used SCOOP/DEAP to externally calibrate/validate the CREST
hydrological model for a project at NASA Goddard, I started by modifying
the command line interface of CREST so that all the parameters were
exposed on the command line, and I wrote the basic statistics to
standard out. This allowed me to use Popen to launch CREST in parallel
and read back the statistics, which could then be evaluated by whatever
algorithm you choose. This workflow basically turns CREST/FDS into a
fitness function, and allowed me to run 100 or so jobs in parallel.
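
Here is a minimal sketch of that wrapper (not my actual code; the "crest"
executable name, the "--paramN=value" flags, and the "NSCE:" output line
are stand-ins for whatever your model actually exposes):

    import subprocess

    def evaluate(individual):
        """Run one model instance with the candidate parameters and
        return its fitness as a 1-tuple (the DEAP convention)."""
        cmd = ["crest"] + ["--param%d=%g" % (i, p)
                           for i, p in enumerate(individual)]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        out, _ = proc.communicate()

        # Pick the summary statistic out of whatever the model prints
        # to standard out.
        score = float("nan")
        for line in out.splitlines():
            if line.startswith("NSCE:"):
                score = float(line.split(":", 1)[1])
        return (score,)

With SCOOP, registering futures.map as DEAP's map (from scoop import
futures; toolbox.register("map", futures.map)) is the usual way to have
those evaluations farmed out across workers in parallel.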
BTW, there are some things you will want to know when doing something
like this. You probably want to put a sleep in place and have the
instances spawn a second or ten apart from each other; if you have
hundreds of instances starting up and reading the same files at the same
time, it can thrash the disk and waste a lot of time. Also, do not
assume that you can run hundreds or thousands of runs or an arbitrarily
large population size. You will have to play with both to figure out how
best to manage computer resources as well as genetic convergence: you
want a population that is big enough that it takes generations to
converge, but not so large that it takes thousands of generations to
maybe converge. All of that is part of the art of setting these
simulations up.
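
If it helps, here is one way the staggering might look, assuming each
evaluation can afford a small random delay before it touches the shared
input files (the 0 to 10 second range is just an example to tune):

    import random
    import time

    def staggered_evaluate(individual):
        # Spread out start-up so instances don't all hit the disk at once.
        time.sleep(random.uniform(0.0, 10.0))
        return evaluate(individual)   # the Popen-based fitness above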
Also, since in my case the "evaluation function" took 1 to 4 hours per
run, I wrote the full parameter list plus the output statistics to a
work-in-progress file. Before spawning the jobs, I also wrote out the
parameter list for the next test. That way, if the computer crashed in
the middle of a long run, I did not lose 20 generations of runs and did
not have to start all over again from scratch (which might have been as
much as 10,000 CPU hours at that point). So I also wrote in a "restart"
in case it was interrupted...
Hope this helps.
EBo --