Implementing a Distributed Pool Architecture within inspyred

84 views
Skip to first unread message

Andrew Dunn

unread,
Jun 23, 2014, 1:23:08 PM6/23/14
to insp...@googlegroups.com
Greetings! Thank you for the high quality documentation (and examples) with this library.

I'm looking to implement a 'distributed pool architecture' [1], [2] using inspyred. This post is mainly to seek advice as to where I should begin with the implementation, and to head off potentially conflicting design assumptions made within inspyred that would be ultimately limiting.

From what I understand, the 'distributed pool architecture' would require me to create the initial population and fitness values external to the EA. Then multiple GA processes would interact with the population pool, sampling a set of the population for [selection, crossover, mutation] and evaluation. Eventually writing back a subset of the new population to the pool based on some replacement rules.

With inspyred I wanted to write out how I think I'll accomplish this, with hopes for some critique.

- Create an evaluator function to wrap my simulation
- Create a generator function that samples the pool
- Create a wrapper for the EA to create instances of the EA until termination

The wrapper for the EA is necessary because the EA will only be interacting with a subset of the population from the pool and will only run for a limited number of generations before terminating and returning candidites back to the pool. This wrapper will have to have its own termination criteria, where as the EA within the wrapper would be set to terminate after a specific amount of generations. Upon internal EA termination, the wrapper would submit individuals back to the pool to overwrite (based on replacement criteria) individuals with less fitness. Then the Wrapper begins again: generator(sample from pool), EA, writeback.

If there are aspects of inspyred that I'm overlooking in this initial concept please let me know.

Reference:
[1] Deb, K., Zope, P., & Jain, A. (2003). Distributed computing of pareto-optimal solutions with evolutionary algorithms. Evolutionary Multi-Criterion Optimization, 534–549. Retrieved from http://link.springer.com/chapter/10.1007/3-540-36970-8_38

[2] Roy, G., Lee, H., Welch, J. L., Zhao, Y., Pandey, V., & Thurston, D. (2009). A Distributed Pool Architecture for Genetic Algorithms. … , 2009. CEC’09. IEEE …, 1177–1184. doi:10.1109/CEC.2009.4983079

Aaron Garrett

unread,
Jun 23, 2014, 1:29:46 PM6/23/14
to insp...@googlegroups.com
Sure. I think you can do all of that here. The specifics might take a little thought, but the micro-EC would probably be a good starting point for ideas (http://inspyred.github.io/recipes.html#micro-evolutionary-computation). You could probably allow the top-level wrapping EC to use its archive to be the pool. And you could draw from the archive whenever you create the low-level EC and pass the initial population in as seeds. Yeah, I think all of that would fit together without any issues.
Let me know if you have any questions.



Aaron Garrett, Ph.D.
Assistant Professor
Computer Science
131 Ayers Hall
Jacksonville State University
Jacksonville, AL  36265
agar...@jsu.edu
256-782-5364



--
You received this message because you are subscribed to the Google Groups "Inspyred" group.
To unsubscribe from this group and stop receiving emails from it, send an email to inspyred+u...@googlegroups.com.
To post to this group, send email to insp...@googlegroups.com.
Visit this group at http://groups.google.com/group/inspyred.
For more options, visit https://groups.google.com/d/optout.

Andrew Dunn

unread,
Jun 24, 2014, 4:12:59 PM6/24/14
to insp...@googlegroups.com
I'm a bit hung up on one idea so far. The ea takes in a generator function, which I assume it uses to initialize the first population, and derivative populations.

I believe that it will be a lot easier for me to pass the ea a population that has already been created (and assigned fitness values), then let it continue with the generator function.

Is there a way to start the ea with a population that has already has the fitness assigned?

Aaron Garrett

unread,
Jun 24, 2014, 4:16:08 PM6/24/14
to insp...@googlegroups.com
Maybe someone else sees something, but I don't think there's any clean way to do that. 



Aaron Garrett, Ph.D.
Assistant Professor
Computer Science
131 Ayers Hall
Jacksonville State University
Jacksonville, AL  36265
agar...@jsu.edu
256-782-5364


Andrew Dunn

unread,
Jun 24, 2014, 4:41:13 PM6/24/14
to insp...@googlegroups.com
After digging a bit further it looks like you've got code (inspyred/inspyred/ec/ec.py) starting at line #422 that initializes the first population.

I see that I am going to run up against a key assumption for how the EvolutionaryComputation class was developed. I've got to think through this, but I think that I might have to (for proof of concept) clone and modify the code to remove this assumption.

For a distributed pool implementation the ea will be 'working' for a couple generations on a set of Individuals that have already been initialized with both a genotype and phenotype. I'll have several 'worker' processes sampling from the pool population and feeding instances of the ea with the individuals to evolve further. I think that I can fake out the generator by writing my own generation function that basically does a select from the pool, however the evolve function assumes (and likely rightly so) that it needs to compute phenotype/fitness for the individuals that were generated.

Any brainstorming would be appreciated! What you've got here is really awesome, I don't imagine you could have thought of this use case ahead of time (or maybe you did and it's an in-ideal method for approaching ea implementation).

Aaron Garrett

unread,
Jun 24, 2014, 4:51:11 PM6/24/14
to insp...@googlegroups.com
Let me think about your problem, and I'll bet I can come up with a workable solution. Can you give me the broad strokes of what you're trying to do exactly?



Aaron Garrett, Ph.D.
Assistant Professor
Computer Science
131 Ayers Hall
Jacksonville State University
Jacksonville, AL  36265
agar...@jsu.edu
256-782-5364


Aaron Garrett

unread,
Jun 25, 2014, 1:15:48 AM6/25/14
to insp...@googlegroups.com
OK. I've worked on this for a few hours now, and I tap out. I think I'm close to a solution, but dealing with the multiprocessing library is a pain when things won't pickle. And I've lost the desire to keep fighting the battle tonight. I'm attaching the code that I have so far. Maybe it will give you some ideas. I'll admit that this is not particularly clean. What you are wanting to do seems to be feasible, but some things in the library have to be re-fitted to make it work.



Aaron Garrett, Ph.D.
Assistant Professor
Computer Science
131 Ayers Hall
Jacksonville State University
Jacksonville, AL  36265
agar...@jsu.edu
256-782-5364


distpool.py

Aaron Garrett

unread,
Jun 25, 2014, 2:35:23 AM6/25/14
to insp...@googlegroups.com
I retract my previous tap-out. I just can't let things like this go. So HERE is something that (I think) works like you describe and looks MUCH cleaner than that crapbasket I churned out a little while ago. This is the better way to think of what's happening, as far as inspyred goes. Hopefully you can follow what's going on in the code. Let me know if you have questions. (No guarantees that this is bug-free; it's very late.)



Aaron Garrett, Ph.D.
Assistant Professor
Computer Science
131 Ayers Hall
Jacksonville State University
Jacksonville, AL  36265
agar...@jsu.edu
256-782-5364


distpool.py

Andrew G. Dunn

unread,
Jun 25, 2014, 10:01:41 AM6/25/14
to insp...@googlegroups.com
This is a bit of a bummer. I typed up a long response and submitted.... which never showed up. Hopefully both posts don't come through because there might be an embarrassing amount of discrepancies between the two.

Thanks a ton for this code, I'm glad that I got you caught with the bug. I have not spent too much time with it yet, I plan to after I finish writing this. I wanted to communicate the idea more clearly. I lost power last night, so apologies this isn't in some fancy graphviz chart. Please reference the attached photo for the following (I had to resize to 2000px max which was about 45% of the original photo size, if you want the original I can email directly).

Assumptions:
 - eventually this will need to run in a multi-node HPC environment (socket/mpi)

Design:
---------------------------------------------------------
Pool:
 - networked (socket/mpi)
 - stores the Individuals
 - initializes Individuals (has a generation function)
 - is queried by Worker(s), picks Individuals from population based on how many the worker wants (picks Individuals uniformly at random)
 - is submitted to by Worker(s), overwrite Individuals based on their fitness
 - responsible for termination criteria

Worker:
 - networked (socket/mpi)
 - has three modes of operation
   1) initial fitness (the pool uses the workers to assign fitness to initial population)
   2) EvolutionaryComputing (asks Pool for some candidates, selects, mutates, crossovers, evaluates (via pp or mp), ranks, returns*)(multiple generations, configurable) *returns the same amount of candidates that it was given
   3) termination (terminates based on message from Pool, or a timeout of communication for a preset time)
 ---------------------------------------------------------

Implementation:
I forked the existing code and I think that I need to do the following:
 - Write the Pool (there isn't a concept like this already (unless you've answered that with the previous attachment))
 - Write the Worker
 - Modify Individual (needs to have a unique identifier added, we'll be pickling back and forth so it won't be an instance of the same object when it returns from the worker)
 - Create a new DistributedEvolutionaryComputation based on EvolutionaryComputation (I think I'll just add a config option to the args dictionary that will bypass the initial generation/fitness)

Caveats:
 - I'll commit often during the development so, if there comes a time where you feel this could be brought into inspyred, the pull requests will be easier.
 - I'm going to start with socket based communication and pickle back and forth (because I can't test mpi on my laptop).


Any suggestions would be incredible!
IMG_20140625_085219.resized.jpg

Andrew G. Dunn

unread,
Jul 2, 2014, 12:01:48 PM7/2/14
to insp...@googlegroups.com
To do a distributed pool some aspects of the EA need to run at the pool level also. Notably the generator (initially) and the replacer. The 'Workers' in this pool are taking a sample of the population and returning back a list of individuals (after some termination criteria). That returned list (from the worker) needs to be incorporated back into the pool population using a replacement strategy.

I'm doing a multi-objective problem, with your nsgaII implementation as a reference. I've read through the nsga_replacement function and I think I know how it works, but I wanted to clarify some of my assumptions:
 - I can pass the nsga_replacement a list of parents and offspring and it will return the survivors (fittest)
 - I would actually pass the parents as population and pass an empty list as parents (as parents isn't used in the implementation)
 - nsga_replacement expects the Individual.fitness to be a list


P.S:
I really like the design principles behind inspyred, so I'm uncomfortable with how I'm mucking them up to get this concept done. Once completed I'd really like to re-consider how I'm mixing the pool and ea responsibilities.

Aaron Garrett

unread,
Jul 2, 2014, 12:52:04 PM7/2/14
to insp...@googlegroups.com
I'm not sure I understand. The most recent code that I sent was doing most of that, I thought. 

1. It creates a pool of N individuals and evaluates them to determine their fitness.
2. A pool of W workers is created, each with population size P, such that W*P = N.
3. The workers then begin their independent evolutions. Each generation, the following happens:
    a. A sample of P individuals is drawn from the pool, and then that sample has tournament selection applied to it to give P parents (very likely to have duplicates of the better individuals).
    b. Those parents undergo uniform crossover and Gaussian mutation to produce P offspring.
    c. Those offspring are evaluated (if they haven't been evaluated already), and then they replace the parents using generational replacement..
    d. The P individuals remaining are then put back into the pool using the archiver, which inserts individuals from worker w into the pool at locations w*P through (w+1)*P-1.

Is that not what you've described?



Aaron Garrett, Ph.D.
Assistant Professor
Computer Science
131 Ayers Hall
Jacksonville State University
Jacksonville, AL  36265
agar...@jsu.edu
256-782-5364


arthur....@gmail.com

unread,
Nov 3, 2016, 1:41:31 PM11/3/16
to Inspyred
Hi,

Do you still the above code? I'm trying to run the fitness evaluations with mpi on a cluster, and could really use some example on how to do this. Thanks a lot!

Avramiea Arthur

unread,
Nov 4, 2016, 2:01:38 PM11/4/16
to Inspyred
Fixed using python package schwimmbad for pooling :).

Mark Coletti

unread,
Oct 10, 2017, 10:09:10 AM10/10/17
to Inspyred
On Friday, November 4, 2016 at 2:01:38 PM UTC-4, Arthur-Ervin Avramiea wrote:
Fixed using python package schwimmbad for pooling :).

I'm in the same boat as the original poster: I'd like to use inspyred in a supercomputing context, which means implementing an asynchronous steady-state EA using MPI. The earlier mailing list traffic gave some sample code that may have addressed that; has that since been vetted and integrated into inspyred?

I'm also curious about the use of schwimmbad for pooling.  Arthur-Ervin, did you do a pull request with your changes?  Or, otherwise have your changes available for sharing?

Cheers!

Mark Coletti
-- 
Reply all
Reply to author
Forward
0 new messages