I've put together a pretty nice little symbolic regression tool using DEAP 1.1.0, so thanks guys.
It uses a Pareto front with two objectives, fitness and tree size, to try to keep bloat from overwhelming everything.
I recently added demes over MPI: I use multiprocessing to parallelize evaluation, but each deme runs in its own MPI process (I can't use SCOOP on my cluster). This made me notice that I was getting a lot of duplicates in the migrating population. Sometimes every single migrant was the same function.
So I investigated the populations, and I found that, indeed, there were a lot of duplicates. With NSGA-II, half the individuals were sqrt(X), which was the best current match for the target function, so when the best migrants were selected, they were ALL sqrt(X). SPEA2 did a better job of maintaining diversity, but there were still a lot of duplicates (maybe 10% were sqrt(X) at the same point in the run).
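For reference, this is roughly how I checked: DEAP gp individuals stringify to their expression (str(ind) on a gp.PrimitiveTree), so counting string forms is enough to spot duplicates. A minimal sketch, with plain strings standing in for individuals:

```python
from collections import Counter

def duplicate_report(population):
    """Count how often each expression appears in a population.

    In DEAP, str(ind) on a gp.PrimitiveTree gives the expression
    string, so the string form works as a duplicate key. Here plain
    strings stand in for individuals.
    """
    counts = Counter(str(ind) for ind in population)
    # Keep only expressions that appear more than once.
    return {expr: n for expr, n in counts.items() if n > 1}

# Toy population: most individuals have converged to sqrt(X).
pop = ["sqrt(X)"] * 5 + ["add(X, 1)", "mul(X, X)", "sqrt(X)", "neg(X)", "X"]
print(duplicate_report(pop))  # {'sqrt(X)': 6}
```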
Does anyone have a suggestion for controlling this problem? I can think of a couple of options:
1. Check for duplicates in every generation, and replace them with new random expressions.
2. Use the fortin2013 NSGA-II version:
https://github.com/DEAP/experimental/blob/master/fortin2013/fortin2013.py

Thanks,
Jim
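P.S. A rough sketch of what I mean by option 1: each generation, keep the first copy of each expression and replace the rest with fresh random individuals. make_random here is a stand-in for whatever generator you use (e.g. toolbox.individual in DEAP), and strings stand in for individuals:

```python
def dedupe(population, make_random):
    """Replace duplicate expressions with freshly generated individuals.

    Individuals are compared by their string form (in DEAP, str(ind)
    gives the expression for a gp.PrimitiveTree). make_random is a
    stand-in for a random-individual generator such as
    toolbox.individual.
    """
    seen = set()
    fresh = []
    for ind in population:
        key = str(ind)
        if key in seen:
            # Duplicate expression: replace with a new random individual.
            fresh.append(make_random())
        else:
            seen.add(key)
            fresh.append(ind)
    return fresh

# Toy example: the second and third sqrt(X) get replaced.
pop = ["sqrt(X)", "sqrt(X)", "add(X, 1)", "sqrt(X)"]
print(dedupe(pop, lambda: "random_expr"))
# ['sqrt(X)', 'random_expr', 'add(X, 1)', 'random_expr']
```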