Some questions about the fitness and selection functions


Albert-Miquel Sánchez

unread,
Dec 2, 2013, 11:45:07 AM12/2/13
to deap-...@googlegroups.com
Hello,

I'm new to several subjects, such as genetic algorithms, DEAP and Python, so I have to apologize for the really newbie questions. However, despite being new to all of this, I've been able to get to work quickly thanks to the DEAP tool and its documentation, which I really appreciate.

I'm trying to optimize a multi-objective problem as follows. I have a Fortran function (called from Python) that takes three real numbers as inputs and returns 11 real numbers (some of them are actually arrays), which are the solutions for those three inputs. In my problem I know the solutions and I want to find the three inputs. In my evaluation function I simply divide each of the eleven results obtained by feeding the current individual's three inputs into the Fortran function by the corresponding desired value, and subtract one from each quotient. When the results are similar to the desired ones, the evaluation function returns eleven values close to zero, which gives me the function to minimize; otherwise the values are more or less far from zero, depending on the inputs. My general question is whether this is a suitable fitness function, or whether it should be improved with a different, more sophisticated technique? It seems really simple to me and I'm not sure it is good enough.

And two more questions, more focused on DEAP. I started with code similar to the One Max Problem example (only changing the mutation function to mutGaussian), since it was easy to understand and a good starting point. However, I couldn't get good results with it, even though I tried several combinations of the variables CXPB, MUTPB, NGEN and MU, and of the mu, sigma and indpb parameters of mutGaussian. After studying some of the examples shipped with DEAP, I modified the code into something similar to the NSGA-II example (whose main difference is the selection function), with better but, in my view, still not optimal results. Since I've already spent several hours on it, I would like to ask which functions I need to modify to improve the results in a situation like the one described. Any comment or suggestion would be really helpful.

Thank you very much in advance!

François-Michel De Rainville

unread,
Dec 2, 2013, 12:02:38 PM12/2/13
to deap-...@googlegroups.com
2013/12/2 Albert-Miquel Sánchez <wuampa...@gmail.com>
Hello,

Hi and welcome to the DEAP usergroup
 

I'm new to several subjects, such as genetic algorithms, DEAP and Python, so I have to apologize for the really newbie questions. However, despite being new to all of this, I've been able to get to work quickly thanks to the DEAP tool and its documentation, which I really appreciate.

Thanks, it is always nice to know that our work serves new people!
 

I'm trying to optimize a multi-objective problem as follows. I have a Fortran function (called from Python) that takes three real numbers as inputs and returns 11 real numbers (some of them are actually arrays), which are the solutions for those three inputs. In my problem I know the solutions and I want to find the three inputs. In my evaluation function I simply divide each of the eleven results obtained by feeding the current individual's three inputs into the Fortran function by the corresponding desired value, and subtract one from each quotient. When the results are similar to the desired ones, the evaluation function returns eleven values close to zero, which gives me the function to minimize; otherwise the values are more or less far from zero, depending on the inputs. My general question is whether this is a suitable fitness function, or whether it should be improved with a different, more sophisticated technique? It seems really simple to me and I'm not sure it is good enough.

On that point, we will need a little more detail. Are your 11 values somehow anti-correlated? The common case for using multiobjective EAs is when you have opposing goals, like the value and weight objectives in the knapsack problem. It seems to me that your 11 values could be reduced to a single one using a root-mean-square error, which can help your evolution a lot.
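For illustration, collapsing the eleven error values into a single RMSE fitness could be sketched as below. The `model` and `targets` arguments are placeholders standing in for the Fortran routine and the known desired outputs; this is a sketch, not the poster's actual code:

```python
import math

def evaluate(individual, model, targets):
    """Collapse the per-output errors into a single RMSE fitness.

    `model` stands in for the Fortran routine (here assumed to return a
    flat list of numbers) and `targets` for the desired outputs.
    """
    outputs = model(individual[0], individual[1], individual[2])
    errors = [out - want for out, want in zip(outputs, targets)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse,  # DEAP expects a tuple, even for a single objective
```

With a single objective like this, a plain `FitnessMin` with `weights=(-1.0,)` and ordinary tournament selection is enough.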
 

And two more questions, more focused on DEAP. I started with code similar to the One Max Problem example (only changing the mutation function to mutGaussian), since it was easy to understand and a good starting point. However, I couldn't get good results with it, even though I tried several combinations of the variables CXPB, MUTPB, NGEN and MU, and of the mu, sigma and indpb parameters of mutGaussian. After studying some of the examples shipped with DEAP, I modified the code into something similar to the NSGA-II example (whose main difference is the selection function), with better but, in my view, still not optimal results. Since I've already spent several hours on it, I would like to ask which functions I need to modify to improve the results in a situation like the one described. Any comment or suggestion would be really helpful.

When using multiobjective selection you have to make sure to use the eaMuPlusLambda or eaMuCommaLambda algorithm. I'll quote my answer to a previous question:

eaSimple replaces the whole population on each generation with the offspring, keeping the number of individuals at a constant number, say N. The selection should select the best individuals multiple times (with some probability) so that they reproduce more often in the next generation.

What NSGA-II and SPEA-II do is sort all the individuals according to domination and some other (important) criteria to break ties, select the first N (deterministically and only once each) to fill the new population, and drop the last ones.

What happens in eaSimple with NSGA-II is that there is no selection at all, since the population contains only N individuals, so none are discarded.

Another point: an 11-objective optimization is hard even if your objectives are independent! If you can use the RMS technique, you should be able to find a solution more easily.
 

Thank you very much in advance!

Cheers,
François-Michel
 

--
You received this message because you are subscribed to the Google Groups "deap-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deap-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Albert-Miquel Sánchez

unread,
Dec 3, 2013, 7:07:35 AM12/3/13
to deap-...@googlegroups.com
Thank you very much for your prompt answer. Following your suggestions, I've combined all the variables into a single one using the root-mean-square. However, I still have some problems. For instance, if I define one single objective, as:

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual",  list, fitness=creator.FitnessMin)

I get some errors when using eaSimple:

Both weights and assigned values must be a sequence of numbers when assigning to values of <class 'deap.creator.FitnessMin'>. Currently assigning value(s) 33.46388742620334 of <type 'float'> to a fitness with weights (-1.0,).

To avoid this I have to define at least two objectives. I get another error when using the following functions, which are supposed to be inside tools, as in the One Max Problem example (short version):

stats.register("avg", tools.mean)
stats.register("std", tools.std)


Obtaining:

'module' object has no attribute 'mean'
'module' object has no attribute 'std'

respectively. And finally, I've seen that the solution depends heavily on the order of the objectives in the evaluation function. It seems that mainly the first one is considered, since it is the only one whose minimum value either stays the same or decreases, while for the other ones the minimum value sometimes increases from one generation to the next (which means that the best individual for that objective is not kept between generations). I guess this is due to the selection function (I'm currently using selTournament). However, if I manage to use only one single objective, this won't matter, unless you suggest a different selection function.

François-Michel De Rainville

unread,
Dec 3, 2013, 8:38:50 AM12/3/13
to deap-...@googlegroups.com
Your first error comes from the fact that for a single objective you must return a tuple of one value from the evaluation function (we treat single objective as a special case of multiobjective). Note the comma in

return my_value,

The second error is that you seem to have installed version 1.0 of DEAP while reading the examples of version 0.9. The statistics functions used in 1.0 come from NumPy, and mean and std have been removed from the tools module.

For your third issue, tournament selection does select in lexicographic order, so yes, the first objective is the most important and the behaviour you observe is the correct one.

Regards,
François-Michel




Albert-Miquel Sánchez

unread,
Dec 12, 2013, 6:22:07 AM12/12/13
to deap-...@googlegroups.com
Thank you for your answers. Indeed, it was the comma. The code now works and I've been doing some tests. Apparently, the algorithm finds a solution close to the right one, but, from my point of view (and for my purposes), it's not good enough. I've been running tests, modifying the mating and mutation probabilities and other variables, without much success. I've uploaded the code, which can be found here. In the evaluation function I use the function Function, which has 6 inputs and 11 outputs (I haven't uploaded this function, because it is written in Fortran and I don't think what it does is important, since it is a quite complicated algorithm). To measure the performance of the genetic algorithm, I first ran Function with the inputs:

input1 = 1.2
input2 = 0.01
input3 = 1.05
input4 = 0.5
input5 = 0.5
input6 = 0.5

The 11 outputs can be found in the evaluation function (with the names outputX_D; as can be seen, six of them are arrays), and are used to evaluate each solution. Then I ran the genetic algorithm to see if it was capable of finding the six inputs knowing the associated outputs. The best solution I've found was obtained with:

CXPB, MUTPB, NGEN, MU = 0.7, 0.3, 100, 2000

and it is:

input1 = 1.105006374626574
input2 = 0.01983052496770099
input3 = 1.0514238493734502
input4 = 0.32738923436912504
input5 = 0.7
input6 = 0.3

with an evaluation of 138.22061155449907. As can be seen, input4, input5 and input6 are quite far from the solution (indeed, input5 and input6 are on the boundaries), and input1 could also be improved. The best ones are input2 and input3. I've uploaded the code in case someone else can identify any error or any part that could be improved. As before, any suggestion is welcome.

Thanks and regards.

Albert-Miquel Sánchez

unread,
Dec 12, 2013, 6:31:33 AM12/12/13
to deap-...@googlegroups.com
I forgot to mention that the best individual is usually found very early, around the 12th generation, and after that the fitness value improves very little. I guess that may be the consequence of suboptimal code (maybe the evaluation function is not good enough?)...

EBo

unread,
Dec 12, 2013, 9:10:06 AM12/12/13
to deap-...@googlegroups.com
I have been a little too busy and have not been keeping up with
this thread, but here are a couple of points from my personal experience
that might help:

1) If you are converging too quickly on the solution, try increasing the
population size -- it is well understood in population genetics that with
a very small population you can lose a lot of the genetic
variation. I think they call it a genetic bottleneck.

2) You can also decrease how many offspring are selected from the most
fit individuals (basically decreasing the slope of change).

3) 12 generations to converge is not bad. I often see this in well-behaved
systems. What I would be more concerned with is the structure
of the population after that point -- are they exact copies of the best,
or are the genes scattered about?

Hope this helps,

EBo --


François-Michel De Rainville

unread,
Dec 12, 2013, 10:24:37 AM12/12/13
to deap-...@googlegroups.com
I see some problems in your code, but that might just be because I don't know exactly what you want to do.

- In your evaluation function, your RMS error looks really strange. I don't see why you would be adding the ratio between the desired and obtained output. You should normalize first and then compute the RMSE.

- The square of a difference is always positive, so you don't need the abs().

- The Gaussian mutation is applied to every attribute of your individuals. Since they are not all in the same range, this might lead to some values always going out of bounds. For variable 3 you have a range of 0.2, while the Gaussian mutation std is set to 0.5, so the great majority of mutations will send this value out of bounds.

A solution to this is to have all the attributes in the same range, say [0, 1], and "decode" them before evaluation by rescaling them to the right range.
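A decode step of that kind could be sketched like this (the bounds below are made-up examples, not the poster's real ranges):

```python
# Hypothetical (low, high) bounds for each attribute; replace with the
# real ranges of your six inputs.
BOUNDS = [(1.0, 1.5), (0.0, 0.1), (1.0, 1.2),
          (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

def decode(individual):
    """Rescale attributes evolved in [0, 1] to their real ranges."""
    return [low + gene * (high - low)
            for gene, (low, high) in zip(individual, BOUNDS)]
```

The evaluation function then calls `decode(individual)` first and passes the rescaled values to the Fortran routine, while crossover and mutation keep operating on the uniform [0, 1] representation.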

Or you can change your operators for bounded ones: cxSimulatedBinaryBounded and mutPolynomialBounded.

- The selective pressure applied by the tournament may be too high; this will most likely make your population converge faster, but more often than not to a local optimum. A tournament size between 2 and 4 should help you reach a better optimum (not to say the global one).

- And a little trick: you can "explode" a sequence variable into an argument sequence with the *-magic. This would shorten the call to Function. Otherwise, the [] operator still applies to individuals.

>>> def func(a, b, c):
...     print a, b, c
... 
>>> vec = [1,2,3]
>>> func(*vec)
1 2 3

Your individuals are lists.


Cheers,
François-Michel

Albert-Miquel Sánchez

unread,
Dec 12, 2013, 10:59:05 AM12/12/13
to deap-...@googlegroups.com

Regarding to the evaluation function, I want to evaluate how close are the actual outputs to the desired ones. To this end at first I decided to do: abs((Actual_Ouput/Desired_output) -1). If they are equal, this goes to 0, otherwise is greater than 0. But, I suspected that for solutions which values much under the desired ouputs, would give results also close to 0. Therefore I decided to use the relation: abs(Actual_Ouput-Desired_output). Finally, in order to make it more robust I left the rms of both relations (and that's the reason I forgot to erase the abs function).

Sorry, but I didn't understand what you meant by "You should normalize first and then compute the RMSE". How do I have to normalize?

Albert-Miquel Sánchez

unread,
Dec 16, 2013, 5:05:55 AM12/16/13
to deap-...@googlegroups.com
I've covered almost all the suggestions made before and the code is working better now. I've only missed the first one: "In your evaluation function, your RMS error looks really strange. I don't see why you would be adding the ratio between the desired and obtained output. You should normalize first and then compute the RMSE". What do you mean by normalize?

Thanks

François-Michel De Rainville

unread,
Dec 16, 2013, 8:42:57 AM12/16/13
to deap-...@googlegroups.com
If you know the range of the error for each output, you should normalize them so that all outputs have a comparable range. For example, if you want to minimize two outputs, one (y1) that varies between 1e6 and 2e6 and the other (y2) between 1e-6 and 2e-6, the RMSE will be dominated by the larger one, and the optimization will most likely focus on it because it is easier to minimize for a comparable difference.
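As an illustration of that normalization (the spans here are made-up values, not anyone's real data), each error can be divided by the known range of its output before squaring:

```python
import math

def normalized_rmse(outputs, targets, spans):
    """RMSE where each error is scaled by its output's range.

    `spans` holds the (assumed known) range of each output, so a given
    relative deviation contributes equally to the fitness no matter the
    output's absolute magnitude.
    """
    errors = [(out - want) / span
              for out, want, span in zip(outputs, targets, spans)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

With this scaling, an output near 1e6 and an output near 1e-6 that are both off by half their range contribute the same amount to the fitness.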

That said, it was only the addition of the difference to the ratio that looked weird to me. You can ignore the comment if you want. You are certainly the most qualified person to define your evaluation function based on your problem.

Regards,
François-Michel

