GP Static depth limit not enforced


Manuel Belmadani

Mar 15, 2016, 1:08:26 AM
to deap-users
Hi everyone,

My software uses a strongly typed GP. On some of my more challenging datasets, I have the impression the solutions get stuck in a local maximum and essentially just add redundant information, producing very deep trees with no improvement in the actual output. It used to be pretty bad, but I worked around it by setting a static limit like so:

""" Static limit for GP trees """
MAX_HEIGHT = 17  # Koza 1999
self.toolbox.decorate("mate", deap.gp.staticLimit(operator.attrgetter('height'), MAX_HEIGHT))
self.toolbox.decorate("mutate", deap.gp.staticLimit(operator.attrgetter('height'), MAX_HEIGHT))


And that seemed to fix it, until I moved to more challenging data. Out of 114 datasets, 18 terminated with this error:

Traceback (most recent call last):
  File "mysoftware.py", line 209, in <module>
    offspring, log = engine.run(options.ngen)
  File "/home/hpc3019/development/myproject/engine.py", line 133, in run
    timelimit=self.options.timelimit
  File "/home/hpc3019/development/myproject/evoalgo.py", line 96, in eaFortin
    fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
  File "/home/hpc3019/development/myproject/stgpfitness.py", line 160, in fitnessfunction
    pattern = self.toolbox.compile(expr=program)        
  File "/home/hpc3019/.local/lib/python2.6/site-packages/deap/gp.py", line 464, in compile
    return eval(code, pset.context, {})


How can I resolve this issue?

I'm using a modified version of eaSimple with the improved NSGA-IIR (Fortin, F.-A. & Parizeau, M., "Revisiting the NSGA-II crowding-distance computation," Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, 623–630, ACM, 2013).
Where should I be looking, or what other information would help shed light on this? Could it be the way I implemented my custom EA? Is my grammar design flawed?

The strangest thing is that it used to be much worse; even then, it wouldn't do this on trivial datasets. Then I ran it on 13 medium-difficulty datasets, for NSGA-IIR and SPEA2 (with mu plus lambda), with 10 different random seeds (260 individual runs in total), and it happened only once. Now that I'm running harder datasets, it seems to occur more frequently.

Any suggestions or recommendations on where to look, or what to do, would be greatly appreciated.


François-Michel De Rainville

Mar 15, 2016, 8:49:02 AM
to deap-users
Can you please add the error?

Thx

--
You received this message because you are subscribed to the Google Groups "deap-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deap-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Manuel Belmadani

Mar 15, 2016, 1:51:46 PM
to deap-users
Ahh, sorry, I didn't notice it was left out. It's the standard DEAP error for this situation:

MemoryError: DEAP : Error in tree evaluation : Python cannot evaluate a tree higher than 90. To avoid this problem, you should use bloat control on your operators. See the DEAP documentation for more information. DEAP will now abort.

Marc-André Gardner

Mar 15, 2016, 2:08:34 PM
to deap-users
Hi Manuel,

That's an interesting problem. I see maybe two causes:
1) It's not what we think. If we look at the code in gp.py (https://github.com/DEAP/deap/blob/master/deap/gp.py):

    try:
        return eval(code, pset.context, {})
    except MemoryError:
        _, _, traceback = sys.exc_info()

As you can see, DEAP assumes that a MemoryError is always caused by a tree that is too high. In practice, that may not always be the cause. For instance, if evaluating a tree on a dataset that is too big overflows the available memory, DEAP will display this error even though, in that particular case, the message is misleading. That would explain why you do not experience the issue with smaller datasets.
Could you add some code in gp.py to assert the height of the trees causing this error?
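Such an assert could look like the following (a toy sketch, not actual DEAP code; `compile_checked`, `MAX_EVAL_HEIGHT`, and the callable `evaluate` are all hypothetical names standing in for the `eval` call inside `gp.compile`):

```python
MAX_EVAL_HEIGHT = 90  # the nesting limit DEAP's error message refers to

def compile_checked(expr, evaluate):
    """Only blame a MemoryError on tree height when the tree really is
    too high; otherwise re-raise it as a genuine out-of-memory condition.
    `expr` is anything exposing a .height attribute, like a PrimitiveTree."""
    try:
        return evaluate(expr)
    except MemoryError:
        if expr.height <= MAX_EVAL_HEIGHT:
            # The tree is fine -- the dataset (or a leak) ate the memory.
            raise MemoryError("genuine out-of-memory; tree height is only %d"
                              % expr.height)
        raise MemoryError("tree higher than %d" % MAX_EVAL_HEIGHT)
```

If the re-raised message reports a modest height, the dataset size (or a memory leak) would be the real culprit rather than bloat.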

2) The staticLimit decorator applies only to the operators explicitly decorated (of course). Are you sure that you do not perform any other operation on the trees, including during initialization? If you want, you can easily output the min/mean/max height with the stats module, so you can see when the problem arises. Does it happen at the first generation or only later?
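With DEAP you would typically do this via `tools.Statistics(lambda ind: ind.height)` registered in a MultiStatistics; as a self-contained sketch of the same computation (the `Ind` class here is only a stand-in for a real PrimitiveTree, which exposes `.height`):

```python
class Ind:
    """Stand-in for a DEAP individual; real PrimitiveTrees expose .height."""
    def __init__(self, height):
        self.height = height

def height_stats(population):
    """Min/mean/max tree height, as a tools.Statistics record would report."""
    heights = [ind.height for ind in population]
    return {"min": min(heights),
            "avg": sum(heights) / float(len(heights)),
            "max": max(heights)}

population = [Ind(h) for h in (4, 7, 12, 18, 9)]
print(height_stats(population))  # {'min': 4, 'avg': 10.0, 'max': 18}
```

Logging this every generation shows immediately whether the max height creeps past the static limit, and at which generation.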

Thanks for your time, we will try to work this issue out!

Marc-André

François-Michel De Rainville

Mar 15, 2016, 2:16:49 PM
to deap-users
3) When using decorators, be sure to use the return value of the function, not the in-place modification. The former includes the result of the decorator; the latter only that of the variation.

Manuel Belmadani

Mar 15, 2016, 2:50:49 PM
to deap-users
Marc-André: 
The memory issue is interesting. I had not considered it.

real    900m32.246s
user    899m26.060s
sys     0m32.833s
1757    50         0.977758        1               0.926626        0.0199056       -3618.73        -687.94         -5565.8         1418.56
1758    50         0.977758        1               0.926626        0.0199056       -3618.73        -687.94         -5565.8         1418.56
1759    50         0.977758        1               0.926626        0.0199056       -3618.73        -687.94         -5565.8         1418.56
1760    50         0.977758        1               0.926626        0.0199056       -3618.73        -687.94         -5565.8         1418.56
Traceback (most recent call last):

  ...(same trace as before)...
MemoryError: DEAP : Error in tree evaluation : Python cannot evaluate a tree higher than 90. To avoid this problem, you should use bloat control on your operators. See the DEAP documentation for more information. DEAP will now abort.

real    725m42.452s
user    724m14.344s
sys     1m22.018s
1806    50         0.964538        1               0.882966        0.0318138       -3942.1         -498.117        -6169.68        1786.99
1807    50         0.964538        1               0.882966        0.0318138       -3942.1         -498.117        -6169.68        1786.99
1808    50         0.964538        1               0.882966        0.0318138       -3942.1         -498.117        -6169.68        1786.99
1809    50         0.964538        1               0.882966        0.0318138       -3942.1         -498.117        -6169.68        1786.99
Traceback (most recent call last):
  ...(same trace as before)...
MemoryError: DEAP : Error in tree evaluation : Python cannot evaluate a tree higher than 90. To avoid this problem, you should use bloat control on your operators. See the DEAP documentation for more information. DEAP will now abort.

In both cases, this occurs after a few hours, once the population seems to stall (those are my objectives' stats). I'll modify gp.py and report what I find.

My datasets aren't huge (maybe 20 MB at most), but I might have a memory leak somewhere. The software does seem to get slower and slower as the generations go on, even though the solutions shouldn't become much more complex to evaluate.

I'm also running this on a cluster; I hadn't considered that as a factor, but do you think it could be an issue? If the different nodes share memory, for example (I'm just throwing this out there, not sure if it would actually make sense).

I don't think I'm using any other operators than mating and mutating. I'll check the initialization, but I don't think I decorated anything else with the static limit. Logging the depth of the trees sounds like a good idea.

François-Michel: 
Could you give me an example of returning an inplace modification versus the value of the function? I'm using decorators pretty much exactly like in the documentation, unless it's a subtle difference and I goofed up.

François-Michel De Rainville

Mar 15, 2016, 2:53:51 PM
to deap-users
ind_2 = toolbox.mutate(ind_1)

ind_1 is mutated, but the decoration has not been applied.
ind_2 has both operations applied.
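To make the distinction concrete, here is a self-contained toy (not DEAP code; `static_limit`, `mutate`, and the dict-based individuals are all hypothetical) mimicking what `deap.gp.staticLimit` does: keep a copy of the arguments, run the operator, and return the copy whenever an offspring exceeds the limit:

```python
import copy

def static_limit(max_height):
    """Toy version of deap.gp.staticLimit: offspring exceeding max_height
    are replaced, in the RETURN VALUE only, by a copy of their parent."""
    def decorator(operator_fn):
        def wrapper(*args):
            keep = [copy.deepcopy(a) for a in args]   # parents before variation
            result = operator_fn(*args)
            return tuple(keep[i] if ind["height"] > max_height else ind
                         for i, ind in enumerate(result))
        return wrapper
    return decorator

def mutate(ind):
    ind["height"] += 5  # grows the tree in place, like a subtree mutation
    return (ind,)

bounded_mutate = static_limit(17)(mutate)

ind_1 = {"height": 15}
ind_2, = bounded_mutate(ind_1)
print(ind_1["height"])  # 20 -- the argument was mutated in place, past the limit
print(ind_2["height"])  # 15 -- only the return value respects the limit
```

So code that keeps using `ind_1` after the call silently carries an over-limit tree, even though the decorator "ran".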

Manuel Belmadani

Mar 21, 2016, 10:25:16 PM
to deap-users
Alright, sorry for being MIA for a few days. I'm going through your suggestions, and I think I may have a lead. I included the logging of the maximum depth of the trees in the population.

In logbook.record(), I added

maxdepth=numpy.max([x.height for x in population])

In gp.py, I added:

assert expr.height <= 17, "Static limit not respected. expr " + str(expr) + " has height " + str(expr.height)
(Hardcoded value, unfortunately. Does anyone know how to easily retrieve the decorated static limit from inside gp.py?)

And it seems maxdepth does exceed 17:
gen evals new maxdepth avg     max     min std       avg       max min     std     
0   0     0       4       0.489572 0.517073 0   0.0700974 -0.607054 1   -1.10812 0.267261
0   50   55     7       0.501637 0.523923 0.5 0.00495394 -0.770868 -0.648516 -3.14516 0.402548
Writing gen 0 at path ./OUT/default/FORTIN/42/0.checkpoint
1   50   81     7       0.505192 0.582418 0.5 0.0143706 -0.945452 -0.650123 -3.14516 0.603596
2   50   113     10       0.516542 0.583815 0.5 0.025932   -1.63016 -0.652377 -11.7192 1.7722  
3   50   149     10       0.531343 0.583815 0.501933 0.0269473 -2.29389 -0.777242 -11.7192 1.78354 
4   50   182     12       0.546434 0.619048 0.50678 0.0267584 -2.79703 -0.930487 -11.7192 1.72253 
5   50   206     11       0.569386 0.666667 0.529491 0.0319262 -3.66291 -1.01325 -11.7192 1.86898 
6   50   238     13       0.607038 1       0.53255 0.0874244 -3.85223 -0.693147 -11.7192 2.38831 
7   50   263     13       0.69657 1       0.560764 0.157543   -3.23488 -0.693147 -11.7192 2.93072 
8   50   292     11       0.765498 1       0.560764 0.187721   -3.66894 -0.693147 -11.7192 3.63597 
9   50   321     11       0.800616 1       0.560764 0.19084   -4.08592 -0.693147 -11.7192 3.84219 
10 50   345     14       0.816608 1       0.583815 0.169276   -4.02753 -0.693147 -15.1533 3.62791 
11 50   376     14       0.839427 1       0.588235 0.151574   -4.07066 -0.693147 -15.1533 4.07989 
12 50   409     15       0.863144 1       0.596916 0.141477   -4.13747 -0.693147 -15.1533 4.30664 
13 50   432     15       0.76096 1       0.596916 0.128621   -6.07579 -1.38674 -15.1533 4.01724 
Traceback (most recent call last):
  File "mysoftware.py", line 213, in <module>
    offspring, log = engine.run(options.ngen)
  File "/home/y/development/myproject/engine.py", line 134, in run
    CLUTCH=self.options.clutch
  File "/home/y/development/myproject/evoalgo.py", line 111, in eaFortin
    fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
  File "/home/y/development/myproject/stgpfitness.py", line 157, in fitness_function
    pattern = self.toolbox.compile(expr=program)        
  File "build/bdist.linux-x86_64/egg/deap/gp.py", line 467, in compile
AssertionError: Static limit not respected. expr add(primitive_charclass(False, True, True, or_(False, True)), add(add(add(add('C', primitive_charclass(and_(or_(True, False), or_(True, True)), not_(not_(True)), and_(or_(True, not_(not_(or_(True, True)))), not_(True)), or_(True, True))), add(primitive_charclass(False, False, True, False), primitive_charclass(not_(False), or_(False, False), and_(True, False), and_(and_(or_(not_(not_(True)), not_(and_(False, False))), not_(and_(not_(or_(not_(True), or_(False, and_(or_(and_(or_(True, True), or_(False, False)), and_(not_(not_(True)), not_(True))), not_(or_(not_(False), or_(False, True))))))), and_(True, False)))), True)))), primitive_charclass(not_(and_(or_(True, and_(and_(and_(or_(True, False), not_(True)), not_(not_(True))), not_(and_(or_(False, False), and_(True, True))))), not_(True))), or_(False, False), and_(True, True), and_(False, not_(not_(or_(and_(not_(and_(not_(True), and_(False, True))), and_(not_(not_(True)), not_(not_(True)))), True)))))), add('G', 'A'))) has height 18


Any suggestions?

I also realize that my grammar has a lot of recursive boolean expressions. Would that obscure the evaluation of the depth in some way?

I checked that decoration was properly applied (François-Michel's suggestion) and didn't see anything suspicious, but I'll double-check, because my trace seems to support that idea.

Manuel Belmadani

Mar 21, 2016, 11:27:12 PM
to deap-users
I should've double-checked earlier. Turns out that the code I used from the NSGA-IIR repository did not return the decorated individual.

Before:
        offspring = toolbox.preselect(population, len(population))
        offspring = [toolbox.clone(ind) for ind in offspring]

        for ind1, ind2 in zip(offspring[::2], offspring[1::2]):
            if random.random() <= cxpb:
                toolbox.mate(ind1, ind2)
            if random.random() > mutpb:
                toolbox.mutate(ind1)
            if random.random() > mutpb:
                toolbox.mutate(ind2)
            del ind1.fitness.values, ind2.fitness.values

After:
        offspring = toolbox.preselect(population, len(population))
        offspring = [toolbox.clone(ind) for ind in offspring]
        offspring = varAnd(offspring, toolbox, mu, cxpb, mutpb)


The new version does not seem to produce the error anymore. varAnd would essentially accomplish the same thing as the manual version, correct?
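For reference, the variation loop in `deap.algorithms.varAnd` has roughly this shape (a simplified sketch with integers standing in for individuals; `mate` and `mutate` here are toys, and the real varAnd also invalidates fitness values, which this omits). The crucial difference from the manual version above is that every operator result is reassigned back into the offspring list, so decorator effects survive:

```python
import random

def mate(a, b):
    # toy crossover; a decorated toolbox.mate likewise returns NEW values
    return (a + b, b - a)

def mutate(a):
    return (a + 1,)  # DEAP mutation operators return a 1-tuple

def var_and(population, cxpb, mutpb):
    """Shape of deap.algorithms.varAnd: results are REASSIGNED,
    never relied upon as in-place side effects."""
    offspring = list(population)
    for i in range(1, len(offspring), 2):
        if random.random() < cxpb:
            offspring[i - 1], offspring[i] = mate(offspring[i - 1], offspring[i])
    for i in range(len(offspring)):
        if random.random() < mutpb:
            offspring[i], = mutate(offspring[i])
    return offspring

print(var_and([1, 2], cxpb=1.0, mutpb=0.0))  # [3, 1]
print(var_and([1, 2], cxpb=0.0, mutpb=1.0))  # [2, 3]
```

Calling `toolbox.mate(ind1, ind2)` and discarding the return value, as in the "Before" version, keeps only the undecorated in-place variation.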

I'll rerun my batch of jobs to see if the error recurs, but hopefully this did the trick. 

François-Michel De Rainville

Mar 22, 2016, 9:10:00 AM
to deap-users
Where did you find the NSGA-IIR code?

Manuel Belmadani

Mar 22, 2016, 2:14:57 PM
to deap-...@googlegroups.com
I think this was it: https://github.com/DEAP/experimental/blob/master/fortin2013/nsga2.py#L100

I may have added a few things; it's been a while.
