deap gp classification tree example / implementation?

639 views
Skip to first unread message

William Smits

unread,
Jul 13, 2013, 2:51:37 PM7/13/13
to deap-...@googlegroups.com
Has anyone used deap for a genetic programming classification tree similar to the one in this paper?  Scroll down to the credit card example for quick idea.   http://sci2s.ugr.es/keel/pdf/specific/articulo/scopus006.pdf
Basically constituting the use of boolean and or not, if then else functions, and the use of ranged parameters.  
I am not seeing how to create the proper functions with constraints using the existing libraries.

For instance the if then else function must have  the constant or return term be in the middle/second child.

Anyone familiar with deap have a reference example, or feel like cranking out an example function?  I would think a classification tree would be common?

I was able to create a gp classification tree in pure python, but the code is ugly and inefficient.  

Marc-André Gardner

unread,
Jul 13, 2013, 4:37:52 PM7/13/13
to deap-...@googlegroups.com
Hi,

There is already a classification tree example with the spambase dataset (it can be found in examples/gp/gp_spambase.py). However, it is not exactly the same paradigm, as it uses STGP (Strongly-typed GP) to directly use the dataset features.

The biggest issue is that the authors someway seemed to have reinvented the wheel. In the paper, they redefine the crossover and mutation operators so they match their needs. This is not standard GP (nor standard GP classification approach), and so it is not implemented in DEAP.
You may implement it by using STGP, as stated above (you would have one class for "rules" and the other for "conditions"), or by redefining the crossover / mutation operators.

That said, you will still have to implement also the specific GP operators presented in the paper (merge and elimination).

I must said that I'm unsure of the exact problem your facing, maybe I missed something in your message. Do not hesitate to ask again if you have any remaining question.

Have fun with DEAP,

Marc-André Gardner

William Smits

unread,
Jul 15, 2013, 6:07:26 PM7/15/13
to deap-...@googlegroups.com
Thanks for the reply, agreed, ignore the custom mutation and crossover, i am mainly trying to recreate the basic functions/ example problem. 
The spam example is exactly what I was looking for, it has the if-then-else, and, or, not, already made, and the strongly typed format is clear.
My problem currently is I am not seeing on how to create the range based parameter functions, eg jobtype, salary, education functions referencing the paper.     
Not seeing how I would evaluate an input without having a paramater input / child node?
{X0,X1,X2,} range functions for input arg1 x0
python/psuedo code attempt at basic function for those?
def X0(x0):
  if x0 <1:
    return True
  else:
    return False

def X1(x0):
  if 1<x0<2:
    return True
  else:
    return False
def X2(x0):
  if 2<x0<3:
    return True
  else:
    return False 

Marc-André Gardner

unread,
Jul 16, 2013, 4:27:18 PM7/16/13
to deap-...@googlegroups.com
Hi again,

The simplest way to do so is to change a bit your dataset so it becomes a bit (or boolean) vector. For instance, from the instance "Job=laborer, Education=High school, Salary=200-490", you produce the following vector : [1 0 0 0 1 0 0 0 0 1 0 0]
This vector can also be seen as : {J0=True, J1=False, J2=False, J3=False, E0=True, E1=False, ...}

In this way, you can directly add J0, J1, and so on, as boolean terminals.

This should work. However, will it work fine is of course an entirely different question ;-)
Intuitively, I would say that this approach will not be as efficient as others since the differenciation is the same between the terminals representing the same features than between the terminals reprensenting different ones.
For instance, you may end up with a condition like :
IF J0 AND J1

which will obviously be always false, since J0 and J1 are actually representing the same attribute...

Nevertheless, even if there would be other ways to implement this classification problem, overall I think this would be a good starting point.
If you would like to learn more about GP classification, I suggest you this survey : https://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5340522 (restricted access only, unfortunately), especially the guidelines section.

Have a good day,

Marc-André Gardner
Reply all
Reply to author
Forward
0 new messages