New strategies tested for multi-class classification


Mihai Oltean

Apr 26, 2016, 1:34:17 AM4/26/16
to Multi Expression Programming
Hi there,

In the past several days I have spent a lot of effort on implementing / testing new strategies for multi-class classification in conjunction with MEP.

Here are some of them:

1. For each gene of the MEP chromosome, we compute the value of that expression on each training sample. For each class k we find the range into which the outputs for the data belonging to that class fall (let's say that for the data belonging to class 0 all outputs are in the interval [-3, 7]), and then we compute the center of that range (which is 2 in this case). Then we classify each sample to the nearest center. If a sample is at the same distance from multiple centers, we take the first one. The number of incorrectly classified samples is the fitness.
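A minimal sketch of this first strategy (the function name and call shape are mine, not MEPX code); it takes the outputs of one gene on all training samples plus the true labels, and returns the number of misclassified samples:

```python
def fitness_nearest_center(outputs, labels, num_classes):
    """Strategy 1: classify each sample to the nearest class-range center."""
    # per-class [min, max] range of the gene's output, and its midpoint
    lo = [min(o for o, y in zip(outputs, labels) if y == k) for k in range(num_classes)]
    hi = [max(o for o, y in zip(outputs, labels) if y == k) for k in range(num_classes)]
    centers = [(lo[k] + hi[k]) / 2 for k in range(num_classes)]

    errors = 0
    for o, y in zip(outputs, labels):
        # nearest center; min() keeps the lowest-index class on ties,
        # matching the "take the first one" rule above
        predicted = min(range(num_classes), key=lambda k: abs(o - centers[k]))
        if predicted != y:
            errors += 1
    return errors  # fitness, to be minimized


# toy run: class 0 outputs fall in [-3, 7] (center 2), class 1 in [20, 24] (center 22)
print(fitness_nearest_center([-3, 7, 20, 24], [0, 0, 1, 1], 2))
```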

2. We compute the range of values for each class (as described at item 1 above). Then, for each sample we check whether it falls in multiple ranges. For instance, sample #3 has an output (generated by a gene) equal to 2, which falls in the range of values for both class 0 and class 7. Falling in the range of values of more than one class is not a good thing (because we want the data separated), so the number of samples falling in multiple class ranges is the fitness (which should be minimized).
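The second strategy could be sketched like this (again a hypothetical helper, not the MEPX implementation): build the per-class output ranges, then count how many samples land inside more than one range.

```python
def fitness_overlap(outputs, labels, num_classes):
    """Strategy 2: count samples whose output falls in more than one class range."""
    # per-class [min, max] range of the gene's output
    ranges = []
    for k in range(num_classes):
        vals = [o for o, y in zip(outputs, labels) if y == k]
        ranges.append((min(vals), max(vals)))

    # a sample inside 2+ ranges is ambiguous, hence penalized
    overlapping = 0
    for o in outputs:
        hits = sum(1 for r_lo, r_hi in ranges if r_lo <= o <= r_hi)
        if hits > 1:
            overlapping += 1
    return overlapping  # fitness, to be minimized


# toy run: class 0 range [-3, 7], class 1 range [2, 9];
# the outputs 7 and 2 sit inside both ranges
print(fitness_overlap([-3, 7, 2, 9], [0, 0, 1, 1], 2))
```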

I have made some tests with these strategies, but all are worse than what I have so far.

regards,
mihai

Mihai Oltean

Apr 26, 2016, 3:59:20 AM4/26/16
to Multi Expression Programming
Another strategy tested:

This is inspired by the current binary classification strategy which discovers the threshold automatically.

Assume that all classes have the same number of items. Let's say that we have 1000 items and 10 classes; that means 100 items per class. The same reasoning can be applied if the classes are not balanced.

We have the values of an expression for all training data (1000 in our case). We sort these values in ascending order.
If the items were perfectly separable, the first 100 items would belong to one class, the next 100 would belong to another class, and so on.

We don't know where each class will fall (for instance, items belonging to class #7 can be in the first 100 positions, in the next 100 positions, ... or in the last 100 positions), so we have to make a decision here: for instance, we can say that a class is allocated to the range of positions where it has the most representatives. If class #7 is best represented in positions 300-400, we decide that that slot is for class #7. All other items not belonging to class #7 but still falling within slot 300-400 are counted as incorrectly classified. The fitness is the total number of incorrectly classified items (over all classes).
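The slot-allocation idea above can be sketched as follows (a simplified sketch assuming balanced classes; function name and details are mine): sort the labels by the gene's output, cut the sorted sequence into equal slots, give each class the slot where it has the most representatives, and count everything else in that slot as an error.

```python
def fitness_slots(outputs, labels, num_classes):
    """Strategy 3: allocate each class to its best slot in the sorted outputs."""
    n = len(outputs)
    slot_size = n // num_classes  # assumes all classes have the same size

    # sample labels, ordered by the gene's output value
    sorted_labels = [y for _, y in sorted(zip(outputs, labels))]

    errors = 0
    for k in range(num_classes):
        # how many class-k items fall in each slot of the sorted sequence
        counts = [sum(1 for y in sorted_labels[s * slot_size:(s + 1) * slot_size] if y == k)
                  for s in range(num_classes)]
        best = counts.index(max(counts))  # slot with most class-k representatives
        # the rest of that slot is counted as incorrectly classified
        errors += slot_size - counts[best]
    return errors  # fitness, to be minimized


# toy run: 6 items, 2 classes, slot size 3; sorted labels are [0, 0, 1, 1, 1, 0],
# so class 0 claims the first slot (one intruder), class 1 the second (one intruder)
print(fitness_slots([1, 2, 3, 4, 5, 6], [0, 0, 1, 1, 1, 0], 2))
```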

We have also tested this strategy on problems with 10 classes, and the results are not as good as those of the official multi-class strategy currently implemented in MEPX.

still experimenting,
mihai