Hi,
There is a new classification strategy starting with MEPX 2022.02.07 (and in libmep).
It is called Closest Center and is perhaps the simplest strategy so far, both to explain and to implement.
Each particular MEP formula (remember that MEP encodes multiple formulas in a single chromosome) is applied to all training data.
- For each class, we keep the set of values generated by that formula. If we have a problem with 3 classes, we get 3 sets of values.
- For each class, we compute the center of mass of its set (with 3 classes we obtain 3 centers). The center of mass was chosen instead of the geometric center in order to reduce the influence of outliers.
- Then each data item is assigned to the class with the nearest center. When the distances to several centers are equal, the class with the lowest index is chosen.
- The fitness of a formula is the number of data items that it classifies incorrectly.
This computation of centers is performed for each instruction (formula) in the chromosome. The best gene (the one with the minimum number of incorrectly classified data items) gives the fitness of the chromosome.
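To make the steps above concrete, here is a minimal Python sketch. It is not the libmep code, only an illustration under a few assumptions: each formula produces one real value per data item, "center of mass" is taken to be the arithmetic mean of a class's values, and a class with no training items simply gets no center. All function names are made up for this example.

```python
def class_centers(values, labels, num_classes):
    """Per-class mean of the formula outputs (assumed meaning of 'center of mass')."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for value, label in zip(values, labels):
        sums[label] += value
        counts[label] += 1
    # Assumption: a class without training items has no center and is never chosen.
    return [s / n if n else None for s, n in zip(sums, counts)]


def nearest_class(value, centers):
    """Index of the class whose center is closest to value."""
    best_class, best_dist = -1, float("inf")
    for index, center in enumerate(centers):
        if center is None:
            continue
        dist = abs(value - center)
        # Strict '<' keeps the lowest class index when distances are equal.
        if dist < best_dist:
            best_class, best_dist = index, dist
    return best_class


def formula_fitness(values, labels, num_classes):
    """Number of data items that one formula classifies incorrectly."""
    centers = class_centers(values, labels, num_classes)
    return sum(
        1
        for value, label in zip(values, labels)
        if nearest_class(value, centers) != label
    )


def chromosome_fitness(gene_outputs, labels, num_classes):
    """Fitness of the best gene; gene_outputs holds one list of values per formula."""
    return min(formula_fitness(values, labels, num_classes) for values in gene_outputs)


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2]
    gene_a = [1.0, 1.2, 5.0, 5.2, 9.0, 9.4]  # separates the 3 classes cleanly
    gene_b = [1.0, 5.0, 1.2, 5.2, 9.0, 9.4]  # mixes classes 0 and 1
    print(formula_fitness(gene_a, labels, 3))
    print(formula_fitness(gene_b, labels, 3))
    print(chromosome_fitness([gene_a, gene_b], labels, 3))
```

In this toy example the first gene makes no errors and the second makes two, so the chromosome takes the fitness of the first gene.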
I am currently running some experiments to compare all 5 classification strategies in MEPX.
Some strategies are several years old, but I have not made an experimental comparison yet.
Now it is time to do that.
I'll post a draft document with the results of the comparison in a few days.
Regards,
Mihai