Gini coefficient


angel b.

Feb 4, 2022, 9:37:40 AM
to HeuristicLab
Hello,
I am trying to verify the Gini coefficient results in Genetic Programming - Symbolic Regression.
HL reports two values, one computed on the bounded estimated values and one on the labels.
I have tried to reproduce the calculations but cannot match the program's results.
Could you help me with the calculation in each case?
I have attached a link to an archive containing the .hl file with the chosen solution, an Excel file with the exported solution, and an Excel file with the calculations I have done (on the estimated values without bounding, on the bounded values, and using Brown's formula).
Thanks in advance

https://www.dropbox.com/s/si4862jd59znw5q/Public.7z?dl=0

Kommenda Michael

Feb 8, 2022, 3:31:25 AM
to heuris...@googlegroups.com

Dear Angel,

 

I have created an Excel file in which I manually calculated the normalized Gini index for the estimated training values. To do this, I opened your attached solution, copied the estimated class values into Excel, and discarded the test values. I then followed the instructions from [0] to calculate the normalized Gini index. Basically, you sort the values, normalize them, and calculate the area under the cumulative curve. The only difference from the described procedure is that I sorted the values in descending order. The second tab in the attached Excel file performs the same calculations, but with the values ordered by the target, so that the maximum obtainable Gini index is calculated. Finally, I divided the actual Gini index by the maximum one and obtained a normalized value of 0.744074858274709, which matches the value in HL up to the 13th decimal place.
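The steps described above (sort in descending order, normalize, accumulate, then divide by the maximum obtainable value) can be sketched in Python roughly as follows. This is only an illustrative sketch of the commonly used normalized Gini formulation, not HeuristicLab's code and not the contents of the Excel file; the function names `gini` and `normalized_gini` and the sample data are assumptions made for the example.

```python
def gini(actual, predicted):
    """Unnormalized Gini index of `actual` when ranked by `predicted`.

    Assumes both sequences have the same length and that sum(actual) != 0.
    """
    n = len(actual)
    # Sort by prediction in descending order; ties keep their original order.
    order = sorted(range(n), key=lambda i: (-predicted[i], i))
    total = sum(actual)
    running = 0.0
    cumulative_share_sum = 0.0
    for i in order:
        running += actual[i]
        # Normalized cumulative share of the target captured so far.
        cumulative_share_sum += running / total
    # Subtract the contribution of a random ordering, then scale by n.
    return (cumulative_share_sum - (n + 1) / 2.0) / n


def normalized_gini(actual, predicted):
    """Actual Gini divided by the maximum obtainable Gini (perfect ordering)."""
    return gini(actual, predicted) / gini(actual, actual)


# Hypothetical toy data: targets and model estimates.
targets = [1, 0, 1, 0]
estimates = [0.9, 0.8, 0.2, 0.1]
print(normalized_gini(targets, estimates))
```

Ranking by the target itself gives the largest Gini index any model could reach on that data, which is why dividing by it yields a value of 1 for a perfect ordering.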

 

An alternative description of how to calculate the Gini index is given in [1], and our implementation can be found at [2].

 

I hope this helps,

Michael

 

 

[0] https://theblog.github.io/post/gini-coefficient-intuitive-explanation/

[1] https://www.kaggle.com/c/ClaimPredictionChallenge/discussion/703

[2] https://svn.heuristiclab.com/svn/core/trunk/HeuristicLab.Problems.DataAnalysis/3.4/OnlineCalculators/NormalizedGiniCalculator.cs  


Gini.xlsx

angel b.

Feb 8, 2022, 10:13:00 AM
to HeuristicLab
Thank you very much, Michael.
The Gini coefficient calculated on the estimated values works perfectly.
There is another Gini coefficient (Norm. Gini coeff. in HL) that uses the class values produced by a classifier. Could you tell me how it works?
Thanks in advance

Gabriel Kronberger

Feb 8, 2022, 10:17:26 AM
to noreply-spamdigest via HeuristicLab
Hi Angel,

The code for calculating the Gini index is exactly the same for the class values and for the outputs of the discriminant function.

The Gini coefficient is calculated for the bounded values for classification solutions produced via genetic programming (called symbolic classification in HL).
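To illustrate the difference between the two inputs mentioned here, the following hypothetical Python sketch derives bounded values and class values from raw discriminant function outputs. The limits, the threshold, the class encoding, and the helper names `bound` and `to_class_values` are all assumptions for illustration and are not taken from HeuristicLab.

```python
def bound(estimates, lower, upper):
    """Clip each raw estimate into the interval [lower, upper]."""
    return [min(max(e, lower), upper) for e in estimates]


def to_class_values(estimates, threshold, low_class=0.0, high_class=1.0):
    """Map continuous estimates to class values using a single threshold."""
    return [high_class if e >= threshold else low_class for e in estimates]


# Hypothetical raw outputs of a discriminant function.
raw = [-1.0, 0.3, 2.5]
bounded = bound(raw, lower=0.0, upper=1.0)              # [0.0, 0.3, 1.0]
classes = to_class_values(bounded, threshold=0.5)        # [0.0, 0.0, 1.0]
print(bounded, classes)
```

Either list could then be passed as the prediction to the same Gini routine, which is why two different values can appear for one solution.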

Yes, in fact both Gini values are included in the Run collection views (e.g. in the Table including all parameters and results for all runs).

Best, Gabriel