TadpoleShare vs TadpoleChallenge results in benchmark methods


Mónica Hernández Giménez

Jan 27, 2021, 4:20:25 AM
to TADPOLE
Dear TADPOLE organizers,

I am interested in studying the performance of the different machine learning methods
on the diagnosis problem. I have downloaded the code from TadpoleShare and have then
tried to reproduce the metrics reported in the results table
for the Last Visit and SVM benchmark methods.

For the D2 on D4 experiment with the Last Visit benchmark I am getting
mAUC 0.741
BCA 0.760
while the table states the metrics should be
mAUC 0.774
BCA 0.792.

For the D2 on D4 experiment with the SVM benchmark I am getting
mAUC 0.796
BCA 0.767
while the table states the metrics should be
mAUC 0.836
BCA 0.764.

For the SVM mAUC in particular, that seems like a big difference to me...
Do you have any guess as to what is responsible for these differences?
Maybe the TADPOLE dataset is dynamic and we are dealing with different
patients than when the table was generated?
Maybe the implementation of the algorithms has evolved and they are now different?
Have you experienced similar changes in the metrics?
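
In case it is useful, below is roughly how I compute the two metrics: a minimal sketch assuming mAUC is the Hand & Till (2001) multi-class AUC and BCA is the mean over the CN/MCI/AD classes of the one-vs-rest balanced accuracy. This is my own reading of the metric definitions, not the official evaluation code.

# Minimal sketch, not the official TADPOLE evaluation code.
# Assumptions: y_true holds integer labels 0=CN, 1=MCI, 2=AD for the D4 subjects,
# y_prob is an (n_subjects, 3) array of forecast probabilities, and y_pred is
# the hard prediction (here simply the argmax of y_prob).
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

def mauc(y_true, y_prob, n_classes=3):
    """Multi-class AUC of Hand & Till (2001): average of pairwise AUCs."""
    y_true = np.asarray(y_true)
    aucs = []
    for i, j in combinations(range(n_classes), 2):
        mask = np.isin(y_true, [i, j])
        # A(i|j): AUC of the class-i probability for separating i from j,
        # and symmetrically A(j|i); their mean is the pairwise term.
        a_ij = roc_auc_score(y_true[mask] == i, y_prob[mask, i])
        a_ji = roc_auc_score(y_true[mask] == j, y_prob[mask, j])
        aucs.append((a_ij + a_ji) / 2)
    return float(np.mean(aucs))

def bca(y_true, y_pred, n_classes=3):
    """Mean over classes of the one-vs-rest balanced accuracy,
    i.e. (sensitivity + specificity) / 2 per class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        per_class.append(0.5 * (tp / (tp + fn) + tn / (tn + fp)))
    return float(np.mean(per_class))

If the official evaluation defines these quantities slightly differently, that could of course also account for part of the gap.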

Best regards.




Esther Bron

Jan 28, 2021, 9:29:44 AM
to TADPOLE
Dear Monica,

Thank you for your interest in TADPOLE and the TADPOLE-SHARE algorithms.

The performances that you report for the TADPOLE-SHARE benchmark algorithms are the same as we are currently getting with that code. These are indeed somewhat different from the performances of the original benchmark algorithms reported on the website. The reason is that the TADPOLE-SHARE algorithms are a complete reimplementation of the benchmark methods, in which basically every line of code was rewritten. This results in code that is easier to understand and easier to use, but indeed also in some performance differences.

The datasets should be the same (D1D2 for training, D4 for evaluation), so I do not expect differences there to be the reason. We have tried to trace back the exact differences but have not found them yet; I suspect, for example, differences in the approach used for imputing missing values.

We are still working on the comparison of these algorithms and on including more algorithms in TADPOLE-SHARE. For this, we have decided that our goal is not to obtain exactly the same performance, but to have sharable, usable algorithms with interpretable code and a performance close enough to the original.
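
As a hypothetical illustration of the kind of difference I mean (a toy sketch, not the actual TADPOLE-SHARE code, and the column names are made up), two perfectly reasonable imputation choices already fill the same missing value differently:

# Toy illustration (not TADPOLE-SHARE code; column names are made up) of how
# two reasonable imputation choices fill the same missing value differently.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "RID":     [1, 1, 1, 2, 2],              # subject identifiers
    "feature": [3.1, np.nan, 2.9, 4.0, 4.2]  # e.g. a volumetric measure, missing at one visit
})

# Option A: fill with the population mean of the feature.
mean_filled = df["feature"].fillna(df["feature"].mean())

# Option B: carry the subject's last observed value forward.
locf_filled = df.groupby("RID")["feature"].ffill()

print(mean_filled.tolist())  # [3.1, 3.55, 2.9, 4.0, 4.2]
print(locf_filled.tolist())  # [3.1, 3.1, 2.9, 4.0, 4.2]

Preprocessing differences like this propagate into the features the SVM sees and can shift the final metrics, even when the classifier itself is unchanged.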

Hope this answers your questions, and please let me know if you have additional questions.

Kind regards,

Esther Bron


On Wednesday, January 27, 2021 at 10:20:25 AM UTC+1, Mónica Hernández Giménez wrote: