About default training settings with clinicadl


Melvin Selim Atay

Sep 5, 2023, 6:28:02 AM
to Clinica

Hi,
Before moving on to testing the networks I implemented, I have been trying to run the defaults to see how they perform on my selection of ADNI data.

Firstly, I want the default CNN with 5 convolutional layers not to overfit so much in 3D classification, and to be able to lower the validation loss during training. Using only baseline data does not allow the model to generalize, so to avoid this problem I wanted to use all of the images. I split them by subject and tested my splits: there is no data leakage.

There is something that confuses me: the `--longitudinal` flag only works on the training data split, even though I used the `--not_only_keep_baseline` option when splitting to create the validation.tsv and test.tsv files. It creates validation data with only the baseline scans of the validation set; I renamed that file validation_data.tsv to avoid naming confusion. (Normally it creates data.tsv in the respective folders.)

When running classification, clinicadl creates group splits in the MAPS folder, and there I noticed that it only used the baselines from the validation split. In the code I noticed that it can only look for the baselines for the validation split. The flag works well on the training split and even helps the training accuracy converge faster.
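One way to confirm whether a given split file holds only one scan per subject is to count sessions per participant. The following is a minimal sketch, assuming the TSV files have `participant_id` and `session_id` columns; the subject and session labels in the example are made up for illustration, and the helper names are hypothetical, not part of clinicadl.

```python
import csv
import io
from collections import defaultdict


def sessions_per_subject(tsv_text):
    """Map each participant_id to the sorted list of its session_id values."""
    sessions = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        sessions[row["participant_id"]].add(row["session_id"])
    return {subject: sorted(ses) for subject, ses in sessions.items()}


def has_single_session_per_subject(tsv_text):
    """True if every subject appears with exactly one session."""
    return all(len(ses) == 1 for ses in sessions_per_subject(tsv_text).values())


# Made-up example content; real files would be read from disk instead.
example = (
    "participant_id\tsession_id\tdiagnosis\n"
    "sub-A\tses-M00\tAD\n"
    "sub-A\tses-M12\tAD\n"
    "sub-B\tses-M00\tCN\n"
)
print(sessions_per_subject(example))
print(has_single_session_per_subject(example))
```

To apply it to a real file, the text can be loaded with `pathlib.Path("validation.tsv").read_text()` and passed to the same functions.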

Once I figure out working settings, I will definitely use k-fold cross-validation, dropout, and different optimizers as well.

I have attached the relevant tsv files and my training script.

Thanks for your support.

Mel.



trainDL1.sh
test_baseline.tsv
validation.tsv
validation_data.tsv
test.tsv
train_data.tsv
validation_baseline.tsv
train.tsv
train_baseline.tsv
train+validation.tsv

camille brianceau

Sep 13, 2023, 11:35:16 AM
to Clinica
Hi, 

I have reviewed the code and found that even when you use the `--not_only_keep_baseline` option to create the validation.tsv file, you cannot specify its use in the train command. Unfortunately, the only workaround I can suggest is to rename the validation.tsv file to validation_baseline.tsv. This solution is far from ideal, but I think it would work, as the split command is specifically designed to prevent data leakage.
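The workaround can be scripted so that the original baseline file is not lost. This is only a sketch: it copies rather than renames (so validation.tsv stays in place), and the function name and the separate backup directory are assumptions made for illustration, not something clinicadl provides.

```python
import shutil
from pathlib import Path


def use_longitudinal_validation(split_dir, backup_dir):
    """Overwrite validation_baseline.tsv with the content of validation.tsv.

    The previous validation_baseline.tsv is first saved to backup_dir
    (a hypothetical location outside the split folder).
    """
    split_dir = Path(split_dir)
    backup_dir = Path(backup_dir)
    full = split_dir / "validation.tsv"
    baseline = split_dir / "validation_baseline.tsv"

    backup_dir.mkdir(parents=True, exist_ok=True)
    if baseline.exists():
        # Keep the true baseline-only file so the change can be undone.
        shutil.copy2(baseline, backup_dir / baseline.name)

    # The baseline file now lists every validation scan, not just baselines.
    shutil.copy2(full, baseline)
    return baseline
```

Restoring the saved copy from the backup directory undoes the change once a release with proper support for this option is available.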

I am going to implement this option in clinicadl and it will be available in the next release. 

I hope I understood everything correctly.

Do not hesitate if you have other questions. 

Best regards,
Camille 

Melvin Selim Atay

Sep 13, 2023, 12:21:59 PM
to Clinica
Hi,
Thanks for your detailed reply.
I understand the effort put into preventing data leakage and appreciate it a lot. But I split by subject while keeping the longitudinal data, so validation.tsv only contains the corresponding additional scans of the validation subjects, and that's all.
I understand the workaround, and it seems like an improvement. Another issue arises while the MAPS folder is being created: it writes a file train+validation.tsv that contains all of the scans, not only the baselines, and it has exactly the same number of subjects as the original label.tsv. I do not have detailed information on how clinicadl handles this file, but it may also be causing a possible leakage.
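A quick way to test this suspicion is to compare subject sets across the files. The sketch below assumes each TSV has a `participant_id` column; the function names and the tiny example tables are hypothetical and only illustrate the comparison.

```python
import csv
import io


def subjects(tsv_text):
    """Return the set of participant_id values found in TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["participant_id"] for row in reader}


def compare_subjects(combined_text, train_text, validation_text, test_text):
    """Compare the combined file against the train, validation and test splits."""
    combined = subjects(combined_text)
    expected = subjects(train_text) | subjects(validation_text)
    return {
        # Subjects in the combined file that belong to neither split.
        "unexpected_in_combined": sorted(combined - expected),
        # Subjects from train or validation missing in the combined file.
        "missing_from_combined": sorted(expected - combined),
        # Any entry here would point to test subjects in the combined file.
        "shared_with_test": sorted(combined & subjects(test_text)),
    }


# Made-up example content standing in for the real files.
header = "participant_id\tsession_id\n"
train = header + "sub-A\tses-M00\nsub-A\tses-M12\n"
validation = header + "sub-B\tses-M00\n"
test = header + "sub-C\tses-M00\n"
combined = header + "sub-A\tses-M00\nsub-A\tses-M12\nsub-B\tses-M00\nsub-C\tses-M00\n"

print(compare_subjects(combined, train, validation, test))
```

If all three lists come back empty for the real files, the combined file holds exactly the train and validation subjects, and a matching subject count with label.tsv would then only mean that label.tsv did not include the test subjects.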
Thanks, I'm looking forward to the next release.
Kind regards,
Melvin.

Melvin Selim Atay

Sep 15, 2023, 10:54:43 AM
to Clinica
Maybe I should add a clarification and a side note: in the split folder that I used for training there are only validation.tsv, validation_baseline.tsv, train.tsv, and train_baseline.tsv. I keep the prediction and test files in a different directory to avoid any possibility of them being used for training.