How to generate the Leaderboard datasets using Python


Neil Oxtoby

Sep 15, 2017, 7:21:56 AM
to TADPOLE
Just starting out with Python? Want to make a TADPOLE leaderboard submission? This discussion is for you.

To assist with generating valid leaderboard submissions to TADPOLE Challenge, we have provided "helper scripts" in the TADPOLE GitHub repository (currently under the evaluation folder). Some scripts are written in Python, some in MATLAB. (MATLAB is not free, but we hope that our MATLAB code will work with the free alternative Octave, although we have not tested it.)

A key step in generating a leaderboard submission is having access to the leaderboard datasets LB1, LB2, and LB4, which are all subsets of TADPOLE dataset D1 (see TADPOLE_D1_D2.csv from the TADPOLE data download). For this, you need to be able to run makeLeaderboardDataset.py.

To help anyone who is just starting out with Python, below is the sequence of steps that one of the TADPOLE organisers followed to generate the leaderboard datasets on a Mac. We hope the steps will also be useful for users of other operating systems, and we encourage you to contribute to the discussion.

=== Example steps to generate TADPOLE Leaderboard Datasets on a Mac ===
1. Download the TADPOLE standard datasets, including TADPOLE_D1_D2.csv.
2. Install a terminal application if you do not have one. (Macs come with one installed.)
3. Install Python using the terminal. For this we first installed Homebrew, a free software package manager (Mac only), and then ran:
brew install python3
4. Install extra dependencies:
pip3 install pandas
5. Create a TADPOLE data folder and change into it:
mkdir TADPOLEData; cd TADPOLEData;
6. Download the TADPOLE code from the TADPOLE GitHub repository and move it into the TADPOLEData folder. If you have Git installed, you can instead clone the repository from a terminal inside TADPOLEData with git clone, followed by the repository URL.
7. Copy TADPOLE_D1_D2.csv into TADPOLEData/TADPOLE/ (the TADPOLE code folder downloaded from GitHub). For example, if TADPOLE_D1_D2.csv is in TADPOLEData, then this will work if you have been following the steps above:
cp TADPOLE_D1_D2.csv ./TADPOLE/
8. Change into the evaluation folder, where you can find makeLeaderboardDataset.py:
cd ./TADPOLE/evaluation
9. Run the Python script to generate the leaderboard datasets:
python3 makeLeaderboardDataset.py
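If the script fails or the output looks wrong, a quick pre-flight check of the setup may help narrow down the cause. This is a minimal sketch, not part of the TADPOLE repository: the function name preflight is hypothetical, and it assumes (as in the steps above) that it is run from TADPOLEData/TADPOLE/evaluation with TADPOLE_D1_D2.csv copied into the parent folder.

```python
# Hypothetical pre-flight check (not part of the TADPOLE repository).
# Assumes it is run from TADPOLEData/TADPOLE/evaluation, with
# TADPOLE_D1_D2.csv copied into the parent folder TADPOLEData/TADPOLE/.
import importlib.util
import os


def preflight(eval_dir="."):
    """Return a list of problems; an empty list means ready to run."""
    problems = []
    # The data file is expected one level above the evaluation folder.
    data_file = os.path.join(eval_dir, os.pardir, "TADPOLE_D1_D2.csv")
    if not os.path.isfile(data_file):
        problems.append("missing " + os.path.normpath(data_file))
    # The leaderboard script itself lives in the evaluation folder.
    script = os.path.join(eval_dir, "makeLeaderboardDataset.py")
    if not os.path.isfile(script):
        problems.append("missing " + os.path.normpath(script))
    # pandas is installed with pip3 in the steps above.
    if importlib.util.find_spec("pandas") is None:
        problems.append("pandas not found (try: pip3 install pandas)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Looks ready: python3 makeLeaderboardDataset.py")
```

Saved as, say, preflight.py (a hypothetical name) in the evaluation folder, it would be run with python3 preflight.py.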

If successful, you will find TADPOLE_LB1_LB2.csv and TADPOLE_LB4.csv in the TADPOLEData/TADPOLE/evaluation folder.

As of this writing, the only script we have written for actually generating a (very simple) leaderboard submission is written in MATLAB: TADPOLE/evaluation/TADPOLE_SimpleForecastExampleLeaderboard.m

You are welcome to use this script as a starting point for generating your own submissions. If you translate it into Python, R, or another language, please share your code with the TADPOLE Challenge community (using this forum).

Thanks!

aviv...@gmail.com

Sep 21, 2017, 3:10:56 PM
to TADPOLE
Hi,
I generated the leaderboard datasets using your scripts, and I noticed that there are only 2 patients in LB2 (RIDs 2 and 3).
Am I missing something? It is a very small test set.

Thank you,
Aviv

Neil Oxtoby

Sep 22, 2017, 5:28:59 AM
to TADPOLE
Hi Aviv,

Sorry to hear that. When I follow the steps, I get 858 rows where the LB2 column equals 1.

Have you tried using the Makefile? It automates the steps.

Try this from the evaluation folder, making sure that TADPOLE_D1_D2.csv is in the parent folder:
make leaderboard
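To compare your own output with the 858 rows quoted above, you can count the flagged rows and the distinct subjects in TADPOLE_LB1_LB2.csv; a result like Aviv's would show up as only 2 distinct RIDs in LB2. This is a minimal sketch using only the standard library. The helper name count_flags is hypothetical, and it assumes the file has columns named RID, LB1 and LB2 with flags stored as 1 (the LB2 column is mentioned above; the exact RID and LB1 column names are assumptions).

```python
# Hypothetical helper: count leaderboard-flagged rows and distinct subjects.
# Assumes TADPOLE_LB1_LB2.csv has columns RID, LB1 and LB2 (flags are 0/1).
import csv
import os


def count_flags(lines, flag_columns=("LB1", "LB2"), id_column="RID"):
    """Return {column: (rows flagged 1, distinct subject IDs flagged 1)}."""
    rows = {col: 0 for col in flag_columns}
    subjects = {col: set() for col in flag_columns}
    for record in csv.DictReader(lines):
        for col in flag_columns:
            value = (record.get(col) or "").strip()
            # Accept "1" as well as "1.0" in case the flag was written as a float.
            if value and float(value) == 1:
                rows[col] += 1
                subjects[col].add(record[id_column])
    return {col: (rows[col], len(subjects[col])) for col in flag_columns}


if __name__ == "__main__":
    path = "TADPOLE_LB1_LB2.csv"
    if os.path.exists(path):
        with open(path, newline="") as f:
            for col, (n_rows, n_subj) in count_flags(f).items():
                print(col, "rows:", n_rows, "subjects:", n_subj)
    else:
        print("Not found:", path)
```

Run it from the evaluation folder after generating the datasets.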


Vikram Venkatraghavan

Oct 9, 2017, 9:30:06 AM
to TADPOLE
Hi Aviv,

I get the same problem when I run the script with python instead of python3. Make sure you use python3, and you should get a proper leaderboard dataset.

Vikram
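Vikram's report suggests that running the script under a Python 2 interpreter can produce the tiny LB2 without an obvious failure, so it may be worth checking the interpreter explicitly (running python3 --version in the terminal gives the same information). A minimal sketch; the helper name running_python3 is hypothetical and not from the TADPOLE code.

```python
# Hypothetical guard: stop early unless running under Python 3.
# The syntax also parses under Python 2, so the message is shown there too.
import sys


def running_python3():
    """True when the current interpreter is Python 3 or later."""
    return sys.version_info[0] >= 3


if __name__ == "__main__":
    if not running_python3():
        sys.exit("This is Python %d.%d; please re-run with python3."
                 % tuple(sys.version_info[:2]))
    print("OK: Python %d.%d" % tuple(sys.version_info[:2]))
```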

gura...@gmail.com

Oct 18, 2017, 4:17:24 PM
to TADPOLE
Hi Vikram,

Alternatively, I think changing the lines

    LB2[maskCurrSubjADNI1] = 1
    LB4[maskCurrSubjADNIGO2] = 1

to

    # np.where(mask)[0] gives the integer row positions where mask is True
    LB2[np.where(maskCurrSubjADNI1)[0]] = 1
    LB4[np.where(maskCurrSubjADNIGO2)[0]] = 1

resolves the problem, because it sets LB2 and LB4 using the integer indices of the rows with the selected RIDs rather than a boolean mask.
 
Best,

guray
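The idea behind Guray's change can be tried out on toy data. In this hypothetical sketch, the helper mark_rows and the RID values are made up rather than taken from makeLeaderboardDataset.py; it sets 0/1 flags through the integer positions returned by np.where, and checks that, for a genuine NumPy boolean array, this matches plain boolean-mask indexing.

```python
# Hypothetical illustration of setting 0/1 flags via integer positions.
import numpy as np


def mark_rows(n_rows, mask):
    """Return an int vector of length n_rows with 1 where mask is True."""
    mask = np.asarray(mask, dtype=bool)   # lists or pandas Series -> bool array
    flags = np.zeros(n_rows, dtype=int)
    positions = np.where(mask)[0]         # integer indices of the True entries
    flags[positions] = 1
    return flags


rids = np.array([2, 3, 3, 5, 7, 7])       # made-up RID per row
mask = np.isin(rids, [3, 7])              # made-up selection of subjects
lb2 = mark_rows(len(rids), mask)
print(lb2)                                # [0 1 1 0 1 1]

# With a genuine NumPy bool array, boolean-mask indexing gives the same flags.
lb2_bool = np.zeros(len(rids), dtype=int)
lb2_bool[mask] = 1
assert (lb2 == lb2_bool).all()
```

Converting the mask with np.asarray(..., dtype=bool) first is a defensive choice: the integer positions are then computed the same way whether the mask arrives as a list, a pandas Series, or a NumPy array.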
