Understanding the prediction sets

J F

Aug 25, 2021, 6:15:09 AM

to TADPOLE

Hi everyone,

I'm new to the challenge and have some difficulty understanding how to use the D2 data for a simple submission. I want to make predictions for ADAS13.

The spreadsheet provided for the simple submission asks for several RIDs and specific forecasting dates (starting 2018-01, up to 2022-12). However, RID==2, for instance, only has data up to 2015-09 in the D1_D2 data csv. Does that mean I can use any data provided in the csv to forecast this specific time range, regardless of whether the D2 column in the csv is labelled 1?

That would mean I can build a time-dependent model that takes in longitudinal data for each individual for making predictions for D2. Correct?

However, although not necessary for the simple submission, if I understand correctly, D3 is a purely cross-sectional data set, meaning that I could only inform my model using 'baseline' data (as if it's the baseline of a clinical trial). That would ask for a different kind of model that does not use longitudinal data but only cross-sectional data of an individual.

Are these two different kinds of challenges, or am I looking at it wrong?

Thanks!

Neil Oxtoby

Aug 26, 2021, 8:43:07 AM

to TADPOLE

Welcome!

In short, you can train your forecasting model on any data you like, ADNI or otherwise.

We provided D1 (which includes D2 and D3) from ADNI to help, but there are no restrictions on data, nor model, for any of the tasks in TADPOLE Challenge. Forecasts are required only for individuals where `D2==1`, but models can be trained on any data.
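As a rough illustration of that distinction (not official challenge code), the sketch below picks out the individuals who need forecasts and builds the monthly forecast grid mentioned earlier in the thread (2018-01 to 2022-12), while leaving the training set unrestricted. The toy rows and the helper names `forecast_subjects` and `forecast_months` are assumptions made for the example; only the `RID` and `D2` column names come from the discussion above.

```python
# Toy rows standing in for the D1_D2 csv (made-up values, for illustration only).
rows = [
    {"RID": 2, "D2": 1},
    {"RID": 2, "D2": 1},
    {"RID": 3, "D2": 0},
    {"RID": 5, "D2": 1},
]


def forecast_subjects(rows):
    """Return the sorted, de-duplicated RIDs flagged with D2 == 1."""
    return sorted({r["RID"] for r in rows if r["D2"] == 1})


def forecast_months(start=(2018, 1), end=(2022, 12)):
    """Return 'YYYY-MM' strings for every month from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


# Training data is unrestricted: every row may be used, whatever its D2 flag.
training_rows = rows

# Forecasts are needed only for D2 individuals, one per forecast month.
targets = [(rid, m) for rid in forecast_subjects(rows) for m in forecast_months()]
```

With the toy rows above, `targets` pairs each of the two D2 individuals with each of the 60 forecast months.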

Having said that, it's against the spirit of the D3 sub-challenge of TADPOLE to train a model on longitudinal data from individuals in D3 — because in a clinical trial you would only be guaranteed to have a single visit for each individual recruited into the trial. The idea in this sub-challenge is to have a trained forecasting model (for example using ADNI data from individuals NOT in D3) that can take the D3 data as "input" to produce forecasts. Obviously this is easier if you use the longitudinal data from the D3 individuals themselves to train your model, but this is not in the spirit of this sub-challenge.
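One way to stay within that spirit is to hold the D3 individuals out of training entirely. This is a minimal sketch under assumptions: the helper `split_for_d3`, the `VISIT` field, and the toy rows are all hypothetical, and how the D3 individuals are identified in the real files is not shown here.

```python
def split_for_d3(rows, d3_rids):
    """Split rows into training rows (individuals NOT in D3) and held-out rows."""
    d3_rids = set(d3_rids)
    train = [r for r in rows if r["RID"] not in d3_rids]
    held_out = [r for r in rows if r["RID"] in d3_rids]
    return train, held_out


# Toy longitudinal rows (made-up identifiers and visit codes).
rows = [
    {"RID": 2, "VISIT": "bl"},
    {"RID": 2, "VISIT": "m12"},
    {"RID": 7, "VISIT": "bl"},
    {"RID": 7, "VISIT": "m06"},
    {"RID": 9, "VISIT": "bl"},
]

# Suppose RID 7 is a D3 individual: none of their visits are used for training.
train, held_out = split_for_d3(rows, d3_rids=[7])
```

The model is then fitted on `train` only, and the single-visit D3 record for each held-out individual is supplied purely as input at forecast time.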

Cheers

J F

Aug 27, 2021, 8:02:30 AM

to TADPOLE

Thanks for your helpful response!
It's clear to me now how D3 is used.

On a related note, is it acceptable to use information from later dates of certain individuals than those given in D1/D2?
For instance, RID==2 has measurements taken in 2016 and 2017, but these are not given in D1/D2. I could add that information to the data (without adding any measurements from 2018 onwards), but I'm assuming this is also against the spirit of the competition (as I'd have to predict less far forward in time when assessing my predictions against D4, for instance)?

Kind regards,
Jeroen

Neil Oxtoby

Aug 27, 2021, 8:42:07 AM

to TADPOLE

I think what you propose is within the spirit of the challenge (use any data you can find), but I would be hesitant to directly compare your forecasts with previous challenge submissions.

Interesting that you found additional visits. Without looking at the data (I don't have time right now), I would guess that these visits weren't included in `ADNIMERGE` at the time of assembling the TADPOLE dataset for some reason or another.
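For anyone adding later visits in the way proposed (nothing from 2018 onwards), a simple date cutoff is one way to enforce it. This is only a sketch: the `EXAMDATE` field name, the helper `visits_before_cutoff`, and the visit dates are assumptions made up for illustration, not actual ADNI records.

```python
from datetime import date

# First forecast month in the simple-submission spreadsheet (2018-01).
CUTOFF = date(2018, 1, 1)


def visits_before_cutoff(visits, cutoff=CUTOFF):
    """Keep only visits dated strictly before the cutoff."""
    return [v for v in visits if date.fromisoformat(v["EXAMDATE"]) < cutoff]


# Hypothetical extra visits for one individual (dates are invented).
extra_visits = [
    {"RID": 2, "EXAMDATE": "2016-10-03"},
    {"RID": 2, "EXAMDATE": "2017-09-28"},
    {"RID": 2, "EXAMDATE": "2018-03-12"},
]

usable = visits_before_cutoff(extra_visits)
```

Here the 2018 visit is dropped and the 2016 and 2017 visits are kept, so nothing inside the forecast window leaks into training.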
