Hello,
We are researchers at the University of Michigan working with the MSK-CHORD
data (msk_chord_2024.tar.gz), and we plan to perform several longitudinal and
survival analyses across cancer types. Before going further, we want to confirm
that we are interpreting the START_DATE field in each file correctly.
Does the START_DATE value for each patient in data_timeline_diagnosis.txt
represent the date of diagnosis, and should START_DATE in all other files be
read relative to it? For example, if Patient x has START_DATE = -100
in data_timeline_diagnosis.txt, does an observation for Patient x in
data_timeline_progression.txt with START_DATE = 100 mean that the observation
occurred 200 days after diagnosis, and does an observation for Patient x in
data_timeline_performance_status.txt with START_DATE = 500 mean that the
observation occurred 600 days after diagnosis?
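To make the reading we are asking about concrete, here is a minimal Python sketch of the arithmetic in the example above. It assumes that START_DATE in every file counts days from one shared per-patient reference point, which is exactly the assumption we would like confirmed. The function name days_since_diagnosis and the values are ours, for illustration only; they are not real records from the dataset.

```python
# Hypothetical values from the example above, not real MSK-CHORD records.
diagnosis_start = -100  # START_DATE for Patient x in data_timeline_diagnosis.txt

# START_DATE for Patient x in two other timeline files.
events = {
    "data_timeline_progression.txt": 100,
    "data_timeline_performance_status.txt": 500,
}


def days_since_diagnosis(event_start, diagnosis_start):
    """Days from diagnosis to an event.

    Assumes both START_DATE values are day offsets from the same
    per-patient reference point (the interpretation we want to verify).
    """
    return event_start - diagnosis_start


offsets = {
    name: days_since_diagnosis(start, diagnosis_start)
    for name, start in events.items()
}

for name, offset in offsets.items():
    print(f"{name}: {offset} days after diagnosis")
```

Under this reading, a negative result would mean the event was recorded before diagnosis.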
We have observed a few cases where the START_DATE provided did not make much
sense (e.g., some patients had certain treatments before diagnosis), so we
wanted to ensure we were interpreting START_DATE across files correctly.
Thanks