Question about START_DATE in MSK-CHORD


Nicholas Henderson

Nov 5, 2025, 10:04:15 AM
to cBioPortal for Cancer Genomics Discussion Group
Hello,

We are researchers at the University of Michigan working with the MSK-CHORD data (msk_chord_2024.tar.gz), and we plan to perform several longitudinal and survival analyses across cancer types. Before going further, we want to confirm that our interpretation of the START_DATE used in each file is correct.

Does the START_DATE value for each patient in data_timeline_diagnosis.txt
represent the date of diagnosis, and is the value of START_DATE in all other files
relative to the date of diagnosis? For example, suppose Patient x has
START_DATE = -100 in data_timeline_diagnosis.txt. Does an observation for
Patient x in data_timeline_progression.txt with START_DATE = 100 mean that
the observation occurred 200 days after diagnosis, and does an observation for
Patient x in data_timeline_performance_status.txt with START_DATE = 500 mean
that the observation occurred 600 days after diagnosis?

We have observed a few cases where the START_DATE provided did not make much
sense (e.g., some patients had certain treatments before diagnosis), so we
wanted to ensure we were interpreting START_DATE across files correctly.

Thanks

Nikolaus Schultz

Nov 5, 2025, 10:50:05 AM
to Nicholas Henderson, cBioPortal for Cancer Genomics Discussion Group
Hi Nicholas,

In the MSK-CHORD data set, all times are relative to when the tumor sample was sequenced (that is time zero for each patient). So you can have events that happened before sequencing (negative time values) or after (positive). I hope this makes sense.

Niki.
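
For readers who want offsets relative to diagnosis rather than sequencing, here is a minimal sketch of the re-basing arithmetic. It assumes, without confirmation in this thread, that the START_DATE in data_timeline_diagnosis.txt marks the diagnosis date on the same sequencing-relative scale. The function name and the sample values are hypothetical and not taken from the data set.

```python
def days_since_diagnosis(event_start, diagnosis_start):
    """Convert a sequencing-relative offset (days) to a diagnosis-relative one.

    Both arguments are START_DATE values measured from the same time zero
    (the date the tumor sample was sequenced), so the difference is the
    number of days between diagnosis and the event.
    """
    return event_start - diagnosis_start


# Hypothetical START_DATE values for one patient, in days relative to sequencing.
# ASSUMPTION: the diagnosis file's START_DATE is the diagnosis date.
diagnosis_start = -100
events = {
    "progression": 100,         # after sequencing
    "performance_status": 500,  # after sequencing
    "treatment": -150,          # before sequencing, and before diagnosis
}

rebased = {
    name: days_since_diagnosis(start, diagnosis_start)
    for name, start in events.items()
}
print(rebased)
```

With these values, progression falls 200 days after diagnosis and performance status 600 days after. The treatment comes out at -50, meaning it precedes the recorded diagnosis; under this reading a negative result is an ordering in the data, not an arithmetic error.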
 


Nicholas Henderson

Nov 12, 2025, 2:58:37 PM
to cBioPortal for Cancer Genomics Discussion Group
Hi Nikolaus,

Thanks so much for the clarification. The data makes much more sense to us with this information. 

I did want to double-check the date of diagnosis information so that we don't misinterpret any of the outcomes. Our understanding is that the date of cancer diagnosis is contained in the data_timeline_diagnosis.txt file. So, if a patient has START_DATE = -200 in data_timeline_diagnosis.txt, that patient was first diagnosed with cancer 200 days before the date of tumor sequencing. Is this correct?