MSK-CHORD sampling surgery times


Andi L.

Mar 26, 2025, 12:08:04 PM
to cBioPortal for Cancer Genomics Discussion Group
Dear all,

In the (awesome) MSK-CHORD dataset there is a table called "data_timeline_surgery". As I understand it, the table contains surgery timepoints, annotated with "SAMPLE" when the surgery produced a sample that was sequenced with IMPACT, and with "PROCEDURE" for other, purely medical events.

Often, the table lists more surgery timepoints than just those of the patient's CHORD sample(s). I wonder whether there is a way to link the other "SAMPLE"-annotated surgery timepoints to specific sample IDs of non-CHORD IMPACT samples.

For instance, P-0000585 contributes one primary IDC sample (T01-IM3) to MSK-CHORD, and the surgery time of that sample is listed in the table. However, there are 4 additional "SAMPLE"-annotated surgery timepoints in the table, and the patient contributes one non-CHORD metastatic sample (T02-IM6) to the latest AACR GENIE release. Is there any way to tell which of the 4 surgeries produced the non-CHORD sample?

That would be amazing and help me out a lot.

Best wishes and many thanks again to MSK for releasing this great dataset!
Andi

Pieter Lukasse

Mar 31, 2025, 7:22:34 AM
to Andi L., cBioPortal for Cancer Genomics Discussion Group
Hi Andi,

Thanks for the great question. One option is to look at the study data files. Go to the Study view and click the "download" button next to the study name (top left of the page). This downloads a tar.gz file with the study files.

I looked through the files myself and found the following lines, which seem to give an indirect link between a surgery and a sample, based on both rows having the same "START_DATE" (622). The sample happens to have the same suffix you mentioned (T02-IM6). However, this is all for a different patient (P-0002880):

data_timeline_specimen_surgery.txt
PATIENT_ID      START_DATE      STOP_DATE       EVENT_TYPE      SUBTYPE SAMPLE_ID       SEQ_DATE
P-0002880       622             Sample acquisition              P-0002880-T02-IM6       746

data_timeline_surgery.txt
PATIENT_ID      START_DATE      STOP_DATE       EVENT_TYPE      SUBTYPE
P-0002880       622             SURGERY PROCEDURE
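
The date-based link in these two excerpts can be sketched as a join on PATIENT_ID and START_DATE. This is only a minimal sketch, not an official cBioPortal method: the helper names `load_tsv` and `link_samples_to_surgeries` are made up for illustration, and the demo rows keep only the columns needed for the join, copied from the excerpts above.

```python
import csv
from collections import defaultdict


def load_tsv(path):
    """Read a tab-separated study file into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def link_samples_to_surgeries(specimen_rows, surgery_rows):
    """Pair each sample with the surgery rows sharing its PATIENT_ID and START_DATE."""
    surgeries = defaultdict(list)
    for row in surgery_rows:
        surgeries[(row["PATIENT_ID"], row["START_DATE"])].append(row)
    return {
        row["SAMPLE_ID"]: surgeries.get((row["PATIENT_ID"], row["START_DATE"]), [])
        for row in specimen_rows
    }


# Demo rows copied from the excerpts above (join columns only).
# With the downloaded study, one would instead call
# load_tsv("data_timeline_specimen_surgery.txt") and load_tsv("data_timeline_surgery.txt").
specimen_rows = [
    {"PATIENT_ID": "P-0002880", "START_DATE": "622", "SAMPLE_ID": "P-0002880-T02-IM6"},
]
surgery_rows = [
    {"PATIENT_ID": "P-0002880", "START_DATE": "622", "SUBTYPE": "PROCEDURE"},
]

links = link_samples_to_surgeries(specimen_rows, surgery_rows)
for sample_id, matches in links.items():
    print(sample_id, [m["SUBTYPE"] for m in matches])
```

A sample with no same-day surgery row maps to an empty list, so unmatched samples stay visible instead of being dropped.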

Given the mismatch on patient ids between your question and my response, I'm not sure if this helps you. Please let me know.

Best,

Pieter




Andi L.

Mar 31, 2025, 10:12:22 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi Pieter,

Thank you very much for your help here. Indeed, as your example shows, combining these two tables allows us to match specific surgery dates to sample acquisition dates. However, only samples within CHORD have sample acquisition dates, and not all of these (only about 80% by my calculation) have a surgery entry on the same day in the other table. Further, about half of the surgery entries that do match a sample acquisition date are annotated with "PROCEDURE" rather than "SAMPLE", so I no longer trust my reasoning from the initial post that "SAMPLE"-annotated surgery entries likely correspond to sample acquisition dates.
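
For anyone who wants to repeat this kind of check, here is a rough sketch of how the two numbers could be computed: the share of samples with a same-day surgery entry, and the SUBTYPE breakdown of the matching entries. The function name `same_day_summary` is hypothetical, and the rows below are invented toy data, not values from MSK-CHORD; the real rows would come from the two timeline files in the study download.

```python
from collections import Counter


def same_day_summary(specimen_rows, surgery_rows):
    """Return (fraction of samples with a same-day surgery entry,
    SUBTYPE counts of the surgery entries that matched)."""
    subtypes_by_key = {}
    for row in surgery_rows:
        key = (row["PATIENT_ID"], row["START_DATE"])
        subtypes_by_key.setdefault(key, []).append(row["SUBTYPE"])

    matched = 0
    subtype_counts = Counter()
    for row in specimen_rows:
        subtypes = subtypes_by_key.get((row["PATIENT_ID"], row["START_DATE"]))
        if subtypes:
            matched += 1
            subtype_counts.update(subtypes)

    fraction = matched / len(specimen_rows) if specimen_rows else 0.0
    return fraction, subtype_counts


# Invented toy rows for illustration only (not taken from the dataset).
toy_specimens = [
    {"PATIENT_ID": "PT-A", "START_DATE": "10", "SAMPLE_ID": "PT-A-S1"},
    {"PATIENT_ID": "PT-A", "START_DATE": "50", "SAMPLE_ID": "PT-A-S2"},
    {"PATIENT_ID": "PT-B", "START_DATE": "7", "SAMPLE_ID": "PT-B-S1"},
    {"PATIENT_ID": "PT-C", "START_DATE": "3", "SAMPLE_ID": "PT-C-S1"},
]
toy_surgeries = [
    {"PATIENT_ID": "PT-A", "START_DATE": "10", "SUBTYPE": "SAMPLE"},
    {"PATIENT_ID": "PT-A", "START_DATE": "50", "SUBTYPE": "PROCEDURE"},
    {"PATIENT_ID": "PT-B", "START_DATE": "7", "SUBTYPE": "PROCEDURE"},
    {"PATIENT_ID": "PT-B", "START_DATE": "90", "SUBTYPE": "SAMPLE"},
]

fraction, subtype_counts = same_day_summary(toy_specimens, toy_surgeries)
print(f"same-day surgery for {fraction:.0%} of samples")
print(dict(subtype_counts))
```

Note that a sample can match several surgery rows on the same day, so the SUBTYPE counts are per matching surgery entry, not per sample.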

Considering all this, my unfortunate conclusion is that without further data, linking specific non-CHORD IMPACT samples to the timelines in CHORD is generally impossible. If anyone knows better, please let us know!

Best,
Andi