[On-prems instance] Unknown sample id found


Vincent Duchaos

Jun 10, 2025, 10:36:16 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi,

I’m importing a new study into my local (docker-compose) instance of cBioPortal.
Validation passes, but the import fails with this Java error:
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unknown sample id 'TCGA_02_0086-01' found in tab-delimited file: /study/tcga_xiang/data_mrna_microarray.txt

But this ID does not appear anywhere in my study (grep -Ri TCGA_02_0086-01 study/tcga_xiang/ returns nothing).
The correct ID is TCGA_02_0086; I don’t understand where the import tool finds the “-01” part.
And it is not specific to this sample: if I delete sample 0086, I get the same error on the next sample.

I checked that my sample IDs, which are the column headers of data_mrna_microarray.txt, are present in the SAMPLE_ID column of data_clinical_sample.txt.

grep -i TCGA_02_0086 tcga_xiang/* returns
tcga_xiang/data_clinical_sample.txt:beta_0086 TCGA_02_0086 0.0303286157605351
tcga_xiang/data_mrna_microarray.txt:Hugo_Symbol TCGA_02_0086
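
The consistency check described above (every sample column of the data file must appear in the SAMPLE_ID column of the clinical sample file) can be sketched in Python. This is a minimal sketch, not part of the cBioPortal tooling: the function names are hypothetical, and it assumes tab-delimited files in which clinical metadata lines start with '#' and the non-sample columns of the data file are Hugo_Symbol and/or Entrez_Gene_Id. It only inspects the files on disk, so it cannot reveal any ID rewriting done by the importer itself.

```python
import csv

# Assumption: these are the only non-sample columns in the data file header.
NON_SAMPLE_COLUMNS = {"Hugo_Symbol", "Entrez_Gene_Id"}


def clinical_sample_ids(clinical_text):
    """Return the set of SAMPLE_ID values from a clinical sample file."""
    # Skip blank lines and the '#'-prefixed metadata lines above the header.
    rows = [line for line in clinical_text.splitlines()
            if line and not line.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["SAMPLE_ID"] for row in reader}


def matrix_sample_ids(matrix_text):
    """Return the sample IDs found in the header row of a data file."""
    header = matrix_text.splitlines()[0].split("\t")
    return [column for column in header if column not in NON_SAMPLE_COLUMNS]


def missing_samples(clinical_text, matrix_text):
    """List data-file sample IDs that are absent from the clinical file."""
    known = clinical_sample_ids(clinical_text)
    return [s for s in matrix_sample_ids(matrix_text) if s not in known]
```

To use it, read both files as text (for example with pathlib.Path(...).read_text()) and pass the contents to missing_samples; an empty list means every sample column is declared in the clinical file.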


Thanks for your help.

Prasanna Jagannathan

Jun 10, 2025, 1:29:50 PM
to Vincent Duchaos, cBioPortal for Cancer Genomics Discussion Group
Hi Vincent

Thanks for contacting cBioPortal support.

Is this a public study that you are importing? If so, please share the link to download the study.

If it is a non-public study, you can compare your study files with those of a public study that has the same data types and look for any differences.

Please reply only to the <cbiop...@googlegroups.com> email.


thanks
Jag


Vincent Duchaos

Jun 12, 2025, 8:11:57 AM
to cBioPortal for Cancer Genomics Discussion Group
Hello,
Thanks for your answer.
I compared the data format with another microarray data file, from the gbm_tcga study, and then replaced '_' with '-' in my IDs (TCGA-02-0086).
That got rid of the previous error, but now I get another one:
java.lang.RuntimeException: Error: Sample TCGA-02-0011-01 was previously linked to another patient, and not to beta-0011

Is it possible that sample IDs are shared between studies? I ask because I find TCGA-02-0011 in the gbm_tcga study that I previously imported into my instance.
If so, do I have to keep the same sample ID to patient ID link across all my studies?
Or am I forced to name my patients based on the sample ID? (for example TCGA-02-0086-01A for sample TCGA-02-0086)
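
One way to look for the kind of clash reported in this error is to compare the clinical sample files of the two studies and list sample IDs that are linked to different patient IDs. The following is a minimal sketch under assumptions: the function names are hypothetical, both files are assumed to be tab-delimited with PATIENT_ID and SAMPLE_ID columns and '#'-prefixed metadata lines, and it compares IDs exactly as written. Since the error message mentions TCGA-02-0011-01 rather than TCGA-02-0011, the importer may be working with a different form of the ID than the one in the file, which this sketch does not try to reproduce.

```python
import csv


def sample_to_patient(clinical_text):
    """Map each SAMPLE_ID to its PATIENT_ID in a clinical sample file."""
    # Skip blank lines and the '#'-prefixed metadata lines above the header.
    rows = [line for line in clinical_text.splitlines()
            if line and not line.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["SAMPLE_ID"]: row["PATIENT_ID"] for row in reader}


def conflicting_links(existing_text, new_text):
    """Return samples present in both files but linked to different patients.

    The result maps sample ID -> (patient in existing file, patient in new file).
    """
    existing = sample_to_patient(existing_text)
    new = sample_to_patient(new_text)
    return {
        sample: (existing[sample], patient)
        for sample, patient in new.items()
        if sample in existing and existing[sample] != patient
    }
```

An empty result means the two files agree on every sample ID they share, as far as the IDs written in the files are concerned.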

Thanks

Prasanna Jagannathan

Jun 13, 2025, 11:28:02 AM
to Vincent Duchaos, cBioPortal for Cancer Genomics Discussion Group
Hi Vincent

The usual convention is that each study names its patients <studyname>-p-<number> and its samples <studyname>-p-<number>-s-<number>.

This eliminates any clashes caused by using the same sample ID for different patient IDs.

However, if the same patient occurs in two studies, please refer to the cBioPortal FAQ on how this is handled:

https://docs.cbioportal.org/user-guide/faq/#how-does-cbioportal-handle-duplicate-samples-or-sample-ids-across-different-studies
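
The naming convention described above can be applied with a small renaming table. This is a minimal sketch: the function name and the input structure (a dict from original patient ID to its list of original sample IDs) are assumptions for illustration, and the resulting mappings would still have to be applied to every file in the study that mentions these IDs.

```python
def scoped_ids(study_name, samples_by_patient):
    """Build study-scoped IDs following <study>-p-<n> and <study>-p-<n>-s-<m>.

    samples_by_patient maps each original patient ID to a list of its
    original sample IDs. Numbering follows the order of the input.
    Returns (patient_map, sample_map), both from original ID to new ID.
    """
    patient_map = {}
    sample_map = {}
    for p_index, (patient, samples) in enumerate(samples_by_patient.items(), start=1):
        new_patient = f"{study_name}-p-{p_index}"
        patient_map[patient] = new_patient
        for s_index, sample in enumerate(samples, start=1):
            sample_map[sample] = f"{new_patient}-s-{s_index}"
    return patient_map, sample_map
```

Keeping the two mappings (for example written out as a TSV file) makes it possible to trace each renamed ID back to its original.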


Please reply only to cbiop...@googlegroups.com

thanks
Jag
