Doubt about some samples


Gerardo Ramirez Mejia

Nov 19, 2025, 10:18:31 AM
to cbiop...@googlegroups.com
Hello.
I downloaded some data from the portal, and at the same time I downloaded the same project (TCGA-BRCA) using TCGAbiolinks in R.
When comparing the data, I noticed that some samples returned by TCGAbiolinks are not included in your curated database.
I noticed the samples you don't include have barcodes like these:
TCGA-A7-A26E-01B-06R-A277-07
TCGA-A7-A0DB-01C-02R-A277-07
TCGA-A7-A0DC-01B-04R-A22O-07
TCGA-A7-A13D-01A-13R-A12P-07
TCGA-A7-A13D-01B-04R-A277-07
TCGA-A7-A13E-01A-11R-A12P-07
(These barcodes were obtained from TCGAbiolinks and are just examples.)
Most of the time, samples with codes 01B or 01C were not included, while samples with code 01A have a lower exclusion rate.
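
For readers making the same comparison, the barcodes above can be split into their fields using the standard TCGA barcode layout (project, tissue source site, participant, sample type plus vial letter, portion plus analyte, plate, center). The following is a minimal Python sketch and not part of the original message; `BARCODE_RE` and `parse_barcode` are hypothetical helper names, and the pattern assumes full aliquot-level barcodes like the ones listed here.

```python
import re
from collections import Counter

# Assumed layout: TCGA-TSS-Participant-SampleVial-PortionAnalyte-Plate-Center
BARCODE_RE = re.compile(
    r"^(?P<project>TCGA)-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"-(?P<sample_type>\d{2})(?P<vial>[A-Z])"
    r"-(?P<portion>\d{2})(?P<analyte>[A-Z])"
    r"-(?P<plate>[A-Z0-9]{4})-(?P<center>\d{2})$"
)


def parse_barcode(barcode):
    """Split a full TCGA aliquot barcode into a dict of named fields."""
    match = BARCODE_RE.match(barcode)
    if match is None:
        raise ValueError(f"Not a full TCGA aliquot barcode: {barcode!r}")
    return match.groupdict()


# Example barcodes quoted in this message.
barcodes = [
    "TCGA-A7-A26E-01B-06R-A277-07",
    "TCGA-A7-A0DB-01C-02R-A277-07",
    "TCGA-A7-A0DC-01B-04R-A22O-07",
    "TCGA-A7-A13D-01A-13R-A12P-07",
    "TCGA-A7-A13D-01B-04R-A277-07",
    "TCGA-A7-A13E-01A-11R-A12P-07",
]

# Tally the sample-type + vial code (e.g. "01A", "01B") across the list.
parsed = [parse_barcode(b) for b in barcodes]
vial_counts = Counter(p["sample_type"] + p["vial"] for p in parsed)
print(vial_counts)
```

Tallying the sample-type and vial code this way makes it easy to see how many 01A, 01B, and 01C entries are present in each source before comparing them.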

In the portal you mention the data curation process, but I was wondering whether there is a specific reason that led you to exclude these samples. I hope this question is appropriate.

Thank you in advance.

Greetings
Gerardo

Prasanna Jagannathan

Nov 20, 2025, 10:25:12 AM
to Gerardo Ramirez Mejia, cbiop...@googlegroups.com
Hi Gerardo

Thanks for emailing the cBioPortal team.

Is the TCGA-BRCA dataset from the below cancer.gov website?


If not, can you please share the link where these additional samples are available?

Please reply only to the cbiop...@googlegroups.com email address.

thanks
Jag




Nikolaus Schultz

Nov 20, 2025, 11:52:55 AM
to Prasanna Jagannathan, Gerardo Ramirez Mejia, cbiop...@googlegroups.com
Hi Gerardo,

For TCGA samples, cBioPortal only includes one sample per patient. That’s why the additional samples you are referring to are missing. For that reason we also truncate the sample IDs to only show the disease / center code (two digits or letters) and the patient ID code (four digits or letters), i.e. TCGA-xx-xxxx.
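
The truncation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not cBioPortal's actual curation code: `patient_id` and `samples_by_patient` are hypothetical helpers, and the sketch only shows which patients have more than one barcode. It does not reproduce how the retained sample is chosen.

```python
from collections import defaultdict


def patient_id(barcode):
    """Truncate a TCGA barcode to its first three fields (TCGA-xx-xxxx)."""
    return "-".join(barcode.split("-")[:3])


def samples_by_patient(barcodes):
    """Group full barcodes under their truncated patient ID."""
    grouped = defaultdict(list)
    for barcode in barcodes:
        grouped[patient_id(barcode)].append(barcode)
    return dict(grouped)


# Example barcodes quoted earlier in this thread.
barcodes = [
    "TCGA-A7-A26E-01B-06R-A277-07",
    "TCGA-A7-A0DB-01C-02R-A277-07",
    "TCGA-A7-A0DC-01B-04R-A22O-07",
    "TCGA-A7-A13D-01A-13R-A12P-07",
    "TCGA-A7-A13D-01B-04R-A277-07",
    "TCGA-A7-A13E-01A-11R-A12P-07",
]

grouped = samples_by_patient(barcodes)

# Patients with more than one barcode: only one of these would appear
# in a one-sample-per-patient dataset.
multi = {pid: items for pid, items in grouped.items() if len(items) > 1}
print(multi)
```

With the six example barcodes, patient TCGA-A7-A13D has both an 01A and an 01B entry, so a one-sample-per-patient dataset would contain fewer rows than the full download.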

Niki.


Gerardo Ramirez Mejia

Nov 20, 2025, 12:48:09 PM
to Nikolaus Schultz, Prasanna Jagannathan, cbiop...@googlegroups.com
Dear Nikolaus

Thank you for your response. Now I feel more confident about processing the downloaded data.

Greetings
Gerardo 


Gerardo Ramirez Mejia

Nov 20, 2025, 12:48:14 PM
to Prasanna Jagannathan, cbiop...@googlegroups.com
Dear Prasanna.

Thanks a lot for your kind response.

The additional samples were observed using the TCGAbiolinks script in RStudio.

Greetings
Gerardo 
