GDC TCGA BRCA

Richard

Jan 13, 2023, 12:31:48 PM

to UCSC Xena and Cancer Genomics Browser

I have some questions about GDC TCGA BRCA data vs TCGA BRCA:

- In TCGA BRCA data (legacy data), the dataset "gene expression RNAseq - IlluminaHiSeq" from https://tcga.xenahubs.net has 20,531 identifiers, corresponding to about 20,000 genes. However, in GDC TCGA BRCA data (harmonized data), the dataset "gene expression RNAseq - HTSeq - Counts" from the hub https://gdc.xenahubs.net has 60,489 identifiers. What is the difference between them? Why are there 60,489 identifiers?
- In TCGA BRCA data (Legacy data), I can get MC3 gene-level non-silent mutation (somatic mutation (SNP and INDEL)). However, there is no gene-level non-silent mutation data in GDC TCGA BRCA. How can I get this type of data from GDC TCGA BRCA?
- Legacy data vs Harmonized Data. Which dataset should I use for analysis?

"The "legacy" gene expression data refers to the original processed data (the gene expression analysis methods, genome reference and gene models used may differ between cancer types/projects). The harmonized data was produced by the GDC by reprocessing the data using a single analysis pipeline." Is that true??

Thanks.

Mary Goldman

Jan 19, 2023, 2:14:49 PM

to Richard, UCSC Xena and Cancer Genomics Browser

Hi Richard,

Apologies for the delay in my reply! Please see inline below for my answers. If you have any further questions, please email us at genome...@soe.ucsc.edu

Best,

Mary

-----

Mary Goldman (she/her), Design and Outreach Engineer

UCSC Xena

UC Santa Cruz Genomics Institute

Revealing life's code

---------- Forwarded message ---------
From: Richard <ooc...@gmail.com>
Date: Fri, Jan 13, 2023 at 9:31 AM
Subject: [ucsc-cancer-genomics-browser] GDC TCGA BRCA
To: UCSC Xena and Cancer Genomics Browser <ucsc-cancer-ge...@googlegroups.com>

I have some questions about GDC TCGA BRCA data vs TCGA BRCA:

The identifier counts differ because the GDC mapped the data to a different gene set (one with 60,489 genes/transcripts) than the legacy TCGA data did. The legacy TCGA data was mapped to a set of 20,531 genes.
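To check the identifier counts yourself, something like the following could work. It is only a minimal sketch: it assumes each dataset has been downloaded as a tab-separated matrix with identifiers in the first column and one header row, and that the harmonized file uses versioned Ensembl-style IDs while the legacy file uses gene symbols. The two tiny in-memory tables and their values are made up for illustration, and `read_identifiers` and `strip_version` are hypothetical helper names, not part of any Xena tool.

```python
import csv
import io


def read_identifiers(handle):
    """Return the first-column identifiers of a tab-separated matrix, skipping the header row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # header row holds the sample IDs
    return [row[0] for row in reader if row]


def strip_version(identifier):
    """Drop a trailing numeric '.N' suffix, e.g. 'ENSG00000141510.15' -> 'ENSG00000141510'."""
    base, dot, suffix = identifier.rpartition(".")
    return base if dot and suffix.isdigit() else identifier


# Made-up stand-ins for the two downloaded files (assumed layout, invented values).
legacy = io.StringIO("sample\tS1\tS2\nTP53\t10.1\t9.8\nBRCA1\t8.2\t7.9\n")
harmonized = io.StringIO(
    "Ensembl_ID\tS1\tS2\n"
    "ENSG00000141510.15\t11.0\t10.5\n"
    "ENSG00000012048.19\t8.0\t7.5\n"
    "ENSG00000223972.5\t0.0\t0.0\n"
)

legacy_ids = read_identifiers(legacy)
gdc_ids = [strip_version(i) for i in read_identifiers(harmonized)]
print(len(legacy_ids), "legacy identifiers;", len(gdc_ids), "harmonized identifiers")
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")` handles. Comparing the two lists directly only makes sense after mapping both to a common identifier type, which this sketch does not attempt.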

- In TCGA BRCA data (Legacy data), I can get MC3 gene-level non-silent mutation (somatic mutation (SNP and INDEL)). However, there is no gene-level non-silent mutation data in GDC TCGA BRCA. How can I get this type of data from GDC TCGA BRCA?

Unfortunately, the GDC does not provide gene-level non-silent mutation data. You can contact the GDC with any questions or comments here: https://gdc.cancer.gov/support.
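If you have mutation-level calls for the same samples, one possible workaround is to collapse them to a gene-by-sample 0/1 table yourself. The sketch below is an assumption-heavy illustration, not the method used by MC3, the GDC, or Xena: the record keys (`sample`, `gene`, `effect`), the labels treated as silent, and the function name `gene_level_nonsilent` are all hypothetical and need to be adapted to the annotation in your own file.

```python
from collections import defaultdict

# Assumption: effect labels treated as "silent". Adjust to match your annotation.
SILENT_EFFECTS = {"synonymous_variant", "Silent"}


def gene_level_nonsilent(mutations, samples):
    """Return {gene: {sample: 0 or 1}}; 1 means at least one non-silent mutation.

    Genes with no non-silent mutation in any sample are omitted from the result.
    """
    hit = defaultdict(set)
    for m in mutations:
        if m["effect"] not in SILENT_EFFECTS:
            hit[m["gene"]].add(m["sample"])
    return {
        gene: {s: int(s in hit_samples) for s in samples}
        for gene, hit_samples in hit.items()
    }


# Made-up example records (not real TCGA calls).
mutations = [
    {"sample": "S1", "gene": "TP53", "effect": "missense_variant"},
    {"sample": "S1", "gene": "TP53", "effect": "synonymous_variant"},
    {"sample": "S2", "gene": "PIK3CA", "effect": "missense_variant"},
    {"sample": "S2", "gene": "TTN", "effect": "synonymous_variant"},
]
samples = ["S1", "S2", "S3"]

matrix = gene_level_nonsilent(mutations, samples)
for gene, row in matrix.items():
    print(gene, row)
```

Note that `samples` should list only samples that were actually assayed for mutations; otherwise a 0 would wrongly suggest "no mutation" for a sample with no data.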

- Legacy data vs Harmonized Data. Which dataset should I use for analysis?

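"The "legacy" gene expression data refers to the original processed data (the gene expression analysis methods, genome reference and gene models used may differ between cancer types/projects). The harmonized data was produced by the GDC by reprocessing the data using a single analysis pipeline." Is that true??

Yes. Again, you can contact the GDC with any questions or comments you might have here: https://gdc.cancer.gov/support.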

Thanks.
