- In the TCGA BRCA data (legacy data) on the hub https://tcga.xenahubs.net, the dataset "gene expression RNAseq - IlluminaHiSeq" has 20,531 identifiers, corresponding to roughly 20,000 genes. However, in the GDC TCGA BRCA data (harmonized data) on the hub https://gdc.xenahubs.net, the dataset "gene expression RNAseq - HTSeq - Counts" has 60,489 identifiers. What is the difference between the two, and why are there 60,489 identifiers?
- In the TCGA BRCA data (legacy data), I can get the MC3 gene-level non-silent mutation data (somatic mutations: SNPs and INDELs). However, GDC TCGA BRCA has no gene-level non-silent mutation dataset. How can I get this type of data for GDC TCGA BRCA?
- Legacy data vs. harmonized data: which should I use for my analysis?
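One possible way to approximate a gene-level non-silent mutation matrix from GDC data is to collapse a somatic mutation file in MAF format into a binary gene-by-sample table. The sketch below assumes a tab-separated MAF with the standard columns `Hugo_Symbol`, `Tumor_Sample_Barcode` and `Variant_Classification`; the set of classifications counted as "non-silent" is an illustrative assumption and may not match the exact rule used for the MC3 gene-level dataset on the Xena hub. The file path `brca.maf` is hypothetical.

```python
import csv
from collections import defaultdict

# Assumed set of MAF Variant_Classification values treated as "non-silent".
# Illustrative only; not necessarily the rule Xena used for the MC3 dataset.
NON_SILENT = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def read_maf(path):
    """Yield one dict per MAF row, skipping '#' comment/header-metadata lines."""
    with open(path, newline="") as fh:
        lines = (line for line in fh if not line.startswith("#"))
        yield from csv.DictReader(lines, delimiter="\t")


def gene_level_matrix(records, samples=None):
    """Build a binary gene x sample matrix from MAF-like records.

    matrix[i][j] is 1 if gene i has at least one non-silent mutation in
    sample j, else 0. Pass `samples` explicitly to include samples that have
    no mutation records at all (otherwise only samples seen in `records`
    are used). Genes with only silent mutations get no row.
    """
    hits = defaultdict(set)  # gene symbol -> samples with a non-silent hit
    seen_samples = set()
    for rec in records:
        sample = rec["Tumor_Sample_Barcode"]
        seen_samples.add(sample)
        if rec["Variant_Classification"] in NON_SILENT:
            hits[rec["Hugo_Symbol"]].add(sample)
    sample_list = sorted(samples if samples is not None else seen_samples)
    genes = sorted(hits)
    matrix = [[1 if s in hits[g] else 0 for s in sample_list] for g in genes]
    return genes, sample_list, matrix


if __name__ == "__main__":
    # Hypothetical usage: genes, samples, m = gene_level_matrix(read_maf("brca.maf"))
    demo = [
        {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S1",
         "Variant_Classification": "Missense_Mutation"},
        {"Hugo_Symbol": "PIK3CA", "Tumor_Sample_Barcode": "S2",
         "Variant_Classification": "Silent"},
        {"Hugo_Symbol": "PIK3CA", "Tumor_Sample_Barcode": "S1",
         "Variant_Classification": "Splice_Site"},
    ]
    genes, samples, m = gene_level_matrix(demo)
    print(genes, samples, m)  # ['PIK3CA', 'TP53'] ['S1', 'S2'] [[1, 0], [1, 0]]
```

Passing the full sample list matters in practice: a sample with no mutation records would otherwise be missing from the matrix rather than appearing as all zeros.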
Thanks.