Source of the UCSC Data

david.p...@gmail.com

Aug 3, 2016, 11:36:07 AM
to UCSC Xena and Cancer Genomics Browser
Hello,

Our group is working with the KIRP gene expression (IlluminaHiSeq) data and the KIRP gene-level mutation (broad curated) data. Along with these downloaded datasets, we are also using the KIRP clinical data.

I am aware that these data come from the TCGA. However, could you please explain the following:

How is this data derived from the TCGA?

Are these data any different from data downloaded directly from the TCGA portal?

Is this the original TCGA data as described in the TCGA's publication on the KIRP data (Cancer Genome Atlas Research Network. Comprehensive molecular characterization of papillary renal-cell carcinoma. N Engl J Med. 2016 Jan 14;374(2):135-45.)?

Your help is appreciated,

David

Jing Zhu

Aug 3, 2016, 8:14:14 PM
to UCSC Xena and Cancer Genomics Browser, david.p...@gmail.com
>How is this data derived from the TCGA? 
All phenotype (clinical) data are automatically downloaded from the TCGA DCC and processed. Our team curated several additional phenotype fields: overall survival, primary_site, and primary_disease.

>Is this the original TCGA data as described in the TCGA's publication on the KIRP data (Cancer Genome Atlas Research Network. Comprehensive molecular characterization of papillary renal-cell carcinoma. N Engl J Med. 2016 Jan 14;374(2):135-45.)?
No. We did not go to the publication website and manually download the data generated specifically for the paper.

Also, new data are released only on UCSC Xena (http://xena.ucsc.edu). The last data update to the Cancer Browser was in Feb 2015. The bulk-downloaded data should have the same format.

Also, please note that the TCGA DCC has been replaced by the GDC; see https://tcga-data.nci.nih.gov/docs/publications/tcga/?

Jing

Jing Zhu

Aug 5, 2016, 2:15:52 AM
to UCSC Xena and Cancer Genomics Browser, david.p...@gmail.com
Hi David,

From there, you can find the link to the "phenotype" dataset page:
https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?dataset=TCGA.KIRP.sampleMap/KIRP_clinicalMatrix&host=https://tcga.xenahubs.net

Click the "all identifiers" link: https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?host=https%3A%2F%2Ftcga.xenahubs.net&dataset=TCGA.KIRP.sampleMap%2FKIRP_clinicalMatrix&label=Phenotypes&allIdentifiers=true
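The "all identifiers" link is the dataset page URL with a few query parameters appended. As a rough sketch, the same link can be rebuilt in Python. The parameter names (host, dataset, label, allIdentifiers) are simply read off the link above, not taken from any documented Xena API, and the function name `all_identifiers_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the Xena dataset pages, copied from the links in this thread.
DATAPAGES = "https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/"


def all_identifiers_url(host: str, dataset: str, label: str) -> str:
    """Build an "all identifiers" link for a dataset on a Xena hub.

    Hypothetical helper: the query parameters mirror the example link and
    are not a documented API.
    """
    query = urlencode(
        {
            "host": host,
            "dataset": dataset,
            "label": label,
            "allIdentifiers": "true",
        }
    )
    # urlencode percent-encodes ':' and '/' inside the values (%3A, %2F),
    # which matches the encoding seen in the link above.
    return DATAPAGES + "?" + query


if __name__ == "__main__":
    print(all_identifiers_url(
        "https://tcga.xenahubs.net",
        "TCGA.KIRP.sampleMap/KIRP_clinicalMatrix",
        "Phenotypes",
    ))
```

Substituting another cohort's phenotype dataset name should give the corresponding link, assuming the page accepts the same parameters for every dataset.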

The last URL lists the "identifiers", i.e., the phenotype variable names we use in Xena. All variables whose names start with "_" (for example, "_OS", the overall survival time) are curated by Xena; they are:

_OS
_OS_IND
_OS_UNIT
_PATIENT
_RFS
_RFS_IND
_RFS_UNIT
_TIME_TO_EVENT
_TIME_TO_EVENT_UNIT
_cohort
_primary_disease
_primary_site
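Because every Xena-curated variable starts with "_", a downloaded phenotype file can be split by column name into curated fields and fields carried over from the TCGA clinical data. This is a minimal sketch that assumes the downloaded KIRP_clinicalMatrix is tab-separated, with sample IDs in the first column and one phenotype variable per column. The helper names and the sample header (including "gender" and "pathologic_stage") are hypothetical.

```python
import csv
import io


def split_phenotype_columns(header):
    """Split phenotype column names into Xena-curated and other fields.

    Names starting with "_" are treated as Xena-curated; the rest are
    assumed to come from the TCGA clinical data. The first column is
    assumed to be the sample ID and is skipped.
    """
    fields = header[1:]
    curated = [name for name in fields if name.startswith("_")]
    original = [name for name in fields if not name.startswith("_")]
    return curated, original


def read_header(path):
    """Read the first row of a tab-separated phenotype file (assumed layout)."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


if __name__ == "__main__":
    # Hypothetical header; a real file has many more columns.
    sample = "sampleID\tgender\t_OS\t_OS_IND\tpathologic_stage\t_primary_site\n"
    header = next(csv.reader(io.StringIO(sample), delimiter="\t"))
    curated, original = split_phenotype_columns(header)
    print("curated:", curated)
    print("original:", original)
```

Comparing the "original" list against a direct TCGA download is then a matter of matching column names, assuming no other renaming took place during processing.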

Hope this is helpful.

Jing

>Hi, 
>
>Thanks so much. Can you further elaborate on what you mean by "Our team curated several extra phenotype data:  overall survival, primary_site, primary_disease"? And are there any other curated phenotype fields?
>
>Also, what I am ultimately trying to understand is whether phenotype data downloaded from UCSC differ in any way from data downloaded directly from the TCGA portal (aside from the curated fields described above).
>
>Thank you,
>David


Jing Zhu

Aug 5, 2016, 3:20:17 PM
to UCSC Xena and Cancer Genomics Browser, david.p...@gmail.com
On the UCSC Xena site, each dataset's detail page states the "unit" of the values in that dataset; the download files store the values in that same unit.

In this case, the values are log2(normalized_count+1). Mean-centering (subtracting the column mean) is applied dynamically, on the fly; the actual data in the database and in the download files are in the unit shown on the dataset detail page, so the downloaded data are not mean-centered.
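A minimal sketch of the arithmetic described above: converting stored log2(normalized_count+1) values back to normalized counts, and reproducing mean-centering locally. It assumes a tab-separated expression matrix with sample IDs in the header row and one gene per row; that layout, the gene and sample names, and the helper names are assumptions for illustration. Whether the browser centers exactly per gene across samples is also an assumption.

```python
import csv
import io
import math


def to_log2(count):
    """Forward transform: normalized count -> log2(normalized_count + 1)."""
    return math.log2(count + 1)


def to_normalized_count(value):
    """Inverse transform: log2(normalized_count + 1) -> normalized count."""
    return 2.0 ** value - 1.0


def mean_center(values):
    """Subtract the mean (assumed: one gene's values across all samples)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def parse_matrix(text):
    """Parse a tab-separated matrix into {gene: [value per sample]}.

    Assumed layout: first row is a header of sample IDs; each later row
    starts with a gene name followed by its values.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the sample-ID header row
    return {row[0]: [float(x) for x in row[1:]] for row in rows if row}


if __name__ == "__main__":
    # Hypothetical gene and samples; stored values are log2 of 8, 16 and 4.
    text = "sample\tS1\tS2\tS3\nGENE_A\t3.0\t4.0\t2.0\n"
    stored = parse_matrix(text)["GENE_A"]  # as downloaded, not centered
    print([to_normalized_count(v) for v in stored])  # [7.0, 15.0, 3.0]
    print(mean_center(stored))  # [0.0, 1.0, -1.0]
```

Centering is kept as a separate step because, per the reply, it is a display option in the browser rather than part of the stored data.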

We would appreciate it if you would cite or acknowledge "UCSC Xena" (http://xena.ucsc.edu) if you use the browser or its underlying data in your research and publications. Thanks.

Jing

>This is very helpful, thank you so much. My last question is: when I download the IlluminaHiSeq KIRP gene expression data directly from the UCSC Cancer Browser, is it already mean-centered? I know the view in the Cancer Browser can be toggled between mean-centered and non-mean-centered, but is the downloaded data mean-centered?

>Thank you,
>David
