Re: FAQ: where did we obtain the PAM50 calls on the TCGA breast cancer cohort?


Jing Zhu

Jul 1, 2013, 12:46:52 AM
to ucsc-cancer-ge...@googlegroups.com

On Friday, May 3, 2013 11:26:28 AM UTC-7, Jing Zhu wrote:

"PAM50 array": The PAM50 calls for microarrays are available in Supplemental Table 1 of the associated Nature 2012 paper. We obtained the PAM50 calls (based on the Agilent 244K custom gene expression microarrays), as well as the ER, HER2, and PR status calls, from the paper and used them to create the view of the PAM50 profile in the TCGA breast cancer data. Click here to see the view.


"PAM50 RNAseq": We have also included preliminary PAM50 calls based on the Illumina HiSeq 2000 RNA Sequencing platform from the TCGA Analysis Working Group (AWG). Note that these calls are not final and are subject to change.

The PAM50 call is based on the correlation distance to the LumA, LumB, Basal, Her2-E, and normal-like centroids. A sample can be nearly equally close to more than one centroid, so the call for such a sample can sometimes flip, for example between LumA and LumB, or between LumB and Her2-E.
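
To illustrate why a call can flip, here is a minimal sketch of nearest-centroid assignment by correlation. The four-gene centroids, the sample values, and the function names are made up for illustration (the real predictor uses 50 genes and the centroids published by UNC), and the use of Pearson correlation is an assumption. This is not the AWG's actual pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def call_subtype(sample, centroids):
    """Return (best, runner_up, margin), ranking centroids by correlation to the sample."""
    ranked = sorted(
        centroids,
        key=lambda name: pearson(sample, centroids[name]),
        reverse=True,
    )
    best, runner_up = ranked[0], ranked[1]
    margin = pearson(sample, centroids[best]) - pearson(sample, centroids[runner_up])
    return best, runner_up, margin

# Hypothetical toy centroids over four genes (NOT the published PAM50 centroids).
TOY_CENTROIDS = {
    "LumA": [2.0, 1.0, -1.0, -2.0],
    "LumB": [2.0, 0.5, -0.5, -2.0],
    "Basal": [-2.0, -1.0, 1.0, 2.0],
}

# Made-up sample that lies between the two luminal centroids.
sample = [2.0, 0.8, -0.7, -2.0]
best, runner_up, margin = call_subtype(sample, TOY_CENTROIDS)
print(best, runner_up, round(margin, 4))
```

A small margin between the best and second-best centroid marks a sample whose call could change with minor differences in normalization or platform.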


cora...@gmail.com

Jan 9, 2014, 3:09:41 PM
to ucsc-cancer-ge...@googlegroups.com, bsa...@tgen.org, sna...@tgen.org
Dear Dr. Zhu,

We have some RNA-Seq samples that we would like to classify by subtype.
Can I access the PAM50 RNA-Seq training data that was used for your analysis?

Best,
Sara

Jing Zhu

Jan 9, 2014, 8:03:19 PM
to ucsc-cancer-ge...@googlegroups.com, bsa...@tgen.org, sna...@tgen.org, cora...@gmail.com
Dear Sara,

The original PAM50 data from the PAM50 publication can be found on the UNC website: https://genome.unc.edu/pubsup/breastGEO/clinicalData.shtml under "JS Parker et al., Journal of Clinical Oncology 27(8):1160-7 (2009)". I believe the predictor was trained on microarray data, not RNAseq data. The site includes both the data and the predictors.

On our browser, we show the PAM50 classification downloaded from the Nature publication, which uses microarray data. It is displayed by default. A version of UNC's PAM50 classification based on the RNAseq data is also available on the browser. If you are interested in using the TCGA data as training data, you can download the genomic data in bulk and process it further. You can match the array data with the array-derived classification and the RNAseq data with the RNAseq-derived classification.
 
Jing

hotti...@gmail.com

Feb 2, 2014, 8:28:04 PM
to ucsc-cancer-ge...@googlegroups.com, bsa...@tgen.org, sna...@tgen.org, cora...@gmail.com
Hi Dr. Zhu,

I downloaded the TCGA Breast Cancer data from the UNC website you posted. In the file "clinical_data.", there are three columns of PAM50 classification results. What is the difference between them?
1) The first one is "PAM50Call". Is this based on the TCGA microarray data set?
2) The second one is "PAM50_mRNA_nature2012". Are these the PAM50 results
published in the TCGA BRCA Nature paper, based on microarray data? If so, what is the
difference between "PAM50Call" and "PAM50_mRNA_nature2012"?
3) The third one is "PAM50Call_RNAseq". Is this based on TCGA RNA-seq data?

I also read the original PAM50 paper (JS Parker et al.) from your UNC website.
1) The PAM50 predictor looks like a set of optimized centroids for 5 BRCA subtypes,
whose subtype labels were obtained by earlier unsupervised clustering.
So when we want to classify a new sample, we only need to compare the
distance between the new sample and the centroids of the 5 subtypes (the PAM
predictor). Is my understanding right?
2) Can I use the PAM50 predictors downloaded from UNC to classify a new sample
whose data comes from a microarray, but from a different platform/protocol than those
used in Parker's paper?
3) How do you apply the PAM50 predictor to TCGA RNASeq data? Do you use some
RNASeq samples (whose subtype is known) to train a new PAM50
predictor for the RNASeq data set?

Thank you and have a nice day.

Feng


On Thursday, January 9, 2014 at 8:03:19 PM UTC-5, Jing Zhu wrote:

Jing Zhu

Feb 4, 2014, 2:46:09 PM
to ucsc-cancer-ge...@googlegroups.com, bsa...@tgen.org, sna...@tgen.org, cora...@gmail.com, hotti...@gmail.com
Hi Feng,

> I downloaded the TCGA Breast Cancer data from the UNC website you posted. In the file "clinical_data.", there are three columns of PAM50 classification results. What is the difference between them?

There are four columns in the bulk download file that refer to PAM50.

- Integrated_Clusters_with_PAM50__nature2012 : from Nature 2012, produced by a method called "cluster of clusters" that uses subtype calls made individually from several data types, including whole-genome mRNA expression and PAM50 calls using the 50 genes only.

- PAM50Call : this call was made by the TCGA AWG (analysis working group) using array data before the Nature 2012 publication. After the paper came out, we DEPRECATED this column and replaced it with the "PAM50_mRNA_nature2012" column. We kept the deprecated column in our database only so that our "old" bookmarks still function (a bookmark built in the past using a column that has since been removed will fail). Thank you for pointing this out; we will remove all deprecated columns from future bulk data downloads.

- PAM50Call_RNAseq : PAM50 call based on RNAseq data (we obtained this from the TCGA breast AWG). This is not part of the Nature 2012 publication.

- PAM50_mRNA_nature2012 : PAM50 call from the TCGA breast Nature 2012 publication.
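
As a rough illustration of working with these columns, the sketch below reads a clinical table and takes the array-based subtype from PAM50_mRNA_nature2012 while ignoring the deprecated PAM50Call column. The tab-separated layout, the "sampleID" key column, the sample rows, and the label spellings are all assumptions made up for the example, so please check them against the file you actually downloaded.

```python
import csv
import io

# Made-up stand-in for the bulk clinical file.
# Assumptions: tab-separated, keyed by a "sampleID" column, toy label spellings.
CLINICAL_TSV = (
    "sampleID\tPAM50Call\tPAM50_mRNA_nature2012\tPAM50Call_RNAseq\n"
    "sample-01\tLumA\tLumB\tLumB\n"
    "sample-02\tBasal\tBasal\t\n"
    "sample-03\t\t\tHer2\n"
)

def array_calls(handle, column="PAM50_mRNA_nature2012"):
    """Map sample ID -> call from the chosen column, skipping samples with no call."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sampleID"]: row[column] for row in reader if row[column]}

calls = array_calls(io.StringIO(CLINICAL_TSV))
print(calls)
```

For RNAseq-derived calls, the same helper can be pointed at the other column with `array_calls(handle, column="PAM50Call_RNAseq")`.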


> 1) The first one is "PAM50Call". Is this based on the TCGA microarray data set?

See above.

> 2) The second one is "PAM50_mRNA_nature2012". Are these the PAM50 results
> published in the TCGA BRCA Nature paper, based on microarray data?

Yes.

> So what is the difference between "PAM50Call" and "PAM50_mRNA_nature2012"?

PAM50_mRNA_nature2012 is the one from the paper. PAM50Call was a work in progress while the group was working on the paper.

> 3) The third one is "PAM50Call_RNAseq". Is this based on TCGA RNA-seq data?

Yes.

> I also read the original PAM50 paper (JS Parker et al.) from your UNC website.
> 1) The PAM50 predictor looks like a set of optimized centroids for 5 BRCA subtypes,
>    whose subtype labels were obtained by earlier unsupervised clustering.
>    So when we want to classify a new sample, we only need to compare the
>    distance between the new sample and the centroids of the 5 subtypes (the PAM
>    predictor). Is my understanding right?
> 2) Can I use the PAM50 predictors downloaded from UNC to classify a new sample
>    whose data comes from a microarray, but from a different platform/protocol than those
>    used in Parker's paper?
> 3) How do you apply the PAM50 predictor to TCGA RNASeq data? Do you use some
>    RNASeq samples (whose subtype is known) to train a new PAM50
>    predictor for the RNASeq data set?

In my opinion, a genomic profile developed on a specific array is best applied to data from the same platform. Although the true biological profile is the same, the measurements of that profile differ between platforms. If your data comes from a drastically different platform, such as RNAseq, you need either to rebuild the classifier using the new data or to find evidence that the profile holds across platforms. You could contact the UNC group about how far the PAM50 classifiers from their publication can be extended.

Best. Hope this is helpful. 

Jing

jinyu...@stonybrook.edu

Jun 2, 2014, 3:29:08 PM
to ucsc-cancer-ge...@googlegroups.com

Dear Dr. Zhu,

Thanks for your kind reply. I am wondering where you got the TCGA breast AWG PAM50 calls based on mRNAseq data. I could not find them on the TCGA webpage. Could you please send me a copy? The more recent, the better.

Thank you very much!

Jinyu

Mary Goldman

Jun 2, 2014, 6:59:33 PM
to jinyu...@stonybrook.edu, ucsc-cancer-ge...@googlegroups.com
Hi Jinyu,

You can download the most recent AWG PAM50 calls based on RNAseq
from our website. First, add the "PAM50 subtype from RNAseq data (TCGA
AWG)" feature to the heatmap. Next, click on the "Tools" button and
choose "Download". To get all the samples in the dataset, choose the
option "Clinical data in cohort".
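
After downloading, a quick check like the sketch below can show how many samples actually carry an RNAseq-based call. The tab-separated layout, the "sampleID" column, the "PAM50Call_RNAseq" column name in the downloaded file, and the rows themselves are assumptions for illustration only; the real download may be laid out differently.

```python
import csv
import io

# Made-up stand-in for the downloaded clinical data (assumed tab-separated).
DOWNLOAD_TSV = (
    "sampleID\tPAM50Call_RNAseq\n"
    "sample-01\tLumA\n"
    "sample-02\t\n"
    "sample-03\tBasal\n"
    "sample-04\t \n"
)

def rnaseq_call_coverage(handle, column="PAM50Call_RNAseq"):
    """Split sample IDs into those with a non-empty call and those without."""
    with_call, without_call = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        has_call = bool((row.get(column) or "").strip())
        (with_call if has_call else without_call).append(row["sampleID"])
    return with_call, without_call

with_call, without_call = rnaseq_call_coverage(io.StringIO(DOWNLOAD_TSV))
print(len(with_call), "samples with a call;", len(without_call), "without")
```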

Best,
Mary
-------------
Mary Goldman
UCSC Cancer Browser
https://genome-cancer.ucsc.edu/
> --
> You received this message because you are subscribed to the Google Groups "UCSC Cancer Genomics Browser" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to ucsc-cancer-genomics...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Hyun Jung Park

Jul 11, 2014, 10:57:08 PM
to ucsc-cancer-ge...@googlegroups.com, jinyu...@stonybrook.edu
Dear Mary, 

While trying to obtain a good stratification of breast tumor subtypes,
I found this thread and successfully downloaded your AWG PAM50 calls for my project.
My question is how I should cite your data, because I am now writing a paper that uses it.

Regards, 
HJ. 

Mary Goldman

Jul 14, 2014, 12:51:45 PM
to Hyun Jung Park, ucsc-cancer-ge...@googlegroups.com, Jinyu Li
Hi Hyun,

Please cite our paper here: Goldman M, Craft B, Swatloski T, Ellrott
K, Cline M, Diekhans M, Ma S, Wilks C, Stuart J, Haussler D, Zhu J.
The UCSC Cancer Genomics Browser: update 2013. Nucleic Acids Research
2012; doi: 10.1093/nar/gks1008.

We also recommend citing TCGA since they generated the raw data. They
discuss how they would like their work to be cited at the bottom of
this page: http://cancergenome.nih.gov/publications/publicationguidelines.

Best,
Mary
-------------
Mary Goldman
UCSC Cancer Browser
https://genome-cancer.ucsc.edu/


qrious.da...@gmail.com

Nov 4, 2014, 12:03:31 PM
to ucsc-cancer-ge...@googlegroups.com, jinyu...@stonybrook.edu
Dear Mary,
In the dataset (gene expression by RNAseq (IlluminaHiSeq)), I do not find PAM50_RNAseq calls for most of the samples!
How can I obtain AWG PAM50 calls based on the RNAseq for all samples?
Thanks

--
Best
Dr. Singh

Mary Goldman

Nov 4, 2014, 12:28:48 PM
to qrious.da...@gmail.com, ucsc-cancer-ge...@googlegroups.com, Jinyu Li
Hi Dr. Singh,

You will need to contact the TCGA AWG to see if they have made calls on the rest of the samples in the cohort. I recommend contacting TCGA directly: https://tcga-data.nci.nih.gov/tcga/tcgaContact.jsp and they will forward you on to the AWG.


Best,
Mary
-------------
Mary Goldman
UCSC Cancer Browser
https://genome-cancer.ucsc.edu/


fluo...@gmail.com

Jan 14, 2015, 4:09:39 AM
to ucsc-cancer-ge...@googlegroups.com
Hi,

I just started some work using TCGA. I downloaded the TCGA microarray data from the TCGA website, and the clinical data using TCGA-Assembler. As far as I can see, there is no PAM50 data in these files. When I run genefu's intrinsic.cluster.predict with PAM50, I get huge discrepancies with the PAM50 calls in the supplementary table of the original TCGA Nature paper.

My question is: does anyone know how they made those PAM50 calls?
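
One way to put a number on such discrepancies is to cross-tabulate two sets of calls over the samples they share. The sketch below is only an illustration: the sample IDs, the labels, and the helper name are made up, and real label spellings would need to be harmonized between the two sources before comparing.

```python
from collections import Counter

def concordance(calls_a, calls_b):
    """Cross-tabulate two {sample: call} maps over shared samples; return (table, agreement rate)."""
    shared = sorted(set(calls_a) & set(calls_b))
    table = Counter((calls_a[s], calls_b[s]) for s in shared)
    agree = sum(n for (a, b), n in table.items() if a == b)
    rate = agree / len(shared) if shared else float("nan")
    return table, rate

# Hypothetical calls, e.g. a published table versus one's own classifier run.
paper_calls = {"s1": "LumA", "s2": "LumB", "s3": "Basal", "s4": "Her2"}
my_calls = {"s1": "LumA", "s2": "LumA", "s3": "Basal", "s4": "LumB", "s5": "LumA"}

table, rate = concordance(paper_calls, my_calls)
for (paper, mine), n in sorted(table.items()):
    print(f"paper={paper:5s} mine={mine:5s} n={n}")
print("agreement rate:", rate)
```

The off-diagonal cells show which subtypes are being swapped, which helps tell borderline flips between neighboring subtypes apart from a systematic problem such as a normalization mismatch.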

thanks,
Wouter

Mary Goldman

Jan 14, 2015, 5:25:15 PM
to fluo...@gmail.com, ucsc-cancer-ge...@googlegroups.com
Hi Wouter,

I would recommend contacting TCGA directly at https://tcga-data.nci.nih.gov/tcga/tcgaContact.jsp.

Best,
Mary
-------------
Mary Goldman
UCSC Cancer Browser
https://genome-cancer.ucsc.edu/

