Control Tissue in OV RNAseqV2? Fallopian Tube vs. Ovary

AllyUIC

Aug 2, 2015, 9:07:03 PM
to cBioPortal for Cancer Genomics Discussion Group
Hi all,

I'm new to RNAseq, bioinformatics, programming, etc., so please excuse my naivety on the subject...

As the literature seems to support both the fallopian tube and the ovary as possible tissues of origin for ovarian cancer, I was wondering which tissue the TCGA OV database uses as a control to normalize its data, particularly the RNAseq data. If raw data exist for both tissues, I would like to normalize the raw tumor mRNA expression levels to normal fallopian tube and to normal ovary and compare the two.

Any help would be appreciated! Thanks!

Nikolaus Schultz

Aug 3, 2015, 4:05:15 AM
to cbiop...@googlegroups.com, AllyUIC
Hi,

The TCGA RNA-Seq expression data is not normalized to any tissue - these are simply the counts per tumor sample normalized by gene length and library size (RPKM and RSEM).
The only exception here is the Agilent expression data (available for the early projects, such as GBM, ovarian, lung squamous), where each tumor sample was hybridized against a normal control sample (universal reference cDNA, a mix of different tissues).
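
For readers new to these units, here is a minimal sketch of the textbook RPKM formula (reads per kilobase of gene length per million mapped reads). It is not the TCGA pipeline itself: the function name and the example numbers are made up for illustration, and RSEM is a separate, model-based estimate that this snippet does not reproduce.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Textbook RPKM: reads per kilobase of gene per million mapped reads.

    Illustrative only; the gene length annotation and read-counting rules
    used by any particular pipeline are not specified here.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bp long, in a library of 20 million mapped reads.
example_value = rpkm(500, 2000, 20_000_000)
print(example_value)
```

Note that nothing in this calculation refers to a normal tissue: the value depends only on the sample's own counts, the gene length, and the library size.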

Each TCGA project has a few normals included, and you can download that data directly from the TCGA website. Normal samples have -11 in the sample-type position of their barcode (TCGA-xx-xxxx-11x), where primary tumor samples have -01. For the ovarian cancer project, there are a couple of normal fallopian tube samples.
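
As a rough sketch of how the barcode distinguishes normals from tumors: the fourth dash-separated field starts with a two-digit sample type code, where 01-09 are tumor types (01 = primary solid tumor) and 10-19 are normal types (11 = solid tissue normal). The helper names and the barcodes below are hypothetical, not real sample IDs.

```python
def sample_type_code(barcode):
    """Return the two-digit sample type code from a TCGA barcode."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    # The fourth field looks like '01A' or '11A': type code plus vial letter.
    return parts[3][:2]


def is_tumor(barcode):
    """Tumor sample types use codes 01-09."""
    return 1 <= int(sample_type_code(barcode)) <= 9


def is_normal(barcode):
    """Normal sample types use codes 10-19."""
    return 10 <= int(sample_type_code(barcode)) <= 19


# Hypothetical barcodes, for illustration only.
barcodes = ["TCGA-AB-1234-01A", "TCGA-AB-1234-11A", "TCGA-CD-5678-01A"]
normals = [b for b in barcodes if is_normal(b)]
tumors = [b for b in barcodes if is_tumor(b)]
print(normals, tumors)
```

The same filter can be applied to the column headers of a downloaded expression matrix to separate normal from tumor columns.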

Niki.

 


Ally Young

Aug 3, 2015, 7:54:24 PM
to Nikolaus Schultz, cbiop...@googlegroups.com
Thanks Niki! I was initially using the z-scores downloaded from the TCGA provisional site, which I thought were calculated using values from normal tissue. 

I've been attempting to download the other data you referenced, but have been having a little bit of trouble figuring out exactly how to do it. I downloaded the TCGA Assembler recently and have been playing around with that. 

Do you know if there is a protocol or description anywhere of how the RPKM or RSEM values are calculated? For example, which database of gene lengths was used for the TCGA calculations?

Again, thank you for all your help! Ally
--
Alexandria N. Young
UIC MSTP | M2 MD/PhD Candidate
UIC COM SET Rep. | C&TB, I&M Courses
UIC MSTP SAC Member
TPA Conference Director | Epsilon, Zeta, Beta Alpha

Nikolaus Schultz

Aug 4, 2015, 4:02:36 AM
to Ally Young, cbiop...@googlegroups.com
Hi Ally,

Our z-scores are always calculated relative to the subset of samples that are diploid for a given gene (so each gene gets a different background population). 
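
A minimal sketch of this idea for a single gene, under stated assumptions: copy-number calls are encoded with 0 meaning diploid, the sample standard deviation is used, and the function and sample names are hypothetical. This is not the cBioPortal implementation, only an illustration of the reference-population concept.

```python
from statistics import mean, stdev


def zscores_vs_diploid(expression, copy_number):
    """Z-score every sample against the samples that are diploid for this gene.

    expression:  {sample_id: expression value for one gene}
    copy_number: {sample_id: discrete copy-number call, 0 assumed to mean diploid}
    """
    diploid_values = [
        value for sample, value in expression.items()
        if copy_number.get(sample) == 0
    ]
    if len(diploid_values) < 2:
        raise ValueError("need at least two diploid samples as reference")
    mu = mean(diploid_values)
    sd = stdev(diploid_values)  # sample standard deviation (an assumption)
    if sd == 0:
        raise ValueError("reference population has zero variance")
    return {sample: (value - mu) / sd for sample, value in expression.items()}


# Hypothetical values for one gene in four tumor samples.
expr = {"s1": 1.0, "s2": 3.0, "s3": 5.0, "s4": 9.0}
cn = {"s1": 0, "s2": 0, "s3": 0, "s4": 1}  # s4 is not diploid for this gene
z = zscores_vs_diploid(expr, cn)
print(z)
```

Because the diploid subset is determined per gene, running this for a different gene would generally use a different set of reference samples.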

I suggest you download normalized expression from the Broad’s Firehose website. You will get one matrix of normalized expression data per study, which should include normal samples (TCGA-xx-xxxx-11…). See for example this link for TCGA breast cancer:
In this folder
you can find this file:

RPKM and RSEM calculations are probably best explained in the supplementary materials of the various TCGA papers - but I may be wrong and others on this board may have a better answer?

I hope this is helpful.

Niki.

Ally Young

Aug 4, 2015, 4:18:44 PM
to Nikolaus Schultz, cbiop...@googlegroups.com
Just to be clear, that subset of diploid tissue for the z-score calculation is always tumor?

Thank you for the Broad Institute's page! I was able to download the normalized gene count. Do you happen to know how these values were normalized?

Regarding RPKM, I believe the resource contained on this page likely details their protocol... https://wiki.nci.nih.gov/pages/viewpage.action?pageId=71439191

Sorry for all the questions and thank you so much for all your help!
Alexandria N. Young
UIC MSTP | G1 MD/PhD Candidate |  SAC Executive Board
UIC COM | SET Macro-Curricular Rep. | PEP Tutor

Nikolaus Schultz

Aug 4, 2015, 4:30:59 PM
to Ally Young, cbiop...@googlegroups.com
Correct… the reference for the z-scores is always tumors, and a different subset for each gene.

Niki