Thank you for this information.
A few more questions for one or both of you:
I think that for the Firehose Legacy datasets, "all samples" or "diploid samples" refers to the samples within the given study (i.e., tumor type). For the PanCancer Atlas, however, is it true that some of the datasets have been normalized across all the studies included in the PanCancer Atlas? Perhaps it is only the dataset titled "mRNA Expression, RSEM (Batch normalized from Illumina HiSeq_RNASeqV2)" that was normalized in this way, but I am not 100% sure. From what I understand, "normal samples" refers to adjacent normal tissue samples within the given tumor type (so the reference is not consistent across tumor types, but it is useful because expression is relative to normal tissue rather than to other tumors).
Which mRNA datasets are normalized across all the studies, and which are normalized only within a single study?
We're thinking that if we use data normalized across all the studies, we will be able to sort for relatively high expression of our gene across tumor types.
Thanks very much,
Jim