What does provisional data mean?

Benjamin

Jun 27, 2017, 6:57:47 PM

to cBioPortal for Cancer Genomics Discussion Group

Hi guys,

When you search for a gene, you always get a provisional dataset and a publication dataset, and it seems the provisional dataset has more cases than the published one. What's the difference between these two types of data?

If I want to start an analysis, which dataset should I use?

I would appreciate any help,

Thanks

Ben

Nikolaus Schultz

Jun 28, 2017, 9:09:49 AM

to cbiop...@googlegroups.com, Benjamin

Hi Ben,

The provisional data set for each TCGA cancer type contains all data available from the Broad Firehose. The publication data sets reflect the data that were used for each of the publications. This is usually a subset of the provisional data, since manuscripts were often written before TCGA completed their goal of sequencing 500 tumors.

There can be differences between provisional and published data. For example, the mutation data in the publication usually underwent more QC, and false positives might have been removed or, in rare cases, false negatives added. RNA-Seq and copy-number values may also differ slightly, as different versions of analysis pipelines could have been used. And the clinical data for the publication is probably also of higher quality or may contain a few more data elements, sometimes derived from the genomic data (e.g., genomic subtypes).

I usually recommend using the published data set. It also has the advantage that it is static and will not change in the future.

Niki.


CB

Jul 3, 2017, 1:58:30 PM

to cBioPortal for Cancer Genomics Discussion Group

The provisional dataset includes the samples and data from the publication, plus samples and data added since the publication. The provisional dataset is useful because of those additional samples and data.
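
If you want to check how the two cohorts overlap for a particular cancer type, a minimal Python sketch along these lines may help. It only compares two lists of sample IDs. The function name `compare_cohorts` and the sample IDs are hypothetical, made up for illustration; in practice you would fill the lists from the sample lists of the two studies you downloaded. Since the published set is usually, but not always, a subset of the provisional one, the sketch reports samples unique to either side.

```python
def compare_cohorts(provisional_ids, published_ids):
    """Summarize how a published cohort relates to a provisional one."""
    provisional = set(provisional_ids)
    published = set(published_ids)
    return {
        # Samples present in both data sets
        "shared": sorted(provisional & published),
        # Samples found only in the provisional data set
        "provisional_only": sorted(provisional - published),
        # Samples found only in the published data set (often empty)
        "published_only": sorted(published - provisional),
        # True when every published sample is also in the provisional set
        "published_is_subset": published <= provisional,
    }


# Hypothetical sample IDs, not real TCGA barcodes.
provisional_ids = ["S-001", "S-002", "S-003", "S-004"]
published_ids = ["S-001", "S-003"]

summary = compare_cohorts(provisional_ids, published_ids)
print("Provisional only:", summary["provisional_only"])
print("Published is a subset:", summary["published_is_subset"])
```

Note that this only compares cohort membership. It does not capture the differences in mutation calls, expression values, or clinical annotations mentioned earlier in the thread.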
