Ambiguous results from mutational data: NA vs NaN


Matteo Pallocca

Sep 19, 2016, 5:00:02 PM
to cBioPortal for Cancer Genomics Discussion Group
Dear all,

I am using the very nice CGDSR package to fetch some mutations from TCGA data. The package uses the cBioPortal API.

I have some doubts about the API output. 

Case Number from a list
I managed to calculate the number of cases in a case list in several "dirty" ways (e.g. length(row.names...). Is there any field where it is explicitly stored as a number, other than the textual "case list description"?

NaN vs NA mutation
If we try to fetch mutations for a particular gene, getProfileData(mycgds, all_genes, study_alteration, list_id) returns a mixed vector containing mutations, NAs and NaNs. I am fairly convinced that NA stands for something like "wild type" and NaN for something like "no call" (e.g. the gene of interest is not covered by the exome sequencing of that patient). Am I right? I can't find anywhere that this is clearly stated.

An example: in dataset hnsc_tcga (head and neck) some cases harbour a TP53 mutation (e.g. TCGA.CN.5356.01), some of them return NA (e.g. TCGA.CN.5361.01), others return NaN (e.g. TCGA.CV.5441.01). It's very important to understand such a difference :) 

thank you so much,
Matteo

JianJiong Gao

Sep 20, 2016, 8:23:28 AM
to matteo....@gmail.com, cbiop...@googlegroups.com
Hi Matteo,

For NA vs WT, the best practice would be using the "sequenced case list" to infer that. For example, if we return NA or NaN for a gene and a sample, it means WT if the sample is in the sequenced case list, and "not sequenced" otherwise. Please note this only applies to whole exome projects. We are currently working on supporting targeted sequencing panels.
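
The rule above can be sketched in R as follows. This is only an illustration under stated assumptions: `classify_calls`, `calls` and `sequenced_ids` are hypothetical names, the input is a hand-made vector rather than real getProfileData output, "MUT" is a placeholder for a reported mutation string, and SAMPLE.X is an invented ID used to show the "not sequenced" branch. It also only applies to whole-exome studies, as noted.

```r
# Sketch only: hypothetical helper, not part of cgdsr.
# calls:         named character vector, sample ID -> mutation string, NA or "NaN"
# sequenced_ids: sample IDs taken from the study's sequenced case list
classify_calls <- function(calls, sequenced_ids) {
  # Treat real NA values and the literal strings "NA"/"NaN" the same way:
  # no mutation was reported for that sample.
  no_call <- is.na(calls) | calls %in% c("NA", "NaN")
  in_list <- names(calls) %in% sequenced_ids
  status <- ifelse(!no_call, "mutated",
                   ifelse(in_list, "wild_type", "not_sequenced"))
  names(status) <- names(calls)
  status
}

calls <- c(TCGA.CN.5356.01 = "MUT",  # placeholder for a reported mutation
           TCGA.CN.5361.01 = NA,
           TCGA.CV.5441.01 = "NaN",
           SAMPLE.X        = NA)     # invented ID, not in the list below
sequenced_ids <- c("TCGA.CN.5356.01", "TCGA.CN.5361.01", "TCGA.CV.5441.01")

status <- classify_calls(calls, sequenced_ids)
print(status)
```

With this approach NA and NaN are deliberately not distinguished; membership in the sequenced case list is the only thing that separates wild type from not sequenced.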

For your first question, I am not aware of a better way than length(row.names).
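
One possible alternative, offered as an assumption rather than a documented feature: if the case-list table you get back carries the sample IDs as a single space-separated string in a `case_ids` column, the count can be derived by splitting that string. The data frame below is a hand-made stand-in with only three IDs, and `count_cases` is a hypothetical helper.

```r
# Sketch only: hand-made stand-in for a case-list table.
# The `case_list_id` / `case_ids` column layout is an assumption.
case_lists <- data.frame(
  case_list_id = "hnsc_tcga_sequenced",
  case_ids = "TCGA.CN.5356.01 TCGA.CN.5361.01 TCGA.CV.5441.01",
  stringsAsFactors = FALSE
)

count_cases <- function(case_lists, list_id) {
  ids <- case_lists$case_ids[case_lists$case_list_id == list_id]
  # Expect exactly one matching case list.
  stopifnot(length(ids) == 1)
  # Split on whitespace and drop any empty strings.
  ids <- strsplit(trimws(ids), "\\s+")[[1]]
  length(ids[nzchar(ids)])
}

n <- count_cases(case_lists, "hnsc_tcga_sequenced")
print(n)
```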

Best,
-JJ

--
You received this message because you are subscribed to the Google Groups "cBioPortal for Cancer Genomics Discussion Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cbioportal+unsubscribe@googlegroups.com.
To post to this group, send email to cbiop...@googlegroups.com.
Visit this group at https://groups.google.com/group/cbioportal.
For more options, visit https://groups.google.com/d/optout.

Matteo Pallocca

Sep 21, 2016, 4:44:11 PM
to cBioPortal for Cancer Genomics Discussion Group, matteo....@gmail.com


Dear Jianjiong,

thank you so much for your tips.

Actually, I thought I was already looking only at "sequenced cases". In the above example, I am reading only the mutations for samples in the "sequenced" list (hnsc_tcga_sequenced, 512 samples).

So, if I understand correctly, when getProfileData returns NA or NaN (in some cases I get NA, in others NaN!) and the sample is in the "sequenced" list, I can consider it wild type in both cases?

Furthermore, how can I be sure from the getGeneticProfiles list that I am looking at exome-sequenced samples and not targeted sequencing? Do I just have to grep for "exome" in the genetic profile description?

thank you so much,
Matteo


