Questions about the cBioPortal API


Kun-Lin Ho

Jul 26, 2021, 2:54:16 PM
to cbiop...@googlegroups.com
Hi, my name is Kun-Lin Ho, a bioinformatics graduate student at the University of Georgia.

Thank you for developing such a useful API, which makes accessing the data much easier.

I have a few questions about the cBioPortal API.

On the website, we can select all of the studies and query a gene to see its alterations across all samples (somatic mutations, CNAs, and other alterations).

The figure below shows the result (in this case, the query gene is PIK3CA).

My first question is:
Can I do the same thing with the cBioPortal API in R (query by gene and see which samples have that gene mutated)?
If so, could you please tell me which function I should use?


On the website, we can choose the curated set of non-redundant studies (188 studies in total), but when we use the API to get the studies in cBioPortal, the result includes all of the studies (currently 314).
My second question is:
Is there a function in the R API that also lets us choose the curated set of non-redundant studies rather than all studies?


My final question is:
If I want to download and analyze all of the mutation data currently deposited in cBioPortal, do you have any recommendations?
(e.g., using datahub, or using the API and extracting data by querying each study and each gene)


Many thanks
Sincerely

Kun-Lin Ho


Ino de Bruijn

Jul 27, 2021, 5:17:03 PM
to Kun-Lin Ho, cBioPortal for Cancer Genomics Discussion Group
Dear Kun-Lin Ho,

Thanks so much for reaching out! Great questions!

> My first question is:
> Can I do the same thing with the cBioPortal API in R (query by gene and see which samples have that gene mutated)?
> If so, could you please tell me which function I should use?

You can indeed use the R API to do this. Note that the R API client currently only supports pulling the data, not producing the visualization itself. I don't have an example on hand, but have you seen the R API tutorial video? If not, it may help you work out how to reproduce the API calls that the website makes:


Let me know if you would like more help with this one, and I can try to put together some example code.

> My second question is:
> Is there a function in the R API that also lets us choose the curated set of non-redundant studies rather than all studies?

Unfortunately there is not, but you can find the list of non-redundant cancer study IDs here:


I would copy and paste those study IDs into your R code for now.
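
A minimal sketch of that copy-and-paste approach, assuming the table of studies is a data frame with a `studyId` column (which is what `getStudies()` in cBioPortalData returns). `keep_curated` is a hypothetical helper, the two IDs are placeholders taken from this thread rather than the actual curated list, and the table of studies is a stand-in so the sketch runs offline:

```r
# Hypothetical helper: keep only the studies whose ID is in a hand-pasted list.
keep_curated <- function(all_studies, curated_ids) {
  # Warn about pasted IDs that the server does not know (e.g. typos).
  missing_ids <- setdiff(curated_ids, all_studies$studyId)
  if (length(missing_ids) > 0) {
    warning("IDs not found: ", paste(missing_ids, collapse = ", "))
  }
  all_studies[all_studies$studyId %in% curated_ids, , drop = FALSE]
}

# Placeholder IDs; replace with the list copied from the curated-set page.
curated_ids <- c("acc_tcga", "laml_tcga")

# Stand-in for the table of all studies (made-up third row).
# With the real client this would be: getStudies(cBioPortal())
all_studies <- data.frame(
  studyId = c("acc_tcga", "laml_tcga", "example_duplicate_study"),
  stringsAsFactors = FALSE
)

curated <- keep_curated(all_studies, curated_ids)
print(curated$studyId)
```

Keeping the pasted IDs in one vector makes it easy to update the list when the curated set on the website changes.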

> My final question is:
> If I want to download and analyze all of the mutation data currently deposited in cBioPortal, do you have any recommendations?
> (e.g., using datahub, or using the API and extracting data by querying each study and each gene)

The cBioPortalData library can download all study data, so that's one option. The other option is to download all the mutation data from datahub. The API is better suited to retrieving data for a handful of samples and one or more genes than for all genes and samples.

Hope that helps!

Best wishes,
Ino


abc7...@gmail.com

Jul 27, 2021, 9:59:35 PM
to Ino de Bruijn, cBioPortal for Cancer Genomics Discussion Group

Dear Ino:

 

Thanks so much for your reply and for the useful tips.

Yes, I have already watched the cBioPortal API tutorial, but I am still not sure how to use the API to do the same thing I did on the website.

I think I need more examples to understand how the cBioPortal API works.

So if you could provide more examples of how to use the cBioPortal API in R (such as querying by gene and seeing which samples have that gene mutated, as on the website), that would be very helpful.

https://www.cbioportal.org/results/mutations?tab_index=tab_visualize&Action=Submit&session_id=60e4dbfee4b015b63e9f3d8b&plots_horz_selection=%7B%7D&plots_vert_selection=%7B%7D&plots_coloring_selection=%7B%7D

 

Also, thanks so much for suggesting the cBioPortalData library for getting the data.

I think it will be a very convenient way to extract the data for each dataset.

I have a quick question about this.

cBioDataPack("laml_tcga") extracts the data for the laml_tcga study ID, and I can loop through all of the study IDs to get all of the data.

I wonder if I can get all of the data directly (from all of the studies) without looping?

 

 

Thanks for your time and for your help.

I really appreciate it.

mram...@gmail.com

Jul 29, 2021, 9:32:57 PM
to cBioPortal for Cancer Genomics Discussion Group
Hi Kun-Lin Ho, 

I am the dev of the cBioPortalData R package.
Ino reached out to me for assistance and I'm happy to answer as best I can. 

1. We don't have a facility currently for extracting data across datasets without using a for loop or lapply. 

2. I am not aware of an endpoint that gives you the list of non-redundant cancer study ids. 

3. I would recommend using the main cBioPortalData function in the package:

suppressPackageStartupMessages({
    library(cBioPortalData)
})
# Connect to the cBioPortal API
cbio <- cBioPortal()
# Mutation calls for one gene in one study (5290 is the Entrez ID of PIK3CA)
acc <- cBioPortalData(cbio, "acc_tcga", genes = 5290,
    molecularProfileIds = "acc_tcga_mutations")
#> harmonizing input:
#>   removing 91 colData rownames not in sampleMap 'primary'
# Rows are genes, columns are samples
assay(acc)
#>      TCGA-OR-A5LK-01    
#> 5290 "Missense_Mutation"

This gives you the same result as this: 

Because we are using Bioconductor, you will need to be familiar with the SummarizedExperiment and MultiAssayExperiment classes, if you are not already.

4. We don't have a way to get the data directly from all studies without looping. 
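
The loop mentioned in points 1 and 4 could be sketched roughly as follows. This is only an illustration: `mutated_samples` and `collect_mutated` are hypothetical helpers, not part of cBioPortalData, and the sketch assumes each study yields a character matrix shaped like the `assay(acc)` output above (genes in rows, samples in columns, `NA` where no mutation is recorded). A stand-in fetch function with made-up study and sample names keeps it runnable offline.

```r
# Hypothetical helper: samples with a recorded mutation for one gene.
mutated_samples <- function(mat, gene) {
  if (!gene %in% rownames(mat)) return(character(0))
  colnames(mat)[!is.na(mat[gene, ])]
}

# Hypothetical helper: call fetch_one() for each study ID and collect the
# mutated samples per study; studies that fail to download are skipped.
collect_mutated <- function(study_ids, gene, fetch_one) {
  per_study <- lapply(study_ids, function(id) {
    tryCatch(mutated_samples(fetch_one(id), gene), error = function(e) NULL)
  })
  names(per_study) <- study_ids
  Filter(Negate(is.null), per_study)
}

# Stand-in for a real download (made-up studies and samples).
# With the package loaded, fetch_one might instead wrap the call shown above,
# e.g. function(id) assay(cBioPortalData(cbio, id, genes = 5290,
#     molecularProfileIds = paste0(id, "_mutations")))
# The "<studyId>_mutations" profile name is an assumption; check each study.
fake_fetch <- function(id) {
  if (id == "broken_study") stop("download failed")
  switch(id,
    study_a = matrix(c("Missense_Mutation", NA), nrow = 1,
                     dimnames = list("5290", c("S1", "S2"))),
    study_b = matrix(NA_character_, nrow = 1,
                     dimnames = list("5290", "S3")))
}

result <- collect_mutated(c("study_a", "study_b", "broken_study"),
                          "5290", fake_fetch)
print(result)
```

Passing the fetch function as an argument keeps the looping logic separate from the download, so the same loop works with `lapply` over any list of study IDs.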

Best regards,
Marcel