retrieving expression data via the API

Han Jo

Mar 17, 2022, 8:03:51 AM

to cBioPortal for Cancer Genomics Discussion Group

Dear *,

I am banging my head against the API and cannot find the right way to retrieve expression data. I am particularly interested in the study ccle_broad_2019.

Could anyone point me towards the right API call? I found the molecular profiles, e.g. ccle_broad_2019_protein_quantification and ccle_broad_2019_rna_seq_mrna, but all I get via the API is some metadata.

Ideally, I want to retrieve all the matching RNA-seq and protein expression samples from the study.

Here is a snippet of what I have so far:

Any help is greatly appreciated!

"""

from bravado.client import SwaggerClient

cbioportal = SwaggerClient.from_url('https://www.cbioportal.org/api/api-docs',
config={"validate_requests": False, "validate_responses": False,
"validate_swagger_spec": False})

studyid = "ccle_broad_2019"
ccl_study = cbioportal.Studies.getStudyUsingGET(studyId=studyid).result()
samples = cbioportal.Samples.getAllSamplesInStudyUsingGET(studyId=studyid).result()
print(len(samples))
profiles = cbioportal.Molecular_Profiles.getAllMolecularProfilesInStudyUsingGET(studyId=studyid, projection="DETAILED"
).result()
for i in profiles:
print(i.molecularProfileId)

data = cbioportal.Molecular_Data.getAllMolecularDataInMolecularProfileUsingGET(
entrezGeneId=2, molecularProfileId="ccle_broad_2019_protein_quantification",
sampleListId="ccle_broad_2019_all").result()
print(len(data))
for i in data[0]:
print(i, data[0])

"""

Gaofei Zhao

Mar 18, 2022, 11:03:29 AM

to cBioPortal for Cancer Genomics Discussion Group

Hi Han,

You are doing it the right way; you are already getting the protein data.

To make your result clearer, you could change your last line from "print(i, data[0])" to "print(i, data[0][i])".

Then you will see something like this:

entrezGeneId ENTREZ_GENE_ID
gene GENE
molecularProfileId MOLECULAR_PROFILE_ID
patientId PATIENT_ID
sampleId SAMPLE_ID
studyId STUDY_ID
uniquePatientKey UNIQUE_PATIENT_KEY
uniqueSampleKey UNIQUE_SAMPLE_KEY
value EXPRESSION_DATA

The expression data is located in the value field (the last one).
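Since each record carries one value for one sample and one gene, collecting many records into a sample-by-gene table is a simple grouping step. This is a minimal sketch, assuming records support item access by field name, as the data[0][i] access above suggests; the function name pivot_values is hypothetical and the records are made-up placeholders.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def pivot_values(records: Iterable[Any]) -> Dict[str, Dict[int, float]]:
    """Group expression values as {sampleId: {entrezGeneId: value}}."""
    matrix: Dict[str, Dict[int, float]] = defaultdict(dict)
    for record in records:
        # Each record holds one value for one (sample, gene) pair.
        matrix[record["sampleId"]][record["entrezGeneId"]] = record["value"]
    return dict(matrix)


# Placeholder records for illustration only (not real study data).
records = [
    {"sampleId": "S1", "entrezGeneId": 2, "value": 0.3},
    {"sampleId": "S1", "entrezGeneId": 7157, "value": 1.2},
    {"sampleId": "S2", "entrezGeneId": 2, "value": -0.4},
]
print(pivot_values(records))
# {'S1': {2: 0.3, 7157: 1.2}, 'S2': {2: -0.4}}
```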

Sasha Dagayev

Mar 18, 2022, 12:07:08 PM

to cBioPortal for Cancer Genomics Discussion Group

Hi Han,

I wrote a package for some of these issues here: https://pypistats.org/packages/cbio-py.

If the endpoint you're looking for isn't covered, here is a helper function I wrote to transform the return objects from the Swagger client into standard dictionaries:

def return_to_dict_converter(return_type, return_list):
    """Convert a list of Swagger model objects into plain dicts.

    return_type='dict' converts each item; return_type='native' returns
    the list unchanged. Any other input falls through and returns None.
    """
    if isinstance(return_list, list):
        if return_type == 'dict':
            return_list_dict = []
            for return_item in return_list:
                # Copy every attribute reported by dir() into a dict.
                return_item_dict = {}
                for att in dir(return_item):
                    return_item_dict[att] = getattr(return_item, att)
                return_list_dict.append(return_item_dict)
            return return_list_dict
        elif return_type == 'native':
            return return_list

Hope this helps!

Han Jo

Mar 21, 2022, 9:06:02 AM

to cBioPortal for Cancer Genomics Discussion Group

Thanks for the input!

@Gaofei Zhao Yes, true! Sorry, I didn't state my problem precisely enough. I just wasn't able to conveniently get ALL the protein/RNA data in a single query, so I thought my approach was wrong, or at least suboptimal.
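The GET call in the snippet above takes a single entrezGeneId, so covering many genes means repeating the request. This is a minimal sketch of doing that in batches, assuming a caller-supplied fetch function (hypothetical; in practice it would wrap whichever cBioPortal request you use for one slice of genes) and an arbitrary default batch size of 100. The names batched and fetch_in_batches are illustrative, not part of any library.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_in_batches(fetch: Callable[[List[int]], list],
                     entrez_gene_ids: Sequence[int],
                     batch_size: int = 100) -> list:
    """Call `fetch` once per batch of gene IDs and concatenate the results.

    `fetch` is a hypothetical caller-supplied function that requests data
    for a list of Entrez gene IDs and returns a list of records.
    """
    results: list = []
    for gene_batch in batched(entrez_gene_ids, batch_size):
        results.extend(fetch(gene_batch))
    return results


# Usage with a stand-in fetch function (no network access).
calls = []


def fake_fetch(gene_batch: List[int]) -> list:
    calls.append(gene_batch)
    return [{"entrezGeneId": gene} for gene in gene_batch]


records = fetch_in_batches(fake_fetch, [2, 7157, 672, 675, 1956], batch_size=2)
print(len(records), len(calls))  # 5 3
```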

@Sasha, thanks for the hint!

I will need to fiddle around a bit, but I might come back to you if my solution is too clumsy.

Cheers,

Sven

debr...@mskcc.org

Mar 21, 2022, 11:36:31 AM

to sven.g...@gmail.com, cbiop...@googlegroups.com, odag...@gmail.com, zh...@mskcc.org

Hi Sven,

The API is optimized for slicing data, that is, pulling a subset of cases and a subset of genes. If you want all the data, it is probably better to download the data files for the entire study from our datahub. See e.g.:

https://github.com/cBioPortal/datahub/tree/master/public/ccle_broad_2019
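Once the study files are downloaded, an expression matrix can be read locally without the API. This is a minimal sketch, assuming the file is tab-separated with Hugo_Symbol and Entrez_Gene_Id columns followed by one column per sample; check the header of the actual file you download, since some profiles (e.g. protein data) may use different identifier columns. The function name read_expression_matrix is hypothetical, and a real file would be opened with open(path, newline="") in place of the StringIO used below; the table values are made up.

```python
import csv
import io
from typing import Dict, TextIO


def read_expression_matrix(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Parse a tab-separated gene-by-sample matrix into {gene: {sampleId: value}}.

    Assumes a header row of Hugo_Symbol, Entrez_Gene_Id, then one column per
    sample. Values stay as strings because missing entries may be "NA".
    """
    # Skip any '#'-prefixed comment lines before handing rows to the CSV reader.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    id_columns = {"Hugo_Symbol", "Entrez_Gene_Id"}
    matrix: Dict[str, Dict[str, str]] = {}
    for row in reader:
        # Prefer the gene symbol; fall back to the Entrez ID if it is empty.
        gene = row.get("Hugo_Symbol") or row.get("Entrez_Gene_Id")
        # Later rows with the same gene key overwrite earlier ones.
        matrix[gene] = {
            sample: value for sample, value in row.items() if sample not in id_columns
        }
    return matrix


# Usage with a tiny made-up table (placeholder values, not real CCLE data).
example = "Hugo_Symbol\tEntrez_Gene_Id\tS1\tS2\nA2M\t2\t0.5\tNA\n"
print(read_expression_matrix(io.StringIO(example)))
# {'A2M': {'S1': '0.5', 'S2': 'NA'}}
```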

Hope that helps!

Best wishes,

Ino
