Hi Marcel,
Thanks for the response. My Python code looks like this:
profileId = f"{study_name}_mutations"
sampleListId = f"{study_name}_all"
mutations = self._client.Mutations.getMutationsInMolecularProfileBySampleListIdUsingGET(
    molecularProfileId=profileId,
    sampleListId=sampleListId,
    projection="DETAILED"
).result()
This gets _all_ the mutations for a study in some detail, so whatever downstream analysis I want to do is possible from here. However, as you rightly point out, this is a very blunt instrument and I'm sure there could be better ways to do this. Two spring to mind:
1. Make the use-case more specific so that I can look for a better cBioPortal endpoint that returns only the data I want, rather than everything. In this case, I was just doing a proof-of-concept and wanted to plot mutation counts by sample. On the R side, I can get this data much more cheaply by calling `clinicalData()`, which gives me a variety of per-sample summary statistics, including mutation count, so I suppose a Python equivalent probably exists.
2. Find a way to get all the mutations, but more efficiently, as in the cBioDataPack example.
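For reference, the proof-of-concept plot only needs a per-sample tally of the result above, which is what makes the full download feel so wasteful. A minimal sketch of that tally, assuming each returned mutation object exposes a `sampleId` attribute; the helper `count_mutations_by_sample` and the stand-in `FakeMutation` records are made up for illustration and are not part of any cBioPortal client:

```python
from collections import Counter, namedtuple

# Stand-in for the objects returned by the API call; the real ones are
# assumed (not verified here) to carry a `sampleId` attribute.
FakeMutation = namedtuple("FakeMutation", ["sampleId"])


def count_mutations_by_sample(mutations):
    """Return a Counter mapping each sample ID to its number of mutations."""
    return Counter(m.sampleId for m in mutations)


if __name__ == "__main__":
    demo = [FakeMutation("S1"), FakeMutation("S2"), FakeMutation("S1")]
    for sample_id, n in count_mutations_by_sample(demo).most_common():
        print(sample_id, n)
```

With the real data, the list passed in would be the `mutations` result from the call above.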
In an ideal world, I would use option 1, but I suspect that hunting around for the clever way to get at exactly the right data might not always work. Therefore, it would certainly be useful to have the more flexible (although less elegant) option 2 at my disposal. Can you point me to documentation on how to prepare the tarballs you mention, so that I can make my data available via cBioDataPack? And is there a similarly efficient way to get these tarballs into a pandas DataFrame on the Python side?
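To make that last question concrete, this is roughly what I imagine on the Python side. It is only a sketch under assumptions I have not checked: that the tarball contains a tab-separated mutations file whose name ends in `data_mutations.txt` (my guess), and that any metadata lines in it start with `#`. The function names are my own.

```python
import io
import tarfile


def read_member(tar_path, suffix="data_mutations.txt"):
    """Return the raw bytes of the first regular file in the tarball
    whose name ends with `suffix` (the default file name is a guess)."""
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                return tar.extractfile(member).read()
    raise FileNotFoundError(f"no member ending in {suffix!r} in {tar_path}")


def mutations_frame(tar_path, suffix="data_mutations.txt"):
    """Load the mutations table from the tarball into a pandas DataFrame."""
    import pandas as pd  # imported here so read_member works without pandas

    raw = read_member(tar_path, suffix)
    # Assumes tab-separated text. comment="#" skips metadata lines, but it
    # would also truncate any data line that contains a "#" character.
    return pd.read_csv(io.BytesIO(raw), sep="\t", comment="#", low_memory=False)
```

If something like this already exists in a supported client, I would much rather use that than maintain my own.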
Thanks,
Peter