
Variable Definition Confirmation


Joseph Celidonio

Sep 2, 2024, 8:53:18 AM
to cbiop...@googlegroups.com
Hello,

I was exploring the variables included in cBioPortal for head and neck squamous cell carcinoma. I came across a variable that I am having trouble defining: "Patient Smoking History Category," which takes values from 1 to 4. Can you help me define what these values indicate? Also, if there is a document to reference for defining variables in the database, please let me know, as I have been unable to find it.

Thank you for your time, and I hope you've had a great weekend.

Sincerely,
Joe Celidonio

Joseph Celidonio

Sep 3, 2024, 9:15:06 AM
to cbiop...@googlegroups.com
Good morning,

I hope you had a great weekend.  I just wanted to follow up on my previous email.  Please let me know how to define this variable.

Additionally, is there a way to filter studies to only include patients who do not have a gene mutation?  

Sincerely,
Joe Celidonio


Tali Mazor

Sep 3, 2024, 10:21:07 AM
to Joseph Celidonio, cbiop...@googlegroups.com
Hi Joe,

The variables in cBioPortal studies are taken directly from the original publication, so my general advice is to refer back to the original publication for definitions. That said, if you let us know which specific study you're referring to, it's possible someone on our curation team can help.

There are 2 ways you can select samples without a mutation in a particular gene.
1) In study view, filter to samples with a gene mutation. Then, go to the Custom Selection menu (upper right, next to Charts & Groups), and click "currently unselected" (this will fill the box with the sample IDs of all the samples that the gene filter hid) and then "Filter to select samples". You are now filtered down to the cases without mutations in the original gene you selected. One important final step is to make sure all of these samples were profiled for mutations - check the Genomic Profile Sample Counts table and if necessary, use that to filter down to samples where mutations were profiled.
2) If the study has gene panel data, and some samples were profiled for mutations but did not include your gene of interest on the panel, the above approach is not ideal. In this scenario, you can query for your gene of interest, and then go to the Download tab and download the Type of Genetic Alterations Across All Samples table. This table distinguishes between samples that were or were not profiled for the gene of interest.
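
The logic behind both approaches can be sketched in a few lines of Python. This is only an illustration with made-up sample IDs, not cBioPortal code; the function name and inputs are hypothetical. The point it shows is that "no mutation" should mean "profiled and not mutated", so unprofiled samples have to be set aside.

```python
def split_by_mutation_status(all_samples, mutated, profiled):
    """Partition samples into mutated, wild-type, and not-profiled sets.

    all_samples: every sample ID in the cohort
    mutated:     sample IDs with a called mutation in the gene of interest
    profiled:    sample IDs that were actually profiled for that gene
    """
    all_samples, mutated, profiled = set(all_samples), set(mutated), set(profiled)
    with_mutation = mutated & all_samples
    # "Currently unselected" = everything the mutation filter hid.
    unselected = all_samples - with_mutation
    # Only profiled samples can be called wild-type.
    wild_type = unselected & profiled
    not_profiled = unselected - profiled
    return with_mutation, wild_type, not_profiled


# Hypothetical sample IDs, for illustration only.
cohort = ["S1", "S2", "S3", "S4", "S5"]
mutated, wild_type, unknown = split_by_mutation_status(
    cohort, mutated=["S1", "S2"], profiled=["S1", "S2", "S3", "S4"]
)
print(sorted(wild_type))  # ['S3', 'S4']
print(sorted(unknown))    # ['S5']
```

Here S5 is neither mutated nor wild-type: it simply was not tested, which is why it is kept out of the comparison group.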

-Tali


--
You received this message because you are subscribed to the Google Groups "cBioPortal for Cancer Genomics Discussion Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cbioportal+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cbioportal/BL1PR14MB49515B6AD66AFA8069DCFD0EE9932%40BL1PR14MB4951.namprd14.prod.outlook.com.

Joseph Celidonio

Sep 3, 2024, 10:45:00 AM
to Tali Mazor, cbiop...@googlegroups.com
Hello Tali,

I will double check the original studies to define the smoking variable.  If you don’t mind asking your curation team to help define the variable, the original studies are included in the screenshot attached above. 

Regarding my second question, I want to clarify what I meant. Across the 7 studies I am collecting data from, there are >100 gene mutations reported. I want to collect data on patients who have none of these gene mutations. Will the instructions you provided above allow me to collect data from patients without any gene mutation (i.e. they were profiled for mutations but had a negative result)?



Sincerely,
Joe Celidonio

original-8FCFD138-F67E-448E-8376-431652C28A84.jpeg

Tali Mazor

Sep 3, 2024, 3:51:09 PM
to Joseph Celidonio, cbiop...@googlegroups.com
Hi Joe,

First, I'd strongly suggest you limit your study selection so you are not mixing genome versions and not including studies with overlapping samples. Otherwise you run the risk of biasing your results. The 4 TCGA studies share most cases in common, so you are best off selecting just one of them (you can read more in our FAQ, but I generally recommend the PanCancer Atlas version).

I'll reach out to our curation team and see if they have any insight.

To identify the patients without mutations, you can use the Mutation Count chart to select cases with 0 mutations called.

-Tali

Joseph Celidonio

Sep 3, 2024, 11:21:48 PM
to Tali Mazor, cbiop...@googlegroups.com
Hello Tali,

Regarding the studies with overlapping results, I removed duplicate patient IDs and sample IDs after data collection, which I thought would take care of any risk of bias. Is there still the possibility that the same patient data gets counted under different patient IDs or sample IDs if I include these overlapping studies? I want to make sure I avoid any inadvertent duplicates.

Sincerely,
Joe Celidonio


Joseph Celidonio

Sep 4, 2024, 8:04:59 AM
to Tali Mazor, cbiop...@googlegroups.com
Good morning Tali,

Apologies for all the questions these past few days, you’ve been very helpful and I appreciate your time.

I’ve taken your advice of narrowing the studies I include to prevent data overlap and differing genome sequencing methods. In the studies I've attached below, does mutation count refer to the number of genes mutated (i.e. a mutation count of 20 refers to a patient having TP53, TTN, NOTCH1, etc.)? Please confirm.

Sincerely,
Joe Celidonio

original-726C46F3-2086-4B1E-8852-D7448A7B308E.jpeg

Tali Mazor

Sep 4, 2024, 9:43:24 AM
to Joseph Celidonio, cbiop...@googlegroups.com
Hi Joe,

Mutation count refers to the number of mutations, which may or may not be the same as the number of mutated genes. A sample with a mutation count of 2 may have two mutations in the same gene (eg two different mutations in TP53), or mutations in two different genes (eg one mutation in TP53 and one mutation in TTN).
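
A small Python sketch may make the distinction concrete. The sample IDs and the list of calls are invented for illustration; the sketch only shows how the two numbers diverge when a sample carries more than one mutation in the same gene.

```python
from collections import defaultdict

# Hypothetical mutation calls as (sample ID, gene, protein change).
calls = [
    ("S1", "TP53", "R175H"),
    ("S1", "TP53", "R248W"),
    ("S2", "TP53", "R273C"),
    ("S2", "TTN", "A100V"),
]

mutation_count = defaultdict(int)   # number of mutations per sample
mutated_genes = defaultdict(set)    # distinct mutated genes per sample
for sample, gene, _change in calls:
    mutation_count[sample] += 1
    mutated_genes[sample].add(gene)

for sample in sorted(mutation_count):
    print(sample, mutation_count[sample], len(mutated_genes[sample]))
# S1 2 1
# S2 2 2
```

Both samples have a mutation count of 2, but S1 has one mutated gene and S2 has two.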

-Tali


Joseph Celidonio

Sep 4, 2024, 10:49:34 AM
to Tali Mazor, cbiop...@googlegroups.com
Hi Tali,

Thank you for the clarification.  Is there any way to determine number of gene mutations per patient, rather than mutation count per patient?

Sincerely,
Joe Celidonio


Tali Mazor

Sep 5, 2024, 8:50:43 AM
to Joseph Celidonio, cbiop...@googlegroups.com
Hi Joe,

I don't believe there's a way to get the number of mutated genes per case. I think for that you'll need to download the data (all data is available in our datahub: https://github.com/cBioPortal/datahub/tree/master/public) and perform that analysis yourself.
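
As a rough sketch of what that offline analysis could look like, the Python below counts distinct mutated genes per sample from a tab-delimited mutation file. It assumes MAF-style column names (Hugo_Symbol, Tumor_Sample_Barcode) and '#' comment lines; check the header of the file you actually download, since these names are an assumption here. The inline text stands in for a real file.

```python
import csv
import io
from collections import defaultdict


def mutated_genes_per_sample(handle):
    """Count distinct mutated genes per sample in a tab-delimited mutation file.

    Assumes MAF-style columns named Hugo_Symbol and Tumor_Sample_Barcode;
    lines starting with '#' are treated as comments and skipped.
    """
    rows = (line for line in handle if not line.startswith("#"))
    genes = defaultdict(set)
    for row in csv.DictReader(rows, delimiter="\t"):
        genes[row["Tumor_Sample_Barcode"]].add(row["Hugo_Symbol"])
    return {sample: len(symbols) for sample, symbols in genes.items()}


# Tiny made-up example standing in for a downloaded mutation file.
example = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tHGVSp_Short\n"
    "TP53\tS1\tp.R175H\n"
    "TP53\tS1\tp.R248W\n"
    "TTN\tS2\tp.A100V\n"
    "TP53\tS2\tp.R273C\n"
)
print(mutated_genes_per_sample(example))  # {'S1': 1, 'S2': 2}
```

Samples with no called mutations do not appear in such a file at all, so they would be absent from the result rather than listed with a count of 0.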

-Tali

Baby Anusha Satravada

Sep 5, 2024, 11:21:03 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi Joe,

You can find the specifics of the values from the GDC Browser:

On the website under the attribute Tobacco_smoking_status you will find the categories. We have updated our data values as well in our portal.

Thanks,
Anusha.

Joseph Celidonio

Sep 6, 2024, 6:51:34 AM
to Tali Mazor, cbiop...@googlegroups.com
Good morning Tali,

I just wanted to follow up on my inquiry about the smoking category variable that you reached out to your curation team about.  I haven’t heard from them yet, is there an email or phone number that I can use to follow up on this?

Sincerely,
Joe Celidonio


Tali Mazor

Sep 6, 2024, 9:26:22 AM
to Baby Anusha Satravada, Joseph Celidonio, cBioPortal for Cancer Genomics Discussion Group
Joe - See below for the answer from Anusha on the curation team.



Joseph Celidonio

Sep 10, 2024, 2:20:15 PM
to Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hello Tali,

Is there any way to separate patients based on mutations that disrupt the coding regions of DNA vs. those that are non-disruptive? For example, I’d like to see how many patients had disruptive vs. non-disruptive TP53 mutations in head and neck squamous cell carcinoma studies.

Sincerely,
Joe Celidonio


Joseph Celidonio

Sep 12, 2024, 10:06:35 AM
to Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Good morning,

I just wanted to follow up on my previous email, please let me know if there is a way to separate disruptive vs. non-disruptive TP53 mutations.  

I also had a follow-up question: when selecting samples of patients with a specific mutation (e.g. TP53), does this ensure that all other patients in the cohort do not have a TP53 mutation present?  For example, if I filter a study to only show patients with mutant TP53, this ensures that all other samples from the study had a wild-type TP53 gene.

Thank you for all your help.

Sincerely,
Joe Celidonio


Tali Mazor

Sep 12, 2024, 10:49:11 AM
to Joseph Celidonio, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hi Joe,

You can run a query to identify cases with putative driver events vs variants of unknown significance vs no alteration. To do this, you'll use OQL (Onco Query Language). OQL allows you to specify exactly what type of alteration you want to query for, eg mutations, putative driver mutations, missense mutations, amplifications, etc.

Depending on exactly how you want to define your groups, you might query for something like this:
TP53: MUT_DRIVER
TP53

Here's an example where I've queried for TP53 in 4 ways:
image.png
You can see that each track in the OncoPrint is slightly different based on the different OQL. You can further customize how a putative driver is defined using the settings menu next to the Modify Query button.

Once you have your query, go to the Comparison/Survival tab. On the Overlap subtab, you can toggle groups on & off, and use the diagram to define new groups. If you are logged in, these groups will save to your profile and be available in the future as a group in study view (for more, review our documentation on group comparison). This allows you to save the group of cases with driver alterations, and then create and save the group of cases with non-driver alterations (eg for TP53 alterations that are not driver mutations, toggle on the TP53: MUT_DRIVER and TP53 groups, then select the non-overlapping section of the Venn diagram and save this as a group).

When you try to analyze for samples without an alteration, you must keep in mind whether a sample was profiled. So for example, after filtering to samples with TP53 mutations, it is true that all the other samples do not have a called TP53 mutation. But, depending on the study, the sample may or may not have been profiled for TP53 mutations. You can use the Genomic Profile Sample Counts table to filter to samples that have been profiled. Querying for a gene will also help visualize this, as samples that are not profiled are displayed differently in the OncoPrint.

-Tali




Joseph Celidonio

Sep 13, 2024, 8:00:27 AM
to Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hello Tali,

I have been trying to recreate the queries you ran to identify disruptive vs. non-disruptive TP53 in the following studies:

My goal is to run this query to separate which patients had disruptive vs. non-disruptive TP53 mutations, and then filter patients to only include tumors with a primary site in the oral cavity, and then collect the clinical data.  Would you be able to run the query similar to the example you provided above and send me the link so that the customized query is set up?  I have not been able to do so myself, and your help would be greatly appreciated.

Thank you, please let me know if you can help.

Sincerely,
Joe Celidonio


Joseph Celidonio

Sep 13, 2024, 10:51:10 AM
to Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hello,

Is there a way to call someone to help walk me through this process of collecting data on disruptive vs. non-disruptive TP53 mutations?  Apologies for the redundant emails, I just have a deadline I am trying to make.

Sincerely,
Joe Celidonio


Anika Bongaarts

Sep 16, 2024, 9:03:02 AM
to Joseph Celidonio, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hi Joe,
To filter samples with a primary tumor site in the oral cavity, you need to select this for both "Primary Tumor Site" and "Patient Primary Tumor Site" fields. Doing this across the four studies you've mentioned should result in 98 samples. You can review these samples here. You can download all relevant clinical data under the Clinical Data tab.
In terms of distinguishing patients with disruptive vs. non-disruptive TP53 mutations, it depends on your specific criteria for "disruptive". I suggest using a similar approach to what Tali outlined. You can download the TP53 mutation information for each sample using this link, which also includes information on whether a sample was profiled. You can also include other TP53 alterations by selecting the columns dropdown for the table labeled "Type of Genetic Alterations Across All Samples."
Additionally, I recommend reviewing the TP53 mutations for your samples by navigating to the oncoprint. Combining the clinical data with this mutation table should bring you closer to extracting the information you need.

Hope this helps. If you need any more support, please remember to Reply All to this thread.
Best,

Anika Bongaarts

Project Manager & Product Owner

(Monday, Wednesday-Friday)

E an...@thehyve.nl

T +31 30 700 9713

W thehyve.nl

    



Joseph Celidonio

Sep 16, 2024, 10:42:37 AM
to Anika Bongaarts, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Good morning,

Thank you for getting back to me.  I was able to run this analysis but I’d like to run the steps by you to confirm I did this correctly:

I selected the 4 studies and filtered for only the correct primary sites.  I then queried for driver TP53 mutations by inputting “TP53: DRIVER”.  I then went to the clinical/survival tab, set my altered group to contain the TP53: DRIVER samples, and the unaltered group to contain all other samples.  I then compared these groups using various analyses.

I understand driver mutations are slightly different than disruptive mutations, but this should suffice for my purposes.  Does this look correct to you?

Sincerely,
Joe Celidonio 


Anika Bongaarts

Sep 16, 2024, 10:45:08 AM
to Joseph Celidonio, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hi Joe,

This sounds good to me! 

Feel free to reach out if you have other questions or run into any issues.

Kind regards,

Anika Bongaarts



Joseph Celidonio

Sep 19, 2024, 8:00:28 AM
to Anika Bongaarts, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group, Sree Chinta
Hello,

Just to follow up on my previous question about the gene query.  I was able to run the query as highlighted in my previous email.  Is there any way to collect clinical data when grouping patients in this manner?  Additionally, is there a way to check that all patients were profiled for TP53 in both groups?

Sincerely,
Joe Celidonio


de Bruijn, Ino

Sep 25, 2024, 8:31:06 AM
to Joseph Celidonio, Anika Bongaarts, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group, Sree Chinta

Hi Joe,

 

Thanks for reaching out!

 

If I understand correctly, you’re asking if there’s a way to collect all the clinical data after querying a cohort (i.e. on the Results View). You can do that by clicking on the “Download” tab and then, in the altered sample row, clicking “Virtual Study”:

 

[Screenshot not preserved]

 

Name it, and open the “Study View” of the cohort you just created. Now you can click on the “Clinical Data” tab and download all the associated clinical data for just the altered cases.

 

> Additionally, is there a way to check that all patients were profiled for TP53 in both groups?

 

Could you elaborate on this question? If you query for TP53, for example, the OncoPrint should show you whether all your groups have TP53 mutations. Let me know if that makes sense. If you share the URL for your query/groups, I can provide some more specific pointers.

 

Hope that helps!

 

Best wishes,

Ino

 


Joseph Celidonio

Oct 14, 2024, 12:04:09 PM
to de Bruijn, Ino, Anika Bongaarts, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group, Sree Chinta
Good morning,

I hope you all had a great weekend.  I want to thank you for providing your guidance on how to run this gene query to identify driver vs. non-driver mutations of TP53.  I was able to follow your steps and assess for correlations using the univariate analyses under the Comparisons > Clinical tab.

I am reaching out to see if there is a way to run multivariable analyses (e.g. logistic regression and linear regression) in cBioPortal.  Alternatively, I could extract the data and run these analyses manually, but currently I do not see the option to extract the data.  Please let me know if either the option for multivariable analyses is possible in cBioPortal, or if it is possible to extract clinical data after querying for driver vs. non-driver TP53 mutations.

Thank you,
Joe Celidonio


Joseph Celidonio

Oct 22, 2024, 2:34:19 PM
to de Bruijn, Ino, Anika Bongaarts, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hello,

I am running the analysis recommended to me a while back by Anika to assess driver vs. non-driver TP53 mutations.  I ran a query for TP53_MUTATION and TP53.  I attached a screenshot of the overview of the groups.  Does this indicate that 239 patients had a mutation in a region of TP53 that makes it considered a driver mutation, and 79 had no gene mutation?  Please confirm that I am interpreting this correctly.

Sincerely,
Joe Celidonio

Screen Shot 2024-10-22 at 12.20.48 PM.png

Joseph Celidonio

Oct 24, 2024, 3:18:50 PM
to de Bruijn, Ino, Anika Bongaarts, Tali Mazor, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hello,

I just wanted to follow up on my previous inquiry. Is there a way to query for driver vs. non-driver vs. wild-type TP53 mutations?

Sincerely,
Joe Celidonio


Tali Mazor

Oct 24, 2024, 4:47:46 PM
to Joseph Celidonio, de Bruijn, Ino, Anika Bongaarts, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hi Joe,

You cannot directly query for non-driver mutations, but you can create a group made up of those samples with non-driver mutations.

To do this, run a query for all TP53 mutations and TP53 driver mutations:
TP53: MUT
TP53: DRIVER_MUT

Then go to the Comparison/Survival tab. On the Overlap subtab, you can create a new group based on the overlap of the existing groups: deselect the altered/unaltered groups and instead select the groups corresponding to your query genes. In this example we can see 21 samples which have a TP53 mutation but not a TP53 driver mutation - click on that sliver of the venn diagram and create a new group (instructions with screenshots can be found here)
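
In set terms, the sliver of the Venn diagram is a set difference. The Python below is just an illustration with invented sample IDs; the variable names are hypothetical and nothing here calls cBioPortal.

```python
# Hypothetical sample IDs returned by each OQL track.
any_tp53_mutation = {"S1", "S2", "S3", "S4"}   # TP53: MUT
driver_tp53_mutation = {"S1", "S2", "S3"}      # TP53: DRIVER_MUT

# Samples in the "TP53: MUT" circle that fall outside "TP53: DRIVER_MUT".
non_driver_only = any_tp53_mutation - driver_tp53_mutation
print(sorted(non_driver_only))  # ['S4']
```

Note that the grouping is per sample: a sample carrying both a driver and a non-driver TP53 mutation sits in the driver group, not in the non-driver sliver.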

You can now use this new group for comparisons within the Comparison/Survival tab. If you are logged in, you can save the group to your profile, and then you can also access it within study view.

-Tali



Joseph Celidonio

Oct 29, 2024, 11:59:17 AM
to Tali Mazor, de Bruijn, Ino, Anika Bongaarts, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hello Tali,

Thank you for this reply.  Do you know if there is a way to filter the cohort for disruptive vs. non-disruptive TP53 mutations, rather than driver mutations?  Disruptive mutations are said to be mutations that are non-conservative, which occur in the DNA-binding domain or in the stop codons.

Sincerely,
Joe Celidonio


Tali Mazor

Oct 29, 2024, 3:46:47 PM
to Joseph Celidonio, de Bruijn, Ino, Anika Bongaarts, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hi Joe,

I'd suggest reading through the OQL documentation (https://docs.cbioportal.org/user-guide/oql/) to see if any of the existing functionality can be used to capture the subset of mutations you're interested in. You can use OQL to filter by type of mutation (eg nonsense) and by protein position (eg to capture the DNA binding domain). Then take the same approach by querying with your OQL as well as for all mutations and then use the Overlap tab to get at your cases of interest.

-Tali


Joseph Celidonio

Nov 5, 2024, 8:00:31 AM
to Tali Mazor, de Bruijn, Ino, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hello Tali,

Thank you for this reference, it was very helpful.  I am one step away from running this query successfully, but I could use your guidance on writing the correct syntax for this query.

The filter that I'm trying to run is to collect data on non-conservative mutations that occur within the L2-L3 DNA binding domain of TP53 (L2 is amino acid residues 163-195, and L3 is amino acid residues 236-251).  In this group I'd also want to include nonsense and frameshift mutations in TP53 that occur anywhere in the protein. I would then want to compare nonsense, frameshift, and non-conservative missense mutations within the L2-L3 DNA binding domain of TP53 with all other TP53 mutations.  This would effectively be comparing disruptive vs. non-disruptive TP53 based on the definition of disruptive TP53 mutation that I am following.  

I queried for TP53:MUT=Nonsense and TP53:MUT=Frameshift which worked just fine. What I need help with is the non-conservative missense mutations specifically within amino acid residues 163-195 and 236-251. I tried querying for TP53:MUT=Missense;RES=163-195,236-251 but I've received error messages. This syntax also does not account for conservative vs. non-conservative missense mutations, as I would only want to include non-conservative missense mutations in this query. Can you help advise me on how I can run this query?

If I cannot run this query directly for non-conservative mutations, a workaround could be to download the amino acid / protein data into an Excel file. At that point I can manually filter the data as long as the mutation type (e.g. missense, nonsense, frameshift) is included in the data, and the position of mutation (e.g. amino acid sequence 163-187) is included. This may actually be my preferred method if this is possible, but please let me know.

Thank you for all your help.

Sincerely,
Joe Celidonio


Tali Mazor

Nov 5, 2024, 9:12:42 AM
to Joseph Celidonio, de Bruijn, Ino, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hi Joe,

The syntax for using OQL with protein positions is:
(a-b)
So to get mutations in L2, you'll want to query for:
TP53: MUT=(163-195)
If you specifically want missense mutations in that range, then use an underscore to combine:
TP53: MUT=MISSENSE_(163-195)
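
If several residue ranges are needed, the query lines can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of cBioPortal; it only formats strings following the `MUT=MISSENSE_(a-b)` pattern shown above, using the L2 and L3 ranges quoted earlier in this thread.

```python
def oql_missense_in_ranges(gene, ranges):
    """Build one OQL line per residue range, e.g. 'TP53: MUT=MISSENSE_(163-195)'."""
    return [f"{gene}: MUT=MISSENSE_({start}-{end})" for start, end in ranges]


# L2 and L3 loop residue ranges as given earlier in this thread.
lines = oql_missense_in_ranges("TP53", [(163, 195), (236, 251)])
print("\n".join(lines))
# TP53: MUT=MISSENSE_(163-195)
# TP53: MUT=MISSENSE_(236-251)
```

Each printed line can be pasted into the query box as its own track.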

You can also download the list of all mutations. Go to the Mutations tab and look at the top right of the table - next to the Columns menu is a download icon. Clicking there will download all the data in the table, including protein change and mutation type.
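
Once the table is downloaded, the type and position criteria can be applied offline. The sketch below is one possible way to do that in Python. It assumes the mutation type strings look like MAF-style names (Missense_Mutation, Nonsense_Mutation, Frame_Shift_Del, and so on) and that the protein change looks like R175H; both are assumptions to check against the actual download. It does not assess conservative vs. non-conservative changes.

```python
import re

# Residue ranges for the L2 and L3 loops, as quoted earlier in this thread.
DISRUPTIVE_MISSENSE_RANGES = [(163, 195), (236, 251)]


def residue_position(protein_change):
    """Return the first residue number in a change such as 'R175H', else None."""
    match = re.search(r"(\d+)", protein_change)
    return int(match.group(1)) if match else None


def is_disruptive_candidate(mutation_type, protein_change):
    """Apply the type and position criteria; conservation is not assessed here."""
    kind = mutation_type.lower()
    if "nonsense" in kind or "frame_shift" in kind or "frameshift" in kind:
        return True
    if "missense" in kind:
        pos = residue_position(protein_change)
        return pos is not None and any(
            lo <= pos <= hi for lo, hi in DISRUPTIVE_MISSENSE_RANGES
        )
    return False


# Illustrative rows only: (mutation type, protein change).
rows = [
    ("Missense_Mutation", "R175H"),
    ("Missense_Mutation", "R273C"),
    ("Nonsense_Mutation", "R213*"),
    ("Frame_Shift_Del", "P152Rfs*18"),
]
for mutation_type, change in rows:
    print(change, is_disruptive_candidate(mutation_type, change))
# R175H True
# R273C False
# R213* True
# P152Rfs*18 True
```

R273C is a missense change outside both ranges, so it falls into the non-disruptive group under these criteria.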

-Tali

Joseph Celidonio

Nov 5, 2024, 9:37:57 AM
to Tali Mazor, de Bruijn, Ino, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Good morning Tali,

Thank you, this is very helpful.  Is there a way to determine if the missense mutation was conservative vs. non-conservative?

Sincerely,
Joe Celidonio


Tali Mazor

Nov 6, 2024, 9:38:16 AM
to Joseph Celidonio, de Bruijn, Ino, Baby Anusha Satravada, cBioPortal for Cancer Genomics Discussion Group
Hi Joe - I am not aware of any way to separate conservative vs non-conservative mutations within cBioPortal.
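
Outside cBioPortal, one possible approach is to compare the physicochemical class of the reference and variant amino acids in the downloaded protein change. The Python below is a sketch under a loud assumption: the four-class grouping it uses is one common textbook grouping, not a cBioPortal feature and not necessarily the grouping your definition of "non-conservative" relies on, so substitute the classes from the definition you are following.

```python
import re

# One common textbook grouping of amino acids (an assumption, not a standard
# imposed by cBioPortal); replace it with the grouping your definition uses.
AA_CLASS = {}
for group, residues in {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}.items():
    for aa in residues:
        AA_CLASS[aa] = group


def is_conservative(protein_change):
    """True if a missense change such as 'R175H' stays within one class.

    Returns None when the string is not a simple one-letter substitution.
    """
    match = re.fullmatch(r"(?:p\.)?([A-Z])(\d+)([A-Z])", protein_change)
    if not match:
        return None
    ref, _pos, alt = match.groups()
    if ref not in AA_CLASS or alt not in AA_CLASS:
        return None
    return AA_CLASS[ref] == AA_CLASS[alt]


print(is_conservative("R175H"))  # True  (basic -> basic)
print(is_conservative("R248W"))  # False (basic -> nonpolar)
```

Under a different grouping the same change can flip category, so the class table is the part that matters most.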

-Tali


Tali Mazor

Nov 7, 2024, 8:26:05 AM
to Joseph Celidonio, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Hi Joe,

The groups you select/deselect in the Comparison/Survival tab are only reflected within that tab; the Mutations tab reflects the full query.

On the Comparison/Survival tab, if you hover your mouse over the group name you'll see an option to "Open in study view". Click that and it will bring you to study view for the samples in that group. From there, you can explore those patients in study view or download the clinical data associated with those samples by clicking the download button (next to "Custom Selection" in the upper right).

-Tali

PS Please keep the google group cc'ed so others can benefit from this conversation.



On Wed, Nov 6, 2024 at 4:22 PM Joseph Celidonio <jc2...@njms.rutgers.edu> wrote:
Hello Tali,

Thank you for getting back to me.  After I query for TP53: MUT=MISSENSE_(163-195), TP53: MUT=MISSENSE_(236-251), TP53:MUT=Nonsense and TP53:MUT=Frameshift, is there a way to view the clinical data for this group?  When I input this query and go to the mutations tab, it still shows all 299 TP53 mutations, rather than the 120 mutations that meet the query criteria.  I've included the link for my query below.  Is there a way I can view the clinical data for just the 120 mutations that I'm querying for?

Sincerely,
Joe Celidonio





Joseph Celidonio

Nov 7, 2024, 9:01:42 AM
to Tali Mazor, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Hello Tali,

Thank you, this is very helpful.

My goal is to compare groups between disruptive TP53 (by using the query criteria I outlined above) vs. non-disruptive TP53 (the mutations that do not overlap with the disruptive TP53 mutations) vs. no TP53 mutation.

For the sake of figuring out how to do this, I've included the link below so you can see what step I'm on of the query.  In the comparison/survival tab of this query, I hover over the "Disruptive TP53 All missense" group and select study view.  From there I can view the clinical data for disruptive TP53 mutations.  I made a separate group for non-disruptive TP53 mutations and can do the same thing.   The clinical data is what I want to view for all of my groups.

How can I proceed to view the clinical data for samples with no TP53 mutation (is there a query that I can run that excludes all TP53 mutations)?


Sincerely,
Joe Celidonio


Tali Mazor

Nov 8, 2024, 8:41:30 AM
to Joseph Celidonio, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Hi Joe,

There are a few ways you can do this, depending on exactly what you want.

If you want samples without any alteration in TP53 (e.g., no mutation, no deletion; these samples show only grey boxes in OncoPrint in the query you shared), then you can use the "Unaltered group" from the query you shared.

If you want samples without mutation in TP53, you'll want to update the query you shared so the final entry is "TP53: MUT" and not "TP53". Then the unaltered samples will be those without mutation specifically.

If you want samples without mutation, you can also do this directly in Study View. Go to study view for your samples of interest, use the Mutated Genes table to select samples with TP53 mutation, then use the Custom Selection dropdown (upper right) to filter to the "currently unselected" samples, i.e., those without TP53 mutation.

No matter which of these approaches you take, there's one important final step when you get to study view for the samples of interest. You'll want to confirm that mutations were profiled for all the samples (otherwise the absence of a TP53 mutation may simply mean that mutations were not looked for). Find the 'Genomic Profile Sample Counts' table and filter to samples with Mutations.
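The reasoning behind that final step can be sketched with plain sets. This is only an illustration of the logic, not how cBioPortal computes it; the function name and sample IDs are hypothetical.

```python
def samples_without_mutation(all_samples, mutated_samples, mutation_profiled_samples):
    """Return samples that were profiled for mutations and have no mutation.

    Samples that were never profiled are excluded: for them, a missing
    mutation call says nothing about the gene's status.
    """
    profiled = set(all_samples) & set(mutation_profiled_samples)
    return profiled - set(mutated_samples)


# Hypothetical sample IDs for illustration.
all_samples = {"S1", "S2", "S3", "S4"}
tp53_mutated = {"S1"}
mutation_profiled = {"S1", "S2", "S3"}  # S4 was not profiled for mutations

wild_type = samples_without_mutation(all_samples, tp53_mutated, mutation_profiled)
```

Here `wild_type` contains S2 and S3 only. S4 has no recorded TP53 mutation, but it is dropped because mutations were never looked for in that sample.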

-Tali



Joseph Celidonio

Dec 17, 2024, 11:04:20 AM
to Tali Mazor, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Good morning Tali,

I hope all has been well with you.  I'm having some trouble identifying a discrepancy in my data that I was hoping you could help me with.

I have a group of patients/samples from a query of 4 studies, which I applied specific filters to for the purpose of my research.  I have split these patients into two groups based upon whether they have disruptive vs. non-disruptive TP53 mutations, based on specific criteria our team is following.  Here is a link to these groups:


You will see that there are a total of 241 unique patients (121 non-disruptive, 120 disruptive).  The issue I am having is that this number is different compared to when I filter for any TP53 mutation (click on the link below "combined study" that says "321 samples / 320 patients" --> using the "mutated genes" table, select samples for TP53 mutations only --> view clinical data).  You'll see in the clinical data tab that there are 239 unique patients and 240 unique samples for any TP53 mutation vs. 241 unique patients and 242 unique samples for the disruptive/non-disruptive TP53 mutations.

I realize there must be duplicate patients/samples in the disruptive/non-disruptive TP53 groups I've created. However, cBioPortal is not picking up on any overlap between the disruptive and non-disruptive groups. Can you please advise on how to resolve this discrepancy between the "any TP53" group and the "disruptive/non-disruptive TP53" groups?

As always, your time and consideration are greatly appreciated.

Sincerely,
Joe Celidonio


Tali Mazor

Dec 18, 2024, 2:44:34 PM
to Joseph Celidonio, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Hi Joe,

On the link you sent me, you can hover your mouse over the "Non-disruptive TP53" group name and there's a pop-up that allows you to go to study view for those 122 samples. In study view, the Mutated Genes table shows that only 120 of the samples have a TP53 mutation. Therefore, it seems like 2 samples without TP53 mutations may have been inadvertently included in the "non-disruptive" group, which accounts for the difference when you look at all samples with TP53 mutations.
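If the sample IDs for each group are exported, the same check can be sketched with set operations. The helper names and sample IDs below are hypothetical; this is an illustration of the comparison, not a cBioPortal feature.

```python
def unexpected_members(group_samples, mutated_samples):
    """Samples placed in a group even though they lack the mutation."""
    return sorted(set(group_samples) - set(mutated_samples))


def shared_members(group_a, group_b):
    """Samples that appear in both groups."""
    return sorted(set(group_a) & set(group_b))


# Hypothetical sample IDs for illustration.
any_tp53_mutation = {"S1", "S2", "S3"}
disruptive = {"S1"}
non_disruptive = {"S2", "S3", "S8", "S9"}  # S8 and S9 have no TP53 mutation

strays = unexpected_members(non_disruptive, any_tp53_mutation)
overlap = shared_members(disruptive, non_disruptive)
```

In this toy data `strays` is `["S8", "S9"]` and `overlap` is empty, which mirrors the situation above: the two groups do not overlap, yet together they are larger than the set of samples with any TP53 mutation.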

-Tali



image.png
image001.png

Joseph Celidonio

Apr 24, 2025, 8:00:41 AM
to Tali Mazor, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Hello Tali,

I hope all has been well with you. I am trying to run a query in cBioPortal and could use some help. This query requires that I filter for specific segments of mutant TP53 and then look at the individual clinical data of the patients in that group. I have gotten as far as running the gene query to create the groups that I want: https://www.cbioportal.org/results/comparison?comparison_subtab=overlap&comparison_overlapStrategy=Exclude&comparison_selectedGroups=%5B%22Disruptive%20TP53%20All%20Missense%22%2C%22Non-disruptive%20TP53%22%2C%22WT%20TP53%22%5D&comparison_groupOrder=%5Bnull%2C%22TP53%3A%20MUT%3DNONSENSE%22%2C%22TP53%3A%20MUT%3DMISSENSE_(163-195)%22%2C%22TP53%22%2C%22TP53%3A%20MUT%3DFRAMESHIFT%22%2C%22Altered%20group%22%2C%22Unaltered%20group%22%5D&plots_horz_selection=%7B%7D&plots_vert_selection=%7B%7D&plots_coloring_selection=%7B%7D&mutations_gene=TP53&session_id=68096f38854f636a3865c926
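For readers who want to see which groups a comparison link like this selects, the percent-encoded `comparison_selectedGroups` parameter can be decoded with the Python standard library. This is a rough sketch: the URL below is shortened to that single parameter for readability (an assumption made for illustration; without the session ID it will not reopen the query), and the helper name is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def selected_groups(url):
    """Decode the JSON list stored in the comparison_selectedGroups parameter."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also undoes percent-encoding
    raw = query.get("comparison_selectedGroups", ["[]"])[0]
    return json.loads(raw)


# Shortened copy of the link above, keeping only the selected-groups parameter.
url = (
    "https://www.cbioportal.org/results/comparison"
    "?comparison_selectedGroups=%5B%22Disruptive%20TP53%20All%20Missense%22"
    "%2C%22Non-disruptive%20TP53%22%2C%22WT%20TP53%22%5D"
)
groups = selected_groups(url)
```

With this input, `groups` holds the three group names: "Disruptive TP53 All Missense", "Non-disruptive TP53", and "WT TP53".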

From this point, how can I view the clinical data for each patient included in these groups? The tab that I'm looking to view is included in this link, but I'm looking to view this data for only the disruptive group first, then non-disruptive, then WT: https://www.cbioportal.org/study/clinicalData?id=hnsc_tcga

Thank you,
Joe Celidonio


Joseph Celidonio

Apr 28, 2025, 1:07:37 PM
to Tali Mazor, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Hello,

I just wanted to follow up on my previous email.  Is there a way to view clinical data after inputting the gene query in the link I provided?

Sincerely,
Joe Celidonio



Tali Mazor

Apr 28, 2025, 9:50:57 PM
to Joseph Celidonio, cBioPortal for Cancer Genomics Discussion Group, de Bruijn, Ino, Baby Anusha Satravada
Hi Joe,

If you hover your mouse over one of the group names, you'll see a pop-up like in the screenshot below. If you click "Open in study view", that will bring you to the study view page for just the samples in that group. You can then click over to the Clinical Data tab. And then repeat for each of your groups of interest.
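As an offline alternative, if the full clinical data table has been saved as tab-separated text, the rows for one group can be pulled out by sample ID. This is a sketch under assumptions: the column name "Sample ID", the helper name, and the example rows are all hypothetical and should be adjusted to match the actual export.

```python
import csv
import io


def clinical_rows_for_group(tsv_text, sample_ids, id_column="Sample ID"):
    """Return the clinical rows whose sample ID belongs to the given group."""
    wanted = set(sample_ids)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[id_column] in wanted]


# Hypothetical two-row export for illustration.
example_tsv = (
    "Patient ID\tSample ID\tSex\n"
    "P1\tS1\tMale\n"
    "P2\tS2\tFemale\n"
)
rows = clinical_rows_for_group(example_tsv, {"S2"})
```

Running the same function once per group (disruptive, non-disruptive, WT) gives one filtered table for each.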

-Tali


image.png

