Large Study Organization


Rebecca Sansale

Oct 7, 2025, 5:32:53 PM
to cBioPortal for Cancer Genomics Discussion Group
Hello!

I am using cBioPortal (cBP) to visualize data for a cancer cohort of 370k+ patients: 40k patients have both clinical and molecular data, and 330k have clinical data only. I am considering three different study definitions:

1. One study: all cancer patients (370k)
2. Two studies: sequenced (gene panel) patients (40k) and non-sequenced patients (330k)
3. Three studies: Gene Panel A (30k), Gene Panel B (10k), and non-sequenced patients (330k)

I have a few questions before I decide on my study architecture, and I wanted to hear others' thoughts on the following:

1. Has anyone successfully loaded 370k patients into the same study? Are there problems with portal performance?
2. What additional functionality (intra-study comparisons, specific variables, etc.) is gained by separating individuals by study?

Thanks for your help!
Rebecca 


de Bruijn, Ino

Oct 9, 2025, 2:47:15 PM
to Rebecca Sansale, cBioPortal for Cancer Genomics Discussion Group

Hi Rebecca,

Thanks for reaching out!

> 1. Has anyone successfully loaded 370k patients into the same study? Are there problems with portal performance?

For large-scale cohorts (>100k patients), we have been experimenting with a new database technology, ClickHouse. It significantly improves Study View performance and should work fine for the cohort you're describing. You can read more about the progress here:

https://github.com/cBioPortal/roadmap/issues/1

You can already use it by following the ClickHouse steps here:

https://github.com/cbioPortal/cbioportal-docker-compose

> 2. What additional functionality (intra-study comparisons, specific variables, etc.) is gained by separating individuals by study?

There's no additional functionality, really; the main reason to split up a large cohort is performance. On the public portal (https://www.cbioportal.org), the data is organized by published article. For an internal portal, you might not need that grouping.

Best wishes,

Ino

Aaron L

Oct 9, 2025, 8:51:37 PM
to de Bruijn, Ino, Rebecca Sansale, cBioPortal for Cancer Genomics Discussion Group
Rebecca,
Adding to what Ino said: for large datasets you will see great benefits from ClickHouse, but you may need to make sure ClickHouse has enough resources allocated (especially vCPUs).


Rebecca Sansale

Oct 14, 2025, 4:51:25 PM
to cBioPortal for Cancer Genomics Discussion Group
Great, thanks Aaron! I will definitely try out the ClickHouse option.

Rebecca Sansale

Oct 14, 2025, 4:51:25 PM
to cBioPortal for Cancer Genomics Discussion Group
Thank you so much for this info! I will definitely try using the ClickHouse option; that seems promising!