mRNASeq Samples Collection


Sachin Kumar

Oct 20, 2025, 3:51:24 PM
to cBioPortal for Cancer Genomics Discussion Group
Hey everyone

I am a computer science graduate with a background in machine learning, deep learning, and data science. I have started a research project on machine learning in cancer. I am trying to build ensemble or deep learning models that predict the expression of the genes most correlated with a given set of input genes.

The data type I am using is mRNASeq, but the problem is getting a large enough dataset.
The dataset I have collected so far has seven thousand samples and more than twenty thousand genes, which is not nearly enough.

I would like to know whether it is possible to get more than one lakh (100,000) samples of this data type.
I have been searching databases such as GEO, cBioPortal, and Firehose, but have not reached even ten thousand.

Is it simply not possible?

Any help will be appreciated, because I don't want to waste my time on something that is not possible.
HELP PLEASE!

JJ Gao

Oct 20, 2025, 6:06:59 PM
to Sachin Kumar, cBioPortal for Cancer Genomics Discussion Group
Dear Sachin,

We don't have one lakh samples with RNASeq data in cBioPortal. You can find all the samples here: https://www.cbioportal.org/datasets.

GEO may be the right place to look for more data, but I think it would be challenging to find one lakh public RNASeq samples that were processed consistently.

-JJ


Sachin Kumar

Oct 21, 2025, 4:52:32 AM
to JJ Gao, cBioPortal for Cancer Genomics Discussion Group
How many samples can I get for a particular cancer type (for example, breast cancer)?

Thanks & Regards
Sachin Kumar
PRS-I @ NISER

JJ Gao

Oct 22, 2025, 4:14:18 PM
to Sachin Kumar, cBioPortal for Cancer Genomics Discussion Group

Hi Sachin,

One way to get the number is to select the "Curated set of non-redundant studies" and click "Explore Selected Studies". From there, you can explore data availability per cancer type. Please see the screenshots below.

[Screenshot]

[Screenshot]
