cBioPortal Datahub Public Dataset Load Into Database Issues


Miu ki Yip

May 8, 2026, 12:15:33 PM
to cbiop...@googlegroups.com, Yichao Sun
Hi all,

I am trying to load some of the public data from the cBioPortal Datahub (https://github.com/cBioPortal/datahub/tree/master/public) into a local instance of cBioPortal, but I am encountering errors when importing the datasets.

For instance, when trying to load the “msk_impact_50k_2026” study into the database, I see the following errors in stdout:



ERROR: data_clinical_sample.txt: line 5: column 20: Attribute name not in upper case.; value encountered: 'purity_estimate_from_mutations'

And 

ERROR: data_sv.txt: lines [25, 150, 154, (354 more)]: column 1: Sample ID not defined in clinical file; values encountered: ['P-0012686-T01-IM5', 'P-0013979-T01-IM5', 'P-0010429-T01-IM5', '(317 more)']
WARNING: data_sv.txt: lines [7423, 7840, 12522, (1 more)]: Hugo Symbol should not start with a number.; values encountered: ['48787', '30302_C.890', '1311DEL']

I am concerned that I am looking in an out-of-date location for the public datasets, since many of the datasets in the linked GitHub repo fail to load into the database.

Please let me know how I can get these datasets loaded, or whether there is an updated data repository.

Thank you!


jagn...@gmail.com

May 11, 2026, 9:33:58 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi Miu,

Thanks for reaching out to the cBioPortal team. Which version of cBioPortal are you using for the import?

You can also download the study data from the main study page on cbioportal.org. For every study, there is a download link right next to the study title.


I will reach out to the data team, since new studies sometimes have errors.
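
To see the full extent of the two errors reported above before re-running the import, a small standalone check can help. The following is only a minimal sketch, not part of cBioPortal's own validator. It assumes the study folder contains the files named in the error output (data_clinical_sample.txt and data_sv.txt), that metadata lines in the clinical file start with '#', that the clinical header has a SAMPLE_ID column, and that the sample ID is the first column of data_sv.txt (as the "column 1" in the error suggests). The function names are hypothetical.

```python
import csv
import sys
from pathlib import Path


def read_table(path):
    """Return (header, rows) of a tab-separated file, skipping '#' metadata lines."""
    with open(path, newline="", encoding="utf-8") as handle:
        data_lines = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = [row for row in reader if row]  # drop blank lines
    return header, rows


def non_uppercase_attributes(clinical_path):
    """List clinical attribute names that are not fully upper case."""
    header, _ = read_table(clinical_path)
    return [name for name in header if name != name.upper()]


def undefined_sample_ids(clinical_path, sv_path, id_column="SAMPLE_ID"):
    """List sample IDs in the SV file that are absent from the clinical file.

    Assumption: the sample ID is the first column of the SV file.
    """
    header, rows = read_table(clinical_path)
    id_index = header.index(id_column)
    known = {row[id_index] for row in rows if len(row) > id_index}
    _, sv_rows = read_table(sv_path)
    return sorted({row[0] for row in sv_rows} - known)


if __name__ == "__main__":
    # Usage: python check_study.py /path/to/study_folder
    study_dir = Path(sys.argv[1])
    clinical = study_dir / "data_clinical_sample.txt"
    sv = study_dir / "data_sv.txt"
    print("Attributes not in upper case:", non_uppercase_attributes(clinical))
    missing = undefined_sample_ids(clinical, sv)
    print(f"{len(missing)} SV sample IDs not defined in the clinical file")
    for sample_id in missing:
        print(sample_id)
```

The output only lists the affected attribute names and sample IDs. Whether to rename the attribute, drop the SV rows, or wait for a corrected release of the study is a separate decision.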


Thanks,
Jag