
Errors when importing public studies on a local instance


Vincent Duchaos

Apr 22, 2025, 1:56:05 PM
to cBioPortal for Cancer Genomics Discussion Group
Hello,

I'm trying to deploy a local instance of cBioPortal using docker compose, with some public studies and our own.
I managed to install the portal, and I can access the built-in study (Low-Grade Gliomas).
Now I’d like to import some other studies, like gbm_tcga_gdc and gbm_tcga.

I get errors at the validation step when importing gbm_tcga_gdc.
I downloaded it from the study browser on cBioPortal.org.
I ran the command
docker compose exec cbioportal metaImport.py -u https://my_url -s study/gbm_tcga_gdc/
but I get the following logs:

Starting validation...

INFO: meta_clinical_patient.txt: Validation of meta file complete
INFO: meta_clinical_sample.txt: Validation of meta file complete
INFO: meta_cna.txt: Validation of meta file complete
INFO: meta_cna_hg38_seg.txt: Validation of meta file complete

ERROR: meta_mrna_seq_fpkm.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'CONTINUOUS'; expected one of [mrna_U133, rna_seq_mrna, rna_seq_v2_mrna, rna_seq_v2_mrna_median_normals, mirna, mrna, rna_seq_mrna_capture, mrna_seq_cpm, mrna_seq_tpm, mrna_seq_fpkm_capture, mrna_seq_fpkm_polya]; value encountered: 'mrna_seq_fpkm'

INFO: meta_mrna_seq_fpkm_zscores_ref_all_samples.txt: Validation of meta file complete
INFO: meta_mrna_seq_read_counts.txt: Validation of meta file complete

ERROR: meta_mrna_seq_read_counts_zscores_ref_all_samples.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'Z-SCORE'; expected one of [mrna_U133_Zscores, rna_seq_mrna_median_Zscores, mrna_median_Zscores, rna_seq_v2_mrna_median_Zscores, rna_seq_v2_mrna_median_normals_Zscores, mirna_median_Zscores, mrna_merged_median_Zscores, mrna_zbynorm, mrna_seq_tpm_Zscores, mrna_seq_cpm_Zscores, rna_seq_mrna_capture_Zscores, mrna_seq_fpkm_capture_Zscores, mrna_seq_fpkm_polya_Zscores, mrna_U133_all_sample_Zscores, mrna_all_sample_Zscores, rna_seq_mrna_median_all_sample_Zscores, mrna_median_all_sample_Zscores, rna_seq_v2_mrna_median_all_sample_Zscores, rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores, mrna_seq_cpm_all_sample_Zscores, mrna_seq_tpm_all_sample_Zscores, rna_seq_mrna_capture_all_sample_Zscores, mrna_seq_fpkm_capture_all_sample_Zscores, mrna_seq_fpkm_polya_all_sample_Zscores, mrna_seq_fpkm_Zscores, mrna_seq_fpkm_all_sample_Zscores]; value encountered: 'mrna_seq_read_counts_Zscores'

INFO: meta_mrna_seq_tpm.txt: Validation of meta file complete
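
The two ERROR lines above come down to a membership test: the `stable_id` declared in the meta file is not in the list the validator accepts for that alteration type and data type. A minimal sketch of that check, assuming meta files are plain `key: value` lines and copying the accepted IDs for MRNA_EXPRESSION / CONTINUOUS from the log above. The helper names and the example meta text are hypothetical, and this is not the validator's own code:

```python
# Accepted stable IDs for MRNA_EXPRESSION / CONTINUOUS, copied from the error above.
ALLOWED_CONTINUOUS = {
    "mrna_U133", "rna_seq_mrna", "rna_seq_v2_mrna",
    "rna_seq_v2_mrna_median_normals", "mirna", "mrna",
    "rna_seq_mrna_capture", "mrna_seq_cpm", "mrna_seq_tpm",
    "mrna_seq_fpkm_capture", "mrna_seq_fpkm_polya",
}


def parse_meta(text):
    """Parse 'key: value' lines into a dict (assumed meta file layout)."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def stable_id_is_allowed(meta_text, allowed=ALLOWED_CONTINUOUS):
    """Return True if the meta file's stable_id is in the accepted set."""
    return parse_meta(meta_text).get("stable_id") in allowed


# Illustrative meta content, not copied from the actual study files.
example_meta = """genetic_alteration_type: MRNA_EXPRESSION
datatype: CONTINUOUS
stable_id: mrna_seq_fpkm
"""
print(stable_id_is_allowed(example_meta))
```

With the `mrna_seq_fpkm` value from the error message this prints `False`; any ID from the accepted list would give `True`.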



For gbm_tcga, I get warnings (a lot of data entries skipped). If I override the warnings, the study never shows up in the portal.

How can I solve this? Do you need additional info?

Thanks,
Vincent

Priti Kumari

Apr 24, 2025, 1:58:56 PM
to Vincent Duchaos, cBioPortal for Cancer Genomics Discussion Group
Hi Vincent,

Let us know if you still run into errors.


Thank you,
Priti


Vincent Duchaos

Apr 25, 2025, 5:57:55 PM
to cBioPortal for Cancer Genomics Discussion Group
Hi,

I managed to import the gbm_tcga_gdc study by replacing the stable id with another name.
I got a lot of warnings like "Entrez_Id 693222 and gene symbolnull not found".

So I downloaded a more recent study, hoping for a better match of gene IDs.
But I'm facing other problems when importing gbm_cptac_2021: there are illegal characters in the dataset.

ERROR: data_acetylprotein_quantification.txt: lines [2, 3, 4, (12452 more)]: Feature id contains one or more illegal characters; values encountered: ['id was`A1BG_K134:NP_570602.2` and only alpha-numeric, _, . and - are allowed.', 'id was`A2M_K115:NP_000005.2` and only alpha-numeric, _, . and - are allowed.', 'id was`A2M_K1162:NP_000005.2` and only alpha-numeric, _, . and - are allowed.', '(12452 more)']

ERROR: data_lipidome_negative_quantification.txt: lines [2, 3, 4, (245 more)]: Feature id contains one or more illegal characters; values encountered: ['id was`Cer(d16:1/18:0);Cer(d18:1/16:0)` and only alpha-numeric, _, . and - are allowed.', 'id was`Cer(d18:0/16:0)` and only alpha-numeric, _, . and - are allowed.', 'id was`Cer(d18:0/18:0)` and only alpha-numeric, _, . and - are allowed.', '(245 more)']

ERROR: data_lipidome_positive_quantification.txt: lines [2, 3, 4, (331 more)]: Feature id contains one or more illegal characters; values encountered: ['id was`anandamide(18:1)` and only alpha-numeric, _, . and - are allowed.', 'id was`carnitine(12:0)` and only alpha-numeric, _, . and - are allowed.', 'id was`carnitine(14:0)` and only alpha-numeric, _, . and - are allowed.', '(331 more)']

ERROR: data_metabolome_quantification.txt: line 13: Feature id contains one or more illegal characters; value encountered: 'id was`D_(+)_galactose` and only alpha-numeric, _, . and - are allowed.'

ERROR: data_phosphoprotein_quantification.txt: lines [2, 3, 4, (70327 more)]: Feature id contains one or more illegal characters; values encountered: ['id was`AAAS_S495:NP_056480.1` and only alpha-numeric, _, . and - are allowed.', 'id was`AAAS_S495.1:NP_056480.1` and only alpha-numeric, _, . and - are allowed.', 'id was`AAAS_S525:NP_056480.1` and only alpha-numeric, _, . and - are allowed.', '(70327 more)']
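
The rule quoted in these errors (only alpha-numeric characters, `_`, `.` and `-`) can be checked before running the importer. A rough sketch, assuming the feature IDs have already been read out of the data file into a list. The function name is hypothetical, and the pattern is inferred from the error text (treating "alpha-numeric" as ASCII letters and digits), not taken from the validator source:

```python
import re

# Pattern inferred from the message: alpha-numeric, '_', '.' and '-' only.
LEGAL_ID = re.compile(r"[A-Za-z0-9_.-]+")


def find_illegal_ids(feature_ids):
    """Return the IDs containing at least one character outside the allowed set."""
    return [fid for fid in feature_ids if not LEGAL_ID.fullmatch(fid)]


ids = ["A1BG_K134:NP_570602.2", "Cer(d18:0/16:0)", "D_(+)_galactose", "AAAS_S495.1"]
print(find_illegal_ids(ids))
```

Of the four sample IDs, only `AAAS_S495.1` passes; the other three are reported because of `:`, `(`, `)`, `/` or `+`.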

Why am I getting so many errors? Am I doing something wrong?
How can I solve these last ones?

Thanks

Vincent Duchaos

Apr 28, 2025, 3:43:04 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi Priti,

Thanks for your answer.
Unfortunately, the files are the same and I get the same errors.

Same for gbm_cptac_2021: it didn't resolve the issues.
How can there be illegal characters in the files? Weren't these characters legal in 2021, when the study was first imported into the portal?
What would be the workaround?

I'll face the same problem when I try to import my own data: there will be ':' and '/' in some quantification files.
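
One possible stopgap for IDs containing ':' or '/' is to rewrite them before import. The sketch below is only an illustration, not something the cBioPortal team has recommended: it replaces every character outside the allowed set with `_` (an arbitrary choice) and refuses to continue if two different IDs collapse to the same string, since that would silently merge features. The function name is hypothetical.

```python
import re

# Anything that is not alpha-numeric, '_', '.' or '-' (rule taken from the error text).
ILLEGAL_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_ids(feature_ids, replacement="_"):
    """Map each original ID to a sanitized ID; raise if two IDs collide."""
    mapping = {}
    seen = {}  # sanitized ID -> original ID that produced it
    for fid in feature_ids:
        clean = ILLEGAL_CHARS.sub(replacement, fid)
        if clean in seen and seen[clean] != fid:
            raise ValueError(f"{fid!r} and {seen[clean]!r} both map to {clean!r}")
        seen[clean] = fid
        mapping[fid] = clean
    return mapping


print(sanitize_ids(["A2M_K115:NP_000005.2", "Cer(d18:0/16:0)"]))
```

Keeping the returned mapping alongside the study makes it possible to translate the sanitized IDs back to the original ones later.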

Thanks
Vincent

Baby Anusha Satravada

May 1, 2025, 1:05:03 PM
to cBioPortal for Cancer Genomics Discussion Group

Hi Vincent,

Thanks for the follow-up and for clarifying. We understand the issue and are actively working on a fix in the validator to address these character checks. This should resolve the errors you’re seeing, including the ones for gbm_cptac_2021.

We’ll keep you posted as soon as the updated version is ready; it should be available soon.

Thanks again for your patience!

Best,

Baby Anusha.
