Problems when importing data into cBioPortal


Yuerong Zhu

Apr 28, 2017, 9:52:23 PM
to cbiop...@googlegroups.com
Hi,

I’m still learning cBioPortal. I have installed cBioPortal (v1.5.1) on a local server from the source code at https://github.com/cBioPortal/cbioportal. However, after I downloaded some datasets from http://www.cbioportal.org/data_sets.jsp and tried to load them with /cbioportal/core/src/main/scripts/importer/validateData.py or /cbioportal/core/src/main/scripts/importer/metaImport.py, I got lots of errors.

I wonder if you can provide some tips. Or do you have updated source code (/cbioportal/core/src/main/scripts/importer/) that matches the exported data from http://www.cbioportal.org/data_sets.jsp?

Thanks a lot in advance!

Here are some example errors for the command: sudo /cbioportal/core/src/main/scripts/importer/validateData.py -s /var/lib/tomcat8/webapps/ROOT/brca_tcga_pub/

ERROR: meta_linear_CNA.txt: Invalid stable id for genetic_alteration_type 'COPY_NUMBER_ALTERATION', data_type 'CONTINUOUS'; expected one of [linear_CNA]; value encountered: 'brca_tcga_pub_linear_CNA'

ERROR: meta_expression_merged_median_Zscores.txt: Missing field 'show_profile_in_analysis_tab' in meta file

WARNING: meta_study.txt: Unrecognized field in meta file; value encountered: 'data_filename'
INFO: meta_study.txt: Validation of meta file complete

ERROR: meta_mutations_extended.txt: Invalid stable id for genetic_alteration_type 'MUTATION_EXTENDED', data_type 'MAF'; expected one of [mutations]; value encountered: 'brca_tcga_pub_mutations'

ERROR: meta_methylation_hm27.txt: Invalid stable id for genetic_alteration_type 'METHYLATION', data_type 'CONTINUOUS'; expected one of [methylation_hm27, methylation_hm450]; value encountered: 'brca_tcga_pub_methylation_hm27'

ERROR: meta_miRNA_median_Zscores.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'Z-SCORE'; expected one of [mrna_U133_Zscores, rna_seq_mrna_median_Zscores, mrna_median_Zscores, rna_seq_v2_mrna_median_Zscores, mirna_median_Zscores, mrna_merged_median_Zscores, mrna_zbynorm, rna_seq_mrna_capture_Zscores]; value encountered: 'brca_tcga_pub_mirna_median_Zscores'

ERROR: meta_mutsig.txt: Could not determine the file type. Did not find expected meta file fields. Please check your meta files for correct configuration.

ERROR: meta_expression_miRNA.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'CONTINUOUS'; expected one of [mrna_U133, rna_seq_mrna, rna_seq_v2_mrna, mirna, mrna, rna_seq_mrna_capture]; value encountered: 'brca_tcga_pub_mirna'

ERROR: meta_expression_median.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'Z-SCORE'; expected one of [mrna_U133_Zscores, rna_seq_mrna_median_Zscores, mrna_median_Zscores, rna_seq_v2_mrna_median_Zscores, mirna_median_Zscores, mrna_merged_median_Zscores, mrna_zbynorm, rna_seq_mrna_capture_Zscores]; value encountered: 'brca_tcga_pub_mrna'

ERROR: brca_tcga_pub_meta_cna_hg19_seg.txt: Could not determine the file type. Did not find expected meta file fields. Please check your meta files for correct configuration.

ERROR: meta_CNA.txt: Invalid stable id for genetic_alteration_type 'COPY_NUMBER_ALTERATION', data_type 'DISCRETE'; expected one of [cna, cna_rae, cna_consensus, gistic]; value encountered: 'brca_tcga_pub_gistic'

ERROR: meta_rppa.txt: Could not determine the file type. Please check your meta files for correct configuration.; value encountered: 'genetic_alteration_type: PROTEIN_LEVEL, datatype: CONTINUOUS'

ERROR: meta_mRNA_median_Zscores.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'Z-SCORE'; expected one of [mrna_U133_Zscores, rna_seq_mrna_median_Zscores, mrna_median_Zscores, rna_seq_v2_mrna_median_Zscores, mirna_median_Zscores, mrna_merged_median_Zscores, mrna_zbynorm, rna_seq_mrna_capture_Zscores]; value encountered: 'brca_tcga_pub_mrna_median_Zscores'

ERROR: -: No sample attribute file detected
Validation of study failed.



Ron
------------------------------------------------------------------
Ron Zhu
BioInfoRx, Inc.
E-mail: r...@bioinforx.com
Phone: (01) (608) 234-7752
Web: http://bioinforx.com/
Office: University Research Park, 510 Charmany Dr, Suite 275A, Madison WI 53719
--------------------------------------------------------------------------------
BioInfoRx provides Lab Essential Software Solutions (LESS) for scientific research laboratories, covering daily lab management, data analysis, and custom system needs.




Pieter Lukasse

May 1, 2017, 9:39:50 AM
to cBioPortal for Cancer Genomics Discussion Group, r...@bioinforx.com
Hi Ron,

These errors are expected since we are still in the process of getting more studies to pass validation. The validation script is relatively new and some legacy studies have not yet been adjusted to comply with it. So it is hard to tell which of these legacy studies have real errors and which just have minor data format mismatches.
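One mismatch that recurs in the log above is a stable id that carries the study identifier as a prefix (for example 'brca_tcga_pub_linear_CNA' where 'linear_CNA' is expected). A minimal sketch of stripping that prefix follows. It assumes the meta files are plain 'key: value' lines and that the field is named stable_id; neither is confirmed in this thread, so check both against your cBioPortal version before using it.

```python
def strip_study_prefix(meta_text, study_id):
    """Return meta file text with the study prefix removed from stable_id.

    Assumptions (not confirmed in this thread): the meta file consists of
    'key: value' lines and the stable id lives in a field named 'stable_id'.
    """
    prefix = study_id + "_"
    out = []
    for line in meta_text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Only touch the stable_id line, and only if it carries the prefix.
        if sep and key.strip() == "stable_id" and value.startswith(prefix):
            line = "stable_id: " + value[len(prefix):]
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative meta file content, modelled on the meta_linear_CNA.txt error.
example = (
    "cancer_study_identifier: brca_tcga_pub\n"
    "genetic_alteration_type: COPY_NUMBER_ALTERATION\n"
    "datatype: CONTINUOUS\n"
    "stable_id: brca_tcga_pub_linear_CNA\n"
)
fixed = strip_study_prefix(example, "brca_tcga_pub")
print(fixed)
```

This only addresses the prefix pattern. It would not help with meta_expression_median.txt, for instance, where even the stripped id 'mrna' is not in the list the validator expects for data_type 'Z-SCORE'.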

Anyway, there is a list of studies that should load without any errors. @Sander, can you point Ron to these studies? 

Best regards,

Pieter 

sand...@thehyve.nl

May 4, 2017, 12:10:10 PM
to cBioPortal for Cancer Genomics Discussion Group, r...@bioinforx.com
Hi Ron,

These TCGA studies from https://github.com/cBioPortal/datahub/tree/master/public should be loadable in the current version (1.5.1) of cBioPortal without errors:

acc_tcga.tar.gz
blca_tcga.tar.gz
cesc_tcga.tar.gz
chol_tcga.tar.gz
dlbc_tcga.tar.gz
esca_tcga.tar.gz
gbm_tcga.tar.gz
hnsc_tcga.tar.gz
kich_tcga.tar.gz
kirc_tcga.tar.gz
kirp_tcga.tar.gz
laml_tcga.tar.gz
lgg_tcga.tar.gz
lihc_tcga.tar.gz
luad_tcga.tar.gz
lusc_tcga.tar.gz
meso_tcga.tar.gz
paad_tcga.tar.gz
pcpg_tcga.tar.gz
prad_tcga.tar.gz
sarc_tcga.tar.gz
stad_tcga.tar.gz
tgct_tcga.tar.gz
thca_tcga.tar.gz
thym_tcga.tar.gz
ucec_tcga.tar.gz
ucs_tcga.tar.gz
uvm_tcga.tar.gz

We have updates ready for the code and data (https://github.com/cBioPortal/datahub/pull/31#issuecomment-293232516) to validate and load the remaining 4 TCGA studies, so you can expect these to be loadable soon:

brca_tcga.tar.gz
coadread_tcga.tar.gz
ov_tcga.tar.gz
skcm_tcga.tar.gz

The tcga_pub studies will be updated at a later time. While loading the studies you will see a number of warnings, mostly caused by new or deprecated Entrez Gene IDs (easily viewable in the report produced by the -html argument of metaImport.py). You can override these warnings with -o.
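As a rough illustration of combining those two options, here is a small sketch that builds the command line. It assumes metaImport.py takes the study directory with -s (shown earlier in this thread only for validateData.py) and that -html takes the path of the report to write. The study directory and report name are hypothetical, and your installation may need further arguments (for example to locate the portal), so check the script's help output first.

```python
# Script location as quoted earlier in this thread; adjust to your install.
META_IMPORT = "/cbioportal/core/src/main/scripts/importer/metaImport.py"


def build_import_command(study_dir, html_report=None, override_warnings=False):
    """Build an argument list for metaImport.py (flag semantics are assumed)."""
    cmd = [META_IMPORT, "-s", study_dir]
    if html_report:
        # Assumed: -html is followed by the output path of the HTML report.
        cmd += ["-html", html_report]
    if override_warnings:
        # -o lets the import continue when validation yields only warnings.
        cmd.append("-o")
    return cmd


# Hypothetical study directory and report file name.
cmd = build_import_command(
    "/path/to/acc_tcga",
    html_report="acc_tcga_report.html",
    override_warnings=True,
)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

It is probably worth running once without -o and reading the HTML report before overriding anything, so that real problems are not hidden among the gene ID warnings.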

Let me know if it works,

Sander
Data Scientist



On Monday, May 1, 2017 at 15:39:50 UTC+2, Pieter Lukasse wrote:

bart.le...@gmail.com

Nov 30, 2017, 10:34:50 AM
to cBioPortal for Cancer Genomics Discussion Group
Hello,

I am getting several errors using the metaImport script for several datasets.
When I tried the suggested list of datasets, most of them worked except for the following:

coadread_tcga.tar.gz
hnsc_tcga.tar.gz
luad_tcga.tar.gz
lusc_tcga.tar.gz
meso_tcga.tar.gz
pcpg_tcga.tar.gz
prad_tcga.tar.gz
skcm_tcga.tar.gz
thca_tcga.tar.gz
ucs_tcga.tar.gz

Most errors seem to be related to "Value in column 'Validation_Status' is invalid; value encountered: '---'" (logs attached).
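One possible workaround is to rewrite the offending values in the mutation data before validating. The sketch below assumes the data file is tab-separated with an optional block of '#' comment lines before the header, and that 'Untested' is a value the validator accepts; both are assumptions, so check the file format documentation for your version before relying on it.

```python
def fix_validation_status(maf_text, bad_values=("---",), replacement="Untested"):
    """Replace unwanted Validation_Status values in tab-separated MAF text.

    Assumptions: '#' lines are comments, the first other line is the header,
    and `replacement` is a value the validator accepts.
    """
    out = []
    col = None  # index of the Validation_Status column, once the header is seen
    for line in maf_text.splitlines():
        if line.startswith("#"):
            out.append(line)
            continue
        fields = line.split("\t")
        if col is None:
            # Header row: locate the column (-1 if the file does not have it).
            if "Validation_Status" in fields:
                col = fields.index("Validation_Status")
            else:
                col = -1
            out.append(line)
            continue
        if 0 <= col < len(fields) and fields[col] in bad_values:
            fields[col] = replacement
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


# Tiny illustrative input; real files have many more columns.
maf = "#version 2.4\nHugo_Symbol\tValidation_Status\nTP53\t---\nBRCA1\tValid\n"
fixed_maf = fix_validation_status(maf)
print(fixed_maf)
```

Working on a copy of the data file keeps the original download intact in case the replacement value turns out to be wrong for your version.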

Second question: how can I easily import datasets other than TCGA? Some datasets I can import after fixing the meta files, but I assume that for others I have to modify more, possibly the data itself?

Many thanks,

Bart


On Thursday, May 4, 2017 at 18:10:10 UTC+2, Sander Tan wrote:
logs.zip

Ritika Kundra

Nov 30, 2017, 3:55:22 PM
to bart.le...@gmail.com, cBioPortal for Cancer Genomics Discussion Group
Hi Bart,

We will recheck the list above and update you once the studies are error-free.
Can you send us the studies other than TCGA that fail?

Thanks,
Ritika

--
You received this message because you are subscribed to the Google Groups "cBioPortal for Cancer Genomics Discussion Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cbioportal+unsubscribe@googlegroups.com.
To post to this group, send email to cbiop...@googlegroups.com.
Visit this group at https://groups.google.com/group/cbioportal.
To view this discussion on the web visit https://groups.google.com/d/msgid/cbioportal/a2029849-b783-4dcb-ac7a-71b26639f032%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

bart.le...@gmail.com

Dec 1, 2017, 5:24:20 PM
to cBioPortal for Cancer Genomics Discussion Group
Thanks! Yes, so far I have tried to import some of the major studies (those with more than ~200 samples):

1. MSK_impact gave an error at first, but after fixing the meta file it works.
2. The following studies work: hcc_inserm_fr_2015, odg_msk_2017, stes_tcga_pub, brca_igr.
3. Most other datasets are missing a sample attribute file, which I'm not sure how to fix easily. Some datasets have other errors as well (logs attached).
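To see which problems are shared across many studies, the validator output can be tallied by message. This is a minimal sketch, assuming the log lines follow the 'LEVEL: file: message' shape shown in the first post of this thread; the sample text is assembled from lines quoted there and is not taken from the attached logs.

```python
import re
from collections import Counter

# Matches e.g. "ERROR: meta_study.txt: some message" (shape assumed from
# the validator output quoted earlier in this thread).
LINE_RE = re.compile(r"^(ERROR|WARNING): ([^:]+): (.*)$")


def tally_messages(log_text):
    """Count ERROR/WARNING messages, ignoring file names and specific values."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO lines, blank lines, and summary lines are skipped
        level, _file, message = match.groups()
        # Drop the variable tail so identical problems group together.
        message = message.split("; value encountered:")[0]
        message = message.split("; expected one of")[0]
        counts[(level, message)] += 1
    return counts


sample_log = """ERROR: -: No sample attribute file detected
WARNING: meta_study.txt: Unrecognized field in meta file; value encountered: 'data_filename'
INFO: meta_study.txt: Validation of meta file complete
ERROR: -: No sample attribute file detected
Validation of study failed.
"""
counts = tally_messages(sample_log)
for (level, message), n in counts.most_common():
    print(n, level, message)
```

Concatenating the logs of all failing studies and feeding them through this gives a quick ranking of which fixes would unblock the most datasets.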

Kind regards,

Bart

On Thursday, November 30, 2017 at 21:55:22 UTC+1, Ritika Kundra wrote:
2.zip