import from datahub fails


Oliver Langenhorn

Jul 26, 2022, 8:02:55 AM
to cBioPortal for Cancer Genomics Discussion Group
Trying to import some of the datahub datasets. Command:

find datahub/public -maxdepth 1 -type d -exec docker-compose run cbioportal metaImport.py -p /study/datahub/portalinfo -html /{}/importReport.html -s /{} \;

Getting a lot of:
ERROR: data_sv.txt: line 1: Missing column: Sample_ID; value encountered: 'Sample_Id, Site1_Hugo_Symbol, (...)'
ERROR: data_sv.txt: line 1: Fusion event requires "Site1_Exon" and "Site2_Exon" columns
ERROR: data_sv.txt: line 1: Fusion event requires "Site1_Ensembl_Transcript_Id" and "Site2_Ensembl_Transcript_Id" columns
ERROR: data_sv.txt: Invalid column header, file cannot be parsed

Of 346 datahub entries, 126 failed to import. How can I import those datasets into cBioPortal v4.1.13?

Elena Garcia Lara

Jul 26, 2022, 9:31:10 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi,

As the errors indicate, the data_sv.txt files are missing the columns Site1_Exon, Site2_Exon, Site1_Ensembl_Transcript_Id and Site2_Ensembl_Transcript_Id, and the column Sample_Id should be renamed to Sample_ID. You can simply add the four missing columns with empty values. With these changes, validation succeeds in cBioPortal v4.1.13.
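
In case it helps, here is a minimal Python sketch of that workaround. It is not part of cBioPortal; the function names patch_sv_table and patch_sv_file are made up for illustration, and it assumes data_sv.txt is tab-separated with the header on the first line (as the "line 1" errors suggest). Please try it on a copy of the file first.

```python
from pathlib import Path

# Columns the v4.1.13 validator reported as missing (names taken from the error messages).
REQUIRED_EMPTY_COLUMNS = [
    "Site1_Exon",
    "Site2_Exon",
    "Site1_Ensembl_Transcript_Id",
    "Site2_Ensembl_Transcript_Id",
]


def patch_sv_table(text):
    """Rename Sample_Id to Sample_ID and append any missing columns with empty values.

    Assumption: the first line is the tab-separated header.
    """
    lines = text.splitlines()
    if not lines:
        return text
    header = ["Sample_ID" if name == "Sample_Id" else name
              for name in lines[0].split("\t")]
    missing = [name for name in REQUIRED_EMPTY_COLUMNS if name not in header]
    patched = ["\t".join(header + missing)]
    for line in lines[1:]:
        # One extra tab per added column gives an empty value; blank lines stay blank.
        patched.append(line + "\t" * len(missing) if line else line)
    return "\n".join(patched) + "\n"


def patch_sv_file(path):
    """Hypothetical helper: patch a data_sv.txt file in place."""
    sv_path = Path(path)
    sv_path.write_text(patch_sv_table(sv_path.read_text()))
```

Calling patch_sv_file on the data_sv.txt of each failing study should be enough; running it twice on the same file does not add the columns again.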

The cause of this problem is probably cBioPortal's ongoing move to phase out the fusion format in favor of the structural variant format.

Best,
Elena.

Elena Garcia Lara

Jul 26, 2022, 9:38:58 AM
to cBioPortal for Cancer Genomics Discussion Group
I forgot to add: cBioPortal version 4.1.16, although still a pre-release, allows uploading the data_sv.txt files without those changes (i.e. without the Site1_Exon etc. columns, and with Sample_Id left intact).

Oliver Langenhorn

Jul 27, 2022, 2:51:20 PM
to cBioPortal for Cancer Genomics Discussion Group
Thanks Elena,

I have upgraded to 4.1.16 and indeed the data_sv.txt error disappears. However, another error is now preventing imports:

Loading gene panel profile matrix data to database..

ABORTED!
java.lang.RuntimeException: Gene panel cannot be found in database: WXS
at org.mskcc.cbio.portal.scripts.ImportGenePanelProfileMap.importData(ImportGenePanelProfileMap.java:149)
at org.mskcc.cbio.portal.scripts.ImportGenePanelProfileMap.run(ImportGenePanelProfileMap.java:95)
at org.mskcc.cbio.portal.scripts.ConsoleRunnable.runInConsole(ConsoleRunnable.java:145)
at org.mskcc.cbio.portal.scripts.ImportGenePanelProfileMap.main(ImportGenePanelProfileMap.java:203)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error occurred during data loading step. Please fix the problem and run this again to make sure study is completely loaded.
Traceback (most recent call last):
  File "/usr/local/bin/metaImport.py", line 202, in <module>
    cbioportalImporter.main(args)
  File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 533, in main
    process_directory(jvm_args, study_directory, args.update_generic_assay_entity)
  File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 392, in process_directory
    import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
  File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 162, in import_study_data
    run_java(*args)
  File "/cbioportal/core/src/main/scripts/importer/cbioportal_common.py", line 985, in run_java
    raise RuntimeError('Aborting due to error while executing step.')
RuntimeError: Aborting due to error while executing step.

Do you have a recommendation on how to import studies that use the fake gene panel WXS?

Elena Garcia Lara

Jul 29, 2022, 7:35:18 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi, 

Some studies use gene panels for targeted sequencing.
You need to import these panels before loading the study. Gene panels are available in the datahub and can be downloaded the same way as studies, using git lfs.

You can then import the gene panels using the command:

docker-compose run \
  -v $PWD/study/my_gene_panel.txt:/panels/gene_panel.txt \
  -w /cbioportal/core/src/main/scripts/ \
  cbioportal \
  perl importGenePanel.pl \
  --data /panels/gene_panel.txt
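
To find out which panels need importing before loading a study, a small Python sketch like the one below can list the panel IDs each study references. It is not a cBioPortal tool; the function names are hypothetical, and it assumes each study keeps a tab-separated data_gene_panel_matrix.txt whose first column is the sample ID and whose remaining columns hold gene panel IDs or NA.

```python
from pathlib import Path


def referenced_gene_panels(matrix_text):
    """Return the set of gene panel IDs used in a gene panel matrix, ignoring NA.

    Assumption: the first non-empty line is the header and the first column is the sample ID.
    """
    lines = [line for line in matrix_text.splitlines() if line.strip()]
    panels = set()
    for line in lines[1:]:  # skip the header row
        for value in line.split("\t")[1:]:  # skip the sample ID column
            value = value.strip()
            if value and value != "NA":
                panels.add(value)
    return panels


def panels_per_study(root, matrix_name="data_gene_panel_matrix.txt"):
    """Hypothetical helper: map each study directory under root to the panels it references."""
    result = {}
    for matrix_path in sorted(Path(root).glob("*/" + matrix_name)):
        result[matrix_path.parent.name] = referenced_gene_panels(matrix_path.read_text())
    return result
```

For example, panels_per_study("datahub/public") would return a dictionary keyed by study directory name, which can then be compared against the panels already imported.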

Best,
Sander, Elena.


Oliver Langenhorn

Aug 2, 2022, 6:40:06 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi,

It seems there is no WXS panel at the link you provided, and that is the panel missing in my case.

Regards,