Converting cBioPortal TARGET datasets to ANNOVAR format - VCF annotation showing no matches

Fatemeh Shirazi

Jun 10, 2025, 9:49:34 AM
to cBioPortal for Cancer Genomics Discussion Group

Dear cBioPortal Community,

I am a bioinformatician developing a somatic variant detection pipeline, and I need help integrating cBioPortal data into my ANNOVAR workflow. I have successfully downloaded several TARGET datasets from the cBioPortal datahub:

  • TARGET (Osteosarcoma): os_target_gdc.tar.gz
  • TARGET (B-Lymphoblastic Leukemia/Lymphoma - Phase II): bll_target_gdc.tar.gz
  • TARGET (Acute Myeloid Leukemia): aml_target_gdc.tar.gz
  • TARGET (Pediatric Acute Myeloid Leukemia): aml_target_2018_pub.tar.gz

I have extracted the mutation data from these datasets and attempted to convert them into a custom database format compatible with ANNOVAR. While the conversion process completes without errors, my VCF files show no matches when annotated against these custom databases. I have tested this with both individual samples and whole-exome VCF files. 

ANNOVAR command used:

annovar/table_annovar.pl /home/db/input/gfan5730.vcf annovar/humandb/ \
  -protocol cesc_tcga,civic_annovar,mocm,refGene,tcga_luad_mutations,variantsummeries \
  -operation f,f,f,g,f,f -build hg19 -nastring . -vcfinput

Questions
  1. Data Format: Are there specific formatting requirements for converting cBioPortal mutation data to ANNOVAR-compatible databases?
  2. Coordinate Systems: Could there be issues with genomic coordinate systems or reference genome versions between the cBioPortal data and my VCF files?
  3. Database Structure: What is the expected file structure and format for custom ANNOVAR databases created from cBioPortal data?
  4. Validation: Are there recommended methods to validate that the database conversion was successful before running annotations?

Additional Information
  • Using hg19/GRCh37 reference genome
  • VCF files contain standard somatic variants from tumor samples
  • Other ANNOVAR databases (refGene, etc.) are working correctly in the pipeline

Has anyone in the community successfully integrated cBioPortal mutation data, or other custom-converted databases such as CIViC, with ANNOVAR? Any guidance on troubleshooting this issue or recommendations for alternative approaches would be greatly appreciated.

Thank you for your time and expertise.

Prasanna Jagannathan

Jun 11, 2025, 11:18:54 PM
to Fatemeh Shirazi, cBioPortal for Cancer Genomics Discussion Group
Hi Fatemeh,

Thanks for contacting the cBioPortal Google Group.

I cannot discern any cBioPortal-derived input file in the ANNOVAR command line you shared.

Can you please explain in detail the specific files that were used from the downloaded TARGET tar.gz dataset files?

Which files were used and which commands were run on these cBioPortal files?
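
As a point of comparison while those details are gathered, a conversion from a cBioPortal mutation file to an ANNOVAR generic (filter-based) database might look roughly like the sketch below. It is not a confirmed recipe. It assumes the mutation file is a MAF-style, tab-delimited file (typically data_mutations.txt) with the columns Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2 and, optionally, NCBI_Build. It also assumes that insertions are written with "-" as the reference allele and flanking start/end positions, which are collapsed to start = end for ANNOVAR. The function names and the cbioportal_count column are hypothetical. The build check is included because a database on a different genome build than the VCF would produce no matches at all.

```python
import csv

# Columns this sketch expects in the MAF-style mutation file (an assumption).
REQUIRED = ("Chromosome", "Start_Position", "End_Position",
            "Reference_Allele", "Tumor_Seq_Allele2")


def _chrom_key(chrom):
    # Numeric chromosomes first (1..22), then X, Y, MT, etc.
    return (0, int(chrom)) if chrom.isdigit() else (1, chrom)


def maf_to_annovar_rows(maf_text, expected_build=None):
    """Return sorted (chr, start, end, ref, alt, count) tuples from MAF text."""
    # Drop blank lines and leading comment lines such as '#version 2.4'.
    lines = [ln for ln in maf_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing MAF columns: {missing}")

    counts = {}
    for rec in reader:
        # Fail early if the file is on a different build than the VCF.
        build = rec.get("NCBI_Build") or ""
        if expected_build and build and build != expected_build:
            raise ValueError(f"record is on {build}, expected {expected_build}")

        chrom = rec["Chromosome"]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # normalize to names without the 'chr' prefix
        start = int(rec["Start_Position"])
        end = int(rec["End_Position"])
        ref = rec["Reference_Allele"]
        alt = rec["Tumor_Seq_Allele2"]
        if ref == "-":
            end = start  # insertion: collapse flanking coordinates

        key = (chrom, start, end, ref, alt)
        counts[key] = counts.get(key, 0) + 1  # number of records per variant

    ordered = sorted(counts, key=lambda k: (_chrom_key(k[0]),) + k[1:])
    return [k + (counts[k],) for k in ordered]


def format_annovar_db(rows, column="cbioportal_count"):
    """Render rows as tab-delimited text with a '#Chr ...' header line."""
    out = ["\t".join(["#Chr", "Start", "End", "Ref", "Alt", column])]
    out += ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join(out) + "\n"
```

The rendered text would be saved as, for example, annovar/humandb/hg19_target_os.txt and referenced as target_os in -protocol with operation f (target_os is a placeholder name). A quick sanity check before annotating is to confirm that a few variants known to be in the VCF appear in the converted file with the same chromosome naming, coordinates, and allele notation.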

Please reply only to cbiop...@googlegroups.com

thanks
Jag
