Handling multiple VCF files


Vaishali Chakraborty

May 26, 2022, 7:29:44 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi Team,

I was able to successfully install a local instance of cBioPortal and import a study with 159 samples. Each of these samples has its own VCF file.

I understand that we can use vcf2maf to convert VCF files to MAF files, which then become the data_mutations.txt file used to import mutation data into cBioPortal. However, since I have 159 VCF files, what do you recommend for importing the data?

Should I combine all the VCF files and then convert the result to a MAF file, or convert the individual VCFs to MAFs and then merge them? Please let me know your recommendation, and whether there is a way to process multiple VCFs in one go.

Thank you for your help.

Best regards,
Vaishali

David Higgins

May 26, 2022, 9:54:31 AM
to cBioPortal for Cancer Genomics Discussion Group
Hi Vaishali,

Either way could work, but merging individual mafs might be easier.

In our instance of cBioPortal (pedcbioportal.kidsfirstdrc.org), we merge individual mafs. We already generate per-sample mafs as part of our standard pipeline, so this approach works best for our group.
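For readers following along, here is a minimal sketch of the "merge individual mafs" approach. It is not the D3b pipeline; the function name `merge_mafs` is hypothetical, and it assumes each per-sample MAF is tab-delimited with optional `#` comment lines followed by a single header row. Columns missing from a given file are left empty in the merged output.

```python
from typing import Iterable, List


def merge_mafs(maf_texts: Iterable[str]) -> str:
    """Merge the contents of several MAF files into one tab-delimited table.

    Keeps a single header row (the union of all columns, in first-seen order)
    and drops '#' comment lines such as '#version 2.4'.
    """
    columns: List[str] = []
    records: List[dict] = []
    for text in maf_texts:
        # Ignore blank lines and comment lines before the header.
        lines = [ln for ln in text.splitlines()
                 if ln.strip() and not ln.startswith("#")]
        if not lines:
            continue
        header = lines[0].split("\t")
        for name in header:
            if name not in columns:
                columns.append(name)
        for ln in lines[1:]:
            records.append(dict(zip(header, ln.split("\t"))))
    out = ["\t".join(columns)]
    for rec in records:
        # Fill columns that a given file did not provide with empty strings.
        out.append("\t".join(rec.get(col, "") for col in columns))
    return "\n".join(out) + "\n"


# Example use (the directory and file names are placeholders):
#   from pathlib import Path
#   texts = [p.read_text() for p in sorted(Path("mafs").glob("*.maf"))]
#   Path("data_mutations.txt").write_text(merge_mafs(texts))
```

Reading every file into memory is fine for a few hundred samples; for much larger cohorts a streaming version would be a better fit.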

Best,

David Higgins, Ph.D.
Informatics Program Manager
Center for Data-Driven Discovery in Biomedicine (D3b)
Children's Hospital of Philadelphia, USA

Vaishali Chakraborty

May 27, 2022, 2:05:12 AM
to David Higgins, cBioPortal for Cancer Genomics Discussion Group
Hi David,

Thank you for your suggestion.

Is there an easy way to convert all the VCFs (say, in one folder) to MAF files with the tumor sample ID column populated?
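One possible way to do this, sketched under several assumptions: each VCF file is named after its sample (for example `SAMPLE_01.vcf`), `vcf2maf.pl` is invoked through `perl`, and the paths passed in are placeholders. The function name `build_vcf2maf_commands` is hypothetical. The `--input-vcf`, `--output-maf`, `--tumor-id`, `--ref-fasta`, `--vep-path`, `--vep-data` and `--ncbi-build` options are vcf2maf options; `--tumor-id` is what populates `Tumor_Sample_Barcode` in the MAF.

```python
from pathlib import Path
from typing import Iterable, List


def build_vcf2maf_commands(vcf_paths: Iterable[str], maf_dir: str,
                           ref_fasta: str, vep_path: str,
                           vep_data: str) -> List[List[str]]:
    """Build one vcf2maf.pl command per VCF, using the file name as sample ID."""
    commands = []
    for vcf in sorted(Path(p) for p in vcf_paths):
        sample_id = vcf.stem  # assumption: "SAMPLE_01.vcf" -> "SAMPLE_01"
        commands.append([
            "perl", "vcf2maf.pl",
            "--input-vcf", str(vcf),
            "--output-maf", str(Path(maf_dir) / f"{sample_id}.maf"),
            "--tumor-id", sample_id,      # written to Tumor_Sample_Barcode
            "--ref-fasta", ref_fasta,
            "--vep-path", vep_path,
            "--vep-data", vep_data,
            "--ncbi-build", "GRCh37",
        ])
    return commands


# Example use (all paths are placeholders):
#   import subprocess
#   vcfs = Path("vcfs").glob("*.vcf")
#   for cmd in build_vcf2maf_commands(vcfs, "mafs", "ref.fa", "/opt/vep", "/vepdata"):
#       subprocess.run(cmd, check=True)
```

If the sample column inside a VCF uses a different name than the desired sample ID, vcf2maf's `--vcf-tumor-id` option can be added to point at that column.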

Best regards,
Vaishali


Vaishali Chakraborty

May 27, 2022, 5:34:02 AM
to David Higgins, cBioPortal for Cancer Genomics Discussion Group
Hi David and Team,

I was able to convert the VCFs to MAFs and combine all the MAFs. However, I have one concern: our VCF files were created against the RefSeq hg19 reference, but vcf2maf uses VEP, which in turn uses the Ensembl GRCh37 build. This may lead to differences in the MAF results.

Is it possible to provide our own FASTA file? Apart from the FASTA, there are multiple patch files (see the attached image) in the vepdata/homo_sapiens/102_GRCh37 folder; will changes be required there as well? Is there an equivalent RefSeq build or annotation set that can be downloaded and linked to VEP?

Best regards,
Vaishali

2022-05-27_15-01.png

Vaishali Chakraborty

May 30, 2022, 9:44:27 PM
to David Higgins, cBioPortal for Cancer Genomics Discussion Group
Hi Team,

Kindly let me know if there is any update on my previous queries.

Best regards,
Vaishali 

David Higgins

Jun 1, 2022, 11:54:35 AM
to Vaishali Chakraborty, cBioPortal for Cancer Genomics Discussion Group

Hi Vaishali,

Since you seem to be using VEP release 102 specifically, we’d recommend running it with this RefSeq cache: http://ftp.ensembl.org/pub/release-102/variation/vep/homo_sapiens_refseq_vep_102_GRCh37.tar.gz

 

If vcf2maf turns out to have some kind of hard-coded version in there, you can run VEP separately, then run the vcf2maf step and skip its built-in annotation.
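A rough sketch of that two-step approach, assuming the RefSeq cache above has been unpacked into a local cache directory. The function name and all paths are hypothetical. The VEP options used (`--refseq`, `--cache`, `--offline`, `--dir_cache`, `--cache_version`, `--assembly`, `--fasta`, `--vcf`) and vcf2maf's `--inhibit-vep` option exist in those tools, but the exact set of VEP output fields that vcf2maf expects should be checked against the vcf2maf version in use.

```python
from typing import List, Tuple


def build_two_step_commands(input_vcf: str, annotated_vcf: str, output_maf: str,
                            tumor_id: str, ref_fasta: str,
                            cache_dir: str) -> Tuple[List[str], List[str]]:
    """Return (vep_cmd, vcf2maf_cmd): annotate with VEP first, then convert."""
    # Step 1: run VEP directly against the RefSeq cache, writing VCF output.
    vep_cmd = [
        "vep",
        "--input_file", input_vcf,
        "--output_file", annotated_vcf,
        "--vcf",
        "--cache", "--offline", "--refseq",
        "--dir_cache", cache_dir,
        "--cache_version", "102",
        "--assembly", "GRCh37",
        "--fasta", ref_fasta,
    ]
    # Step 2: convert the already annotated VCF; --inhibit-vep tells vcf2maf
    # not to run VEP itself and to use the annotation found in the VCF.
    vcf2maf_cmd = [
        "perl", "vcf2maf.pl",
        "--inhibit-vep",
        "--input-vcf", annotated_vcf,
        "--output-maf", output_maf,
        "--tumor-id", tumor_id,
        "--ref-fasta", ref_fasta,
        "--ncbi-build", "GRCh37",
    ]
    return vep_cmd, vcf2maf_cmd


# Example use (placeholders throughout):
#   import subprocess
#   vep_cmd, maf_cmd = build_two_step_commands(
#       "S1.vcf", "S1.vep.vcf", "S1.maf", "S1", "ref.fa", "/vepdata")
#   subprocess.run(vep_cmd, check=True)
#   subprocess.run(maf_cmd, check=True)
```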

 

Best,

David M. Higgins, Ph.D. | (he/him)
Informatics Program Manager
Center for Data-Driven Discovery in Biomedicine (D3b)
Children’s Hospital of Philadelphia, USA

Vaishali Chakraborty

Jun 3, 2022, 12:17:46 AM
to David Higgins, cBioPortal for Cancer Genomics Discussion Group
Hi David,

Thank you for your response.

Yes, I checked, and Ensembl VEP can be run with custom annotation files (GFF3 and FASTA). I still need to explore that option, but first I will try the release 102 cache that you recommended and see if that helps.
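As a sketch of the custom-annotation route mentioned here: VEP accepts a GFF file through `--gff` together with a matching `--fasta`, in place of a cache. The VEP documentation asks for the GFF to be sorted, bgzip-compressed and tabix-indexed first. The function name and file names below are hypothetical, and whether a particular NCBI GFF3 download works without adjustments is an assumption that would need testing.

```python
from typing import List


def build_custom_annotation_vep_command(input_vcf: str, output_vcf: str,
                                        gff_gz: str, fasta: str) -> List[str]:
    """Build a VEP command that annotates against a custom GFF3 plus FASTA."""
    # VEP expects the GFF to be sorted, bgzipped and tabix-indexed beforehand;
    # only the file extension is checked here.
    if not gff_gz.endswith(".gz"):
        raise ValueError("expected a bgzip-compressed, tabix-indexed GFF (.gz)")
    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",
        "--gff", gff_gz,   # custom gene models instead of a VEP cache
        "--fasta", fasta,  # reference sequence matching the GFF coordinates
    ]


# Example use (file names are placeholders):
#   import subprocess
#   cmd = build_custom_annotation_vep_command(
#       "S1.vcf", "S1.vep.vcf", "ncbi.sorted.gff.gz", "hg19.fa")
#   subprocess.run(cmd, check=True)
```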

Best regards,
Vaishali

debr...@mskcc.org

Jun 9, 2022, 7:47:41 PM
to vaishali.ch...@gmail.com, da...@d3b.center, cbiop...@googlegroups.com, och...@mskcc.org

Hi Vaishali,

We have developed a pipeline to combine VCF files and annotate them with Genome Nexus. This is what we use for the public cBioPortal (cbioportal.org) and the AACR GENIE cBioPortal instances (genie.cbioportal.org).

 

CC’ing Angelica, who developed this. The repo can be found here:

https://github.com/genome-nexus/annotation-tools

 

Hope that helps!

Best wishes,

Ino


Vaishali Chakraborty

Jun 14, 2022, 12:44:42 AM
to debr...@mskcc.org, David Higgins, cBioPortal for Cancer Genomics Discussion Group, och...@mskcc.org
Hi Ino,

Thank you for your response.

These scripts will be really helpful for us. However, my remaining concern is whether we can provide custom annotation files to annotate the VCFs with these scripts.

For example, I have a FASTA file and a GFF3 annotation file downloaded from NCBI. Will I be able to use these files with Genome Nexus?

Sorry for troubling you with this question, I am suffering from Covid 19 symptoms and not able to read through too much. Looking forward to hearing from you.

Best regards,
Vaishali


Vaishali Chakraborty

Jun 24, 2022, 1:37:57 AM
to debr...@mskcc.org, David Higgins, cBioPortal for Cancer Genomics Discussion Group, och...@mskcc.org
Hi All,

Any help or update on this would be great.

Thanks and regards,
Vaishali