vcf filter


Dollina Dodani

Dec 14, 2020, 7:07:26 PM
to RTG Users
Hi all, 

I am trying to filter vcf files that I generated by merging variant calls from Mutect2 and Strelka2 using the following command:

rtg vcffilter --snps-only --all-samples -i input.vcf.gz -o filtered.vcf.gz

where input.vcf.gz is the merged VCF file. I get the following error (also attached as a screenshot):

"Error: Specified filters require GT but no such field contained in record:". 

I understand that this is because Strelka2 does not add the GT field to its output VCF files. I previously ran vcfeval on Strelka2 VCF files using the "--squash-ploidy" option, but that option is not available for filtering. Is there a quick workaround for this?

Thanks, 
Dollina 

Len Trigg

Dec 14, 2020, 10:07:21 PM
to Dollina Dodani, RTG Users
Hi Dollina,

The --snps-only flag uses the GT to check that the alleles actually being called in each sample (according to the GT) are SNPs, so the presence of extra non-SNP ALTs doesn't throw the filtering off. Since you don't have a GT, if you make the assumption that the test can be done purely by looking at the REF and ALT, you can approximate it like this:

rtg vcffilter -i input.vcf.gz -o filtered.vcf.gz --keep-expr 'REF.length == 1 && ALT[0].length == 1'
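To make explicit what that expression approximates, here is a minimal Python sketch of the same REF/ALT-length test applied to plain VCF text lines. It is not part of RTG Tools; the function names and the check_all_alts switch are hypothetical, added only for illustration. By default it mirrors the expression above and looks only at the first ALT; with check_all_alts=True it requires every ALT to be a single base. Treating an ALT of "." as "no ALT allele" is an assumption of this sketch.

```python
def looks_like_snp(ref, alts, check_all_alts=False):
    """Approximate a SNP test from REF and ALT alone, without any GT."""
    if len(ref) != 1 or not alts:
        return False
    # Mirror the expression above: only the first ALT is examined by default.
    candidates = alts if check_all_alts else alts[:1]
    return all(len(alt) == 1 for alt in candidates)


def filter_vcf_lines(lines, check_all_alts=False):
    """Keep header lines and records whose REF/ALT look like a SNP."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        # Assumption of this sketch: "." in the ALT column means no ALT allele.
        alts = [] if fields[4] == "." else fields[4].split(",")
        if looks_like_snp(ref, alts, check_all_alts):
            kept.append(line)
    return kept
```

Note that a record such as REF=C, ALT=T,CTT passes the default check because only the first ALT is inspected, which is the kind of case the GT-based --snps-only test is designed to handle.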

Your attached screenshot is showing a different issue, where the GT is required to be the first FORMAT field, but after merging the Strelka and Mutect calls this isn't being maintained. That looks like an actual bug in rtg vcfmerge triggered when you merge VCFs where the first one doesn't have a GT field and the second one does. We will fix that, but as a workaround, if you merge your VCF files in the opposite order (i.e. Mutect before Strelka), the output should always have GT first.
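If re-merging is inconvenient, one way to check whether a merged file is affected is to look at where GT sits in each record's FORMAT column. The following is a rough Python sketch, not RTG's fix and not an RTG Tools API; gt_position and move_gt_first are hypothetical helper names. It assumes tab-separated VCF records and pads sample columns that omit trailing fields with ".".

```python
def gt_position(format_field):
    """Return the index of GT among the FORMAT keys, or None if absent."""
    keys = format_field.split(":")
    return keys.index("GT") if "GT" in keys else None


def move_gt_first(line):
    """Return the record with GT moved to the front of FORMAT and of each sample."""
    if line.startswith("#"):
        return line
    ending = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line  # no FORMAT column, nothing to reorder
    keys = fields[8].split(":")
    if "GT" not in keys or keys[0] == "GT":
        return line  # GT absent or already first
    gt_index = keys.index("GT")
    order = [gt_index] + [i for i in range(len(keys)) if i != gt_index]
    fields[8] = ":".join(keys[i] for i in order)
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # Samples may drop trailing fields; pad so every key has a value.
        values += ["."] * (len(keys) - len(values))
        fields[col] = ":".join(values[i] for i in order)
    return "\t".join(fields) + ending
```

Records for which gt_position returns a value greater than 0 are the ones that would trip the "GT must be first" check. Merging in the opposite order, as described above, remains the simpler workaround.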

Cheers,
Len.





Dollina Dodani

Dec 14, 2020, 11:02:50 PM
to RTG Users, Len Trigg

Thank you, Len. I'll give that a try.
Dollina