vcfeval fails with multi-allelic records


Alexandra Vatsiou

Dec 22, 2021, 4:31:50 PM
to RTG Users
Hello, 

I am trying to run vcfeval on a germline sample that has multi-allelic records. 
I have attempted the following based on documentation: 

rtg vcffilter -i $file.vcf.gz -o out.vcf.gz --keep-expr "INFO.AF.split(',').some(function(af) {return af >= 0.1})"

Do I understand this command correctly, that it will split the AF in the INFO field and keep only the alleles with AF >0.1? What if both alleles in the multi-allelic record are >0.1?

The output vcf still has multi-allelic records and rtg vcfeval still fails. I think this might be because the multi-allelic record AF is recorded for each sample.

Is there a way in rtg to split multi-allelic records? 

Since vcfeval decomposes variants, I expected that it would not have this issue, but it does. Could you please help me understand why decomposition does not take this into account?

Thank you. 

Best,

Alexandra



Len Trigg

Dec 22, 2021, 5:37:25 PM
to Alexandra Vatsiou, RTG Users
Hi Alexandra,


On Thu, 23 Dec 2021 at 10:31, Alexandra Vatsiou <alex.v...@gmail.com> wrote:
> I am trying to run vcfeval on a germline sample that has multi-allelic records.

You should not have to do anything special for vcfeval to work with multi-allelic records. You will need to give more specific information on how you are calling vcfeval, what the baseline/called variants look like, and what error message you are getting.

 
> I have attempted the following based on documentation:
>
> rtg vcffilter -i $file.vcf.gz -o out.vcf.gz --keep-expr "INFO.AF.split(',').some(function(af) {return af >= 0.1})"
>
> Do I understand this command correctly, that it will split the AF in the INFO field and keep only the alleles with AF >0.1? What if both alleles in the multi-allelic record are >0.1?

That command will keep only VCF records where at least one of the alleles has AF >= 0.1; other records will not be output. It doesn't split the records in any way (but as mentioned above, you should not need to).
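
To illustrate the record-level behaviour described above, here is a minimal Python sketch of the same test outside of RTG. The function name `keeps_record` is hypothetical, and the sketch only mirrors the logic of the JavaScript expression; it is not how rtg vcffilter is implemented.

```python
def keeps_record(af_field: str, threshold: float = 0.1) -> bool:
    """Return True if at least one comma-separated AF value meets the threshold."""
    # Mirrors INFO.AF.split(',').some(function(af) {return af >= 0.1})
    return any(float(af) >= threshold for af in af_field.split(","))


# A multi-allelic record is kept or dropped as a whole; its alleles are never separated.
print(keeps_record("0.667,0.333"))  # True: both alleles pass
print(keeps_record("0.05,0.333"))   # True: one allele passes, so the whole record is kept
print(keeps_record("0.05,0.02"))    # False: no allele passes, so the record is dropped
```

In the second case the allele with AF 0.05 stays in the output along with the rest of the record, which is why the filtered VCF still contains multi-allelic records.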

Cheers,
Len.

Alexandra Vatsiou

Dec 23, 2021, 3:01:58 PM
to Len Trigg, RTG Users
Hi Len, 

Thanks for the quick reply. 
I have used RTG Tools 3.12.1 and 3.10.1 on a couple of different samples, and I get errors like the following:
Evaluation too complex (38 unresolved paths, 10000001 iterations) at reference region chr1:XXX-XXX. Variants in this region will not be included in results.
Error: Invalid numeric value "0.667,0.333" in "AF" for VCF record XXX


The issue is resolved when I split the multi-allelic records.

Thanks,
Alexandra

Len Trigg

Dec 23, 2021, 4:21:04 PM
to Alexandra Vatsiou, RTG Users
Hi Alexandra,

You haven't shown me your command line or example VCF records, so I am somewhat guessing as to what your problem is.

The most likely possibility is that you are using the AF field as your score field (i.e. -f INFO.AF). However, the specified field must contain a single numeric value, which is not the case for multi-allelic records. Depending on your specific use case, that can be remedied by disabling ROCs, using a different score field entirely that is guaranteed to be a single numeric value, or preprocessing the VCF to ensure the score field is single-valued.
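
As a rough illustration of the preprocessing option, the Python sketch below adds a single-valued score to each record. It rests on several assumptions that are not from this thread: the VCF is read as uncompressed text lines, AF lives in the INFO column (a per-sample FORMAT AF would need the sample columns handled instead), the maximum AF is an acceptable score, and the new INFO field is called MAXAF (a hypothetical name). The sketch does not add the matching ##INFO header line, which may also be needed.

```python
def add_single_score(vcf_line: str, src: str = "AF", dst: str = "MAXAF") -> str:
    """Append dst=<max of the src values> to the INFO column of one VCF line."""
    line = vcf_line.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    cols = line.split("\t")
    for entry in cols[7].split(";"):  # column 8 (index 7) is INFO
        key, _, value = entry.partition("=")
        if key == src:
            # Skip missing values such as "." before taking the maximum.
            values = [float(v) for v in value.split(",") if v not in ("", ".")]
            if values:
                cols[7] = f"{cols[7]};{dst}={max(values):g}"
            break
    return "\t".join(cols)


record = "chr1\t100\t.\tA\tC,G\t50\tPASS\tAF=0.667,0.333"
print(add_single_score(record).split("\t")[7])  # AF=0.667,0.333;MAXAF=0.667
```

The new field could then be named as the score field in place of AF (for example in the INFO.AF form shown above), since it always holds exactly one number.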

Another possibility is that you are using vcfeval's built-in variant decomposition (--decompose) and it has encountered a malformed VCF record (e.g. if the number of components in an AF field doesn't match the expected number, this could cause problems when the decomposition tries to recalculate values for several annotations). To fix that, it would be best to clean up the VCF.
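
A quick way to look for this kind of malformed record is sketched below in Python. It assumes AF is an INFO field with one value per ALT allele (Number=A, as the VCF specification reserves it) and that records are read as uncompressed text lines; the function name `af_count_mismatch` is hypothetical.

```python
def af_count_mismatch(vcf_line: str) -> bool:
    """Return True if INFO AF has a different number of values than ALT alleles."""
    cols = vcf_line.rstrip("\n").split("\t")
    n_alt = len(cols[4].split(","))  # column 5 (index 4) is ALT
    for entry in cols[7].split(";"):  # column 8 (index 7) is INFO
        key, _, value = entry.partition("=")
        if key == "AF":
            return len(value.split(",")) != n_alt
    return False  # no AF annotation, so there is nothing to compare


good = "chr1\t100\t.\tA\tC,G\t50\tPASS\tAF=0.667,0.333"
bad = "chr1\t200\t.\tA\tC,G\t50\tPASS\tAF=0.667"
print(af_count_mismatch(good))  # False: two ALT alleles, two AF values
print(af_count_mismatch(bad))   # True: two ALT alleles, one AF value
```

Records flagged this way are candidates for cleanup before running vcfeval with --decompose.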

Cheers,
Len.

Alexandra Vatsiou

Dec 23, 2021, 5:58:16 PM
to Len Trigg, RTG Users
Hi Len, 

Apologies, I forgot to include the command in the previous message.
Here is the command; I was using both --vcf-score-field AF and --decompose:

rtg vcfeval -c $vcf -b $vcf --no-gzip --decompose -t $ref -o $out --sample $s1,$s1 --vcf-score-field AF

When I removed the --vcf-score-field AF, vcfeval worked OK on the original files without splitting multi-allelic records. 
Thanks,
Alexandra
