extract subfields of INFO field annotations

Mads Malik Aagaard Jørgensen

Nov 15, 2018, 3:19:31 PM
to vcfanno
Hi Brent,

I'm using vcfanno to combine all relevant annotations from various sources into a single VCF file. Among others, we include SnpEff annotations, which, as I'm sure you are aware, produce an annotation block for each transcript consisting of 16 subfields that are always in the same order.
Example:
MQ=60.00;MQRankSum=0.00;AC=2;ANN=G|missense_variant|MODERATE|MUTYH|MUTYH|transcript|NM_001128425.1|protein_coding|12/16|c.1014G>C|p.Gln338His|1230/1930|1014/1650|338/549||

Again, as I'm sure you know, the ANN annotation block always emits its subfields in the following order:
ANN=Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO

We want to extract specific subfields from the "ANN=" annotation block and add them as separate INFO field annotations in another VCF (e.g. HGVS.c=c.1014G>C;HGVS.p=p.Gln338His etc.), so that they can subsequently be exported as separate columns to e.g. tab-delimited files used by others in our lab. Is that possible?

We can do this using other programs, e.g. VariantTools, but that particular program kind of sucks at handling e.g. multiallelic variants, so we need a more robust way of doing this...

Thanks in advance! 

Best Regards,

Mads Malik Aagaard Jørgensen.

Brent Pedersen

Nov 16, 2018, 12:49:30 PM
to mma...@gmail.com, vcf...@googlegroups.com
I agree this could be supported better, but you can probably do:

[[postannotation]]
name="gene"
fields=["ANN"]
type="String"
op="lua:split(ANN, '|')[4]"

and repeat for the other fields that you need.

split is defined by default so you can just use it.
hope that helps,
-Brent
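
For checking which index corresponds to which subfield before writing the postannotation blocks, the position-based extraction above can be sketched in plain Python. This is only an illustration outside vcfanno, not part of its API: the function name parse_ann and the ANN_FIELDS list are hypothetical, and the field order is taken from the ANN header quoted earlier in the thread. Note that Lua indexes from 1, so the [4] in the config corresponds to index 3 here.

```python
# Subfield names in the order given by the ANN header quoted in the thread.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_ann(info):
    """Return one dict per ANN block found in a VCF INFO string (hypothetical helper)."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "ANN":
            # Blocks for different transcripts are comma-separated. Python's
            # str.split keeps empty subfields, so positions stay aligned.
            return [dict(zip(ANN_FIELDS, block.split("|")))
                    for block in value.split(",")]
    return []  # no ANN annotation present


# INFO string from the example in the original question.
info = ("MQ=60.00;MQRankSum=0.00;AC=2;ANN=G|missense_variant|MODERATE|MUTYH|MUTYH|"
        "transcript|NM_001128425.1|protein_coding|12/16|c.1014G>C|p.Gln338His|"
        "1230/1930|1014/1650|338/549||")

for ann in parse_ann(info):
    print(ann["Gene_Name"], ann["HGVS.c"], ann["HGVS.p"])
# prints: MUTYH c.1014G>C p.Gln338His
```

Variants with several transcripts yield several dicts, one per comma-separated block, so a downstream export has to decide whether to keep all of them or only the first.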