Hi Brent,
I'm using vcfanno to combine all relevant annotations from various sources into a single VCF file. Among others, we include SnpEff annotations, which, as I'm sure you are aware, produce one annotation block per transcript, each consisting of 16 subfields that are always in the same order.
Example:
MQ=60.00;MQRankSum=0.00;AC=2;ANN=G|missense_variant|MODERATE|MUTYH|MUTYH|transcript|NM_001128425.1|protein_coding|12/16|c.1014G>C|p.Gln338His|1230/1930|1014/1650|338/549||
Again, as I'm sure you know, the ANN annotation block always emits its subfields in the following order:
ANN=Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO
We want to extract specific subfields from the "ANN=" annotation block and add them as separate INFO fields in another VCF (e.g. HGVS.c=c.1014G>C;HGVS.p=p.Gln338His etc.), so that they can subsequently be exported as separate columns to e.g. tab-delimited files used by others in our lab. Is that possible?
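To make the request concrete, here is a rough Python sketch of the transformation I have in mind. It is not vcfanno functionality, just an illustration: the helper names (parse_info, split_ann, extract_subfields) are made up, and it assumes that multiple transcript blocks are comma-separated within ANN and that an empty subfield can be written as ".".

```python
# Subfield names in the order given by the ANN header above.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_info(info):
    """Split a VCF INFO string into a dict (flags without '=' map to True)."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def split_ann(ann_value):
    """Return one dict per transcript block (assumed comma-separated)."""
    return [dict(zip(ANN_FIELDS, block.split("|")))
            for block in ann_value.split(",")]


def extract_subfields(info, wanted=("HGVS.c", "HGVS.p")):
    """Build new INFO entries, one per wanted subfield, one value per block."""
    ann = parse_info(info).get("ANN")
    if not isinstance(ann, str):
        return []  # no ANN annotation on this record
    blocks = split_ann(ann)
    extra = []
    for name in wanted:
        # Assumption: empty subfields are written as "." in the new field.
        values = [block.get(name, "") or "." for block in blocks]
        extra.append(name + "=" + ",".join(values))
    return extra


info = ("MQ=60.00;MQRankSum=0.00;AC=2;ANN=G|missense_variant|MODERATE|MUTYH|"
        "MUTYH|transcript|NM_001128425.1|protein_coding|12/16|c.1014G>C|"
        "p.Gln338His|1230/1930|1014/1650|338/549||")
print(";".join(extract_subfields(info)))
# HGVS.c=c.1014G>C;HGVS.p=p.Gln338His
```

For records with several transcripts this keeps one value per transcript in each new field, in the same order as the ANN blocks, which is the part that tends to go wrong for us with multiallelic variants.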
We can do this using other programs, e.g. VariantTools, but that particular program kind of sucks at handling multiallelic variants, among other things, so we need a more robust way of doing this.
Thanks in advance!
Best Regards,
Mads Malik Aagaard Jørgensen.