Re: [vcfanno] Re: GnomAD constraint score annotation


Brent Pedersen

Jul 6, 2021, 5:34:27 AM
to Genomixer, vcfanno
Hi Sugi,
vcfanno can only annotate by position, not by gene id. Your "GnomAD
constraint scores text file" is in bed format, so you can bgzip and
tabix it and then annotate by position using a vcfanno conf with
something like:

[[annotation]]
file="gnomad-constraint-scores.bed.gz"
columns=[6]
names=["gnomad_constraint_oe"]
ops=["max"]
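Before bgzip and tabix can be applied, the table has to be tab-delimited and sorted by chromosome and start position. A minimal sketch of that preparation step, assuming the column layout shown in the quoted message below (CHR, POS, END, GENE, GENE_ID, oe_lof, ...); the function name `to_tabix_ready_bed` is made up for illustration and is not part of vcfanno:

```python
def to_tabix_ready_bed(lines):
    """Return tab-delimited rows sorted by chromosome, start, end.

    The header row (first field "CHR") is kept but prefixed with "#"
    so that tabix treats it as a comment line.
    """
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None and fields[0] == "CHR":
            header = "#" + "\t".join(fields)
            continue
        rows.append(fields)
    # Group rows by chromosome name, then order by numeric start and end.
    rows.sort(key=lambda f: (f[0], int(f[1]), int(f[2])))
    out = [header] if header else []
    out.extend("\t".join(f) for f in rows)
    return out


# Two rows from the quoted table, deliberately out of order.
example = [
    "CHR POS END GENE GENE_ID oe_lof oe_lof_lower oe_lof_upper",
    "1 901877 911245 PLEKHN1 ENSG00000187583 0.80467 0.585 1.124",
    "1 895967 901095 KLHL17 ENSG00000187961 0.90749 0.664 1.259",
]
bed_lines = to_tabix_ready_bed(example)
```

The resulting lines can be written to a file, compressed with bgzip, and indexed with `tabix -p bed`. With this layout oe_lof is the 6th column, which is what `columns=[6]` refers to.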

hope this helps,
-Brent
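
Since vcfanno matches on position only, a lookup keyed on the Ensembl gene id would have to happen outside vcfanno. A rough sketch of such a lookup, assuming the gene id is the 5th pipe-separated field of each SnpEff ANN entry (as in the VCF line quoted below) and the constraint table has GENE_ID in column 5 and oe_lof in column 6. The helper names and the example INFO string are hypothetical, and a real VCF would also need a matching ##INFO header line for the new key:

```python
def load_oe_lof(table_lines):
    """Map Ensembl gene id (5th column) to oe_lof (6th column)."""
    scores = {}
    for line in table_lines:
        fields = line.split()
        # Skip short rows and the header (with or without a leading "#").
        if len(fields) < 6 or fields[0].lstrip("#") == "CHR":
            continue
        scores[fields[4]] = fields[5]
    return scores


def annotate_info(info, scores, key="gnomad_constraint_oe"):
    """Append key=<oe_lof,...> to an INFO string, using gene ids found in ANN."""
    ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), None)
    if ann is None:
        return info
    values = []
    for entry in ann.split(","):  # one entry per annotated allele/transcript
        parts = entry.split("|")
        gene_id = parts[4] if len(parts) > 4 else ""
        if gene_id in scores and scores[gene_id] not in values:
            values.append(scores[gene_id])
    if not values:
        return info  # no gene in ANN has a constraint score
    return info + ";" + key + "=" + ",".join(values)


table = [
    "CHR POS END GENE GENE_ID oe_lof oe_lof_lower oe_lof_upper",
    "1 895967 901095 KLHL17 ENSG00000187961 0.90749 0.664 1.259",
]
scores = load_oe_lof(table)
# Hypothetical, shortened INFO string for illustration only.
info = "DP=53;ANN=G|intron_variant|MODIFIER|KLHL17|ENSG00000187961|transcript"
annotated = annotate_info(info, scores)
```

Variants whose ANN entries name several genes get one comma-separated value per distinct score, which sidesteps the choice of a single `ops` reducer that the position-based approach needs for overlapping intervals.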

On Mon, Jul 5, 2021 at 10:13 PM Genomixer <s.siva...@uni-bonn.de> wrote:
>
> Format of the annotated VCF file (the Ensembl identifier is in the 5th field of the ANN annotation).
>
> chr1 54561087 . T *,G 872 . AC=1,1;AF=0.5,0.5;AN=2;DP=53;ExcessHet=3.0103;FS=0;MLEAC=1,1;MLEAF=0.5,0.5;MQ=54.47;QD=17.44;SOR=1.157;ANN=G|intron_variant|MODIFIER|ACOT11|ENSG00000162390|transcript|ENST00000371316.3|protein_coding|1/16|c.33+12745T>G||||||;RawScore=0.0497;PHRED=2.233;non_neuro_AF_nfe=.,0.4815;non_neuro_AC_nfe=.,6430;non_neuro_AN_nfe=13354;non_neuro_nhomalt_nfe=.,1529
>
> Format of the GnomAD constraint scores text file:
>
> CHR POS END GENE GENE_ID oe_lof oe_lof_lower oe_lof_upper
> 1 895967 901095 KLHL17 ENSG00000187961 0.90749 0.664 1.259
> 1 901877 911245 PLEKHN1 ENSG00000187583 0.80467 0.585 1.124
> 1 910579 917497 C1orf170 ENSG00000187642 1.1826 0.837 1.686
> 1 1017198 1051741 C1orf159 ENSG00000131591 0.69466 0.394 1.302
>
> The idea is to annotate the data by the Ensembl gene identifier rather than by position, because multiple genes can overlap.
>
> Thank you very much.
>
> On Monday, July 5, 2021 at 9:20:01 PM UTC+2 Genomixer wrote:
>>
>>
>> Dear Brent,
>>
>> just one question regarding the gene-based annotation of GnomAD constraint scores (oe_lof). I have multiple WGS samples annotated with SnpEff and Ensembl gene identifiers.
>>
>> Now I would like to annotate GnomAD constraint scores using this dataset:
>>
>> https://storage.cloud.google.com/gcp-public-data--gnomad/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz?_ga=2.137613680.-1935284439.1625467768
>>
>> Can you help to set up vcfanno and create a config for this approach? I think this will be helpful for many other users as well. So far I have generated a tabix-indexed, tab-delimited file of the constraint scores, but I don't know how to set the Ensembl gene identifier as the primary key for the annotation.
>>
>> Many thanks, and I look forward to hearing from you soon.
>>
>> Best regards,
>> Sugi
>