gemini builtin functions not compatible with merged vcf


CJ Yoon

May 19, 2015, 10:01:19 PM
Dear GEMINI developers,

Thanks for your work on GEMINI. It seems to be a great tool for exploring variant data with rich annotations. However, I'm running into a problem when I try to use the built-in tools such as de_novo or autosomal_recessive.

My goal is to find potential de novo, autosomal recessive, or compound heterozygous variants in exome sequencing data from a trio.
I have the BAM files for all three individuals. I used bcftools to call variants into VCF files. These VCF files were then combined into a single VCF file using GATK's CombineVariants and phased using PhaseByTransmission with the .ped file.

The resulting trio VCF file was then preprocessed with snpEff.jar as described in the GEMINI manual and loaded into the SQL database.

However, when I use the built-in de_novo or autosomal_recessive tool, nothing but the header line is returned. Manual querying with --gt-filter "gt_types.91 == HET and gts.92 == './.' and gts.94 == './.'" returned potential de novo variants (here 91, 92, and 94 are my sample names).
Querying with --gt-filter "gt_types.91 == HET and gts.92 == HOM_REF and gts.94 == HOM_REF" returned no rows.

I believe ./. is created when the VCF files are merged, at those loci where a variant is reported for one individual but not for the others. Since the other files carry no information at such a locus, GATK CombineVariants just marks the genotype as a no-call. A missing record probably means the locus is wild type in that individual, but it could also mean there were not enough reads to call a variant, etc. So my manual way of querying could work, but I don't think it's the ultimate solution to the problem.
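To make the behavior described above concrete, here is a minimal sketch in plain Python. It is not GEMINI's actual implementation; the function names (`classify`, `strict_de_novo`, `lenient_de_novo`) are hypothetical and only illustrate why a test that requires both parents to be homozygous reference finds nothing when the merge has turned their genotypes into `./.`.

```python
# Illustrative sketch only: NOT how GEMINI implements de_novo.
# All function names here are hypothetical.

def classify(gt):
    """Map a diploid VCF GT string to HOM_REF, HET, HOM_ALT, or UNKNOWN."""
    alleles = gt.replace("|", "/").split("/")
    # Treat no-calls ("./.", "."), half-calls ("0/.") and haploid calls as unknown.
    if len(alleles) != 2 or "." in alleles:
        return "UNKNOWN"
    if alleles[0] != alleles[1]:
        return "HET"
    return "HOM_REF" if alleles[0] == "0" else "HOM_ALT"


def strict_de_novo(child, father, mother):
    """Child is HET and both parents are confidently called HOM_REF."""
    return (classify(child) == "HET"
            and classify(father) == "HOM_REF"
            and classify(mother) == "HOM_REF")


def lenient_de_novo(child, father, mother):
    """Child is HET; a parental no-call is ASSUMED to be HOM_REF (risky)."""
    parents_ok = all(classify(p) in ("HOM_REF", "UNKNOWN")
                     for p in (father, mother))
    return classify(child) == "HET" and parents_ok


# Genotypes as they look after merging single-sample call sets:
print(strict_de_novo("0/1", "./.", "./."))   # False: parents are no-calls
print(lenient_de_novo("0/1", "./.", "./."))  # True, but only by assumption
```

The lenient version mirrors the manual `./.` query: it returns candidates, but it cannot distinguish a true reference genotype from a site with too little coverage.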

If you have a better recommendation on creating the vcf file of the trio and the use of GEMINI please let me know. 


Aaron Quinlan

May 19, 2015, 10:22:24 PM
to CJ Yoon,
Hi CJ,

I think you have precisely stated the source of the problem. You should use GATK to call variants using all three BAM files at once to avoid this "no call" scenario.
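One way to check whether a given trio VCF is affected by this "no call" scenario is to count missing genotypes per sample before loading it. The following is a rough sketch in plain Python, not part of GEMINI or GATK; the helper name `count_no_calls` and the example records are made up for illustration, and it assumes an uncompressed, tab-delimited VCF with a GT field in FORMAT.

```python
# Illustrative sketch: count no-call genotypes per sample in VCF text.
# `count_no_calls` is a hypothetical helper, not a GEMINI or GATK function.

def count_no_calls(vcf_lines):
    """Return {sample_name: number of records whose GT contains '.'}."""
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            counts = {name: 0 for name in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        for name, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index]
            if "." in gt:  # "./.", ".|.", "." or a half-call such as "0/."
                counts[name] += 1
    return counts


# Made-up example records using the sample names from this thread.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t91\t92\t94",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t./.\t./.",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t0/0\t./.",
]
print(count_no_calls(example))
```

For a real file, the lines could come from `open("trio.vcf")` (the file name is a placeholder). A merged call set of single-sample VCFs would be expected to show many no-calls, while a jointly called one should show far fewer.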


CJ Yoon

May 21, 2015, 5:09:48 PM
Thanks, it worked after I followed the GATK Best Practices workflow and called variants jointly across the samples.

Devin Liu

Jan 24, 2018, 11:26:56 AM
to gemini-variation
Hi CJ,
I have also been doing a WGS trio analysis recently.
I followed the GATK pipeline and used HaplotypeCaller in GVCF mode to call variants jointly.
But I do not know how to label these samples in the following steps.
Can you help me, or tell me what to do next?

On Thursday, May 21, 2015 at 4:09:48 PM UTC-5, CJ Yoon wrote: