Dear GEMINI developers,
Thanks for your work on GEMINI. It looks like a great tool for exploring variant data with rich annotations. However, I'm running into a problem when I try to use the built-in tools such as de_novo or autosomal_recessive.
My goal is to find potential de novo, autosomal recessive, or compound heterozygous variants in trio exome sequencing data.
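To make the three inheritance patterns concrete, here is a minimal Python sketch (not GEMINI's implementation) that classifies trio genotypes. It assumes unphased, diploid, biallelic calls written as VCF GT strings (`0/0`, `0/1`, `1/1`, `./.`). The helper names and the simplified compound-heterozygote rule (child heterozygous at two sites in the same gene, one allele seen only in the mother and one only in the father) are illustrative assumptions.

```python
# Map a diploid, biallelic VCF GT string to a simple genotype label.
def gt_label(gt):
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "UNKNOWN"  # no-call
    n_alt = sum(a != "0" for a in alleles)
    return ("HOM_REF", "HET", "HOM_ALT")[n_alt]

def is_de_novo(kid, mom, dad):
    # Child carries an allele that neither parent has.
    return gt_label(kid) == "HET" and gt_label(mom) == "HOM_REF" and gt_label(dad) == "HOM_REF"

def is_autosomal_recessive(kid, mom, dad):
    # Child is homozygous alt; both parents are heterozygous carriers.
    return gt_label(kid) == "HOM_ALT" and gt_label(mom) == "HET" and gt_label(dad) == "HET"

def comp_het_genes(sites):
    """sites: iterable of (gene, kid, mom, dad) GT strings.
    Returns genes where the child is HET at one site inherited only from the
    mother and at another site inherited only from the father (unphased heuristic)."""
    from_mom, from_dad = set(), set()
    for gene, kid, mom, dad in sites:
        k, m, d = gt_label(kid), gt_label(mom), gt_label(dad)
        if k != "HET":
            continue
        if m == "HET" and d == "HOM_REF":
            from_mom.add(gene)
        elif d == "HET" and m == "HOM_REF":
            from_dad.add(gene)
    return sorted(from_mom & from_dad)

sites = [
    ("GENE1", "0/1", "0/1", "0/0"),
    ("GENE1", "0/1", "0/0", "0/1"),
    ("GENE2", "0/1", "0/1", "0/0"),
]
print(is_de_novo("0/1", "0/0", "0/0"))             # True
print(is_autosomal_recessive("1/1", "0/1", "0/1"))  # True
print(comp_het_genes(sites))                         # ['GENE1']
```

Note that every rule requires the parents' genotypes to be actual calls; a parent with `./.` is `UNKNOWN` and matches none of them.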
I have the BAM files for all three individuals. I used bcftools to call variants for each sample into separate VCF files. These VCF files were then combined into a single VCF file with GATK's CombineVariants and phased with PhaseByTransmission using the .ped file.
The resulting trio VCF file was then preprocessed with snpEff.jar, as described in the GEMINI manual, and loaded into the SQL database.
However, when I use the built-in de_novo or autosomal_recessive tools, nothing but the header line is returned. Querying manually with --gt-filter "gt_types.91 == HET and gts.92 == './.' and gts.94 == './.'" does return potential de novo variants (here 91, 92, and 94 are my sample names).
Querying with --gt-filter "gt_types.91 == HET and gt_types.92 == HOM_REF and gt_types.94 == HOM_REF" returns nothing.
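A minimal Python sketch of why the two filters disagree, evaluated on one hypothetical trio record where both parents are no-calls. It assumes, as in GEMINI, that `gts` holds genotype base strings (with `./.` for a missing call) while `gt_types` holds a genotype category; the REF/ALT bases and the `gt_type` helper are made up for illustration and are not GEMINI internals.

```python
# Hypothetical trio record: child 91 is heterozygous, parents 92 and 94 are no-calls.
REF, ALT = "G", "A"
gts = {"91": "G/A", "92": "./.", "94": "./."}

def gt_type(gt):
    # Derive a genotype category from a base string (diploid, biallelic assumed).
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "UNKNOWN"
    n_alt = sum(a != REF for a in alleles)
    return ("HOM_REF", "HET", "HOM_ALT")[n_alt]

gt_types = {s: gt_type(g) for s, g in gts.items()}

# Filter 1: child HET, parents missing (the query that returned candidates).
lenient = gt_types["91"] == "HET" and gts["92"] == "./." and gts["94"] == "./."
# Filter 2: child HET, parents called homozygous reference.
strict = gt_types["91"] == "HET" and gt_types["92"] == "HOM_REF" and gt_types["94"] == "HOM_REF"

print(gt_types)         # {'91': 'HET', '92': 'UNKNOWN', '94': 'UNKNOWN'}
print(lenient, strict)  # True False
```

As long as the parents are stored as `./.`, their category is `UNKNOWN` rather than `HOM_REF`, so any filter (or tool) that requires homozygous-reference parents will skip these sites.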
I believe ./. is created during merging at loci where a variant was reported for one individual but not the others. Since there is no information for the other samples at such a locus, GATK CombineVariants simply marks their genotypes as missing. A missing genotype here probably means the locus is wild type in that individual, although it could also mean there were not enough reads to call a variant. So my manual query could work, but I don't think it is the right solution to the problem.
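A toy Python model of the merge behavior described above, assuming each per-sample VCF lists only the sites where that sample had a variant (as happens when samples are called separately). Positions, genotypes, and function names are hypothetical, and this is not how CombineVariants is actually implemented; it only shows where `./.` comes from and what treating it as wild type does.

```python
# Per-sample call sets: position -> GT, only at sites where that sample had a variant.
calls = {
    "91": {1000: "0/1", 2000: "0/1"},
    "92": {2000: "0/1"},
    "94": {},
}

def merge(calls):
    """Union of all sites; a sample with no record at a site gets './.'."""
    sites = sorted(set().union(*calls.values()))
    return {pos: {s: calls[s].get(pos, "./.") for s in calls} for pos in sites}

def de_novo_candidates(merged, kid, parents, missing_as_ref):
    """Child 0/1 with parents 0/0. If missing_as_ref is True, './.' also counts
    as 0/0 (the 'probably wild type' assumption, which can also let through
    sites that were simply not covered in a parent)."""
    ok_parent = {"0/0", "./."} if missing_as_ref else {"0/0"}
    return [pos for pos, g in merged.items()
            if g[kid] == "0/1" and all(g[p] in ok_parent for p in parents)]

merged = merge(calls)
print(merged[1000])  # {'91': '0/1', '92': './.', '94': './.'}
print(de_novo_candidates(merged, "91", ["92", "94"], missing_as_ref=False))  # []
print(de_novo_candidates(merged, "91", ["92", "94"], missing_as_ref=True))   # [1000]
```

The strict version finds nothing because no parent genotype is ever written as `0/0` after this kind of merge; the lenient version recovers site 1000 but cannot distinguish true reference calls from low coverage.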
If you have a better recommendation for creating the trio VCF file or for using GEMINI, please let me know.
Thanks,
Chris