Hi everyone,
If someone could help me with this, I would be very grateful. Thank you for reading.
I have a RADseq dataset of 204 bumblebees. I want to find out whether any of them are completely homozygous, as this would imply they are haploid and therefore male. I want to remove any males from the dataset and confirm that the others are diploid.
Originally, I used the de novo method, then used populations to export the SNPs in VCF format. I then ran vcftools with the --het option, which "Calculates a measure of heterozygosity on a per-individual basis. Specifically, the inbreeding coefficient, F, is estimated for each individual using a method of moments." This gave one individual with F = 1, so I assumed it was haploid and therefore male.
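To check my understanding of what --het computes: as far as I can tell, the .het output reports O(HOM), E(HOM), N_SITES and F for each individual, with F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)). Below is a minimal Python sketch of that estimate, assuming biallelic sites, genotypes coded as pairs of allele indices (0 = ref, 1 = alt, None = missing), and expected homozygosity 1 - 2p(1 - p) from the sample allele frequency p. It is a simplified version, not vcftools' actual code; vcftools may apply a small-sample correction, so its numbers need not match exactly.

```python
def method_of_moments_f(genotypes):
    """Per-sample (O_hom, E_hom, n_sites, F) for a list of samples.

    genotypes[s][j] is a tuple of two allele indices (0 or 1) for sample s
    at site j, or None if the call is missing. Simplified sketch only.
    """
    n_samples = len(genotypes)
    n_loci = len(genotypes[0])

    # Alternate-allele frequency at each site, from all called genotypes.
    alt_freq = []
    for j in range(n_loci):
        alleles = [a for s in range(n_samples)
                   if genotypes[s][j] is not None
                   for a in genotypes[s][j]]
        alt_freq.append(sum(alleles) / len(alleles) if alleles else None)

    results = []
    for s in range(n_samples):
        o_hom = 0.0   # observed homozygous sites
        e_hom = 0.0   # expected homozygous sites under Hardy-Weinberg
        n_sites = 0   # sites with a call for this sample
        for j in range(n_loci):
            g, p = genotypes[s][j], alt_freq[j]
            if g is None or p is None:
                continue
            n_sites += 1
            if g[0] == g[1]:
                o_hom += 1
            e_hom += 1 - 2 * p * (1 - p)
        # F = 1 when every called site is homozygous.
        f = (o_hom - e_hom) / (n_sites - e_hom) if n_sites > e_hom else float("nan")
        results.append((o_hom, e_hom, n_sites, f))
    return results
```

An individual called homozygous at every site gets F = 1 regardless of the allele frequencies, which is why F = 1 looked like a haploid male.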
I then used the reference-based approach twice, with the B. terrestris and B. hortorum reference genomes. Again, I used populations to export the SNPs in VCF format and ran vcftools --het. The results were broadly similar across all three approaches. Notably, though, the individual that had F = 1 with the de novo method has F = 0.99853 with the B. terrestris reference and F = 0.99698 with the B. hortorum reference.
So my question is: why are these numbers different? Are they different because the de novo method missed some heterozygous loci and is therefore less accurate? Or is it something else?
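One way to see how small the difference is: if H is the number of heterozygous calls, then O(HOM) = N_SITES - H, so F = 1 - H / (N_SITES - E(HOM)) and H = (1 - F)(N_SITES - E(HOM)). An F just below 1 therefore corresponds to only a handful of heterozygous calls, which could plausibly come from genotyping errors. The sketch below does this arithmetic; the N_SITES and E(HOM) values are hypothetical placeholders, and the real ones are in each .het file.

```python
def implied_het_sites(f, n_sites, e_hom):
    """Heterozygous calls implied by F = (O - E) / (N - E) with O = N - H."""
    return (1.0 - f) * (n_sites - e_hom)


# Hypothetical N_SITES and E(HOM), for illustration only.
N_SITES, E_HOM = 10000, 8000
for label, f in [("de novo", 1.0),
                 ("B. terrestris", 0.99853),
                 ("B. hortorum", 0.99698)]:
    print(label, round(implied_het_sites(f, N_SITES, E_HOM), 2))
```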
Alternatively, if someone knows of a better method for estimating the homozygosity of individuals, could you let me know, please?
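A simpler measure I could compare against is the raw proportion of heterozygous genotype calls per individual, which does not depend on allele frequencies estimated across the other samples. This is a minimal sketch that reads the GT field of an uncompressed VCF line by line, assuming a standard VCF with a FORMAT column; it is not part of vcftools or populations, and the function name is my own.

```python
def het_proportions(vcf_lines):
    """Return {sample: (het_proportion, het_calls, called_sites)}.

    Missing genotypes ('./.', '.') are skipped. A call counts as
    heterozygous when its alleles differ, e.g. 0/1 or 1|2.
    """
    samples, counts = [], {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            counts = {s: [0, 0] for s in samples}  # [het, called]
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample, entry in zip(samples, fields[9:]):
            parts = entry.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                continue  # missing call
            counts[sample][1] += 1
            if len(set(alleles)) > 1:
                counts[sample][0] += 1
    return {s: (h / n if n else float("nan"), h, n)
            for s, (h, n) in counts.items()}
```

An open file handle for the VCF exported by populations can be passed directly, since it yields lines. Sorting the result by proportion would put any candidate males (at or near zero) first; where to draw the line between "a few errors" and "genuinely diploid" is still a judgment call.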
I've attached the vcftools output files and log files.
Thanks,
For some reason, the post above wouldn't upload with the attachments, so I'm trying to attach them to this post instead.