"The key idea of the FREEMIX estimate is to use excessive heterozygosity to estimate the level of contamination. Especially for common SNPs, you will observe a higher fraction of heterozygous alleles than 2*p*(1-p), and it turns out that you can quantify the contamination very well if you already know the population allele frequency. If you do not have accurate population allele frequency information, then it would be harder to estimate FREEMIX parameters using verifyBamID."
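To make the "excessive heterozygosity" idea in the quote concrete, here is a minimal sketch. It assumes Hardy-Weinberg equilibrium, a biallelic SNP with alternate allele frequency p, and a mixture of two unrelated individuals. It is not verifyBamID's likelihood model, and the function names are made up for illustration. It only compares the expected heterozygous fraction 2*p*(1-p) in a clean sample with the fraction of sites at which a two-person mixture carries both alleles.

```python
def het_fraction_clean(p: float) -> float:
    """Expected heterozygous fraction at a biallelic SNP under HWE."""
    return 2.0 * p * (1.0 - p)


def both_alleles_fraction_mixed(p: float) -> float:
    """Fraction of sites where a mixture of two individuals carries both alleles.

    The mixture shows a single allele only if all four chromosomes
    (two per individual) carry the same allele.
    """
    q = 1.0 - p
    return 1.0 - p**4 - q**4


if __name__ == "__main__":
    # Print both quantities for a few allele frequencies.
    for p in (0.05, 0.1, 0.3, 0.5):
        print(p, het_fraction_clean(p), both_alleles_fraction_mixed(p))
```

Under these assumptions the mixed fraction is always above 2*p*(1-p) for 0 < p < 1, and the gap depends on p, which is why the allele frequency has to be known to turn the excess into a contamination estimate.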
My questions are:
1. If you don't provide an input VCF file, how does it estimate the population allele frequency to measure heterozygosity?
1a. Does verifyBamID use only the BAM file to estimate contamination in the sequence-only method?
2. FREEMIX values can range from 0 to 0.5 because the model assumes contamination is a mixture of two samples.
3. Is there any way to determine whether gender mixing happened during sequencing? I feel the total number of SNPs on chrX and chrY is not enough to get a good estimate of the FREEMIX parameter.
My second question is about CHIPMIX.
My understanding is that CHIPMIX comes from the sequence + array method and uses the allele frequency from the input VCF file. Is that true?
1. Say sample 1 is contaminated with 50% of sample 2. What would CHIPMIX look like?
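As a rough illustration of the two-sample mixture mentioned in the questions above, the sketch below computes the expected fraction of reads carrying the alternate allele at one site. It assumes no sequencing error and known genotypes coded as alternate allele counts (0, 1, 2). The function name and `alpha` parameter are hypothetical. It does not compute CHIPMIX or FREEMIX and is not taken from verifyBamID.

```python
def expected_alt_fraction(g_sample: int, g_contaminant: int, alpha: float) -> float:
    """Expected alt-allele read fraction for a mixture of two samples.

    g_sample, g_contaminant: alt-allele counts (0, 1 or 2) of the intended
    sample and of the contaminating sample.
    alpha: fraction of reads that come from the contaminating sample.
    """
    if g_sample not in (0, 1, 2) or g_contaminant not in (0, 1, 2):
        raise ValueError("genotypes must be 0, 1 or 2")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * g_sample / 2.0 + alpha * g_contaminant / 2.0


if __name__ == "__main__":
    # Intended sample is homozygous reference, contaminant is homozygous alt.
    for alpha in (0.0, 0.1, 0.5):
        print(alpha, expected_alt_fraction(0, 2, alpha))
```

In this simplified picture, swapping the two samples and replacing alpha with 1 - alpha gives the same read fractions, so from the reads alone a 50% mixture does not indicate which sample was the intended one.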
--
You received this message because you are subscribed to the Google Groups "verifyBamID" group.
To unsubscribe from this group and stop receiving emails from it, send an email to verifybamid...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
If you don't provide an input VCF file, how does it estimate the population allele frequency to measure heterozygosity?
Thanks!
Dan
The input VCF file contains (1) external genotype information and/or (2) allele frequency information, given as an AF entry or as AC/AN entries in the INFO field. (See the VCF specification for further details.) If neither is provided, verifyBamID will not work properly.
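As a quick way to check whether a VCF record carries the allele frequency information described above, here is a minimal sketch that reads AF, or falls back to AC/AN, from an INFO string. The function name is made up, and preferring AF over AC/AN and taking the first alternate allele at multi-allelic sites are assumptions of this example, not documented verifyBamID behavior.

```python
from typing import Optional


def allele_frequency_from_info(info: str) -> Optional[float]:
    """Return the alt allele frequency from a VCF INFO string, or None."""
    fields = {}
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            fields[key] = value
    try:
        if "AF" in fields:
            # Multi-allelic sites list one AF per alt allele; use the first.
            return float(fields["AF"].split(",")[0])
        if "AC" in fields and "AN" in fields:
            an = int(fields["AN"])
            if an > 0:
                return int(fields["AC"].split(",")[0]) / an
    except ValueError:
        # Missing values such as "." cannot be converted to numbers.
        return None
    return None


if __name__ == "__main__":
    print(allele_frequency_from_info("AC=3;AN=12;DP=250"))
```

A record for which this returns None has neither usable AF nor usable AC/AN in INFO, so it would only be informative if the file also provides genotypes.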
If you don't provide an input VCF file, how does it estimate the population allele frequency to measure heterozygosity?
FATAL ERROR -
--vcf [vcf file] required
What if the input VCF includes only one individual? Would the result in that case be unreliable?
This is really helpful, thank you very much.
I think my real question is about the population diversity in my samples. I have more than 100 (296) individuals in my VCF file, but they come from 4 populations. My concern is whether the mixed populations will lead to a wrong or inaccurate AF estimate. I will definitely add an AF entry to the INFO column of my VCF file.
One more question: when you mentioned different populations without AF in INFO, did you mean mixed populations in one file, or one population per file?