model behind freemix


Yu Huang

Aug 30, 2012, 6:26:55 PM
to verif...@googlegroups.com
Hi Hyun,

As you may know, we are working on monkey data without proper genotypes to begin with, so the FREEMIX parameter is really the only one I can rely on to detect contamination in any sample.

I read through these two wikis.

 [FREEMIX] >> 0.02, meaning 2% or more of non-reference bases are observed in reference sites, 
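Read literally, the quoted description is a simple ratio: non-reference bases divided by all bases observed at sites where the sample is expected to be homozygous reference. The sketch below computes only that naive ratio, using a hypothetical function name and made-up read counts. It illustrates the quoted sentence and is not how verifyBamID computes FREEMIX. It also requires knowing which sites are homozygous reference, which is exactly what is missing without genotypes, and sequencing errors alone contribute some non-reference bases.

```python
def nonref_fraction(site_counts):
    """Naive fraction of non-reference bases at sites believed to be hom-ref.

    site_counts: list of (ref_count, nonref_count) tuples, one per site.
    Hypothetical helper for illustration only.
    """
    ref_total = sum(r for r, _ in site_counts)
    nonref_total = sum(a for _, a in site_counts)
    depth = ref_total + nonref_total
    if depth == 0:
        raise ValueError("no reads at the supplied sites")
    return nonref_total / depth


# Made-up read counts at three sites assumed to be homozygous reference.
counts = [(48, 2), (30, 0), (19, 1)]
print(nonref_fraction(counts))  # 3 / 100 -> 0.03
```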


I couldn't find how the FREEMIX parameter is estimated. Does it count how many non-reference alleles are found at known polymorphic sites from the input VCF? (Our input VCF doesn't cover the same individuals; the sites were discovered from a few high-coverage individuals, plus various filters.) I assume no haplotype/LD information is used (it couldn't really be inferred from short reads anyway). So, potentially, an individual that is quite divergent from the reference could get a quite high FREEMIX estimate?
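On the question of whether an individual that is very divergent from the reference could inflate the estimate, a minimal sketch of the expected alt-read fraction at one biallelic site may help. It assumes a simple two-person read mixture (a fraction `alpha` of reads from a single contaminating individual) and ignores sequencing error; the function name is hypothetical.

```python
def expected_alt_fraction(sample_alt_alleles, contam_alt_alleles, alpha):
    """Expected fraction of alt-allele reads at one biallelic site.

    sample_alt_alleles, contam_alt_alleles: 0, 1, or 2 alt alleles carried by
    the sample and by a hypothetical contaminating individual.
    alpha: assumed fraction of reads that come from the contaminant.
    """
    return (1 - alpha) * sample_alt_alleles / 2 + alpha * contam_alt_alleles / 2


# A homozygous-alt sample (far from the reference) with no contamination
# shows only alt reads, whatever the contaminant would have carried:
print(expected_alt_fraction(2, 2, 0.0))  # 1.0
print(expected_alt_fraction(2, 0, 0.0))  # 1.0
# A homozygous-ref sample with 5% contamination from a homozygous-alt person
# shows a small but nonzero alt fraction:
print(expected_alt_fraction(0, 2, 0.05))
```

Under this assumed model, divergence shows up as homozygous-alt sites with alt fractions near 1, while contamination produces intermediate fractions at sites where the sample itself is homozygous. A model that allows homozygous-alt genotypes should therefore not mistake divergence alone for contamination, although inaccurate allele frequencies can still bias it.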

Also, it ignores heterozygous sites at the moment, right? http://genome.sph.umich.edu/wiki/Verifying_Sample_Identities_-_Implementation discusses the possibility of using het sites:

If we also wish to use heterozygous sites, rather than limiting our comparison to reference homozygous sites, we could use:


Thanks,
yu


On Wed, Jun 20, 2012 at 10:27 AM, Hyun Min Kang <hmk...@umich.edu> wrote:
Dear verifyBamID Users,

A new version of verifyBamID was released at http://genome.sph.umich.edu/wiki/VerifyBamID, including bug fixes related to the sequence+array-based estimates:

• Fixed a bug that produced incorrect contamination estimates when the --chip-full option was used (Thanks to Richard Smith)

• Fixed a bug that produced incorrect per-readgroup output with the --chip-* parameters

If you ran verifyBamID with the sequence+array method, I suggest rerunning it to get more accurate results. The rerun is unlikely to reveal new contamination, but it may reduce the number of samples previously flagged as contaminated by the --chip-* parameters.

Thanks,
Hyun.



--
Yu Huang
Postdoc in Nelson Freimer Lab,
Gonda 3554A, Center for Neurobehavioral Genetics, UCLA
Office Phone: +1.310-794-9598
Skype ID: crocea
http://www-scf.usc.edu/~yuhuang

Hyun Min Kang

Aug 30, 2012, 8:01:19 PM
to verif...@googlegroups.com
Hi Yu,

The key idea of the FREEMIX estimate is to use excess heterozygosity to estimate the level of contamination. Especially for common SNPs, you will observe a higher fraction of heterozygous sites than the 2*p*(1-p) expected under Hardy-Weinberg equilibrium, and it turns out that you can quantify the contamination very well if you already know the population allele frequencies. If you do not have accurate population allele frequency information, then it will be harder to estimate the FREEMIX parameter using verifyBamID.
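To make the idea concrete, here is a minimal Python sketch of a mixture-likelihood estimate of the contamination fraction (called `alpha` below) from read counts plus known population allele frequencies. It is a simplified model under stated assumptions, not verifyBamID's actual implementation: each site is biallelic with alt frequency p; the sample and a single contaminating individual have genotypes drawn independently from Hardy-Weinberg proportions; reads are independent with a fixed symmetric error rate (assumed 1%); base qualities are ignored. All function names are hypothetical.

```python
import math


def hwe(p):
    """Hardy-Weinberg probabilities of carrying 0, 1, 2 alt alleles."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def site_likelihood(ref, alt, p, alpha, err=0.01):
    """Likelihood of ref/alt read counts at one site under a two-person mixture.

    The binomial coefficient is omitted because it does not depend on alpha.
    The product form can underflow at very high depth; fine for typical coverage.
    """
    total = 0.0
    for g1, w1 in enumerate(hwe(p)):      # sample genotype
        for g2, w2 in enumerate(hwe(p)):  # contaminant genotype
            q = (1 - alpha) * g1 / 2 + alpha * g2 / 2  # P(read shows alt)
            q = q * (1 - err) + (1 - q) * err          # symmetric sequencing error
            total += w1 * w2 * q ** alt * (1 - q) ** ref
    return total


def estimate_alpha(sites, grid_steps=50):
    """Grid-search ML estimate of alpha in [0, 0.5].

    sites: list of (ref_count, alt_count, alt_allele_frequency) tuples.
    alpha and 1 - alpha are indistinguishable (sample and contaminant swap
    roles), so the grid stops at 0.5.
    """
    best_alpha, best_ll = 0.0, -math.inf
    for k in range(grid_steps + 1):
        alpha = 0.5 * k / grid_steps
        ll = sum(math.log(site_likelihood(r, a, p, alpha)) for r, a, p in sites)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


# Reads consistent with a clean hom-ref, het, and hom-alt site at p = 0.5:
clean = [(30, 0, 0.5), (15, 15, 0.5), (0, 30, 0.5)]
print(estimate_alpha(clean))  # 0.0
```

The allele frequencies enter through the Hardy-Weinberg weights, so frequencies taken from a different population mis-specify the model; that matches the point above about needing accurate population allele frequencies.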

Thanks,
Hyun.


Yu Huang

Aug 31, 2012, 12:26:29 AM
to verifybamid
OK, thanks. I think I'll plot it against per-individual heterozygosity and let you know.
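A minimal sketch of the comparison proposed above, assuming genotype calls are available as VCF-style strings (`0/0`, `0/1`, `1|1`, `./.` for missing) and FREEMIX values have been collected per sample. The function names are hypothetical and the data are placeholders, not real results. If FREEMIX mainly reflects contamination-driven excess heterozygosity, the two should be positively correlated; population structure or divergence from the allele-frequency panel can also shift heterozygosity, so a correlation alone does not prove contamination.

```python
import math


def het_rate(genotypes):
    """Fraction of called genotypes that are heterozygous (e.g. '0/1', '1|0')."""
    called = [g for g in genotypes if "." not in g]  # skip missing calls like './.'
    if not called:
        return float("nan")
    het = sum(1 for g in called if len(set(g.replace("|", "/").split("/"))) > 1)
    return het / len(called)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Placeholder data: (genotype calls, FREEMIX) per sample; not real results.
samples = {
    "sample_1": (["0/0", "0/1", "1/1", "0/0"], 0.004),
    "sample_2": (["0/1", "0/1", "1/1", "./."], 0.021),
    "sample_3": (["0/1", "1|0", "0/1", "0/0"], 0.048),
}
hets = [het_rate(g) for g, _ in samples.values()]
freemix = [f for _, f in samples.values()]
print(hets)
print(pearson(hets, freemix))
```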