Understanding projecting down

Zach N

Apr 8, 2021, 11:12:19 AM
to dadi-user
Hello all!

I am trying to understand projecting down a one-population SFS to deal with missing genotype information. From what I understand, when you project down, you use all SNPs that have calls for at least as many individuals as you project down to, and SNPs with fewer calls are discarded.

When I sum the SFS produced by projecting to the actual sample size, I get, as expected, the number of SNPs in our dataset that are called in all individuals. However, if I project down to 60% of the actual sample size, the SFS sums to more SNPs, but not to as many as the number of SNPs in our dataset with genotype calls for at least 60% of individuals. Is this expected? Are all sites with at least 60% calls still being used, even though the projected SFS doesn't sum to that number of SNPs?
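A toy illustration of this counting question, as a minimal sketch only. It assumes projection averages each SNP over all subsamples of `proj_to` chromosomes drawn without replacement (hypergeometric probabilities). The helpers `project_snp` and `project_sfs` and the SNP data are hypothetical, not dadi's API. Every SNP with enough calls contributes a total weight of 1, but part of that weight can land in the monomorphic bins (0 or `proj_to` derived copies), which dadi spectra typically mask, so summing the polymorphic bins gives less than the number of SNPs that passed the threshold.

```python
from math import comb

def project_snp(n_called, derived, proj_to):
    """Probability that a random subsample of proj_to chromosomes (drawn without
    replacement from n_called) carries j derived alleles, for j = 0..proj_to."""
    total = comb(n_called, proj_to)
    return [comb(derived, j) * comb(n_called - derived, proj_to - j) / total
            for j in range(proj_to + 1)]

def project_sfs(snps, proj_to):
    """Sum per-SNP projections; SNPs with fewer than proj_to called chromosomes are skipped."""
    sfs = [0.0] * (proj_to + 1)
    n_contributing = 0
    for n_called, derived in snps:
        if n_called < proj_to:
            continue  # too few calls to project down to proj_to
        n_contributing += 1
        for j, p in enumerate(project_snp(n_called, derived, proj_to)):
            sfs[j] += p
    return sfs, n_contributing

# Hypothetical SNPs as (called chromosomes, derived allele count); all segregating.
snps = [(10, 1), (10, 2), (8, 1), (6, 3), (4, 1)]
sfs, n_contributing = project_sfs(snps, proj_to=6)

# Count of contributing SNPs, total mass over all bins, mass in polymorphic bins 1..5.
print(n_contributing, f"{sum(sfs):.2f}", f"{sum(sfs[1:-1]):.2f}")  # 4 4.00 3.22
```

Four of the five hypothetical SNPs pass the call threshold, and the projected spectrum still holds four SNPs' worth of weight in total, but only about 3.22 of it sits in the polymorphic bins.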

Thank you all for your help!

Zach

Ryan Gutenkunst

Apr 8, 2021, 12:19:01 PM
to dadi...@googlegroups.com
Hello Zach,

Yes, this is expected. When you project down, SNP frequencies get “smeared out” by binomial sampling. For example, imagine a SNP observed in 1 out of 10 sampled chromosomes. If you were to subsample 6 chromosomes, that SNP often won’t be segregating in the subsample. Something similar happens in projection, where we average over all possible subsamples.
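The 1-out-of-10 example can be worked through directly. This is a minimal sketch that treats the subsampling as drawing chromosomes without replacement (hypergeometric probabilities); the function name `hypergeom_pmf` is hypothetical and not part of dadi.

```python
from math import comb

def hypergeom_pmf(j, n, k, m):
    """Probability that a subsample of m chromosomes, drawn without replacement
    from n chromosomes of which k carry the derived allele, has j derived copies."""
    return comb(k, j) * comb(n - k, m - j) / comb(n, m)

n, k, m = 10, 1, 6  # 10 sampled chromosomes, 1 derived copy, subsample of 6
probs = [hypergeom_pmf(j, n, k, m) for j in range(m + 1)]

# Weight that falls into the monomorphic classes (0 or m derived copies).
p_lost = probs[0] + probs[m]
print(probs[:2], p_lost)  # [0.4, 0.6] 0.4
```

Under this assumption the SNP is missing from the subsample 40% of the time, so on average it adds only 0.6 of a SNP to the polymorphic part of the projected spectrum.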

Best,
Ryan

Zach N

Apr 8, 2021, 12:48:38 PM
to dadi-user
Thank you for the explanation, Ryan! That makes a lot of sense and clears things up.

Best,
Zach