recommended LD pruning for PCA?

Mike Miller

Mar 19, 2014, 3:16:12 PM

to plink2 users

Speed et al. (2012) pointed out that SNPs in LD are implicitly
overweighted in GCTA-style mixed-model heritability analyses:

http://www.cell.com/AJHG/abstract/S0002-9297(12)00533-2

Do you think this matters at all for computation of principal components
to be used in characterizing ancestry of samples and as covariates in
GWAS?

Do any of you recommend LD pruning of markers before computing genetic
relationship matrices for principal components?

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ

James Lee

Mar 19, 2014, 3:33:37 PM

to Mike Miller, plink2 users

Sending again, my first attempt for some reason was not posted:

I think lack of LD pruning would matter if some of the statistically significant PCs turned out to reflect local LD rather than global structure induced by geography or lab artifacts. You probably would not want to partial out PCs that merely reflected LD.

It takes a large sample size and marker number to pick up such artifactual PCs. You can diagnose their presence by inspecting the "SNP loadings" and seeing if the biggest ones are all adjacent in the SNP panel. If no such PCs seem to be present, I don't see any harm in using the entire panel of SNPs.
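A rough sketch of that loadings check, in Python with NumPy. It assumes the genotype matrix columns are ordered by genomic position, and the function names, `top_k`, and `window` are hypothetical choices for illustration, not anything smartpca itself provides:

```python
import numpy as np

def snp_loadings(genotypes, n_pcs):
    """Standardize each SNP column and return (n_snps, n_pcs) loadings from an SVD."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)
    sd = g.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic SNPs stay at zero instead of dividing by 0
    g = g / sd
    # Rows of vt are the SNP-space singular vectors, i.e. the SNP loadings.
    _, _, vt = np.linalg.svd(g, full_matrices=False)
    return vt[:n_pcs].T

def clustered_fraction(loading, top_k=20, window=50):
    """Fraction of the top_k |loading| SNPs within `window` indices of their median index."""
    loading = np.asarray(loading)
    top = np.argsort(np.abs(loading))[-top_k:]
    center = np.median(top)
    return float(np.mean(np.abs(top - center) <= window))
```

A value of `clustered_fraction` near 1 for some PC would suggest that its largest loadings sit in one stretch of the panel, which is the pattern to look for; a value near 0 suggests the loadings are spread across the panel. The thresholds would need tuning to the marker density of the actual panel.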

If you do see some relatively prominent LD-based PCs, you can rerun smartpca with an option that allows pruning or downweighting of SNPs by LD. In fact, some of smartpca's developers have proposed using this option to address the same problem raised by Speed et al.:

http://dx.plos.org/10.1371/journal.pgen.1003993

Mike Miller

Mar 20, 2014, 12:55:22 PM

to James Lee, plink2 users

Thanks, James. I didn't know about that new option in smartpca. I
haven't been using it lately because it is kind of awkward. Instead, I
just do it by hand using Octave's eigs() function. I'll bet R can do that,
too. I use plink2 to decide which subjects are sufficiently distantly
related to form the set for eigenvector computation, then I project their
relatives onto those vectors. It has been a lot easier than using
smartpca. But now that it has new features, maybe I have to return to
it.
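
For anyone who wants to try the same idea outside Octave, here is a minimal sketch in Python with NumPy. It is an illustration under assumptions, not the exact procedure described above: it uses numpy.linalg.eigh in place of eigs(), standardizes each SNP by its sample mean and standard deviation in the unrelated set, and the function names are hypothetical.

```python
import numpy as np

def fit_reference_pcs(ref_genotypes, n_pcs):
    """Eigen-decompose the GRM of the unrelated (reference) samples."""
    g = np.asarray(ref_genotypes, dtype=float)
    mean = g.mean(axis=0)
    sd = g.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for monomorphic SNPs
    z = (g - mean) / sd
    m = z.shape[1]
    grm = z @ z.T / m
    evals, evecs = np.linalg.eigh(grm)  # eigenvalues come back in ascending order
    order = np.argsort(evals)[::-1][:n_pcs]
    evals, evecs = evals[order], evecs[:, order]
    # SNP weights chosen so that projecting the reference set returns evecs.
    weights = z.T @ evecs / (m * evals)
    return mean, sd, weights, evecs

def project(genotypes, mean, sd, weights):
    """Project samples (e.g. relatives) using the reference means, SDs and weights."""
    z = (np.asarray(genotypes, dtype=float) - mean) / sd
    return z @ weights
```

Relatives are standardized with the reference set's means and standard deviations, so their scores land on the same axes as the unrelated samples without having influenced the eigenvectors.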

Mike
