Are we sure of accuracy of the scores/loadings?

105 views
Skip to first unread message

Florian Privé

unread,
Nov 29, 2016, 9:19:09 AM11/29/16
to flashpca-users
Hello,

Am I doing something wrong or the algo can't give perfect accuracy?

I tried flashpcaR on a pruned POPRES data set (1385 individuals and 91919 SNPs).

The command I used is:

testFlash <- flashpca(X = geno, method = "svd", stand = "binom", do_loadings = TRUE)

I compared it with:

p <- colMeans(geno) / 2
sd
<- sqrt(2 * p * (1 - p))
geno2
<- sweep(sweep(geno, 2, 2 * p, '-'), 2, sd, '/')
true <- svd(geno2, nu = 10, nv = 10)

The results are not equal (I should get a neater cross):

Gad Abraham

unread,
Nov 29, 2016, 10:08:50 PM11/29/16
to flashpca-users
Hi Florian,

The 'binom' standardisation is the old Eigensoft definition of sd = sqrt(p * (1 - p)). To get sqrt(2 * p * (1 - p)) you need "stand='binom2'".

Also, if you increase the number of extra dimensions (e.g., nextra=50) you'll get more accurate results.

Cheers,
Gad

Florian Privé

unread,
Nov 30, 2016, 5:57:28 AM11/30/16
to flashpca-users
Hi Gad,

If I am correct, changing the scaling to "binom2" would just change the singular values (D in X = U D Vt), which I am not interested in.
Moreover, changing nextra to 50 didn't improve the accuracy of the results.

Best,
Florian

Gad Abraham

unread,
Nov 30, 2016, 6:09:08 AM11/30/16
to Florian Privé, flashpca-users
If you post results with verbose=TRUE, we can see if there are convergence issues etc.

In any case, the upcoming version of flashpcaR will switch to a faster and more accurate algorithm based on RSpectra.

--
You received this message because you are subscribed to the Google Groups "flashpca-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flashpca-users+unsubscribe@googlegroups.com.
To post to this group, send email to flashpca-users@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/flashpca-users/63f931fc-c628-4193-8147-19cd5033abed%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Florian Privé

unread,
Nov 30, 2016, 7:16:10 AM11/30/16
to flashpca-users
[Wed Nov 30 13:12:07 2016] Transpose: no
[Wed Nov 30 13:12:07 2016] standardizing matrix (BINOM) p: 87025
[Wed Nov 30 13:12:08 2016] dim(Y): 1385 x 60
[Wed Nov 30 13:12:08 2016] dim(X): 1385 x 87025
[Wed Nov 30 13:12:08 2016] Trace(K): 174648 (N: 1385)
[Wed Nov 30 13:12:08 2016] iter 0 (orthogonalising) 0.00144023
[Wed Nov 30 13:12:10 2016] iter 1 (orthogonalising) 0.00132834
[Wed Nov 30 13:12:13 2016] iter 2 (orthogonalising) 0.00113871
[Wed Nov 30 13:12:15 2016] iter 3 (orthogonalising) 0.00118289
[Wed Nov 30 13:12:17 2016] iter 4 (orthogonalising) 0.0012768
[Wed Nov 30 13:12:19 2016] iter 5 (orthogonalising) 0.00115752
[Wed Nov 30 13:12:22 2016] iter 6 (orthogonalising) 0.00096402
[Wed Nov 30 13:12:24 2016] iter 7 (orthogonalising) 0.00086798
[Wed Nov 30 13:12:26 2016] iter 8 (orthogonalising) 0.000843568
[Wed Nov 30 13:12:29 2016] iter 9 (orthogonalising) 0.000723562
[Wed Nov 30 13:12:31 2016] iter 10 (orthogonalising) 0.000578304
[Wed Nov 30 13:12:33 2016] iter 11 (orthogonalising) 0.000602857
[Wed Nov 30 13:12:35 2016] iter 12 (orthogonalising) 0.000626901
[Wed Nov 30 13:12:38 2016] iter 13 (orthogonalising) 0.000867501
[Wed Nov 30 13:12:40 2016] iter 14 (orthogonalising) 0.000386019
[Wed Nov 30 13:12:42 2016] iter 15 (orthogonalising) 0.000312457
[Wed Nov 30 13:12:44 2016] iter 16 (orthogonalising) 0.000506404
[Wed Nov 30 13:12:47 2016] iter 17 (orthogonalising) 0.000626406
[Wed Nov 30 13:12:49 2016] iter 18 (orthogonalising) 0.00043404
[Wed Nov 30 13:12:51 2016] iter 19 (orthogonalising) 0.000505845
[Wed Nov 30 13:12:53 2016] iter 20 (orthogonalising) 0.000433905
[Wed Nov 30 13:12:56 2016] iter 21 (orthogonalising) 0.000530151
[Wed Nov 30 13:12:58 2016] iter 22 (orthogonalising) 0.000481853
[Wed Nov 30 13:13:00 2016] iter 23 (orthogonalising) 0.000409389
[Wed Nov 30 13:13:03 2016] iter 24 (orthogonalising) 0.000385137
[Wed Nov 30 13:13:05 2016] iter 25 (orthogonalising) 0.000361632
[Wed Nov 30 13:13:07 2016] iter 26 (orthogonalising) 0.000433815
[Wed Nov 30 13:13:09 2016] iter 27 (orthogonalising) 0.000240931
[Wed Nov 30 13:13:12 2016] iter 28 (orthogonalising) 0.000433697
[Wed Nov 30 13:13:14 2016] iter 29 (orthogonalising) 0.000505163
[Wed Nov 30 13:13:16 2016] iter 30 (orthogonalising) 0.000409389
[Wed Nov 30 13:13:18 2016] iter 31 (orthogonalising) 0.000288709
[Wed Nov 30 13:13:21 2016] iter 32 (orthogonalising) 0.000241091
[Wed Nov 30 13:13:23 2016] iter 33 (orthogonalising) 0.000144611
[Wed Nov 30 13:13:25 2016] iter 34 (orthogonalising) 0.000192918
[Wed Nov 30 13:13:28 2016] iter 35 (orthogonalising) 0.000144481
[Wed Nov 30 13:13:30 2016] iter 36 (orthogonalising) 0.000240949
[Wed Nov 30 13:13:32 2016] iter 37 (orthogonalising) 0.000337097
[Wed Nov 30 13:13:34 2016] iter 38 (orthogonalising) 0.000216845
[Wed Nov 30 13:13:37 2016] iter 39 (orthogonalising) 0.000433425
[Wed Nov 30 13:13:39 2016] iter 40 (orthogonalising) 0.000481514
[Wed Nov 30 13:13:41 2016] iter 41 (orthogonalising) 2.12373e-07
[Wed Nov 30 13:13:44 2016] QR begin
[Wed Nov 30 13:13:44 2016] dim(Q): 1385 x 60
[Wed Nov 30 13:13:44 2016] QR done
[Wed Nov 30 13:13:45 2016] dim(B): 60 x 87025
[Wed Nov 30 13:13:45 2016] SVD begin
[Wed Nov 30 13:13:46 2016] SVD done
[Wed Nov 30 13:13:46 2016] dim(Et): 60 x 60

Gad Abraham

unread,
Nov 30, 2016, 7:47:32 AM11/30/16
to Florian Privé, flashpca-users
Hmm, nothing unusual in the logs. Is that version 1.2.6?

It's working for the HapMap3 data included on github, try this to make sure:

> library(plink2R) # https://github.com/gabraham/plink2R
> dat <- read_plink("HapMap3/data", impute="random")
> library(flashpcaR)
> geno <- dat$bed
> testFlash <- flashpca(X = geno, method = "svd", stand = "binom", do_loadings = TRUE)
> p <- colMeans(geno) / 2
> sd <- sqrt(2 * p * (1 - p))
> geno2 <- sweep(sweep(geno, 2, 2 * p, '-'), 2, sd, '/')
> true <- svd(geno2, nu = 10, nv = 10)
> cor(true$u[,1], testFlash$vectors[,1])
> diag(cor(true$u, testFlash$vectors))
[1] -1.0000000 -1.0000000 1.0000000 -1.0000000 -1.0000000 -0.9999943
[7] -0.9997627 0.9993527 -0.9964212 0.9966891
> --
> You received this message because you are subscribed to the Google Groups
> "flashpca-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to flashpca-user...@googlegroups.com.
> To post to this group, send email to flashpc...@googlegroups.com.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/flashpca-users/f3022066-5d4e-4d21-be13-785b5c6a412e%40googlegroups.com.

Florian Privé

unread,
Dec 3, 2016, 12:12:35 PM12/3/16
to flashpca-users
I get something similar indeed.
I fear that randomized PCA gets more inaccurate with larger samples.

Thanks for your time anyway.

Florian
Reply all
Reply to author
Forward
0 new messages