Providing SKAT residual values

33 views
Skip to first unread message

Mary

unread,
Jan 21, 2015, 10:28:47 AM1/21/15
to SKAT...@googlegroups.com
 Hello,

 I was wondering if it is possible to provide SKAT with the value of mu hat without running the SKAT_Null_Model ? I ask because I have very wide disparity in the data for one of my covariates, and I am seeing the following error message:

Error in solve.default(t(X1) %*% (X1 * pi_1)) :
  system is computationally singular: reciprocal condition number = 6.56745e-17

I suspect that this has to do with the regression, and I was hoping that I can circumvent this by running a Regression in R with glm to get the residuals, and providing this value to SKAT. 
I do not encounter this issue when I drop the covariate from my analysis. Also, I do not want to remove samples that contribute to the large disparity in my data.

Also, can you please tell me what pi_1 represents?

Thank you!

Seunggeun (Shawn) Lee

unread,
Jan 21, 2015, 4:27:15 PM1/21/15
to SKAT...@googlegroups.com
Hi Mary,

1) It is possible, but you need to read SKAT source code (SKAT_Null_Model and Get_SKAT_Residuals.logistic functions) to figure out the elements in SKAT_Null_Model outcomes. 

pi_1 = mu * (1 - mu), where mu is a vector of estimated mean of Y (mu = Prob(Y=1)). In Get_SKAT_Residuals.logistic function you can find 


    mod = lm(formula, data)
    X1 <- model.matrix(formula, data = data)
    glmfit = glm(formula, data = data, family = "binomial")
    betas = glmfit$coef
    mu = glmfit$fitted.values
    eta = glmfit$linear.predictors
    n.case = sum(glmfit$y)
    pi_1 = mu * (1 - mu)
    res = glmfit$y - mu

2) Have you checked whether your covariates matrix is singular? X1 is a covariates matrix (with the intercept), and the error shows that X1^T Pi X1 is singular, where
Pi is a diagonal matrix of pi_1. 

Thanks,
Shawn


John Wallace

unread,
Jan 22, 2015, 11:20:44 AM1/22/15
to SKAT...@googlegroups.com
Shawn,

I'm a programmer helping Anna out, and I think I can help shed some additional light on the issue that we were seeing.  The design matrix that we were using wasn't singular, but it was particularly ill-conditioned (condition number ~10^10).  Typically, when the matrix X1 is singular, R will drop one or more columns so that the resultant columns of the design matrix are linearly independent.

I found the following line in KMTest_Logistic.R which is where I think this is happening:
  W.1 = t(Z) %*% (Z * pi_1) - (t(Z * pi_1) %*%X1)%*%solve(t(X1)%*%(X1 * pi_1))%*% (t(X1) %*% (Z * pi_1)) # t(Z) P0 Z

From the above line, it looks like you're using the normal equations to solve the weighted least squares problem X1 * M = Z (weights of pi_1, solve for M).  Using the normal equations can have some undesired stability properties, as the condition number of t(X1) %*% X1 is the square of the condition number of X1, which in our case was singular to machine precision.  To avoid the error, you can use the SVD of the matrix X1 as in the following lines:
  S <- svd(sqrt(pi_1) * X)
  W.1 = t(Z) %*% (Z * pi_1) - (t(Z * pi_1) %*% X1) %*% (S$v %*% ((t(S$u) %*% (sqrt(pi_1) * Z)) / S$d) )

The two lines should give identical results in infinite precision math, but the second should have better stability using floating point operations.  We tested the fix on our data, and we no longer saw the "computationally singular" messages.

I've only looked at KMTest_Logistic.R, but I did a quick grep for "solve(", and the following files came up:
Joint_CommonRare.R
KMTest_Linear.R
KMTest_Logistic_VarMatching.R
KMTest_Optimal.R
KMTest_Optimal_VarMatching.R
Null_Model.R
SKAT_EMMAX.R

I suspect that they could also benefit from a similar approach.  I'd be happy to help out if you need it.

Thanks for your help,

John Wallace

Seunggeun (Shawn) Lee

unread,
Jan 30, 2015, 4:19:32 PM1/30/15
to SKAT...@googlegroups.com
Hi John,

Thanks a lot. I will consider to use your approach. 

Thanks,
Shawn

Tokhir Dadaev

unread,
Jun 9, 2015, 5:12:37 AM6/9/15
to SKAT...@googlegroups.com
Hi John

Do you have by any chance a forked github version of SKAT with your suggested updates?

Thanks
Tokhir

John Wallace

unread,
Jun 9, 2015, 10:49:09 AM6/9/15
to SKAT...@googlegroups.com
Tokhir,

I haven't implemented the updates for anything except the KMTest_Logistic.R code (and it's only the 2 line drop-in replacement).  If you point me to the correct github repo, I can fork it and create a pull request, but it might take me a few days to look at all of the "solve(" instances.

-John

Tokhir Dadaev

unread,
Jun 9, 2015, 11:02:10 AM6/9/15
to SKAT...@googlegroups.com
Not sure where Shawn keeps his code, but below is the automatically generated SKAT from cran:

https://github.com/cran/SKAT


Seunggeun (Shawn) Lee

unread,
Jun 14, 2015, 6:14:13 AM6/14/15
to SKAT...@googlegroups.com
Hi,

Yes you can find the latest version of SKAT from cran. 

Thanks,
Shawn
Reply all
Reply to author
Forward
0 new messages