question on kinship correction and model comparison

Fernando Andrade

unread,

Jun 25, 2024, 2:13:29 PM6/25/24

to R/qtl2 discussion

Dear Karl, I am going over qtl2 functions to understand all the steps that are taken in the analysis. I am a little lost in the correction for the kinship and how model comparison is done.

In general, from what I gathered, the steps are:

1 - do the eigen decomposition of the kinship

2 - premultiply trait matrix, genetic array and covariate matrix by the eigen vectors (or their transposed?)

3 - run, for each marker, a simple linear model with the "corrected" trait as a response variable and the "corrected" covariates plus "corrected" genotype array as independent variables. Get the residual sum of squares (RSS) for this fit

4 - Compare the RSS fromm #3 to a RSS obtained from a null model.

Am I correct in these general steps?

While reading the qtl2 functions I got lost in how to get the null. It is not simply dropping the genotype 3d array from the model and then comparing the two models, right? While trying to understand how to do this, I was looking into calc_nullLL_clean() and had two dificulties:

a) There is a comment saying to premultiply traits and geno by the transpose of eigenvectors, but the code shows a premultiplication using the eigenvectors (not the transposed)

b) I want's able to find the function Rcpp_calcLL_mat() that is called inside calc_nullLL_clean().

I tried reading the inner functions themselves to understand how the kinship correction and significance testing are done.

Thank you for your attention

Karl Broman

unread,

Jun 25, 2024, 8:06:14 PM6/25/24

to R/qtl2 discussion

It’s maybe easier to try to follow the code in lmmlite, which doesn’t have all the complexities used in qtl2.

https://kbroman.org/lmmlite/

We can’t compare RSS because of the mixed model; we need to calculate log likelihoods.

Rcpp_calcLL_mat is in the c++ code,
https://github.com/kbroman/qtl2/blob/main/src/lmm.cpp#L216

karl

Fernando Andrade

unread,

Jul 3, 2024, 9:29:46 AM7/3/24

to R/qtl2 discussion

Awesome, thank you for pointing to lmmlite, it is indeed easier to see what is going on.

About qtl2 I would like to clear up some questions, if I may.

Considering that I am running qtl2 with a kinship matrix, scan1() calls scan1_pg(), which calls scan_pg_onechr on its turn.

a) In scan_pg_onechr (lines 61:63), we are regressing the covariates (rotated) from X and Y (both rotated), respectively, right? So we are getting the residuals from this step?

b) (lines 64:66) the actual scan is done using the residuals of the rotated X and Y, right? So, after rotating X and Y, one can perform a scan without worrying for the correlation among individuals, is that correct?

Thank you for all your attention

Fernando

Karl Broman

unread,

Jul 3, 2024, 10:14:25 AM7/3/24

to R/qtl2 discussion

Yes, we first estimate the residual heritability under the null hypothesis (with no QTL and so no genotypes). In the case of "loco", this is done separately for each chromosome,

Second, we can use that estimate to then "rotate" the genotypes, additive covariates, and phenotypes, turning the problem into a weighted least squares problem.

Before doing the genotype scan, we eliminate the additive coviarates by regressing them out of the genotypes and the phenotypes.

So then we can do a scan over the chromosome, with no covariates.

Yes after the rotation we've turned the general least squares problem into a weighted least squares problem.

Another confusing aspect of this is allowing for weights in the scan, in which case the covariance matrix becomes where is a diagonal matrix with the reciprocal of the weights.

But we first deal with those weights by pre-multiplying everything by the square-root of the weights, which eliminates them from the problem.

Within scan1_pg_onechr, those weights have been eliminated, and the weights being considered are the ones calculated to handle the kinship business

https://github.com/rqtl/qtl2/blob/main/R/scan1_pg.R#L393-L408

karl

Fernando Andrade

unread,

Jul 3, 2024, 11:24:48 AM7/3/24

to R/qtl2 discussion

About the weights

a) what are those exactly? In my qtl2 run I never used those, so, I assume they are set to null in my runs.

b) setting weights = NULL avoid the usage of D in the cov matrix? Also, in scan1_pg (lines 149:152) are "avoided", right?

c) scan1_pg and scan_pg_onechr both have multiplication by weights, is that correct or am I missing something?

d) are the weigths used in qtl2 similar to those in the S vector (line117) in the lmmLite code?

Sorry for all those questions, but I am just trying to wrap my mind around this kind of analysis

Thank you for your attention

Fernando

Karl Broman

unread,

Jul 3, 2024, 11:38:38 AM7/3/24

to R/qtl2 discussion

If you leave weights=NULL, then a vector of 1's will be used. The D matrix becomes the identity matrix.

The input weights in scan1_pg are these weights, while those in scan1_pg_onechr they have already been eliminated and the weights are to deal with the polygenic effect.

This wasn't implemented in lmmlite.

karl

Fernando Andrade

unread,

Jul 3, 2024, 11:42:18 AM7/3/24

to R/qtl2 discussion

Thank you!

Reply all

Reply to author

Forward