Imputation with qtl2

Bowen Jones

Nov 24, 2020, 9:21:23 AM

to R/qtl2 discussion

Hi Dr. Broman,

As I previously mentioned, we have data from an RIL population that we wish to impute using qtl2. We are fitting a dglm model, which has no tolerance for missing values whatsoever, so we require a complete imputation. Additionally, the existing data have been verified with multiple methods, so we are confident in their accuracy and wish to retain their original values.

I used the viterbi(), sim_geno() and maxmarg() functions on our data and found that they all update the existing data and provide 3 output values as opposed to our desired 2 ("A" or "B"). Is there a way to accomplish both of our goals?

Thanks,

Bowen Jones

Karl Broman

Nov 24, 2020, 10:38:20 AM

to R/qtl2 discussion

Dealing with missing genotype information has been a fundamental part of QTL analysis (see, for example, the Lander and Botstein (1989) paper on interval mapping). If you are going to use a regression method that doesn't allow for missing covariate information, my main suggestion would be to use the approach of Haley and Knott (1992), which is to use genotype probabilities in place of the genotypes themselves.
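To illustrate the Haley and Knott idea, here is a minimal sketch, not qtl2 code. It assumes a two-genotype RIL, so the covariate at a position is just Pr(BB) for each line: 0 or 1 where the marker was typed cleanly, and something in between where it was not. The function name `haley_knott_fit` and the numbers are made up for illustration.

```python
def haley_knott_fit(prob_bb, pheno):
    """Least-squares fit of pheno ~ intercept + slope * Pr(BB).

    prob_bb: Pr(BB) for each line at one position (no missing values,
             because an untyped marker still has a probability).
    pheno:   phenotype for each line.
    """
    n = len(pheno)
    mean_x = sum(prob_bb) / n
    mean_y = sum(pheno) / n
    sxx = sum((x - mean_x) ** 2 for x in prob_bb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prob_bb, pheno))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data for six lines at one position.
# 0.5 stands for a line whose genotype there is completely uncertain.
prob_bb = [0.0, 1.0, 0.5, 1.0, 0.0, 0.5]
pheno = [10.0, 14.0, 12.0, 14.0, 10.0, 12.0]

intercept, slope = haley_knott_fit(prob_bb, pheno)
print(intercept, slope)
```

The point of the sketch is only that the probability column is a complete covariate, so nothing has to be hard-called before the regression.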

If you need hard calls of genotypes, then the three basic methods are the ones you mentioned. The main approach we take is to separate the sequence of true underlying genotypes, G = (G_1, G_2, ..., G_M), from the observed marker genotypes, O = (O_1, O_2, ..., O_M), allowing for some constant genotyping error rate. viterbi() finds the sequence G that maximizes the joint probability Pr(G | O). maxmarg() instead picks, at each position i separately, the genotype that maximizes the marginal probability Pr(G_i | O). sim_geno() simulates sequences from the joint distribution Pr(G | O).
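The difference between the first two can be sketched with a toy hidden Markov model. This is not qtl2's implementation, just an illustration under stated assumptions: two genotypes coded 1 and 2, 0 for a missing observation, a given recombination fraction between adjacent markers, and a constant genotyping error probability. All function names here are hypothetical, and the simulation case is left out.

```python
STATES = (1, 2)  # 1 = AA, 2 = BB; an observation of 0 means missing


def emit(obs, g, error_prob):
    """Pr(observed call | true genotype g); a missing call carries no information."""
    if obs == 0:
        return 1.0
    return 1.0 - error_prob if obs == g else error_prob


def trans(g_from, g_to, rec_frac):
    """Pr(genotype at next marker | genotype at this marker)."""
    return 1.0 - rec_frac if g_from == g_to else rec_frac


def viterbi_path(obs, rec_fracs, error_prob):
    """Most probable whole sequence G given O (joint maximization)."""
    best = {g: 0.5 * emit(obs[0], g, error_prob) for g in STATES}
    back = []
    for i in range(1, len(obs)):
        new_best, pointers = {}, {}
        for g in STATES:
            prev = max(STATES, key=lambda h: best[h] * trans(h, g, rec_fracs[i - 1]))
            pointers[g] = prev
            new_best[g] = (best[prev] * trans(prev, g, rec_fracs[i - 1])
                           * emit(obs[i], g, error_prob))
        best = new_best
        back.append(pointers)
    g = max(STATES, key=lambda h: best[h])
    path = [g]
    for pointers in reversed(back):  # trace the best path backwards
        g = pointers[g]
        path.append(g)
    return path[::-1]


def marginal_probs(obs, rec_fracs, error_prob):
    """Pr(G_i | O) at each position, by the forward-backward equations."""
    n = len(obs)
    fwd = [{g: 0.5 * emit(obs[0], g, error_prob) for g in STATES}]
    for i in range(1, n):
        fwd.append({g: emit(obs[i], g, error_prob)
                    * sum(fwd[-1][h] * trans(h, g, rec_fracs[i - 1]) for h in STATES)
                    for g in STATES})
    bwd = [{g: 1.0 for g in STATES}]
    for i in range(n - 2, -1, -1):
        nxt = bwd[0]
        bwd.insert(0, {g: sum(trans(g, h, rec_fracs[i])
                              * emit(obs[i + 1], h, error_prob) * nxt[h]
                              for h in STATES)
                       for g in STATES})
    probs = []
    for f, b in zip(fwd, bwd):
        total = sum(f[g] * b[g] for g in STATES)
        probs.append({g: f[g] * b[g] / total for g in STATES})
    return probs


# Hypothetical line: six markers, two of them missing.
obs = [1, 0, 1, 2, 0, 2]
rec_fracs = [0.05] * 5

path = viterbi_path(obs, rec_fracs, error_prob=1e-4)
probs = marginal_probs(obs, rec_fracs, error_prob=1e-4)
marg_calls = [max(p, key=p.get) for p in probs]
print(path)        # [1, 1, 1, 2, 2, 2]
print(marg_calls)
```

The sketch multiplies raw probabilities, which is fine for a handful of markers but would underflow on a real chromosome; a real implementation would work on the log scale.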

If you set error_prob=0, you should get imputed values that match the observed ones, but for technical reasons I think we force error_prob to be slightly greater than 0, so you may not be able to get the values to match exactly. The results of viterbi() and sim_geno() will have no missing values. To eliminate missing values from the maxmarg() results, you'd set minprob=0.
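A small sketch of what the threshold does, and of one way to keep the observed calls untouched. This is an illustration with made-up numbers, not qtl2 code: `call_genotypes` mimics the idea of a minprob cutoff on per-position probabilities, and `fill_missing_only` is a hypothetical helper that takes an imputed value only where the observed call is missing (coded 0 here).

```python
def call_genotypes(probs, minprob):
    """Most probable genotype per position; 0 (missing) if its probability is below minprob."""
    calls = []
    for p in probs:
        best = max(p, key=p.get)
        calls.append(best if p[best] >= minprob else 0)
    return calls


def fill_missing_only(observed, imputed):
    """Keep every observed call; use the imputed call only where observed is 0."""
    return [imp if obs == 0 else obs for obs, imp in zip(observed, imputed)]


# Hypothetical line with four markers; genotypes coded 1 and 2, 0 = missing.
observed = [1, 0, 2, 0]
probs = [
    {1: 0.999, 2: 0.001},
    {1: 0.6, 2: 0.4},
    {1: 0.7, 2: 0.3},   # imputation disagrees with the observed call of 2
    {1: 0.02, 2: 0.98},
]

strict = call_genotypes(probs, minprob=0.95)  # uncertain positions stay missing
loose = call_genotypes(probs, minprob=0.0)    # no missing values remain
merged = fill_missing_only(observed, loose)   # observed calls win where present
print(strict, loose, merged)
```

With a nonzero error probability the most probable genotype can differ from an observed call, as at the third marker above, which is why the merge step is shown separately.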

karl

Bowen Jones

Nov 24, 2020, 6:16:07 PM

to R/qtl2 discussion

Dr. Broman,

The problem that I'm encountering with the output of each of these functions is that the resulting values are not only "1" and "3" (which I take to be "A" and "B" respectively), but some are occasionally imputed as a "2", which means we have three unique values possible for the cells, while our data should only have two. Therefore I take "2" to be "NA" or some sort of imputation failure. Am I misinterpreting the 2?

Bowen

Message has been deleted

Karl Broman

Nov 24, 2020, 6:21:22 PM

to R/qtl2 discussion

[Correcting my last post, now deleted.]

What cross type are you using? If it's "risib" or "riself", then the imputed genotypes should be all 1's and 2's.

karl

Bowen Jones

Nov 24, 2020, 6:22:02 PM

to R/qtl2 discussion

We're using f2. Would it be best to try a different cross type?

Bowen

Bowen Jones

Nov 24, 2020, 8:00:10 PM

to R/qtl2 discussion

Changing the cross type did indeed solve the three-value problem, thank you!

Bowen
