Phenotype that is not normally distributed

William Pu

Sep 8, 2022, 8:12:28 AM

to R/qtl2 discussion

A quantitative trait that we are analyzing is not normally distributed. How should we deal with this in an R/qtl2 analysis? The trait is treadmill endurance. In our study, mutant mice have low treadmill endurance, but normal mice have endurance that exceeds the length of the study (i.e., they saturate the assay). While this could be coded as a binary trait, that would lose the quantification of the phenotype in mutants, which varies considerably between mutants on different genetic backgrounds.

Karl Broman

Sep 8, 2022, 10:39:51 AM

to R/qtl2 discussion

Our main strategies for dealing with non-normal traits are:

- transform the phenotype (for example, take log or square-root)

- use a robust method, such as a rank-based one. A related approach is to convert the phenotypes to ranks and then to normal quantiles, as with the nqrank function in the original R/qtl package (qtl::nqrank)

- use a method that is tailored to the specific phenotype distribution, such as with a generalized linear model. This isn't implemented within R/qtl2, but you could use the results of calc_genoprob() and do this on your own. If you need to get linear mixed models in there too, it starts to get really hard.
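As a rough illustration of the second strategy (ranks converted to normal quantiles), here is a minimal sketch. It is written in Python purely for illustration and is not the code of qtl::nqrank; the function name `normal_quantile_ranks` is hypothetical, and the choices made here (average ranks for ties, quantiles taken at (rank - 0.5)/n, missing values passed through as None) are assumptions that may differ from what the R function does.

```python
from statistics import NormalDist


def normal_quantile_ranks(values):
    """Replace each observed value by the standard-normal quantile of its rank.

    Missing values (None) stay None. Tied values share their average rank,
    so they also share one transformed value.
    """
    # Keep (value, original position) pairs for the observed entries only.
    observed = sorted(
        ((v, i) for i, v in enumerate(values) if v is not None),
        key=lambda pair: pair[0],
    )
    n = len(observed)
    result = [None] * len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1

    start = 0
    while start < n:
        # Find the end of the current run of tied values.
        end = start
        while end + 1 < n and observed[end + 1][0] == observed[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the run
        z = std_normal.inv_cdf((avg_rank - 0.5) / n)
        for _, idx in observed[start:end + 1]:
            result[idx] = z
        start = end + 1
    return result


# Made-up endurance times (minutes); 60.0 stands for animals that hit the
# end of the assay, and None is a missing measurement.
endurance = [12.0, 30.0, 18.5, 60.0, 60.0, 60.0, None]
print(normal_quantile_ranks(endurance))
```

Note that animals that saturate the assay all share the same value, so they also share the same rank and the same transformed value. A rank-based transform reshapes the rest of the distribution but does not remove that pile-up at the top.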

karl

Mark Sfeir

Sep 13, 2022, 3:54:24 PM

to R/qtl2 discussion

Hi,

I'm trying to build my understanding of the significance of assumptions about normality in relation to the statistical models in qtl2, and I noticed a focus in this thread on the normal distribution of the trait itself, as opposed to normal distribution of the residuals of that trait. Is there a significance to the normality of the simple trait distribution that is good to understand here (beyond just questions of "are there concerning outliers in this data?"), or is this thread really concerned with normal distribution of the residuals, though not explicitly stated?

I'd appreciate anyone's input here.

-Mark Sfeir

Dan Gatti

Sep 13, 2022, 4:23:23 PM

to rqtl2...@googlegroups.com

It’s true: as taught in any linear models course, one of the assumptions is that the residuals are normally distributed. I think that, in practice, having your response be normally distributed, as opposed to log-normal or something really skewed, helps to produce normally distributed residuals. It doesn’t guarantee it. But we can’t do the usual model diagnostics at each marker in a QTL mapping study. So it’s a pragmatic way of trying to satisfy the linear model assumptions.

I don’t have any examples handy, but I have seen QTL get stronger when the phenotype was standardized. And I’ve seen spurious QTL disappear once the phenotype was transformed to make it more normally distributed.
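To make the link between a skewed response and skewed residuals concrete, here is a small simulated sketch. Everything in it is an assumption for illustration only: a single marker with three genotype groups, a log-normal phenotype, made-up effect sizes, and hypothetical helper names. It is written in Python rather than R and does not use R/qtl2. Residuals are taken around each genotype group's mean, which is what a one-marker linear model would fit.

```python
import math
import random
from statistics import mean


def skewness(xs):
    """Sample skewness: third central moment over variance to the power 1.5."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def group_residuals(phenotype, genotypes):
    """Residuals after subtracting each genotype group's mean."""
    by_group = {}
    for y, g in zip(phenotype, genotypes):
        by_group.setdefault(g, []).append(y)
    group_mean = {g: mean(ys) for g, ys in by_group.items()}
    return [y - group_mean[g] for y, g in zip(phenotype, genotypes)]


def simulate(n=2000, seed=1):
    """Simulated data: genotype coded 0/1/2, phenotype log-normal given genotype."""
    rng = random.Random(seed)
    genotypes = [rng.choice((0, 1, 2)) for _ in range(n)]
    phenotype = [math.exp(0.5 * g + rng.gauss(0.0, 0.8)) for g in genotypes]
    return genotypes, phenotype


genotypes, phenotype = simulate()
skew_raw = skewness(group_residuals(phenotype, genotypes))
skew_log = skewness(group_residuals([math.log(y) for y in phenotype], genotypes))
print("residual skewness, raw phenotype:", round(skew_raw, 2))
print("residual skewness, log phenotype:", round(skew_log, 2))
```

In this setup the residuals of the raw phenotype are strongly right-skewed, while the residuals of the log-transformed phenotype are close to symmetric. Real data will not be this clean, so the sketch only illustrates the direction of the effect.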

Dan

Mark Sfeir

Sep 14, 2022, 12:35:33 PM

to R/qtl2 discussion

Thanks, Dan! I wondered if normal trait distribution was a common proxy for normal distribution of residuals.
