best practices for non-normal phenotypes

Molly Edwards

Sep 10, 2020, 1:48:24 PM
to R/qtl discussion
Hello R/qtl pros,

I am working on a project exploring the genetic architecture of 14 floral traits related to pollination syndromes. In my F2 population, the phenotypes for five of these traits are normally distributed, and the rest are not. I am looking for advice on best practices for analyzing a dataset like this. Is transforming the non-normal phenotype data the way to go? If I don't, I can only go as far as scanone with model="np", correct? Ideally, I would run the stepwiseqtl analyses for all traits, not just the normally distributed ones.

Many thanks for your help!

Molly
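
For reference, the nonparametric scan mentioned in the question can be sketched as follows. This is a minimal sketch, assuming the R/qtl package is installed; it uses the `listeria` intercross that ships with R/qtl as stand-in data (not the floral traits from this thread), and the object names `out_np` and `out_hk` are illustrative.

```r
library(qtl)

# Stand-in data: the `listeria` F2 intercross shipped with R/qtl, whose
# survival-time phenotype is clearly non-normal. Not the floral data.
data(listeria)

# Genotype probabilities must be computed before scanning
listeria <- calc.genoprob(listeria, step = 2)

# Nonparametric single-QTL scan (an extension of the Kruskal-Wallis test)
out_np <- scanone(listeria, pheno.col = 1, model = "np")

# Normal-model scan by Haley-Knott regression, for comparison
out_hk <- scanone(listeria, pheno.col = 1, model = "normal", method = "hk")

# Largest LOD score on each chromosome under each model
summary(out_np)
summary(out_hk)
```

Comparing the two summaries side by side is one informal way to see whether the normal model and the rank-based scan point to the same regions.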

Christopher G Oakley

Sep 10, 2020, 2:11:47 PM
to rqtl...@googlegroups.com
Hi Molly,

We routinely transform all our phenotype data by quantile normalization (except binomially distributed data) before using the stepwise procedure. You can then refit the QTL model using the untransformed data to get allelic effect sizes that are in units of the phenotype.

I'm sure others will have potentially better solutions, but this is what works for me.

Take care,

Chris
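
The workflow described in this reply (normal-quantile transform, stepwise selection, then a refit on the raw scale) might look roughly like the sketch below. It assumes the R/qtl package is installed and uses the simulated `fake.f2` cross that ships with it as stand-in data. The column name `pheno_nq`, the restriction to autosomes, `max.qtl = 4`, `additive.only = TRUE`, and the penalty values are all illustrative assumptions, not settings taken from the thread.

```r
library(qtl)

# Stand-in data: the simulated `fake.f2` intercross shipped with R/qtl.
# Autosomes only, to keep the sketch simple.
data(fake.f2)
cross <- subset(fake.f2, chr = "-X")
cross <- calc.genoprob(cross, step = 2)

# Append a normal-quantile (rank-based) version of the first phenotype
cross$pheno$pheno_nq <- nqrank(cross$pheno[, 1])
raw_col <- 1
nq_col  <- which(names(cross$pheno) == "pheno_nq")

# Stepwise model selection on the transformed trait.
# The penalties are placeholders; in practice they would be derived from
# scantwo() permutations with calc.penalties().
best <- stepwiseqtl(cross, pheno.col = nq_col, method = "hk",
                    max.qtl = 4, additive.only = TRUE,
                    penalties = c(3.52, 4.28, 2.69), verbose = FALSE)

# Refit the selected model on the untransformed trait so that the
# estimated effects are in the original phenotype units
if (length(best) > 0) {
  fit_raw <- fitqtl(cross, pheno.col = raw_col, qtl = best,
                    formula = as.formula(attr(best, "formula")),
                    method = "hk", get.ests = TRUE, dropone = FALSE)
  print(summary(fit_raw))
}
```

Keeping the transformed trait in its own column, rather than overwriting the raw one, is what makes the final refit on the original scale possible.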



--
You received this message because you are subscribed to the Google Groups "R/qtl discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rqtl-disc+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rqtl-disc/a668343f-bc7c-4001-8d3d-50bf99853f95n%40googlegroups.com.

James Anderson

Sep 10, 2020, 2:30:08 PM
to rqtl...@googlegroups.com
Personally, I usually reach for log or log(x + 1) transformations. It depends on the skewness of your data. Could you describe your distributions so we can tell which transformations you might need?
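
One way to judge whether a log or log(x + 1) transformation is worthwhile is to compare the skewness before and after. The following is a minimal base-R sketch on simulated data; the `skewness` helper and the simulated `trait` vector are illustrative assumptions, not code or data from this thread.

```r
# Sample skewness (third standardized moment); a small base-R helper
skewness <- function(x) {
  x <- x[!is.na(x)]
  z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
  mean(z^3)
}

set.seed(42)
# Simulated right-skewed trait with some exact zeros (stand-in data)
trait <- c(rep(0, 10), rlnorm(190, meanlog = 1, sdlog = 0.8))

skew_raw   <- skewness(trait)
skew_log1p <- skewness(log1p(trait))  # log(x + 1), defined at zero
skew_sqrt  <- skewness(sqrt(trait))   # a milder alternative

round(c(raw = skew_raw, log1p = skew_log1p, sqrt = skew_sqrt), 2)
```

On a cross object, R/qtl's `transformPheno()` can apply a function such as `log` or `sqrt` to a phenotype column directly.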

Molly Edwards

Sep 10, 2020, 9:34:13 PM
to R/qtl discussion
Thanks so much for your help, team! I've attached histograms for 12 of the traits; the three in boxes are normally distributed, the rest are not. The red & blue arrows on the x-axis point to parental means.

[Attachment: trait.histograms.plain.png]

Lipps, Sarah J

Sep 10, 2020, 9:45:24 PM
to rqtl...@googlegroups.com

I believe the manual states that R/qtl is robust to non-normally distributed data. I was working with a dataset that was not normally distributed, and a Box-Cox transformation helped. I calculated LS means (least-squares means) to use for the dataset, and while designing the model I always checked BIC to make sure it wasn't overfit, in addition to a few other tests including an ANOVA. The LS means values also followed a 3:1 segregation ratio. I was still able to perform QTL mapping, and the results are very exciting.

Based on your histograms, it looks like your parents are segregating for your phenotype, and the non-normal distribution could be due to a dominant genetic influence.

Hope this provides insight. And I highly recommend checking out the R/qtl manual.

Best,

Sarah
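
The Box-Cox step mentioned above can be sketched with `boxcox()` from the MASS package, which ships with standard R installations. This is a minimal sketch on simulated data; the `trait` vector, the lambda grid, and the intercept-only model are assumptions made for illustration, not the analysis described in this reply.

```r
library(MASS)  # provides boxcox()

set.seed(7)
# Simulated positive, right-skewed trait (stand-in data)
trait <- rlnorm(300, meanlog = 2, sdlog = 0.6)

# Profile log-likelihood of the Box-Cox parameter for an intercept-only model
bc <- boxcox(trait ~ 1, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the transformation; lambda = 0 corresponds to the log
trait_bc <- if (abs(lambda_hat) < 1e-8) {
  log(trait)
} else {
  (trait^lambda_hat - 1) / lambda_hat
}

lambda_hat
```

Box-Cox requires strictly positive values, so traits containing zeros or negative values would need a shift first.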

 


James Anderson

Sep 10, 2020, 10:02:26 PM
to rqtl...@googlegroups.com
I've been working on a package that does transformations, so I have code I can send. I'll look for it tomorrow and send it to you.

Karl Broman

Sep 11, 2020, 11:08:38 AM
to R/qtl discussion
I would probably not transform any of these. The skew of spur curvature and total nectar might suggest square-root or log transformations, but they're really not so bad. The bimodal distributions, particularly of sepal CIELAB a* and b*, may suggest a major QTL, and you don't want to transform that away. (It's not the phenotype distribution itself that needs to be normal, but the residual variation.)

The ideas others have suggested (taking logs or normal quantiles) are good. The question of what to do with non-normal phenotype distributions in QTL mapping is the same as the question of what to do with non-normally distributed outcomes in multiple linear regression. Generally, linear regression is quite robust, and non-constant residual variance is a bigger problem than non-normality.

karl
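
The point about residual variation can be illustrated with a small base-R simulation: a single major locus with a dominant effect gives a bimodal phenotype even though the residuals, once genotype is accounted for, are normal. Everything in this sketch (the genotype labels, effect sizes, and sample size) is a hypothetical stand-in, not data from this thread.

```r
set.seed(1)
n <- 300

# Hypothetical F2 genotypes at one major locus (1:2:1 segregation)
geno <- factor(sample(c("AA", "AB", "BB"), n, replace = TRUE,
                      prob = c(0.25, 0.5, 0.25)))

# Simulated phenotype: large dominant effect plus normal noise, which
# gives a bimodal marginal distribution
mu <- c(AA = 10, AB = 10, BB = 20)
pheno <- unname(mu[as.character(geno)]) + rnorm(n, sd = 1.5)

# Normality of the raw phenotype versus the residuals given genotype
p_raw <- shapiro.test(pheno)$p.value
fit   <- lm(pheno ~ geno)
p_res <- shapiro.test(residuals(fit))$p.value

# Check for non-constant residual variance across genotype groups
p_var <- bartlett.test(pheno ~ geno)$p.value

c(raw = p_raw, residuals = p_res, equal_variance = p_var)
```

Transforming `pheno` toward normality here would squeeze the two modes together, which is exactly the signal a QTL scan is trying to detect.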

Molly Edwards

Sep 11, 2020, 4:58:58 PM
to R/qtl discussion
Hi all,

Wow, thank you so much for your thoughtful replies! Yes, there is a major sepal color QTL and I definitely don't want it to disappear. I've worked my way through the manual and was going back to double-check my analyses before writing the manuscript, and the chapter on non-normal phenotypes made me a little nervous, so I wanted to consult the experts :) I truly appreciate all of your advice; this was so helpful.

Best,

Molly