General questions about GWAS using TASSEL


Ines

Apr 8, 2014, 8:34:00 PM
to tas...@googlegroups.com
Hello,

I'm using TASSEL for genome-wide association. While reading about GWAS in general, and TASSEL in particular, I came across the following three questions. Any contribution is welcome.

1) I would like to confirm that if I use GLM without covariates (population stratification, Q) in TASSEL, I'm actually applying a 'regular' linear regression, i.e. using the naive model. Correct?

2) If I have the same trait measured under different conditions, would you advise using a model such as Y = mean + condition + whatever, or a 'combined trait' such as a difference, ratio, etc.? I guess incorporating condition into the model will reduce power, since I have to correct for even more tests (number of SNPs x number of conditions), correct? But if I use a 'combined trait' and it is not normally distributed, can I apply the regular transformation strategies to normalize the data and use GLM and MLM as usual?

3) Finally, do you know what the best tool is for generating power curves that make sense for the models used in TASSEL? Most available power estimation tools are for case-control GWAS, and I haven't been able to figure out whether tools like GWAPower (uses ANOVA and the associated F-test) or GWASpower/QT (also uses ANOVA and the associated F-test) are adequate for GLM and MLM. My understanding is that power depends on heritability, type 1 error rate, total sample size and number of SNPs used, linkage disequilibrium, and other covariates, but also on the method used. So if I calculate power for GLM, it won't be the same for MLM. Is this correct?
Basically, I would like to have a statistically meaningful way of selecting the number of SNPs to use in my analysis, since I have way too many SNPs with MAF > 5% and recall rate > 95%, and the correction for multiple testing with this many SNPs would make any possible signal disappear.

Thank you,
Ines.

Peter Bradbury

Apr 9, 2014, 10:52:37 AM
to tas...@googlegroups.com
1) Yes, although with covariates such as the Q matrix the method is still linear regression. I am not sure what 'regular' means in this context.
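To make the "naive model" concrete, here is a minimal sketch of a single-marker linear regression with no covariates, y = mu + beta*g + e. It assumes an additive 0/1/2 genotype coding, which may differ from how TASSEL codes markers internally, and the function name `simple_regression` is hypothetical. It only illustrates the model, not TASSEL's implementation.

```python
def simple_regression(genotypes, phenotypes):
    """Fit y = intercept + slope * g by ordinary least squares.

    genotypes: list of numeric genotype codes (assumed 0/1/2 here).
    phenotypes: list of trait values, same length and order.
    Returns (slope, intercept, f_stat), where f_stat is the F statistic
    for the marker with 1 and n - 2 degrees of freedom.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))
    ss_total = sum((y - mean_y) ** 2 for y in phenotypes)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_g

    # Variation explained by the marker versus residual variation.
    ss_model = slope * sxy
    ss_resid = ss_total - ss_model
    f_stat = ss_model / (ss_resid / (n - 2))
    return slope, intercept, f_stat


# Made-up example data for six lines.
slope, intercept, f_stat = simple_regression(
    [0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
```

Adding Q covariates would mean adding more columns to the design matrix; the fitting method stays least-squares regression.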
2) To handle the same trait collected under different conditions, a useful first step is to calculate the correlation between the trait values under the different conditions. That cannot be done in TASSEL but is generally useful. In TASSEL, a reasonable place to start would be to analyze data from the separate conditions individually, then to analyze the mean across all conditions, using BLUEs or BLUPs if the data are not balanced. You can use GLM without markers to calculate the BLUEs. You can also include "condition" as a factor in the model, but the analysis will take longer and the results will be similar to just analyzing the means. If the original data or the combined data are distinctly non-normal, then transforming them prior to analysis is a good idea. The resulting marker effect estimates may be difficult to interpret, but you get more reliable p-values.
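Since the correlation step has to happen outside TASSEL, here is a rough sketch of how it could be done. It computes the Pearson correlation between two conditions and a plain per-line mean across conditions. The function names and the data layout are hypothetical, and the plain mean is only a stand-in for a BLUE when the data are balanced; unbalanced data would need a proper model fit.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of trait values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def line_means(by_condition):
    """Average each line's trait value over the conditions it appears in.

    by_condition: dict mapping condition name -> dict of line -> value.
    Returns a dict mapping line -> mean value.
    """
    totals = {}
    counts = {}
    for values in by_condition.values():
        for line, value in values.items():
            totals[line] = totals.get(line, 0.0) + value
            counts[line] = counts.get(line, 0) + 1
    return {line: totals[line] / counts[line] for line in totals}


# Made-up example: three lines measured under two conditions.
data = {
    "dry": {"A": 1.0, "B": 3.0, "C": 2.5},
    "wet": {"A": 3.0, "B": 5.0, "C": 4.0},
}
lines = sorted(data["dry"])
r = pearson([data["dry"][k] for k in lines], [data["wet"][k] for k in lines])
means = line_means(data)
```

A high correlation between conditions suggests that analyzing the means loses little information; a low one suggests the conditions are worth analyzing separately.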
3) I am not familiar with the software you mention for calculating power. Very definitely, power differs between GLM and MLM. In both cases, power depends in a complex way on the distribution of QTL sizes, population structure, the dependence of the trait in question on population structure, etc. 
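One generic way to get a feel for power is simulation. The sketch below estimates power for the naive single-marker regression only. It assumes additive 0/1/2 genotypes drawn at a given minor allele frequency, unit-variance normal noise, no population structure, and a large-sample chi-square approximation to the F-test p-value. All names and parameter values are hypothetical, and it says nothing about MLM.

```python
import math
import random


def f_statistic(g, y):
    """F statistic (1 and n - 2 df) for regressing y on a single marker g."""
    n = len(g)
    mean_g = sum(g) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_g) ** 2 for a in g)
    if sxx == 0:  # monomorphic marker in this sample: no test possible
        return 0.0
    sxy = sum((a - mean_g) * (b - mean_y) for a, b in zip(g, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_model = sxy * sxy / sxx
    ss_resid = ss_total - ss_model
    return ss_model / (ss_resid / (n - 2))


def simulate_power(n_lines, maf, effect, alpha, n_reps=500, seed=1):
    """Fraction of simulated data sets in which the marker is declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Two independent allele draws per line give a 0/1/2 genotype.
        g = [(rng.random() < maf) + (rng.random() < maf)
             for _ in range(n_lines)]
        y = [effect * a + rng.gauss(0.0, 1.0) for a in g]
        f = max(f_statistic(g, y), 0.0)
        # Large-sample approximation: F(1, n - 2) is close to chi-square(1),
        # whose upper tail probability is erfc(sqrt(x / 2)).
        p_value = math.erfc(math.sqrt(f / 2.0))
        if p_value < alpha:
            hits += 1
    return hits / n_reps


# Same marker effect, tested at a single-test threshold and at a
# Bonferroni threshold for 100,000 markers.
power_single = simulate_power(200, 0.3, 0.5, alpha=0.05)
power_bonferroni = simulate_power(200, 0.3, 0.5, alpha=0.05 / 100000)
```

Comparing the two estimates shows how much a stricter multiple-testing threshold costs for a given effect size and sample size. Population structure and the kinship correction in MLM are not modeled here.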