TASSEL GLM and MLM nominal and adjusted P-value

Phoenixice

Mar 12, 2012, 4:47:18 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Dear All,

I recently submitted a paper, and one reviewer said:

"2. How multiple testing issue was handled? It is imperative to
provide both nominal and adjusted p-values, in order to aid
interpretation of the results. This is more important as this is the
main result. Also this is non-trivial given the correlations among
phenotypes and among markers."

I am having trouble answering this, as I do not know much about
statistics.
1. I ran both GLM and MLM. In GLM I ran the permutation test with
1,000 permutations, and the results include p_marker and
p-adj_marker. Are these the nominal and adjusted p-values the reviewer
asked for?

2. How do I calculate the nominal and adjusted p-values for MLM? I have
been using TASSEL version 2.1.

If anyone knows anything about this, please reply.
Thank you very much!



Peter Bradbury

Mar 12, 2012, 8:42:33 AM
to tas...@googlegroups.com
The permutation test in GLM outputs two p-values. p_marker is the nominal p-value from the F-test of each individual marker. perm_p is a permutation-based, experiment-wise test of the marker effect. That means it controls the probability of even one false positive across the whole experiment. In that regard, it is similar to the Bonferroni method. However, unlike Bonferroni, a permutation test automatically handles dependencies between hypotheses and non-normality in the data.

The perm_p statistic calculates a p-value across all markers, one trait at a time. If you test multiple traits, it might make sense to adjust the perm_p values for that. For instance, if you have 10 traits, a Bonferroni correction would simply involve dividing the target alpha level by 10 or multiplying the p-values by 10, which amounts to the same thing. This does not take into account correlations between traits, but it is generally considered a conservative method: it is likely to control false positives at least at the specified level.
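To make the idea of an experiment-wise permutation test concrete, here is a minimal sketch in Python. It is not TASSEL's implementation: it assumes a single-step max-statistic permutation scheme with absolute Pearson correlation as the per-marker statistic, whereas TASSEL uses an F-test. The function names, the 0/1 genotype coding, and the data are all hypothetical. The last step applies the Bonferroni adjustment for 10 traits described above.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, used as a simple per-marker statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def experimentwise_perm_p(genotypes, phenotype, n_perm=1000, seed=0):
    """Single-step max-statistic permutation p-values for one trait."""
    rng = random.Random(seed)
    observed = [abs_corr(g, phenotype) for g in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait association
        # Largest statistic over ALL markers in this permuted data set.
        max_stat = max(abs_corr(g, shuffled) for g in genotypes)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one estimator keeps the p-values strictly above zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


def bonferroni(p_values, n_tests):
    """Multiply each p-value by the number of tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical data: 20 individuals, 3 biallelic markers coded 0/1.
phenotype = [float(i) for i in range(20)]
genotypes = [
    [0] * 10 + [1] * 10,  # strongly associated with the phenotype
    [0, 1] * 10,          # essentially unassociated
    [0, 0, 1, 1] * 5,     # weakly associated
]

perm_p = experimentwise_perm_p(genotypes, phenotype)
perm_p_across_traits = bonferroni(perm_p, n_tests=10)  # e.g. 10 traits
print(perm_p)
print(perm_p_across_traits)
```

Because each permutation keeps the marker genotypes intact and only shuffles the phenotype, correlation between markers is carried into the null distribution automatically, which is the property Bonferroni lacks.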

MLM p-values are nominal tests of individual markers and still need to be corrected for multiple testing. Storey's Q-method is a good choice for calculating FDR (http://www.genomine.org/qvalue/index.html). The Benjamini-Hochberg FDR method or the Bonferroni correction are reasonable alternatives.
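As an illustration of the two alternatives, the following Python sketch computes Benjamini-Hochberg adjusted p-values and Bonferroni adjusted p-values for one trait. It does not reproduce Storey's q-value, which additionally estimates the proportion of true null hypotheses. The function names and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Bonferroni adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical nominal MLM p-values for the markers of a single trait.
nominal = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(nominal)
bonf = bonferroni(nominal)

for p, a, b in zip(nominal, bh, bonf):
    print(f"nominal={p:.3f}  BH={a:.3f}  Bonferroni={b:.3f}")
```

A marker is then declared significant when its adjusted value falls below the chosen threshold. The Bonferroni values are never smaller than the Benjamini-Hochberg values, which reflects the stricter error rate that Bonferroni controls.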

If you look at Q-Q plots of the output from MLM and GLM, you will see that the p-values for different traits can follow different distributions. For that reason, it is probably a good idea to apply any multiple test correction to one trait at a time, because every multiple test correction method depends on the p-value distribution.
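A rough sketch of how Q-Q points could be computed separately for each trait is given below. It assumes the p-values have already been exported from TASSEL and grouped by trait. The trait names, the p-values, and the (i - 0.5) / m plotting convention are assumptions made for illustration.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a Q-Q plot.

    Assumes every p-value is strictly greater than zero.
    """
    m = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantiles under a uniform(0, 1) null distribution.
    expected = sorted(-math.log10((i - 0.5) / m) for i in range(1, m + 1))
    return list(zip(expected, observed))


# Hypothetical nominal p-values, keyed by trait name.
p_by_trait = {
    "trait_A": [0.02, 0.15, 0.40, 0.65, 0.90],
    "trait_B": [0.0004, 0.003, 0.01, 0.20, 0.70],
}

for trait, p_values in p_by_trait.items():
    points = qq_points(p_values)
    # The last pair holds the smallest p-value; an observed value far above
    # the expected one suggests an excess of small p-values for that trait.
    largest_expected, largest_observed = points[-1]
    print(trait, round(largest_expected, 2), round(largest_observed, 2))
```

Plotting the pairs for each trait against the diagonal shows whether the traits share a p-value distribution. If they do not, pooling them before correction mixes distributions that the correction method treats as one.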

Phoenixice

Mar 13, 2012, 2:05:27 AM
to tas...@googlegroups.com
Dear Bradbury,

Thank you very much. You are saving my life. God. ^_^
I followed your suggestion, but I still have some problems.


Actually, the GLM output has three p-values in total: p_Marker, p-perm_Marker and p-adj_Marker. I checked the TASSEL manual and found the following explanations:
p_Marker: p-value from the F-test of each individual marker.
p-perm_Marker: a test of individual markers.
p-adj_Marker: the marker p-value adjusted for multiple tests. The p-adj_Marker value is a permutation test derived using a step-down MinP procedure (Ge et al. 2003) and controls the family-wise error rate (FWER).

I think p_Marker is the nominal p-value, as you said. Which p-value is for the site-wise permutation test, and which one is for the experiment-wise test? I have 50 markers and 7 traits. Each trait gives 50 p_Marker, 50 p-perm_Marker, and 50 p-adj_Marker values. So which p-value(s) should I choose to define significance, p-perm or p-adj? If we use that one to define significance (for example, at 0.05), the nominal p_Marker is useless, right?


For MLM, I followed your suggestion and ran the FDR test using qvalue with an alpha of 0.1. Are there any criteria for choosing alpha?

F_Marker   p_Marker   Qvalue
13.1207    4.73E-04   0.004497
13.1207    4.73E-04   0.004497
11.5045    0.0011     0.055
9.5803     0.0027     0.067069
8.5449     0.0045     0.225
7.701      0.0069     0.243333
7.2348     0.0085     0.087076
6.1965     0.0146     0.243333
6.1965     0.0146     0.243333
6.1306     0.0151     0.187544
4.6516     0.034      0.087076
4.6516     0.034      0.087076
The p-values shown in red are false positives. Is my understanding right?

I also found another problem:
p-value q-value
0.0085 0.087076
0.034 0.087076
0.034 0.087076
0.0975 0.087076
0.0975 0.087076
0.1158 0.087076
0.1339 0.087076
0.1373 0.087076
0.1373 0.087076
0.1376 0.087076
0.1412 0.087076
0.1417 0.087076
0.1417 0.087076
0.1417 0.087076
0.1418 0.087076
0.1423 0.087076
0.1455 0.087076
0.1455 0.087076
0.1455 0.087076
0.1455 0.087076
0.1455 0.087076
0.1456 0.087076
0.1494 0.087076
0.1695 0.094675
0.1829 0.095873
0.1929 0.095873
0.1931 0.095873
0.3153 0.150953
0.3335 0.154161
0.3723 0.161426
0.3733 0.161426
0.4274 0.179044
0.4474 0.180577
0.458 0.180577
0.4821 0.184648
0.517 0.192515
0.608 0.198056
0.6563 0.198056
0.6563 0.198056
0.6818 0.198056
0.6825 0.198056
0.6825 0.198056
0.6825 0.198056
0.6825 0.198056
0.6825 0.198056
0.6944 0.198056
0.6944 0.198056
0.7757 0.216635
0.9511 0.258454
0.964 0.258454
Some non-significant p-values have significant q-values. How can that be explained?


On Monday, March 12, 2012 at 9:42:33 PM UTC+9, Peter Bradbury wrote:

yuanyuan chen

May 5, 2014, 8:09:15 PM
to tas...@googlegroups.com
Hi Dr. Bradbury,

I used TASSEL 3.0 to do the GLM analysis and got a strange result, shown in the attachment. Why is "modelMS" less than "markerMS"? I think the model MS should be greater than the marker MS, right?
Could you help me figure it out? Thanks a lot. I look forward to your reply.

Sincerely,
Yuanyuan Chen
Book1.xlsx

Peter James Bradbury

May 5, 2014, 9:29:48 PM
to <tassel@googlegroups.com>

Peter Bradbury

May 6, 2014, 10:24:32 AM
to tas...@googlegroups.com
Model SS should always be greater than marker SS, not MS. Note that SS = MS * df. Also, in your analysis, the results may not be very meaningful because the number of df for markers is high and the total number of observations is low for association analysis. In particular, the number of individuals in each marker class should be >= 10 as a rule of thumb. It is possible that the most significant marker classes have only one or two individuals. If a single individual has an extreme value then any marker for which it has a unique value will be highly significant. 
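A small worked example may help here. The numbers below are hypothetical and are not taken from the attached spreadsheet. They show that the model MS can be smaller than the marker MS while the model SS is still larger, and they include a simple check for marker classes with fewer than 10 individuals. The helper names are illustrative only.

```python
from collections import Counter


def sum_of_squares(mean_square, df):
    """Recover a sum of squares from its mean square: SS = MS * df."""
    return mean_square * df


# Hypothetical ANOVA values: the model contains the marker plus other terms,
# so it has more degrees of freedom than the marker alone.
marker_ms, marker_df = 12.0, 5
model_ms, model_df = 8.0, 9

marker_ss = sum_of_squares(marker_ms, marker_df)  # 12.0 * 5 = 60.0
model_ss = sum_of_squares(model_ms, model_df)     # 8.0 * 9 = 72.0
print(model_ms < marker_ms, model_ss > marker_ss)


def small_classes(marker_calls, min_count=10):
    """Return the marker classes observed in fewer than min_count individuals."""
    counts = Counter(marker_calls)
    return {allele: n for allele, n in counts.items() if n < min_count}


# Hypothetical genotype calls at one marker for 26 individuals.
calls = ["A"] * 14 + ["B"] * 11 + ["C"] * 1
print(small_classes(calls))
```

In this made-up case, class "C" is carried by a single individual, so an extreme phenotype in that individual alone could make the marker look highly significant.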

Peter