Help, Permutation for MLM

John

Jul 21, 2014, 9:22:23 PM

to tas...@googlegroups.com

Dear all,

In my manuscript, I used an MLM for GWAS in GAPIT, and FDR was used to correct for multiple comparisons. I also ran a permutation test to set an empirical genome-wide significance level.

One of the reviewers suggested that permutation does not work with MLM, because randomizing the phenotype-genotype association destroys the correlation between individuals and thus destroys the population structure. However, my understanding is that the kinship matrix (K) and the population structure (PCA) are based entirely on genotype data. If phenotypes are just randomly reassigned to genotypes, K and PCA will not change.

I am totally confused. Any suggestions?

Thank you.

John

Peter Bradbury

Jul 22, 2014, 8:43:36 AM

to tas...@googlegroups.com

The reviewer is correct. Once you have broken the relationship between samples and phenotypes, you no longer expect that the kinship matrix will capture a significant fraction of the phenotypic variance. The relationship between phenotype and population structure no longer exists. For a permutation test to be set up properly, exchangeability under the null hypothesis must be maintained. So, to do a permutation test using MLM, you would have to permute the values of the first SNP, leaving the sample names and phenotypes in place, then test that SNP. Then you would need to do that for every other SNP and find the lowest p-value. That would be a single permutation. You would want to repeat that 1000 times, recording the lowest p-value each time, to generate an empirical null distribution.
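A minimal NumPy sketch of the min-p permutation described above, for illustration only. It does not call GAPIT. Instead it assumes a P3D/EMMAX-style test in which the variance ratio `delta` (residual over genetic variance) is treated as known and held fixed, and it uses a normal approximation for the Wald p-values. Genotypes are assumed to be a numeric n x m matrix (rows = individuals, no monomorphic SNPs). The phenotype, the covariates (intercept plus PCs) and K stay in the original sample order, and one random reordering of the genotype rows is applied to every SNP, which keeps LD between SNPs intact. Both function names are hypothetical.

```python
import math

import numpy as np


def mlm_snp_pvalues(y, snps, covars, K, delta):
    """Per-SNP Wald p-values from a mixed model with a FIXED variance ratio.

    Assumed model: y = covars @ b + snp * beta + g + e, with
    Var(y) proportional to V = K + delta * I (delta assumed known).
    """
    n = len(y)
    V = K + delta * np.eye(n)
    L = np.linalg.cholesky(V)  # V = L @ L.T
    # Whitening: L^{-1} y has covariance proportional to the identity,
    # so each SNP can then be tested by ordinary least squares.
    yw = np.linalg.solve(L, y)
    cw = np.linalg.solve(L, covars)
    sw = np.linalg.solve(L, snps)
    df = n - covars.shape[1] - 1
    pvals = np.empty(snps.shape[1])
    for j in range(snps.shape[1]):
        X = np.column_stack([cw, sw[:, j]])
        beta = np.linalg.lstsq(X, yw, rcond=None)[0]
        resid = yw - X @ beta
        sigma2 = resid @ resid / df
        var_beta = sigma2 * np.linalg.inv(X.T @ X)[-1, -1]
        z = beta[-1] / math.sqrt(var_beta)
        # Two-sided normal approximation (avoids a scipy dependency).
        pvals[j] = math.erfc(abs(z) / math.sqrt(2.0))
    return pvals


def permutation_min_pvalues(y, snps, covars, K, delta, n_perm=1000, seed=0):
    """Empirical null distribution of the genome-wide minimum p-value.

    y, covars and K keep the original sample order; only genotype rows
    are shuffled, with the same shuffle applied to every SNP column.
    """
    rng = np.random.default_rng(seed)
    mins = np.empty(n_perm)
    for i in range(n_perm):
        order = rng.permutation(len(y))
        mins[i] = mlm_snp_pvalues(y, snps[order], covars, K, delta).min()
    return mins
```

Under these assumptions, a genome-wide threshold at alpha = 0.05 would be the 5th percentile of the recorded minima, e.g. `np.quantile(mins, 0.05)`. Every iteration refits all m SNPs, which is why the procedure is expensive.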

Papers reporting results using MLM for GWAS generally use FDR and/or Bonferroni correction for multiple testing.

Peter

John

Jul 22, 2014, 10:02:18 AM

to tas...@googlegroups.com

Hi Peter,

Thank you for your explanation. That's more complicated than I expected.

How do I "permute the values of the first SNP"? Does it mean replacing the first SNP's (numeric) genotypes with another SNP's genotypes?

Is GAPIT able to test one SNP at a time? And if I have over 30,000 SNPs and 300 lines, how long would 1000 permutations take on a regular laptop?

I would not have done a permutation test if one of the reviewers had not suggested it at the beginning; then the other reviewers would not be questioning my permutation at all.

John

Jul 22, 2014, 12:59:51 PM

to tas...@googlegroups.com

John,

If you are using the HapMap format for input, one way to permute the SNPs, at least conceptually, would be to permute the order of the genotype columns in the table but leave the header positions unchanged. Permute here means shuffle the order randomly. As you indicate, K and PCA (and phenotypes) should be the original, unpermuted values. That way the relationship between phenotype and K and PCA remains unchanged. When I said to permute the values of the first SNP, I just wanted to highlight that all the SNPs would need to be permuted and tested for a single permutation. If running GAPIT 1000 times is going to take a long time, you could probably get by with 500 permutations.
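A plain-Python sketch of the column shuffle described above. It assumes a tab-separated HapMap table held as a list of lines, with the usual 11 annotation columns (rs#, alleles, chrom, pos, strand, assembly#, center, protLSID, assayLSID, panelLSID, QCcode) before the sample columns; the function name is hypothetical and file I/O is omitted. The header row is left untouched, so each sample name ends up paired with another sample's complete genotype profile, while phenotypes, K and PCs stay attached to the original names.

```python
import random

# Number of leading SNP-annotation columns in a standard HapMap table.
N_META = 11


def permute_hapmap_genotypes(lines, seed=None):
    """Shuffle the genotype columns of a HapMap table, keeping the header.

    One random column order is drawn and applied to every SNP row, so the
    LD structure between SNPs is preserved.
    """
    rng = random.Random(seed)
    header = lines[0].rstrip("\n").split("\t")
    n_samples = len(header) - N_META
    order = list(range(n_samples))
    rng.shuffle(order)
    out = ["\t".join(header)]  # sample names stay in their original positions
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        meta, geno = fields[:N_META], fields[N_META:]
        out.append("\t".join(meta + [geno[k] for k in order]))
    return out
```

Running GAPIT once on each permuted table, with the original K, PCs and phenotype file, would give one permutation replicate.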

However, as I mentioned before, I do not recall reading any mixed-model GWAS papers that have used a permutation test to set a significance threshold, so running the test may not be necessary, at least for the purpose of your manuscript. But if running the analysis 500 or 1000 times is feasible, I think the method I have outlined would be valid.
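As a rough guide to the 500 versus 1000 trade-off (a standard binomial Monte Carlo argument, not something stated in this thread): if q is the true probability that the genome-wide minimum p-value falls below a candidate threshold t, the fraction of B permutations in which it does so has the standard error below. Here p_j^(b) denotes the p-value of SNP j in permutation b.

```latex
\widehat{q} = \frac{\#\{\, b \le B : \min_j p_j^{(b)} < t \,\}}{B},
\qquad
\mathrm{SE}(\widehat{q}) = \sqrt{\frac{q(1-q)}{B}}

q = 0.05:\qquad
\sqrt{0.0475 / 500} \approx 0.0097 \quad (B = 500),
\qquad
\sqrt{0.0475 / 1000} \approx 0.0069 \quad (B = 1000)
```

So 500 permutations pin down a 5% genome-wide error rate to within roughly plus or minus 0.02 (two standard errors), and doubling B shrinks that interval by a factor of about the square root of 2.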

Peter

From: John

Hi Peter,

May I first permute all the SNPs in the way you suggested, and then run GAPIT using MLM with K and PCA derived from the original genotypes (before permutation)? I think GAPIT treats each SNP independently. And then repeat the above process 1000 times? Does that work?

But again, what does "permute the values of the first SNP" mean?

Thank you.

John

Deadline Was Yesterday

Nov 3, 2020, 11:31:06 AM

to TASSEL - Trait Analysis by Association, Evolution and Linkage

Hey Peter,

I came across this paper that uses permutation testing with CMLM.

https://www.frontiersin.org/articles/10.3389/fpls.2018.00650/full

I'll copy the methods section that mentions the permutation procedure:

"We divided the phenotype (Y) of each accession into the original genotypic effect (G) and the fixed effect of population structure (Ps). Ps was estimated by the average effect of each PC on each individual through regression analysis of each PC on Y; the remainder was G after excluding Ps from Y. G was randomly reshuffled as Gr, and the new phenotype of each accession was Ps + Gr. We executed permutation test using CMLM with the same parameters of GWAS, a total of 1,000 sets of Ps + Gr were performed for the four traits with the same number of PC. To improve the computational efficiency, the SNPs whose P-values were greater than 2 in the GWAS using original phenotypes were included in the permutation test.

My question is: does subtracting the mean effect of each PC from Y retain the 'relationship' between samples and phenotypes that you mentioned? Am I to assume that when the mean effects of the PCs are subtracted from Y, the kinship information is also subtracted, and that it is reintroduced when Ps is added back to Gr?
