I’m working with some researchers who have been using PLINK to calculate LD (r2) for a large number of individuals and many SNP combinations rather successfully. However, for their specific case, they would like to calculate chi-square instead of the d-prime statistic.
An example of what they are doing for r2 calculations with PLINK:
plink --bfile data --r2 dprime inter-chr --ld-snp-list snplist.txt --threads 10 --ld-window-r2 0 --out ldResults
Standard statistics tools make this simple, but they are nowhere near as fast as PLINK. So they asked me to help. I thought the best place to start was to modify PLINK to calculate chi-square, since much of the potentially slow data I/O code is already handled there.
Does anyone have any advice as to how to best go about implementing this (i.e. a place to start in the code base, etc.)? Any pointers or suggestions would be much appreciated. Of course, let me know if any further details or explanation would be helpful. I’m happy to contribute any code changes to the project if they are wanted.
-- Lance Parsons - Scientific Programmer Carl C. Icahn Laboratory - Room 136 Lewis-Sigler Institute for Integrative Genomics Princeton University
Lance
(plink 2.0 has a better thread-safe implementation of chiprob_p().)
Excellent, thanks for the pointers. I just put a very simple implementation up on the ld_chisquare branch of my fork. Right now it calculates chi-square instead of dprime and outputs the chi-square stat in place of the dprime stat in ldResults.ld. Also, if one SNP is missing a genotype completely, you’ll get a nan value. Finally, the code does not account for sex chromosomes, which perhaps it should.
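For anyone following along, here is a minimal sketch of the kind of calculation involved. It is not the code from the branch. It assumes the textbook relation chi-square = n * r^2 for a 2x2 haplotype table with 1 degree of freedom, and the function name, the arguments, and the choice of n (number of haplotypes) are hypothetical. PLINK estimates haplotype frequencies from unphased genotypes, so the right n there may differ. It also shows where a nan comes from when one SNP has no variation or no data.

```python
import math


def ld_chisquare(p_ab, p_a, p_b, n_haplotypes):
    """Chi-square (1 df) for a pair of biallelic SNPs.

    p_ab: estimated frequency of the A-B haplotype
    p_a, p_b: allele frequencies of A and B
    n_haplotypes: number of haplotypes counted (an assumption; see lead-in)
    """
    # Product of the four marginal frequencies of the 2x2 table.
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        # A monomorphic or entirely missing SNP gives a zero marginal,
        # so r^2 (and therefore chi-square) is undefined.
        return math.nan
    d = p_ab - p_a * p_b          # raw disequilibrium coefficient D
    r2 = d * d / denom            # squared correlation
    return n_haplotypes * r2      # chi-square = n * r^2


print(ld_chisquare(0.5, 0.5, 0.5, 100))   # perfect LD
print(ld_chisquare(0.25, 0.5, 0.5, 100))  # no LD
print(ld_chisquare(0.0, 0.0, 0.5, 100))   # undefined case
```

Returning nan explicitly (rather than dividing by zero) makes the edge case easy to detect and skip when writing output.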
I’d love to get this to the state where it can be a Pull Request to PLINK, but could use a bit of help with command line parameters, handling of edge cases, and, of course, code style and optimization. Would you be willing/able to help out?
Also, I’m happy to move this discussion over to plink-dev if that would be more appropriate.
Sorry about the delay in responding; I'll try to fill in the other pieces (command-line parsing, etc.) later this week.
In case you are going to implement this in 2.0, I’ve updated my fork, where I hacked 1.9 to output chisquare and pvalues instead of r2 and dprime. In addition, I added --ld-window-cs as a chisquare pvalue threshold to keep the output size down. You can see it all here: https://github.com/lparsons/plink-ng/tree/ld_chisquare
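A rough sketch of how a p-value threshold like this can work, in case it helps with a 2.0 version. This is not the code from the fork. It assumes 1 degree of freedom and assumes that pairs are kept when their p-value is at or below the threshold; both function names are hypothetical. For 1 df the chi-square upper-tail probability has a closed form, erfc(sqrt(x / 2)), so no statistics library is needed.

```python
import math


def chisq_1df_pvalue(chisq):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    # For 1 df, P(X >= x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2.0))


def passes_threshold(chisq, max_pvalue):
    """Keep a SNP pair only if its p-value is at or below max_pvalue.

    The direction of the comparison is an assumption about the option.
    """
    if math.isnan(chisq):
        # Undefined statistics (e.g. a monomorphic SNP) are dropped.
        return False
    return chisq_1df_pvalue(chisq) <= max_pvalue


print(chisq_1df_pvalue(3.841458820694124))
print(passes_threshold(10.0, 0.05))
print(passes_threshold(1.0, 0.05))
```

Filtering on the p-value rather than the raw statistic keeps the threshold meaningful regardless of sample size.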