First official stable build (0.19.10)

Christopher Chang

May 31, 2013, 9:13:16 AM
to wdist-a...@googlegroups.com
Now that public testing has started, I'm distinguishing between stable and development builds.  All unfinished flags should be disabled in the stable build, and not mentioned in command-line help.  Except for bugfixes, the stable build will usually only be updated in large jumps.

Here are some of the features that have been added to WDIST over the last five months:
- Association analysis max(T) permutation tests, for both case/control and quantitative traits.  If you were using a less accurate approach because PLINK's permutation tests were too slow, that should no longer be necessary: our implementation is often over a thousand times faster.
- Very fast Fisher's exact tests.  This incorporates a genuine algorithmic advance that is not yet present in other software as of this writing; see https://www.cog-genomics.org/software/stats for reference code and an in-browser demo.  Chi-square approximations, which can perform poorly on low-MAF markers, are no longer necessary.  Our fast Hardy-Weinberg exact test uses the same idea.
- I/O speed improvements.  When you're routinely producing multi-gigabyte text files, stuff like multithreaded gzip and custom number-to-string conversion routines actually pay off.
- Windows support.
- Proper support for 4GB+ files, even in 32-bit builds.
- Conversion to and back from 23andMe format.  (This is mostly just around to make life easier for our GWAS volunteers, but hopefully it comes in handy elsewhere too.)
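
To illustrate the point above about chi-square approximations on sparse tables, here is a minimal sketch comparing a textbook two-sided Fisher's exact test with an uncorrected 1-df Pearson chi-square test on a 2x2 table. This is the plain enumeration method, not WDIST's fast algorithm (which this post does not describe), and the function names are hypothetical, chosen only for this example.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Textbook enumeration over all tables with the same margins;
    illustrative only, not the fast method used by WDIST.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


def chi_square_p(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a 1-df chi-square variable.
    return erfc(sqrt(stat / 2))


if __name__ == "__main__":
    table = (3, 0, 0, 3)  # made-up counts, chosen to be very sparse
    print("exact:     ", fisher_exact_two_sided(*table))
    print("chi-square:", chi_square_p(*table))
```

For the made-up table [[3, 0], [0, 3]], the exact p-value is 0.1, while the uncorrected chi-square approximation gives roughly 0.014, which overstates the evidence by a wide margin.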

I will try to get most or all of the following major features into the next stable build:
- Run-of-homozygosity analysis (--homozyg).
- Hierarchical clustering and multidimensional scaling analysis (--cluster, --mds-plot).  With these done, volunteers will be able to start walking through Razib's introduction to PLINK ( http://blogs.discovermagazine.com/gnxp/2013/01/using-your-23andme-data-in-plink ) without waiting hours between commands.
- Logistic regression and CNV analysis, because we need them for our own research.