I am pleased to announce the release of GLU 1.0 alpha 6 (see below for a summary of changes). Source and binary versions have been posted to the GLU project site. OS X binaries are still in the works and will be posted shortly (sorry Mac people—my laptop is slow!).
This release marks the end of the GLU 1.0 development cycle and no new features will be added before the final 1.0 release. A short series of beta-test releases and final release candidates will be forthcoming over the next few weeks in order to ensure that the final 1.0 release is of the highest quality. In addition, our documentation has been steadily expanding, but we plan to improve upon it significantly prior to the final 1.0 release.
Many thanks to all those who have worked hard to make this release possible.
So what now? Many exciting things are planned for GLU 2.0, including:
· Direct integration with R to make the full power of GLU available to R users and vice-versa.
· Improved parallel processing capabilities. Parallel processing on multi-core systems and computing clusters has always been possible with GLU, but not always easy to set up.
· Support for even larger genotype datasets on even more modest computer hardware. Currently, GLU scales nicely up to tens of thousands of subjects and a few million SNPs. We’re aiming to scale up by another order of magnitude in both dimensions.
· Better integration of genome annotation. GLU has a powerful embedded annotation database, but it lives mostly by itself and doesn’t socialize much. Future versions will add additional resources, including modern pathway information, new polymorphism panels, and additional genome builds. In addition, queries on these features will be available from any GLU module, e.g. allowing one to extract, summarize, tag, perform association tests, etc. on all exonic SNPs in a given pathway that are present in HapMap Phase II.
· Support for additional genotype input and output formats. GLU already supports reading, writing, and converting among over 15 formats, but there are many more out there. Our goal is for GLU to be a true polyglot that can handle virtually any sensible data format you may have.
· “Next-generation” sequencing is here in full force and GLU is expanding to encompass management of the data generated by your 454, Solexa, SOLiD, and other next-gen sequencing platforms.
· And much more… We want to hear from you! Please send in your feature requests.
The main project site for the open source development of GLU is:
http://code.google.com/p/glu-genetics
Source code and binary versions may be downloaded from:
http://code.google.com/p/glu-genetics/downloads
To subscribe to the GLU Users group, visit:
http://groups.google.com/group/glu-users
To subscribe to the GLU Developers group, visit:
http://groups.google.com/group/glu-dev
I highly recommend that interested users and developers join one or both of these groups, since this is where most of the discussion will take place. You can join from the links at the bottom or right side of the Google Code page.
· You will have to create a Google Account, which is painless and does not require a GMail account.
o If you do have a GMail account and want emails from the group to be sent to your work address, then please sign out of GMail before attempting to join either group. Otherwise, your group subscription will default to your GMail account.
· Click on "Create an account now"
· Enter your email address (e.g. you@somewhere.gov), choose a new and unique password, and fill in the other information they request.
Summary of changes since GLU Release 1.0a5 (2008-10-01)
Although known in principle, this issue was first reported in the wild when Jun Lu started using HapMap build 23, which includes 125k monomorphic SNPs (incomplete models). Over 4.7 GB of RAM and 2m22s were needed to subset the data using GLU 1.0a5 with the old model management strategy, but now only 315 MB of RAM and 5.7s are needed to perform the same operations. A pleasant side-effect is that runtime performance is greatly improved for this and many other operations. The resulting 15x reduction in memory and 25x reduction in time are a substantive start on optimizing GLU for operation on more modest desktop hardware, though clearly more work is needed.
Special thanks to Jun Lu for his help in testing this fairly significant set of changes.
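As a quick sanity check on the figures quoted above, the reduction factors can be recomputed from the reported numbers. This is only an arithmetic sketch: the variable names are illustrative, and it assumes 1 GB = 1000 MB (using 1024 changes the memory ratio only slightly).

```python
# Figures quoted in the announcement for the HapMap build 23 subset operation.
old_mem_mb = 4.7 * 1000    # "over 4.7 GB" with GLU 1.0a5 (assumes 1 GB = 1000 MB)
new_mem_mb = 315           # 315 MB with the new model management strategy
old_time_s = 2 * 60 + 22   # 2m22s
new_time_s = 5.7           # 5.7s

mem_ratio = old_mem_mb / new_mem_mb
time_ratio = old_time_s / new_time_s

print(f"memory: {mem_ratio:.1f}x, time: {time_ratio:.1f}x")
# prints "memory: 14.9x, time: 24.9x"
```

Both ratios round to the 15x and 25x stated above.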
Some formats can no longer be specified by file extension. For example:
glu transform mydata.lbat -o mydata.structure
is now invalid. Really, there are no files with the .structure extension in the wild (nor should there be). What any sensible user wants is:
glu transform mydata.lbat -o mydata.dat:format=structure
or:
glu transform mydata.lbat -F structure -o mydata.dat
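To illustrate the "filename:format=structure" style of specification used above, here is a minimal Python sketch of how such a string could be split into a file name and its options. The function name split_file_spec and its behavior are hypothetical and are not taken from GLU's source; GLU's actual parsing may differ (for instance, this sketch does not handle file names that themselves contain colons).

```python
def split_file_spec(spec):
    """Split 'name:key=value:...' into (name, options dict).

    Hypothetical helper for illustration only; not GLU's implementation.
    """
    name, *parts = spec.split(":")
    options = {}
    for part in parts:
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        options[key] = value
    return name, options


print(split_file_spec("mydata.dat:format=structure"))
# prints ('mydata.dat', {'format': 'structure'})
```

A specification with no options, such as "mydata.lbat", comes back with an empty options dictionary, which is the case where a tool would fall back to the file extension or an explicit -F argument.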
Fine print
GLU is Copyright (c) 2008, 2009, BioInformed LLC and the U.S. Department of Health & Human Services.
GLU is primarily developed and maintained by Kevin Jacobs <jacobs at bioinformed dot com> to support the Cancer Genetic Markers of Susceptibility (CGEMS) project, an initiative by the National Cancer Institute (NCI) Division of Cancer Epidemiology and Genetics (DCEG) and run by the NCI Core Genotyping Facility (CGF) to identify genetic alterations that affect susceptibility to prostate and breast cancer.
CGEMS is funded by NCI under Contract N01-CO-12400 by SAIC-Frederick, a subsidiary of Science Applications International Corporation (SAIC).