[ANNOUNCEMENT] GLU 1.0a6 Released

Jacobs, Kevin (NIH/NCI) [C]

Jan 8, 2009, 11:34:51 AM

to glu-...@googlegroups.com, glu...@googlegroups.com

I am pleased to announce the release of GLU 1.0 alpha 6 (see below for a summary of changes). Source and binary versions have been posted to the GLU project site. OS X binaries are still in the works and will be posted shortly (sorry Mac people—my laptop is slow!).

This release marks the end of the GLU 1.0 development cycle and no new features will be added before the final 1.0 release. A short series of beta-test releases and final release candidates will be forthcoming over the next few weeks in order to ensure that the final 1.0 release is of the highest quality. In addition, our documentation has been steadily expanding, but we plan to improve upon it significantly prior to the final 1.0 release.

Many thanks to all those who have worked hard to make this release possible.

So what now? Many exciting things are planned for GLU 2.0, including:

· Direct integration with R to make the full power of GLU available to R users and vice-versa.

· Improved parallel processing capabilities. Parallel processing on multi-core systems and computing clusters has always been possible with GLU, but not always easy to set up.

· Support for even larger genotype datasets on even more modest computer hardware. Currently, GLU scales nicely up to tens of thousands of subjects and a few million SNPs. We’re aiming to scale by another order of magnitude in both dimensions.

· Better integration of genome annotation. GLU has a powerful embedded annotation database, but it lives mostly by itself and doesn’t socialize much. Future versions will add additional resources, including modern pathway information, new polymorphism panels, and additional genome builds. In addition, queries on these features will be available from any GLU module, e.g. allowing one to extract, summarize, tag, perform association tests, etc. on all exonic SNPs in a given pathway that are present in HapMap Phase II.

· Support additional genotype input and output formats. GLU already supports reading, writing, and converting among over 15 formats, but there are many more out there. Our goal is for GLU to be a true polyglot and allow it to handle virtually any sensible data you may have.

· “Next-generation” sequencing is here in full force and GLU is expanding to encompass management of the data generated by your 454, Solexa, SOLiD, and other next-gen sequencing platforms.

· And much more… We want to hear from you! Please send in your feature requests.

The main project site for the open source development of GLU is:

http://code.google.com/p/glu-genetics

Source code and binary versions may be downloaded from:

http://code.google.com/p/glu-genetics/downloads

To subscribe to the GLU Users group, visit:

http://groups.google.com/group/glu-users

To subscribe to the GLU Developers group, visit:

http://groups.google.com/group/glu-dev

I highly recommend that interested users and developers join one or both of these groups, since this is where most of the discussion will take place. You can join from the links at the bottom or right side of the Google Code page.

· You will have to create a Google Account, which is painless and does not require a GMail account.

o If you do have a GMail account and want emails from the group to be sent to your work address, then please sign out of GMail before attempting to join either group. Otherwise, your group subscription will default to your GMail account.

· Click on "Create an account now"

· Enter your email address (e.g. you@somewhere.gov), enter a new and unique password, and the other information they request.

Summary of changes since GLU Release 1.0a5 (2008-10-01)

Major re-write of genotype model encoding. This corrects a major design flaw which caused excessive amounts of memory to be used to process monomorphic SNPs or other instances of incomplete genotype models. The details are fairly low-level and technical, but the net result is that GLU is much smarter about allocating new model objects, performs faster for many operations, and requires less memory.

Although known in principle, this issue was first reported in the wild when Jun Lu started using HapMap build 23, which includes 125k monomorphic SNPs (incomplete models). Over 4.7 GB of RAM and 2m22s were needed to subset the data using GLU 1.0a5 with the old model management strategy, but now only 315 MB of RAM and 5.7s are needed to perform the same operations. A pleasant side-effect is that runtime performance is greatly improved for this and many other operations. This 15x reduction in the amount of memory and a 25x reduction in time required is a substantive start on optimizing GLU for operation on more modest desktop hardware, though clearly more work is needed.
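
The reduction factors quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, the input figures are the rounded ones given in this message, and 1 GB is assumed to be 1024 MB.

```python
# Figures quoted above for subsetting HapMap build 23 (rounded in the text).
old_mem_mb = 4.7 * 1024    # "over 4.7 GB" with GLU 1.0a5; assumes 1 GB = 1024 MB
new_mem_mb = 315.0         # 315 MB with GLU 1.0a6
old_time_s = 2 * 60 + 22   # 2m22s with GLU 1.0a5
new_time_s = 5.7           # 5.7s with GLU 1.0a6

mem_ratio = old_mem_mb / new_mem_mb
time_ratio = old_time_s / new_time_s

print(f"memory reduced about {mem_ratio:.0f}x, time reduced about {time_ratio:.0f}x")
```

Both ratios round to the 15x and 25x figures given above.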

Special thanks to Jun Lu for his help in testing this fairly significant set of changes.

GLU’s genotype file format support is now fully “pluggable”, in that new formats can be added by placing code in a plug-ins directory and will be automatically made available to all programs. The API is not yet documented, but this feature removes a major barrier for adding custom and user-defined file formats to GLU. This feature also fixes a number of internal limitations and bugs.

Some formats can no longer be specified by file extension. For example:

glu transform mydata.lbat -o mydata.structure

is now invalid. Really, there are no files with the .structure extension in the wild (nor should there be). What any sensible user wants is:

glu transform mydata.lbat -o mydata.dat:format=structure

or:

glu transform mydata.lbat -F structure -o mydata.dat
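
To make the "filename:format=structure" notation concrete, here is a minimal Python sketch of how such a specification could be split into a file name and its options. This is not GLU's actual parser: the function name parse_filespec is hypothetical, and chaining several ":key=value" options is an assumption made for illustration.

```python
def parse_filespec(spec):
    """Split 'name:key=value[:key=value...]' into (name, options).

    Hypothetical helper for illustration only; not part of GLU's API.
    """
    name, *parts = spec.split(':')
    options = {}
    for part in parts:
        key, sep, value = part.partition('=')
        if not sep or not key:
            raise ValueError(f"malformed option {part!r} in {spec!r}")
        options[key] = value
    return name, options


# The output file from the example above, with its format given explicitly.
name, options = parse_filespec("mydata.dat:format=structure")
print(name, options)
```

A naive split on ':' like this one would mishandle paths that themselves contain colons (e.g. Windows drive letters), so a real implementation would need more care.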

tagzilla’s founder filter is now based on an exclusion filter, since phenotype information may not include all individuals. This ensures that those of unknown descent are assumed to be founders, rather than non-founders.

Enhanced association testing output to include standard errors (logit1/linear1), genotype counts by category (logit1), and MAF by category (logit1), and to align degenerate categories (logit1).

Renaming alleles and recoding models is now done before applying sample and locus renaming. The original behavior had identifiability issues.

Added support for Illumina’s genotype matrix format by adding a new genotype representation (missing genotype=’- -’). To use, specify representation “-g isnp”.

Enabled genotype filter command line parameters in ginfo.

Standardized command-line help output to use the standard error output stream.

Documentation updates based on contributions from Dennis Maeder, Jun Lu, Zhaoming Wang and Dan Eisenberg.
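
For those curious about the tagzilla founder-filter change listed above, the difference between an inclusion filter and an exclusion filter can be sketched in a few lines of Python. This is not tagzilla's code: the sample names, the parents mapping, and both function names are made up for illustration.

```python
def founders_by_inclusion(samples, known_founders):
    """Keep only samples positively known to be founders."""
    return [s for s in samples if s in known_founders]


def founders_by_exclusion(samples, known_nonfounders):
    """Drop only samples positively known to be non-founders."""
    return [s for s in samples if s not in known_nonfounders]


# Hypothetical data: four genotyped samples, but pedigree/phenotype
# information is available for only two of them.
samples = ['S1', 'S2', 'S3', 'S4']
parents = {'S1': (None, None),   # no parents recorded: a founder
           'S2': ('S1', 'S9')}   # parents recorded: a non-founder

known_founders = {s for s, (father, mother) in parents.items()
                  if father is None and mother is None}
known_nonfounders = set(parents) - known_founders

print(founders_by_inclusion(samples, known_founders))     # ['S1']
print(founders_by_exclusion(samples, known_nonfounders))  # ['S1', 'S3', 'S4']
```

With the exclusion approach, S3 and S4 (whose descent is unknown) are treated as founders instead of being silently dropped.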

Fine print

GLU is primarily developed and maintained by Kevin Jacobs <jacobs at bioinformed dot com> to support the Cancer Genetic Markers of Susceptibility (CGEMS) project, an initiative by the National Cancer Institute (NCI) Division of Cancer Epidemiology and Genetics (DCEG) and run by the NCI Core Genotyping Facility (CGF) to identify genetic alterations that affect susceptibility to prostate and breast cancer.

CGEMS is funded by NCI under Contract N01-CO-12400 by SAIC-Frederick, a subsidiary of Science Applications International Corporation (SAIC).
