I have been writing a program that has some similarities to GLU. I need
to make comparisons to similar software. I haven't actually used GLU, so
please forgive the possible stupidity of this question. It appears from
the documentation that GLU might use a database (if so, probably SQLite),
but this is not clear. So, does GLU store its data in a database or not,
and if so, which database?
Here are two files that contain SQL queries:
http://code.google.com/p/glu-genetics/source/browse/glu/modules/genedb/queries.py
http://code.google.com/p/glu-genetics/source/browse/glu/lib/fileutils/formats/sqlite.py
Please CC me on any reply. Thanks.
Regards, Faheem Mitha
On Sun, 27 Feb 2011, Kevin Jacobs <jac...@bioinformed.com> wrote:
> Hi Faheem,
> Yes, some of the GLU modules require a SQLite database containing
> various genomic annotations. I'm in the process of posting updated
> versions of these datasets for public use.
> -Kevin
Hi Kevin,
Thanks. So GLU stores annotation information in a database, but does not
store the main genomic information, e.g. calls, in a database?
Regards, Faheem
On Mon, 28 Feb 2011, Faheem Mitha wrote:
> Thanks. So GLU stores annotation information in a database, but does not
> store the main genomic information, e.g. calls, in a database?
PS. I'm under the impression that GLU writes its data to disk in a special
compressed format. Is this what you use for storing the genotype calls?
F
You are correct: genome-wide annotation is stored in a global SQLite
database, while genotypes plus sample and locus annotation are stored
separately. GLU supports many file formats (text, binary, uncompressed,
compressed) for storing SNP genotype data, including PLINK's text and
binary formats.

The "native" compressed binary formats are LBAT, SBAT and TBAT, which
store genotypes by locus, by sample, and as individual genotypes,
respectively. These formats are based on the HDF5 file format and are
able to scale effectively to manage trillions of genotypes. In
particular, LBAT files can achieve storage efficiencies of under 1 bit
per genotype (inclusive of metadata). E.g., an LBAT file for an analysis
with the GIANT consortium contains data on 4,170 subjects for 1,126,862
SNPs (~4.7 billion genotypes) and requires 371,504 kb of disk storage
(<0.65 bits per genotype).
-Kevin
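
The storage figures in the message above can be checked with a short
calculation. This is only a sketch of the arithmetic, not anything taken
from GLU itself. The message does not say whether "kb" means 1000 or
1024 bytes, so both are tried; the variable and function names are made
up for illustration.

```python
# Figures quoted in the message above.
subjects = 4_170
snps = 1_126_862
size_kb = 371_504

# One genotype per subject per SNP.
genotypes = subjects * snps


def bits_per_genotype(bytes_per_kb):
    """Bits of disk storage per genotype, for a given size of one 'kb'."""
    return size_kb * bytes_per_kb * 8 / genotypes


print(f"genotypes: {genotypes:,}")
print(f"kb = 1000 bytes: {bits_per_genotype(1000):.3f} bits/genotype")
print(f"kb = 1024 bytes: {bits_per_genotype(1024):.3f} bits/genotype")
```

The genotype count comes to 4,699,014,540, and under either reading of
"kb" the result stays below the quoted 0.65 bits per genotype.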
On Mon, 28 Feb 2011, Jacobs, Kevin (NIH/NCI) [C] wrote:
> You are correct: genome-wide annotation is stored in a global SQLite
> database, while genotypes plus sample and locus annotation are stored
> separately. [...]
Thanks, Kevin. That is a very clear and detailed explanation.
Regards, Faheem