Does GLU use a database?

Faheem Mitha

Feb 27, 2011, 3:44:28 PM

to glu-...@googlegroups.com

Hi,

I have been writing a program that has some similarities to GLU, and I
need to compare it with similar software. I haven't actually used GLU, so
please forgive the possible stupidity of this question. The documentation
suggests that GLU might use a database, probably SQLite, but it is not
clear. So, does GLU store its data in a database or not, and if so,
which database?

Here are two files that contain SQL queries:

http://code.google.com/p/glu-genetics/source/browse/glu/modules/genedb/queries.py
http://code.google.com/p/glu-genetics/source/browse/glu/lib/fileutils/formats/sqlite.py

Please CC me on any reply. Thanks.

Regards, Faheem Mitha

Kevin Jacobs <jacobs@bioinformed.com>

Feb 27, 2011, 5:58:55 PM

to glu-...@googlegroups.com, Faheem Mitha

Hi Faheem,

Yes, some of the GLU modules require a SQLite database containing various genomic annotations. I'm in the process of posting updated versions of these datasets for public use.

-Kevin


Faheem Mitha

Feb 28, 2011, 1:59:42 AM

to jac...@bioinformed.com, glu-...@googlegroups.com

On Sun, 27 Feb 2011, Kevin Jacobs <jac...@bioinformed.com> wrote:

> Hi Faheem,

> Yes, some of the GLU modules require a SQLite database containing
> various genomic annotations. I'm in the process of posting updated
> versions of these datasets for public use.

> -Kevin

Hi Kevin,

Thanks. So GLU stores annotation information in a database, but does not
store the main genomic information, e.g. calls, in a database? Thanks.

Regards, Faheem

Faheem Mitha

Feb 28, 2011, 2:55:06 AM

to jac...@bioinformed.com, glu-...@googlegroups.com

On Mon, 28 Feb 2011, Faheem Mitha wrote:

>
>
> On Sun, 27 Feb 2011, Kevin Jacobs <jac...@bioinformed.com> wrote:
>
>> Hi Faheem,
>
>> Yes, some of the GLU modules require a SQLite database containing various
>> genomic annotations. I'm in the process of posting updated versions of
>> these datasets for public use.
>
>> -Kevin
>
> Hi Kevin,
>
> Thanks. So GLU stores annotation information in a database, but does not
> store the main genomic information, eg calls, in a database? Thanks.
>
> Regards, Faheem

PS. I'm under the impression that GLU writes its data to disk in a special
compressed format. Is this what you use for storing the genotype calls?

F

Jacobs, Kevin (NIH/NCI) [C]

Feb 28, 2011, 8:38:46 AM

to glu-...@googlegroups.com, fah...@email.unc.edu

Hi Faheem,

You are correct: genome-wide annotation is stored in a global SQLite database, while genotypes plus sample and locus annotation are stored separately. GLU supports many file formats (text, binary, uncompressed, compressed) for storing SNP genotype data (including PLINK's text and binary formats). The "native" compressed binary formats are LBAT, SBAT and TBAT for storage of genotypes by locus, sample, and as individual genotypes, respectively. These formats are based on the HDF5 file format and are able to scale effectively to manage trillions of genotypes. In particular, LBAT files can achieve storage efficiencies of under 1 bit per genotype (inclusive of metadata). E.g., an LBAT file for an analysis with the GIANT consortium contains data on 4,170 subjects for 1,126,862 SNPs (~4.7 billion genotypes) and requires 371,504 kb of disk storage (<0.65 bits per genotype).

-Kevin
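
As a rough check of the LBAT figures quoted above, the arithmetic can be redone in a few lines of Python. This is only a sketch: it assumes "kb" means 1024 bytes, which the message does not state, and the variable names are made up for illustration.

```python
# Figures taken from the message above.
subjects = 4170
snps = 1126862
size_kb = 371504  # reported disk usage; "kb" assumed to be 1024 bytes

# One genotype per (subject, SNP) pair.
genotypes = subjects * snps

# Convert the file size to bits and divide by the number of genotypes.
total_bits = size_kb * 1024 * 8
bits_per_genotype = total_bits / genotypes

print(f"{genotypes:,} genotypes")
print(f"{bits_per_genotype:.3f} bits per genotype")
```

With 1000-byte kilobytes the ratio comes out slightly lower, so the "under 0.65 bits per genotype" figure holds under either reading.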

Faheem Mitha

Feb 28, 2011, 8:55:57 AM

to Jacobs, Kevin (NIH/NCI) [C], glu-...@googlegroups.com

On Mon, 28 Feb 2011, Jacobs, Kevin (NIH/NCI) [C] wrote:

> Hi Faheem,
>
> You are correct: genome-wide annotation is stored in a global SQLite
> database, while genotypes plus sample and locus annotation are stored
> separately. GLU supports many file formats (text, binary, uncompressed,
> compressed) for storing SNP genotype data (including PLINK's text and
> binary formats). The "native" compressed binary formats are LBAT, SBAT
> and TBAT for storage of genotypes by locus, sample, and as individual
> genotypes, respectively. These formats are based on the HDF5 file
> format and are able to scale effectively to manage trillions of
> genotypes. In particular, LBAT files can achieve storage efficiencies
> of under 1 bit per genotype (inclusive of metadata). E.g., an LBAT file
> for an analysis with the GIANT consortium contains data on 4,170
> subjects for 1,126,862 SNPs (~4.7 billion genotypes) and requires
> 371,504 kb of disk storage (<0.65 bits per genotype).

Thanks, Kevin. That is a very clear and detailed explanation.

Regards, Faheem

kowzar

Feb 28, 2011, 10:21:01 PM

to glu-users, fah...@email.unc.edu

Hi Kevin.

I take it that the only way for the two systems (GLU and SNPpy) to
interact would be through an intermediate file format (e.g., the PLINK
bed format).

I continue to have problems importing PLINK bed files into GLU (I
emailed you about my problems a while ago). I have downloaded the
latest version of GLU:
$ md5sum glu-1.0b1-Linux_Intel_EM64T.tar.gz
2ecc76595eb8b7ebb7b87d497cabca91 glu-1.0b1-Linux_Intel_EM64T.tar.gz

I am using a toy example (a slight modification of the example on the
PLINK page):

$ cat test.ped
FAM001 1 0 0 1 2 A A G G A C
FAM001 2 0 0 1 2 A A A G C C
$ cat test.map
1 rs123456 0 1234555
1 rs234567 0 1237793
1 rs233556 0 1337456

I then create the bed file (note that in Debian the plink command has
been renamed to p-link)

$ p-link --file test --make-bed

When I try to transform the bed file into the lbat format, I get the
following error: "Invalid BIM locus model: (None/A)". The command I
issue is:

$ glu -v transform plink.bed -o plink.lbat

I am providing the complete session output below.

Take care,

Kouros

GLU 1.0b1 module: transform
Copyright (c) 2007-2009, BioInformed LLC and the U.S. Department of
Health & Human Services. Funded by NCI under Contract N01-CO-12400.

Well, that could have gone better.

Execution aborted due to a problem with the program input, parameters
supplied, an error in the program. Please examine the following failure
trace for clues as to what may have gone wrong. When in doubt, please send
this message and a complete description of the analysis you are attempting
to perform to the software developers.

Error: Invalid BIM locus model: (None/A)

Command line:
glu transform plink.bed -o plink.lbat

Well, this is embarrassing.

Traceback: Traceback (most recent call last):
  File "glu/lib/glu_launcher.py", line 210, in main
  File "glu/modules/transform.py", line 68, in main
  File "glu/lib/genolib/io.py", line 329, in transform_files
  File "glu/lib/genolib/io.py", line 177, in load_genostream
  File "glu/lib/genolib/formats/plink.py", line 952, in load_plink_bed
  File "glu/lib/genolib/formats/plink.py", line 844, in load_plink_bim
RuntimeError: Invalid BIM locus model: (None/A)

[Mon Feb 28 22:12:43 2011] Execution aborted due to a fatal error
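
One possible reading of the error, offered as an assumption that nothing in this thread confirms: in the toy test.ped both individuals are A/A at the first marker, so rs123456 is monomorphic. PLINK 1.x writes the missing-allele code 0 for the unobserved allele in the .bim file, which would be consistent with the (None/A) in the message. The sketch below only counts the alleles observed per marker in the toy data; it does not call GLU or PLINK, and the function name is hypothetical.

```python
# Toy data copied from the test.ped and test.map shown in the message above.
PED = """\
FAM001 1 0 0 1 2 A A G G A C
FAM001 2 0 0 1 2 A A A G C C
"""
MAP = """\
1 rs123456 0 1234555
1 rs234567 0 1237793
1 rs233556 0 1337456
"""


def alleles_per_marker(ped_text, map_text):
    """Return {marker name: set of alleles observed} for PED/MAP text.

    Hypothetical helper for illustration; not part of GLU or PLINK.
    """
    # Column 2 of each MAP line is the marker name.
    names = [line.split()[1] for line in map_text.splitlines() if line.strip()]
    observed = {name: set() for name in names}
    for line in ped_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first six PED columns describe the individual; the rest are
        # allele pairs, one pair per marker in MAP order.
        calls = fields[6:]
        for i, name in enumerate(names):
            for allele in calls[2 * i:2 * i + 2]:
                if allele != "0":  # "0" is PLINK's missing-allele code
                    observed[name].add(allele)
    return observed


observed = alleles_per_marker(PED, MAP)
monomorphic = [name for name, alleles in observed.items() if len(alleles) < 2]

for name, alleles in observed.items():
    print(name, sorted(alleles))
print("monomorphic markers:", monomorphic)
```

If this reading is right, giving the first marker a second allele in the toy data (or dropping that marker) would be a simple way to check whether the error goes away.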
