[Bioperl-l] BerkeleyDB

shalu sharma

Jan 25, 2019, 10:32:35 AM

to biop...@mailman.open-bio.org

Hey everyone,

I am using BerkeleyDB to build a huge database (Btree method).

I use it to pull out matching IDs from multiple datasets, and that part is working fine.

Here are a few lines of the code:

    use strict;
    use BerkeleyDB;
    use Bio::SeqIO;

    my $filename = "tree";
    unlink $filename;    # remove any existing tree file

    my %h;
    tie %h, 'BerkeleyDB::Btree',
        -Filename => $filename,
        -Flags    => DB_CREATE
      or die "Cannot open $filename: $! $BerkeleyDB::Error\n";

    # Add key/value pairs to the file
    open(IN, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";    # adding values
    while (<IN>) {
        my $line = $_;
        chomp($line);
        my @f   = split(/\t/, $line);    # tab-separated: id, value
        my $id  = $f[0];
        my $val = $f[1];
        $id  =~ s/^\s+//; $id  =~ s/\s+$//;    # trim surrounding whitespace
        $val =~ s/^\s+//; $val =~ s/\s+$//;
        $h{$id} = $val;
    ----

My question: this produces a huge tree file. Is it possible to reuse that tree file instead of rebuilding it on every run? My query datasets change, but the database does not.

Peter Cock

Jan 25, 2019, 11:31:36 AM

to shalu sharma, Bioperl L

That's a good question - I don't know the BioPerl answer,
but am interested from the Biopython side of things.

When I created Biopython's SeqIO (first included in
Biopython 1.43 from 2007) it was heavily influenced by
BioPerl's SeqIO:

https://bioperl.org/howtos/SeqIO_HOWTO
https://biopython.org/wiki/SeqIO

The older Biopython framework it replaced (using a regular
expression based system called Martel/Mindy) had indexing,
e.g. see the Biopython 1.30 release notes from 2004.

It took a bit longer to add indexing to Biopython's SeqIO.
I added in-memory indexing (using a dict, or hash in Perl
terminology) in Biopython 1.52 (2009), and then SQLite
support was added in Biopython 1.57 (2011). And yes, a
key point of this was to build an index once, and reuse it.

I did look at BerkeleyDB for this, but concluded that
SQLite was a more portable and practical choice - it
was usually included with a standard Python install.

Regards,

Peter

> _______________________________________________
> Bioperl-l mailing list
> Biop...@mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Gordon Haverland

Jan 25, 2019, 1:16:56 PM

to biop...@mailman.open-bio.org

On Fri, 25 Jan 2019 16:13:52 +0000
Peter Cock <p.j.a...@googlemail.com> wrote:

> That's a good question - I don't know the BioPerl answer,
> but am interested from the Biopython side of things.
>
> When I created Biopython's SeqIO (first included in
> Biopython 1.43 from 2007) it was heavily influenced by
> BioPerl's SeqIO:
>
> https://bioperl.org/howtos/SeqIO_HOWTO
> https://biopython.org/wiki/SeqIO
>
> The older Biopython framework it replaced (using a regular
> expression based system called Martel/Mindy) had indexing,
> e.g. see the Biopython 1.30 release notes from 2004.
>
> It took a bit longer to add indexing to Biopython's SeqIO.
> I added in-memory indexing (using a dict, or hash in Perl
> terminology) in Biopython 1.52 (2009), and then SQLite
> support was added in Biopython 1.57 (2011). And yes, a
> key point of this was to build an index once, and reuse it.
>
> I did look at BerkeleyDB for this, but concluded that
> SQLite was a more portable and practical choice - it
> was usually included with a standard Python install.

Way back when, I seem to remember some information about DBM::Deep
possibly being put on top of BerkeleyDB. The man page for DBM::Deep
mentions BDB, but not in a way that suggests the work is finished. The
code lives on GitHub, and very little seems to have been done in the
last 2 years.

Gord

Mark Jensen

Jan 25, 2019, 4:57:08 PM

to shalu sharma, biop...@mailman.open-bio.org

Just drop the
    -Flags => DB_CREATE
parameter, and it should open a file that already exists. Or am I missing something?

Sent from my iPhone

> On Jan 25, 2019, at 10:18 AM, shalu sharma <sharmas...@gmail.com> wrote:
>
> filename

Fields, Christopher J

Jan 27, 2019, 2:08:00 PM

to shalu sharma, Mark Jensen, biop...@mailman.open-bio.org

Yes, that’s what I was thinking. You should be able to simply tie back into the same file and reuse the btree.

Chris
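
A minimal sketch of the reuse suggested above, assuming the same tab-separated input as the original script. The helper names open_tree and load_tsv and the command-line layout are hypothetical. Rather than dropping DB_CREATE, this variant keeps the flag (which, as far as I know, only creates the file when it is missing and otherwise opens the existing one) and skips both the unlink and the load step when the tree file is already there:

```perl
use strict;
use warnings;
use BerkeleyDB;

# Open the Btree file, creating it only if it does not exist yet.
# Returns a reference to the tied hash and a flag telling whether the
# file is new (and therefore still needs to be loaded).
sub open_tree {
    my ($filename) = @_;
    my $is_new = !-e $filename;
    my %h;
    tie %h, 'BerkeleyDB::Btree',
        -Filename => $filename,
        -Flags    => DB_CREATE
      or die "Cannot open $filename: $! $BerkeleyDB::Error\n";
    return (\%h, $is_new);
}

# Load "id<TAB>value" lines into the tied hash, trimming whitespace.
sub load_tsv {
    my ($db, $tsv) = @_;
    open(my $in, '<', $tsv) or die "Cannot open $tsv: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        my ($id, $val) = split /\t/, $line;
        next unless defined $id && defined $val;
        s/^\s+|\s+$//g for $id, $val;
        $db->{$id} = $val;
    }
    close $in;
}

# Hypothetical usage: perl reuse_tree.pl tree database.tsv id1 id2 ...
if (@ARGV >= 2) {
    my ($tree_file, $tsv, @ids) = @ARGV;
    my ($db, $is_new) = open_tree($tree_file);
    load_tsv($db, $tsv) if $is_new;    # build once, reuse on later runs
    for my $id (@ids) {
        print "$id\t$db->{$id}\n" if exists $db->{$id};
    }
}
```

The point is that the "unlink $filename" line in the original script is what forces the rebuild. With that line gone, later runs tie straight back into the existing file. If the source table ever changes, deleting the tree file by hand triggers a fresh build.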

Gordon Haverland

Jan 27, 2019, 3:16:47 PM

to biop...@mailman.open-bio.org

On Sun, 27 Jan 2019 18:54:53 +0000
"Fields, Christopher J" <cjfi...@illinois.edu> wrote:

> Yes, that’s what I was thinking. You should be able to simply tie
> back into the same file and reuse the btree.

Quite a while ago, I was working with DBM::Deep, and developed a fairly
complex Perl structure. The ability to store this to a file (which is
what DBM::Deep did originally) or a database (still a work in progress?)
apparently isn't trivial.

Duplicating parts of the structure could use a shallow copy, a deep
copy, or sometimes you would have to write a custom subroutine to do
what you wanted.

If you wanted to change the structure, you had to get a deep copy of
what was "in the file" into memory, make the changes, and then copy
this new structure on top of the original location, and then save to
file again.

The levels of autovivification that Perl could do often influenced how
you altered the structure.

None of which is about BerkeleyDB. But from the reading I did around
this work, I believe a lot of this had to be done because of the "tie"
interface and/or what was available for serializing Perl structures,
which may crop up in this application (maybe).

Which is why I brought up DBM::Deep.

I had worked with MLDBM before DBM::Deep, on related things.

Gord
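
The copy, modify, write-back pattern described above can be sketched as follows. This is an illustration under assumptions, not code from DBM::Deep or BioPerl: the helper names store_record, fetch_record and update_record are hypothetical, Storable is swapped in as the serializer, and a plain hash stands in for the tied BerkeleyDB::Btree hash so the example runs with core Perl only. A tied DBM hash stores flat strings, so a nested structure has to be frozen on the way in and thawed on the way out, and a change made to the thawed copy is not saved until the whole record is stored again:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Serialize a Perl structure and store it under $key.
sub store_record {
    my ($db, $key, $data) = @_;
    $db->{$key} = freeze($data);
}

# Return a deep copy of the stored structure, or undef if the key is absent.
sub fetch_record {
    my ($db, $key) = @_;
    return exists $db->{$key} ? thaw($db->{$key}) : undef;
}

# Read-modify-write: fetch a copy, let the callback change it, store it back.
sub update_record {
    my ($db, $key, $code) = @_;
    my $data = fetch_record($db, $key) // {};
    $code->($data);
    store_record($db, $key, $data);
    return $data;
}

# Stand-in for the tied hash (assumption: in practice this would be
# tied to BerkeleyDB::Btree as in the original script).
my %db;
store_record(\%db, 'seq1', { length => 120, tags => ['a'] });
update_record(\%db, 'seq1', sub { push @{ $_[0]{tags} }, 'b' });
```

MLDBM, mentioned above, packages a similar freeze/thaw layer behind the tie interface, and its documentation describes the same limitation: modifying a nested element in place does not update what is stored on disk.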

Fields, Christopher J

Jan 27, 2019, 4:48:30 PM

to biop...@mailman.open-bio.org, Gordon Haverland

There is some code for storing more complex data in BerkeleyDB buried in several places in BioPerl. For example, Lincoln had a backend using it for his Bio::DB::SeqFeature::Store implementation.

chris
