I don't think I'm doing it right...

Alex Gallichotte

Jul 16, 2008, 2:03:24 PM
to DBM-Deep
Hi all,

I seem to be having some fundamental difficulty with DBM::Deep. I
wrote a log processor that sifts through some 4 million lines and
used Deep to store them in an on-disk hash. Even so, I kept getting
out-of-memory errors. top showed memory usage increasing at a steady
rate until perl exploded.

After a thorough vetting of my code, I narrowed down the offending
line to this one:

$deep_obj->{$email} = $status_for{$mail_code};

Odd - why should adding keys to my Deep hash take up memory? I
thought that was the whole point...

So I wrote a little script to test this behavior:

#!/usr/bin/perl

use strict;
use DBM::Deep;

unlink('tmp.db');

tie my %hash, 'DBM::Deep', 'tmp.db';

my $i = 0;
while(1){
    $hash{$i} = 1;
    $i++;
}

Running this eats up memory at a constant rate until perl throws an
"Out of memory!" and that's that. This problem occurs on both FreeBSD
and Ubuntu, using perl-5.10 and perl-5.8 respectively. I'm using the
most recent DBM::Deep from the CPAN in both cases.

What gives?

-Alex

Paul Miller

Jul 16, 2008, 2:51:11 PM
to DBM-Deep


On Jul 16, 2:03 pm, Alex Gallichotte <famousbi...@gmail.com> wrote:
> #!/usr/bin/perl
>
> use strict;
> use DBM::Deep;
>
> unlink('tmp.db');
>
> tie my %hash, 'DBM::Deep', 'tmp.db';
>
> my $i = 0;
> while(1){
>     $hash{$i} = 1;
>     $i++;
>
> }

Something is fishy... I left this running for about 10 minutes and
it's up to 0.9% of my memory. It has been increasing, but not fast
enough to fill up my ram before I fill up my hard drive.

Rob Kinyon

Jul 16, 2008, 2:54:38 PM
to DBM-...@googlegroups.com
That does sound fishy. Someone please write a testcase.

Paul - do you want a committer bit? I'd love for you to have one.

Thanks,
Rob


--
Thanks,
Rob Kinyon
Founder and CTO, DBIX Corporation

Paul Miller

Jul 16, 2008, 9:15:33 PM
to DBM-Deep


On Jul 16, 2:54 pm, "Rob Kinyon" <rob.kin...@gmail.com> wrote:
> That does sound fishy. Someone please write a testcase.
>
> Paul - do you want a committer bit? I'd love for you to have one.

I believe I still have credentials from the other test I wrote (unless
you revoked them...)

I'll write a test for sure. I may even try to hunt it down a little,
depends how slow things are tomorrow.

Rob Kinyon

Jul 16, 2008, 9:18:18 PM
to DBM-...@googlegroups.com
On Wed, Jul 16, 2008 at 21:15, Paul Miller <jet...@gmail.com> wrote:
> On Jul 16, 2:54 pm, "Rob Kinyon" <rob.kin...@gmail.com> wrote:
>> That does sound fishy. Someone please write a testcase.
>>
>> Paul - do you want a committer bit? I'd love for you to have one.
>
> I believe I still have credentials from the other test I wrote (unless
> you revoked them...)

If it's for svn.ali.as, I'm not revoking any of those. If it's for
svn.perl.org, I'll need to get you new ones.

> I'll write a test for sure. I may even try to hunt it down a little,
> depends how slow things are tomorrow.

That would be phenomenal! Note: What's in SVN is currently almost
ready for deployment, but several tests (primarily for transactions)
are failing. I was supposed to get it finalized over a week ago, but
$Life has completely overtaken me. I'm hoping that this upcoming
week-long conference will give me some time to work through it.

--
Thanks,
Rob Kinyon

Paul Miller

Jul 18, 2008, 12:48:38 PM
to DBM-Deep


On Jul 16, 9:18 pm, "Rob Kinyon" <rob.kin...@gmail.com> wrote:
> > I'll write a test for sure.  I may even try to hunt it down a little,
> > depends how slow things are tomorrow.
>
> That would be phenomenal! Note: What's in SVN is currently almost
> ready for deployment, but several tests (primarily for transactions)

I think I found the culprit, but you'll have to tell *me* what on
earth it is...

After much tinkering, this leakfinder code seems to produce a
provocative result:

use strict;
use Scalar::Util qw(blessed);
use Devel::FindRef;
use Devel::FindBlessedRefs qw(find_refs_by_coderef);
use Data::Dump qw(dump);
use DBM::Deep;

unlink("/tmp/file");

my $PRIME_NUMBER = 1187;

my $db = DBM::Deep->new("/tmp/file");
$db->{randkey()} = 1 for 1 .. $PRIME_NUMBER;

find_refs_by_coderef(\&count_refs);

sub count_refs {
    my $sv = shift;

    if( (my $p = ref($sv)) ) {
        if( $p eq "HASH" ) {
            if( keys %$sv == $PRIME_NUMBER ) {
                print Devel::FindRef::track $sv;
                print "\n----------\n$p: ", dump($sv), "\n";
            }
        }
    }
}

sub randkey {
    our $i ++;
    my @k = map { int rand 100 } 1 .. 10;
    local $" = "-";

    return "$i-@k";
}


It produced this:

HASH(0x9961c0) is
+- referenced by REF(0x996060), which is
|  in the member '0' of HASH(0x988d70), which is
|  referenced by REF(0x988e20), which is
|  in the member 'entries' of DBM::Deep::Engine=HASH(0x988d90), which is
|  referenced by REF(0x94b630), which is
|  in the member 'engine' of DBM::Deep::Hash=HASH(0x9316f0), which is
|  referenced by REF(0x995e10), which is
|  referenced (in mg_obj) by 'P' type magic attached to DBM::Deep::Hash=HASH(0x94b370), which is
|  referenced by REF(0x625e50), which is
|  in the lexical '$db' in CODE(0x604420), which is
|  not referenced within the search depth.
+- referenced by REF(0x9703f0), which is
   in the lexical '$sv' in CODE(0x970360), which is
+- referenced by REF(0x995f30), which is
|  not found anywhere I looked :(
+- in the global &main::count_refs.

----------
HASH: {
  161  => undef,
  185  => undef,
  209  => undef,
  233  => undef,
  257  => undef,
  281  => undef,
  305  => undef,
  3623 => undef,
  3647 => undef,
  3671 => undef,
  3695 => undef,
  4010 => undef,
  4034 => undef,
  4058 => undef,
  4082 => undef,
  4106 => undef,
  4130 => undef,
  4154 => undef,
  4397 => undef,
  4421 => undef,
  4445 => undef,
  4469 => undef,
  4493 => undef,
  4517 => undef,
  4541 => undef,
  4565 => undef,
  4589 => undef,

....
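The leakfinder script above leans on a simple trick: insert an unusual, prime number of keys (1187) and then look for any hash holding exactly that many keys, presumably because a match is very unlikely to be coincidental and is probably bookkeeping tied to those inserts. As a rough illustration, here is a core-Perl sketch of the same search. It is a simplification, not a replacement for Devel::FindBlessedRefs: it only walks what is reachable from one root reference rather than every SV in the interpreter. The find_hashes_with_n_keys() helper and the sample object graph (shaped like the engine -> entries -> 0 chain in the Devel::FindRef output above) are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(reftype refaddr);

# Hypothetical helper: breadth-first walk from $root, returning the path
# of every hash (blessed or not) that has exactly $n keys.
sub find_hashes_with_n_keys {
    my ($root, $n) = @_;
    my (@hits, %seen);
    my @todo = ([ $root, '$root' ]);

    while (my $item = shift @todo) {
        my ($ref, $path) = @$item;
        next unless ref $ref;                   # plain scalars: nothing to walk
        next if $seen{ refaddr($ref) }++;       # guard against cycles
        my $type = reftype($ref);

        if ($type eq 'HASH') {
            push @hits, $path if scalar(keys %$ref) == $n;
            push @todo, map { [ $ref->{$_}, $path . "->{$_}" ] } sort keys %$ref;
        }
        elsif ($type eq 'ARRAY') {
            push @todo, map { [ $ref->[$_], $path . "->[$_]" ] } 0 .. $#$ref;
        }
    }
    return @hits;
}

# A made-up object graph shaped like the leaking chain; 7 stands in for
# the prime key count used in the real script.
my %entries = map { ("$_:0" => undef) } 1 .. 7;
my $engine  = bless { entries => { 0 => \%entries } }, 'Toy::Engine';
my $db      = { engine => $engine, file => 'tmp.db', list => [ { a => 1 } ] };

print "$_\n" for find_hashes_with_n_keys($db, 7);
```

Walking from a single root keeps the sketch dependency-free, at the cost of missing anything not reachable from that root (tie magic, closures, globals), which is exactly what the real XS-based tools can see.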

Paul Miller

Jul 19, 2008, 8:54:21 AM
to DBM-Deep
On Jul 18, 12:48 pm, Paul Miller <jett...@gmail.com> wrote:
>   161    => undef,
>   185    => undef,
>   209    => undef,
>   233    => undef,

Actually, if I unshift the bleeding-edge SVN blib/lib into @INC, it
looks more like this:

> "158:0" => undef,
> "3620:2" => undef,
> "3620:0" => undef,
> "3620:1" => undef,

It's actually just the "entries" for transaction 0. Apparently there
is never a clear_entries() call while under transaction 0. Should
there be? Probably on each insert or something? That I definitely
can't say without further source diving.

I find it rather odd that they still build up while in a transaction.
If I do 1187 hash-scalar inserts without a transaction going, the
{entries}{0} builds up 1187 keys. If I do it with a transaction
running, it'll build up 1186 or 1177 or some other number smaller than
1187.

-Paul

Paul Miller

Jul 19, 2008, 9:29:17 AM
to DBM-Deep

> It's actually just the "entries" for transaction 0.  Apparently there

So, I'm pretty much talking to myself ... Sorry. I pushed a commit
to the svn that seems to fix the leak. I decided that we probably
don't need to track entries when we're not in a transaction. It
doesn't seem to cause any extra test failures to just return from
add_entry() when trans_id is 0, so that's what I did.
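Stripped of the surrounding engine, the change boils down to a guard clause. The sketch below is a self-contained toy model, not DBM::Deep's actual code: the ToyEngine package, its store() and entry_count() methods, the guarded switch, and the trans_id field are all invented for illustration; only the "loc:idx" key format is borrowed from the dump above. It shows the unguarded bookkeeping hash gaining one key per write under transaction 0, while the guarded version keeps nothing outside a transaction but still records writes made inside one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy model of per-transaction "entries" bookkeeping.
# This is NOT DBM::Deep's engine; names and fields are invented.
package ToyEngine;

sub new {
    my ($class, %args) = @_;
    return bless {
        entries  => {},                      # trans_id => { "loc:idx" => undef }
        guarded  => $args{guarded} ? 1 : 0,  # toggles the fix on or off
        trans_id => 0,                       # 0 means "no transaction running"
    }, $class;
}

sub add_entry {
    my ($self, $trans_id, $loc, $idx) = @_;
    # The fix: with no transaction running there is nothing to roll back,
    # so there is no reason to remember which locations were touched.
    return if $self->{guarded} && $trans_id == 0;
    $self->{entries}{$trans_id}{"$loc:$idx"} = undef;
    return;
}

sub store {
    my ($self, $loc, $idx) = @_;
    # (the actual write to disk is omitted from this toy)
    $self->add_entry($self->{trans_id}, $loc, $idx);
    return;
}

sub entry_count {
    my ($self, $trans_id) = @_;
    return scalar keys %{ $self->{entries}{$trans_id} || {} };
}

package main;

my $leaky = ToyEngine->new(guarded => 0);
my $fixed = ToyEngine->new(guarded => 1);
for my $loc (1 .. 1187) {
    $leaky->store($loc, 0);
    $fixed->store($loc, 0);
}
print "unguarded, no txn: ", $leaky->entry_count(0), "\n";  # one key per write
print "guarded, no txn:   ", $fixed->entry_count(0), "\n";  # nothing retained

# Inside a (simulated) transaction the guard does not apply.
$fixed->{trans_id} = 1;
$fixed->store(2000, 0);
print "guarded, txn 1:    ", $fixed->entry_count(1), "\n";
```

In this model the unguarded hash grows linearly with the number of writes, which matches the steady memory climb reported at the top of the thread.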

Steven Lembark

Jul 24, 2008, 6:42:49 PM
to DBM-...@googlegroups.com
Paul Miller wrote:

> So, I'm pretty much talking to myself ... Sorry. I pushed a commit
> to the svn that seems to fix the leak. I decided that we probably
> don't need to track entries when we're not in a transaction. It
> doesn't seem to cause any extra test failures to just return from
> add_entry() when trans_id is 0, so that's what I did.
>

Might be worth going back and looking at anything else
that tracks state: without any transaction outstanding
the state should be flushed with each action (a.k.a.
AutoCommit).

Paul Miller

Jul 24, 2008, 9:53:06 PM
to DBM-Deep
On Jul 24, 6:42 pm, Steven Lembark <lemb...@wrkhors.com> wrote:
> Might be worth going back and looking at anything else
> that tracks state: without any transaction outstanding
> the state should be flushed with each action (a.k.a.
> AutoCommit).

I believe that's what I did (in effect) by returning from the
add_entry() when there's no transaction running. I believe that's
intended to track changes for transactions, and the transaction tests
all still pass.

I was hoping Rob would comment, but I suspect he's busy (OSCON?).

-Paul

Steven Lembark

Jul 29, 2008, 12:58:21 PM
to DBM-...@googlegroups.com

> On Jul 24, 6:42 pm, Steven Lembark <lemb...@wrkhors.com> wrote:
>> Might be worth going back and looking at anything else
>> that tracks state: without any transaction outstanding
>> the state should be flushed with each action (a.k.a.
>> AutoCommit).
>
> I believe that's what I did (in effect) by returning from the
> add_entry() when there's no transaction running. I believe that's
> intended to track changes for transactions, and the transaction tests
> all still pass.

Q: Is there any possibility that data isn't being
flushed when transactions commit/rollback?

That's another classic source of leaks.
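Steven's question is easy to check in a toy model: whatever is tracked for a transaction has to be dropped on both the commit path and the rollback path, or one sub-hash leaks per transaction. The sketch below is hypothetical and is not DBM::Deep's transaction code; ToyTxn, begin_work(), touch(), commit(), rollback(), and open_transactions() are invented names. It also models the AutoCommit idea from earlier in the thread: with no transaction outstanding, touch() records nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy model of transaction bookkeeping (not DBM::Deep code).
package ToyTxn;

sub new {
    my $class = shift;
    return bless { entries => {}, next_id => 1, current => 0 }, $class;
}

sub begin_work {
    my $self = shift;
    $self->{current} = $self->{next_id}++;
    return $self->{current};
}

sub touch {
    my ($self, $key) = @_;
    my $id = $self->{current};
    return if $id == 0;                  # AutoCommit: nothing to track
    $self->{entries}{$id}{$key} = undef;
    return;
}

# Shared by commit and rollback so neither path can forget the cleanup.
sub _finish {
    my $self = shift;
    delete $self->{entries}{ $self->{current} };
    $self->{current} = 0;
    return;
}

sub commit {
    my $self = shift;
    # (making the tracked changes permanent is omitted from this toy)
    return $self->_finish;
}

sub rollback {
    my $self = shift;
    # (discarding the tracked changes is omitted from this toy)
    return $self->_finish;
}

sub open_transactions { return scalar keys %{ $_[0]{entries} } }

package main;

my $t = ToyTxn->new;
$t->touch("outside") for 1 .. 5;         # no transaction: not tracked
for my $round (1 .. 100) {
    $t->begin_work;
    $t->touch("k$_") for 1 .. 10;
    if ($round % 2) { $t->commit } else { $t->rollback }
}
print "transactions still tracked: ", $t->open_transactions, "\n";
```

Routing both endings through one cleanup routine is the design point: if commit() and rollback() each did their own cleanup, a fix applied to one could easily be missed in the other.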

Q: I've lost track of the thread; which copy of
the module are you using (svn rev, CPAN, whatever)?

thanx

Rob Kinyon

Jul 29, 2008, 10:58:44 PM
to DBM-...@googlegroups.com

Yes, your change to add_entry() is perfectly in the spirit of what it
should be doing. I don't know why I missed that one. (Many eyes and so
forth, I guess.)

Any other bugs you've found, please fix. Oh, and if you're in the
mood, please document, too. In fact, documenting is much better than
fixing. :-/

And, for the record, I wasn't at OSCON (however much I would've liked
to be). It was another 8 day conference where you don't get much
hacking time. :-/

Paul Miller

Jul 30, 2008, 8:08:43 AM
to DBM-Deep


On Jul 29, 10:58 pm, "Rob Kinyon" <rob.kin...@gmail.com> wrote:
> Yes, your change to add_entry() is perfectly in the spirit of what it
> should be doing. I don't know why I missed that one. (Many eyes and so
> forth, I guess.)

Awesome.

> Any other bugs you've found, please fix. Oh, and if you're in the
> mood, please document, too. In fact, documenting is much better than
> fixing. :-/

I'm not sure what you mean here... you mean rt.cpan tickets? Is there
some svn way to document them? XXX comments?

(I intend to hunt down that delete bug at some point. Time is the
problem.)

Rob Kinyon

Jul 30, 2008, 9:22:14 AM
to DBM-...@googlegroups.com
On Wed, Jul 30, 2008 at 08:08, Paul Miller <jet...@gmail.com> wrote:
>> Any other bugs you've found, please fix. Oh, and if you're in the
>> mood, please document, too. In fact, documenting is much better than
>> fixing. :-/
>
> I'm not sure what you mean here... you mean rt.cpan tickets? Is there
> some svn way to document them? XXX comments?

Ideally, we would have some set of documents as .pod files in the
repository. That way, the documentation gets included in the CPAN
distro and is viewable on CPAN. This is the sort of thing that
DBIx::Class and Catalyst and Template Toolkit do. I tried to start
doing this with the Internals.pod and Cookbook.pod files, but I didn't
get very far.

> (I intend to hunt down that delete bug at some point. Time is the
> problem.)

Whatever help you can give is awesome.

Paul Miller

Jul 30, 2008, 10:39:06 AM
to DBM-Deep


On Jul 30, 9:22 am, "Rob Kinyon" <rob.kin...@gmail.com> wrote:
> DBIx::Class and Catalyst and Template Toolkit do. I tried to start
> doing this with the Internals.pod and Cookbook.pod files, but I didn't
> get very far.

Oh, you mean actual package docs? I thought you meant document the
bugs instead of fixing them. You really meant, "please write
documentation, all we have is a huge synopsis?" (or did I
misunderstand again.)

The docs that are there seem to cover quite a bit -- I'd think most
everything. What do you think is missing?

Rob Kinyon

Jul 30, 2008, 10:44:34 AM
to DBM-...@googlegroups.com
On Wed, Jul 30, 2008 at 10:39, Paul Miller <jet...@gmail.com> wrote:
> On Jul 30, 9:22 am, "Rob Kinyon" <rob.kin...@gmail.com> wrote:
>> DBIx::Class and Catalyst and Template Toolkit do. I tried to start
>> doing this with the Internals.pod and Cookbook.pod files, but I didn't
>> get very far.
>
> Oh, you mean actual package docs? I thought you meant document the
> bugs instead of fixing them. You really meant, "please write
> documentation, all we have is a huge synopsis?" (or did I
> misunderstand again.)
>
> The docs that are there seem to cover quite a bit -- I'd think most
> everything. What do you think is missing?

Internals is actually completely out of date and wrong. The Cookbook
is decent, but there is no documentation at all of how transactions
work, how the sectors break down, or how the flow of code happens for
any action. The documentation in Deep.pod is pretty thorough (Joe did
a great job), but it's not organized very well. There's no sense of
"This is important for newbies" vs. "This is good stuff for experts."

As for documenting the add_entry() change, I guess I was thinking more
in the direction of "Here's this set of functions that are only useful
in transactions." That would lead to "This is the stuff we need to
track for transactions", etc.

Steven Lembark

Jul 31, 2008, 9:23:33 AM
to DBM-...@googlegroups.com

> The docs that are there seem to cover quite a bit -- I'd think most
> everything. What do you think is missing?

Internal comments, API information for each layer of
the call stack, examples tracing an operation down the
call stack so that someone can implement a new
low-level module if they wanted to.

--
Steven Lembark 85-09 90th St.
Workhorse Computing Woodhaven, NY, 11421
lem...@wrkhors.com +1 888 359 3508
