DBM::Deep now on Git


Rob Kinyon

Jun 29, 2009, 5:38:33 PM
to dbm-...@googlegroups.com
I just finished an export of the DBM::Deep source from both
svn.perl.org (0.96x -> 1.0008) and svn.ali.as (1.0008 -> 1.0014). It's
available on:

dbsr...@git.shadowcat.co.uk:DBM-Deep.git
g...@github.com:robkinyon/dbm-deep.git

The shadowcat repos will be considered authoritative and should be
preferred. Go ahead and clone it, then submit patches to me. If you
must, use the github. If we can get along without it, I'd prefer to
decommission it, but in the spirit of git, I'm not going to be fussy.

I have to add the tags for 1.0009-1.0013 (they got lost), but that
should be all that's missing. Also, all the branches are on the SC
repos, but may not be in the github repos. (I'm still figuring this
git thing out.)

--
Thanks,
Rob Kinyon

Paul Miller

Jun 29, 2009, 7:21:28 PM
to DBM-Deep


On Jun 29, 5:38 pm, Rob Kinyon <rob.kin...@gmail.com> wrote:
> I just finished an export of the DBM::Deep source from both
> svn.perl.org (0.96x -> 1.0008) and svn.ali.as (1.0008 -> 1.0014). It's
> available on:
>
> dbsrg...@git.shadowcat.co.uk:DBM-Deep.git
> g...@github.com:robkinyon/dbm-deep.git


Actually, I think you'll find it's available here:

http://github.com/robkinyon/dbm-deep/

And here:

git://github.com/robkinyon/dbm-deep.git

The URL you gave was the private clone URL.

But I'm happy to see this project on github. I'm a big fan of that
site and this project -- together at last.

Steven Lembark

Jun 29, 2009, 8:52:38 PM
to DBM-...@googlegroups.com

> dbsr...@git.shadowcat.co.uk:DBM-Deep.git
> g...@github.com:robkinyon/dbm-deep.git

Q: Why the preference against github?

--
Steven Lembark 85-09 90th St.
Workhorse Computing Woodhaven, NY, 11421
lem...@wrkhors.com +1 888 359 3508

Rob Kinyon

Jun 29, 2009, 9:33:16 PM
to DBM-...@googlegroups.com
On Mon, Jun 29, 2009 at 20:52, Steven Lembark<lem...@wrkhors.com> wrote:
>
>
>> dbsr...@git.shadowcat.co.uk:DBM-Deep.git
>> g...@github.com:robkinyon/dbm-deep.git
>
> Q: Why the preference against github?

Because if github goes down, that spikes the process. SC won't go down
without providing what I need to export.

Rob

Steven Lembark

Aug 27, 2009, 1:24:52 AM
to DBM-...@googlegroups.com
On Mon, 29 Jun 2009 17:38:33 -0400
Rob Kinyon <rob.k...@gmail.com> wrote:

Q: How well would the current DBM::Deep handle a
hash with roughly 160_000 keys?

"I have no idea, noone has been idiotic enough to
try it" would be a reasonable response.

Paul Miller

Aug 27, 2009, 8:55:25 AM
to DBM-...@googlegroups.com
On Mon, Jun 29, 2009 at 9:33 PM, Rob Kinyon<rob.k...@gmail.com> wrote:
>>> dbsr...@git.shadowcat.co.uk:DBM-Deep.git
>>> g...@github.com:robkinyon/dbm-deep.git
>>
>> Q: Why the preference against github?
>
> Because if github goes down, that spikes the process. SC won't go down
> without providing what I need to export.

Thinking you're tied to one central repo seems to miss the point of
distributed anyway. Simply keeping it up-to-date on github is enough.
It's a complete repo and you can interact with any repo just the same
as any other.

I'll choose github, others may choose SC (whatever that is) and I'm
sure a clone will turn up on Gitorious and repo.or.cz ...

--
If riding in an airplane is flying, then riding in a boat is swimming.
113 jumps, 46.6 minutes of freefall, 89.1 freefall miles.

QM

Sep 10, 2009, 10:22:50 AM
to DBM-Deep
On Aug 27, 1:24 am, Steven Lembark <lemb...@wrkhors.com> wrote:
> On Mon, 29 Jun 2009 17:38:33 -0400
>
> Rob Kinyon <rob.kin...@gmail.com> wrote:
>
> Q: How well would the current DBM::Deep handle a
>    hash with roughly 160_000 keys?
>
> "I have no idea, noone has been idiotic enough to
> try it" would be a reasonable response.
>
I've been idiotic, apparently.

I've tried it, with upwards of 500K top level keys in a multilevel
hash. Walking through the full hash, my script slows down as the
number of keys accessed increases. To be sure, it's a huge multilevel
hash, taking up ~100GB on disk. At the beginning of the walk, keys
(and their data) might be processed at the rate of 10/minute. Near the
end, it slows down to 10/hour. (I know, because I got fed up waiting
weeks for output, and put in a "progress meter".)

My script has a command line switch to enable/disable use of
DBM::Deep, and can just use native hashes in memory. Of course,
there's no way to get 100GB hashes in my virtual memory. Using the
same data, the memory option doesn't slow down at the end like the
DBM::Deep run does. Of course, the performance difference between
memory and disk is naturally large, so I'm not sure whether the
problem is there in memory too and is just hidden in my script.
I'll have to try and come up with more rigorous testing.
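
For reference, the overall shape of the run is roughly this. It's a
minimal sketch only: the file name, the switch handling, and the
progress interval are made up, and the real script does much more.

    use strict;
    use warnings;
    use DBM::Deep;

    # --dbm-deep selects the on-disk store; otherwise use a plain in-memory hash
    my $use_db = grep { $_ eq '--dbm-deep' } @ARGV;    # hypothetical switch name
    my $db = $use_db
        ? DBM::Deep->new( file => 'walk.db' )          # hypothetical file name
        : {};

    # ... $db->{top_level} gets populated elsewhere ...

    my $count = 0;
    while ( my ( $key, $subtree ) = each %{ $db->{top_level} } ) {
        # do stuff with $key and $subtree here
        print STDERR "processed $count keys\n" unless ++$count % 1000;    # progress meter
    }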

Cheers,
QM

Steven Lembark

Sep 10, 2009, 3:28:10 PM
to dbm-...@googlegroups.com

>
> On Aug 27, 1:24 am, Steven Lembark <lemb...@wrkhors.com> wrote:
> > On Mon, 29 Jun 2009 17:38:33 -0400
> >
> > Rob Kinyon <rob.kin...@gmail.com> wrote:
> >
> > Q: How well would the current DBM::Deep handle a
> >    hash with roughly 160_000 keys?

> >
> > "I have no idea, noone has been idiotic enough to
> > try it" would be a reasonable response.
> >
> I've been idiotic, apparently.
>
> I've tried it, with upwards of 500K top level keys in a multilevel
> hash. Walking through the full hash, my script slows down as the
> number of keys accessed increases. To be sure, it's a huge multilevel
> hash, taking up ~100GB on disk. At the beginning of the walk, keys
> (and their data) might be processed at the rate of 10/minute. Near the
> end, it slows down to 10/hour. (I know, because I got fed up waiting
> weeks for output, and put in a "progress meter".)
>
> My script has a command line switch to enable/disable use of
> DBM::Deep, and can just use native hashes in memory. Of course,
> there's no way to get 100GB hashes in my virtual memory. Using the
> same data, the memory option doesn't slow down at the end like the
> DBM::Deep run does. Of course, the performance difference between
> memory and disk is naturally large, so I'm not sure whether the
> problem is there in memory too and is just hidden in my script.
> I'll have to try and come up with more rigorous testing.

My hashes fit easily into core; it's just
that persisting the data via flat files can
become difficult with a single directory
containing 160K files.

My data structure is also fairly flat: using
$hash{ $namespace, $key } leaves me with many
keys and smaller data.
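
For concreteness, the comma subscript is Perl's multidimensional
hash emulation: the subscripts are joined with $; into one flat
key, so there is no nesting at all. A tiny illustration with
made-up names:

    my %hash;
    $hash{ 'config', 'timeout' } = 30;         # stored under the single key "config$;timeout"
    print $hash{ 'config', 'timeout' }, "\n";  # prints 30
    print scalar( keys %hash ), "\n";          # prints 1: one flat key, not a nested hash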

Q: How large are the chunks of data you fetch
during the walking cycle?

Knowing might help me tune things a bit.

Thanks for the information.

QM

Oct 8, 2009, 3:32:48 PM
to DBM-Deep

Sorry for taking so long to respond.

Not sure how to measure this, since it's a tall nested hash structure.
However, judging by the output table, each top level key fetched
produces about 1500 bytes of output on average. There's a lot more
data in the DB, but only about 3% is fetched during the walk cycle,
so each top level key points to roughly a 50KB hash tree (1500 / 0.03).

BTW, I was rereading the docs the other day, and came across the
warning about using each() on a lookup instead of a hash ref:

Please note that when using each(), you should always pass a
direct hash reference, not a lookup.

Is this still accurate? I say that because I tried changing to the
hash ref, and the preliminary conclusion is that it doesn't make any
difference.
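
For concreteness, my reading of that warning is the difference
between these two forms (a sketch only, reusing the top_level key
from the code below; the second form fetches the reference once
instead of looking it up on every call):

    # lookup form: %{ $db->{top_level} } is re-evaluated each time around
    while ( my ( $key, $value ) = each %{ $db->{top_level} } ) {
        # ...
    }

    # direct hash reference form: grab the reference once, then iterate it
    my $top = $db->{top_level};
    while ( my ( $key, $value ) = each %$top ) {
        # ...
    }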

I thought about moving to the first_key/next_key form, but this
wouldn't work on in-memory hashes (when I choose not to use the DB),
and I'd have to code up something like this (untested):

my $key;
if ($using_dbm_deep) {
    $key = $db->{top_level}->first_key();
} else {
    $key = each %{ $db->{top_level} };
}
while ( defined $key ) {
    # do stuff with $key
} continue {
    if ($using_dbm_deep) {
        $key = $db->{top_level}->next_key($key);
    } else {
        $key = each %{ $db->{top_level} };
    }
}

Or maybe something like this already happens for me when I use each on
a D::D object?