btree

George Mpouras

unread,

May 11, 2012, 6:45:45 AM5/11/12

to

I ve done some tests and found out
that hash is faster than 'something' ~~ @array
From theory the btree alogorythm is faster than the hash (even more if the
hash has conflicts)
Is there any way to store my values as btree instead of hash in Perl ?

XeCycle

unread,

May 14, 2012, 2:46:05 AM5/14/12

to

Use Berkeley DB, dude.

Never think about such efficiency problems before your data grew
very large.

--
Carl Lei (XeCycle)
Department of Physics, Shanghai Jiao Tong University
OpenPGP public key: 7795E591
Fingerprint: 1FB6 7F1F D45D F681 C845 27F7 8D71 8EC4 7795 E591

Martijn Lievaart

unread,

May 14, 2012, 2:51:14 AM5/14/12

to

Native? No. But there probably is some module that helps on CPAN.

M4

Tim Watts

unread,

May 14, 2012, 3:33:46 AM5/14/12

to

XeCycle wrote:

> "George Mpouras" <nospam.gr...@hotmail.com.nospam> writes:
>
>> I ve done some tests and found out
>> that hash is faster than 'something' ~~ @array
>> From theory the btree alogorythm is faster than the hash (even more if
>> the hash has conflicts)
>> Is there any way to store my values as btree instead of hash in Perl ?
>
> Use Berkeley DB, dude.
>
> Never think about such efficiency problems before your data grew
> very large.
>

Yep - exactly how much data does the OP have? Typically, you can have tens
to hundreds of thousands of keys in a perl hash without much of a problem,
unless:

1) Speed is becoming a problem;

2) You are implementing on constrained hardware (eg embedded).

Also, there are various tree modules available on CPAN, but few are likely
to compare to the internal perl hashing unless they are written in C and
have been specifically designed for speed/efficiency.
--
Tim Watts

Ben Morrow

unread,

May 14, 2012, 4:12:46 AM5/14/12

to

Quoth Martijn Lievaart <m...@rtij.nl.invlalid>:

> On Fri, 11 May 2012 13:45:45 +0300, George Mpouras wrote:
>
> > I ve done some tests and found out that hash is faster than
> > 'something' ~~ @array

That is not remotely surprising.

> > From theory the btree alogorythm is faster than
> > the hash (even more if the hash has conflicts)
> > Is there any way to store my values as btree instead of hash in Perl ?
>
> Native? No. But there probably is some module that helps on CPAN.

DB_File is a core module. If you use the $DB_BTREE mode, and undef
instead of a filename, it will give you a Perl-level hash implemented by
an in-memory btree.

Ben

Peter J. Holzer

unread,

May 14, 2012, 4:40:24 AM5/14/12

to

On 2012-05-14 07:33, Tim Watts <tw+u...@dionic.net> wrote:
> XeCycle wrote:
>> "George Mpouras" <nospam.gr...@hotmail.com.nospam> writes:
>>> Is there any way to store my values as btree instead of hash in Perl ?
>>
>> Use Berkeley DB, dude.
>>
>> Never think about such efficiency problems before your data grew
>> very large.
>>
>
> Yep - exactly how much data does the OP have? Typically, you can have
> tens to hundreds of thousands of keys in a perl hash without much of a
> problem, unless:
>
> 1) Speed is becoming a problem;

As long as the hash fits into memory, each operation (insert, lookup,
delete) is almost certainly faster on a hash than on a btree.

A btree becomes faster when we are dealing with *persistent* data and a
relatively small number of lookups and/or changes:

With a hash, you need to load all the data into memory at program start,
and if you make any changens you have to write it back to disk before
the program terminates.

With a btree, you only have to read/write the blocks you need.

hp

--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | h...@hjp.at |
__/ | http://www.hjp.at/ | -- Bill Code on as...@irtf.org

Martijn Lievaart

unread,

May 14, 2012, 4:46:48 AM5/14/12

to

On Mon, 14 May 2012 09:12:46 +0100, Ben Morrow wrote:

> DB_File is a core module. If you use the $DB_BTREE mode, and undef
> instead of a filename, it will give you a Perl-level hash implemented by
> an in-memory btree.

Nice! Thanks. Added to the toolbox.

M4

Rainer Weikusat

unread,

May 14, 2012, 8:57:57 AM5/14/12

to

"George Mpouras" <nospam.gr...@hotmail.com.nospam> writes:

[...]

> From theory the btree alogorythm is faster than the hash (even more if the
> hash has conflicts)

As I already showed in another posting: This is completely
wrong. First, if there are now 'conflicts' in the hash table, hash
lookup is O(1) -- you calculate the hash value, select the proper
table slot and do at most a single key comparison. That's better than
anything which can be achieved with any tree structure. Second, if
there are conflicts, the maximum amount of key comparisons necessary
for a search (successful or unsuccessful) depends on the average
length of the list which needs to be searched. If these lists are
short compared to the height of some search tree used for the same set
of elements, hashed lookups will still be faster. And 'short-ness' of
the lists can be ensured by using a suitably large table.

Rainer Weikusat

unread,

May 14, 2012, 9:01:23 AM5/14/12

to

XeCycle <XeC...@Gmail.com> writes:
> "George Mpouras" <nospam.gr...@hotmail.com.nospam> writes:
>
>> I ve done some tests and found out
>> that hash is faster than 'something' ~~ @array
>> From theory the btree alogorythm is faster than the hash (even more if the
>> hash has conflicts)
>> Is there any way to store my values as btree instead of hash in Perl ?
>
> Use Berkeley DB, dude.
>
> Never think about such efficiency problems before your data grew
> very large.

The absolutely last thing you want in practice is working code which
needs a fundamental change until "yesterday" because the core data
structure used by it turns out to be unsuitable for a real-world
problem it is supposed to deal with.

XeCycle

unread,

May 14, 2012, 9:41:31 AM5/14/12

to

Rainer Weikusat <rwei...@mssgmbh.com> writes:

[...]

>> Never think about such efficiency problems before your data grew
>> very large.
>
> The absolutely last thing you want in practice is working code which
> needs a fundamental change until "yesterday" because the core data
> structure used by it turns out to be unsuitable for a real-world
> problem it is supposed to deal with.

Another way is to use a compatible interface for the new data
backend. Berkeley DB can be compatible with Perl arrays and
hashes.

Of course you may not be able to use the new features in this
way, but when things get really large, the original algorithm may
not suffice. Therefore a refactor is needed, in case you're
scaling it too large. A moderate scale up may be able to go
without those features provided by a more capable data backend.

Willem

unread,

May 14, 2012, 11:52:31 AM5/14/12

to

XeCycle wrote:
) Rainer Weikusat <rwei...@mssgmbh.com> writes:
)> The absolutely last thing you want in practice is working code which
)> needs a fundamental change until "yesterday" because the core data
)> structure used by it turns out to be unsuitable for a real-world
)> problem it is supposed to deal with.
)
) Another way is to use a compatible interface for the new data
) backend. Berkeley DB can be compatible with Perl arrays and
) hashes.
)
) Of course you may not be able to use the new features in this
) way, but when things get really large, the original algorithm may
) not suffice. Therefore a refactor is needed, in case you're
) scaling it too large. A moderate scale up may be able to go
) without those features provided by a more capable data backend.

And then it turns out that the API to the backend is what causes
it to scale badly and you're screwed anyway.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Shmuel Metz

unread,

May 14, 2012, 1:40:34 PM5/14/12

to

In <joq8vp$gor$1...@news.ntua.gr>, on 05/11/2012
at 01:45 PM, "George Mpouras"

<nospam.gr...@hotmail.com.nospam> said:

>I ve done some tests and found out
>that hash is faster than 'something' ~~ @array

That's what I would expect if hash is implimented as a hash table.

>From theory the btree alogorythm is faster than the hash

What theory?

>Is there any way to store my values as btree instead of hash in
Perl ?

Yes, but it would slow you down.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spam...@library.lspace.org

Ted Zlatanov

unread,

May 16, 2012, 11:30:34 AM5/16/12

to

On Mon, 14 May 2012 14:46:05 +0800 XeCycle <XeC...@Gmail.com> wrote:

X> Use Berkeley DB, dude.

cfengine (a configuration management tool I use) switched from Berkeley
DB to Tokyo Cabinet recently, and I had a chance to look at TC. It's a
good C kv database with good performance and has a Perl API. It
compares well to Berkeley DB, especially in capacity and bug density,
so maybe it's useful to others too. Take a look at their site,
http://fallabs.com/tokyocabinet/

They also offer Kyoto Cabinet, a C++ successor to TC, but my experience
with C++ portability and readability has been less pleasant...

Ted