Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Equality vs. equivalence in hash_map

3 views
Skip to first unread message

Martin Vuille

unread,
Aug 26, 2001, 10:30:38 AM8/26/01
to

In a recent post, someone asked for assistance with a hash_map.

While perusing the poster's code and the SGI docs for hash_map,
I noticed that hash_map seems to use key comparisons based on
equality rather than equivalence.

Is my understanding correct? If so, why is a different paradigm
used for hashed containers than for other associative containers?
Wouldn't this tend to be confusing and/or error-prone?

If my understanding is incorrect, why is equal_to<T> the
default predicate instead of less<T>?

MV


Martin Vuille | Real-time and | Software & Firmware Contracting
JPMV Realtime Inc. | embedded | Turnkey Controls Development
Tel.:(613)823-0777 | control | System Integration
Fax.:(613)823-3034 | specialist | Product Development Consulting


[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

Carl Daniel

unread,
Aug 26, 2001, 2:53:15 PM8/26/01
to

"Martin Vuille" <jpm...@yahoo.com> wrote in message
news:99882857...@news.aei.ca...

>
> In a recent post, someone asked for assistance with a hash_map.
>
> While perusing the poster's code and the SGI docs for hash_map,
> I noticed that hash_map seems to use key comparisons based on
> equality rather than equivalence.
>
> Is my understanding correct? If so, why is a different paradigm
> used for hashed containers than for other associative containers?
> Wouldn't this tend to be confusing and/or error-prone?
>
> If my understanding is incorrect, why is equal_to<T> the
> default predicate instead of less<T>?
>

Because hash_map doesn't require it's keys to be ordered (or even Strict
Weak Ordered).
-cd

James Rogers

unread,
Aug 26, 2001, 3:53:33 PM8/26/01
to
Martin Vuille wrote:
>
> If my understanding is incorrect, why is equal_to<T> the
> default predicate instead of less<T>?
>

In a hash_map the keys must be unique. This means that equality
must be determined.

Jim Rogers
Colorado Springs, Colorado USA

Pete Becker

unread,
Aug 26, 2001, 3:53:51 PM8/26/01
to
Martin Vuille wrote:
>
> Is my understanding correct? If so, why is a different paradigm
> used for hashed containers than for other associative containers?
> Wouldn't this tend to be confusing and/or error-prone?
>

You'll have to ask the folks at SGI. hash_map isn't in the standard
library.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Pete Becker

unread,
Aug 26, 2001, 5:29:56 PM8/26/01
to
James Rogers wrote:
>
> Martin Vuille wrote:
> >
> > If my understanding is incorrect, why is equal_to<T> the
> > default predicate instead of less<T>?
> >
>
> In a hash_map the keys must be unique. This means that equality
> must be determined.
>

With a < operator that defines a strict weak ordering two items are
equal if !(a<b) && !(b<a). So a data structure that relies on equality
can be implemented using only <. Our implementation of hash_map uses a
less-tha predicate and doesn't require an equality operator, so it works
with the same data types as the rest of STL.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Howard Hinnant

unread,
Aug 26, 2001, 5:38:31 PM8/26/01
to
In article <99882857...@news.aei.ca>, Martin Vuille
<jpm...@yahoo.com> wrote:

| In a recent post, someone asked for assistance with a hash_map.
|
| While perusing the poster's code and the SGI docs for hash_map,
| I noticed that hash_map seems to use key comparisons based on
| equality rather than equivalence.
|
| Is my understanding correct? If so, why is a different paradigm
| used for hashed containers than for other associative containers?
| Wouldn't this tend to be confusing and/or error-prone?
|
| If my understanding is incorrect, why is equal_to<T> the
| default predicate instead of less<T>?

This is an interesting question and probably has a direct bearing on a
future C++ standard. There are advantages and disadvantages to either
choice.

As you point out, choosing equal_to<T> after we have all gotten so used
to less<T> for [multi]map/set might be confusing. With less<T>, the
hash containers are more likely to be a "drop in" replacement for the
tree containers.

Another advantage to using less<T> is that a given bucket will only be
searched half way before discovering that a search key won't be found.
Using equal_to<T> an entire bucket has to be searched to discover that
a key doesn't exist in the container.

However choosing equal_to<T> has some significant advantages. A hash
container is logically unordered. Even if the container uses less<T>,
iterating from begin() to end() won't give you elements ordered by
less<T>.

There may be some keys where equal_to<T> has a logical meaning, but
less<T> doesn't (e.g. complex<T>). You can't create a tree container
of complex<T> without hacking up an arbitrary less<T> that will likely
only have meaning for the sake of putting it into this container.

(I believe) a very important class of keys for hash containers is
std::string. less<string> is O(min(sizeof(s1), sizeof(s2))). However,
equal_to<string> is often much cheaper than this. If the two strings
being compared have different lengths, eqaul_to<string> is O(1). So a
hash<string> is likely to be faster if it is using equal_to<string>
rather than less<string>. (Caveat: this complexity on
equal_to<string> is not mandated by the standard.)

About the only reason one turns to hash containers is the lure of
performance (otherwise just use the tree containers as they have a
guaranteed log(N) complexity that the hash containers don't). But if
you're willing to take a little risk, and a lot of testing, hash
container look up can come down to O(1) (worst case is O(N) - much
worse than the tree containers).

So with performance in mind, I chose equal_to for the Metrowerks hash
containers. The argument that less<T> only has to search half of the
entries in a bucket didn't hold water for me, because if you are
concerned about performance you are going to be aiming for 1, maybe 2
items per bucket. Note this is not meant as a slight towards the
less<T> choice. But I had to make a choice as I was writing the
Metrowerks hash containers, and I felt the advantages leaned towards
equal_to<T>.

I can't speak for SGI, but I've become convinced that they made the
better choice. One can often still use the (equal_to<T>) hash
containers as a drop in replacement for the tree based containers. All
that is required is to default the comparison function, and have
equal_to<T> and less<T> have consistent semantics (as string does).

--
Howard Hinnant
Metrowerks

Stephen Howe

unread,
Aug 26, 2001, 7:27:44 PM8/26/01
to
> With a < operator that defines a strict weak ordering two items are
> equal if !(a<b) && !(b<a). So a data structure that relies on equality
> can be implemented using only <. Our implementation of hash_map uses a
> less-tha predicate and doesn't require an equality operator, so it works
> with the same data types as the rest of STL.

Which must mean that it is ordered surely?

Stephen Howe

Pete Becker

unread,
Aug 26, 2001, 8:33:12 PM8/26/01
to
Howard Hinnant wrote:
>
> (I believe) a very important class of keys for hash containers is
> std::string. less<string> is O(min(sizeof(s1), sizeof(s2))).

Which says only that less<string> isn't a good choice for ordering
string elements with equal hash codes. But there's a simple alternative:
we can define a string as less than another if the first string is
shorter than the second or if the two are of equal length and the first
comes before the second lexicographically. In which case it's just as
fast as comparing two strings for equality, and will typically require
searching fewer elements in a bucket to determine whether a match is
present.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Howard Hinnant

unread,
Aug 27, 2001, 4:41:35 AM8/27/01
to
In article <3B8972C8...@acm.org>, Pete Becker
<peteb...@acm.org> wrote:

| Howard Hinnant wrote:
| >
| > (I believe) a very important class of keys for hash containers is
| > std::string. less<string> is O(min(sizeof(s1), sizeof(s2))).
|
| Which says only that less<string> isn't a good choice for ordering
| string elements with equal hash codes. But there's a simple alternative:
| we can define a string as less than another if the first string is
| shorter than the second or if the two are of equal length and the first
| comes before the second lexicographically. In which case it's just as
| fast as comparing two strings for equality, and will typically require
| searching fewer elements in a bucket to determine whether a match is
| present.

Right of course. Unfortunately this bites off one of less<T>'s
advantages: that of being a drop in replacement:

#include <map>
#include <hash_map>
#include <string>

int
main()
{
using decl whatever to ignore namespace for this demo;
map<string, int> tmap;
hash_map<string, int> hmap;
}

If hash_map is based on equal_to<string> this has the efficiency we're
both aiming for (at least for Metrowerks::hash_map, I can't vouch for
other implementations). But if hash_map is based on less<string>, then
to achive our desired goal we need:

#include <map>
#include <hash_map>
#include <string>

struct hash_less_string
{
bool operator ()(std::string const& x, std::string const& y) const
{
if (x.size() != y.size())
return x.size() < y.size();
return x < y;
}
};

int
main()
{
using namespace std;
using namespace Metrowerks;
map<string, int> tmap;
hash_map<string, int, hash<string>, hash_less_string> hmap;
}

Again, this is using the Metrowerks interface of template parameters.
>From a '98 paper (CUJ) on the Dinkumware library, looks like this
should look something more like:

#include <map>
#include <hash_map>
#include <string>

struct string_compare
{
enum {bucket_size = 4, min_buckets = 8};

size_t operator() (const string& key) const
{
your favorite string hashing function goes here
}

bool operator ()(std::string const& x, std::string const& y) const
{
if (x.size() != y.size())
return x.size() < y.size();
return x < y;
}
};

int
main()
{
using decl whatever to ignore namespace for this demo;
map<string, int> tmap;
hash_map<string, int, string_compare> hmap;
}

| Which says only that less<string> isn't a good choice for ordering
| string elements with equal hash codes.

Agreed.

| But there's a simple alternative:

Looks to me like the simplest alternative is to base the hash container
on equal_to<string>.

--
Howard Hinnant
Metrowerks

Pete Becker

unread,
Aug 27, 2001, 8:52:50 AM8/27/01
to
Stephen Howe wrote:
>
> > With a < operator that defines a strict weak ordering two items are
> > equal if !(a<b) && !(b<a). So a data structure that relies on equality
> > can be implemented using only <. Our implementation of hash_map uses a
> > less-tha predicate and doesn't require an equality operator, so it works
> > with the same data types as the rest of STL.
>
> Which must mean that it is ordered surely?
>

Not quite. Entries in the same bin (i.e. same value for hash-code %
num-bins) are ordered. Overall, though, it's not ordered. Well, not in
any reasonable way. <g>

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Pete Becker

unread,
Aug 27, 2001, 10:38:14 AM8/27/01
to
Howard Hinnant wrote:
>
> Right of course. Unfortunately this bites off one of less<T>'s
> advantages: that of being a drop in replacement:

Which is why the comparison should be encapsulated in a traits class,
specialized for string, which makes it transparent to users.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

P.J. Plauger

unread,
Aug 27, 2001, 12:21:06 PM8/27/01
to
"Howard Hinnant" <hin...@antispam.metrowerks.com> wrote in message
news:260820011727280588%hin...@antispam.metrowerks.com...

> I can't speak for SGI, but I've become convinced that they made the
> better choice. One can often still use the (equal_to<T>) hash
> containers as a drop in replacement for the tree based containers. All
> that is required is to default the comparison function, and have
> equal_to<T> and less<T> have consistent semantics (as string does).

Our latest version of the hash template classes aims for the best of both
worlds. Turns out it can hash properly given either a strict weak ordering,
as in operator less<T>, or an inequality comparison, as in operator
not_equal<T>. So we add a partial specialization that looks for SGI-style
template parameters and invert the sense of the supplied predicate. The
upshot is that you can use our hash tables as a drop-in replacement for
map/set, using a strict weak ordering, or as a drop-in replacement for
SGI-style hash_map/set, using an (in)equality comparison.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

0 new messages