Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Generic compare function

424 views
Skip to first unread message

Adam Badura

unread,
Jan 7, 2010, 4:32:44 AM1/7/10
to
Why C++ does not contain a generic compare function? Such function
would return a negative value if left < right, zero if left == right
and a positive value if left > right. It could be either overloaded
for user types or like swap use some template magic.

Such function would be useful in cases when objects are well
ordered and a full order info can be as easily obtained as partial
info. Consider an example of sorting string vectors. For each matching
elements we determine whether left element (string) is smaller then
right element (string). Then we must check separately for equality.
This results in the need to compare those string twice while a simple
"compare" function would give us full information at the cost of one
comparison.

Adam Badura

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Goran

unread,
Jan 7, 2010, 1:09:17 PM1/7/10
to
On Jan 7, 10:32 am, Adam Badura <abad...@o2.pl> wrote:
> Why C++ does not contain a generic compare function? Such function
> would return a negative value if left < right, zero if left == right
> and a positive value if left > right. It could be either overloaded
> for user types or like swap use some template magic.
>
> Such function would be useful in cases when objects are well
> ordered and a full order info can be as easily obtained as partial
> info. Consider an example of sorting string vectors. For each matching
> elements we determine whether left element (string) is smaller then
> right element (string). Then we must check separately for equality.
> This results in the need to compare those string twice while a simple
> "compare" function would give us full information at the cost of one
> comparison.

{ edits: quoted sig and banner removed. please remove excess quoted material
before posting. -mod }

I'd wager the answer is simple: your generic "compare" function would
simply be implemented in terms of op== and op< (or >), at which point,
you'd have one more (trivial) function name for small-ish number of
use cases.

For strings and raw memory, there are low-level (C) functions already
(seems to be your case). But they don't work well in face of +/-
complex types that majority of C++ code is supposed to have.

BTW, it looks like you're thinking in terms of qsort, which I'd
consider strange for idiomatic C++ code. With C++, if you want
ordering, just use set/multiset/map/multimap and/or populate your
vectors using lower/upper_bound. I can't see why do you need a
comparison function then.

Goran.

Francis Glassborow

unread,
Jan 7, 2010, 1:06:31 PM1/7/10
to
Adam Badura wrote:
> Why C++ does not contain a generic compare function? Such function
> would return a negative value if left < right, zero if left == right
> and a positive value if left > right. It could be either overloaded
> for user types or like swap use some template magic.
>
> Such function would be useful in cases when objects are well
> ordered and a full order info can be as easily obtained as partial
> info. Consider an example of sorting string vectors. For each matching
> elements we determine whether left element (string) is smaller then
> right element (string). Then we must check separately for equality.
> This results in the need to compare those string twice while a simple
> "compare" function would give us full information at the cost of one
> comparison.
>
> Adam Badura
>

I wonder if you have thought this through. How would you actually use
such a function? How would it be implemented more efficiently than we
can do for ourselves? How would you determine which ordering was to be
used (many types have more then one logical ordering, and often include
equivalence sets)? Even std::string has multiple orderings depending
on the collation rules. How would you deal with types that have no
ordering? What about types that have equivalence sets but no ordering?

It may seem a very simple thing to do but in fact it is very complicated
and far from trivial to provide generically.

Tony Delroy

unread,
Jan 7, 2010, 1:09:19 PM1/7/10
to
On Jan 7, 6:32 pm, Adam Badura <abad...@o2.pl> wrote:
> Why C++ does not contain a generic compare function? Such function
> would return a negative value if left < right, zero if left == right
> and a positive value if left > right. It could be either overloaded
> for user types or like swap use some template magic.
>
> Such function would be useful in cases when objects are well
> ordered and a full order info can be as easily obtained as partial
> info. Consider an example of sorting string vectors. For each matching
> elements we determine whether left element (string) is smaller then
> right element (string). Then we must check separately for equality.
> This results in the need to compare those string twice while a simple
> "compare" function would give us full information at the cost of one
> comparison.

Yes, it would be nice, and potentially efficient. Something like "x
<=> y" (might conflict with intended use in axioms in a later C++
Standard). Throw in x <=> y ?< a ?= b : c. Similarly, GNU had a nice
notation for "a ? a : b" - another annoying case to handle - but it
just didn't catch on and eventually was deprecated. Guess some things
are too tricky for their own good....

Cheers,
Tony

red floyd

unread,
Jan 7, 2010, 1:12:45 PM1/7/10
to
On Jan 7, 1:32 am, Adam Badura <abad...@o2.pl> wrote:
> Why C++ does not contain a generic compare function? Such function
> would return a negative value if left < right, zero if left == right
> and a positive value if left > right. It could be either overloaded
> for user types or like swap use some template magic.
>
> Such function would be useful in cases when objects are well
> ordered and a full order info can be as easily obtained as partial
> info. Consider an example of sorting string vectors. For each matching
> elements we determine whether left element (string) is smaller then
> right element (string). Then we must check separately for equality.
> This results in the need to compare those string twice while a simple
> "compare" function would give us full information at the cost of one
> comparison.

#include <functional>
template <typename T>
int compare<T>(const T& left, const T& right)
{
if (std::less(left, right))
return -1;
else if (std::less(right, left))
return 1;
else
return 0;

Adam Badura

unread,
Jan 8, 2010, 1:49:36 AM1/8/10
to
> How would you actually use such a function?

Like I am using any other functions or functors.

> How would it be implemented more efficiently than we can do for ourselves?

This can be told just about every thing including entire STL for
example. So I do not consider it an argument.

> How would you determine which ordering was to be
> used (many types have more then one logical ordering, and often include
> equivalence sets)?

Just like it is done with default comparison operators and
functors. I
don't see any problem here.

> Even std::string has multiple orderings depending on the collation rules.

And yet somehow this does not prevent us from comparing
std::string with
operator < for example...

> How would you deal with types that have no ordering?
> What about types that have equivalence sets but no ordering?

Again just like with comparison operators and functors. There
would be
no overload/specialization/whatever dedicated for them and any use of
generic compare function for them would result in compilation error.

> It may seem a very simple thing to do but in fact it is very complicated
> and far from trivial to provide generically.

I don't think so. After all "red floyd" posted and example
implementation.

And that is just all. A good generic fallback. But types like
std::string might use a smarter code to prevent suboptimal double
comparison. Compiler might even use extra machine code to provide
faster
versions for arithmetic types for example. But the bright side is that
user
of "compare" no longer cares. Using "compare" you ask what you want
directly: "what is the order relation of those to objects from
totally
ordered space?". And asking questions directly is a great thing
because it
makes it easier to give decent replay as fast as possible.

Adam Badura

Adam Badura

unread,
Jan 8, 2010, 1:52:52 AM1/8/10
to
> How would you actually use such a function?

Like I am using any other functions or functors.

> How would it be implemented more efficiently than we can do for ourselves?

This can be told just about every thing including entire STL for


example. So I do not consider it an argument.

> How would you determine which ordering was to be


> used (many types have more then one logical ordering, and often include
> equivalence sets)?

Just like it is done with default comparison operators and


functors. I
don't see any problem here.

> Even std::string has multiple orderings depending on the collation rules.

And yet somehow this does not prevent us from comparing


std::string with
operator < for example...

> How would you deal with types that have no ordering?


> What about types that have equivalence sets but no ordering?

Again just like with comparison operators and functors. There


would be
no overload/specialization/whatever dedicated for them and any use of
generic compare function for them would result in compilation error.

> It may seem a very simple thing to do but in fact it is very complicated


> and far from trivial to provide generically.

I don't think so. After all "red floyd" posted and example
implementation.

And that is just all. A good generic fallback. But types like
std::string might use a smarter code to prevent suboptimal double
comparison. Compiler might even use extra machine code to provide
faster
versions for arithmetic types for example. But the bright side is that
user
of "compare" no longer cares. Using "compare" you ask what you want
directly: "what is the order relation of those to objects from
totally
ordered space?". And asking questions directly is a great thing
because it
makes it easier to give decent replay as fast as possible.

Adam Badura

--

Adam Badura

unread,
Jan 8, 2010, 1:52:46 AM1/8/10
to
> I'd wager the answer is simple: your generic "compare" function would
> simply be implemented in terms of op== and op< (or >), at which point,
> you'd have one more (trivial) function name for small-ish number of
> use cases.

Yes. A generic fallback might be implemented so. Just like the
implementation provided by "red floyd".
But types like std::string, build-in types and even containers
might
provide a more efficient implementations.
In the end you might as well say it about std::swap which after
all is
just a simple copy construction and assignment.

> For strings and raw memory, there are low-level (C) functions already

Yes. But they cannot be used in generic code as they do not work
with
any type.

> (seems to be your case). But they don't work well in face of +/-
> complex types that majority of C++ code is supposed to have.

Yet somehow swap works well...

> BTW, it looks like you're thinking in terms of qsort, which I'd
> consider strange for idiomatic C++ code. With C++, if you want
> ordering, just use set/multiset/map/multimap and/or populate your
> vectors using lower/upper_bound. I can't see why do you need a
> comparison function then.

For example when implementing some algorithms or containers.

Adam Badura

Tony Delroy

unread,
Jan 8, 2010, 1:55:58 AM1/8/10
to
> Adam Badura wrote:
> > Why C++ does not contain a generic compare function? Such function
<snip>
> > [lack] results in the need to compare those string twice while a simple

> > "compare" function would give us full information at the cost of one
> > comparison.

On Jan 8, 3:06 am, Francis Glassborow


<francis.glassbo...@btinternet.com> wrote:
> I wonder if you have thought this through. How would you actually use
> such a function?

As Adam wrote, it's useful when comparing two containers (e.g.
strings), where you scan over some long common part, then find a
distinguishing character. There's a fair chance that you may know
exactly which of <, == and > apply at that point, so it would be nice
to have a good way to return that information. If you only indicate
say <, then a second comparison may waste time skipping the early
common elements before returning to the distinguishing ones....

Many existing "int compare(... a, ... b)" functions only promise to
return a negative, 0 or positive value, as subtracting elements may
produce that in one step more efficiently than guaranteeing -1, 0, or
1. The latter scheme does make it easy to switch on the resultant
value though...

switch (a.compare(b)) // or "a <=> b" or similar if an operator
were available
{
case -1: return "less than";
case 0: return "equal";
case 1: return "greater than";
default: std::cerr << "oops\n"; exit(EXIT_FAILURE);
}

Without a -1, 0, 1 guarantee, something like:
int compare_result = a <=> b;

if (compare_result < 0)
...
else if (compare_result == 0)
...
else // compare_result > 0
...;

In another post, I mentioned the ? : operators could be extended...

return a <=> b ?< "less than" ?= "equals" : "greater than";

I don't for one moment expect anything like this to be added to the
language, but I don't see that the objections you raise undermine this
comparison any more than the existing comparison operators (<, <=,
==, !=, >=, >).

> How would it be implemented more efficiently than we
> can do for ourselves?

It only helps efficiency in that it becomes a standard thing for
compilers to optimise. Consider this program:

int main(int argc, char* argv[])
{
return argc < 7 ? 42 : argc == 7 ? 53 : 64;
}

I picked weird numbers so they'd stand out in x86 assembly...

g++ -S -O3 compare.cc # ver. 4.4.1


cmpl $6, %edi
movl $42, %eax
jle .L3
movb $64, %al
cmpl $7, %edi
movl $53, %edx
cmove %edx, %eax
.L3:
rep
ret

f it helps the compiler a single compare statement,

A compiler might be able to perform the comparison once then use the
CPU flags to branch or load the appropriate return value without
repeating the comparison. Perhaps they do that for simple int/double
comparisons already? I've never studied. It couldn't... I don't see
much point in providing a compare(a, b) function that works for types
where the results would be meaningful (e.g. string, int, double).

Adam Badura

unread,
Jan 8, 2010, 2:32:59 AM1/8/10
to
> #include <functional>
> template <typename T>
> int compare<T>(const T& left, const T& right)
> {
> if (std::less(left, right))
> return -1;
> else if (std::less(right, left))
> return 1;
> else
> return 0;
> }

Yes. Exactly.

But if this function is not in std namespace then:
1) Standard types like string or containers will not provide more
efficient
versions.
2) Standard algorithms/containers will not use it to increase
performance.
3) Interaction of code from different vendors is harder.

Finally note that the same kind of argument might be used against
having std::swap.

Adam Badura

Le Chaud Lapin

unread,
Jan 8, 2010, 2:51:23 AM1/8/10
to
On Jan 7, 12:06 pm, Francis Glassborow

<francis.glassbo...@btinternet.com> wrote:
> Adam Badura wrote:
> > Why C++ does not contain a generic compare function?
[snip]

I agree with Adam. This is the method I use exclusively in all my code
for any situation that requires comparison.

> I wonder if you have thought this through.

I have, quite a bit. :)

I had been using the typical operator < and operator > inside my set-
like containers and many other situations such as comparison in
Microsoft's Namesspace Extension IShellFolder::CompareIDs function...

http://msdn.microsoft.com/en-us/library/bb775062(VS.85).aspx

..., which is implemented by the programmer such that, given two
"things", conceptually, two icons in the shell, it determines the
relative ordering of those things as defined by the programmer, so
that the shell can sort, find, etc. the items (icons).

It turns out that having a compare function of the kind proposed by
Adam which returns -, 0, +, for <, ==, >; respectively; fits nicely
into the model prescribed by Microsoft's API. And there is a good
reason: it is optimal, both mechanically, and conceptually, IMHO.

I call my comparison function "signum()".

This is not the name of one particular function, but a family of
functions. It is also a principle, scattered throughout my code. It is
defined for primitive types as well as friend function of every class
for which comparison makes sense.

> How would you actually use such a function?

Like this:

friend signed int signum (const String &, const String &);

bool operator == (const String &that) const {return signum (*this,
that) == 0;}
bool operator != (const String &that) const {return signum (*this,
that) != 0;}
bool operator > (const String &that) const {return signum (*this,
that) > 0;}
bool operator < (const String &that) const {return signum (*this,
that) < 0;}
bool operator >= (const String &that) const {return signum (*this,
that) >= 0;}
bool operator <= (const String &that) const {return signum (*this,
that) <= 0;}

> How would it be implemented more efficiently than we
> can do for ourselves?

It would eliminate a superfluous comparison in STL set-like
containers, which, in extreme cases, causes execution time to double,
because signum() checks once, and STL checks twice.

You can see in the following STL code from VS2008 that the _Pred()
function is being invoked twice for a comparison:

template<class _Pr, class _Ty1, class _Ty2> inline
bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred, const _Ty1& _Left,
const _Ty2& _Right,
const wchar_t *_Where, unsigned int _Line)
{ // test if _Pred(_Left, _Right) and _Pred is strict weak ordering
if (!_Pred(_Left, _Right))
return (false);
else if (_Pred(_Right, _Left))
_DEBUG_ERROR2("invalid operator<", _Where, _Line);
return (true);
}

With signum(), using my own equivalents of set<>, map<>, etc., the
comparison is invoked only once.

> How would you determine which ordering was to be
> used (many types have more then one logical ordering, and often include
> equivalence sets)? Even std::string has multiple orderings depending
> on the collation rules. How would you deal with types that have no
> ordering? What about types that have equivalence sets but no ordering?

Interesting question. :)

1. A few months ago, I asked a String expert how I should approach the
problem of comparing strings while redesigning my own international
UNICODE String class. My gut said to follow a strict rule whereby two
objects, of any kind, including String, would always be able to
determine their relative ordering, based on their mutual state,
without any external context. There would be no opportunity to supply
a comparison function. But this person had extensive experience with
UNICODE and C++, and a much broader context in general, so I was
practically committed to taking his advice before the conversation
even started.

He said that context-free comparison was a bad idea because I would
create a mess from which I would never be able to extract myself.

I mentioned that I had heavy nested template code that did not take a
comparison function as argument, that the complexity of which would
manifest even more drastically by the addition of a "helper" functions
for needy classes like String, etc.

He said it was what it was, and so I agreed to follow his advice, as I
had no interest in tackling the "String" problem, but hours of post-
conversation fidgeting, just to be sure, became days and days became
weeks.

Today, I am convinced that having a self-contained "fat" string class
whose objects know how to compare themselves without help from an
external helper function is one of the best design decisions that I
have ever made in C++.

> It may seem a very simple thing to do but in fact it is very complicated
> and far from trivial to provide generically.

True, it was not trivial, given all the rules of UNICODE, plus the
performance hit when those rules are actually followed.

And to see the size of the code base of ICU...

http://site.icu-project.org/

...does not instill confidence.

In fact, I intended to use ICU directly en lieu of my own String class
a few months ago. But it had several issues, mostly having to do with
"keeping one foot on the dock, the other on the boat", with regard to
transitioning from C to C++, like no exceptions and a member function
whose job it was to test if its object was in a bad state, so I decide
to write my own, based on the principles of UNICODE collation:

http://unicode.org/reports/tr10/

...using ICU as a crutch.

In the end, I concluded that our fixation with skinny string classes
is over-rated.

If the goal is mental relief and true universal applicability, it is
best to let the String class be what it is: inherently fat.

If the goal is to have a sequence of bytes that can be compared using
a helper function and otherwise fiddled-with, then certainly we
already know how to do that.

-Le Chaud Lapin-

Goran

unread,
Jan 8, 2010, 8:44:05 AM1/8/10
to
On Jan 8, 8:51 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> You can see in the following STL code from VS2008 that the _Pred()
> function is being invoked twice for a comparison:
>
> template<class _Pr, class _Ty1, class _Ty2> inline
> bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred, const _Ty1& _Left,
> const _Ty2& _Right,
> const wchar_t *_Where, unsigned int _Line)
> { // test if _Pred(_Left, _Right) and _Pred is strict weak ordering
> if (!_Pred(_Left, _Right))
> return (false);
> else if (_Pred(_Right, _Left))
> _DEBUG_ERROR2("invalid operator<", _Where, _Line);
> return (true);
> }

Hmmm... I think MS people should check this. That second comparison
should be out of the picture in optimized builds, because it only has
debugging purposes. Quick look at sources tells me that ain't the
case. Implementation quality issue?

Goran.

Seungbeom Kim

unread,
Jan 8, 2010, 8:44:36 AM1/8/10
to
Tony Delroy wrote:
> As Adam wrote, it's useful when comparing two containers (e.g.
> strings), where you scan over some long common part, then find a
> distinguishing character. There's a fair chance that you may know
> exactly which of <, == and > apply at that point, so it would be nice
> to have a good way to return that information. If you only indicate
> say <, then a second comparison may waste time skipping the early
> common elements before returning to the distinguishing ones....

The standard library provides std::less as the entry point for a
general comparison, and uses it as the default in many places.
Your point then, as I understand it, is why is it a two-way comparison
instead of a three-way one, and why does std::sort take a 'less'
function returning bool (true/false) instead of a 'compare' function
returning int (<0/=0/>0) that the good old qsort uses; is this correct?

>
> Many existing "int compare(... a, ... b)" functions only promise to
> return a negative, 0 or positive value, as subtracting elements may
> produce that in one step more efficiently than guaranteeing -1, 0, or
> 1. The latter scheme does make it easy to switch on the resultant
> value though...

A comparison primitive had better be more efficient, IMO. When you
need a 'stricter' version returning one of the three values, you can
always use a separate sign function.

--
Seungbeom Kim

Tony Delroy

unread,
Jan 8, 2010, 8:42:14 AM1/8/10
to
(Sorry... hit "send" by accident on the post to which this replies.
Please ignore everything from "f it helps the compiler a single"
onwards....)

Anyway, I demonstrated an efficiency issue with current C++, in that
sometimes compilers repeat a comparison when the only difference from
an earlier one was a test for == instead of <...

return argc < 7 ? 42 : argc == 7 ? 53 : 64;

g++ -S -O3 compare.cc # ver. 4.4.1

cmpl $6, %edi
movl $42, %eax
jle .L3
movb $64, %al
cmpl $7, %edi
movl $53, %edx
cmove %edx, %eax
.L3:
rep
ret

Here, cmpl $6 is followed by a cmpl $7. Perhaps a builtin less-than/
equal-to-/greater-than comparison would make it easier for compilers
to perform multiple tests on the flags set by a single comparison, but
perhaps it's an overlooked optimisation that could be easily done
already, some other compilation flags would have enabled it, or the
resultant code just happens to be slower on the target CPU here (i7
920 / Ubuntu Karmic Koala).

For ASCIIZ strings, the case is more clear cut, as a common substring
may need to be rescanned for the second comparison.

<francis.glassbo...@btinternet.com> wrote:
> ... many types have more then one logical ordering, and often include


> equivalence sets)? Even std::string has multiple orderings depending
> on the collation rules. How would you deal with types that have no
> ordering? What about types that have equivalence sets but no ordering?
> It may seem a very simple thing to do but in fact it is very complicated
> and far from trivial to provide generically.

That's true, but no more true than for the operators that do enjoy
language support. An overloadable operator allows customisation to
the exact data stored.

Cheers,
Tony

Piotr Wyderski

unread,
Jan 8, 2010, 1:39:27 PM1/8/10
to
Francis Glassborow wrote:

> I wonder if you have thought this through. How would you actually use
> such a function?

Yes, it should be a default base for std::less and the rest of the family.
To be more specific, there should be two functions:

std::equal_to<T>
std::compare<T>

The defaut implementation of std::less, greater, greater_equal and
less_equal
should reduce to std::compare. std::equal_to also, if not explicitly
provided,
and not_equal_to should be reduced to equal_to. That is because determining
equality is often much faster than a full three-way comparison (and does not
imply a partial order std::less requires). If one can implement std::less,
then
he is also able to implement std::compare using two less-es. On the
contrary,
std::less, greater etc. always reduce to a single invocation of compare, so
there is no need to restore the lost information.

> It may seem a very simple thing to do but in fact it is very complicated
> and far from trivial to provide generically.

No, I use it in my program and it turns out to be very useful.

Best regards
Piotr Wyderski

Piotr Wyderski

unread,
Jan 8, 2010, 1:40:26 PM1/8/10
to
red floyd wrote:

> #include <functional>
> template <typename T>
> int compare<T>(const T& left, const T& right)
> {
> if (std::less(left, right))
> return -1;
> else if (std::less(right, left))
> return 1;
> else
> return 0;
> }

Two less-es, which is a completely unnecessary overhead. :-)
Instead, one can think of expressing the less family in terms of
compare -- always exactly one invocation. In case of longer
structures a direct 3-way comparison is much faster (one full
scan + the first found non-equal element handling instead of
two full scans).

Best regards
Piotr Wyderski

Adam Badura

unread,
Jan 8, 2010, 1:50:51 PM1/8/10
to
This is a reply which was sent to "Francis Glassborow" and was
accidentally posted also here.

Le Chaud Lapin

unread,
Jan 8, 2010, 8:28:08 PM1/8/10
to
On Jan 8, 7:44 am, Goran <goran.pu...@gmail.com> wrote:
> On Jan 8, 8:51 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> > You can see in the following STL code from VS2008 that the _Pred()
> > function is being invoked twice for a comparison:
>
> > template<class _Pr, class _Ty1, class _Ty2> inline
> > bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred, const _Ty1& _Left,
> > const _Ty2& _Right,
> > const wchar_t *_Where, unsigned int _Line)
> > { // test if _Pred(_Left, _Right) and _Pred is strict weak ordering
> > if (!_Pred(_Left, _Right))
> > return (false);
> > else if (_Pred(_Right, _Left))
> > _DEBUG_ERROR2("invalid operator<", _Where, _Line);
> > return (true);
> > }
>
> Hmmm... I think MS people should check this. That second comparison
> should be out of the picture in optimized builds, because it only has
> debugging purposes. Quick look at sources tells me that ain't the
> case. Implementation quality issue?

I do not understand the standard library very well, but it looks like
this is more a deliberate design issue than implementation issue:

http://www.sgi.com/tech/stl/less.html

Apparently, the idea is that, if A < B, then logically, B must be
greater than A, so the designers of standard library help the
programmer by requiring that s/he only supply operator <, and the
library will take care of the rest.

As Adam pointed out, this "taking care of the rest" is the crux of the
issue.

By prescribing that comparisons must rely on operator <, and not (-,
0, +), the double-comparison for testing order becomes inevitable.

On a semi-related note regarding a fat string class to conform to the
model Adam proposes, whereby comparison of two strings is done using
only state entirely contained within either string:

int s = signum(s1, s2); // s becomes (+, 0, -)

...I realized this morning that all my fretting to keep such a String
class "skinny" compared to std::string might be unwarranted. I figured
that, while my String was guaranteed to be fat, std::string might not
be so skinny itself.

So I checked:

int main ()
{
cout << "sizeof(String) == " << sizeof(String) << endl;
cout << "sizeof(std::string) == " << sizeof(std::string) << endl;
}

Output Debug Mode, Visual Studio 2008:

sizeof(String) == 28
sizeof(std::string) == 32

Output Release Mode, Visual Studio 2008:

sizeof(String) == 28
sizeof(std::string) == 28

Of course, there will be trade-offs. I imagine my String will be
slower in some situations and might not provide some desirable
feautures of std::string, but at least this shows thave a sizeof
(A_String_Class) >, say, 12, is not so bad.

-Le Chaud Lapin-

Adam Badura

unread,
Jan 8, 2010, 8:26:17 PM1/8/10
to
> The standard library provides std::less as the entry point for a
> general comparison, and uses it as the default in many places.
> Your point then, as I understand it, is why is it a two-way comparison
> instead of a three-way one, and why does std::sort take a 'less'
> function returning bool (true/false) instead of a 'compare' function
> returning int (<0/=0/>0) that the good old qsort uses; is this correct?

I cannot say in name of Tony Delroy but as it goes for me it is
more or less what I meant.
I am wondering why STD does not provide a generalized way to
specify "three-way" comparators just like it does for "two-way" ones.
Types which support such comparison (and this includes almost all if
not all standard types and many user types) would be easier and more
efficient in use with such "three-way" comparator.

> A comparison primitive had better be more efficient, IMO. When you
> need a 'stricter' version returning one of the three values, you can
> always use a separate sign function.

I myself didn't opt to have strict -1, 0, +1 return values but
rather general -, 0, +. But it might be done by yet another layer of
abstraction. Those types which easily support strict return values
would do so and other would provide weaker version and the strict on
would be added by sign function.

Adam Badura

--

Francis Glassborow

unread,
Jan 9, 2010, 8:24:03 PM1/9/10
to

True (in what world would A<B not imply B>A) The next step is to realise
that !(A<B) and !(B<A) can be true. Values for A and B that satisfy that
are members of an equivalence set (note that they might not be identical)

Maxim Yegorushkin

unread,
Jan 10, 2010, 2:36:21 PM1/10/10
to
On 07/01/10 09:32, Adam Badura wrote:
> Why C++ does not contain a generic compare function? Such function
> would return a negative value if left< right, zero if left == right
> and a positive value if left> right. It could be either overloaded
> for user types or like swap use some template magic.

It can be as simple as:

// Returns -1, 0 and 1 for t < u, t == u and t > u respectively
// Overload for your own types as necessary.
template<class T, class U>
int compare(T t, U u)
{
return (t > u) - (t < u);
}

--
Max

Adam Badura

unread,
Jan 11, 2010, 2:24:02 PM1/11/10
to
> It can be as simple as:
>
> // Returns -1, 0 and 1 for t < u, t == u and t > u respectively
> // Overload for your own types as necessary.
> template<class T, class U>
> int compare(T t, U u)
> {
> return (t > u) - (t < u);
> }

It is not about how short that function could be.
And beside your function (as the default one) has following
drawbacks:
1) It requires both operator < and operator >. Providing both is not
hard if
one can be provided however STD seems to entirely depend only on < and
==.
2) It will not work with overloaded versions of those operators which
do not
return bool but for example an int which is 0 for false and and
undetermined
positive number for true.
3) I think STD prefers to use std::less (and std::greater in this case
as
well) rather then row operators. But I am not sure here.

Adam Badura

--

Joe Gottman

unread,
Jan 11, 2010, 2:32:35 PM1/11/10
to
Maxim Yegorushkin wrote:
> On 07/01/10 09:32, Adam Badura wrote:
>> Why C++ does not contain a generic compare function? Such function
>> would return a negative value if left< right, zero if left == right
>> and a positive value if left> right. It could be either overloaded
>> for user types or like swap use some template magic.
>
> It can be as simple as:
>
> // Returns -1, 0 and 1 for t < u, t == u and t > u respectively
> // Overload for your own types as necessary.
> template<class T, class U>
> int compare(T t, U u)
> {
> return (t > u) - (t < u);
> }
>

The problem with this is that operator - is not defined for bool.
Instead, why not just define this as

return (t < u) ? -1 : ((u < t) ? 1 : 0);


Joe Gottman

--

Tony Delroy

unread,
Jan 11, 2010, 5:07:52 PM1/11/10
to
On Jan 11, 4:36 am, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:

> On 07/01/10 09:32, Adam Badura wrote:
>
> > Why C++ does not contain a generic compare function? Such function
> > would return a negative value if left< right, zero if left == right
> > and a positive value if left> right. It could be either overloaded
> > for user types or like swap use some template magic.
>
> It can be as simple as:
>
> // Returns -1, 0 and 1 for t < u, t == u and t > u respectively
> // Overload for your own types as necessary.
> template<class T, class U>
> int compare(T t, U u)
> {
> return (t > u) - (t < u);
>
> }

That is simple as in concise, but not so simple in its use of the
integral values associated with booleans. It's also throwing
efficiency out the door, as it performs two comparisons even in
situations when only the first is necessary. Finally, it requires
operator> and operator< be defined for T and U, more of a burden than
using < twice. The following address both issues (and in one less
character ;-P)...

return t < u ? -1 : u < t;

As per the thread, that still risks two from-scratch comparisons,
which might involve redundant work.

Cheers,
Tony


--

Francis Glassborow

unread,
Jan 11, 2010, 5:03:31 PM1/11/10
to
Adam Badura wrote:
>> It can be as simple as:
>>
>> // Returns -1, 0 and 1 for t < u, t == u and t > u respectively
>> // Overload for your own types as necessary.
>> template<class T, class U>
>> int compare(T t, U u)
>> {
>> return (t > u) - (t < u);
>> }
>
> It is not about how short that function could be.
> And beside your function (as the default one) has following
> drawbacks:
> 1) It requires both operator < and operator >. Providing both is not
> hard if
> one can be provided however STD seems to entirely depend only on < and
> ==.
Well replace the return statement with

return (u < t) - (t < u);

Maxim Yegorushkin

unread,
Jan 11, 2010, 7:15:10 PM1/11/10
to
On 11/01/10 19:32, Joe Gottman wrote:
> Maxim Yegorushkin wrote:
>> On 07/01/10 09:32, Adam Badura wrote:
>>> Why C++ does not contain a generic compare function? Such function
>>> would return a negative value if left< right, zero if left == right
>>> and a positive value if left> right. It could be either overloaded
>>> for user types or like swap use some template magic.
>>
>> It can be as simple as:
>>
>> // Returns -1, 0 and 1 for t < u, t == u and t > u respectively
>> // Overload for your own types as necessary.
>> template<class T, class U>
>> int compare(T t, U u)
>> {
>> return (t > u) - (t < u);
>> }
>>
>
> The problem with this is that operator - is not defined for bool.

This is why these two bools get promoted to int before doing the
substruction.

--
Max

Maxim Yegorushkin

unread,
Jan 11, 2010, 7:16:04 PM1/11/10
to
On 11/01/10 19:24, Adam Badura wrote:
>> It can be as simple as:
>>
>> // Returns -1, 0 and 1 for t< u, t == u and t> u respectively
>> // Overload for your own types as necessary.
>> template<class T, class U>
>> int compare(T t, U u)
>> {
>> return (t> u) - (t< u);
>> }
>
> It is not about how short that function could be.
> And beside your function (as the default one) has following
> drawbacks:
> 1) It requires both operator< and operator>. Providing both is not
> hard if
> one can be provided however STD seems to entirely depend only on< and
> ==.

Good point. It could be then: return (u < t) - (t < u).

> 2) It will not work with overloaded versions of those operators which
> do not
> return bool but for example an int which is 0 for false and and
> undetermined
> positive number for true.

True. Too bad for those eccentric operators. :)

> 3) I think STD prefers to use std::less (and std::greater in this case
> as
> well) rather then row operators. But I am not sure here.

The above works well for built-in arithmetic types. For user-defined
types it is assumed that the user wants to provide a more efficient
implementation that avoids doing two less-than comparisons in the first
place. Something like:

struct X
{
std::string a;
bool b;
int c;
double d;

int compare(X const& other) const
{
// assuming compare(std::string, std::string) overload
if(int r = compare(a, other.a))
return r;
if(int r = compare(b, other.b))
return r;
if(int r = compare(c, other.c))
return r;
return compare(d, other.d);
}
};

// overload the free-standing compare
inline int compare(X const& a, X const& b) { return a.compare(b); }

--
Max

Seungbeom Kim

unread,
Jan 11, 2010, 7:14:54 PM1/11/10
to
Maxim Yegorushkin wrote:
> On 07/01/10 09:32, Adam Badura wrote:
>> Why C++ does not contain a generic compare function? Such function
>> would return a negative value if left< right, zero if left == right
>> and a positive value if left> right. It could be either overloaded
>> for user types or like swap use some template magic.
>
> It can be as simple as:
>
> // Returns -1, 0 and 1 for t < u, t == u and t > u respectively
> // Overload for your own types as necessary.
> template<class T, class U>
> int compare(T t, U u)
> {
> return (t > u) - (t < u);
> }

This version performs the comparison twice, which defeats the purpose
of having one compare function in the first place. Suppose t and u are
std::string objects several hundred or thousand characters long that
differ only at the end...

--
Seungbeom Kim

Maxim Yegorushkin

unread,
Jan 12, 2010, 6:58:04 PM1/12/10
to
On 12/01/10 00:14, Seungbeom Kim wrote:
> Maxim Yegorushkin wrote:
>> On 07/01/10 09:32, Adam Badura wrote:
>>> Why C++ does not contain a generic compare function? Such function
>>> would return a negative value if left< right, zero if left == right
>>> and a positive value if left> right. It could be either overloaded
>>> for user types or like swap use some template magic.
>>
>> It can be as simple as:
>>
>> // Returns -1, 0 and 1 for t < u, t == u and t > u respectively
>> // Overload for your own types as necessary.
>> template<class T, class U>
>> int compare(T t, U u)
>> {
>> return (t > u) - (t < u);
>> }
>
> This version performs the comparison twice, which defeats the purpose
> of having one compare function in the first place. Suppose t and u are
> std::string objects several hundred or thousand characters long that
> differ only at the end...

This logic could apply to std::swap() if the was not an overload:

void std::swap(std::string& a, std::string& b)
{
a.swap(b);
}

Whose only purpose is optimization. Following std::swap() optimization
pattern in a real system there should be an overload:

int compare(std::string& a, std::string& b)
{
return a.compare(b);
}

Interesting enough, using compare() (or compare<> functor) instead of
std::less<> in std::set/map<> would give it a speed boost for operations
like find() because to determine whether elements are equal it calls
std::less<> twice . That results in two strncmp() calls while returning
from find() when keys are strings, whereas using compare that would
require only one strncmp() call.

--
Max

Seungbeom Kim

unread,
Jan 13, 2010, 7:36:44 AM1/13/10
to
Maxim Yegorushkin wrote:
> On 12/01/10 00:14, Seungbeom Kim wrote:
>> Maxim Yegorushkin wrote:
>>
>> This version performs the comparison twice, which defeats the purpose
>> of having one compare function in the first place. Suppose t and u are
>> std::string objects several hundred or thousand characters long that
>> differ only at the end...
>
> This logic could apply to std::swap() if the was not an overload:
>
> void std::swap(std::string& a, std::string& b)
> {
> a.swap(b);
> }
>
> Whose only purpose is optimization. Following std::swap() optimization
> pattern in a real system there should be an overload:
>
> int compare(std::string& a, std::string& b)
> {
> return a.compare(b);
> }

The question is, what do we get by choosing "less", instead of "compare",
as /the/ primitive and requiring overloads for optimization?

We have seen that "less" could be easily and efficiently provided in
terms of "compare", but not vice versa. With "less" as the primitive,
some algorithms may suffer the performance penalty mentioned below,
or an overload of "compare" has to be provided separately from "less"
by the class designer. With "compare" as the primitive, just one
"compare" function is all that's needed from the class designer,
and "less" or "greater" or "equal_to" can all be implemented easily
and efficiently in terms of "compare".

Of course, thing could have been a lot easier with an operator like
"<=>", which would have encouraged the designers of the standard library
to use this as the primitive for general comparison, and allowed class
designers to provide overloads of the operator.

>
> Interesting enough, using compare() (or compare<> functor) instead of
> std::less<> in std::set/map<> would give it a speed boost for operations
> like find() because to determine whether elements are equal it calls
> std::less<> twice . That results in two strncmp() calls while returning
> from find() when keys are strings, whereas using compare that would
> require only one strncmp() call.

Right, and that is the motivation for "compare" (as I understand it).

--
Seungbeom Kim

Maxim Yegorushkin

unread,
Jan 13, 2010, 5:56:54 PM1/13/10
to

Yep, <=> is compare, in terms of which other relational operators can be
implemented easily/generically.

--
Max

Jesse Perla

unread,
Jan 13, 2010, 5:56:53 PM1/13/10
to
On Jan 13, 7:36 am, Seungbeom Kim <musip...@bawi.org> wrote:

> Maxim Yegorushkin wrote:
> The question is, what do we get by choosing "less", instead of "compare",
> as /the/ primitive and requiring overloads for optimization?

I think the answer is primarily that the std is trying to have
mathematical foundations in order theory. Everything in order theory
is about binary predicates. Take set Y, the ordering is a mapping Y x
Y: -> {0,1}

For an overview of the foundations behind this, the creator of the
standard library recently wrote: http://www.elementsofprogramming.com/


--

Seungbeom Kim

unread,
Jan 13, 2010, 9:22:37 PM1/13/10
to
Jesse Perla wrote:
> On Jan 13, 7:36 am, Seungbeom Kim <musip...@bawi.org> wrote:
>> Maxim Yegorushkin wrote:
>> The question is, what do we get by choosing "less", instead of "compare",
>> as /the/ primitive and requiring overloads for optimization?
>
> I think the answer is primarily that the std is trying to have
> mathematical foundations in order theory. Everything in order theory
> is about binary predicates. Take set Y, the ordering is a mapping Y x
> Y: -> {0,1}

That may be valuable in math, but computer programming is not just math.
For example, in mathematics you can define natural numbers using sets:

0 := { }
1 := {0} = {{ }}
2 := {0, 1} = {{ }, {{ }}}
3 := {0, 1, 2} = {{ }, {{ }}, {{ }, {{ }}}}
: :

and this is totally equivalent to the conventional definition and
fine in mathematics, but nobody does that for computer programming;
it would be a huge inefficiency. Because mathematical models don't
take efficiency into account.

Likewise, in mathematics it may be that all you need is <, and that
you can build all the others (<=, >, >=, ==, !=) from it, but it may
turn out to be suboptimal in efficiency, as we see in this thread.
The C++ standard library could have been much more efficient, while
maintaining the equivalence, by using compare (<=>) functions instead
of less (<).

--
Seungbeom Kim

Le Chaud Lapin

unread,
Jan 14, 2010, 2:07:08 PM1/14/10
to
On Jan 13, 8:22 pm, Seungbeom Kim <musip...@bawi.org> wrote:
> Likewise, in mathematics it may be that all you need is <, and that
> you can build all the others (<=, >, >=, ==, !=) from it, but it may
> turn out to be suboptimal in efficiency, as we see in this thread.
> The C++ standard library could have been much more efficient, while
> maintaining the equivalence, by using compare (<=>) functions instead
> of less (<).

Given the relatively low expenditure of intellectual energy required
to discover that (-, 0, +) is far more efficient than operator <, one
has to wonder for what reason was operator < chosen in the first
place.

-Le Chaud Lapin-


--

LR

unread,
Jan 15, 2010, 9:57:43 PM1/15/10
to
Le Chaud Lapin wrote:

> Given the relatively low expenditure of intellectual energy required
> to discover that (-, 0, +) is far more efficient than operator <, one
> has to wonder for what reason was operator < chosen in the first
> place.

Perhaps this http://www.stepanovpapers.com/notes.pdf will provide an
explanation.

LR

Anders Dalvander

unread,
Jan 15, 2010, 10:01:39 PM1/15/10
to
On Jan 14, 8:07 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> Given the relatively low expenditure of intellectual energy required
> to discover that (-, 0, +) is far more efficient than operator <, one
> has to wonder for what reason was operator < chosen in the first
> place.

I would guess the rationale is rather simple: "less" is a much simpler
concept than "compare".

Even so, I would believe that using "compare" instead of "less" would
be marginally better, and possible a premature optimization.

For instance, imagine the difference in performance of std::map::find,
if it would utilize "compare" instead of "less". Sure, if it would
contain few elements the performance improvements would be relatively
large, if it would contain many elements the relative performance
improvements would be marginally. Using another data structure would
probably be better anyway.

Regards,
Anders Dalvander

Le Chaud Lapin

unread,
Jan 16, 2010, 2:26:32 PM1/16/10
to
On Jan 15, 9:01 pm, Anders Dalvander <goo...@dalvander.com> wrote:
> On Jan 14, 8:07 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> > Given the relatively low expenditure of intellectual energy required
> > to discover that (-, 0, +) is far more efficient than operator <, one
> > has to wonder for what reason was operator < chosen in the first
> > place.
>
> I would guess the rationale is rather simple: "less" is a much simpler
> concept than "compare".

I do not see how "less" can be simpler than, say, strcmp(), an old
from that obviously uses the "compare" method:

int strcmp ( const char * str1, const char * str2 );

The actual code for strcmp() using "compare" is hardly more complex
than would be that for "less". Onen simply returns the difference: int
(str1[i]) - int(str2[i]).

> Even so, I would believe that using "compare" instead of "less" would
> be marginally better, and possible a premature optimization.

On the contratry, "compare" can be nearly twice as fast as "less" in
extreme cases. Imagine that the programmer has a set<> of 1,000,000
Foo's:

set<Foo>; // 1,000,000 elements

If the test of relative order of two Foo's is inherently long, such
that its excution time dominates the execution time of the other
operations, say, in in AVL/Red-Black implementations (mainly
rotations), then the "compare" method will be 1.9x, 1.95x, 1.98x, etc.
as fast as as the "less" method.

What is notable is that, once the library designer has prescribed the
"less" method, there is nothing that the library user could do to
change this difference in performance. Not optimization of "less" will
mitigate this inherent disparity.

> For instance, imagine the difference in performance of std::map::find,
> if it would utilize "compare" instead of "less". Sure, if it would
> contain few elements the performance improvements would be relatively
> large, if it would contain many elements the relative performance
> improvements would be marginally. Using another data structure would
> probably be better anyway.

Well, I think it is the opposite. The more elements there are, the
more one should prefer "compare", especially if the ordering test is
inherently computationally expensive.

-Le Chaud Lapin-

Chris Morley

unread,
Jan 17, 2010, 8:20:28 AM1/17/10
to
"Le Chaud Lapin" <jaibu...@gmail.com> wrote in message
news:30ea50bf-18e2-4f45-a5d4-

> On the contratry, "compare" can be nearly twice as fast as "less" in
> extreme cases. Imagine that the programmer has a set<> of 1,000,000
> Foo's:

This is a bold statement without supplying any argument as why. Why should
it be twice the speed? Using < for tree operations doesn't require double
the comparisons. Others have discussed relative speed of < vs compare so I
won't so let's simply assume they are identical.

> set<Foo>; // 1,000,000 elements
>
> If the test of relative order of two Foo's is inherently long, such
> that its excution time dominates the execution time of the other
> operations, say, in in AVL/Red-Black implementations (mainly
> rotations), then the "compare" method will be 1.9x, 1.95x, 1.98x, etc.
> as fast as as the "less" method.

As someone who has actually implemented their own RB tree code (using
indexes rather than pointers) I know that the number of comparisons (&
branches) dominate performance (regardless of pred function complexity).
Walking the tree and performing rotations are fast operations & you don't
need many.

With <, lower bound is trivial and requires one comparison per node till you
hit the bottom of the tree. Checking each node for equality as well (did my
return value == 0?) will double the branch statements per node traversed
likely causing any node past 1/2 way down the tree to be slower (which is
most of the nodes, it being a tree 'n all).

Find is trivially written in terms of lower bound with one extra compare.

iterator Find(const_reference Key, predT Pred)
{
iterator wherenode( LowerBound(Key,Pred) );
return (wherenode == end() || Pred( Key, *wherenode) ? end() :
wherenode);
}

If you have a tree with the guarentee of no duplication of keys then you
could optimise find to stop when it matches with compare but even then on
average you don't have half the comparison operations. Most of the nodes are
far from the head/root remember, it is a binary tree. Without this
guarentee, you need to descend the entire tree anyway to look for duplicates
so they are the same.

Take a look at the STL library code to see how you actually do operations
using < only. Tree source with gcc is hideously unreadable but the VS2008
xtree.h? (of the top of my head) is pretty clear.

> Well, I think it is the opposite. The more elements there are, the
> more one should prefer "compare", especially if the ordering test is
> inherently computationally expensive.

Don't forget that if you double the elements in a _balanced_ binary tree
then you only increase the path height by 1 node. As most of the nodes are
near the bottom of the tree by definition, even with hundreds of
millions/billions of entries in the restricted no duplicates tree you save a
few comparisons on average out of ~30. Do the arithmetic. If the key
comparison is cheap however then you lose with compare vs < because the
increased branches in your "optimised" find method.

If you can make/demonstrate an STL replacement library with compare which
performs 2x faster than implementations shipped with VS & GCC etc. then I'm
sure many readers of this group would be very interested.

Chris

Adam Badura

unread,
Jan 17, 2010, 2:11:24 PM1/17/10
to
> Perhaps this http://www.stepanovpapers.com/notes.pdf will provide an
> explanation.

This is over 200 pages. Are you expecting us to read it all and
then reply whether it provided any explanation or not? Because this
does not seem reasonable...
I quickly browsed through parts about comparing but found nothing
about this topic. And I do not have (sadly) that much time to read it
all (now).

Adam Badura

Anders Dalvander

unread,
Jan 17, 2010, 2:11:20 PM1/17/10
to
On Jan 16, 8:26 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> > I would guess the rationale is rather simple: "less" is a much simpler
> > concept than "compare".
>
> I do not see how "less" can be simpler than, say, strcmp(), an old
> from that obviously uses the "compare" method:

I'm not referring to the code, I'm referring to the model and concept.
"less" models Strict Weak Ordering, a well known concept in
programming as well as in other areas, see for instance
http://en.wikipedia.org/wiki/Strict_weak_ordering or
http://www.sgi.com/tech/stl/StrictWeakOrdering.html for description.

SWO has four well defined properties: irreflexivity, asymmetry,
transitivity and transitivity of equivalence. Which properties
"compare" would have I don't know, I feel that it would have more
complex properties than the ones for "less". This is what I meant with
my statement that "less" is a much simpler concept than "compare".

> Well, I think it is the opposite. The more elements there are, the
> more one should prefer "compare", especially if the ordering test is
> inherently computationally expensive.

std::map::find is usually built on-top of std::map::lower_bound. To
give a correct result std::map::find need one extra call to "less"
than std::map::lower_bound. std::map::lower_bound, which only need to
utilize "less", wouldn't gain anything from calling "compare" instead
of "less", as "less" is the only thing needed and is only called once
for every level in the map.

Given a map of 1,000,000 elements, that map would have the height of
20. std::map::lower_bound would need to call "less" 20 times, and
std::map::find would need to call "less" 21 times.

For the same sized compmap, utilizing a "compare" function,
compmap::lower_bound would need to call "less" 20 times, and
compmap::find would need to call "less" 20 times. The complexity of
compmap::find would on the other hand be greater as it cannot fully
rely on compmap::lower_bound.

std::map::find would call "less" at most 5% more times than
compmap::find would call "compare", if the map would contain 1,000,000
elements.

Regards,
Anders Dalvander

LR

unread,
Jan 18, 2010, 9:35:26 AM1/18/10
to
Adam Badura wrote:
>> Perhaps this http://www.stepanovpapers.com/notes.pdf will provide an
>> explanation.
>
> This is over 200 pages. Are you expecting us to read it all and
> then reply whether it provided any explanation or not?

No. The document has a table of contents.


> I quickly browsed through parts about comparing but found nothing
> about this topic. And I do not have (sadly) that much time to read it
> all (now).

I quickly browsed also and found parts that I think are relevant,
although I don't think there is An Explanation in this text. I think
that might have to be extracted from reading the text and taking its
implications. Unfortunately this process will be subjective, so YMWV.

Anyway, sorry, but I don't want to paraphrase what he wrote, so this is
the best I can do, here are the parts that I think may be relevant. I
hope you find this useful.

Start on page 20 in Lecture 2 with the discussion of the == operator for
cvector_int. Continue with the justification of why == and != should be
defined together in the start of Lecture 3. Continue with Lecture 3 on
pages 26 and 27 where using operator< to implement the other operators
is justified, although I think the choice between < and > might be
somewhat arbitrary, I'm not sure. There seems to be a minor
contradiction on these and the following pages regarding comparisons of
built in types with problems associated with NaN earlier in the text. I
think the discussion starting on page 29 regarding swap isn't relevant
to the question at hand. The class at the end of Lecture 5 might be
interesting to read. Lecture 6 page 52 contains some words about the
illusion of freedom (excuse me for interjecting a personal note, I
partially agree with this, but also disagree with it in parts) that I
think is relevant to the question of comparisons and what kind of code
we as programmers may want to write. I think the paragraph after that
about what Mr. Stepanov considers the worst mistake of his technical
career is probably relevant as well. Lecture 7 contains more
justification at the beginning and some more technical details.

I haven't read through the rest of the document, so there may be more.


But as I said, none of this is The Explanation, so I hope someone else
will do a better search than I was able to do and find The Answer.

HTH

LR

Seungbeom Kim

unread,
Jan 18, 2010, 9:43:19 AM1/18/10
to
Anders Dalvander wrote:
>
> std::map::find is usually built on-top of std::map::lower_bound. To
> give a correct result std::map::find need one extra call to "less"
> than std::map::lower_bound. std::map::lower_bound, which only need to
> utilize "less", wouldn't gain anything from calling "compare" instead
> of "less", as "less" is the only thing needed and is only called once
> for every level in the map.
>
> Given a map of 1,000,000 elements, that map would have the height of
> 20. std::map::lower_bound would need to call "less" 20 times, and
> std::map::find would need to call "less" 21 times.
>
> For the same sized compmap, utilizing a "compare" function,
> compmap::lower_bound would need to call "less" 20 times, and
> compmap::find would need to call "less" 20 times. The complexity of
> compmap::find would on the other hand be greater as it cannot fully
> rely on compmap::lower_bound.
>
> std::map::find would call "less" at most 5% more times than
> compmap::find would call "compare", if the map would contain 1,000,000
> elements.

It's not only the number of calls to "less" on the top level that
matters. Consider std::map<T> and compmap<T>, where T is std::pair<A, B>.
Each call to std::less<T> eventually leads to something like this:

// taken from GNU libstdc++
bool operator<(const pair<A, B>& x, const pair<A, B>& y)
{
return x.first < y.first
|| (!(y.first < x.first) && x.second < y.second);
}

Therefore, if x.first < y.first evaluates to false, whose probability
would be 50% under a uniform distribution assumption, the above function
should also evaluate y.first < x.first just to check whether the two
operands are equal and the second fields should be considered. We could
have checked it in the first place when we compared x.first and y.first.
This is what I call inefficiency.

On the other hand, if we used "compare", each call to compare<T> would
eventually lead to something like this:

int compare(const pair<A, B>& x, const pair<A, B>& y)
{
if (int r = compare(x.first, y.first)) return r;
return compare(x.second, y.second);
}

and we would never have to compare x.first and y.first twice.

I used std::pair just as an easy example, but the same phenomenon can
happen in any structure where the lexicographical ordering ("compare
field A first, and field B second, ...") is needed.

bool operator<(const pair<A, B, C>& x, const pair<A, B, C>& y)
{
return x.first < y.first
|| (!(y.first < x.first)
&& (x.second < y.second
|| (!(y.second < x.second)
&& x.third < y.third)));
}

// I'm not even sure if I got that right...
// This would be easier to write and understand:

bool operator<(const pair<A, B, C>& x, const pair<A, B, C>& y)
{
if (x.first < y.first) return true;
if (y.first < x.first) return false;
if (x.second < y.second) return true;
if (y.second < x.second) return false;
return x.third < y.third;
}

// This is intuitive and efficient!
int compare(const pair<A, B, C>& x, const pair<A, B, C>& y)
{
if (int r = compare(x.first, y.first)) return r;
if (int r = compare(x.second, y.second)) return r;
return compare(x.third, y.third);
}

And the inefficiency could get higher with a deeper level of nesting.

--
Seungbeom Kim

Adam Badura

unread,
Jan 18, 2010, 9:44:42 AM1/18/10
to
> I'm not referring to the code, I'm referring to the model and concept.
> "less" models Strict Weak Ordering, a well known concept in
> programming as well as in other areas, see for instance
> http://en.wikipedia.org/wiki/Strict_weak_ordering or
> http://www.sgi.com/tech/stl/StrictWeakOrdering.html for description.
>
> SWO has four well defined properties: irreflexivity, asymmetry,
> transitivity and transitivity of equivalence. Which properties
> "compare" would have I don't know, I feel that it would have more
> complex properties than the ones for "less". This is what I meant with
> my statement that "less" is a much simpler concept than "compare".

No. "compare" functor/function can model SWO or total ordering or
anything else like "less". It just computes in one shot results of all
three conditions:
1) A < B,
2) !(A < B) && !(B < A)
3) B < A
as it seems from practise that answering all these questions requires
usually (I don't know of any counter example) exactly the same work as
answering on of them.

> std::map::find is usually built on-top of std::map::lower_bound. To
> give a correct result std::map::find need one extra call to "less"
> than std::map::lower_bound. std::map::lower_bound, which only need to
> utilize "less", wouldn't gain anything from calling "compare" instead
> of "less", as "less" is the only thing needed and is only called once
> for every level in the map.

How about more complex data structures which are not present in
STD?


I do not have any hard data on performance gain of any of the
algorithms (either in STD or others).
However I am sure that using "compare" would not cause any
performance problems. Neither would it cause any implementation
problems. And it would allow to ask more direct questions. If you want
to compare two objects you just do it.
After all why did C had "strcmp" function (or something like that,
I'm not sure now) and not "strless"?

Adam Badura

Adam Badura

unread,
Jan 18, 2010, 2:58:50 PM1/18/10
to
> It's not only the number of calls to "less" on the top level that
> matters. Consider std::map<T> and compmap<T>, where T is std::pair<A, B>.
> Each call to std::less<T> eventually leads to something like this:
>
> // taken from GNU libstdc++
> bool operator<(const pair<A, B>& x, const pair<A, B>& y)
> {
> return x.first < y.first
> || (!(y.first < x.first) && x.second < y.second);
> }

Actually this can be done even now since you are the one who
writes the operator.

But since "compare" is not well established in standard library it
is more difficult as in general you will end up in writing your own
"compare" for every type including those standard ones and from 3-rd
party libraries.
If "compare" was present in the standard library, just like "swap"
is, the it would be much easier to use it.

Adam Badura

--

Le Chaud Lapin

unread,
Jan 18, 2010, 2:59:33 PM1/18/10
to
On Jan 18, 8:43 am, Seungbeom Kim <musip...@bawi.org> wrote:
[snipped]

> I used std::pair just as an easy example, but the same phenomenon can
> happen in any structure where the lexicographical ordering ("compare
> field A first, and field B second, ...") is needed.

Yep, such as gene sequence sorting, etc.

> bool operator<(const pair<A, B, C>& x, const pair<A, B, C>& y)
> {
> return x.first < y.first
> || (!(y.first < x.first)
> && (x.second < y.second
> || (!(y.second < x.second)
> && x.third < y.third)));
> }
>
> // I'm not even sure if I got that right...
> // This would be easier to write and understand:
>
> bool operator<(const pair<A, B, C>& x, const pair<A, B, C>& y)
> {
> if (x.first < y.first) return true;
> if (y.first < x.first) return false;
> if (x.second < y.second) return true;
> if (y.second < x.second) return false;
> return x.third < y.third;
> }
>
> // This is intuitive and efficient!
> int compare(const pair<A, B, C>& x, const pair<A, B, C>& y)
> {
> if (int r = compare(x.first, y.first)) return r;
> if (int r = compare(x.second, y.second)) return r;
> return compare(x.third, y.third);
> }
>
> And the inefficiency could get higher with a deeper level of nesting.

The form of the code above is pratically identical to that of the
actual code below for comparing bytes of Ethernet address:

---- START CODE ----

friend int signum (const Bytes &a, const Bytes &b)
{
if (int s = ::signum(a.buffer[5], b.buffer[5])) return s;
if (int s = ::signum(a.buffer[4], b.buffer[4])) return s;
if (int s = ::signum(a.buffer[3], b.buffer[3])) return s;
if (int s = ::signum(a.buffer[2], b.buffer[2])) return s;
if (int s = ::signum(a.buffer[1], b.buffer[1])) return s;

return ::signum(a.buffer[0], b.buffer[0]);
}

bool operator == (const Bytes &that) const {return signum (*this,
that) == 0;}
bool operator != (const Bytes &that) const {return signum (*this,
that) != 0;}
bool operator > (const Bytes &that) const {return signum (*this,
that) > 0;}
bool operator < (const Bytes &that) const {return signum (*this,
that) < 0;}
bool operator >= (const Bytes &that) const {return signum (*this,
that) >= 0;}
bool operator <= (const Bytes &that) const {return signum (*this,
that) <= 0;}

---- END CODE ----

As you might imagine, this method becomes so comfortable/regular that
after two or three iterations, it requires no forethought.

Whenever I have create a new class whose values can be ordered,
whether I intend to order them or not, immediately, I:

1. Go to one of my existing order-able classes.
2. Cut and paste the code like that above.
3. Rename old class name to new class name within the code.
4. Define signum for the new class.
5. Tuck it away for later, just in case.

Of course, existing STL code cannot benefit from "compare/signum", so
one would have to define their own containers to take advantage of
this method.

-Le Chaud Lapin-


--

Maxim Yegorushkin

unread,
Jan 21, 2010, 9:15:07 PM1/21/10
to
On 16/01/10 19:26, Le Chaud Lapin wrote:

[]

>> Even so, I would believe that using "compare" instead of "less" would
>> be marginally better, and possible a premature optimization.
>
> On the contratry, "compare" can be nearly twice as fast as "less" in
> extreme cases. Imagine that the programmer has a set<> of 1,000,000
> Foo's:
>
> set<Foo>; // 1,000,000 elements
>
> If the test of relative order of two Foo's is inherently long, such
> that its excution time dominates the execution time of the other
> operations, say, in in AVL/Red-Black implementations (mainly
> rotations), then the "compare" method will be 1.9x, 1.95x, 1.98x, etc.
> as fast as as the "less" method.

Well, I mentioned in my post:

...


Interesting enough, using compare() (or compare<> functor) instead of
std::less<> in std::set/map<> would give it a speed boost for operations
like find() because to determine whether elements are equal it calls

std::less<> twice . That results in two strncmp() calls *while returning
from find()* when keys are strings, whereas using compare that would


require only one strncmp() call.

...

I.e. it makes one extra call to std::less<> on the return from find(),
rather than for each node.

--
Max

0 new messages