Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Is this sorting predicate ok?

94 views
Skip to first unread message

Paul

unread,
Nov 19, 2018, 9:47:53 AM11/19/18
to
I've begun a previous thread about a dictionary problem.
Basically we want the longest word that can be obtained by
choosing among a given set of characters.
So one idea is to sort the dictionary and we want to try the
longest words first. However, if a word is so long that it
contains too many characters then I want to ignore it.
So I came up with this sorting mechanism which does work when I tested
it but it seems problematic because of being potentially non-associative.
I know that that's not complete code but I'm interested in whether my
sorting predicate is bug-vulnerable.
Many thanks.

Paul


std::vector<int> characterCounts = charCounts(arrayOfLetters);
std::vector<std::string> sortedDictionary(Dictionary.begin(), Dictionary.end());
// Test the longest words first but not the words that are too long.
const int length = arrayOfLetters.size();
auto cmp = [length](const std::string& word1, const std::string& word2)->bool{return word1.size() <= length && word2.size() > length || word1.size() > word2.size();};
std::sort(sortedDictionary.begin(), sortedDictionary.end(), cmp);

Jorgen Grahn

unread,
Nov 19, 2018, 6:01:10 PM11/19/18
to
On Mon, 2018-11-19, Paul wrote:
> I've begun a previous thread about a dictionary problem.
> Basically we want the longest word that can be obtained by
> choosing among a given set of characters.
> So one idea is to sort the dictionary and we want to try the
> longest words first. However, if a word is so long that it
> contains too many characters then I want to ignore it.
> So I came up with this sorting mechanism which does work when I tested
> it but it seems problematic because of being potentially non-associative.
> I know that that's not complete code but I'm interested in whether my
> sorting predicate is bug-vulnerable.
...

(Formatted for readability)

> auto cmp = [length](const std::string& word1,
> const std::string& word2) -> bool {
> return word1.size() <= length && word2.size() > length
> || word1.size() > word2.size();
> };

Side note: my compiler gives me a warning about the precedence
between || and &&. I happily admit I had to look it up; I've never
needed to memorize that particular precedence rule.

Enable and pay attention to compiler warnings.

I tried to understand it, but failed. I don't know if it's safe; my
gut feeling is it's not. As you seem to know, it's easy to get a
sorting predicate subtly wrong and have the code break in subtle but
catastrophic way.

I'd try to solve the problem in a way which doesn't need an unusual
predicate. Where "unusual" means roughly anything but
std::lexicographical_compare(f(a), f(b)), or whatever the syntax is.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Paul

unread,
Nov 19, 2018, 7:08:32 PM11/19/18
to
Good advice. You say "I tried to understand it, but failed."
I'll explain what I was trying to do.
We have a std::string, for example std::string s = "ababcoddefg";
We have a std::unordered_set<std::string> stringSet;
The task is to find a longest word in stringSet that uses only letters
from string s and respects the character counts.
For example s has only 2 a's. So the word "aaa" would not be allowed but
the word "a" would be allowed (assuming that "a" is a member of stringSet).
So that raises the question of which words to try first in stringSet.
Well a word can't possibly work if it's longer than s.size() so I want
all words longer than s.size() to be at the bottom.
Regarding words that are not longer than s.size(), I want them to be
in descending order of size. That way, once you've found a word that works,
you can just return it.
I was trying to achieve the above by a sorting predicate.
I'm not saying I was right, and that my method was good.
But now I hope you understand what I was trying to do. I suppose I could
experiment by creating a large set of words and examine whether the sorting
has the above property.

Paul

Paul

unread,
Nov 19, 2018, 7:13:33 PM11/19/18
to
On Monday, November 19, 2018 at 11:01:10 PM UTC, Jorgen Grahn wrote:
> On Mon, 2018-11-19, Paul wrote:
> > I've begun a previous thread about a dictionary problem.
> > Basically we want the longest word that can be obtained by
> > choosing among a given set of characters.
> > So one idea is to sort the dictionary and we want to try the
> > longest words first. However, if a word is so long that it
> > contains too many characters then I want to ignore it.
> > So I came up with this sorting mechanism which does work when I tested
> > it but it seems problematic because of being potentially non-associative.
> > I know that that's not complete code but I'm interested in whether my
> > sorting predicate is bug-vulnerable.
> ...
>
> (Formatted for readability)
>
> > auto cmp = [length](const std::string& word1,
> > const std::string& word2) -> bool {
> > return word1.size() <= length && word2.size() > length
> > || word1.size() > word2.size();
> > };
>
> Side note: my compiler gives me a warning about the precedence
> between || and &&. I happily admit I had to look it up; I've never
> needed to memorize that particular precedence rule.
...

But it doesn't need memorization or looking it up. The precedence between
|| and && can't be different from the precedence between + and *.
Surely everyone knows that true || false && false means true || (false && false) which == true.

Paul

Paavo Helde

unread,
Nov 20, 2018, 1:30:49 AM11/20/18
to
On 19.11.2018 16:47, Paul wrote:
> I know that that's not complete code but I'm interested in whether my
> sorting predicate is bug-vulnerable.
>
> std::vector<int> characterCounts = charCounts(arrayOfLetters);
> std::vector<std::string> sortedDictionary(Dictionary.begin(), Dictionary.end());
> // Test the longest words first but not the words that are too long.
> const int length = arrayOfLetters.size();
> auto cmp = [length](const std::string& word1, const std::string& word2)->bool{return word1.size() <= length && word2.size() > length || word1.size() > word2.size();};
> std::sort(sortedDictionary.begin(), sortedDictionary.end(), cmp);

Assume length = 3

Consider
x = "ab";
y = "abcd";

cmp(x, y) -> true
cmp(y, x) -> true

That's not going to fly.


Jorgen Grahn

unread,
Nov 20, 2018, 1:59:43 AM11/20/18
to
On Tue, 2018-11-20, Paul wrote:
> On Monday, November 19, 2018 at 11:01:10 PM UTC, Jorgen Grahn wrote:
>> On Mon, 2018-11-19, Paul wrote:
...
>> > auto cmp = [length](const std::string& word1,
>> > const std::string& word2) -> bool {
>> > return word1.size() <= length && word2.size() > length
>> > || word1.size() > word2.size();
>> > };
>>
>> Side note: my compiler gives me a warning about the precedence
>> between || and &&. I happily admit I had to look it up; I've never
>> needed to memorize that particular precedence rule.
> ...
>
> But it doesn't need memorization or looking it up. The precedence between
> || and && can't be different from the precedence between + and *.

I don't immediately see a similarity between those pairs.

> Surely everyone knows that true || false && false means true || (false && false) which == true.

1. Not everyone; see above. I honestly don't know, and I've been
programming in C for almost thirty years. (I could learn the rule,
but I have never needed to.)

2. I don't seem to be extreme; I don't use more parentheses than many
others. (Although I prefer to introduce temporaries for complex
expressions.)

3. Enough people seem to feel that way, or gcc wouldn't warn.

Ben Bacarisse

unread,
Nov 20, 2018, 8:34:29 AM11/20/18
to
Jorgen Grahn <grahn...@snipabacken.se> writes:

> On Tue, 2018-11-20, Paul wrote:
>> On Monday, November 19, 2018 at 11:01:10 PM UTC, Jorgen Grahn wrote:
>>> On Mon, 2018-11-19, Paul wrote:
> ...
>>> > auto cmp = [length](const std::string& word1,
>>> > const std::string& word2) -> bool {
>>> > return word1.size() <= length && word2.size() > length
>>> > || word1.size() > word2.size();
>>> > };
>>>
>>> Side note: my compiler gives me a warning about the precedence
>>> between || and &&. I happily admit I had to look it up; I've never
>>> needed to memorize that particular precedence rule.
>> ...
>>
>> But it doesn't need memorization or looking it up. The precedence between
>> || and && can't be different from the precedence between + and *.
>
> I don't immediately see a similarity between those pairs.

If you tabulate result of +, *, || and && with bool operands you get
this:

+ | false | true || | false | true
--------+-------+------ --------+-------+------
false | false | true false | false | true
true | true | true true | true | true

* | false | true && | false | true
--------+-------+------ --------+-------+------
false | false | false false | false | false
true | false | true true | false | true

You'd get the same with | and &.

The same goes for set union and intersection. In a language with those
operators, I'd expect union to have a lower precedence than
intersection. If C++'s std::set were to add them, it would be
reasonable (at least to me!) to use operator+ and operator* for them.

Slightly related: if, instead of | you take ^ (exclusive or), then ^ and
& are the additive and multiplicative operators of a finite field over
{true, false}. That field is isomorphic to the one you get using
addition and multiplication modulo 2 over {0, 1}.

--
Ben.

Jorgen Grahn

unread,
Nov 20, 2018, 4:38:31 PM11/20/18
to
Ah, ok. I recall that isomorphism better as:

0 + x == x 0 | x == x
0 * x == 0 0 & x == 0

(But again, I have no use of this knowledge.)

Öö Tiib

unread,
Nov 21, 2018, 4:12:51 AM11/21/18
to
I have memorized the precedence for ages but still find
that some "redundant" parentheses often help me to
read complex expressions.

On current case the OP predicate is not ordering
its arguments but lack of parentheses feels to make
that harder to realize.

Öö Tiib

unread,
Nov 21, 2018, 8:47:04 AM11/21/18
to
Still there seems to be confusion exactly there because logic of
your predicate does not follow logic that your describe in prose.
Also there seem to be difference *exactly* in parentheses
misplaced.

That is what your code does:

return word1.size() <= length && word2.size() > length
|| word1.size() > word2.size();

That seems to be what you describe in prose:

return word1.size() <= length && (word2.size() > length
|| word1.size() > word2.size());

Notice the parentheses.

Tim Rentsch

unread,
Nov 24, 2018, 2:43:57 PM11/24/18
to
Jorgen Grahn <grahn...@snipabacken.se> writes:
> On Tue, 2018-11-20, Paul wrote:

>> [..regarding the precedence of && and || operators..]

>> Surely everyone knows that
>> true || false && false means true || (false && false) which == true.
>
> 1. Not everyone; see above. I honestly don't know, and I've been
> programming in C for almost thirty years. (I could learn the rule,
> but I have never needed to.)
>
> 2. I don't seem to be extreme; I don't use more parentheses than many
> others. (Although I prefer to introduce temporaries for complex
> expressions.)
>
> 3. Enough people seem to feel that way, or gcc wouldn't warn.

I have a different reaction. Certainly the attitude you describe
is common. That is regrettable, especially for these operators,
which are placed where many or most programmers would expect them
to be, following a well-established mathematical custom going back
at least to the 1800's. Moreover these particular operators have
the same relative precedence in essentially all programming
languages (the only exceptions I'm aware of are APL and Smalltalk,
which have just one precedence level for all binary operators).
[Note: strictly speaking I believe neither APL nor Smalltalk has
short-circuiting binary logical operators, but I'm glossing over
that aspect.]

More generally, this sentiment appears to be a holdover from a
time 60 years ago, when compilers, languages, and documentation
could not be trusted to be accurate or correct. To me it seems
like the same attitude that produces sloppy code, habits like
making arrays "a little bigger" than they need to be. The C
language has had exactly the same operator ordering for 40 years
(and maybe more, but K&R came out in 1978). Anyone who is capable
enough to be a competent programmer can easily learn those rules
in about 15 minutes. I believe the programming community would do
well to adopt the attitude that any serious C programmer should be
expected to have learned C's precedence rules (and similarly for
C++, at least for those parts where C and C++ are the same).

It could be helpful to have a "meeting-of-the-minds" discussion to
decide where the dividing line should be for which aspects should
be expected to be known and which are more esoteric. To me though
the relative precedences of relational/equality operators, the &&
operator, and the || operator, are clearly and emphatically well
inside the line of what should be expected of any serious C or C++
programmer.

Ben Bacarisse

unread,
Nov 24, 2018, 7:22:57 PM11/24/18
to
Tim Rentsch <t...@alumni.caltech.edu> writes:

> Jorgen Grahn <grahn...@snipabacken.se> writes:
>> On Tue, 2018-11-20, Paul wrote:
>
>>> [..regarding the precedence of && and || operators..]
>
>>> Surely everyone knows that
>>> true || false && false means true || (false && false) which == true.
>>
>> 1. Not everyone; see above. I honestly don't know, and I've been
>> programming in C for almost thirty years. (I could learn the rule,
>> but I have never needed to.)
>>
>> 2. I don't seem to be extreme; I don't use more parentheses than many
>> others. (Although I prefer to introduce temporaries for complex
>> expressions.)
>>
>> 3. Enough people seem to feel that way, or gcc wouldn't warn.
>
> I have a different reaction. Certainly the attitude you describe
> is common. That is regrettable, especially for these operators,
> which are placed where many or most programmers would expect them
> to be, following a well-established mathematical custom going back
> at least to the 1800's. Moreover these particular operators have
> the same relative precedence in essentially all programming
> languages (the only exceptions I'm aware of are APL and Smalltalk,
> which have just one precedence level for all binary operators).

Curiously, they are reversed in Icon, a fact I post simply for the
novelty value.

<snip>
--
Ben.

Vir Campestris

unread,
Nov 25, 2018, 4:57:19 PM11/25/18
to
On 24/11/2018 19:43, Tim Rentsch wrote:
> Anyone who is capable
> enough to be a competent programmer can easily learn those rules
> in about 15 minutes.

I'm afraid I'm going to disagree with you.

I don't have the kind of mind that can do rote learning. And it is that
- there's no logic that says even that multiply takes precedence over
add. It's just tradition.

I think I am a fairly competent programmer. I'm still working on code
for embedded devices which ship in the millions, 40 years after I first
hacked some Fortran together.

And I feel that writing (a & b) ^ c is clearer than leaving the brackets
out. It's obvious what is meant, and I don't have to think about it.

Andy

Jorgen Grahn

unread,
Nov 25, 2018, 5:27:37 PM11/25/18
to
On Sun, 2018-11-25, Vir Campestris wrote:
> On 24/11/2018 19:43, Tim Rentsch wrote:
>> Anyone who is capable
>> enough to be a competent programmer can easily learn those rules
>> in about 15 minutes.
>
> I'm afraid I'm going to disagree with you.
...
> And I feel that writing (a & b) ^ c is clearer than leaving the brackets
> out. It's obvious what is meant, and I don't have to think about it.

Or like I mentioned upthread, what's usually clearer with && and || is
to introduce at least one temporary boolean with a well-chosen name.

Tim Rentsch

unread,
Dec 10, 2018, 7:08:56 AM12/10/18
to
That's interesting, I didn't know that.

Although, isn't it the case that & and | in Icon are not really
the same as boolean "and" and "or" operations, because of Icon's
semantics? Perhaps the distinctive semantics in Icon of these
operators motivates a different ordering.

The question notwithstanding, the point is still noteworthy.

Ben Bacarisse

unread,
Dec 10, 2018, 3:17:51 PM12/10/18
to
I think that's probably the case. The unusual semantics might well lead
to some patterns where the opposite precedence is warranted, but you
would need to confirm that with someone who knows idiomatic Icon.

> The question notwithstanding, the point is still noteworthy.

--
Ben.

Tim Rentsch

unread,
Dec 11, 2018, 11:18:34 AM12/11/18
to
Vir Campestris <vir.cam...@invalid.invalid> writes:

> On 24/11/2018 19:43, Tim Rentsch wrote:
>
>> Anyone who is capable
>> enough to be a competent programmer can easily learn those rules
>> in about 15 minutes.
>
> I'm afraid I'm going to disagree with you.

I am interested to explore your response. First a few incidental
comments [I moved the important second paragraph to the end].

> I think I am a fairly competent programmer. I'm still working on
> code for embedded devices which ship in the millions, 40 years
> after I first hacked some Fortran together.

I take it for granted that you are a competent programmer, based
on your writings in the newsgroup here.

> And I feel that writing (a & b) ^ c is clearer than leaving the
> brackets out. It's obvious what is meant, and I don't have to
> think about it.

I don't necessarily agree, but that is a separate topic which
I would like to postpone for the moment to help simplify the
discussion.

> [the moved paragraph]
> I don't have the kind of mind that can do rote learning. And it
> is that - there's no logic that says even that multiply takes
> precedence over add. It's just tradition.

I didn't mean to imply rote learning. I too am not the sort of
person who does well at learning by rote. I wouldn't have made
the statement unless I thought C's precedence rules carry a
lot of structure that makes them much easier to learn than if
they were random.

Your comment about multiply/add suggests you think the choice
was arbitrary. Do you think it was arbitrary? I am skeptical
of that hypothesis.

I think any good discussion of this subject should start with
some data rather than just anecdotal evidence. Here is a self
test for associativity and precedence of binary operators
(associativity is always the same operator twice, precedence is
always different operators though they may have the same
"precedence level"). The question in each case is how is the
unparenthesized expression parsed, or in other words where would
the implied parentheses go? Some cases may be syntax errors.
Without consulting any reference (pop quiz!), for what fraction
of the cases does the taker feel confident in their answer?
I hope other people who are interested in this will do the
self-test also. (Yes I have gone through and marked it up
myself.) Needless to say the ordering and handedness of the
examples has been scrambled.



Associativity:

a / b / c
a & b & c
a > b > c
a && b && c
a != b != c
a % b % c
a + b + c
a || b || c
a ^ b ^ c
a = b = c
a == b == c
a >> b >> c
a | b | c
a << b << c
a <= b <= c
a >= b >= c
a , b , c
a - b - c
a * b * c
a < b < c



Precedence (two columns):

a < b <= c a >> b == c
a ^ b , c a % b ^ c
a - b >> c a % b != c
a >= b % c a >= b || c
a == b || c a ^ b = c
a <= b , c a - b || c
a , b + c a < b | c
a == b > c a + b <= c
a / b % c a - b | c
a && b & c a << b || c
a % b > c a + b ^ c
a > b && c a >= b , c
a || b >> c a << b - c
a * b >= c a , b < c
a || b <= c a + b || c
a >> b = c a % b - c
a % b + c a & b / c
a / b >> c a < b << c
a >> b && c a >= b && c
a > b || c a > b >= c
a / b == c a * b = c
a != b == c a == b = c
a && b - c a == b < c
a > b <= c a + b / c
a && b % c a , b & c
a > b / c a != b - c
a - b * c a = b - c
a * b < c a >= b == c
a && b ^ c a + b == c
a == b % c a * b % c
a / b | c a <= b << c
a * b , c a | b > c
a ^ b & c a & b >> c
a & b || c a - b & c
a - b , c a < b && c
a - b > c a / b >= c
a + b | c a && b <= c
a / b && c a <= b >= c
a ^ b | c a < b >= c
a != b < c a << b / c
a - b / c a >= b != c
a == b , c a << b = c
a = b >= c a & b >= c
a != b * c a * b & c
a > b < c a != b << c
a * b || c a ^ b << c
a << b | c a != b & c
a - b >= c a - b == c
a >> b * c a < b + c
a , b || c a | b == c
a != b || c a != b && c
a && b || c a = b && c
a < b % c a = b != c
a ^ b <= c a <= b | c
a | b >= c a ^ b / c
a << b + c a = b , c
a + b = c a > b * c
a >= b + c a == b * c
a || b < c a >> b != c
a & b <= c a >> b >= c
a >> b | c a <= b = c
a , b && c a << b == c
a | b % c a % b || c
a & b > c a >> b , c
a < b >> c a / b != c
a % b = c a != b > c
a < b & c a + b > c
a ^ b >> c a > b ^ c
a || b = c a >= b ^ c
a , b > c a << b , c
a / b < c a > b << c
a == b <= c a ^ b == c
a % b << c a && b << c
a / b <= c a + b && c
a = b / c a | b * c
a ^ b * c a < b = c
a & b + c a = b | c
a || b ^ c a = b > c
a * b / c a = b & c
a + b * c a / b || c
a == b && c a | b != c
a >> b > c a >> b % c
a || b | c a , b | c
a , b != c a < b ^ c
a % b & c a != b <= c
a + b >> c a >> b << c
a >= b << c a - b < c
a | b & c a - b + c
a != b ^ c a ^ b - c
a - b <= c a * b <= c
a * b && c a , b % c
a & b << c a * b << c
a | b && c a & b == c
a >> b <= c a <= b % c
a != b + c a / b , c

Tim Rentsch

unread,
Dec 11, 2018, 11:35:32 AM12/11/18
to
Jorgen Grahn <grahn...@snipabacken.se> writes:

> On Sun, 2018-11-25, Vir Campestris wrote:
>
>> On 24/11/2018 19:43, Tim Rentsch wrote:
>>
>>> Anyone who is capable
>>> enough to be a competent programmer can easily learn those rules
>>> in about 15 minutes.
>>
>> I'm afraid I'm going to disagree with you.
>
> ...
>
>> And I feel that writing (a & b) ^ c is clearer than leaving the brackets
>> out. It's obvious what is meant, and I don't have to think about it.
>
> Or like I mentioned upthread, what's usually clearer with && and || is
> to introduce at least one temporary boolean with a well-chosen name.

I agree that it can be better to use a temporary variable (always
suitably named, of course) in such cases. But I wouldn't go so
far as to say "usually". Adding a new variable can make things
worse rather than better in many cases. The question is not so
simple that a single general rule applies.
0 new messages