Defect in specification of std::string(_view)::operator==()?

311 views
Skip to first unread message

u97...@gmail.com

unread,
Nov 14, 2016, 1:18:17 PM11/14/16
to ISO C++ Standard - Discussion
It looks like the definition of

bool std::string::operator==(const string& lhs, const string& rhs)

is specified to be

lhs.compare(rhs) == 0

which again is in practice specified as memory comparison with character count min(lhs.size(), rhs.size()). So in effect it seems that the standard requires a potentially linear time comparison even when one could determine the result in significant number of cases by simply checking the sizes. For example comparison with operator== and strings

std::string(1000000, 'a');
std::string(1000001, 'a');

will first do (at least in VC2015, even in O2-optimized build) a 1 MB memory comparison and then realize 'oh, the strings don't even have the same length, I'll just return a false then'. Some might argue that this isn't quite the optimal comparison algorithm.

If this interpretation is correct, should also the next standard have it specified this way? If not, is a defect report the right way to proceed? Also string_view::operator== is specified similarly.

Nevin Liber

unread,
Nov 14, 2016, 2:10:28 PM11/14/16
to std-dis...@isocpp.org
On Mon, Nov 14, 2016 at 12:18 PM, <u97...@gmail.com> wrote:
It looks like the definition of

bool std::string::operator==(const string& lhs, const string& rhs)

is specified to be

lhs.compare(rhs) == 0

which again is in practice specified as memory comparison with character count min(lhs.size(), rhs.size()). So in effect it seems that the standard requires a potentially linear time comparison even when one could determine the result in significant number of cases by simply checking the sizes.
 
If this interpretation is correct, should also the next standard have it specified this way? If not, is a defect report the right way to proceed? Also string_view::operator== is specified similarly.

I believe your interpretation is correct.  Please file a defect report.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  +1-847-691-1404

Bo Persson

unread,
Nov 14, 2016, 2:55:45 PM11/14/16
to std-dis...@isocpp.org
The standard isn't to be taken that literally, just that the compiler
should produce the same result as-if it uses

return lhs.compare(rhs)==0;


Other compilers do a length comparison first, so it is likely not a
defect in the standard, but just a Qualify of Implementation thing.


Bo Persson



Nicol Bolas

unread,
Nov 14, 2016, 3:17:32 PM11/14/16
to ISO C++ Standard - Discussion, b...@gmb.dk

I disagree. The wording for `operator==` says that it will call `compare`. The wording for `basic_string::compare` says that it will call `traits::compare`. And since a call to that function is an observable effect, it is not possible for an implementation to avoid it. An implementation might specialize the string types for the standard traits classes, and therefore avoid calling it in those cases.

But if you use a user-provided traits type, it must be called. Always.

I don't like leaving this up to QoI. Also, let's not make things harder on the people who need user-defined traits classes, since that case cannot be optimized.

Richard Smith

unread,
Nov 14, 2016, 3:17:59 PM11/14/16
to std-dis...@isocpp.org
If they do that even in the face of a custom char_traits implementation, they would appear to be non-conforming.
 
so it is likely not a defect in the standard, but just a Qualify of Implementation thing.


   Bo Persson




--

--- You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+unsubscribe@isocpp.org.
To post to this group, send email to std-dis...@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.

Nevin Liber

unread,
Nov 14, 2016, 3:36:07 PM11/14/16
to std-dis...@isocpp.org
On Mon, Nov 14, 2016 at 1:54 PM, Bo Persson <b...@gmb.dk> wrote:
The standard isn't to be taken that literally, just that the compiler should produce the same result as-if it uses

return lhs.compare(rhs)==0;

The problem is that has to go through traits::compare, which may be user-defined and weird, so you have to perform the operation as specified.  Sure, you could make the argument that a sufficiently high quality implementation can apply the as-if rule (compiler magic, specializations, etc.) for types fully specified by the standard, but I'd much rather have LWG make that determination.  It does not hurt to file an issue and get either a definitive answer that this is intentional (and why) or a change in specification.  The question is certainly legitimate.

Bo Persson

unread,
Nov 14, 2016, 4:42:36 PM11/14/16
to std-dis...@isocpp.org
I just can't see where traits::compare can affect the result of
different length strings.

It will be passed two pointers and the shortest length. If that
subsequence compares equal, the different lengths will make basic_string
return not equal. If traits::compare returns not equal, the basic_string
result will also be not equal.

Only if traits::compare returns equal, and the lengths are equal, will
the total result be "equal".


Bo Persson



Nevin Liber

unread,
Nov 14, 2016, 4:47:43 PM11/14/16
to std-dis...@isocpp.org
On Mon, Nov 14, 2016 at 3:41 PM, Bo Persson <b...@gmb.dk> wrote:
I just can't see where traits::compare can affect the result of different length strings.

It may have side effects.  If you cannot determine that it does not have side effects, you have to call it, or you are violating as-if.

Nicol Bolas

unread,
Nov 14, 2016, 4:53:03 PM11/14/16
to ISO C++ Standard - Discussion, b...@gmb.dk
On Monday, November 14, 2016 at 4:42:36 PM UTC-5, Bo Persson wrote:
On 2016-11-14 21:35, Nevin Liber wrote:
> On Mon, Nov 14, 2016 at 1:54 PM, Bo Persson <b...@gmb.dk
> <mailto:b...@gmb.dk>> wrote:
>
>     The standard isn't to be taken that literally, just that the
>     compiler should produce the same result as-if it uses
>
>     return lhs.compare(rhs)==0;
>
>
> The problem is that has to go through traits::compare, which may be
> user-defined and weird, so you have to perform the operation as
> specified.  Sure, you could make the argument that a sufficiently high
> quality implementation can apply the as-if rule (compiler magic,
> specializations, etc.) for types fully specified by the standard, but
> I'd much rather have LWG make that determination.  It does not hurt to
> file an issue and get either a definitive answer that this is
> intentional (and why) or a change in specification.  The question is
> certainly legitimate.

I just can't see where traits::compare can affect the result of
different length strings.

It doesn't matter. `traits::compare` is permitted to have side-effects. Therefore, if it has side-effects, then they must happen when you call `operator==`.

Victor Dyachenko

unread,
Nov 15, 2016, 5:05:15 AM11/15/16
to ISO C++ Standard - Discussion
I wouldn't consider this as a defect. It's rather an improvement request.
 

Nevin Liber

unread,
Nov 15, 2016, 5:11:11 AM11/15/16
to std-dis...@isocpp.org
And LWG can make the determination that it currently works as intended, needs someone to write a paper to get it changed, etc.

AFAIK, the discussion never came up for string_view (string was before my time).

u97...@gmail.com

unread,
Nov 17, 2016, 2:21:01 PM11/17/16
to ISO C++ Standard - Discussion
Thanks for the responses. I do hope the interpretation that 'the standard isn't to be taken that literally' is not the solution here, because at least for a non-expert the wording seems to be very clear and if one can't trust even that kind of specification and instead is expected to understand how to do a related as-if analysis, things get hard. And given the result one currently gets from actual implementations, it seems that the interpretation varies even among C++ standard library implementers; see the example code below.

Code:

#include <string>
#include <iostream>

bool bCompareCalled = false;

struct TestTraits : public std::char_traits<char>
{
   
static int compare(const char* p0, const char* p1, size_t n) { bCompareCalled = true; return std::char_traits<char>::compare(p0, p1, n); }
};

int main()
{
   
typedef std::basic_string<char, TestTraits> String;
   
const bool bEqual = String(1, 'a') == String(2, 'a');
    std
::cout << bEqual << ": " << ((bCompareCalled) ? "called" : "not called");
   
return 0;
}

Results:

VC2015         : 0: called
VC2017RC      
: 0: not called
GCC
4.6        : 0: called
GCC
5.2        : 0: called
GCC
6.1        : 0: called
clang
3.8      : 0: called
clang
3.8 C++11: 0: not called
clang
3.8 C++14: 0: not called
clang
3.8 C++17: 0: not called


Are compilers printing 'not called' conformant? If not, it's worth noting that as the standard is clear and it's easy to do a conformant implementation but libraries have still even replaced existing with nonconformant implementations, there's probably a pretty good reason why. If yes, perhaps someone has the time to write a 'How to interpret the C++ standard: primer' and tell people what they can rely on when implementing custom char_traits or when reasoning about the performance of operator==.

Nicol Bolas

unread,
Nov 17, 2016, 3:15:00 PM11/17/16
to ISO C++ Standard - Discussion, u97...@gmail.com


On Thursday, November 17, 2016 at 2:21:01 PM UTC-5, u97...@gmail.com wrote:
Thanks for the responses. I do hope the interpretation that 'the standard isn't to be taken that literally' is not the solution here, because at least for a non-expert the wording seems to be very clear and if one can't trust even that kind of specification and instead is expected to understand how to do a related as-if analysis, things get hard. And given the result one currently gets from actual implementations, it seems that the interpretation varies even among C++ standard library implementers; see the example code below.

Code:

#include <string>
#include <iostream>

bool bCompareCalled = false;

struct TestTraits : public std::char_traits<char>
{
   
static int compare(const char* p0, const char* p1, size_t n) { bCompareCalled = true; return std::char_traits<char>::compare(p0, p1, n); }
};

int main()
{
   
typedef std::basic_string<char, TestTraits> String;
   
const bool bEqual = String(1, 'a') == String(2, 'a');
    std
::cout << bEqual << ": " << ((bCompareCalled) ? "called" : "not called");
   
return 0;
}

Results:

VC2015         : 0: called
VC2017RC      
: 0: not called
GCC
4.6        : 0: called
GCC
5.2        : 0: called
GCC
6.1        : 0: called
clang
3.8      : 0: called
clang
3.8 C++11: 0: not called
clang
3.8 C++14: 0: not called
clang
3.8 C++17: 0: not called


Are compilers printing 'not called' conformant?

No, they are not. The C++ standard is not a suggestion. The "as if" rule only applies if it is implemented in such a way that the user cannot tell the difference between the implementation's behavior and the standard's required behavior. As you've just demonstrated, there's a way to tell the difference. And therefore, those implementations are non-conformant.

This is why the copy elision rules in C++98/03 had to explicitly spell out the times when an implementation was allowed to elide the copy. It wasn't a case of the standard saying that a copy will happen, but implementations deciding not to do it. It was the standard explicitly permitting implementations not to do what would otherwise have been required.

T. C.

unread,
Nov 17, 2016, 5:50:26 PM11/17/16
to ISO C++ Standard - Discussion


On Thursday, November 17, 2016 at 3:15:00 PM UTC-5, Nicol Bolas wrote:"

No, they are not. The C++ standard is not a suggestion. The "as if" rule only applies if it is implemented in such a way that the user cannot tell the difference between the implementation's behavior and the standard's required behavior. As you've just demonstrated, there's a way to tell the difference. And therefore, those implementations are non-conformant.

This is why the copy elision rules in C++98/03 had to explicitly spell out the times when an implementation was allowed to elide the copy. It wasn't a case of the standard saying that a copy will happen, but implementations deciding not to do it. It was the standard explicitly permitting implementations not to do what would otherwise have been required.
 
That said, the library may need a more relaxed version of "as if" (compare LWG 2695) or else a systematic review. Currently, many parts of the library specification are written in a way that is simpler to describe but suboptimal if used literally. If implementations have to reproduce all the behavior as depicted, we are in serious trouble - just look at the numerous places in basic_string's specification that construct temporary strings.

u97...@gmail.com

unread,
Nov 25, 2016, 7:37:39 PM11/25/16
to ISO C++ Standard - Discussion
What is confusing here regarding the aspect of 'is this a defect to be fixed' is that this specification has gone through numerous standard library implementations and some have even seen the trouble of changing it from version that is surely literally conforming to one whose conformity is disputable, but as far as I know, no one has filed a defect report. But I'll ponder how to interpret that and decide whether to proceed with the defect report.

Nathan Ernst

unread,
Nov 25, 2016, 8:21:43 PM11/25/16
to std-dis...@isocpp.org

A point to consider whether or not this a defect that I haven't seen mentioned yet, can't std::string's characters (well chara) possibly represent multibyte codepoints depending on locale and encoding? Such as utf-8 encoding. Sometimes MBCS may have different codepoints for the same semantic character, but may differ in byte size.

I'm certainly not an expert in encodings or MBCS (and I may be using incorrect terminology above), but thought I'd bring it up.

Regards


On Nov 25, 2016 6:37 PM, <u97...@gmail.com> wrote:
What is confusing here regarding the aspect of 'is this a defect to be fixed' is that this specification has gone through numerous standard library implementations and some have even seen the trouble of changing it from version that is surely literally conforming to one whose conformity is disputable, but as far as I know, no one has filed a defect report. But I'll ponder how to interpret that and decide whether to proceed with the defect report.

--

Nicol Bolas

unread,
Nov 25, 2016, 9:37:53 PM11/25/16
to ISO C++ Standard - Discussion
On Friday, November 25, 2016 at 8:21:43 PM UTC-5, Nathan Ernst wrote:

A point to consider whether or not this a defect that I haven't seen mentioned yet, can't std::string's characters (well chara) possibly represent multibyte codepoints depending on locale and encoding? Such as utf-8 encoding. Sometimes MBCS may have different codepoints for the same semantic character, but may differ in byte size.

I'm certainly not an expert in encodings or MBCS (and I may be using incorrect terminology above), but thought I'd bring it up.


If the question you're asking is "can `char_traits` represent a UTF-8 string", then the answer is "no". At least, not in accord with the rules of Unicode.

Take `char_traits::length` for example. This is used by `std::basic_string` for the expressed purpose of finding the NUL terminator of a string, not for finding out how many "characters" a string can decompose into. As such, you can't really do any MBCS stuff with it.

While `compare` could in theory implement some kind of Unicode comparison, I would say that doing so would be overall a bad idea. At least, in a vacuum. There are many forms of Unicode comparisons, and locking the form into a string's type would not be an appropriate thing to do.

As such, it is highly unlikely that any code out there relies on overloading `compare` to do any MBCS/Unicode shenanigans. I say file a DR on it, and see if anyone has a problem the fix. If people have written code that breaks in this case, they need to let us know.

Nathan Ernst

unread,
Nov 25, 2016, 9:44:32 PM11/25/16
to std-dis...@isocpp.org

I do not disagree with your assesment, but was having haunting images of when we just had string and wstring (and not the explicitly sized unicode variants), neither of which played well with pre unicode MBCS encodings.


--

Nevin Liber

unread,
Nov 25, 2016, 11:10:40 PM11/25/16
to std-dis...@isocpp.org
On Fri, Nov 25, 2016 at 6:37 PM, <u97...@gmail.com> wrote:
What is confusing here regarding the aspect of 'is this a defect to be fixed' is that this specification has gone through numerous standard library implementations and some have even seen the trouble of changing it from version that is surely literally conforming to one whose conformity is disputable, but as far as I know, no one has filed a defect report. But I'll ponder how to interpret that and decide whether to proceed with the defect report.

I don't understand the resistance to filing a defect.  If you aren't going to, please let us know so we can.

u97...@gmail.com

unread,
Dec 11, 2016, 7:59:00 AM12/11/16
to ISO C++ Standard - Discussion
Sorry for the delay; I'll let you know if I won't be filing the DR so someone else can. I couldn't find any target date by which it would be good to have the DR filed, so without better knowledge, I'll presume there's no hurry with this (i.e. won't make it to C++17). It's ok for me if someone wishes to get the DR done sooner than later by writing one him/herself.

And just to add to the MBCS point of view, it's worth noting that, unless I'm interpreting the standard incorrectly, traits::compare() can get strictly less information than compare()/operator== (traits::compare() gets min(lhs.size(), rhs.size())), so even if someone made a custom traits compare, where different sized strings could compare equal, lhs.compare(rhs) and all dependent functionality such as operator==() wouldn't work correctly.

Nicol Bolas

unread,
Dec 11, 2016, 4:19:28 PM12/11/16
to ISO C++ Standard - Discussion, u97...@gmail.com


On Sunday, December 11, 2016 at 7:59:00 AM UTC-5, u97...@gmail.com wrote:
Sorry for the delay; I'll let you know if I won't be filing the DR so someone else can. I couldn't find any target date by which it would be good to have the DR filed, so without better knowledge, I'll presume there's no hurry with this (i.e. won't make it to C++17). It's ok for me if someone wishes to get the DR done sooner than later by writing one him/herself.[/quote]

As I understand it, defects don't belong to any particular version of the standard. If defective wording in the current standard can be applied to a C++98/03 implementation, then it effectively is.

So the sooner a defect report can be filed, the better for everyone.

u97...@gmail.com

unread,
Jan 9, 2017, 5:02:00 PM1/9/17
to ISO C++ Standard - Discussion
A defect report has been submitted.

u97...@gmail.com

unread,
Feb 1, 2017, 6:52:47 PM2/1/17
to ISO C++ Standard - Discussion
To update the progress, the defect report has received some tentative comments preferring to close it as NAD with rationale that the wording "returns: lhs.compare(rhs) == 0" does not specify the precise steps, only the return value (in contrast to "Effects: Equivalent to"). Given that the interpretation wasn't considered in this thread and also many of the implementations might have missed it, some of you might have a stance regarding the view or perhaps even have a clear reference to give how the standard defines such wording.

Nicol Bolas

unread,
Feb 1, 2017, 11:11:15 PM2/1/17
to ISO C++ Standard - Discussion, u97...@gmail.com
On Wednesday, February 1, 2017 at 6:52:47 PM UTC-5, u97...@gmail.com wrote:
To update the progress, the defect report has received some tentative comments preferring to close it as NAD with rationale that the wording "returns: lhs.compare(rhs) == 0" does not specify the precise steps, only the return value (in contrast to "Effects: Equivalent to"). Given that the interpretation wasn't considered in this thread and also many of the implementations might have missed it, some of you might have a stance regarding the view or perhaps even have a clear reference to give how the standard defines such wording.

[structure.specifications]/4 outlines the specific meaning of "Equivalent to" clauses. But I don't see anything in that section that makes it clear that "Returns:" wording is supposed to be taken only as an expression of the general result rather than the means to produce that result.

There does seem to be a distinction between `compare` overloads which use the "Equivalent to" wording and which use the "Returns" wording. For example, `basic_string::compare(const charT*)` uses "Returns: compare(basic_string(s))", which would suggest that this requires allocating a new `basic_string`. Yet that's clearly wasteful.

So what they're saying is that they believe there is a difference between "returns value consistent with this expression" and "executes an expression equivalent to this, returning its value". In the case of the former, there is no requirement of the "as if" rule with regard to side effects; it only cares about the returned value. In the case of the latter, side effects should be present "as if" this were the code executed. One is spelled "Returns:", and the other is spelled "Effects: Equivalent to".

If that's the case, I'd like to see a more explicit statement to this effect in [structure.specifications].

u97...@gmail.com

unread,
Feb 2, 2017, 5:06:52 PM2/2/17
to ISO C++ Standard - Discussion
From structure.specification (N4618):

Effects: the actions performed by the function
Returns: a description of the value(s) returned by the function

From this to me it seems clear that 'returns' does not imply that the value should be obtained as written and the DR is indeed invalid. I'll probably try to see if it can be withdrawn before more people wastes time on this.

T. C.

unread,
Feb 2, 2017, 8:03:37 PM2/2/17
to ISO C++ Standard - Discussion, u97...@gmail.com
Even if that's true, it doesn't hurt for LWG to confirm this. No need to be hasty.

Moreover, I don't see how that argument works when we have basic_string::operator=(const char* s) specified as "Returns: *this = basic_­string(s)". Surely that doesn't mean the implementation can skip the assignment and just return *this!

Bo Persson

unread,
Feb 3, 2017, 8:18:19 AM2/3/17
to std-dis...@isocpp.org
On 2017-02-03 02:03, T. C. wrote:
>
>
> On Friday, February 3, 2017 at 6:06:52 AM UTC+8, u97...@gmail.com wrote:
>
> From structure.specification (N4618):
>
> /Effects: the actions performed by the function/
> /Returns: a description of the value(s) returned by the function/
>
> From this to me it seems clear that 'returns' does not imply that
> the value should be obtained as written and the DR is indeed
> invalid. I'll probably try to see if it can be withdrawn before more
> people wastes time on this.
>
>
> Even if that's true, it doesn't hurt for LWG to confirm this. No need to
> be hasty.
>
> Moreover, I don't see how that argument works when we have
> basic_string::operator=(const char* s) specified as "Returns: *this =
> basic_­string(s)". Surely that doesn't mean the implementation can skip
> the assignment and just return *this!
>

But it also cannot mean that the implementation is required to create a
temporary basic_string and then use the rvalue assignment operator to
get the new value into *this.


Bo Persson

Nicol Bolas

unread,
Feb 3, 2017, 10:05:55 AM2/3/17
to ISO C++ Standard - Discussion, b...@gmb.dk

Well, it could mean that. It would just be really silly to force such an implementation on us.

I think there needs to be a distinction between "the state of this object is changed/the return value is computed as if code X was executed" and "behaves as if code X was executed". In the former, the code is just a guide explaining the changes to the object's state or its return value. Any side effects from such code are not expected to be fulfilled. Whereas the latter means that the code must execute as if it was written exactly like that, with all apparent side effects intact.

If that is intended to be the distinction between "Returns:" and "Equivalent to", I really want to see that explained more thoroughly.

u97...@gmail.com

unread,
Feb 3, 2017, 11:38:24 AM2/3/17
to ISO C++ Standard - Discussion
In the "Returns: *this = basic_­string(s)" example I think the specification is clear: "return *this;" is not allowed because

return *this == return *this = basic_string(s);

is not generally true.

Of course one can ask new questions such as how exactly the equivalence of returned elements is defined etc., but I think that would be out of scope from the DR.

u97...@gmail.com

unread,
Jul 25, 2017, 6:22:02 PM7/25/17
to ISO C++ Standard - Discussion
To conclude this matter, the defect report has been closed as NAD with the aforementioned rationale that wording "returns: ..." does not specify the precise steps, only the return value (in contrast to "Effects: Equivalent to...").
Reply all
Reply to author
Forward
0 new messages