std::string(1000000, 'a');
std::string(1000001, 'a');
It looks like the definition of
bool std::string::operator==(const string& lhs, const string& rhs)
is specified to be
lhs.compare(rhs) == 0
which in turn is in practice specified as a memory comparison over min(lhs.size(), rhs.size()) characters. So in effect it seems that the standard requires a potentially linear-time comparison even when the result could be determined in a significant number of cases by simply checking the sizes.
If this interpretation is correct, should the next standard also specify it this way? If not, is a defect report the right way to proceed? Note that string_view::operator== is specified similarly.
so it is likely not a defect in the standard, but just a Quality of Implementation thing.
Bo Persson
--
--- You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+unsubscribe@isocpp.org.
To post to this group, send email to std-dis...@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
The standard isn't to be taken that literally, just that the compiler should produce the same result as-if it uses
return lhs.compare(rhs)==0;
On 2016-11-14 21:35, Nevin Liber wrote:
> On Mon, Nov 14, 2016 at 1:54 PM, Bo Persson <b...@gmb.dk
> <mailto:b...@gmb.dk>> wrote:
>
> The standard isn't to be taken that literally, just that the
> compiler should produce the same result as-if it uses
>
> return lhs.compare(rhs)==0;
>
>
> The problem is that has to go through traits::compare, which may be
> user-defined and weird, so you have to perform the operation as
> specified. Sure, you could make the argument that a sufficiently high
> quality implementation can apply the as-if rule (compiler magic,
> specializations, etc.) for types fully specified by the standard, but
> I'd much rather have LWG make that determination. It does not hurt to
> file an issue and get either a definitive answer that this is
> intentional (and why) or a change in specification. The question is
> certainly legitimate.
I just can't see where traits::compare can affect the result of
different length strings.
Thanks for the responses. I do hope that the interpretation 'the standard isn't to be taken that literally' is not the solution here: at least to a non-expert the wording seems very clear, and if one can't trust even that kind of specification and is instead expected to understand how to do a related as-if analysis, things get hard. And given the results one currently gets from actual implementations, it seems that the interpretation varies even among C++ standard library implementers; see the example code below.

Code:
#include <cstddef>
#include <iostream>
#include <string>

// Set by TestTraits::compare so main() can tell whether operator== called it.
bool bCompareCalled = false;

// char_traits<char> with an instrumented compare().
struct TestTraits : public std::char_traits<char>
{
    static int compare(const char* p0, const char* p1, std::size_t n)
    {
        bCompareCalled = true;
        return std::char_traits<char>::compare(p0, p1, n);
    }
};

int main()
{
    typedef std::basic_string<char, TestTraits> String;
    // The operands differ in length, so the result is false in any case.
    const bool bEqual = String(1, 'a') == String(2, 'a');
    std::cout << bEqual << ": " << (bCompareCalled ? "called" : "not called");
    return 0;
}

Results:
VC2015          : 0: called
VC2017RC        : 0: not called
GCC 4.6         : 0: called
GCC 5.2         : 0: called
GCC 6.1         : 0: called
clang 3.8       : 0: called
clang 3.8 C++11 : 0: not called
clang 3.8 C++14 : 0: not called
clang 3.8 C++17 : 0: not called

Are compilers printing 'not called' conformant?
No, they are not. The C++ standard is not a suggestion. The "as if" rule only applies if it is implemented in such a way that the user cannot tell the difference between the implementation's behavior and the standard's required behavior. As you've just demonstrated, there's a way to tell the difference. And therefore, those implementations are non-conformant.
This is why the copy elision rules in C++98/03 had to explicitly spell out the times when an implementation was allowed to elide the copy. It wasn't a case of the standard saying that a copy will happen, but implementations deciding not to do it. It was the standard explicitly permitting implementations not to do what would otherwise have been required.
One point that I haven't seen mentioned yet, and that may bear on whether or not this is a defect: can't std::string's characters (well, chars) represent multibyte code points, depending on locale and encoding (UTF-8, for example)? Sometimes an MBCS has different code points for the same semantic character, and these may differ in byte size.
I'm certainly not an expert in encodings or MBCS (and I may be using incorrect terminology above), but thought I'd bring it up.
Regards
What is confusing about the question 'is this a defect to be fixed' is that this specification has been through numerous standard library implementations, and some have even gone to the trouble of changing from a version that is surely literally conforming to one whose conformance is disputable, yet as far as I know no one has filed a defect report. I'll ponder how to interpret that and decide whether to proceed with the defect report.
I do not disagree with your assessment, but I was having haunting images of the time when we had only string and wstring (and not the explicitly sized Unicode variants), neither of which played well with pre-Unicode MBCS encodings.
Sorry for the delay; I'll let you know if I won't be filing the DR so that someone else can. I couldn't find any target date by which it would be good to have the DR filed, so without better knowledge I'll presume there's no hurry with this (i.e. it won't make it into C++17). It's OK with me if someone wishes to get the DR done sooner rather than later by writing it themselves.
To update on the progress: the defect report has received some tentative comments preferring to close it as NAD, with the rationale that the wording "returns: lhs.compare(rhs) == 0" does not specify the precise steps, only the return value (in contrast to "Effects: Equivalent to"). Given that this interpretation wasn't considered in this thread, and that many of the implementations might have missed it as well, some of you might have a stance on this view, or perhaps even a clear reference to how the standard defines such wording.
return *this == return *this = basic_string(s);