#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <ctype.h>
bool compare_ignore_case_equals(char c1, char c2)
{
return toupper(c1) == toupper(c2);
}
bool compare_ignore_case_less(char c1, char c2)
{
return toupper(c1) < toupper(c2);
}
int main(int argc, char *argv[])
{
std::vector<std::string> args(argv + 1, argv + argc);
const char *words[] =
{
"add", "del", "new", "help"
};
std::vector<std::string> list(words, words + (sizeof words / sizeof words[0]));
std::vector<std::string>::iterator word = list.begin();
while (word != list.end())
{
std::cout << "Testing " << *word << " = " << args[0];
if (std::lexicographical_compare(
word->begin(), word->end(),
args[0].begin(), args[0].end(),
compare_ignore_case_equals))
{
std::cout << " found!\n";
break;
}
std::cout << "\n";
word++;
}
}
Here's an example:
./quick new
Testing add = new
Testing del = new found!
That simply cannot be correct. What have I done wrong? Thanks
--
http://www.munted.org.uk
Fearsome grindings.
> The short snippet below demonstrates the problem I'm having with
> std::lexicographical_compare() in that it does not reliably work!
> [code snipped]
> if (std::lexicographical_compare(
> word->begin(), word->end(),
> args[0].begin(), args[0].end(),
> compare_ignore_case_equals))
First, remove compare_ignore_case_equals and try again. You'll get
similar problems. Then read about lexicographical_compare and what its
return value means.
--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)
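For reference, a minimal sketch of what the return value means when the default comparison is used. The helper name comes_before is made up for illustration. The algorithm answers "does the first range sort before the second?", not "are the two ranges equal?".

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true only if a sorts strictly before b,
// comparing element by element with operator<.
bool comes_before(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end());
}

// Small self-check of the three possible relationships.
void check_comes_before()
{
    assert(comes_before("add", "new"));   // "add" sorts before "new"
    assert(!comes_before("new", "add"));  // "new" sorts after "add"
    assert(!comes_before("new", "new"));  // equal strings are not "less"
}
```

Note that a true result never means "equal", so using it as a match test gives the wrong answer even without a custom predicate.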
> > if (std::lexicographical_compare(
> > word->begin(), word->end(),
> > args[0].begin(), args[0].end(),
> > compare_ignore_case_equals))
>
> First, remove compare_ignore_case_equals and try again. You'll get
> similar problems. Then read about lexicographical_compare and what
> its return value means.
I've now switched to using this:
#include <string.h>
#include <string>
inline int strcasecmp(const std::string& s1, const std::string& s2)
{
return strcasecmp(s1.c_str(), s2.c_str());
}
This leverages C++'s ability to overload functions and works better.
stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO. Some
posters have mentioned using stricmp() instead of strcasecmp(), which
happens not to be the correct answer. Why?
Why are you using compare_ignore_case_equals as your binary predicate?
--
Perfection is achieved, not when there is nothing more to add,
but when there is nothing left to take away.
-- Antoine de Saint-Exupery
> On Sun, 28 Dec 2008 09:09:32 -0500, I waved a wand and this message
> magically appears in front of Pete Becker:
>
>>> if (std::lexicographical_compare(
>>> word->begin(), word->end(),
>>> args[0].begin(), args[0].end(),
>>> compare_ignore_case_equals))
>>
>> First, remove compare_ignore_case_equals and try again. You'll get
>> similar problems. Then read about lexicographical_compare and what
>> its return value means.
>
> I've now switched to using this:
>
> #include <string.h>
> #include <string>
>
> inline int strcasecmp(const std::string& s1, const std::string& s2)
> {
> return strcasecmp(s1.c_str(), s2.c_str());
> }
>
> This leverages C++'s ability to overload functions and works better.
>
> stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO.
No, it's not. It's Unix, if I remember correctly. But I think I didn't
make my point clearly enough. The problem isn't fundamentally in the
predicate. So drop the predicate and use the default predicate until
you understand what lexicographical_compare does.
> > This leverages C++'s ability to overload functions and works better.
> >
> > stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO.
>
> No, it's not. It's Unix, if I remember correctly. But I think I didn't
> make my point clearly enough. The problem isn't fundamentally in the
> predicate. So drop the predicate and use the default predicate until
> you understand what lexicographical_compare does.
strcasecmp() is actually defined in the POSIX standards. But I will
look again at std::lexicographical_compare() when I get some time. The
program works well enough with strcasecmp().
Actually, it looks more like it leverages C++'s ability to cause a stack
overflow due to infinite recursion. strcasecmp isn't part of ISO C++, so
on plenty of compilers, this function will simply call itself.
> stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO. Some
> posters have mentioned using stricmp() instead of strcasecmp(), which
> happens not to be the correct answer. Why?
As far as I can tell, neither is part of standard C++.
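If a POSIX-only solution is acceptable, one way to rule out the recursion is to give the wrapper its own name and call the C function with an explicit :: qualifier. This is only a sketch: the name compare_no_case is made up, and it assumes a platform that provides <strings.h>. Where that header is missing the code fails to compile instead of silently calling itself.

```cpp
#include <strings.h>  // POSIX header declaring strcasecmp; not ISO C or C++
#include <cassert>
#include <string>

// Hypothetical wrapper with a distinct name, so overload resolution can
// never pick the wrapper itself via the const char* -> std::string conversion.
inline int compare_no_case(const std::string& s1, const std::string& s2)
{
    // The leading :: selects the global POSIX function explicitly.
    return ::strcasecmp(s1.c_str(), s2.c_str());
}

// Self-check; results are only specified by POSIX for the POSIX locale.
void check_compare_no_case()
{
    assert(compare_no_case("new", "NEW") == 0);
    assert(compare_no_case("add", "new") < 0);
}
```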
> > First, remove compare_ignore_case_equals and try again.
> > You'll get similar problems. Then read about
> > lexicographical_compare and what its return value means.
> I've now switched to using this:
> #include <string.h>
> #include <string>
> inline int strcasecmp(const std::string& s1, const std::string& s2)
> {
> return strcasecmp(s1.c_str(), s2.c_str());
> }
> This leverages C++'s ability to overload functions and works
> better.
> stricmp() isn't standard whilst strcasecmp() is standard
> ANSI/ISO.
It's not present in any version of the standard I have handy
(C++98, C99, and the latest C++ draft). The standard C++
functional object for comparing strings in a locale dependent
way is std::locale (which has an operator() which does exactly
what is needed for lexicographical_compare). And as any
comparisons involving case are locale sensitive, it's really what
you need, e.g.:
if ( std::lexicographical_compare(
word->begin(), word->end(),
args[ 0 ].begin(), args[ 0 ].end(),
std::locale() ) ) {...}
(or std::locale( "xxx" ), with whatever locale you want).
> Some posters have mentioned using stricmp() instead of
> strcasecmp(), which happens not to be the correct answer.
> Why?
Neither is the correct answer, since neither is standard
C/C++. (strcasecmp is defined in Posix, but not very well: "In
the POSIX locale, [...]. The results are unspecified in other
locales." So unless you happen to live in POSIX, it's not very
useful.)
--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
> bool compare_ignore_case_equals(char c1, char c2)
> {
> return toupper(c1) == toupper(c2);
Just a reminder, but this is, of course, undefined behavior.
> }
> bool compare_ignore_case_less(char c1, char c2)
> {
> return toupper(c1) < toupper(c2);
As is this.
> }
(I've addressed the other issues in another posting.)
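The undefined behavior arises because the <ctype.h> functions are only defined for EOF and for values representable as unsigned char, while a plain char may be negative. A common workaround, sketched here with a made-up function name, is to convert to unsigned char before the call:

```cpp
#include <cassert>
#include <cctype>

// Case-insensitive "less than" for single chars.
// Converting to unsigned char first keeps the argument inside the range
// that the <cctype> functions are defined for (0..UCHAR_MAX, or EOF).
bool less_ignore_case(char c1, char c2)
{
    return std::toupper(static_cast<unsigned char>(c1))
         < std::toupper(static_cast<unsigned char>(c2));
}

// Self-check with plain ASCII letters.
void check_less_ignore_case()
{
    assert(less_ignore_case('a', 'B'));   // 'A' < 'B'
    assert(!less_ignore_case('B', 'a'));  // 'B' is not less than 'A'
    assert(!less_ignore_case('n', 'N'));  // same letter, neither is less
}
```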
> > This leverages C++'s ability to overload functions and works
> > better.
>
> Actually, it looks more like it leverages C++'s ability to cause a
> stack overflow due to infinite recursion. strcasecmp isn't part of
> ISO C++, so on plenty of compilers, this function will simply call
> itself.
As this snippet below shows, you're actually correct.
#include <iostream>
#include <string>
int hahaha(const std::string& s1, const std::string& s2)
{
return hahaha(s1.c_str(), s2.c_str());
}
int main()
{
std::string s1 = "hahaha";
std::string s2 = "HAHAHA";
if (hahaha(s1, s2) == 0)
std::cout << "Equal!\n";
return 0;
}
> > stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO.
> > Some posters have mentioned using stricmp() instead of
> > strcasecmp(), which happens not to be the correct answer. Why?
>
> As far as I can tell, neither are part of standard C++.
Yes, at some point in time I'm going to have to change to
std::lexicographical_compare, or is there anything else I can try for
case insensitive compares on std::string objects?
>
> Yes, at some point in time I'm going to have to change to
> std::lexicographical_compare,
Let me renew my suggestion that you read about what the return value of
lexicographical_compare means before you do this.
operator() of std::locale works on strings by itself. You could use
operator() directly:
/* true, if *word < args[0] */
if ( std::locale()(*word, args[0]) ) {...}
But does std::locale()() really compare case insensitive?
--
Thomas
The answer to that is a definite maybe. It does (or it should)
in locales where case insensitive comparison makes sense. And
it does so correctly, matching "Straße" and "STRASSE" (or
"ändern" and "Aendern", in Switzerland, but not in Germany).
And "I" and "i" won't compare equal in a Turkish locale. Since
the "C" locale is designed for parsing C code, and the POSIX
locale for working in a Posix environment (including the file
systems and filenames), the comparison in those locales will NOT
be case insensitive.
And of course, you can always define your own locale. (At
least, that's what it says. In practice, it takes a pretty high
level of C++ competence to do it reliably. More than I have, at
any rate.)
#include <locale>
struct compare_ignore_case_equals
{
compare_ignore_case_equals(const std::locale& loc_ = std::locale())
: loc(loc_) {}
bool operator()(char c1, char c2) const
{
return std::tolower(c1, loc) == std::tolower(c2, loc);
}
private:
std::locale loc;
};
How about this? Doesn't depend on the user's locale, you can provide your own
locale, and isn't UB.
Why does ::toupper actually take an int?
--
Thomas
If you want to parse commands case insensitively, like in a shell, script
interpreter or text based protocol, a maybe isn't enough.
> And of course, you can always define your own locale. (At
> least, that's what it says. In practice, it takes a pretty high
> level of C++ competence to do it reliably. More than I have, at
> any rate.)
Then it would be easier to build a comparison predicate with
std::toupper/tolower as I showed else-thread.
What do people do for multibyte encodings like UTF-8?
--
Thomas
See particularly Eric Sosman's response to the OP here (message #2):
http://groups.google.com/group/comp.lang.c/browse_frm/thread/3b27e652f1a7ab32
The other immediate responses to the OP are also informative.
Jason
Still doesn't work with lexicographical_compare...
Replace the == with < and you've got the ordering predicate needed for
lexicographical_compare.
--
Thomas
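A sketch of that suggestion, assuming the functor posted earlier with == swapped for <. The helper names less_no_case and equivalent_no_case are made up for illustration. The last function shows how an equality-style answer can be derived from an ordering: two strings are equivalent when neither sorts before the other.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Ordering predicate: the "==" of the earlier functor replaced by "<".
struct compare_ignore_case_less
{
    compare_ignore_case_less(const std::locale& loc_ = std::locale())
        : loc(loc_) {}

    bool operator()(char c1, char c2) const
    {
        return std::tolower(c1, loc) < std::tolower(c2, loc);
    }

private:
    std::locale loc;
};

// Hypothetical helper: true if a sorts before b, ignoring case.
bool less_no_case(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end(),
                                        compare_ignore_case_less());
}

// Hypothetical helper: equivalent when neither sorts before the other.
bool equivalent_no_case(const std::string& a, const std::string& b)
{
    return !less_no_case(a, b) && !less_no_case(b, a);
}
```

This costs two passes over the strings, so for a plain match test a direct equality comparison is usually the simpler choice.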
> > On Dec 29, 3:10 pm, "Thomas J. Gritzan" <phygon_antis...@gmx.de>
> [...]
> >> But does std::locale()() really compare case insensitive?
> > The answer to that is a definite maybe. [...]
> If you want to parse commands case insensitively, like in a
> shell, script interpreter or text based protocol, a maybe
> isn't enough.
The problem is that case insensitive comparison is locale
dependent. So of course, you have to involve the locale
somehow. But yes, there is a gap between literal comparison
(all bytes equal) and locale dependent collating (which can
involve a number of things, e.g. "é" compares equal to "E", "ä"
collates as "ae", etc.). And there's no real support for anything
between these two extremes in the language (either C or C++).
> > And of course, you can always define your own locale. (At
> > least, that's what it says. In practice, it takes a pretty
> > high level of C++ competence to do it reliably. More than I
> > have, at any rate.)
> Then it would be easier to build a comparison predicate with
> std::toupper/tolower as I showed else-thread.
Probably:-). You have to define what equality actually means
first (e.g. does "ß" compare equal to "SS"), but for things like
filenames and interpreter commands, you're often limited to a
small set of characters where the definition isn't too
difficult. (This is becoming less and less true with regards to
filenames, of course.)
> What do people do for multibyte encodings like UTF-8?
A lot of hand written code:-). In practice, you can't count on
the presence of a UTF-8 locale, and you can't count on it working
right if it's present. Note too that anything case insensitive
will still be locale dependent, even if you limit it to UTF-8;
in practice, if you want case insensitivity over the full
Unicode range, you have a lot of defining to do (although the
Unicode Consortium data files help a lot).
> #include <locale>
> private:
> std::locale loc;
> };
I'm not sure what you mean by "doesn't depend on the user's
locale". The constructor std::locale() creates a copy of the
current global locale, which if you're writing library code, is
unknown, but which will usually be the user's locale, since the
very first action in most main functions is to set the global
locale to "".
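If the aim is a result that does not change with the global locale, one option is to pass std::locale::classic() (the "C" locale) explicitly instead of a default-constructed std::locale. A minimal sketch, with a made-up helper name:

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper: lower-cases one char using the classic "C" locale,
// regardless of what std::locale::global() has been set to.
char lower_in_c_locale(char c)
{
    return std::tolower(c, std::locale::classic());
}

// Self-check: only the letters A-Z are affected in the "C" locale.
void check_lower_in_c_locale()
{
    assert(lower_in_c_locale('N') == 'n');
    assert(lower_in_c_locale('n') == 'n');
    assert(lower_in_c_locale('7') == '7');
}
```

The trade-off is the one described above: the "C" locale knows nothing about accented or non-Latin letters, which is acceptable for command keywords but not for general text.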
> Why does ::toupper actually take an int?
So that things like:
for ( int ch = getchar() ; isspace( ch ) ; ch = getchar() )
...
work. It is defined for EOF, as well as all of the values in
the range 0...UCHAR_MAX. (The reason for toupper, of course, is
coherence---all of the functions in <ctype.h> take the same type
of argument.) It's a useful idiom; I still use it a lot (not
with ::toupper, etc., but with some of my own stuff).
The real question is why plain char is allowed to be signed, if
it is intended to contain "characters". I don't know of any
character encoding which uses negative values.
> > > > > The short snippet below demonstrates the problem I'm having with
> > > > > std::lexicographical_compare() in that it does not reliably work!
> > > > >
> > > > > #include <iostream>
> > > > > #include <vector>
> > > > > #include <ctype.h>
> > > > >
> > > > > bool compare_ignore_case_equals(char c1, char c2)
> > > > > {
> > > > > return toupper(c1) == toupper(c2);
> > > > Just a reminder, but this is, of course, undefined behavior.
> > >
> > > #include <locale>
> > >
> > > struct compare_ignore_case_equals
> > > {
> > > compare_ignore_case_equals(const std::locale& loc_ = std::locale())
> > > : loc(loc_) {}
> > >
> > > bool operator()(char c1, char c2) const
> > > {
> > > return std::tolower(c1, loc) == std::tolower(c2, loc);
> > > }
> > >
> > > private:
> > > std::locale loc;
> > > };
> > >
> > > How about this? Doesn't depend on users locale, you can provide your own
> > > locale, and isn't UB.
It also won't work reliably for all languages. Personally I don't think
anything will work reliably for all languages. A programmer is better
off IMHO to ignore locales and the "upper" and "lower" functions in
<cctype>, and write his own code that works with the languages he has to
deal with.
> > Still doesn't work with lexicographical_compare...
>
> Replace the == with < and you've got the ordering predicate needed for
> lexicographical_compare.
You might want to look at the OPs question again. His complaint (as can
be seen by the subject line) was that "lexicographical_compare with
ignore case *equality* doesn't always work." [stress added] Think about
that sentence for a second... :-)
If the OP hasn't already figured it out, lexicographical_compare isn't
*designed* to work with equality functors in the first place.
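To see why an equality test produces the output in the original post, here is a hand-written loop in the style of a typical lexicographical_compare implementation. It is an illustration only, not any particular library's source, and the names lex_compare_sketch and chars_equal are made up. Strictly speaking, passing a predicate that is not a strict weak ordering breaks the algorithm's requirements, so real implementations are free to behave differently.

```cpp
#include <cassert>
#include <string>

// Hand-written loop in the style of a typical lexicographical_compare
// implementation (an illustration, not a specific library's code).
template <class Pred>
bool lex_compare_sketch(const std::string& a, const std::string& b, Pred comp)
{
    std::string::const_iterator i = a.begin(), j = b.begin();
    for (; i != a.end() && j != b.end(); ++i, ++j) {
        if (comp(*i, *j)) return true;   // "a is less": stop here
        if (comp(*j, *i)) return false;  // "b is less": stop here
    }
    return i == a.end() && j != b.end(); // a is a proper prefix of b
}

// The misuse under discussion: an equality test where "less" is expected.
bool chars_equal(char c1, char c2) { return c1 == c2; }

// Self-check reproducing the two lines of output from the original post.
void check_lex_compare_sketch()
{
    assert(lex_compare_sketch("del", "new", chars_equal));   // "found!"
    assert(!lex_compare_sketch("add", "new", chars_equal));  // no match
}
```

With "del" and "new" the first pair of characters differs, so both tests fail and the loop moves on. The second pair is 'e' and 'e', the equality test succeeds, and the function reports "less", which the original program printed as "found!".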
> > Replace the == with < and you've got the ordering predicate needed
> > for lexicographical_compare.
>
> You might want to look at the OPs question again. His complaint (as
> can be seen by the subject line) was that "lexicographical_compare
> with ignore case *equality* doesn't always work." [stress added]
> Think about that sentence for a second... :-)
>
> If the OP hasn't already figured it out, lexicographical_compare
> isn't *designed* to work with equality functors in the first place.
[pained grin]
Yeah.
Perhaps this should be a FAQ: How do we do a case insensitive equality
compare on std::string values?
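One possible answer, sketched under the assumptions already discussed in this thread: a one-char-to-one-char mapping through std::tolower with a locale, which cannot handle cases such as "ß" versus "SS" or multibyte encodings. The names chars_equal_ignore_case and equal_ignore_case are made up for illustration. An equality predicate belongs with std::equal, preceded by a length check, rather than with lexicographical_compare.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Per-character equality test, in the style of the functor posted earlier.
struct chars_equal_ignore_case
{
    chars_equal_ignore_case(const std::locale& loc_ = std::locale())
        : loc(loc_) {}

    bool operator()(char c1, char c2) const
    {
        return std::tolower(c1, loc) == std::tolower(c2, loc);
    }

private:
    std::locale loc;
};

// Hypothetical helper: same length and every pair of chars matches.
// The size check must come first, because this form of std::equal
// assumes the second range is at least as long as the first.
bool equal_ignore_case(const std::string& a, const std::string& b)
{
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(),
                      chars_equal_ignore_case());
}
```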
> > > > #include <locale>
> > > > struct compare_ignore_case_equals
> > > > {
> > > > compare_ignore_case_equals(const std::locale& loc_ = std::locale())
> > > > : loc(loc_) {}
> > > > bool operator()(char c1, char c2) const
> > > > {
> > > > return std::tolower(c1, loc) == std::tolower(c2, loc);
> > > > }
> > > > private:
> > > > std::locale loc;
> > > > };
> > > > How about this? Doesn't depend on users locale, you can
> > > > provide your own locale, and isn't UB.
> It also won't work reliably for all languages. Personally I
> don't think anything will work reliably for all languages. A
> programmer is better off IMHO to ignore locales and the "upper"
> and "lower" functions in <cctype>, and write his own code that
> works with the languages he has to deal with.
It's supposed to work reliably for all supported locales. (A
locale is more than just a language.) Which is sort of vague:
the standard doesn't make any requirements with regards to what
locales are supported (other than "C"), and it leaves the
definition as to what the behavior is in a given locale
"implementation defined".
If you're targeting a single compiler, for a single locale or a
small set of locales, and that compiler provides them, and they
behave "correctly" (for your definition of "correctly"), there's
no problem with using locales for this. Otherwise, you're
right: it can be a bit tricky.
Why? It's easy enough to find on Google already. Here is a good
article discussing all of the issues with proposed solutions, which
everybody involved in this thread should read:
http://lafstern.org/matt/col2_new.pdf
It was linked to from GCC's page on case-insensitive strings:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt05ch13s02.html
Which was linked to in a forum post in the first Google result for
"std string case insensitive compare":
http://bytes.com/groups/c/489747-lowercase-std-string-compare
Although it did require a bit of poking around on gcc.gnu.org since
the link in the forum post was actually broken.
Jason
As this thread, and every other thread/article on the subject shows, it
is a rather complex subject. Pretty much any subject that deals with
natural language is.
I suggest you don't perform case insensitive compares in your code.
Thanks for all that, I'd already seen some of these pages.
> > Perhaps this should be a FAQ: How do we do a case insensitive
> > equality compare on std::string values?
>
> As this thread, and every other thread/article on the subject shows,
> it is a rather complex subject. Pretty much any subject that deals
> with natural language is.
>
> I suggest you don't perform case insensitive compares in your code.
Seems a lot of thought has gone into designing the STL libraries. I've
just been playing with std::locale and std::locale::global, with
currencies. I can see how useful this can be in conjunction with glibc.