False Positives From String Comparison using string.Equals()

Smithers

Jun 20, 2009, 2:29:50 AM

I am using the .Equals() method of the string class, as follows, to compare
string properties of two different instances of a given class.

if (!valuesFromUI.Other_Description.Equals(
        valuesAsRetrievedFromDB.Other_Description,
        StringComparison.OrdinalIgnoreCase))
{
    return true;
}

What I am observing in testing is that the above .Equals() periodically
finds that the string values are different when they do not in fact appear
to be different in any way. By "do not appear to be different" I mean
that I hover the mouse over each property and view in the "Text Visualizer",
then copy from there into Notepad and compare the values in notepad - they
appear to be the exact same string.

The bigger picture is this: I'm using Visual Studio Pro 2008. This is
happening in an ASP.NET Web application. I instantiate the class and
populate it with values from the underlying SQL Server 2005 database, then
load those properties into the UI, and then store the object instance in a
Session variable. Then, when the user navigates to a different page, I
instantiate a new instance of the class and populate it with values from the
UI. I then compare relevant properties of that instance - as in the sample
code above - against the same properties from the instance that was
previously stored in Session state. I do this to determine if the user
actually modified any values from the underlying database and therefore
whether to save any changes to the database.

How can I compare the strings in a way that would not result in such "false
positives"? I would think my approach, above, is perfectly fine. What can I
do differently?

Thanks.

Jeroen Mostert

Jun 20, 2009, 6:15:00 AM

Smithers wrote:
> I am using the .Equals() method of the string class, as follows, to compare
> string properties of two different instances of a given class.
>
>
> if
> (!valuesFromUI.Other_Description.Equals(valuesAsRetrievedFromDB.Other_Description,
> StringComparison.OrdinalIgnoreCase))
> {
> return true;
> }
>
> What I am observing in testing is that the above .Equals() periodically
> finds that the string values are different when they do not in fact appear
> to be different in any way. By "appear to be different in any way" I mean
> that I hover the mouse over each property and view in the "Text Visualizer",
> then copy from there into Notepad and compare the values in notepad - they
> appear to be the exact same string.
>

But are they? Your test is not definitive. Different whitespace characters
cannot be told apart visually, and there are plenty of visible characters
that share glyphs.

For starters, compare the string *lengths*. If those are the same, go into
the immediate window and check if Encoding.UTF8.GetBytes() yields the same
result on both strings. If that's the case and the strings still don't
compare equal then you might've found a genuine bug in the framework, but
that's very unlikely.
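A minimal sketch of that two-step check, for pasting into a scratch project rather than the immediate window. The class and method names (StringDiagnostics, Describe) are made up for illustration, and it assumes neither string is null.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the framework.
static class StringDiagnostics
{
    // Reports how two non-null strings differ: first by length,
    // then by their UTF-8 byte sequences.
    public static string Describe(string a, string b)
    {
        if (a.Length != b.Length)
            return "lengths differ: " + a.Length + " vs " + b.Length;

        byte[] bytesA = Encoding.UTF8.GetBytes(a);
        byte[] bytesB = Encoding.UTF8.GetBytes(b);

        return bytesA.SequenceEqual(bytesB)
            ? "identical"
            : "same length, different bytes";
    }
}
```

For example, Describe("a b", "a\u00A0b") reports "same length, different bytes", because a no-break space looks like an ordinary space on screen but is a different character.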

> The bigger picture is this: I'm using Visual Studio Pro 2008. This is
> happening in an ASP.NET Web application. I instantiate the class and
> populate it with values from the underlying SQL Server 2005 database

If you're using CHAR fields (fixed length, whitespace padding,
encoding-specific) or VARCHAR fields (encoding-specific) as opposed to
NVARCHAR fields, this could be the cause of your strings not round-tripping.
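If padding is the suspect, a quick way to confirm it is to test whether the two values differ only in trailing whitespace. This is a sketch under that assumption; PaddingCheck and its method are hypothetical names, and the inputs are assumed to be non-null.

```csharp
using System;

// Hypothetical helper for diagnosing CHAR-style padding.
static class PaddingCheck
{
    // True when a and b are not ordinally equal, but become equal
    // once trailing whitespace is trimmed from both.
    public static bool DiffersOnlyByTrailingWhitespace(string a, string b)
    {
        return !string.Equals(a, b, StringComparison.Ordinal)
            && string.Equals(a.TrimEnd(), b.TrimEnd(), StringComparison.Ordinal);
    }
}
```

A value typed as "Other" in the UI and read back as "Other     " from a padded column would make this return true.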

--
J.

Michael Covington

Jun 20, 2009, 11:55:27 AM

Michael Covington wrote:
> Is there any way you could capture and post an example of two strings
> that compare different but do not appear different?
>
> Or unpack them into their component bytes and display them in hex? (Some
> other C# programmer who has the method at his fingertips will, I hope,
> chime in with an exact method of doing that.)

I'll chime in myself. Do this:

string s = "Hello, world!"; // just an example

foreach (char c in s) Console.WriteLine("{0:X4}", (int)c);

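That gives you 4 hex digits per char, which is enough for one UTF-16 code
unit. Characters outside the Basic Multilingual Plane show up as two code
units (a surrogate pair).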
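The same idea can be wrapped in a small method that returns the dump as one line, which makes two strings easier to compare side by side. HexDump and CodeUnits are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Text;

// Hypothetical helper built around the loop above.
static class HexDump
{
    // Formats every char (UTF-16 code unit) of s as four hex digits,
    // separated by single spaces.
    public static string CodeUnits(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

CodeUnits("a b") gives "0061 0020 0062", while CodeUnits("a\u00A0b") gives "0061 00A0 0062", so a no-break space stands out immediately.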

Michael Covington

Jun 20, 2009, 11:34:25 AM

Peter Duniho

Jun 20, 2009, 12:40:09 PM

On Fri, 19 Jun 2009 23:29:50 -0700, Smithers <A...@b.com> wrote:

> I am using the .Equals() method of the string class, as follows, to
> compare
> string properties of two different instances of a given class.
>
> if
> (!valuesFromUI.Other_Description.Equals(valuesAsRetrievedFromDB.Other_Description,
> StringComparison.OrdinalIgnoreCase))
> {
> return true;
> }
>
> What I am observing in testing is that the above .Equals() periodically
> finds that the string values are different when they do not in fact
> appear
> to be different in any way.

First, that's a "false negative", not a "false positive". Just because
you translate the negative into its opposite, that doesn't mean that the
string comparison itself is generating a "false positive". The first step
in getting a good answer to a question is asking a good question. :)

Second, I am confident that if the Equals() method returns "false", then
the strings are not the same according to the comparison criteria you've
provided to it. As Jeroen and Michael both point out, there are lots of
ways two strings could _look_ the same but would not actually be the same.

If you want to know specifically how they are different, you can retrieve
the string encoded as bytes using the Encoding.GetBytes() method. If you
use the UTF-16 encoding, you should wind up with exactly the same bytes
.NET is using; note that in any case, you need to use an encoding that can
represent your strings exactly, otherwise the act of encoding the string
as bytes may actually cause them to appear to be the same even though they
are not (which is one reason that UTF-16 is probably best for this
exercise).
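To illustrate why the choice of encoding matters, here is a small sketch that compares the encoded bytes of two strings under a given encoding. EncodingCheck and SameBytes are hypothetical names, and the inputs are assumed to be non-null.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for comparing encoded forms.
static class EncodingCheck
{
    // True when both strings encode to the same byte sequence
    // under the supplied encoding.
    public static bool SameBytes(string a, string b, Encoding encoding)
    {
        return encoding.GetBytes(a).SequenceEqual(encoding.GetBytes(b));
    }
}
```

With Encoding.ASCII, "caf\u00E9" and "caf\u00E8" produce the same bytes, because each accented letter is replaced by '?'. With Encoding.Unicode (UTF-16) the difference is preserved.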

Of course, all that will do is show you that yes, the strings are
different, which in fact should be no question anyway. As far as figuring
out why the strings are different even when you expect them not to be,
that depends a lot on where the strings come from. Without a
concise-but-complete code example that reliably demonstrates the problem,
there's no way to answer that question.

Pete

Jun 22, 2009, 1:05:44 PM

On Mon, 22 Jun 2009 07:37:36 -0700, Cor Ligthert[MVP]
<Notmyfi...@planet.nl> wrote:

>
> Hans,
>
> Most people here are clever enough that they know that it is possible to
> write, I take mostly the shortest notation when I want to show something.
>
> ("A".ToUpper() == "a".ToUpper())

That is also not a correct way to accomplish a case-insensitive
comparison. There are cultures in which case conversion doesn't produce
the results you would expect from English; the Turkish dotted and dotless
"i" is the usual example.

MSDN has an article that elaborates on the question somewhat:
http://msdn.microsoft.com/en-us/library/xk2wykcz(VS.100).aspx

The basic issue is that case-conversion can produce characters that might
compare equal as ordinals, but which aren't considered the same characters
in a particular culture.
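A rough sketch of the two approaches side by side. The names CaseCompare, EqualViaToUpper and EqualOrdinalIgnoreCase are invented for this example, and the Turkish behavior described below assumes the machine has Turkish culture data available.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers contrasting the two comparison styles.
static class CaseCompare
{
    // The pattern under discussion: upper-case both sides using a
    // culture's rules, then compare the resulting strings.
    public static bool EqualViaToUpper(string a, string b, CultureInfo culture)
    {
        return a.ToUpper(culture) == b.ToUpper(culture);
    }

    // Case-insensitive comparison that does not depend on the culture.
    public static bool EqualOrdinalIgnoreCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

Under new CultureInfo("tr-TR"), EqualViaToUpper("i", "I", ...) is expected to return false, because Turkish upper-cases "i" to the dotted capital U+0130. EqualOrdinalIgnoreCase("i", "I") returns true regardless of the current culture.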

Pete

Göran Andersson

Jun 23, 2009, 4:01:33 AM

Cor Ligthert[MVP] wrote:
> Hans,
>
> Most people here are clever enough that they know that it is possible to
> write, I take mostly the shortest notation when I want to show something.
>
> ("A".ToUpper() == "a".ToUpper())
>
> Cor
>

Why would you process the strings just to use one comparison, when there
is another comparison that does exactly what you want without first
processing the strings?

--
Göran Andersson
_____
http://www.guffa.com

Peter Duniho

Jun 23, 2009, 1:20:19 PM

On Tue, 23 Jun 2009 01:01:33 -0700, Göran Andersson <gu...@guffa.com>
wrote:

>
> Cor Ligthert[MVP] wrote:
>> Hans,
>> Most people here are clever enough that they know that it is possible
>> to write, I take mostly the shortest notation when I want to show
>> something.
>> ("A".ToUpper() == "a".ToUpper())
>> Cor
>>
>
> Why would you process the strings just to use one comparison, when there
> is another comparison that does exactly what you want without first
> processing the strings?

Well, if the comparison were valid, the argument in favor would be that it
makes the code simpler and more obvious.

Performance is almost never more important than maintenance and
readability.

The main thing wrong with his example isn't that it's less efficient, it's
that it also may produce the wrong answer. I like my code _correct_,
maintainable, and efficient. In that order. :)

Pete

Cor Ligthert[MVP]

Jun 24, 2009, 8:55:03 AM

Goran,

Most people are clever enough to know what is done behind the scene.

Strings to Upper doesn't process the string, it only takes a subset of the
ascii code, it does not work by instance with the Turkish uppercase i.

Cor

"Göran Andersson" <gu...@guffa.com> wrote in message
news:%23WtvSj9...@TK2MSFTNGP04.phx.gbl...

Jeroen Mostert

Jun 24, 2009, 1:23:52 PM

Cor Ligthert[MVP] wrote:
> Most people are clever enough to know what is done behind the scene.
>

Those people shouldn't propagate techniques they should know are bad.

> Strings to Upper doesn't process the string,

What Göran meant is that .ToUpper() creates a new string, which is then
compared. Using .Equals() with a StringComparison saves you a string
creation (or two) because it compares character-by-character (and it skips
the comparison altogether if the lengths differ).

In short: never use .ToUpper() (or .ToLower()) to compare strings in a
case-insensitive manner. It's strictly worse than using .Equals() with the
appropriate StringComparison flag.

Yes, even if you think it doesn't look as nice. And even (or actually
*especially*) for demonstration code.
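Applied to the change check from the original post, that recommendation might look like the sketch below. ChangeDetection and HasChanged are hypothetical names. The static string.Equals overload is used here as a design choice, not something the original code did: it accepts null on either side, whereas calling .Equals() on a null property throws.

```csharp
using System;

// Hypothetical helper for the "did the user change this field?" test.
static class ChangeDetection
{
    // True when the UI value differs from the stored value, ignoring case.
    // No temporary upper-cased strings are created, and nulls are allowed.
    public static bool HasChanged(string fromUi, string fromDb)
    {
        return !string.Equals(fromUi, fromDb, StringComparison.OrdinalIgnoreCase);
    }
}
```

HasChanged("Other", "OTHER") is false, HasChanged(null, null) is false, and HasChanged(null, "x") is true.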

--
J.

Merk

Jun 24, 2009, 3:53:55 PM

RE

> Most people are clever enough to know what is done behind the scene.

No - they are not!

"Most people" can't even, well, I better stop right there...

Göran Andersson

Jun 26, 2009, 6:57:30 PM

Cor Ligthert[MVP] wrote:
> Goran,
>
> Most people are clever enough to know what is done behind the scene.

What do you mean by that?

> Strings to Upper doesn't process the string,

Of course it does. A method that just returns the same string without
doing anything would be pretty useless, would it not?

> it only takes a subset of
> the ascii code, it does not work by instance with the Turkish uppercase i.

How is that relevant?

Ben Voigt [C++ MVP]

Jul 6, 2009, 9:23:19 PM

"Göran Andersson" <gu...@guffa.com> wrote in message

news:uNUy5Fr9...@TK2MSFTNGP03.phx.gbl...

There exist strings a, b, c, and d for which

a.ToUpper() == b.ToUpper() but string.Equals(a, b, IgnoreCase) is false

c.ToUpper() != d.ToUpper() but string.Equals(c, d, IgnoreCase) is true

>
> --
> Göran Andersson
> _____
> http://www.guffa.com
>
