Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

expr ... eq ... versus string equal

694 views
Skip to first unread message

Alan Grunwald

unread,
Sep 14, 2010, 2:23:42 PM9/14/10
to
I read with interest the learned discussion in the thread "Best way to
compare two lists".

All the proposed solutions compare two strings using [expr $str1 eq
$str2]. Why not [string equal $str1 $str2]? Is it just for better
readability, or is it more efficient?

I regularly attempt to use expr ... eq ... and equally regularly give
up. Please could someone tell me how to test whether the value of
variable a is "foo"? All of

[expr $a eq foo], [expr $a eq "foo"], [expr $a eq {foo}],
[expr "$a" eq foo], [expr "$a" eq "foo"], [expr "$a" eq {foo}],
[expr {$a} eq foo], [expr {$a} eq "foo"], [expr {$a} eq {foo}]

throw errors, even

set b foo; expr $a eq $b

throws an error. Surely I don't really need to write

set b foo; expr {$a} eq {$b}

do I? Sigh, all those braces!

Many thanks
--
Alan

jr4412

unread,
Sep 14, 2010, 2:45:58 PM9/14/10
to
On Sep 14, 7:23 pm, Alan Grunwald <alan-clt-pos...@nospam.demon.co.uk>
wrote:

expr {$a eq $b}
http://www.tcl.tk/man/tcl8.4/TclCmd/expr.htm

Aric Bills

unread,
Sep 14, 2010, 2:57:41 PM9/14/10
to
On Sep 14, 12:23 pm, Alan Grunwald <alan-clt-

I don't think there's a significant difference in performance between
[string equals $a $b] and [expr {$a eq $b}]. For me personally, I
usually use the [expr] version (implicitly) because most of the string
comparisons I do are within an [if] statement: if {$a eq $b} ...

As jr4412 demonstrates, you should surround your entire expression in
braces. Also, any bare strings in an expression should be in quotes
or their own braces, thusly:

[expr {$a eq "foo"}]
or
[expr {$a eq {foo}}]

Aric Bills

unread,
Sep 14, 2010, 3:02:40 PM9/14/10
to

By the way, "surround your entire expression in braces" applies to all
expressions, not just those that compare strings. The expression
engine is optimized for this format, so braced expressions evaluate
faster. Try this:

puts "unbraced: [time { expr 15 * 20 } 1000]"
puts "braced: [time { expr {15 * 20} } 1000]"

Robert Heller

unread,
Sep 14, 2010, 3:29:49 PM9/14/10
to

Did you try:

[expr {"$a" eq "foo"}]

You should brace your expressions. Expr behaves better in all cases and
is faster too.

>
> Many thanks

--
Robert Heller -- 978-544-6933
Deepwoods Software -- Download the Model Railroad System
http://www.deepsoft.com/ -- Binaries for Linux and MS-Windows
hel...@deepsoft.com -- http://www.deepsoft.com/ModelRailroadSystem/

Alexandre Ferrieux

unread,
Sep 14, 2010, 4:17:34 PM9/14/10
to
On Sep 14, 8:57 pm, Aric Bills <aric.bi...@gmail.com> wrote:
>
> I don't think there's a significant difference in performance between
> [string equal(s) $a $b] and [expr {$a eq $b}].

Indeed they compile to the same bytecode INST_STR_EQ, as you can
verify yourself with

::tcl::unsupported disassemble script {string equal $a $b}
::tcl::unsupported disassemble script {expr {$a eq $b}}

Rule of thumb: try in order:

1. Reading the manpages (here expr.n only hints at the identity of
[eq] and [string equal])

2. Disassembling

3. Reading the source

Of course you're welcome to add "Asking here" anywhere after 1 ;-)

-Alex

Alan Grunwald

unread,
Sep 14, 2010, 4:58:57 PM9/14/10
to

Many thanks to all who replied. I can now add expr {... eq ...} to my
armoury.
--
Alan

Aric Bills

unread,
Sep 14, 2010, 10:13:56 PM9/14/10
to
On Sep 14, 2:17 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

Wasn't aware of disassembling. Thanks for the tip!

Gerhard Reithofer

unread,
Sep 17, 2010, 12:12:10 PM9/17/10
to
On Tue, 14 Sep 2010, Alexandre Ferrieux wrote:
> On Sep 14, 8:57 pm, Aric Bills <aric.bi...@gmail.com> wrote:
> >
> > I don't think there's a significant difference in performance between
> > [string equal(s) $a $b] and [expr {$a eq $b}].

But what the hell is that?

set str1 [string repeat "*" 10000]
set str2 ${str1}b
append str1 a

% time {string equal $str1 $str2} 10000
1.8146 microseconds per iteration
% time {string compare $str1 $str2} 10000
13.0694 microseconds per iteration

Why factor 7???

> Indeed they compile to the same bytecode INST_STR_EQ, as you can
> verify yourself with
>
> ::tcl::unsupported disassemble script {string equal $a $b}
> ::tcl::unsupported disassemble script {expr {$a eq $b}}

% :tcl::unsupported disassemble script {string compare $str1 $str2}
invalid command name ":tcl::unsupported"

% inf pa
8.5.8

Must be 8.6?

--
Gerhard Reithofer
Tech-EDV Support Forum - http://support.tech-edv.co.at

Ian Gay

unread,
Sep 17, 2010, 1:35:55 PM9/17/10
to
Gerhard Reithofer wrote:

> :tcl::unsupported disassemble script {string compare $str1 $str2}

should be ::tcl::unsupported::disassemble

Works in 8.5 & 8.6, not 8.4.

Ian


--
*********** To reply by e-mail, make w single in address **************

Alexandre Ferrieux

unread,
Sep 17, 2010, 5:26:54 PM9/17/10
to
On Sep 17, 6:12 pm, Gerhard Reithofer <gerhard.reitho...@tech-

edv.co.at> wrote:
> On Tue, 14 Sep 2010, Alexandre Ferrieux wrote:
> > On Sep 14, 8:57 pm, Aric Bills <aric.bi...@gmail.com> wrote:
>
> > > I don't think there's a significant difference in performance between
> > > [string equal(s) $a $b] and [expr {$a eq $b}].
>
> But what the hell is that?
>
> set str1 [string repeat "*" 10000]
> set str2 ${str1}b
> append str1 a
>
> % time {string equal   $str1 $str2} 10000
> 1.8146 microseconds per iteration
> % time {string compare $str1 $str2} 10000
> 13.0694 microseconds per iteration
>
> Why factor 7???

Different code paths...
Looking at the code, we can also see that special-casing on the
internal representation, _and_ the absence of shimmering in those two
functions, make the timings very sensitive on values' history. Indeed
in your example, $str1 gets a String intrep, but $str2 stays a pure
string (ie only the strep, no intrep). This makes [string compare] do
the comparison on (quasi)utf-8 streps. Now if you ask

string range $str2 0 end

then you'll get a *much* faster [string compare]. On my machine it
gets thrice faster than [string equal]: nice revenge ! The reason is
the specific use of an unicode-based (not utf-8) comparator
TclUniCharNcmp when the two objects have the String type. Now why it
is even faster than [string equal]'s strncmp on the utf-8 is not
entirely clear to me. Maybe the 16-bit words are a bit easier to
handle for a 32-bit processor than 8-bit bytes ?

> % :tcl::unsupported disassemble script {string compare $str1 $str2}
> invalid command name ":tcl::unsupported"

Oops, sorry for the typo. Unconscious display of my dream to let space
become the namespace separator ;-)

-Alex

Gerhard Reithofer

unread,
Sep 17, 2010, 6:27:26 PM9/17/10
to
On Fri, 17 Sep 2010, Ian Gay wrote:
> Gerhard Reithofer wrote:
>
> > :tcl::unsupported disassemble script {string compare $str1 $str2}
>
> should be ::tcl::unsupported::disassemble
>
> Works in 8.5 & 8.6, not 8.4.

Sorry, thanks for pointing me...

Nevertheless:
::tcl::unsupported::disassemble script {string equal $str1 $str2}
...
(5) loadScalarStk
(6) streq
(7) done

::tcl::unsupported::disassemble script {string compare $str1 $str2}
...
(5) loadScalarStk
(6) strcmp
(7) done

So, why is strcmp so much slower than streq?

Furthermore it depends heavily on the string size:
string length $str1 $str2=10 10
string equal $str1 $str2: 0.3072 microseconds per iteration
string compare $str1 $str2: 0.3858 microseconds per iteration
string length $str1 $str2=100 100
string equal $str1 $str2: 0.3391 microseconds per iteration
string compare $str1 $str2: 0.7285 microseconds per iteration
string length $str1 $str2=1000 1000
string equal $str1 $str2: 0.4071 microseconds per iteration
string compare $str1 $str2: 4.17 microseconds per iteration
string length $str1 $str2=10000 10000
string equal $str1 $str2: 1.5443 microseconds per iteration
string compare $str1 $str2: 36.8114 microseconds per iteration

Donal K. Fellows

unread,
Sep 17, 2010, 7:31:27 PM9/17/10
to
On 17 Sep, 17:12, Gerhard Reithofer <gerhard.reitho...@tech-edv.co.at>
wrote:

> % time {string equal   $str1 $str2} 10000
> 1.8146 microseconds per iteration
> % time {string compare $str1 $str2} 10000
> 13.0694 microseconds per iteration
>
> Why factor 7???

Because they are of different lengths. With [string equal], it is
enough to just compare the lengths and make a decision based on that
as different length strings cannot be equal. With [string compare],
every character (up to the length of the shortest argument string)
must be looked at in order to decide on lexical ordering, even when
the strings are of a different length. Try comparing two equal
strings. :-)

(As for why the difference is 7, I couldn't say. I get less than a
factor of two with 10k strings, but that might be with a different
processor/version.)

Donal.

Donald Arseneau

unread,
Sep 17, 2010, 9:02:52 PM9/17/10
to
"Donal K. Fellows" <donal.k...@manchester.ac.uk> writes:

> Because they are of different lengths.

If you look at the original code, it was careful to make them
the same length.


--
Donald Arseneau as...@triumf.ca

tom.rmadilo

unread,
Sep 17, 2010, 9:15:54 PM9/17/10
to
On Sep 17, 4:31 pm, "Donal K. Fellows"

Right, [string compare] returns -1, 0 or 1. Zero means no differences
were found. I think [string compare] could return before reaching the
end of the shortest string, you are just looking for the first lexical
difference.

Alexandre Ferrieux

unread,
Sep 18, 2010, 8:52:03 AM9/18/10
to
On Sep 18, 1:31 am, "Donal K. Fellows"

<donal.k.fell...@manchester.ac.uk> wrote:
> On 17 Sep, 17:12, Gerhard Reithofer <gerhard.reitho...@tech-edv.co.at>
> wrote:
>
> > % time {string equal   $str1 $str2} 10000
> > 1.8146 microseconds per iteration
> > % time {string compare $str1 $str2} 10000
> > 13.0694 microseconds per iteration
>
> > Why factor 7???
>
> Because they are of different lengths. With [string equal], it is
> enough to just compare the lengths and make a decision based on that
> as different length strings cannot be equal. With [string compare],
> every character (up to the length of the shortest argument string)
> must be looked at in order to decide on lexical ordering, even when
> the strings are of a different length. Try comparing two equal
> strings. :-)

They are :/

>
> (As for why the difference is 7, I couldn't say. I get less than a
> factor of two with 10k strings, but that might be with a different
> processor/version.)

Donal, can you look at my post and offer insight ?

-Alex

Jeff Hobbs

unread,
Sep 18, 2010, 12:22:51 PM9/18/10
to

Donal merely misspoke, and it is the first lexical difference that is
calculated. This is of course all done with strcmp or related byte-
appropriate function.

Jeff

Alexandre Ferrieux

unread,
Sep 20, 2010, 5:45:07 PM9/20/10
to
On Sep 18, 2:52 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

OK, found it :)

Once both values have been converted to String representation (eg with
[string length]), we observe that [string compare] is roughly 3x
faster than [string equal]. The reason is that under these
circumstances, the former does a memcmp() while the latter does an
strcmp(). Indeed memcmp() is faster on modern architectures because it
accesses memory and does comparisons by whole machine words (typically
32 bits on a Pentium).

This gives an idea for a quick win: convert [string equal] to memcmp
too !

Thanks for unearthing that :)

-Alex

Alexandre Ferrieux

unread,
Sep 25, 2010, 11:47:46 AM9/25/10
to
On Sep 20, 11:45 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
>

> This gives an idea for a quick win: convert [string equal] to memcmp
> too !
> Thanks for unearthing that :)

Hear, hear: Jeff has just ironed out all these discrepancies. Now
8.6HEAD lets all variants of string comparisons use the same
convergent code, which then forks over special cases of intrep (String
and ByteArray) for speedups.

Thanks Jeff !

-Alex

0 new messages