All the proposed solutions compare two strings using [expr $str1 eq
$str2]. Why not [string equal $str1 $str2]? Is it just for better
readability, or is it more efficient?
I regularly attempt to use expr ... eq ... and equally regularly give
up. Please could someone tell me how to test whether the value of
variable a is "foo"? All of
[expr $a eq foo], [expr $a eq "foo"], [expr $a eq {foo}],
[expr "$a" eq foo], [expr "$a" eq "foo"], [expr "$a" eq {foo}],
[expr {$a} eq foo], [expr {$a} eq "foo"], [expr {$a} eq {foo}]
throw errors, even
set b foo; expr $a eq $b
throws an error. Surely I don't really need to write
set b foo; expr {$a} eq {$b}
do I? Sigh, all those braces!
Many thanks
--
Alan
expr {$a eq $b}
http://www.tcl.tk/man/tcl8.4/TclCmd/expr.htm
I don't think there's a significant difference in performance between
[string equal $a $b] and [expr {$a eq $b}]. For me personally, I
usually use the [expr] version (implicitly) because most of the string
comparisons I do are within an [if] statement: if {$a eq $b} ...
As jr4412 demonstrates, you should surround your entire expression in
braces. Also, any bare strings in an expression should be in quotes
or their own braces, thusly:
[expr {$a eq "foo"}]
or
[expr {$a eq {foo}}]
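To see why the unbraced forms in the question fail, here is a minimal sketch (the variable names failed, msg and ok are only illustrative). Without braces, Tcl substitutes $a before [expr] ever sees it, so [expr] is handed the text "foo eq foo", and bare words are not valid operands:

```tcl
set a foo

# Unbraced: Tcl substitutes $a first, so expr parses the text "foo eq foo".
# Bare words are not valid operands, so expr raises an error.
set failed [catch {expr $a eq foo} msg]
puts "unbraced: failed=$failed"

# Braced: expr performs the substitution itself and treats the value
# of $a as a string operand.
set ok [expr {$a eq "foo"}]
puts "braced: $ok"
```

The first line should report failed=1 and the second should print 1; the exact wording of the error message in msg differs between Tcl versions.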
By the way, "surround your entire expression in braces" applies to all
expressions, not just those that compare strings. The expression
engine is optimized for this format, so braced expressions evaluate
faster. Try this:
puts "unbraced: [time { expr 15 * 20 } 1000]"
puts "braced: [time { expr {15 * 20} } 1000]"
Did you try:
[expr {"$a" eq "foo"}]
You should brace your expressions. Expr behaves better in all cases and
is faster too.
--
Robert Heller -- 978-544-6933
Deepwoods Software -- Download the Model Railroad System
http://www.deepsoft.com/ -- Binaries for Linux and MS-Windows
hel...@deepsoft.com -- http://www.deepsoft.com/ModelRailroadSystem/
Indeed they compile to the same bytecode INST_STR_EQ, as you can
verify yourself with
::tcl::unsupported disassemble script {string equal $a $b}
::tcl::unsupported disassemble script {expr {$a eq $b}}
Rule of thumb: try in order:
1. Reading the manpages (here expr.n only hints at the identity of
[eq] and [string equal])
2. Disassembling
3. Reading the source
Of course you're welcome to add "Asking here" anywhere after 1 ;-)
-Alex
Many thanks to all who replied. I can now add expr {... eq ...} to my
armoury.
--
Alan
Wasn't aware of disassembling. Thanks for the tip!
But what the hell is that?
set str1 [string repeat "*" 10000]
set str2 ${str1}b
append str1 a
% time {string equal $str1 $str2} 10000
1.8146 microseconds per iteration
% time {string compare $str1 $str2} 10000
13.0694 microseconds per iteration
Why a factor of 7?
> Indeed they compile to the same bytecode INST_STR_EQ, as you can
> verify yourself with
>
> ::tcl::unsupported disassemble script {string equal $a $b}
> ::tcl::unsupported disassemble script {expr {$a eq $b}}
% :tcl::unsupported disassemble script {string compare $str1 $str2}
invalid command name ":tcl::unsupported"
% inf pa
8.5.8
Must be 8.6?
--
Gerhard Reithofer
Tech-EDV Support Forum - http://support.tech-edv.co.at
> :tcl::unsupported disassemble script {string compare $str1 $str2}
should be ::tcl::unsupported::disassemble
Works in 8.5 & 8.6, not 8.4.
Ian
--
Different code paths...
Looking at the code, we can also see that special-casing on the
internal representation, _and_ the absence of shimmering in those two
functions, make the timings very sensitive to the values' history. Indeed,
in your example, $str1 gets a String intrep, but $str2 stays a pure
string (i.e. only the strep, no intrep). This makes [string compare] do
the comparison on (quasi)utf-8 streps. Now if you ask
string range $str2 0 end
then you'll get a *much* faster [string compare]. On my machine it
becomes three times faster than [string equal]: nice revenge! The reason
is the specific use of a Unicode-based (not UTF-8) comparator,
TclUniCharNcmp, when the two objects have the String type. Why it
is even faster than [string equal]'s strncmp on the UTF-8 is not
entirely clear to me. Maybe the 16-bit words are a bit easier to
handle for a 32-bit processor than 8-bit bytes?
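For anyone who wants to reproduce this, here is a rough sketch that reuses the setup from the earlier post and times [string compare] before and after touching $str2. It assumes an 8.5-era interpreter; the numbers, and whether there is any difference at all, will depend on the Tcl version and the machine:

```tcl
# Same setup as in the earlier post: two 10001-character strings
# that differ only in the last character.
set str1 [string repeat "*" 10000]
set str2 ${str1}b
append str1 a

puts "before: [time {string compare $str1 $str2} 10000]"

# Touch $str2 with a character-indexed operation. The result is discarded;
# the point is the side effect on the value's internal representation.
string range $str2 0 end

puts "after:  [time {string compare $str1 $str2} 10000]"
```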
> % :tcl::unsupported disassemble script {string compare $str1 $str2}
> invalid command name ":tcl::unsupported"
Oops, sorry for the typo. Unconscious display of my dream to let space
become the namespace separator ;-)
-Alex
Sorry, and thanks for pointing that out...
Nevertheless:
::tcl::unsupported::disassemble script {string equal $str1 $str2}
...
(5) loadScalarStk
(6) streq
(7) done
::tcl::unsupported::disassemble script {string compare $str1 $str2}
...
(5) loadScalarStk
(6) strcmp
(7) done
So, why is strcmp so much slower than streq?
Furthermore it depends heavily on the string size:
string length $str1 $str2=10 10
string equal $str1 $str2: 0.3072 microseconds per iteration
string compare $str1 $str2: 0.3858 microseconds per iteration
string length $str1 $str2=100 100
string equal $str1 $str2: 0.3391 microseconds per iteration
string compare $str1 $str2: 0.7285 microseconds per iteration
string length $str1 $str2=1000 1000
string equal $str1 $str2: 0.4071 microseconds per iteration
string compare $str1 $str2: 4.17 microseconds per iteration
string length $str1 $str2=10000 10000
string equal $str1 $str2: 1.5443 microseconds per iteration
string compare $str1 $str2: 36.8114 microseconds per iteration
Because they are of different lengths. With [string equal], it is
enough to just compare the lengths and make a decision based on that
as different length strings cannot be equal. With [string compare],
every character (up to the length of the shortest argument string)
must be looked at in order to decide on lexical ordering, even when
the strings are of a different length. Try comparing two equal
strings. :-)
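A small sketch for separating the two effects: it times [string equal] on strings of equal and of different lengths, and [string compare] on the equal-length pair. The variable names (base, same1, same2, longer) are only illustrative, and the timings will vary with Tcl version and hardware:

```tcl
set base   [string repeat "*" 10000]
set same1  ${base}a    ;# 10001 characters
set same2  ${base}b    ;# 10001 characters, differs in the last one
set longer ${base}bb   ;# 10002 characters

puts "equal,   same length:      [time {string equal $same1 $same2} 10000]"
puts "equal,   different length: [time {string equal $same1 $longer} 10000]"
puts "compare, same length:      [time {string compare $same1 $same2} 10000]"
```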
(As for why the difference is 7, I couldn't say. I get less than a
factor of two with 10k strings, but that might be with a different
processor/version.)
Donal.
> Because they are of different lengths.
If you look at the original code, it was careful to make them
the same length.
--
Donald Arseneau as...@triumf.ca
Right, [string compare] returns -1, 0 or 1. Zero means no differences
were found. I think [string compare] could return before reaching the
end of the shortest string, since you are only looking for the first
lexical difference.
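As a quick illustration of the three return values (the literals are arbitrary examples):

```tcl
puts [string compare abc abd]   ;# -1: "abc" sorts before "abd"
puts [string compare abc abc]   ;# 0: no difference found
puts [string compare abd abc]   ;# 1: "abd" sorts after "abc"
puts [string compare ab abc]    ;# -1: a proper prefix sorts first
```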
They are :/
>
> (As for why the difference is 7, I couldn't say. I get less than a
> factor of two with 10k strings, but that might be with a different
> processor/version.)
Donal, can you look at my post and offer insight ?
-Alex
Donal merely misspoke; it is the first lexical difference that is
calculated. This is of course all done with strcmp or a related
byte-appropriate function.
Jeff
OK, found it :)
Once both values have been converted to String representation (e.g. with
[string length]), we observe that [string compare] is roughly 3x
faster than [string equal]. The reason is that under these
circumstances, the former does a memcmp() while the latter does a
strcmp(). Indeed, memcmp() is faster on modern architectures because it
accesses memory and does comparisons by whole machine words (typically
32 bits on a Pentium).
This gives an idea for a quick win: convert [string equal] to memcmp
too!
Thanks for unearthing that :)
-Alex
Hear, hear: Jeff has just ironed out all these discrepancies. Now
8.6HEAD lets all variants of string comparisons use the same
convergent code, which then forks over special cases of intrep (String
and ByteArray) for speedups.
Thanks Jeff !
-Alex