In source files of some Tcl packages, I sometimes find the following
syntax for determining whether a string equals an empty string:
string equal x x$str
Personally, I have always obtained satisfactory results by simply doing:
string equal "" $str
Is there a specific reason to do one or the other?
Thanks for your comment,
Erik
--
leunissen@ nl | Merge the left part of these two lines into one,
e. hccnet. | respecting a character's position in a line.
The first would probably only be written by those familiar with shell
(sh) syntax that sometimes didn't behave so intuitively with empty
strings. For Tcl I would always use the latter.
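A quick check that the two forms from the question agree, along with the [string length] test that comes up later in the thread. This is a sketch; the proc names are made up for illustration. It tries empty, blank and numeric-looking strings:

```tcl
# Three ways to ask "is $s the empty string?" (hypothetical helper names)
proc empty_xprefix {s} { string equal x x$s }
proc empty_equal   {s} { string equal "" $s }
proc empty_length  {s} { expr {[string length $s] == 0} }

# Print one row per test string: x-prefix, string equal, string length
foreach s [list "" " " "0" "x" "abc"] {
    puts [format "%-6s %d %d %d" "'$s'" \
        [empty_xprefix $s] [empty_equal $s] [empty_length $s]]
}
```

All three columns match for every input. [string equal] never interprets its arguments as numbers, so the "x" prefix adds nothing.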
--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions
Recently, I've been using the test:
if { [string length $str] } {
# do something
}
To test the performance of the different methods:
# ----
set str "a string"
puts [time {string equal x x$str} 100000]
puts [time {string equal "" $str} 100000]
puts [time {string equal $str ""} 100000]
puts [time {string length $str} 100000]
proc time1 {str} {
return [string equal x x$str]
}
proc time2 {str} {
return [string equal "" $str]
}
proc time3 {str} {
return [string equal $str ""]
}
proc time4 {str} {
return [string length $str]
}
proc time5 {str} {
if { $str == "" } { return 1 } else { return 0 }
}
for {set i 1} {$i <= 5} {incr i} {
time$i $str ; # to byte-compile
puts "time$i: [time {time$i $str} 100000]"
}
# ----
gives the results:
3 microseconds per iteration
2 microseconds per iteration
2 microseconds per iteration
2 microseconds per iteration
time1: 14 microseconds per iteration
time2: 8 microseconds per iteration
time3: 8 microseconds per iteration
time4: 9 microseconds per iteration
time5: 9 microseconds per iteration
Not much in it, apart from the "ugly" [string equal x x$str]. It's so
ugly it reminds me of old DOS batch files. :-)
--
Mark G. Saye
markgsaye @ yahoo.com
> Erik Leunissen wrote:
> > string equal x x$str
> > string equal "" $str
> > Is there a specific reason to do one or the other?
>
> Recently, I've been using the test:
>
> if { [string length $str] } {
The program or programmer may pre-date [string equal].
My guess is that the test [string equal x x$str] originated
as "x" == "x$str". The x would prevent numeric comparisons.
(I do know of cases where null strings give a zero numeric
representation, but I can't recall under what circumstances.)
With older Tcl, before [string equal], and way before "eq",
the preferred (by me at least) tests would be
[string length $str] for [string equal $str ""] or $str eq ""
![string compare $a $b] for [string equal $a $b] or $a eq $b
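The older idioms and their newer equivalents can be checked side by side. This is a minimal sketch with arbitrary variable values. Note that [string length $str] is true for a *non-empty* string, so the emptiness test is its negation:

```tcl
set a "abc"
set b "abc"
set str ""

# Equality: old idiom vs [string equal] vs the 8.4 "eq" operator
puts [expr {![string compare $a $b]}]   ;# 1
puts [string equal $a $b]               ;# 1
puts [expr {$a eq $b}]                  ;# 1

# Emptiness: [string length] is nonzero when the string is NOT empty
puts [expr {![string length $str]}]     ;# 1
puts [string equal $str ""]             ;# 1
puts [expr {$str eq ""}]                ;# 1
```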
> To test the performance of the different methods:
I added
proc time6 {str} {
return [string compare $str ""]
}
proc time7 {str} {
return [string match "" $str]
}
and got the results
time1: 6 microseconds per iteration
time2: 5 microseconds per iteration
time3: 5 microseconds per iteration
time4: 5 microseconds per iteration
time5: 5 microseconds per iteration
time6: 5 microseconds per iteration
time7: 5 microseconds per iteration
Funny. Even time1 (x$str) isn't so bad compared with
> time1: 14 microseconds per iteration
> time2: 8 microseconds per iteration
> time3: 8 microseconds per iteration
> time4: 9 microseconds per iteration
> time5: 9 microseconds per iteration
Donald Arseneau as...@triumf.ca
> I added
>
> proc time6 {str} { return [string compare $str ""] }
> proc time7 {str} { return [string match "" $str] }
Note that you really have to be careful to understand timing data,
especially with Tcl 8.4, which I'm guessing you are using. For
example, the string match above will actually become string equal
in byte code, because the pattern has no special chars in it. Also,
any string compare or string equal check against a literal "" is special-cased.
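To see why a pattern with no glob metacharacters can be treated as [string equal], compare the two commands with and without one. This is a small sketch with arbitrary strings:

```tcl
# With no glob metacharacters (such as * or ?) the pattern can only match itself
puts [string match "" ""]        ;# 1
puts [string match "" "abc"]     ;# 0
puts [string equal "" "abc"]     ;# 0

# With a metacharacter the two commands diverge
puts [string match "a*" "abc"]   ;# 1
puts [string equal "a*" "abc"]   ;# 0
```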
In any case, string compare should be used sparingly when string
equal or string length is an alternative. string equal can do an
extra quick check on whether the strings are of equal size first
(since we know their size) before any char-by-char comparison must
be done.
For general benchmarks see http://wiki.tcl.tk/1611, and look at the
STR ones for string comparisons.
I often use { $v == "a-string" } in 'if' statements but have often
wondered if I should get in the habit of using { [string equal $v
"a-string"] }. Is there a performance difference between the two
expressions?
Tom K.
> I often use { $v == "a-string" } in 'if' statements but have often
> wondered if I should get in the habit of using { [string equal $v
> "a-string"] }. Is there a performance difference between the two
> expressions?
Yes, the 'string equal $v "a-string"' (or '$v eq "a-string"' in 8.4)
will be faster (we're talking maybe 1-2 usecs though). The reason
is that == is the multi-purpose equality operator. It tries to
convert its args to numbers first, and then failing that does the
string comparison. For that reason it is also important to use the
strict string equality operators if you are dealing with things that
could be mistaken for numbers.
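The following sketch illustrates the point about numeric-looking strings. `==` converts both operands to numbers when it can. `eq` (Tcl 8.4 and later) and [string equal] always compare the characters. The values are arbitrary:

```tcl
set v "1.0"

puts [expr {$v == "1"}]       ;# 1: both parse as numbers, 1.0 == 1
puts [expr {$v eq "1"}]       ;# 0: "1.0" and "1" are different strings
puts [string equal $v "1"]    ;# 0

puts [expr {"0x10" == "16"}]  ;# 1: hexadecimal 0x10 is 16
puts [expr {"0x10" eq "16"}]  ;# 0
```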
Not in the core yet, but something I am considering as an optimization is a
compile-time check of the args to see if one (or both) of the static
strings is already going to fail the number checks and then always
do string equality at runtime.
How old is the code? Just guessing here, but ...
It may have originally been written as:
if { "x" == "x$str" } { ... }
and later changed to [string equal x x$str] during
routine maintenance.
The construct:
if { "x$str1" == "x$str2" } { ... }
was a fairly common idiom before Tcl added [string equal]
and the "eq" operator; it ensures that Tcl compares $str1
and $str2 as strings even if they both happen to look like
numbers. (It's still a common idiom in Bourne shell scripts,
for much the same reason.)
[string equal] always treats its arguments as strings,
so prepending an "x" is unnecessary.
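The prefix works because a string like "x1.0" no longer looks like a number, so `==` falls back to a string comparison. With [string equal] the prefix changes nothing. This is a sketch with arbitrary values. Inside a braced expr, the prefixed operands need double quotes:

```tcl
set str1 "1.0"
set str2 "1"

puts [expr {$str1 == $str2}]        ;# 1: numeric comparison
puts [expr {"x$str1" == "x$str2"}]  ;# 0: prefix forces a string comparison
puts [string equal $str1 $str2]     ;# 0: already a string comparison
puts [string equal x$str1 x$str2]   ;# 0: prefix is redundant here
```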
--Joe English
This reminds me of when I first learned html and read the
tutorial section about logical versus physical tags.
The purpose of your code here is to test equality with the
empty string, not to check the length, right? Checking the
length serves that purpose only as a side effect, right?
So for code-maintenance purposes, shouldn't you apply the
logical-versus-physical distinction here too?
This is the argument I use during code reviews. But does it
really matter or do I just fool myself?
I am arguing this way for the sake of a newcomer who may not know
what the true purpose is and may mistake the finger pointing to
the moon for the moon. (In this case, they may focus on the length
of the string rather than its emptiness.)
But if I were to get all logical positivist about it, there is
no difference now, and will there ever be? If there never will be
then perhaps I am just getting too philosophical, it's a non-issue,
and I shouldn't enforce this style.
--
sheila
In this case I think the test for length is precisely right -- he's
wanting to check whether there are any bytes in the string. I try to
always compare against [string length] if I'm trying to determine if the
string is empty or not.
Jeffrey Hobbs wrote:
>
> Donald Arseneau wrote:
> > With older Tcl, before [string equal], and way before "eq",
> > the preferred (by me at least) tests would be
> >
> > [string length $str] for [string equal $str ""] or $str eq ""
> >
> > ![string compare $a $b] for [string equal $a $b] or $str eq ""
>
> > I added
> >
> > proc time6 {str} { return [string compare $str ""] }
> > proc time7 {str} { return [string match "" $str] }
>
> Note that you really have to be careful to understand timing data,
> especially with Tcl 8.4, which I'm guessing you are using. For
> example, the string match above will actually become string equal
> in byte code, because the pattern has no special chars in it. Also,
> any str comp/eq check against just "" has special checks.
Also, correct me if I'm wrong, but [string length $str] converts str
to Unicode, whereas [string equal $str ""] does not.
I'd like to have an [empty] command that works on any type without
modifying its type.
--
-eric
That is correct, as it will call Tcl_GetCharLength, whereas string
equal will check to see if it is of the unicode string type before
doing a unicode string compare, otherwise defaulting to a utf
string comparison.
What about [string bytelength]?
--
Glenn Jackman
NCF Sysadmin
gle...@ncf.ca
> What about [string bytelength]?
What about it? That just returns the length field of the string
object (number of utf-8 chars), so whatever you have will be
ensured to have a string rep.
Yes.
> (number of utf-8 chars),
No. It's the number of bytes (not chars) in the UTF-8 encoding of
the string.
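Don's distinction is easiest to see with a non-ASCII character. This sketch measures the UTF-8 byte count through [encoding convertto] rather than [string bytelength]. For a string like this one, both give the same figure, because U+00E9 takes two bytes in Tcl's internal encoding as well:

```tcl
set s "caf\u00e9"    ;# 4 characters, the last is U+00E9

# Characters, as counted by [string length]
puts [string length $s]                              ;# 4
# Bytes in the UTF-8 encoding (U+00E9 needs two bytes)
puts [string length [encoding convertto utf-8 $s]]   ;# 5
```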
--
| Don Porter Mathematical and Computational Sciences Division |
| donald...@nist.gov Information Technology Laboratory |
| http://math.nist.gov/~DPorter/ NIST |
|______________________________________________________________________|