empty string comparison

Erik Leunissen

Apr 19, 2003, 1:21:53 PM

L.S.

In source files of some Tcl packages, I sometimes find the following
syntax for determining whether a string equals an empty string:

string equal x x$str

Personally, I always have obtained satisfactory results by doing just:

string equal "" $str

Is there a specific reason to do one or the other?

Thanks for your comment,

Erik
--
leunissen@ nl | Merge the left part of these two lines into one,
e. hccnet. | respecting a character's position in a line.

Jeffrey Hobbs

Apr 19, 2003, 2:03:34 PM

Erik Leunissen wrote:
> In source files of some Tcl packages, I sometimes find the following
> syntax for determining whether a string equals an empty string:
> string equal x x$str
> Personally, I always have obtained satisfactory results by doing just:
> string equal "" $str
> Is there a specific reason to do one or the other?

The first would probably only be written by those familiar with shell
(sh) syntax that sometimes didn't behave so intuitively with empty
strings. For Tcl I would always use the latter.

--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions

Mark G. Saye

Apr 19, 2003, 3:35:22 PM

Erik Leunissen wrote:
> L.S.
>
> In source files of some Tcl packages, I sometimes find the following
> syntax for determining whether a string equals an empty string:
>
> string equal x x$str
>
>
> Personally, I always have obtained satisfactory results by doing just:
>
> string equal "" $str
>
>
> Is there a specific reason to do one or the other?

Recently, I've been using the test:

if { [string length $str] } {
    # do something
}

To test the performance of the different methods:

# ----
set str "a string"

puts [time {string equal x x$str} 100000]
puts [time {string equal "" $str} 100000]
puts [time {string equal $str ""} 100000]
puts [time {string length $str} 100000]

proc time1 {str} {
    return [string equal x x$str]
}

proc time2 {str} {
    return [string equal "" $str]
}

proc time3 {str} {
    return [string equal $str ""]
}

proc time4 {str} {
    return [string length $str]
}

proc time5 {str} {
    if { $str == "" } { return 1 } { return 0 }
}

for {set i 1} {$i <= 5} {incr i} {
    time$i $str ; # to byte-compile
    puts "time$i: [time {time$i $str} 100000]"
}
# ----

gives the results:

3 microseconds per iteration
2 microseconds per iteration
2 microseconds per iteration
2 microseconds per iteration
time1: 14 microseconds per iteration
time2: 8 microseconds per iteration
time3: 8 microseconds per iteration
time4: 9 microseconds per iteration
time5: 9 microseconds per iteration

Not much in it, apart from the "ugly" [string equal x x$str]. It's so
ugly it reminds me of old DOS batch files. :-)

--
Mark G. Saye
markgsaye @ yahoo.com

Donald Arseneau

Apr 19, 2003, 6:32:56 PM

"Mark G. Saye" <mark...@yahoo.com> writes:

> Erik Leunissen wrote:
> > string equal x x$str

> > string equal "" $str
> > Is there a specific reason to do one or the other?
>
> Recently, I've been using the test:
>
> if { [string length $str] } {

The program or programmer may pre-date [string equal].
My guess is that the test [string equal x x$str] originated
as x == "x$string". The x would prevent numeric comparisons.
(I do know of cases where null strings give a zero numeric
representation, but I can't recall under what circumstances.)

With older Tcl, before [string equal], and way before "eq",
the preferred (by me at least) tests would be

![string length $str] for [string equal $str ""] or $str eq ""

![string compare $a $b] for [string equal $a $b] or $a eq $b

> To test the performance of the different methods:

I added

proc time6 {str} {
    return [string compare $str ""]
}
proc time7 {str} {
    return [string match "" $str]
}

and got the results

time1: 6 microseconds per iteration
time2: 5 microseconds per iteration
time3: 5 microseconds per iteration
time4: 5 microseconds per iteration
time5: 5 microseconds per iteration
time6: 5 microseconds per iteration
time7: 5 microseconds per iteration

Funny. Even time1 (x$str) isn't so bad compared with

> time1: 14 microseconds per iteration
> time2: 8 microseconds per iteration
> time3: 8 microseconds per iteration
> time4: 9 microseconds per iteration
> time5: 9 microseconds per iteration

Donald Arseneau as...@triumf.ca

Jeffrey Hobbs

Apr 19, 2003, 10:34:29 PM

Donald Arseneau wrote:
> With older Tcl, before [string equal], and way before "eq",
> the preferred (by me at least) tests would be
>
> ![string length $str] for [string equal $str ""] or $str eq ""
>
> ![string compare $a $b] for [string equal $a $b] or $a eq $b

> I added

>
> proc time6 {str} { return [string compare $str ""] }
> proc time7 {str} { return [string match "" $str] }

Note that you really have to be careful to understand timing data,
especially with Tcl 8.4, which I'm guessing you are using. For
example, the string match above will actually become string equal
in byte code, because the pattern has no special chars in it. Also,
any str comp/eq check against just "" has special checks.

In any case, string compare should be used sparingly when string
equal or string length is an alternative. string equal can do an
extra quick check on whether the strings are of equal size first
(since we know their size) before any char-by-char comparison must
be done.
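
As a small illustration of the difference in return values (a sketch only; the literal strings are made up for the example): [string compare] reports an ordering as -1, 0, or 1, so an equality test has to negate it, whereas [string equal] returns a boolean directly.

```tcl
# string compare returns an ordering: -1, 0, or 1.
puts [string compare "abc" "abd"]            ;# -1
puts [string compare "abc" "abc"]            ;# 0

# string equal returns a boolean: 1 if identical, 0 otherwise.
puts [string equal "abc" "abc"]              ;# 1
puts [string equal "abc" "abcd"]             ;# 0

# An equality test built on string compare needs the negation.
puts [expr {![string compare "abc" "abc"]}]  ;# 1
```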

For general benchmarks see http://wiki.tcl.tk/1611, and look at the
STR ones for string comparisons.

Tom Krehbiel

Apr 21, 2003, 11:17:44 AM

Jeffrey Hobbs <Je...@ActiveState.com> wrote in message news:<3EA2078E...@ActiveState.com>...

I often use { $v == "a-string" } in 'if' statements but have often
wondered if I should get in the habit of using { [string equal $v
"a-string"] }. Is there a performance difference between the two
expressions?

Tom K.

Jeffrey Hobbs

Apr 21, 2003, 12:05:31 PM

Tom Krehbiel wrote:
>>For general benchmarks see http://wiki.tcl.tk/1611, and look at the
>>STR ones for string comparisons.

> I often use { $v == "a-string" } in 'if' statements but have often
> wondered if I should get in the habit of using { [string equal $v
> "a-string"] }. Is there a performance difference between the two
> expressions?

Yes, the 'string equal $v "a-string"' (or '$v eq "a-string"' in 8.4)
will be faster (we're talking maybe 1-2 microseconds though). The reason
is that == is the multi-purpose equality operator. It tries to
convert its args to numbers first, and then failing that does the
string comparison. For that reason it is also important to use the
strict string equality operators if you are dealing with things that
could be mistaken for numbers.

Not in the core yet, but something I am considering as an optimization is a
compile-time check of the args to see if one (or both) of the static
strings is already going to fail the number checks and then always
do string equality at runtime.
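
A minimal sketch of why the distinction matters, assuming Tcl 8.4 or later for "eq" (the variable names a, b and str are only for illustration): two values that look like numbers compare equal under ==, but not under the strict string operators.

```tcl
# "1.0" and "1" are different strings but the same number.
set a "1.0"
set b "1"
puts [expr {$a == $b}]      ;# 1: both convert to numbers, compared numerically
puts [expr {$a eq $b}]      ;# 0: compared as strings
puts [string equal $a $b]   ;# 0: compared as strings

# The empty string cannot be converted to a number,
# so == falls back to a string comparison and agrees with eq.
set str ""
puts [expr {$str == ""}]    ;# 1
puts [expr {$str eq ""}]    ;# 1
```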

Joe English

Apr 21, 2003, 12:39:06 PM

Erik Leunissen wrote:
>
>In source files of some Tcl packages, I sometimes find the following
>syntax for determining whether a string equals an empty string:
>
> string equal x x$str
>
>Personally, I always have obtained satisfactory results by doing just:
>
> string equal "" $str
>
>Is there a specific reason to do one or the other?

How old is the code? Just guessing here, but ...

It may have originally been written as:

if { x == x$str } { ... }

and later changed to [string equal x x$str] during
routine maintenance.

The construct:

if { x$str1 == x$str2 } { ... }

was a fairly common idiom before Tcl added [string equal]
and the "eq" operator; it ensures that Tcl compares $str1
and $str2 as strings even if they both happen to look like
numbers. (It's still a common idiom in Bourne shell scripts,
for much the same reason.)

[string equal] always treats its arguments as strings,
so prepending an "x" is unnecessary.
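
To make the idiom concrete, here is a small sketch (the values in s1 and s2 are invented for the example). The operands are written in double quotes because a braced expression does not accept barewords such as x.

```tcl
set s1 "1e2"
set s2 "100"

# Plain == sees two numbers (100.0 and 100) and reports them equal.
puts [expr {$s1 == $s2}]          ;# 1

# With the "x" prefix neither operand is a number any more,
# so == has to fall back to a string comparison.
puts [expr {"x$s1" == "x$s2"}]    ;# 0

# string equal gives the same answer without the prefix.
puts [string equal $s1 $s2]       ;# 0
```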

--Joe English

jeng...@flightlab.com

sheila miguez herndon

Apr 21, 2003, 1:55:31 PM

Mark G. Saye wrote:
> Erik Leunissen wrote:
>> [on determining whether a string equals an empty string]

>
> Recently, I've been using the test:
>
> if { [string length $str] } {
> # do something
> }

This reminds me of when I first learned html and read the
tutorial section about logical versus physical tags.

The purpose of your code here is to determine equality with
the empty string, not to check the length, right? Checking the
length serves your purpose as a side effect, right?

So for code-maintenance purposes, shouldn't you use the analogy of
the logical versus physical approach?

This is the argument I use during code reviews. But does it
really matter or do I just fool myself?

I am arguing this way for the sake of a newcomer who may not know
what the true purpose is and may mistake the finger pointing to
the moon for the moon. (In this case, they may focus on the length
of the string rather than its emptiness.)

But if I were to get all logical positivist about it, there is
no difference now, and will there ever be? If there never will be
then perhaps I am just getting too philosophical, it's a non-issue,
and I shouldn't enforce this style.

--
sheila

Bryan Oakley

Apr 21, 2003, 2:50:39 PM

sheila miguez herndon wrote:
> Mark G. Saye wrote:
>
>> Erik Leunissen wrote:
>>
>>> [on determining whether a string equals an empty string]
>>
>>
>> Recently, I've been using the test:
>>
>> if { [string length $str] } {
>> # do something
>> }
>
>
> This reminds me of when I first learned html and read the
> tutorial section about logical versus physical tags.
>
> The purpose of your code here is to determine equality with
> the empty string, not to check the length, right? Checking the
> length serves your purpose as a side effect, right?

In this case I think the test for length is precisely right -- he's
wanting to check whether there are any bytes in the string. I try to
always compare against [string length] if I'm trying to determine if the
string is empty or not.
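
One way to settle the readability question, whichever test is preferred, is to hide it behind a named helper. The sketch below is only an illustration: the proc name isEmpty is hypothetical and does not come from any package mentioned in this thread.

```tcl
# Hypothetical helper: the name states the intent,
# the body can use whichever test the project prefers.
proc isEmpty {str} {
    return [string equal "" $str]
}

set str ""
if {[isEmpty $str]} {
    puts "str is empty"
}
```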

Eric Boudaillier

Apr 22, 2003, 3:43:35 AM

Jeffrey Hobbs wrote:
>
> Donald Arseneau wrote:
> > With older Tcl, before [string equal], and way before "eq",
> > the preferred (by me at least) tests would be
> >
> > ![string length $str] for [string equal $str ""] or $str eq ""
> >
> > ![string compare $a $b] for [string equal $a $b] or $a eq $b
>
> > I added
> >
> > proc time6 {str} { return [string compare $str ""] }
> > proc time7 {str} { return [string match "" $str] }
>
> Note that you really have to be careful to understand timing data,
> especially with Tcl 8.4, which I'm guessing you are using. For
> example, the string match above will actually become string equal
> in byte code, because the pattern has no special chars in it. Also,
> any str comp/eq check against just "" has special checks.

Also, correct me if I'm wrong, but [string length $str] converts str
to unicode, whereas [string equal $str ""] does not.
I'd like to have an [empty] command that works on any type without
modifying its type.

--
-eric

Jeffrey Hobbs

Apr 22, 2003, 11:48:31 AM

Eric Boudaillier wrote:
> Also, correct me if I'm wrong, but [string length $str] converts str
> to unicode, whereas [string equal $str ""] does not.
> I'd like to have an [empty] command that works on any type without
> modifying its type.

That is correct, as it will call Tcl_GetCharLength, whereas string
equal will check to see if it is of the unicode string type before
doing a unicode string compare, otherwise defaulting to a utf
string comparison.

Glenn Jackman

Apr 22, 2003, 12:13:50 PM

Jeffrey Hobbs <Je...@ActiveState.com> wrote:
> Eric Boudaillier wrote:
> > Also, correct me if I'm wrong, but [string length $str] converts str
> > to unicode, whereas [string equal $str ""] does not.
> > I'd like to have an [empty] command that works on any type without
> > modifying its type.
>
> That is correct, as it will call Tcl_GetCharLength, whereas string
> equal will check to see if it is of the unicode string type before
> doing a unicode string compare, otherwise defaulting to a utf
> string comparison.

What about [string bytelength]?

--
Glenn Jackman
NCF Sysadmin
gle...@ncf.ca

Jeffrey Hobbs

Apr 23, 2003, 11:40:47 AM

Glenn Jackman wrote:

> Jeffrey Hobbs wrote:
>>Eric Boudaillier wrote:
>>>Also, correct me if I'm wrong, but [string length $str] converts str
>>>to unicode, whereas [string equal $str ""] does not.
>>>I'd like to have an [empty] command that works on any type without
>>>modifying its type.
>>
>>That is correct, as it will call Tcl_GetCharLength, whereas string
>>equal will check to see if it is of the unicode string type before
>>doing a unicode string compare, otherwise defaulting to a utf
>>string comparison.

> What about [string bytelength]?

What about it? That just returns the length field of the string
object (number of utf-8 chars), so whatever you have will be
ensured to have a string rep.

Don Porter

Apr 23, 2003, 12:21:24 PM

Glenn Jackman wrote:
>> What about [string bytelength]?

Jeffrey Hobbs wrote:
> What about it? That just returns the length field of the string
> object

Yes.

> (number of utf-8 chars),

No. It's the number of bytes (not chars) in the UTF-8 encoding of
the string.
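
A short sketch of the character/byte distinction, using a made-up sample string. Since [string bytelength] is not available in every Tcl release, the byte count here is obtained by encoding the string explicitly with [encoding convertto], which is an assumption of this example rather than something proposed in the thread.

```tcl
# "café": four characters, but the é takes two bytes in UTF-8.
set s "caf\u00e9"
puts [string length $s]                              ;# 4 characters
puts [string length [encoding convertto utf-8 $s]]   ;# 5 bytes

# For the empty string both counts are zero, so either
# works as an emptiness test.
set empty ""
puts [string length $empty]                              ;# 0
puts [string length [encoding convertto utf-8 $empty]]   ;# 0
```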

--
| Don Porter Mathematical and Computational Sciences Division |
| donald...@nist.gov Information Technology Laboratory |
| http://math.nist.gov/~DPorter/ NIST |
|______________________________________________________________________|