Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Dictionary Compare for Alphanumeric Strings in TCL

276 views
Skip to first unread message

Rohit M

unread,
Apr 20, 2010, 11:33:01 AM4/20/10
to
Hi All,
I am looking for the fastest method to compare two alphanumeric
strings according
to dictionary algorithm and return 1, 0, or -1 just like "string
compare" does.
By dictionary algorithm I mean comparing "ABC (52)" with "ABC (152)"
should return
-1 as 52 is lesser than 152.

What modifications can be done to improve perfomance (there are
millions of such
comparisons involved) if the format of the string is always alphabets
followed by
numerals and separated by a space, i.e. following kind of strings can
be present:

ABC
ABC (52)
DEF (152)
etc. ...

The best I came up with is as follows, any improvements are
appreciated.
thanks
Rohit

set timeVal [time {
set value1 "ABC (52)"
set value2 "ABC (452)"

set index1 [string first "(" $value1]
set index2 [string first "(" $value2]
if { -1 == $index1 && -1 == $index2 } {
set retVal [string compare $value1 $value2]
} else {
set splitList1 [split $value1]
set splitList2 [split $value2]

set retVal 0

set alphaVal1 [lindex $splitList1 0]
set alphaVal2 [lindex $splitList2 0]

# If alphaVal are different do a simple string compare
if { $alphaVal1 != $alphaVal2 } {
set retVal [string compare $value1 $value2]
}

# If numeralVals are different we need to compare them also
if { [lindex $splitList1 1] != [lindex $splitList2 1]} {
set numeralVal1 [regsub -all {[^0-9]} [lindex $splitList1 1]
""]
set numeralVal2 [regsub -all {[^0-9]} [lindex $splitList2 1]
""]
if { [expr {$numeralVal1} ] > [expr {$numeralVal2}] } {
set retVal 1
} elseif { [expr {$numeralVal1}] < [expr {$numeralVal2}] } {
set retVal -1
}
}
}
# puts "return is: $alphaVal1, $alphaVal2, $numeralVal1,
$numeralVal2, $retVal"
} 100000]
puts "time: $timeVal"

Donald G Porter

unread,
Apr 20, 2010, 12:07:57 PM4/20/10
to
Rohit M wrote:
> if { [expr {$numeralVal1} ] > [expr {$numeralVal2}] } {

Please find whoever taught you that and tell them never to teach
Tcl again.

DGP

Uwe Klein

unread,
Apr 20, 2010, 12:14:49 PM4/20/10
to

I am too dumb to compare strings.
But I can sort them:

set vals {
"ABC"
"ABC (334)"
"ABC (34)"
"ABC (4)"
"ABD (335)"
"ABD (35)"
"ABD (5)"
"ABCBD (335)"
"ABCBD (35)"
"ABCBD (5)"
}

# prepare
foreach val $vals {
set lst [ split $val () ]
lappend lst 0 0 0 0 0 0 0 0
set lst [ lrange $lst 0 4]
lappend list $lst
set ::arr($lst) $val
}
# sort
set sorted $list
set sorted [ lsort -index 1 -integer -increasing $sorted ]
set sorted [ lsort -index 0 $sorted ]

# retrieve the original string
foreach item $sorted {
lappend final $::arr($item)
}
# print
puts [ join $final \n ]
# end
anyway the way to go imho is to format strings
into a sortable ( as string or list ) presentation
and keep a reference to the original.

uwe

uwe

Donald G Porter

unread,
Apr 20, 2010, 12:20:47 PM4/20/10
to
Uwe Klein wrote:
> I am too dumb to compare strings.
> But I can sort them:

Why not use [lsort -dictionary] if the aim is sorting?

The [lsort -dictionary] command could also be the core of
a pairwise comparison function, but I'll leave it to other
to test whether the performance of such a thing is acceptable
and/or better than other alternatives.

DGP

Georgios Petasis

unread,
Apr 20, 2010, 12:27:07 PM4/20/10
to

In my opinion this shows that string compare is missing an option. Isn't it?

George

Uwe Klein

unread,
Apr 20, 2010, 12:41:01 PM4/20/10
to
Donald G Porter wrote:
> Uwe Klein wrote:
>
>> I am too dumb to compare strings.
>> But I can sort them:
>
>
> Why not use [lsort -dictionary] if the aim is sorting?

duh!

uwe

Glenn Jackman

unread,
Apr 20, 2010, 3:17:44 PM4/20/10
to

Here's an implementation

proc string_comp_dictionary {s1 s2} {
if {$s1 eq $s2} {
return 0
} else {
set l [list $s1 $s2]
if {$l == [lsort -dictionary $l]} {
return -1
} else {
return 1
}
}
}

Is that the best way to examine list equality? I also considered
examining them as strings:

{"$s1 $s2" eq [join [lsort -dictionary [list $s1 $s2]]]}

--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous

Donald G Porter

unread,
Apr 20, 2010, 5:13:52 PM4/20/10
to

>> The [lsort -dictionary] command could also be the core of
>> a pairwise comparison function, but I'll leave it to other
>> to test whether the performance of such a thing is acceptable
>> and/or better than other alternatives.

Georgios Petasis wrote:
> In my opinion this shows that string compare is missing an option. Isn't
> it?

I think a good case could be made that the comparison function embedded
in [lsort -dictionary] ought to be exposed to scripts as an independent
command.

I do not think it would be a good idea to do so by adding an option to
[string compare]. As a matter of taste, I don't much like options. If
a command doesn't do what you want, create a new one that does, rather
than add an option to transform an existing command into a different one.

What about "-nocase" you ask? I lost.

DGP

Rohit M

unread,
Apr 21, 2010, 12:19:44 AM4/21/10
to

I have not one to blame DGP, self taught. But I know the code is
wrong,
written hurriedly, dont think there is need to do expr as we already
have pure numbers.

Rohit M

unread,
Apr 21, 2010, 12:20:11 AM4/21/10
to

Thanks Glenn, I think this would be faster that what I have.

Donald Arseneau

unread,
Apr 21, 2010, 2:22:40 AM4/21/10
to
On Apr 20, 12:17 pm, Glenn Jackman <gle...@ncf.ca> wrote:

>     proc string_comp_dictionary {s1 s2} {
>         if {$s1 eq $s2} {
>             return 0
>         } else {
>             set l [list $s1 $s2]
>             if {$l == [lsort -dictionary $l]} {
>                 return -1
>             } else {
>                 return 1
>             }
>         }
>     }

You don't need to test list equality, fortunately, because
there is the -indices option:

proc string_comp_dictionary {s1 s2} {
if {$s1 eq $s2} {
return 0
} else {

set il [lsort -dictionary -indices [list $s1 $s2]]
return [expr {[lindex $il 0] ? -1 : 1}]
}
}

Rohit M

unread,
Apr 21, 2010, 3:15:22 AM4/21/10
to

Rohit M

unread,
Apr 21, 2010, 3:15:29 AM4/21/10
to
On Apr 21, 9:20 am, Rohit M <rohit.marka...@gmail.com> wrote:

Indeed improvements are there, here are wost case timings over 100
thousand interations:
My code old code: 27 microseconds per iteration
Glenn's code: 6 microseconds per iteration

Uwe Klein

unread,
Apr 21, 2010, 3:05:24 AM4/21/10
to

[if] and [while] do the "expression test" via [expr].
any additional embedded use of [expr] is unnecessary.

uwe

Glenn Jackman

unread,
Apr 21, 2010, 8:38:52 AM4/21/10
to

Nice. Thanks.

Rohit M

unread,
Apr 21, 2010, 10:34:10 AM4/21/10
to

Thanks to all the wise tcl masters!
Donald A, I think "indices" option is available only in 8.5 right?

Larry W. Virden

unread,
Apr 21, 2010, 12:39:28 PM4/21/10
to
Here's a tweaked version of your original code. I tried to reduce the
code to achieve the philosophy of the original. However, the retVal is
< 0, 0, or > 0 rather than specifically -1, 0, 1 .

Most sorting routines only expect < 0 though.

set loop 100000


set timeVal [time {
set value1 "ABC (52)"
set value2 "ABC (452)"

set splitList1 [ split $value1 " ()"]
set splitList2 [ split $value2 " ()"]

set alphaVal1 [lindex $splitList1 0]
set alphaVal2 [lindex $splitList2 0]

set retVal [string compare $alphaVal1 $alphaVal2]
set numeralVal1 {}
set numeralVal2 {}

if { $retVal == 0 } {

set numeralVal1 [lindex $splitList1 2]
set numeralVal2 [lindex $splitList2 2]

# If numeralVals are different we need to compare them also

set retVal [ expr {$numeralVal1 - $numeralVal2}]


}
# puts "return is: $alphaVal1, $alphaVal2, $numeralVal1,
$numeralVal2, $retVal"

} $loop ]
puts "time: $timeVal"

Rohit M

unread,
Apr 21, 2010, 2:35:36 PM4/21/10
to

Thanks Larry, yes it is better but I dont think it as fast as lsort
simply because I guess "split" takes too much time.

Joe English

unread,
Apr 21, 2010, 7:56:43 PM4/21/10
to
Glenn Jackman wrote:

> Donald G Porter wrote:
>> The [lsort -dictionary] command could also be the core of
>> a pairwise comparison function, but I'll leave it to other
>> to test whether the performance of such a thing is acceptable
>> and/or better than other alternatives.
>
> Here's an implementation
>
> proc string_comp_dictionary {s1 s2} {
> if {$s1 eq $s2} {
> return 0
> } else {
> set l [list $s1 $s2]
> if {$l == [lsort -dictionary $l]} {
> return -1
> } else {
> return 1
> }
> }
> }


Here's what I use when I need this:

proc dictionary_compare {a b} {
lindex [lsort -dictionary -index 0 [list [list $a -1] [list $b 1]]] 0 1
}

[ And then I always grumble about the unholy mess of options
flags and feechurs that is the [lsort] command and wish
that Tcl just provided [dictionary_compare] as a standalone
primitive instead. But that's a different problem. ]


> Is that the best way to examine list equality? I also considered
> examining them as strings:
>
> {"$s1 $s2" eq [join [lsort -dictionary [list $s1 $s2]]]}

I wouldn't really use either one of those.

"List equality" isn't really well-defined -- you need to
first decide "lists of *what*?" Comparing two lists
of numbers is a different function than comparing two
lists of strings, or two lists of dictionaries, or
two lists of....

It usually comes down to [all [zipWith $predicate $l1 $l2]],
where $predicate is a comparison function appropriate
to the type of the elements of the list.


--Joe English

Donald Arseneau

unread,
Apr 22, 2010, 8:58:28 AM4/22/10
to
On Apr 21, 7:34 am, Rohit M <rohit.marka...@gmail.com> wrote:
> Donald A, I think "indices" option is available only in 8.5 right?

Ah yes, I have forgotten when it came in, but it isn't
in 8.4.13.

Schelte Bron

unread,
Apr 22, 2010, 10:55:37 AM4/22/10
to
Glenn Jackman wrote:
> Is that the best way to examine list equality? I also considered
> examining them as strings:
>
> {"$s1 $s2" eq [join [lsort -dictionary [list $s1 $s2]]]}
>
Fortunately you don't need to examine list equality at all. You are
working with a two-element list where you have already established
that the two elements are different. So you can just check the first
element against the original first element to know for sure whether
the elements have swapped places or not:

if {[lindex [lsort -dictionary [list $s1 $s2]] 0] eq $s1} {


return -1
} else {
return 1
}


Schelte

0 new messages