What modifications can be done to improve perfomance (there are
millions of such
comparisons involved) if the format of the string is always alphabets
followed by
numerals and separated by a space, i.e. following kind of strings can
be present:
ABC
ABC (52)
DEF (152)
etc. ...
The best I came up with is as follows, any improvements are
appreciated.
thanks
Rohit
set timeVal [time {
set value1 "ABC (52)"
set value2 "ABC (452)"
set index1 [string first "(" $value1]
set index2 [string first "(" $value2]
if { -1 == $index1 && -1 == $index2 } {
set retVal [string compare $value1 $value2]
} else {
set splitList1 [split $value1]
set splitList2 [split $value2]
set retVal 0
set alphaVal1 [lindex $splitList1 0]
set alphaVal2 [lindex $splitList2 0]
# If alphaVal are different do a simple string compare
if { $alphaVal1 != $alphaVal2 } {
set retVal [string compare $value1 $value2]
}
# If numeralVals are different we need to compare them also
if { [lindex $splitList1 1] != [lindex $splitList2 1]} {
set numeralVal1 [regsub -all {[^0-9]} [lindex $splitList1 1]
""]
set numeralVal2 [regsub -all {[^0-9]} [lindex $splitList2 1]
""]
if { [expr {$numeralVal1} ] > [expr {$numeralVal2}] } {
set retVal 1
} elseif { [expr {$numeralVal1}] < [expr {$numeralVal2}] } {
set retVal -1
}
}
}
# puts "return is: $alphaVal1, $alphaVal2, $numeralVal1,
$numeralVal2, $retVal"
} 100000]
puts "time: $timeVal"
Please find whoever taught you that and tell them never to teach
Tcl again.
DGP
I am too dumb to compare strings.
But I can sort them:
set vals {
"ABC"
"ABC (334)"
"ABC (34)"
"ABC (4)"
"ABD (335)"
"ABD (35)"
"ABD (5)"
"ABCBD (335)"
"ABCBD (35)"
"ABCBD (5)"
}
# prepare
foreach val $vals {
set lst [ split $val () ]
lappend lst 0 0 0 0 0 0 0 0
set lst [ lrange $lst 0 4]
lappend list $lst
set ::arr($lst) $val
}
# sort
set sorted $list
set sorted [ lsort -index 1 -integer -increasing $sorted ]
set sorted [ lsort -index 0 $sorted ]
# retrieve the original string
foreach item $sorted {
lappend final $::arr($item)
}
# print
puts [ join $final \n ]
# end
anyway the way to go imho is to format strings
into a sortable ( as string or list ) presentation
and keep a reference to the original.
uwe
uwe
Why not use [lsort -dictionary] if the aim is sorting?
The [lsort -dictionary] command could also be the core of
a pairwise comparison function, but I'll leave it to other
to test whether the performance of such a thing is acceptable
and/or better than other alternatives.
DGP
In my opinion this shows that string compare is missing an option. Isn't it?
George
duh!
uwe
Here's an implementation
proc string_comp_dictionary {s1 s2} {
if {$s1 eq $s2} {
return 0
} else {
set l [list $s1 $s2]
if {$l == [lsort -dictionary $l]} {
return -1
} else {
return 1
}
}
}
Is that the best way to examine list equality? I also considered
examining them as strings:
{"$s1 $s2" eq [join [lsort -dictionary [list $s1 $s2]]]}
--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous
Georgios Petasis wrote:
> In my opinion this shows that string compare is missing an option. Isn't
> it?
I think a good case could be made that the comparison function embedded
in [lsort -dictionary] ought to be exposed to scripts as an independent
command.
I do not think it would be a good idea to do so by adding an option to
[string compare]. As a matter of taste, I don't much like options. If
a command doesn't do what you want, create a new one that does, rather
than add an option to transform an existing command into a different one.
What about "-nocase" you ask? I lost.
DGP
I have not one to blame DGP, self taught. But I know the code is
wrong,
written hurriedly, dont think there is need to do expr as we already
have pure numbers.
Thanks Glenn, I think this would be faster that what I have.
> proc string_comp_dictionary {s1 s2} {
> if {$s1 eq $s2} {
> return 0
> } else {
> set l [list $s1 $s2]
> if {$l == [lsort -dictionary $l]} {
> return -1
> } else {
> return 1
> }
> }
> }
You don't need to test list equality, fortunately, because
there is the -indices option:
proc string_comp_dictionary {s1 s2} {
if {$s1 eq $s2} {
return 0
} else {
set il [lsort -dictionary -indices [list $s1 $s2]]
return [expr {[lindex $il 0] ? -1 : 1}]
}
}
Indeed improvements are there, here are wost case timings over 100
thousand interations:
My code old code: 27 microseconds per iteration
Glenn's code: 6 microseconds per iteration
[if] and [while] do the "expression test" via [expr].
any additional embedded use of [expr] is unnecessary.
uwe
Nice. Thanks.
Thanks to all the wise tcl masters!
Donald A, I think "indices" option is available only in 8.5 right?
Most sorting routines only expect < 0 though.
set loop 100000
set timeVal [time {
set value1 "ABC (52)"
set value2 "ABC (452)"
set splitList1 [ split $value1 " ()"]
set splitList2 [ split $value2 " ()"]
set alphaVal1 [lindex $splitList1 0]
set alphaVal2 [lindex $splitList2 0]
set retVal [string compare $alphaVal1 $alphaVal2]
set numeralVal1 {}
set numeralVal2 {}
if { $retVal == 0 } {
set numeralVal1 [lindex $splitList1 2]
set numeralVal2 [lindex $splitList2 2]
# If numeralVals are different we need to compare them also
set retVal [ expr {$numeralVal1 - $numeralVal2}]
}
# puts "return is: $alphaVal1, $alphaVal2, $numeralVal1,
$numeralVal2, $retVal"
} $loop ]
puts "time: $timeVal"
Thanks Larry, yes it is better but I dont think it as fast as lsort
simply because I guess "split" takes too much time.
Here's what I use when I need this:
proc dictionary_compare {a b} {
lindex [lsort -dictionary -index 0 [list [list $a -1] [list $b 1]]] 0 1
}
[ And then I always grumble about the unholy mess of options
flags and feechurs that is the [lsort] command and wish
that Tcl just provided [dictionary_compare] as a standalone
primitive instead. But that's a different problem. ]
> Is that the best way to examine list equality? I also considered
> examining them as strings:
>
> {"$s1 $s2" eq [join [lsort -dictionary [list $s1 $s2]]]}
I wouldn't really use either one of those.
"List equality" isn't really well-defined -- you need to
first decide "lists of *what*?" Comparing two lists
of numbers is a different function than comparing two
lists of strings, or two lists of dictionaries, or
two lists of....
It usually comes down to [all [zipWith $predicate $l1 $l2]],
where $predicate is a comparison function appropriate
to the type of the elements of the list.
--Joe English
Ah yes, I have forgotten when it came in, but it isn't
in 8.4.13.
if {[lindex [lsort -dictionary [list $s1 $s2]] 0] eq $s1} {
return -1
} else {
return 1
}
Schelte