Improving expr performance?

slks...@gmail.com

unread,

Jan 2, 2018, 2:13:24 PM1/2/18

to

I came across this page which compares the speed of some math functions across languages (perl, python, c, javascript etc):

http://flux242.blogspot.ca/2013/09/javascript-vs-perl-vs-python-vs-lua.html

Tried my hand at creating a corresponding tcl script. The performance is abysmal and I don't know if there is a more tclish way of writing this code to make it faster or if I have done something fundamentally incorrect.

time exp.tcl takes 65m33s !!
time exp.js takes 0.598s
time exp.bin (c) takes 1.324s
time exp.pl takes 35.879s

My code is as follows. Thanks in advance for any insight that may be forthcoming.

==========================================================

#!/usr/bin/env tclsh

set explim 100
set step 0.000001

proc myexp {val} {
global explim
set sum 0.0
set fact 1.0
set x 1.0

for {set i 1} {$i < $explim} {incr i} {
set fact [expr $fact * $i]
set x [expr $x * $val]
set sum [expr $sum + [expr $x / $fact]]
}
return [expr $sum + 1.0]
}

proc integrate {min max} {
global step
set sum 0.0
while {$min < $max} {
set sum [expr {$sum + [expr {[myexp $min] * $step }] }]
#puts $sum
set min [expr {$min + $step}]
#puts $min
}
return $sum
}

puts [myexp 1.0]
puts [integrate 0.0 1.0]

slks...@gmail.com

unread,

Jan 2, 2018, 2:27:37 PM1/2/18

to

> time exp.tcl takes 65m33s !!
> time exp.js takes 0.598s
> time exp.bin (c) takes 1.324s
> time exp.pl takes 35.879s
>
> My code is as follows. Thanks in advance for any insight that may be forthcoming.
>

My bad. Forgot the braces in the expr's in the first proc. Added those in and made a huge difference. Time went down to 52.5s. Goes to show how much of a difference the braces make in math operations. Still the slowest language of the bunch though :(

Donal K. Fellows

unread,

Jan 2, 2018, 3:58:41 PM1/2/18

to

On 02/01/2018 19:27, slks...@gmail.com wrote:
> Goes to show how much of a difference the braces make in math operations.

Yes. Without braces, the expression has to be recompiled each time
through, and there's really no way to be fast about that.

With the braces in correctly, it takes 35.6s on this machine. (I guess
my CPU is stepping faster than yours.) Changing to taking the explim and
step as optional arguments (a more Tcl-ish way of doing things here)
cuts the time down to 34.2s. Here's what that best version looks like
(the changes to [integrate] are comparatively trivial). NB: Tcl follows
normal PEMDAS rules so we don't need those nested [expr] calls either.

proc myexp {val {explim 100}} {

set sum 0.0
set fact 1.0
set x 1.0

for {set i 1} {$i < $explim} {incr i} {
set fact [expr {$fact * $i}]
set x [expr {$x * $val}]

set sum [expr {$sum + $x / $fact}]

}
return [expr {$sum + 1.0}]
}

The major problem with the performance of all this is the cost of boxing
and unboxing all those Tcl_Obj* values, and that's a known *massive*
problem that we're working on; it's the subject of the BIG FlightAware
prize challenge. Running the code through our experimental tclquadcode
compiler leads me to estimate that it would take around 0.38s (excluding
the cost of compilation, which would actually be small by comparison
with the original runtime here). I can't compare with your figures
directly, but under a uniform scaling assumption (NB: THIS IS UNSAFE!)
I'd expect execution time around 0.57s, which should be pretty similar
to the Javascript and C. The acceleration is just a little under 90×
(you probably shouldn't read more than one significant figure in that!)
and is in line with what we'd expect from heavy numeric code with
multiple procedures. (There's a lot of uncertainty in this due to the
fact that our difference in processors could hide all sorts of weird
things. Alas, asking you to just build with tclquadcode isn't yet a
viable option; we're still pre-alpha for a bunch of unrelated reasons.)

Also, I think your figures for C are well out. Did you not build it in
optimized form? It should be a little faster than the Javascript.

Donal.
--
Donal Fellows — Tcl user, Tcl maintainer, TIP editor.

Fred Blogd

unread,

Jan 2, 2018, 4:01:17 PM1/2/18

to

Hi SLKS

Try removing the secondary calls to expr, these aren't needed.

proc myexp {val} {
global explim
set sum 0.0
set fact 1.0
set x 1.0

for {set i 1} {$i < $explim} {incr i} {
set fact [expr {$fact * $i}]
set x [expr {$x * $val}]

set sum [expr {$sum + ( $x / $fact)}]

}
return [expr {$sum + 1.0}]
}

proc integrate {min max} {
global step
set sum 0.0
while {$min < $max} {

set sum [expr {$sum + ([myexp $min] * $step) }]

#puts $sum
set min [expr {$min + $step}]
#puts $min
}
return $sum
}

This makes a massive improvement on my MAC as a delta.

Dec

Brad Lanam

unread,

Jan 2, 2018, 4:43:24 PM1/2/18

to

On Tuesday, January 2, 2018 at 12:58:41 PM UTC-8, Donal K. Fellows wrote:
> ...

> With the braces in correctly, it takes 35.6s on this machine. (I guess
> my CPU is stepping faster than yours.) Changing to taking the explim and
> step as optional arguments (a more Tcl-ish way of doing things here)
> cuts the time down to 34.2s. Here's what that best version looks like

> ...

Interesting. Takes longer if I remove the globals. 51s vs. 50s. (8.6.8).

slks...@gmail.com

unread,

Jan 2, 2018, 5:57:46 PM1/2/18

to

On Tuesday, January 2, 2018 at 4:01:17 PM UTC-5, Fred Blogd wrote:
> Hi SLKS
>
> Try removing the secondary calls to expr, these aren't needed.
>

Thanks, I realized this after I posted and made the changes. Haven't written tcl code in a while. It didn't seem to make much of a difference performance wise. The largest gains were from using brackets for the expr statements. Fastest time was around 47s on my mac mini 2012 2.5gz.

slks...@gmail.com

unread,

Jan 2, 2018, 7:29:11 PM1/2/18

to

On Tuesday, January 2, 2018 at 3:58:41 PM UTC-5, Donal K. Fellows wrote:

> > Goes to show how much of a difference the braces make in math operations.
>
> Yes. Without braces, the expression has to be recompiled each time
> through, and there's really no way to be fast about that.
>

Thanks. I found some info about that in the wiki today.

. NB: Tcl follows
> normal PEMDAS rules so we don't need those nested [expr] calls either.
>

Brain cramp on my part. Haven't written any code in a while. Decided to take up javascript as a new years resolution to learn a new language. All those nested expr were not necessary yet they didn't seem to affect runtime too much.

>
> The major problem with the performance of all this is the cost of boxing
> and unboxing all those Tcl_Obj* values, and that's a known *massive*
> problem that we're working on; it's the subject of the BIG FlightAware
> prize challenge. Running the code through our experimental tclquadcode
> compiler leads me to estimate that it would take around 0.38s (excluding
> the cost of compilation, which would actually be small by comparison
> with the original runtime here). I can't compare with your figures
> directly, but under a uniform scaling assumption (NB: THIS IS UNSAFE!)
> I'd expect execution time around 0.57s, which should be pretty similar
> to the Javascript and C. The acceleration is just a little under 90×
> (you probably shouldn't read more than one significant figure in that!)
> and is in line with what we'd expect from heavy numeric code with
> multiple procedures. (There's a lot of uncertainty in this due to the
> fact that our difference in processors could hide all sorts of weird
> things. Alas, asking you to just build with tclquadcode isn't yet a
> viable option; we're still pre-alpha for a bunch of unrelated reasons.)
>

Thanks for taking the time with your response. It's good to know the tcl team is aware and taking action.

> Also, I think your figures for C are well out. Did you not build it in
> optimized form? It should be a little faster than the Javascript.

I was suprised by the C vs javascript times as well. Nothing was optimized. Just a simple gcc compilation "gcc -o exp.bin exp.c".

thanks again,
Stephen

stefan

unread,

Jan 3, 2018, 5:42:26 AM1/3/18

to

> It's good to know the tcl team is aware and taking action.

Kind of lame reply, I am aware, but in Tcl, if you were after C's performance, then you would implement your computation in C and expose it to Tcl (genuine extension, critcl, name it).

Besides: Note that in the original write-up, it is not entirely clear (to me and some commenters) which language implementations are being compared (CPython vs. PyPy etc.) That said, always run your own ... which you did :)

Stefan

Donal K. Fellows

unread,

Jan 3, 2018, 6:31:37 AM1/3/18

to

On 02/01/2018 22:57, slks...@gmail.com wrote:
> It didn't seem to make much of a difference performance wise.

The secondary [expr] calls won't, provided the expressions themselves
are braced[*], as the bytecode sequence will be nearly identical. I
still think it is good to remove them, just because they're unnecessary
and a bit messy in how they look. :-)

Donal.
[* As noted elsewhere in this thread, unbraced expressions are slow
because they need to be recompiled every time they are evaluated. ]

slks...@gmail.com

unread,

Jan 3, 2018, 7:08:12 PM1/3/18

to

On Wednesday, January 3, 2018 at 5:42:26 AM UTC-5, stefan wrote:
> > It's good to know the tcl team is aware and taking action.
>
> Kind of lame reply, I am aware, but in Tcl, if you were after C's performance, then you would implement your computation in C and expose it to Tcl (genuine extension, critcl, name it).
>

Yeah I agree. It was not a critical component of a software system, just a fun experiment. I wanted to see how a pure vanilla tcl interpretation of the script would stack up.

slks...@gmail.com

unread,

Jan 3, 2018, 7:17:46 PM1/3/18

to

On Wednesday, January 3, 2018 at 6:31:37 AM UTC-5, Donal K. Fellows wrote:

>
> The secondary [expr] calls won't, provided the expressions themselves
> are braced[*], as the bytecode sequence will be nearly identical. I
> still think it is good to remove them, just because they're unnecessary
> and a bit messy in how they look. :-)
>

Yes when first written, I was thinking along the lines of nested calls ala elisp. It definitely looks cleaner when the calls are combined.

Thanks for taking the time to respond
regards,
Stephen

Tom Poindexter

unread,

Jan 3, 2018, 11:49:51 PM1/3/18

to

In article <8374121d-41da-4188...@googlegroups.com>,

I too ran a slightly modified version with my TSP compiler, here are the
numbers I got:

$ time ../../jtcltsp.sh flux.tcl
2.71828182846
1.71828096932
989000 microseconds per iteration

real 0m3.083s
user 0m6.965s
sys 0m0.215s

This was with the Java backend (I didn't run the C backend, didn't have
the required Tcl 'parser' extension handy. The 989000 microseconds
(i.e., ~ 1.0 second) is the actual compute time, the real time of
~ 3.0 seconds includes the compile time.

TSP treats the variables as native ints and doubles, and produces a
near direct translation of the Tcl code into Java or C.

Here is my version. Biggest differences are in 'integrate', because
TSP only compiles expr commands that do not have nested command
evaluation; also added the type definitions for the procs and
variables that are not inferred; made 'explim' and 'step' as arguments
passed in from the main proc.

------------------------------------------------------------------------------
source ../../tsp.tcl

set explim 100
set step 0.000001

tsp::proc myexp {val explim} {
#tsp::procdef double -args double int
#tsp::int i

set sum 0.0
set fact 1.0
set x 1.0

for {set i 1} {$i < $explim} {incr i} {
set fact [expr {$fact * $i}]
set x [expr {$x * $val}]

set sum [expr {$sum + $x / $fact}]

}
return [expr {$sum + 1.0}]
}

tsp::proc integrate {min max step explim} {
#tsp::procdef double -args double double double int

set sum 0.0
while {$min < $max} {

set temp [myexp $min $explim]
set sum [expr {$sum + ($temp * $step) }]

set min [expr {$min + $step}]
}

return $sum
}

# uncomment for compiler logging
# tsp::printLog

proc main {} {
global explim step
puts [myexp 1.0 $explim]
puts [integrate 0.0 1.0 $step $explim]
}

puts [time main]

------------------------------------------------------------------------------

Tom

TSP: https://github.com/tpoindex/tsp/
http://wiki.tcl.tk/41633

--
Tom Poindexter
tpoi...@nyx.net

slks...@gmail.com

unread,

Jan 5, 2018, 4:46:10 PM1/5/18

to

On Wednesday, January 3, 2018 at 11:49:51 PM UTC-5, Tom Poindexter wrote:

>
>
> I too ran a slightly modified version with my TSP compiler, here are the
> numbers I got:
>
> $ time ../../jtcltsp.sh flux.tcl
> 2.71828182846
> 1.71828096932
> 989000 microseconds per iteration
>
> real 0m3.083s
> user 0m6.965s
> sys 0m0.215s
>
>
> This was with the Java backend (I didn't run the C backend, didn't have
> the required Tcl 'parser' extension handy. The 989000 microseconds
> (i.e., ~ 1.0 second) is the actual compute time, the real time of
> ~ 3.0 seconds includes the compile time.
>
> TSP treats the variables as native ints and doubles, and produces a
> near direct translation of the Tcl code into Java or C.

This is a interesting project that I'll look into. Didn't know it existed. Thanks Tom.

Andreas Kupries

unread,

Mar 28, 2018, 1:38:07 AM3/28/18

to

slks...@gmail.com writes:

>> normal PEMDAS rules so we don't need those nested [expr] calls either.
>>
>

> Brain cramp on my part. Haven't written any code in a while. Decided
> to take up javascript as a new years resolution to learn a new
> language. All those nested expr were not necessary yet they didn't
> seem to affect runtime too much.

With braces the expression can be compiled directly to math bytecodes.
At that point the nested expr's effectively go away on their own.

--
See you,
Andreas Kupries <akup...@shaw.ca>
<http://core.tcl.tk/akupries/>
Developer @ SUSE (MicroFocus Canada LLC)
<andreas...@suse.com>

EuroTcl 2018, Jul 7-8, Munich/DE, http://eurotcl.eu/
Tcl'2018, Oct 15-19, Houston, TX, USA. https://www.tcl.tk/community/tcl2018/
-------------------------------------------------------------------------------