Playing with Rowland's prime generator

Alexandre Ferrieux

unread,

Oct 10, 2008, 6:36:41 PM10/10/08

to

Hello,
Since Tcl now has arbitrary-precision integers, I wondered how
convincing it could prove with simple number-theoretic algorithms.
One that has shown up recently is the awesome Rowland series

http://www.maa.org/mathtourist/mathtourist_8_8_08.html

which is worth a few lines of (any language, including Tcl):

# first define GCD
proc gcd {a b} {
while {1} {
if {$a==$b} {return $a}
if {$a<$b} {set t $a;set a $b;set b $t}
if {!$b} {return $a}
set a [expr {$a % $b}]
}
}
# Rowland's: compute a(n)-a(n-1) where a(1)=7 and
# a(n) = a(n – 1) + gcd(n, a(n – 1))
set a 7
set n 1
while {1} {
incr n
set b $a
set a [expr {$b+[gcd $n $b]}]
set p [expr {$a-$b}]
if {$p>1} {puts "$p\t\t$n"}
}

Nothing to add. Just beautiful.

-Alex

suchenwi

unread,

Oct 15, 2008, 8:57:54 AM10/15/08

to

On 11 Okt., 00:36, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> # first define GCD
> proc gcd {a b} {
> while {1} {
> if {$a==$b} {return $a}
> if {$a<$b} {set t $a;set a $b;set b $t}
> if {!$b} {return $a}
> set a [expr {$a % $b}]
> }
> }

...

> Nothing to add. Just beautiful.

As an alternative for gcd, recursive but more beautiful:

proc tcl::mathfunc::gcd {a b} {expr {$b == 0? $a: gcd($b,$a % $b)}}

Alexandre Ferrieux

unread,

Oct 15, 2008, 4:15:49 PM10/15/08

to

On Oct 15, 2:57 pm, suchenwi <richard.suchenwirth-

Yes. This by the way reminds me I could remove the $a==$b line in my
iterative version ;-)
As for the tail recursion, please notice that even with [tailcall] the
speed will stay a bit behind the iterative style, because of the
overhead of resetting locals. Changing this will involve an aggressive
bytecode optimization step...

-Alex

leszek.ho...@gmail.com

unread,

Oct 18, 2008, 3:43:32 PM10/18/08

to

On Oct 11, 12:36 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
...

> Nothing to add. Just beautiful.

Something to subtract? I'd change the body of the main loop to

set p [gcd [incr n] $a]
incr a $p

if {$p>1} {puts "$p\t\t$n"}

Shorter and faster (saving two [expr] in expense of one [incr]).

--Leszek

Donal K. Fellows

unread,

Oct 18, 2008, 4:30:53 PM10/18/08

to

Richard Suchenwirth wrote:
> As an alternative for gcd, recursive but more beautiful:
>
> proc tcl::mathfunc::gcd {a b} {expr {$b == 0? $a: gcd($b,$a % $b)}}

This is faster, and elegant in a different way:

proc tcl::mathfunc::gcd {a b} {

while {$b} {set b [expr {$a % [set a $b]}]}
return $a
}

(The elegance? You try getting it to generate fewer bytecodes while
still being iterative.)

Donal.

Alexandre Ferrieux

unread,

Oct 18, 2008, 5:06:27 PM10/18/08

to

On Oct 18, 10:30 pm, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:

Wow !
I stand delighted.

-Alex

Alexandre Ferrieux

unread,

Oct 18, 2008, 5:13:37 PM10/18/08

to

Thanks. Though I'm not sure the speed improvement will be noticeable
while we're GCDing big numbers ;-)

Moreover, please notice that Rowland's series is not a serious
contender in efficiency: $p<=$n. So even a naive implementation of
Eratosthenes' sieve should outperform it -- but not in so few lines of
code: its merit is simplicity (which is not a minor achievement if you
look at the rest of the field).

-Alex

leszek.ho...@gmail.com

unread,

Oct 18, 2008, 7:19:03 PM10/18/08

to

On Oct 18, 10:30 pm, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:

> This is faster, and elegant in a different way:
>
> proc tcl::mathfunc::gcd {a b} {
> while {$b} {set b [expr {$a % [set a $b]}]}
> return $a
> }
>
> (The elegance? You try getting it to generate fewer bytecodes while
> still being iterative.)

I wonder if the following (and ugly) version saves an extra bytecode?

proc gcd {a b} {
# assume $b!=0 (which is the case for Rowland's sequence)
while {[set b [expr {$a % [set a $b]}]]} {}
return $a
}

--Leszek

Erik Leunissen

unread,

Oct 19, 2008, 10:44:47 AM10/19/08

to

Donal K. Fellows wrote:
>
> This is faster, and elegant in a different way:
>
> proc tcl::mathfunc::gcd {a b} {
> while {$b} {set b [expr {$a % [set a $b]}]}
> return $a
> }
>
> (The elegance? You try getting it to generate fewer bytecodes while
> still being iterative.)
>

Lately, several threads go into execution efficiency of Tcl code. I
would very much like to get an appreciation of the efficiency of various
examples like the one above. However, I lack the insight of how Tcl code
maps to byte codes (given a certain algorithm or command sequence, like
the ones above).

If that knowledge has been described somewhere, I'd very much appreciate
a pointer. Otherwise, would that knowledge be sufficiently concise that
its rules can be described/listed here?

Thanks for your response,

Erik
====

> Donal.

--
leunissen@ nl | Merge the left part of these two lines into one,
e. hccnet. | respecting a character's position in a line.

Alexandre Ferrieux

unread,

Oct 19, 2008, 11:59:39 AM10/19/08

to

On Oct 19, 4:44 pm, Erik Leunissen <l...@the.footer.invalid> wrote:
>
> Lately, several threads go into execution efficiency of Tcl code. I
> would very much like to get an appreciation of the efficiency of various
> examples like the one above. However, I lack the insight of how Tcl code
> maps to byte codes (given a certain algorithm or command sequence, like
> the ones above).

Due to pressure from the crowd, Donal gave the following to
earthlings:

::tcl::unsupported::disassemble

I like to use 8.5's [namespace path] and just type 'dis' in an
interactive interpreter instead ;-)

> If that knowledge has been described somewhere, I'd very much appreciate
> a pointer. Otherwise, would that knowledge be sufficiently concise that
> its rules can be described/listed here?

First, you'll learn a great deal just by observing the output of the
above. You'll quickly guess about compiled locals and literals, and
also you'll very soon know the difference between something that's
_inlined_ (like a 'for' or an 'if') and the rest of vanilla procs,
which just get invoked in a generic way ("invokeStk*" opcodes being
the rough equivalent of a 'call' in assembler).

One (good?) reason for not spending too much energy in documenting all
this is to keep the freedom to change all this on short notice, which
is still possible in the absence of too much serialized bytecode (tbc
files are an ActiveState specialty).

Anyway, with the disassembler above it is rather straightforward to
analyse the bytecode impact of idioms (see the recent thread on the
speed of user-provided iterators). Actually changing the behavior
involves writing or modifying a compileProc, which does the context-
specific bytecode generation. This is obviously not easy to do without
an in-depth knowledge of existing opcodes...

-Alex

Donal K. Fellows

unread,

Oct 19, 2008, 12:57:46 PM10/19/08

to

leszek.ho...@gmail.com wrote:
> I wonder if the following (and ugly) version saves an extra bytecode?
>
> proc gcd {a b} {
> # assume $b!=0 (which is the case for Rowland's sequence)
> while {[set b [expr {$a % [set a $b]}]]} {}
> return $a
> }

Alas, no. I tried that and instead it does inelegant stuff with the
empty loop body which lead to no saving at all. Potential for more
optimizations there I suppose (though I'm not too worried about working
on them...)

Good attempt though. :-)

Donal.

Donal K. Fellows

unread,

Oct 19, 2008, 1:29:17 PM10/19/08

to

Alexandre Ferrieux wrote:
> This is obviously not easy to do without
> an in-depth knowledge of existing opcodes...

As someone who has written a few of the compileProcs, it's not easy to
do. Period. Indeed, it's probably far *too* difficult right now...

Donal.

Erik Leunissen

unread,

Oct 19, 2008, 3:06:21 PM10/19/08

to

Alexandre Ferrieux wrote:
>
> Due to pressure from the crowd, Donal gave the following to
> earthlings:
>
> ::tcl::unsupported::disassemble

> ...

Thanks for pointing the way,

Erik

Donal K. Fellows

unread,

Oct 20, 2008, 5:07:51 AM10/20/08

to

Erik Leunissen wrote:

> Alexandre Ferrieux wrote:
> > ::tcl::unsupported::disassemble
> > ...
>
> Thanks for pointing the way,
>

Note that the disassembler is *really* unsupported — we can change the
format of the output without warning, add bytecodes, or even remove
the command completely; it even doesn't really preserve the Tcl_Obj
rules properly — but is a neat debugging tool. It sure beats what we
were doing before (special debugging builds, etc.)

On the other hand, the generated bytecode isn't the whole story.
There's a whole level of runtime peephole optimizations in play (RTFS
to learn about them) and not all bytecodes are anything like the same
cost. To predict performance, you must also correctly understand the
Tcl_Obj reference counting and copying systems. There's a lot of
detail that matters...

It's usually easier to use [time] on real code. :-)

Donal.

Erik Leunissen

unread,

Oct 20, 2008, 8:10:02 AM10/20/08

to

Donal K. Fellows wrote:
> ...

> On the other hand, the generated bytecode isn't the whole story.
> There's a whole level of runtime peephole optimizations in play (RTFS
> to learn about them) and not all bytecodes are anything like the same
> cost. To predict performance, you must also correctly understand the
> Tcl_Obj reference counting and copying systems. There's a lot of
> detail that matters...
>

Thanks for completing the story. It makes me realize there is more to it
than I had hoped for.

> It's usually easier to use [time] on real code. :-)
>

Yeah, if only I could be satisfied with a performance measurement after
the fact ...

Greetings,

Erik.

> Donal.

Alexandre Ferrieux

unread,

Oct 20, 2008, 9:52:56 AM10/20/08

to

On Oct 20, 2:10 pm, Erik Leunissen <l...@the.footer.invalid> wrote:
>
> > It's usually easier to use [time] on real code. :-)
>
> Yeah, if only I could be satisfied with a performance measurement after
> the fact ...

Maybe you're being a bit too strict here ;-)
I agreed that in DSP circles, where each and every machine instruction
(or virtual machine instruction like "basic ops") has a perfectly
known timing, and where automatic static analysis tools can spit out
the exact MIPs consumption of an algorithm (provided it hasn't too
many conditional computations), this request makes sense.

But here in the "Tcl VM" we're operating a few kilometers above, with
syscalls, allocations and shimmering all under the hood of a single
"opcode".
So you won't have [tcl2mips] any time soon :-(
But I don't feel this as a strong limitation: "performace measurement
after the fact" + disassembling together are enormously more powerful
than either alone.
Indeed, you can quickly focus on "critical pairs" where two variants
differ by only a handful of opcodes. In this case the problem becomes
"human-size" again: RTFS on the specific opcode of interest (*not* on
the 90+ others), or just guess about unwanted shimmering or COW (these
are usual suspects)...

In short: if you're interested in that area, don't be shy, disassemble
and look: you'll be glad you did !

-Alex

Erik Leunissen

unread,

Oct 20, 2008, 12:29:16 PM10/20/08

to

Alexandre Ferrieux wrote:
>
> Maybe you're being a bit too strict here ;-)

> ...

> Indeed, you can quickly focus on "critical pairs" where two variants
> differ by only a handful of opcodes. In this case the problem becomes
> "human-size" again: RTFS on the specific opcode of interest (*not* on
> the 90+ others), or just guess about unwanted shimmering or COW (these
> are usual suspects)...
>

Very useful and specific directions; much appreciated!
Saves a lot of time otherwise spent in beating around the bush.

> In short: if you're interested in that area, don't be shy, disassemble
> and look: you'll be glad you did !
>

That shifted the balance from slight despair into hope.
Thanks for the incentive,

Erik

> -Alex