What: generator 0.1
Where: http://www.cs.nott.ac.uk/~nem/tcl/
Man: http://www.cs.nott.ac.uk/~nem/tcl/generator.html
License: Tcl-style (see license.terms in distribution).
Impl: Pure Tcl 8.6
Generators are procedures that produce a series of values. Rather than
returning a list, a generator "yields" each value in turn and then
suspends until the next value is requested. This allows for easy
construction of on-demand producer/consumer designs, such as iterating
over the elements of a collection, handling rows of a database query
result, or decoupling the interface between components such as a parser
and a tokenizer.
The generator package provides a number of convenience routines for
creating and using generators, building on top of the coroutine
mechanism in Tcl 8.6. The core of the package consists of a single
ensemble command "generator" and three core sub-commands: define, yield,
and foreach. The "define" command is a replacement for [proc], but
defines a procedure that returns a fresh generator each time it is
called, "yield" is a replacement for [return], and "foreach" works
exactly like the built-in [foreach] but over generators rather than
lists. For example, to define a generator that yields each line in a
text file we can write:
generator define lines file {
set in [open $file]
# Ensure file is always closed when finished
generator finally close $in
while {[gets $in line]} { generator yield $line }
}
We can then use this in a loop as follows:
generator foreach line [lines /etc/passwd] {
puts [format "%4d| %s" [incr l] $line]
}
The loop works exactly like the built-in [foreach], fully supporting
break and continue (with resource cleanup), supporting iterating over
multiple generators and even extracting multiple results at once
("striding"):
generator foreach {a b} [lines /etc/passwd] \
{c d} [lines /etc/group] {
...
}
In addition to these basic facilities, the package also provides the
usual introspection and manipulation commands, as well as a host of
higher-level operations over generators such as functional maps, folds,
filters and zips. For more information and examples, see the manual page.
This package is very much an initial release and should be considered
somewhat experimental. I don't expect the interface of the core commands
(define, yield, foreach) to change, but other aspects may do so.
Comments, suggestions, bugs and general feedback all welcomed. I plan to
submit this to tcllib once I am happy with the API.
Cheers,
Neil
I think the general concept is helpful in Tcl, but it already exists
as far as I am concerned.
Looking at wikipedia's pages on generators, the useful language is:
"In Python, a generator can be thought of as an iterator that contains
a frozen stack frame. Whenever the iterator's next() method is called,
Python resumes the frozen frame, which executes normally until the
next yield statement is reached. The generator's frame is then frozen
again, and the yielded value is returned to the caller."
We have frozen stack frames in Tcl, they are called namespaces, I call
them named stack frames.
Of course not everything works in Tcl as in Python. For instance,
loops like foreach require a complete list of values, you can't
substitute a function which returns one value at a time. Here is a
basic generator with use:
def countfrom(n):
while True:
yield n
n += 1
for i in countfrom(10):
if i <= 20:
print(i)
else:
break
In tcl:
namespace eval ::countfrom {
variable i -1
variable step 1
}
proc ::countfrom::startFrom { {start 0} {incr 1}} {
variable step $incr
variable i [expr {$start - $step}]
}
proc ::countfrom::next { } {
variable i
variable step
return [incr i $step]
}
::countfrom::startFrom 10
while {1} {
if {[set i [::countfrom::next]] > 20} {
break
}
puts $i
}
I guess you could argue about the difference between named generators
and namespaces, but they are really just named stacks, with namespaces
slightly more general than generators.
You can also do the for loop like this:
for {set i [::countfrom::startFrom 10]} {$i <= 20 } {set i
[::countfrom::next]} {
puts $i
}
They're not quite the same thing, since coroutines are really
independent execution stacks (that's how they're implemented). One of
the main things about coroutines in Tcl is that you can stop them at
(almost[*]) any point - including in calls to other procedures - and
yield a value to the caller, while still allowing the code to resume
from where it had got to. Doing that with namespaces was... very
challenging before to say the least since you'd have also had to do
all the restructuring of the code into continuation form (that's done
behind the scenes in 8.6 for you).
We go one better than Python's generators too; [yield] isn't
restricted to the top level of the generator/coroutine, which makes
them a lot more powerful and easier to use. I don't know whether
Stackless Python eases that restriction, but that's still an
officially experimental version. (We have a reasonably good handle on
the performance of Tcl's coroutine implementation; it's been powering
the Wiki implementation for a while now...)
Donal.
[* Doesn't work in traces and a few other more obscure spots. Also not
inside [subst] right now, but we plan to have that fixed by 8.6.0. ]
In Tcl 8.6, you can do this:
proc countfrom n {
coroutine countfrom[incr ::countofcountfroms] {apply {n {
if {[yield [info coroutine]]} return
while {![yield [incr n]]} {}
}}} [incr n -1]
}
set c [countfrom 10]
while 1 {
if {[set i [$c false]] > 20} {
break
}
puts $i
}
$c true; # Clean up
OK, so that's not much different to your code. Except that you can
trivially create more copies of the counter, use them interleaved,
yield from inside procedures called inside or even from [uplevel]
calls inside procedures, etc. Doing the same with namespaces would be
just a bit more awkward. (Hah!)
About the main thing missing is automated garbage collection of
counters. That's (currently) a Tcl limitation.
> I guess you could argue about the difference between named generators
> and namespaces, but they are really just named stacks, with namespaces
> slightly more general than generators.
Namespaces are named stack frames. Coroutines are named stacks. (Tcl's
stacks are really trees because of [uplevel], but that's another
story.)
Donal.
Okay, so after looking at more of the generator API, the magic seems
to be in the more advanced code.
With simple examples (and I looked at some python examples too) the
thing I didn't understand is why the need to use a loop with a yield
in the middle. You have two loops, but I guess that is because they
are basically independent of each other.
Here is about as far as I could go with namespaces, which makes me
appreciate a the generator API:
# Note API definitions don't match new generators API:
namespace eval ::generator {
variable generators
}
proc ::generator::exists {generator} {
variable generators
return [info exists generators($generator)]
}
proc ::generator::init { generator {paramList {}} } {
variable generators
if {![exists $generator]} {
return ""
}
set params $generators($generator)
foreach paramSpec $params {
variable ${generator}::[lindex $paramSpec 0] [lindex $paramSpec 1]
}
foreach paramSpec $paramList {
variable ${generator}::[lindex $paramSpec 0] [lindex $paramSpec 1]
}
return $generator
}
# ParamList like proc args list, with defaults
proc ::generator::define {generator paramList body} {
variable generators
if {[exists $generator]} {
return $generator
}
set generators($generator) $paramList
namespace eval ${generator} { }
init $generator
set variableList [list]
foreach paramSpec $paramList {
lappend variableList [list variable [lindex $paramSpec 0]]
}
set variableHeader "\n[join $variableList "\n "]\n"
proc ${generator}::next { } $variableHeader$body
}
proc ::generator::next { generator } {
return [${generator}::next]
}
::generator::define myincr {{i 5}} {incr i}
puts "[::generator::next myincr]"
::generator::define listItem {{i 0} {list {a b c d e f}}} {
set next [lindex $list $i]
incr i
return $next
}
puts "[::generator::next listItem]"
puts "[::generator::next listItem]"
::generator::init listItem {{i 4}}
puts "[::generator::next listItem]"
::generator::define range {{n 0} {m 0}} {
if {$n <= $m} {
set current $n
incr n
return $current
} else {
return ""
}
}
::generator::init range {{n 4} {m 9}}
puts "[::generator::next range]"
puts "[::generator::next range]"
puts "[::generator::next range]"
::generator::define gridxy {{x 0} {y -1} {dim 9}} {
if {($x % $dim) == 0} {
incr y
}
set next [list $y [expr {$x % 9}]]
incr x
return $next
}
I didn't add a destroy feature, and I have no clue how some of the new
generator examples work:
proc + {a b} { expr {$a + $b} }
proc sum gen { generator reduce + 0 $gen }
puts [sum [range 1 10]] ;# => 55
How does this work? (I was expecting the last line should be [sum
{range 1 10}] ). It seems clear that a full generator API would not be
very efficient in pure Tcl, even if it was possible, due to the need
to manage the lifetime of generators and using multiple generators at
one time, but there is much to be appreciated here. I recommend
additional examples for the non-functional programmers.
Like I said, for non-functional, or non-python programmers, (or maybe
just me) one mystery is the need for "yield" and the internal loop. My
guess is that internally, yield is like a special form of "break" or
"continue", that kind of chops the loop in half and can then reenter
the second half without the initial setup. You might be able to do
that with namespace code, but it would be very inefficient and not as
reusable as generators appear to be.
Ah! Thanks, that's a very helpful description.
There's been some really neat things done with coroutines already,
like being able to write code that looks virtually like the old things
that got into trouble with nested [vwait]s, but which don't have the
problem at all. That's because it suspends the current coroutine
instead, only calling back into it once the event being waited for
occurs. That should make it far easier for users to write apps that
work in "container" environments (e.g. tclhttpd) in predictable and
sane ways.
Donal.
And with the generator package it becomes:
package require generator
generator define countfrom n {
while 1 { generator yield $n; incr n }
}
generator foreach i [countfrom 10] {
if {$i <= 20} {
puts $i
} else {
break
}
}
In other words, pretty much identical to the Python. As Donal says,
though, the Tcl coroutines (from which generators are built) allow
yielding from nested procedure calls. The [generator foreach] loop is
also much more powerful than Python's, supporting the full syntax of
Tcl's built-in [foreach] -- iterating over multiple generators at once,
and "striding" over multiple elements at a time. Python does allow that
with list comprehensions, I believe, but the result is then a list and
not a new generator.
> OK, so that's not much different to your code. Except that you can
> trivially create more copies of the counter, use them interleaved,
> yield from inside procedures called inside or even from [uplevel]
> calls inside procedures, etc. Doing the same with namespaces would be
> just a bit more awkward. (Hah!)
>
> About the main thing missing is automated garbage collection of
> counters. That's (currently) a Tcl limitation.
The generator package does its best to ensure that generators are
cleaned up automatically in most use-cases. It doesn't cover everything,
but for the majority of uses things should Just Work. A caveat is that
you have to be careful when sharing generators as some of the constructs
are a trifle eager in their collection strategies. To be safe,
generators should be used in a linear fashion -- exactly one consumer
for each producer. You can relax that if you know what you are doing.
>
>> I guess you could argue about the difference between named generators
>> and namespaces, but they are really just named stacks, with namespaces
>> slightly more general than generators.
>
> Namespaces are named stack frames. Coroutines are named stacks. (Tcl's
> stacks are really trees because of [uplevel], but that's another
> story.)
Good description. I might add that to the docs.
Cheers,
Neil
The [range] command creates a fresh coroutine which immediately yields,
and then returns it. This coroutine is the generator. An initial version
of the API did indeed use the [sum {range 1 10}] syntax, but I adjusted
it as the command syntax is easier to use in practice, especially when
nesting generators and using variables as arguments.
As to how it works, the definition of [generator reduce] might be
enlightening:
generator define reduce {f z xs} {
# Ensure the producer generator is cleaned up
generator finally generator destroy $xs
generator foreach x $xs {
set z [{*}$f $z $x]
}
return $z
}
This implements what is known as a "left-fold" operation. This is a very
general and useful function. See
http://www.cs.nott.ac.uk/~gmh/bib.html#fold for more information on its
generality (that paper refers to foldr -- the right-associative version
-- but the same applies to foldl/reduce). The reason for the "fold" name
is that we can think of it as like folding a binary operator in between
elements of a list. In the case of summing a list (or generator) of
numbers, consider this example:
foldl + 0 {1 2 3 4 5}
= ((((0 + 1) + 2) + 3) + 4) + 5
= 15
In tree form:
0 1
\ /
+ 2
\ /
+ 3
\ /
+
\ ...
Here you can see the "folding" effect better. See also http://foldl.com/
for an interactive animation of the effect (summing an infinite series
of 1s in that case). The foldr version is right-associative:
foldr + 0 {1 2 3 4 5}
= 1 + (2 + (3 + (4 + (5 + 0))))
= 15
The generator package provides both, but the foldl version is much more
efficient in Tcl as it can traverse the generator in-order, computing
the result as it goes. The foldr operation, in contrast, has to unravel
the entire generator into memory before it can begin the reduction. In
lazy languages like Haskell, foldr is much more efficient.
> It seems clear that a full generator API would not be
> very efficient in pure Tcl, even if it was possible, due to the need
> to manage the lifetime of generators and using multiple generators at
> one time, but there is much to be appreciated here. I recommend
> additional examples for the non-functional programmers.
I'll see about adding some more examples to a separate tutorial
document/wiki page. Note that the generator package is itself written in
pure Tcl, just requiring Tcl 8.6 for the coroutines. While the package
is reasonably efficient and allows some algorithms to be implemented in
a memory-efficient way, it is not exactly fast (good big-Oh, but fairly
large constant factors). The reason for this appears to be that the
overhead of a coroutine context-switch in Tcl currently is quite high.
E.g., [sum [range 1 1000000]] takes an awfully long time to compute
(20-30 seconds on my machine here). Profiling shows a few areas where I
can speed things up a little bit, but the biggest bottleneck is the
"takeList" operation (which is a primitive on which the rest is built),
and in that by far the longest operation is the "set item [$generator]"
which resumes the coroutine.
>
> Like I said, for non-functional, or non-python programmers, (or maybe
> just me) one mystery is the need for "yield" and the internal loop. My
> guess is that internally, yield is like a special form of "break" or
> "continue", that kind of chops the loop in half and can then reenter
> the second half without the initial setup. You might be able to do
> that with namespace code, but it would be very inefficient and not as
> reusable as generators appear to be.
You could do it by parsing the code in the body and breaking it up into
separate procs. I believe there is some code on the wiki that does
approximately this. It's very hard to do that in anything like as good a
way as the coroutine mechanism does.
-- Neil
YM: namespace import ::tcl::mathop::+
> proc sum gen { generator reduce + 0 $gen }
> puts [sum [range 1 10]] ;# => 55
DGP
Well, that's a starting point for describing them, and
might help when thinking about variables, but stack
frames don't get their own commands, while namespaces do.
DGP
This is essentially how Java iterators work. Generators don't add any
fundamental new feature here, as you could write things this way.
However, what they do add is much more convenience in defining and using
such iterators. For example, consider the following generator that loops
over the lines in a file:
generator define lines file {
set in [open $file]
generator finally close $in
while {[gets $in line] >= 0} {
generator yield $line
}
}
generator foreach line [lines /etc/passwd] {
puts [format "%4d| %s" [incr lineNum] $line]
}
Note that in all cases, including use of [break] or [continue] or an
error being thrown in the loop, both the generator and the underlying
channel will always be cleaned up. The equivalent as an iterator
namespace would be something like the following:
namespace eval lines {
namespace export from
namespace ensemble create
proc from {name file} {
set in [open $file]
set map [dict create hasNext [list ::lines::hasNext $in] \
next [list ::lines::next $in]]
namespace ensemble create -command ::$name -map $map
trace add command $name delete [list ::lines::cleanup $in]
return $name
}
proc cleanup {chan args} { close $chan }
proc hasNext chan { expr {![eof $chan]} }
proc next chan { gets $chan }
}
lines from it /etc/passwd
try {
for {set line [it next]} {[it hasNext]} {set line [it next]} {
puts [format "%4d| %s" [incr lineNum] $line]
}
} finally { rename it "" }
It's not that difficult to create, but it is more cumbersome. As you
move to more and more complicated iterations, the generator/coroutine
approach becomes ever more appealing. In particular, notice how little
code in the iterator approach is actually dealing with the iteration:
the rest is involved with creating commands and handling cleanup. Worse
still, some of this cleanup code has leaked into the loop where it is
used, leading to code duplication and a poor separation of concerns.
Previously, you could solve this with a custom control structure:
proc foreach-line {lineVar file script} {
upvar 1 $lineVar line
set in [open $file]
try {
while {[gets $in line] >= 0} {
try {
uplevel 1 $script
} on return {msg opts} - \
on error {msg opts} {
dict incr opts -level
return -options $opts $msg
}
}
} finally { close $in }
}
foreach-line line /etc/passwd {
puts [format "%4d| %s" [incr lineNum] $line]
}
Works well, but the definition of the control structure is still
littered with irrelevant error handling code. The generator package
solves both of these problems in a reuseable manner, leading to shorter,
clearer code on both sides of the fence: implementation and use. And
that's before you even start to consider some of the more advanced
features, like looping over multiple generators or extracting multiple
results per iteration.
-- Neil
That's true. I suppose it's better to say that namespaces are
subclasses of stack frames, or would be if they were described in an
OO sense. The subclassing adds naming, commands, and a bunch of other
things too. An even more accurate description would use a delegate
class that is the frame and which forwards most operations to the
namespace, but now we're getting in deep. :-)
Similarly, what I said about coroutines being named stacks is
incomplete too. The key missing part there relates how the coros are
connected together, which increases the complexity quite a lot...
Donal.
Yes, unfortunately. Stack-frame local commands would be great, although
impossible for lambdas. Per-frame interp aliases would be possible, in a
similar manner to an ensemble -map (http://wiki.tcl.tk/20829).
-- Neil