Best iterator interface design in golang

9,163 views
Skip to first unread message

Nyan Htoo Tin

unread,
Apr 27, 2011, 3:05:21 PM4/27/11
to golang-nuts
Hi I've been thinking how could I implement general iterator interface
in go.
I've seen iterator using channel. They look cool! but after reading
posts regarding issue with using channel as iterator.(like break will
cause go routine to hang around...).
So far, I come up with this..

func (p mylist) Iter() (next func() interface{} ,end func() bool) {
cursor := p.Front()

next = func() interface{} {

if cursor != nil {
v := cursor.Value
cursor = cursor.Next()
return v
}
panic("iterator exhausted")
}
end = func() bool {
if cursor == nil {
return true
}
return false
}

return
}

// usage
for next,end:=l.Iter(); !end(); {
fmt.Println( next() )
}

I'd like to know what approach would be better??
How could I use interface here?

Warm Regards,
Nyan

Anschel

unread,
Apr 27, 2011, 9:22:45 PM4/27/11
to golang-nuts
My knee-jerk reaction to this question is to ask whether you really
need an iterator. In most cases, you can get simpler, more efficient
code by focusing on the specific problem; if you give mylist a First()
and Next() method or the equivalent, that will probably be good enough.

Remi Gillig

unread,
Apr 28, 2011, 3:26:07 AM4/28/11
to golang-nuts
And why not?

for cursor:=list.Front(); cursor != nil; cursor =
cursor.Next() {
fmt.Println( cursor.Value )
}

I think that using the cursor directly seems more straightforward.

Cheers,
Remi Gillig.

Eric Ivancich

unread,
Apr 29, 2011, 5:34:55 PM4/29/11
to golang-nuts
On Apr 27, 9:22 pm, Anschel <ansche...@gmail.com> wrote:
> On Apr 27, 3:05 pm, Nyan Htoo Tin <nyanhtoo...@gmail.com> wrote:
> > Hi I've been thinking how could I implement general iterator interface
> > in go.
...
> > I'd like to know what approach would be better??
> > How could I use interface here?
>
> > Warm Regards,
> > Nyan
>
> My knee-jerk reaction to this question is to ask whether you really
> need an iterator. In most cases, you can get simpler, more efficient
> code by focusing on the specific problem; if you give mylist a First()
> and Next() method or the equivalent, that will probably be good enough.

Well there are two issues here that are closely related. The first
issue is HOW would you implement an iterator in Go, and the second is
WHY would you want to do so. I'm new to Go and still playing with it
to better understand it. But I'll post one solution to the HOW
question momentarily.

There are a couple answers to the WHY question. A shallow answer is
that it's a good exercise to better understand the language. A deeper
answer is that by adding a level of abstraction, you can write more
general code that can be re-used in a variety of circumstances.

Imagine you have a collection and you need to perform some operation
on each element of it. You could write that code to work with your
slice. And then you realize a linked list or a tree of some sort is a
better data structure. Now you have to re-write the code to perform
your operation on each element. And if you do a lot of these
collection-wide operations, that's a lot of code to rewrite.

On the other hand, if you had a generic iterator interface that
slices, linked lists, and trees could all provide, you're set. You
change your data structure of your collection, and the rest of your
code still works. I think that kind of flexibility is a clear win.

Eric Ivancich

unread,
Apr 29, 2011, 5:48:34 PM4/29/11
to golang-nuts
On Apr 27, 3:05 pm, Nyan Htoo Tin <nyanhtoo...@gmail.com> wrote:
> Hi I've been thinking how could I implement general iterator interface
> in go.
...
> I'd like to know what approach would be better??
> How could I use interface here?

Here is my solution. It defines both an Iterator interface and an
Iterable interface. I'd appreciate any feedback with respect to good
or bad practices and whether I'm following established Go patterns.

Thanks!

==== iterator.go ====

package iterator

type Iterator interface {
Next() interface{}
HasNext() bool
}

type Iterable interface {
GetIterator() Iterator
}

==== myslice.go ====

package myslice

import "./iterator"

type MySlice []interface{}

type sliceIterator struct {
slice *MySlice
index int
}

func (i *sliceIterator) Next() interface{} {
i.index++
return (*i.slice)[i.index - 1]
}

func (i *sliceIterator) HasNext() bool {
return i.index < len(*i.slice)
}

func (s *MySlice) GetIterator() iterator.Iterator {
return &sliceIterator{s, 0}
}

==== mylinkedlist.go ====


package mylinkedlist

import "./iterator"

type node struct {
value interface{}
next *node
}

type LinkedList struct {
head *node
}

type linkedListIterator struct {
cursor *node
}

func (i *linkedListIterator) HasNext() bool {
return i.cursor != nil
}

func (i *linkedListIterator) Next() interface{} {
result := i.cursor.value
i.cursor = i.cursor.next
return result
}

func NewList() *LinkedList {
return &LinkedList{nil}
}

func (l *LinkedList) Insert(value interface{}) {
n := node{value, l.head}
l.head = &n
}

func (l *LinkedList) GetIterator() iterator.Iterator {
return &linkedListIterator{l.head}
}

==== iterator-test.go ====

package main

import (
"fmt"
"./iterator"
"./myslice"
"./mylinkedlist"
)

func list(collection iterator.Iterable) {
iter := collection.GetIterator()
for iter.HasNext() {
fmt.Println(iter.Next())
}
}

func main() {
array := []interface{}{3, 1.0, "four", 1, 5, 9}
slice := myslice.MySlice(array[:])
list(&slice)

ll := mylinkedlist.NewList()
ll.Insert(9)
ll.Insert(5)
ll.Insert(1)
ll.Insert("four")
ll.Insert(1.0)
ll.Insert(3)
list(ll)
}

Eric Ivancich

unread,
Apr 29, 2011, 6:38:21 PM4/29/11
to golang-nuts
Another approach at generalizing these types of operations is to allow
the collections to apply a function to each element within. So in this
example I create an Eachable interace that means a collection provides
an Each function that will take as a parameter a function to apply to
all of the elements in turn.

Again, I'd appreciate feedback.

==== eachable.go ====

package eachable

type Eachable interface {
Each(func (interface{}))
}

==== myslice.go ====

package myslice

type MySlice []interface{}

func (s *MySlice) Each(f func(interface{})) {
for i := 0 ; i < len(*s) ; i ++ {
f((*s)[i])
}
}

==== mylinkedlist.go ====

package mylinkedlist

type node struct {
value interface{}
next *node
}

type LinkedList struct {
head *node
}

func NewList() *LinkedList {
return &LinkedList{nil}
}

func (l *LinkedList) Insert(value interface{}) {
n := node{value, l.head}
l.head = &n
}

func (l *LinkedList) Each(f func (interface{})) {
for cursor := l.head ; cursor != nil ; cursor = cursor.next {
f(cursor.value)
}
}

==== eachable-test.go ====

package main

import (
"fmt"
"./eachable"
"./myslice"
"./mylinkedlist"
)

func list(collection eachable.Eachable) {
count := 0
collection.Each(
func (v interface{}) {
count++;
fmt.Println(v);
})
fmt.Printf("%d items seen\n", count);
}

func average(collection eachable.Eachable) {
var count int = 0
var total float32 = 0.0
collection.Each(
func (v interface{}) {
count++;
switch v.(type) {
default:
count--;
case int:
total = total + float32(v.(int))
case float32:
total = total + v.(float32)
case float64:
total = total + float32(v.(float64))
}})

fmt.Printf("%d items seen\n", count);
fmt.Printf("The mean is %f\n", total / float32(count));
}


func main() {
array := []interface{}{3, 1.0, "four", 1, 5, 9}
slice := myslice.MySlice(array[:])
list(&slice)

ll := mylinkedlist.NewList()
ll.Insert(9)
ll.Insert(5)
ll.Insert(1)
ll.Insert("four")
ll.Insert(1.0)
ll.Insert(3)
list(ll)

average(&slice)
average(ll)
}

Job van der Zwan

unread,
Aug 24, 2012, 4:45:09 AM8/24/12
to golan...@googlegroups.com
On Thursday, 23 August 2012 19:49:31 UTC+2, Jamie McCrindle wrote:
- As a developer, I only have to remember 1 way of iterating through a data structure, as opposed to finding out case by case
- Best practice can be encapsulated in a single design
- One can design generalised code that only needs to know about an 'iterator'
Isn't the reason for having data structures (instead of a plain slice) the fact that you can read/write from them faster in specific use-cases, and wouldn't it make the code more understandable if that remained explicit?
 

Kyle Lemons

unread,
Aug 24, 2012, 12:00:35 PM8/24/12
to Jamie McCrindle, golan...@googlegroups.com
On Thu, Aug 23, 2012 at 10:49 AM, Jamie McCrindle <jamiemc...@gmail.com> wrote:
Hi,

The reason having iterators is handy is because:

- As a developer, I only have to remember 1 way of iterating through a data structure, as opposed to finding out case by case
That's the problem with iterators in a lot of languages.  It turns out that you can make many better decisions if you do know how to iterate through it, and understand the performance penalties (or at least characteristics) of doing so.
 
- Best practice can be encapsulated in a single design
There is no best practice that applies to all iteration strategies.
 
- One can design generalised code that only needs to know about an 'iterator'
This is a valid point, but less so in Go. It can usually be flipped around on its head in cases like sort.Interface, where it specifies the "algorithm" and you specify how to access the data.
 

regards,
Jamie.

Jamie McCrindle

unread,
Aug 24, 2012, 12:33:45 PM8/24/12
to Kyle Lemons, golan...@googlegroups.com
Hi Kyle,

Thanks for the response. 
 
- As a developer, I only have to remember 1 way of iterating through a data structure, as opposed to finding out case by case
That's the problem with iterators in a lot of languages.  It turns out that you can make many better decisions if you do know how to iterate through it, and understand the performance penalties (or at least characteristics) of doing so.

I agree with most of this. And certainly if making something fit a pattern inhibits understanding or makes things slower, then it's inadvisable. That said, where things do look and behave the same way (e.g. Reader, whether they backing off onto sockets, files, etc.) and don't hide the performance characteristics or inhibit understanding then I think that making them available through a single interface is useful. I think what we disagree on is whether there are cases where a general purpose iterator would be useful. I believe there might be.

- Best practice can be encapsulated in a single design
There is no best practice that applies to all iteration strategies.

Fair enough. I think I didn't make my point as well as I would have liked. The reason I said this is because I was looking for iterators in go, I found a lot of people trying to write iterators, usually with channels, only to find that their strategy was inadvisable. This may have been better stated as: if an iterator makes sense for a data structure and there's an iterator interface, I believe it would encourage the data structure designers to provide iterator implementations for their data structures that performed well.

- One can design generalised code that only needs to know about an 'iterator'
This is a valid point, but less so in Go. It can usually be flipped around on its head in cases like sort.Interface, where it specifies the "algorithm" and you specify how to access the data.

Conversely, there are interfaces like Reader which do provide a consistent interface so that algorithms don't have to care whether they're reading from sockets, files or any other byte streams.

And now I'll argue the other side:

- Iterators (generators) are awesome in languages like python and c# because of the 'yield' keyword, this is no doubt why a lot of folk try and make iterators using channels. Writing iterators in Java is unpleasant.

- I'm a golang novice, so, for anyone reading this, it's quite likely I'm wrong. If get to spend more time with go and I learn the error of my ways, I'll come back and update my response.

regards,
j.

Kamil Kisiel

unread,
Aug 24, 2012, 1:11:20 PM8/24/12
to golan...@googlegroups.com
As both a Python and Go programmer, I've found that channels pretty much replace the role of generators.

Whereas in Python you could use:

def f():
    for i in range(5):
        yield i

for j in f():
    print j

In Go you can achieve effectively the same thing with:

package main

import "fmt"

func f() (c chan int) {
c = make(chan int)
go func() {
for i := 0; i < 5; i++ {
c <- i
}
close(c)
}()
return
}

func main() {
for i := range f() {
fmt.Println(i)
}
}

Sure the Go example is longer, but the goroutine and channel constructs are also more powerful and flexible than what's available with Python generators. I don't think there'd be anything gained by adding the concept of generators and/or a yield type construct to the language when in effect it's already there. You effectively have an iterator "protocol" with channels if you want, just create an interface like:

type IntGenerator interface {
GenerateInts() (c chan int)
}

type A struct{}

func (a A) GenerateInts() chan int {
return f()
}

func IntConsumer(g IntGenerator) {
for i := range g.GenerateInts() {
fmt.Println(i)
}
}

So personally I don't think there's anything missing in this regard from the Go language, I think everything you want to do is possible in your programs. It's a little bit more verbose but that's because it's more explicit and there's less magic going on behind the scenes.

http://play.golang.org/p/gUFyAR8NGz if you want to try playing with the code.

Kyle Lemons

unread,
Aug 24, 2012, 1:42:55 PM8/24/12
to Kamil Kisiel, golan...@googlegroups.com
The problem with using a goroutine as a generator is that if you abandon it, it will not get garbage collected.  You could try to set a timeout (say, 30s) in your generator, and if it takes longer than that to run it just stops and returns, but it's not really a great solution.  You can provide a mechanism to stop the generator, but that requires just as much discipline to use as would requiring clients to always consume every value.

Maxim Khitrov

unread,
Aug 24, 2012, 1:57:24 PM8/24/12
to Kamil Kisiel, golan...@googlegroups.com
On Fri, Aug 24, 2012 at 1:11 PM, Kamil Kisiel <kamil....@gmail.com> wrote:
> As both a Python and Go programmer, I've found that channels pretty much
> replace the role of generators.
>
> Whereas in Python you could use:
>
> def f():
> for i in range(5):
> yield i
>
> for j in f():
> print j
>
> In Go you can achieve effectively the same thing with:
>
> package main
>
> import "fmt"
>
> func f() (c chan int) {
> c = make(chan int)
> go func() {
> for i := 0; i < 5; i++ {
> c <- i
> }
> close(c)
> }()
> return
> }
>
> func main() {
> for i := range f() {
> fmt.Println(i)
> }
> }

Careful, you end up with a memory leak if you terminate the for loop
early. The example becomes a lot less appealing when you have to add
extra machinery to support breaking out of the for loop without
leaving a deadlocked goroutine behind.

The performance is also poor. 1000000 iterations of your python code
(minus the print statement) executes in 81 ms on my machine, while the
same Go code takes 186 ms. This goes up to 365 ms if the channel is
buffered. I imagine that much of this comes from the extra
synchronization operations.

- Max

Kamil Kisiel

unread,
Aug 24, 2012, 2:04:36 PM8/24/12
to golan...@googlegroups.com
Agree with both you and Kyle's post earlier about the earlier termination, it's definitely a problem for the general case of iteration. The pattern still works for the common cases of infinite generators that run for the lifetime of a program or for cases where you will always consume the entire generator.

As far as performance goes, I'm not sure how much you can read in to that as the example is extremely synthetic. I don't think you'd be creating goroutines that do nothing else other than yield 5 integer values 1000000 times..

Martin Bruse

unread,
Aug 24, 2012, 3:03:05 PM8/24/12
to golan...@googlegroups.com
I have been reading this thread now, and nobody seems to even mention the solution I tend to use:

---------8<------------
type IteratorFunc func(val Value)

func (self *DataStructure) Iterate(fun IteratorFunc) {
... walk somehow through my structure ...
      fun(val)
...
}

...

var sum []Value
myStructure.Iterate(func(val Value) {
  sum = append(sum, val)
})
-----------8<---------------

Is this a bad pattern, or is it just that people find it ugly?

regards,
Martin

Rémy Oudompheng

unread,
Aug 24, 2012, 3:16:14 PM8/24/12
to Martin Bruse, golan...@googlegroups.com
On 2012/8/24 Martin Bruse <zond...@gmail.com> wrote:
> var sum []Value
> myStructure.Iterate(func(val Value) {
> sum = append(sum, val)
> })
> -----------8<---------------
>
> Is this a bad pattern, or is it just that people find it ugly?
>
> regards,
> Martin

I almost always use that pattern.

Rémy.

Dan Kortschak

unread,
Aug 24, 2012, 6:17:16 PM8/24/12
to Martin Bruse, golan...@googlegroups.com
When I was just starting, I used the chan iteratot approach (the novelty seems to make this attractive to novices from my own experience and from seeing how others use this often when they start out).

Now I use passed functions, the expressiveness is much greater, as is the control if the iterator/closure function are engineered properly, which is not hard.

Dan

Jonathan Amsterdam

unread,
Sep 2, 2012, 11:37:51 PM9/2/12
to golan...@googlegroups.com, Martin Bruse
There are two problems with function-passing iteration (sometimes called internal iteration).

One issue is early termination. Your passed function has to return a bool to indicate that, but that means functions that don't care about early termination are cluttered by a bool.

Much worse is the problem of parallel iteration, meaning iterating over two or more data structures at the same time. E.g. say I want to do element-by-element comparison of two data structures. There doesn't seem to be a clean way to do it with internal iteration, but it is quite straightforward with external iteration.

It is true that an internal iterator can use recursion, which makes it much easier to write for recursive data structures.

Kyle Lemons

unread,
Sep 4, 2012, 2:41:08 PM9/4/12
to Jonathan Amsterdam, golan...@googlegroups.com, Martin Bruse
On Sun, Sep 2, 2012 at 8:37 PM, Jonathan Amsterdam <jbams...@gmail.com> wrote:
There are two problems with function-passing iteration (sometimes called internal iteration).

One issue is early termination. Your passed function has to return a bool to indicate that, but that means functions that don't care about early termination are cluttered by a bool.

Much worse is the problem of parallel iteration, meaning iterating over two or more data structures at the same time. E.g. say I want to do element-by-element comparison of two data structures. There doesn't seem to be a clean way to do it with internal iteration, but it is quite straightforward with external iteration.

While not practical or correct in every circumstance, the "tree comparison" example from the old playground (not sure where it is now) is one way to do it.  You'd run both internal iterations in goroutines given a closure that sends values on a channel (and returning the "stop iterating" sentinel when a "done" channel is closed).

Nigel Tao

unread,
Sep 4, 2012, 8:34:58 PM9/4/12
to Kyle Lemons, Jonathan Amsterdam, golan...@googlegroups.com, Martin Bruse
On 5 September 2012 04:41, Kyle Lemons <kev...@google.com> wrote:
> While not practical or correct in every circumstance, the "tree comparison"
> example from the old playground (not sure where it is now)

It's linked from the drop-down box on the home page, or you can go
straight to http://play.golang.org/p/_gSUYh0jHV

It's also in the source, as $GOROOT/doc/play/tree.go

wksch...@gmail.com

unread,
Sep 15, 2013, 12:51:00 AM9/15/13
to golan...@googlegroups.com
First time posting, so I hope I'm in the right place.

I sat down to design an API for my next project, which requires a couple of iterators over complex data types inside several inner loops. Speed is a core concern in my project. I was concerned about the idea of creating new channels or slices all the time -- channels have internal locks and slices require copying out all the data I'm iterating over all at once, which I anticipate would be memory inefficient in my application. Passing through a callback seems like it could be performant (seems like Ruby blocks, no?) but it's less flexible than an iterator. In any case, there's no replacement for benchmarking and experimenting. As a first step to allow me to begin drafting different versions, I made an iterator API. Here's what I came up with: https://github.com/wkschwartz/iter

Would love feedback.

andrey mirtchovski

unread,
Sep 15, 2013, 2:58:52 AM9/15/13
to wksch...@gmail.com, golang-nuts
> slices require copying out all the data I'm iterating over all at once

this is a red flag. slices are very cheap to create and use. also,
looking at the same sentence, one (generally) does not create and
destroy channels all the time just to accomplish a simple operation
such as iteration.

can you please describe the problem outside of the solution you're
thinking about?

Billy Schwartz

unread,
Sep 15, 2013, 8:35:16 AM9/15/13
to andrey mirtchovski, golang-nuts
I have several related operations in mind, but let's just take the example of graph (as in graph theory, not bar charts) processing. I'll use it to summarize the discussion in the foregoing about approaches to iteration as I currently understand it. I'm interested to hear your thoughts on what I've got right and what I've got wrong.

Let's say I have a graph data type, implemented with the adjacency-set representation described in Algorithms 4 by Sedgewick and Wayne, like http://algs4.cs.princeton.edu/41undirected/Graph.java.html but with sets replacing bags because I want to guarantee there are no self loops or parallel edges.

The simplest way to represent sets of nodes in a Go graph implementation is with an object s of type map[uint]bool. This allows convenient iteration with `for element, _ := range s`, but the usual problems with exposing the implementation of the graph data type apply here: I'd like to be able to change how I implement set, and I don't want to allow the caller to mutate the graph except through the graph's methods. One reason I'd like general code is so I can benchmark to test the hypotheses I'm making below.

If the Adj method returns a new slice containing a list of the nodes adjacent to a given node -- let's say there are a thousand of them -- Adj must first allocate just over 4 or 8 kB of memory, fill it with zeros (as Go guarantees zeroed slices), then fill it with the nodes. THEN the calling code has to go back to the beginning of the slice and iterate over it a third time. I can relieve pressure on the garbage collector by keeping a free pool of slices, but that still means iterating over the adjacent nodes twice instead of once. (Assume speed and memory usage matter to me, so let's skip the debate about whether I'm prematurely optimizing.)

Using channels (see https://groups.google.com/d/msg/golang-nuts/v6m86sTRbqA/97YvCLM-kSQJ) with small buffers instead of slices large enough to hold all the neighbor nodes at least solves the double-iteration problem. However, this technique involves overheads that seems superfluous for my application as a general matter.

Another possibility is passing as a closure the code I would have put in the for block iterating over the nodes in a language with iterators like Java or Python, as contemplated in https://groups.google.com/d/msg/golang-nuts/v6m86sTRbqA/J8WvG76bGCgJ. This is like the visitor pattern, or blocks in Ruby. I think this approach is neat, but frankly I think callbacks in the middle of a recursive graph algorithm would confuse the hell out of me when trying to debug it.

This leaves me with the idea of explicit iterators, particularly with a uniform interface so that I can swap out which implementation of set underlies the graph implementation.

That said, I'm not actually sure I would know how to iterate over a map[uint]bool with an iterator structure, since you can't really stop and start a `for item, _ := range map[uint]bool{...}` between calls to an Iterator.Next method.


--
Billy Schwartz
cell: 614-578-8007

Steven Blenkinsop

unread,
Sep 15, 2013, 1:46:55 PM9/15/13
to wksch...@gmail.com, andrey mirtchovski, golang-nuts
On Sunday, September 15, 2013, Billy Schwartz wrote:

That said, I'm not actually sure I would know how to iterate over a map[uint]bool with an iterator structure, since you can't really stop and start a `for item, _ := range map[uint]bool{...}` between calls to an Iterator.Next method.

Since maps are unordered, an internal iterator should be fine. Iterating over two data structures simultaneously isn't very useful when the order of the elements is arbitrary.

andrey mirtchovski

unread,
Sep 15, 2013, 2:14:09 PM9/15/13
to wksch...@gmail.com, golang-nuts
> The simplest way to represent sets of nodes in a Go graph implementation is
> with an object s of type map[uint]bool. This allows convenient iteration with
> `for element, _ := range s`

if you're never going to have a false value then map[uint]struct{} is
better, since struct{} consumes no memory. but again, that's just an
implementation detail.

> If the Adj method returns a new slice containing a list of the nodes
> adjacent to a given node -- let's say there are a thousand of them -- Adj
> must first allocate just over 4 or 8 kB of memory, fill it with zeros (as Go
> guarantees zeroed slices), then fill it with the nodes. THEN the calling
> code has to go back to the beginning of the slice and iterate over it a
> third time.

that's not how slices work. slices are 3-word reference structures
that are cheap to conjure and pass around. see here:
http://research.swtch.com/godata. you pay the price for allocation
when you build the structure, not when you're accessing it afterwards.

typically, i'd say the advice you're most likely to get around here is
to design the interface (some small subset of setters, getters, Adj
and friends, including Next if you want) and hide the implementation
as non-exported data structures satisfying that interface -- trees,
flat arrays, maps, or a combination thereof.
http://golang.org/pkg/container has examples everywhere you see
"contains unexported fields" ;)

another example of exposing functionality and not implementation is in
sort: http://golang.org/pkg/sort/#Interface. there's a success story
by Andrew Gerrand about substituting the underlying sort algorithm
used by the package without breaking any of the code that satisfies
it. unfortunately I can't find it.
Reply all
Reply to author
Forward
0 new messages