I needed to do this the other day, but I was too embarassed to ask. :(
--
Thomas G. Willis
http://paperbackmusic.net
Generators - I did not know about them. I had a look at this -
http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.h
tml
It does not look like this is a good think to do at all, because all the
state-management that itertors do for free is lost at the cost of
creating another continuation around it. Admittedly this solves the
syntactic problem that I was looking for a solution to, but at a very
large cost to the runtime.
I found a terse description of zip() at
http://whytheluckystiff.net/articles/rubyOneEightOh.html
Is there a better place I can read up about this? I want to know how
this works.
In my post, I chose the array.each only as an example. Its not arrays
for which I want to do this with - I want to solve the problem for
iterator calls. If zip() internally creates a list of values from both
iterations, then it does not help me. I want to be able to do the actual
computation of the iterator calls in lock-step.
Roshan
Hello Roshan,
you can use the Generator class, to wrap any iterator into a generator
(i.e. get a .next function)
Or in simple cases where the additional memory overhead is not
important, you can also use the zip funtion. I.e.
%w(eins zwei drei).zip([1,2,3]).each do | name, value |
puts "The value of #{name} is #{value}"
end
Hm...
class Foo
include Enumerable
def initialize(i)
@enum = 1..i
end
def each(&b) @enum.each(&b) end
end
>> f1=Foo.new 10
=> #<Foo:0x10177298 @enum=1..10>
>> f2=Foo.new 10
=> #<Foo:0x10172498 @enum=1..10>
>> f1.zip(f2) {|*a| p a}
TypeError: cannot convert Foo into Array
from (irb):10:in `zip'
from (irb):10
class Foo
include Enumerable
def initialize(i)
@enum = 1..i
end
def each(&b) @enum.each(&b) end
alias :to_ary :to_a
end
>> f1=Foo.new 10
=> #<Foo:0x101a0080 @enum=1..10>
>> f2=Foo.new 10
=> #<Foo:0x1019a0c8 @enum=1..10>
>> f1.zip(f2) {|*a| p a}
[[1, 1]]
[[2, 2]]
[[3, 3]]
[[4, 4]]
[[5, 5]]
[[6, 6]]
[[7, 7]]
[[8, 8]]
[[9, 9]]
[[10, 10]]
=> nil
Fixed but at the cost of one enum converted to an array. I guess the
generator approach is the only feasible solution for the general case.
Regards
robert
Um that would be, 1000. Not ten. 1000.
In the following experiments I see factors of anywhere between 200 and
950, so three orders of magnitude seems reasonable.
ruby -rgenerator -rbenchmark -e 'Benchmark.bm(7) do |b|
(14 .. 19).each do |level|
size = 2 ** level
a = (0 ... size).to_a
g = Generator.new a
b.report("arr #{level}") { a.each { |x| x } }
b.report("gen #{level}") { while g.next?; g.next; end }
end
end'
arr 14 0.010000 0.000000 0.010000 ( 0.007526)
gen 14 2.280000 0.600000 2.880000 ( 2.912238)
arr 15 0.010000 0.000000 0.010000 ( 0.017132)
gen 15 5.270000 0.030000 5.300000 ( 5.302015)
arr 16 0.040000 0.000000 0.040000 ( 0.030965)
gen 16 11.520000 0.660000 12.180000 ( 12.204337)
arr 17 0.040000 0.000000 0.040000 ( 0.044533)
gen 17 25.030000 3.370000 28.400000 ( 28.563181)
arr 18 0.070000 0.000000 0.070000 ( 0.070832)
gen 18 61.730000 0.500000 62.230000 ( 62.321825)
arr 19 0.150000 0.000000 0.150000 ( 0.151578)
gen 19 142.200000 10.050000 152.250000 (159.363150)
Looks kinda super-linear. Anyways, just entertaining myself.
--
William <wmorgan-...@masanjin.net>
I'm not sure what you mean by that, but:
> Admittedly this solves the syntactic problem that I was looking for a
> solution to, but at a very large cost to the runtime.
There's a definite cost to using generators. Of course, you've already
incurred a huge performance penalty by using Ruby in the first place. Is
the further cost of generators really too much?
I see about a factor of ten slow-down in the following toy example.
It'd be interesting to do more datapoints and see if this is a linear
relationship or what.
ruby -rgenerator -rbenchmark -e 'a=(1..100000).map{|x|x};g=Generator.new(a);[
lambda { a.each { |x| x } },
lambda { while g.next?; g.next; end },
].each {|pr| puts Benchmark.measure(&pr) }'
0.030000 0.000000 0.030000 ( 0.027388)
10.380000 1.190000 11.570000 ( 11.612888)
> I found a terse description of zip() at
> http://whytheluckystiff.net/articles/rubyOneEightOh.html
> Is there a better place I can read up about this? I want to know how
> this works.
Unfortunately not, other than the Ruby source code itself.
> In my post, I chose the array.each only as an example. Its not arrays
> for which I want to do this with - I want to solve the problem for
> iterator calls. If zip() internally creates a list of values from both
> iterations, then it does not help me. I want to be able to do the actual
> computation of the iterator calls in lock-step.
Enum#zip internally turns everything into arrays.
Unless you can deal with the original objects as external (i.e.
Python-style) iterators, this is the price you pay for the niceness of
interal iterators.
--
William <wmorgan-...@masanjin.net>
Warning: total Ruby newby naive approach:
>> a = (1..4).to_a
=> [1, 2, 3, 4]
>> b = (5..8).to_a
=> [5, 6, 7, 8]
>> a.each_with_index {|av, ai| p [av, b[ai]]}
[1, 5]
[2, 6]
[3, 7]
[4, 8]
--
Regards,
Jason
http://blog.casey-sweat.us/
>> In my post, I chose the array.each only as an example. Its not arrays
>> for which I want to do this with - I want to solve the problem for
>> iterator calls. If zip() internally creates a list of values from both
>> iterations, then it does not help me. I want to be able to do the actual
>> computation of the iterator calls in lock-step.
>
> Enum#zip internally turns everything into arrays.
Actually, only the arguments are converted; the receiving object is
iterated over using #each (or for Array, a standard C array
iteration). So, if only one list must not be converted to an array,
just call #zip on that list.
Oh, you're right. Thanks.
I think it still doesn't solve the original poster's problem, though.
--
William <wmorgan-...@masanjin.net>
<Repeat>
In my post, I chose the array.each only as an example. Its not arrays
for which I want to do this with - I want to solve the problem for
iterator calls. If zip() internally creates a list of values from both
iterations, then it does not help me. I want to be able to do the actual
computation of the iterator calls in lock-step.
</Repeat>
I was trying to write code where I can have two or more computations
which generate values that I need to compare - with the ruby syntax
tying iterator usage to the code blocks by syntax, there is no way I can
do this. This is especially applicable when the iterators potentially
return an infinite stream of values. (Except for using contiuations...
Which is not really so much of a solution, because I might as well not
have used an iterator in a first place if I could create good
coroutines)
Yes I understand that I take a hit (wrt perf by choosing ruby), but
that's a BAD argument to favour wrapping state first by a iterators and
then once over by a continuation.
On a related note, I had asked this question about C# and it looks like
C# can do both of these cleanly -
foreach(<type> t in foo())
{
foreach(<type> t in bar())
{
//nested
}
}
And -
IEnumerator<int> ie1 = a.foo().GetEnumerator();
IEnumerator<int> ie2 = a.bar().GetEnumerator();
while (ie1.MoveNext() && ie2.MoveNext())
{
//lockstep
}
Since Ruby is open to (I believe) a major language redesign these days,
this might be a good feature to consider adding.
Roshan
I'm not sure about that one. The issue pops up once in a while but not
too often to justify a general change in iteration style. If it just was
an add on - maybe. But I guess the overhead is significant.
Since you are doing calculations, what prevents you from doing it the
other way round?
class SillyGenerator
def initialize() reset() end
def next?() @val < 1000 end
def reset() @val = 0 end
def move() @val += 2 end
def get() @val end
end
class Calculator
include Enumerable
def initialize(source) @source = source end
def each
@source.reset
while @source.next?
@source.move
yield @source.get
end
self
end
end
Or in one class
class BaseGenerator
def each
reset
while next?
move
yield get
end
self
end
end
class SillyGenerator < BaseGenerator
include Enumerable
def initialize() reset() end
def next?() @val < 1000 end
def reset() @val = 0 end
def move() @val += 2 end
def get() @val end
end
Now you can use it both ways:
>> SillyGenerator.new.inject(0){|max, e| max > e ? max : e}
=> 1000
iter1 = SillyGenerator.new
iter2 = SillyGenerator.new
while iter1.next? && iter2.next?
iter1.move
iter2.move
puts "failure" unless iter1.get == iter2.get
end
Kind regards
robert
the following works. It is oriented on the length of the first array. If the
second one is longer the rest of the values are ignored, if it is shorter,
"nil" will be returned.
>>>> code >>>>>
class Array
def combeach(anarr)
self.each_index{|ind|
yield self[ind], anarr[ind]
}
end
end
a=[1,2,3,4,5,6]
b=[7,8,9]
a.combeach(b){|ai,bi| print "ai=#{ai.inspect}, bi=#{bi.inspect}\n"}
>>>>> Result >>>>>
ai=1, bi=7
ai=2, bi=8
ai=3, bi=9
ai=4, bi=nil
ai=5, bi=nil
ai=6, bi=nil
>>>>> End Of Example >>>>>
Good luck.
Why? - It is somehow the "assembler-way" for iterators. The difference is
that you must put your algorthmic selection inside a method an then call the
block using "yield". This doesn't even make problems if you read a data
stream which length is not specified when starting the action.
> Excerpts from William Morgan's mail of 10 Mar 2005 (EST):
>> I see about a factor of ten slow-down in the following toy example.
>
> Um that would be, 1000. Not ten. 1000.
>
> In the following experiments I see factors of anywhere between 200 and
> 950, so three orders of magnitude seems reasonable.
Anyone got any idea how the performance and memory cost would compare by
using threads to do the parallel internal iteration (using each) instead
of the continuations that Generator uses?
What I'm thinking of is creating a thread for each Enumerable that needs
to be iterated, and having each of these threads pass the yielded values
to the calling thread which can then yield appropriately.
I get the impression that Generator generates (hoho) lots of continuations
and therefore causes extra headache for the garbage collector. Minimising
this might provide a performance boost?
J> Anyone got any idea how the performance and memory cost would compare by
J> using threads to do the parallel internal iteration (using each) instead
J> of the continuations that Generator uses?
Internally a continuation is implemented by a thread.
--
Guy Decoux
> > the following works. It is oriented on the length of the first array.
If
> the
> > second one is longer the rest of the values are ignored, if it is
shorter,
> > "nil" will be returned.
The code works only for Array or other Enumerables that can be indexed by
an integer value. That does not cover the general case Roshan is after.
> As I've seen in the last message you are directly looking for the
"external
> iterators", as implemented in C#.
>
> Why? - It is somehow the "assembler-way" for iterators. The difference
is
> that you must put your algorthmic selection inside a method an then call
the
> block using "yield". This doesn't even make problems if you read a data
> stream which length is not specified when starting the action.
If you look at the thread you'll see that Roshan does not want to use
Arrays; and he wants to iterate two Enumerables in lock step, i.e.,
iterate them in parallel, without paying the extra overhead of conversion
to Array beforehand - which is especially reasonable for Enumerables that
yield an infinite number of values.
Regards
robert
I think you might be overlooking the obvious, which is that you can take
this approach in Ruby. You're writing the objects that do the
computations, right? Simply give them #next? and #next methods and use a
while loop. That's what you'd be doing in these other languages.
The prevalence of internal iterators in Ruby's API is simply a
reflection of the fact that they're so nice to use. It's not the only
way to iterate.
--
William <wmorgan-...@masanjin.net>
Ahah! The difference then would be that Generator creates many
continuations (and therefore threads?) as each value is yielded, whereas
the approach I was thinking about involves creating one thread for each
Enumerable.
I wonder what the tradeoffs are then between creating and invoking
continuations, and locking and context switching between threads.
> I was trying to write code where I can have two or more computations
> which generate values that I need to compare - with the ruby syntax
> tying iterator usage to the code blocks by syntax, there is no way I can
> do this. This is especially applicable when the iterators potentially
> return an infinite stream of values. (Except for using contiuations...
> Which is not really so much of a solution, because I might as well not
> have used an iterator in a first place if I could create good
> coroutines)
Continuations are the right answer in this case.
module Enumerable
attr_accessor :co, :st, :res
def nextco
each { |e|
callcc { |cc|
@co = cc
@res = e
@st.call
}
}
@res = nil
@st.call
end
def next
callcc { |cc|
@st = cc
@co ? @co.call : nextco
}
return @res
end
end
a = [1,2,3]
b = [4,5,6,7,8]
while x = a.next and y = b.next
puts x,y
end
Note there are three possible behaviours around the @res = nil
line. One is to return nil after all items have been used up, another
is to keep returning the last item by removing that line, and finally
a loop can be wrapped around the each, to begin returning from
the first item again.
Its probably possible to do it in one function, but it makes more
sense to me in two.
> Yes I understand that I take a hit (wrt perf by choosing ruby), but
> that's a BAD argument to favour wrapping state first by a iterators and
> then once over by a continuation.
As long as its wrapped safely and nicely, what does it matter?
This way, you even get your choice of what to do when you
run out of items ;)
--
spooq
Replace the first two lines of nextco with
def nextco
each_with_index { |e,i|
--
spooq
http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.html
--
spooq
module Enumerable
attr_accessor :co, :st, :res, :sto
def nextco
each_with_index { |e,i|
callcc { |cc|
@co = cc
@res = e
@st.call
}
}
@res = nil
@st.call
end
def next
(t = @sto; @sto = nil; return t) if @sto
callcc { |cc|
@st = cc
@co ? @co.call : nextco
}
return @res
end
def next?
@sto = self.next if !@sto
@sto
end
end
a = [1,2,4]
b = [3,5,'lots']
puts a.next
b.next?
b.next?
b.next?
while a.next? and b.next?
puts a.next,b.next
end
--
spooq
It's a bit faster if you do
begin
while g.next; end
rescue EOFError
end
Not with magnitudes, tho.
Csaba
(snip)
> end
I also wrote some lightweight generator, before I knew that there is one
in the std lib.
Now I see that it's about the same speed as your Enumerable extension,
ie., they are cca. 1.5 times faster than the official one.
This is enough reason to show it... it converts an iterator method to a
generator. Eg.:
require 'iter2gener'
a = (0...3000).to_a
g = I2G.new a.method(:each)
while e=g.next; p e; end
You have some further options, eg, you can set what to do when iterator
is exhausted.
Csaba
The C# and Py examples actually abstract the details of the objects that
they generate from the user. Programmers don't explicitely create these
classes. In the C# world, most people don't know that there is an
intermediate object involved even.
//
// The prevalence of internal iterators in Ruby's API is simply
// a reflection of the fact that they're so nice to use. It's
// not the only way to iterate.
Completely agree William,
The problem as I find it is one of language design - what are the
abstractions that the language gives us to make this possible? Can I
wrap continuations or create classes that manage state by hand to do
iteration? Yes I can. But that's not the problem here.
Let me quote from the pragmatic programmers guide -
<quote>
Our job is to solve problems, not spoonfeed compilers, so we like
dynamic languages that adapt to us, without arbitrary, rigid rules. We
need clarity so we can communicate using our code. We value conciseness
and the ability to express a requirement in code accurately and
efficiently. The less code we write, the less that can go wrong. (And
our wrists and fingers are thankful, too.)
We want to be as productive as possible, so we want our code to run the
first time; time spent in the debugger is time stolen from the
development clock. It also helps if we can try out code as we edit it;
if you have to wait for a 2-hour make cycle, you may as well be using
punch cards and submitting your work for batch compilation.
We want a language that works at a high level of abstraction. The higher
level the language, the less time we spend translating our requirements
into code.
</quote>
When I code in a language I wish it would give me the right
abstractions, so that I can work directly on the problem at hand.
Luke,
Thanks for the continuations implementation - the only problem
with the approach is that the Ruby interpreter there is also paying the
cost of maintaining state of the iterator. The continuation does that
further, only so we can decopule the syntactic dependancy of an iterator
and its block of code into an explicit function call (next).
The problem here IMHO is with language design and it is solved by usage
of expensive language contructs. Today if I have to solve the problem I
would go ahead and use generators - I am hoping that the language does
something better for the future.
Cheers,
Roshan
> Luke,
> Thanks for the continuations implementation - the only problem
> with the approach is that the Ruby interpreter there is also paying the
> cost of maintaining state of the iterator. The continuation does that
> further, only so we can decopule the syntactic dependancy of an iterator
> and its block of code into an explicit function call (next).
>
> The problem here IMHO is with language design and it is solved by usage
> of expensive language contructs. Today if I have to solve the problem I
> would go ahead and use generators - I am hoping that the language does
> something better for the future.
It sounds to me like you've posed a problem at the syntax / abstraction
level but you're rejection solutions for performance / implementation
reasons.
The standard library's generator.rb, Luke's, and Csaba's code all solve
your problem. You can write your code using internal iterators and
someone else can covert them automagically to external iterators for
simultaneous iteration. But you say you don't like these because they
use continuations, even though that's hidden from you and the user of
your object.
Would you be happy if you hadn't looked at the code? Would you be happy
if Ruby continuations were really really fast? Would you be happy if
these generator objects were built-in to the Ruby C api and the
implementation were effective obscured from you?
What is it, exactly, you want Ruby to do?
--
William <wmorgan-...@masanjin.net>
At this point I need to ask you a question -
How do internal iterators work? How do they manage state? I havent seen
a description of this anywhere.
I know about creating objects for iteration. How do the internal
itertors differ from these?
I had thought that ruby would create an object for iteration, by
examining the proc code. It would then live in the context of the Ruby
GC like all objects and maintain state. The only difference I saw with
the 'internal' iterator is that since the syntax of ruby limits the
usage of the iterator to an explicit code block that catches it, the
details of the itertor object have been hidden from the user.
Which is to say that it is 'internal' only in the sense that
traditionally there was no need to give anyone explict access to the
object. The moment the idea of using multiple iterators in tandem is
considered, this assumption of hiding the object does not seem like the
right approach. The correct solution would have been to expose the
iterator object itself.
If however internal iterators are implmented in some way that have
different semantics from normal objects, then wrapping with
continuations is an acceptable solution for what is a design limitation
in the language. Wrapping itertors in continuations should not be
accepted way to do this in the language when there is a simpler
alternative possible.
So you see I really need to understand how internal iterators work,
before taking this ahead.
See the difference I see is that I see this as something the language
should have solved and not the library.
Roshan
The iteration state is all in the iterator method. For example, look at
how the Fibonacci series is written:
def fib
a = 1
b = 0
yield 0
while true
x = a + b
yield x
a = b
b = x
end
end
fib { |n| puts n; break if n > 100 }
All the state (a, b, and x) is in the iterator method.
So the difference between external and internal iterators is: rather
than repeatedly calling an "step" method on an "iterator object" and
doing something with the result, you call an "iterator method" once, and
pass it a lexical closure (in Ruby, a Proc object), and *it* then calls
the Proc on each element.
Internal iterators are not a Ruby-specific construct, nor are they
explicitly built-in to Ruby, except in the sense that the syntactic
sugar makes them very easy to do, and the API uses them all over. Think
of them as a different "style" of iteration.
Hope that answers your question.
--
William <wmorgan-...@masanjin.net>
William, what you are saying is that a closure of the caller's state is
passed to iterator. Which means that the iterator can be a normal method
on the stack?
I had always thought that the caller is a regular method and the callee
(the iterator) preserves state as an object. Here since the lexical
closure of the caller is what is used grab state... all methods that
call iterators are automatically mapped to some sort of object?
Is there something literature I can read about this?
If it is the caller that is modified by the language rather than the
callee, then it is evident why we need to wrap iterators in
continuations to grab their state.
As a matter of fact, it is evident why the language never had a
provision for lock-step iteration, because the if the iterator expects a
closure, then ever closure that can be passed to it has to be bound by a
certain siganture.
Which means that a hopeful syntax like
foo()|bar() {|from_foo|from_bar|
//single itertor block
}
..would not be easy to implement.
Hmm.. Interesting.
Roshan