Ruby includes a useful Generator library for switching internal iterators to external iterators. It is used like this:
require 'generator'
# Generator from an Enumerable object g = Generator.new(['A', 'B', 'C', 'Z'])
while g.next? puts g.next end
I've heard complaints in the past that this library is slow. One reason is that it was implemented with continuations, which have performance issues in the current version of Ruby. "Was" is the keyword there though, because I've just learned that Generator was recently re-implemented. I learned some good tricks reading the new version, so let's try fixing Generator ourselves. (No peeking at the new version!)
This week's Ruby Quiz is to write FasterGenerator, your own re-implementation of Generator with an eye towards working faster. (This is a small library. My version is 54 lines.) It is possible to go even faster than the new implementation, with certain trade-offs:
user system total real Current Generator 0.750000 0.880000 1.630000 ( 1.630639) Old callcc Generator 0.190000 1.170000 1.360000 ( 1.375868) James's FasterGenerator 0.210000 1.230000 1.440000 ( 1.433152)
user system total real Current Generator 15.940000 0.140000 16.080000 ( 16.425068) Old callcc Generator 6.390000 30.160000 36.550000 ( 36.676838) James's FasterGenerator 0.030000 0.000000 0.030000 ( 0.036880)
It you want to see the class documentation, you can find it here:
assert_equal(4, g.pos) assert_equal(false, g.next?) assert_raises(EOFError) { g.next } end
def test_each a = [5, 6, 7, 8, 9]
g = Generator.new(a)
i = 0
g.each { |x| assert_equal(a[i], x)
i += 1
break if i == 3 }
assert_equal(3, i)
i = 0
g.each { |x| assert_equal(a[i], x)
i += 1 }
assert_equal(5, i) end end
The Generator library also includes a SyncEnumerator class, but it is written to use Generator and will work fine with a new version, as long as it is API-compatible.
> Ruby includes a useful Generator library for switching internal iterators to > external iterators. It is used like this: > ... > If you want to make sure your implementation is correct, you can use these tests > straight out of the current implementation: > ...
James, thanks for the new quiz. It would be interesting to add a testcase with an endless internal iterator. I don't know the new implementation of Generator, so I can't say whether an endless iterator should be supported.
> It would be interesting to add a testcase with an endless internal > iterator. I don't know the new implementation of Generator, so I > can't say whether an endless iterator should be supported.
That's an interesting point we will certainly talk more about as we start to see some solutions. Generator *can* handle endless iterators. Quiz solvers can decide whether or not to limit themselves with that additional requirement...
Not being familiar with all the various Ruby packages and libs, I first want to thank ya for posting a good set of test cases that I can review, but was wondering if you also might post the code used to do the timing?
> Not being familiar with all the various Ruby packages and libs, I > first want to thank ya for posting a good set of test cases that I can > review, but was wondering if you also might post the code used to do > the timing?
I will, yes, on Sunday. :)
I pulled down copies on the Generator library, before and after the change. I then modified the class names so they could all peacefully coexist, loaded them, and ran a trivial benchemark:
Benchmark.bmbm do |x| x.report("Current Generator") do tests.times { CurrentGenerator.new(enum) } end x.report("Old callcc Generator") do tests.times { CallCCGenerator.new(enum) } end x.report("James's FasterGenerator") do tests.times { FasterGenerator.new(enum) } end end
puts puts "### next() ###" puts
Benchmark.bmbm do |x| x.report("Current Generator") do generator = CurrentGenerator.new(enum) tests.times { generator.next until generator.end? } end x.report("Old callcc Generator") do generator = CallCCGenerator.new(enum) tests.times { generator.next until generator.end? } end x.report("James's FasterGenerator") do generator = FasterGenerator.new(enum) tests.times { generator.next until generator.end? } end end
__END__
I'll post the modified libraries to go with that after the spoiler period. Obviously, you could go get them yourself, before then. I strongly recommend writing a solution first though.
On Feb 10, 2006, at 10:23 AM, James Edward Gray II wrote:
> I pulled down copies on the Generator library, before and after the > change. I then modified the class names so they could all > peacefully coexist, loaded them, and ran a trivial benchemark:
In answering Matthew's question, I found a small mistake in the benchmarks posted with the quiz (the timings for constructing my FasterGenerator were wrong). Here are the corrected numbers:
user system total real Current Generator 0.740000 0.720000 1.460000 ( 1.464659) Old callcc Generator 0.220000 1.500000 1.720000 ( 1.714859) James's FasterGenerator 0.010000 0.000000 0.010000 ( 0.003258)
user system total real Current Generator 16.630000 0.120000 16.750000 ( 16.878429) Old callcc Generator 7.440000 32.720000 40.160000 ( 40.336902) James's FasterGenerator 0.040000 0.000000 0.040000 ( 0.035432)
> > Not being familiar with all the various Ruby packages and libs, I > > first want to thank ya for posting a good set of test cases that I can > > review, but was wondering if you also might post the code used to do > > the timing?
> I will, yes, on Sunday. :)
But you just posted it. :)
Sorry, guess I wasn't clear. I wasn't looking to see your implementation of Generator, I was to see your benchmarking code.
> I wasn't looking to see your implementation of Generator, I was to > see your benchmarking code.
It's not my implementation I am trying to hide, it's the current Generator code the benchmark uses as a baseline. I realize it's readily available, but I think solving this one is more fun if you don't peek. :)
> It's not my implementation I am trying to hide, it's the current > Generator code the benchmark uses as a baseline. I realize it's > readily available, but I think solving this one is more fun if you > don't peek. :)
Okay.... ummm.... sure.... I won't peek. No, you can't make me.
(In case I still wasn't clear, I wasn't asking to peek at ANY implementation of Generator, but rather wanted exactly what you posted, which has nothing to do with Generator but everything to do with benchmark, i.e., the thing I wanted to know about.)
> user system total real > Current Generator 0.740000 0.720000 1.460000 ( 1.464659) > Old callcc Generator 0.220000 1.500000 1.720000 ( 1.714859) > James's FasterGenerator 0.010000 0.000000 0.010000 ( 0.003258)
> user system total real > Current Generator 16.630000 0.120000 16.750000 ( 16.878429) > Old callcc Generator 7.440000 32.720000 40.160000 ( 40.336902) > James's FasterGenerator 0.040000 0.000000 0.040000 ( 0.035432)
Off topic a bit, but what os/hardware are you using to get these numbers?
>> user system total real >> Current Generator 0.740000 0.720000 1.460000 ( 1.464659) >> Old callcc Generator 0.220000 1.500000 1.720000 ( 1.714859) >> James's FasterGenerator 0.010000 0.000000 0.010000 ( 0.003258)
>> user system total real >> Current Generator 16.630000 0.120000 16.750000 ( 16.878429) >> Old callcc Generator 7.440000 32.720000 40.160000 ( 40.336902) >> James's FasterGenerator 0.040000 0.000000 0.040000 ( 0.035432)
> Off topic a bit, but what os/hardware are you using to get these > numbers?
A dual processor G5 at 2.0 Ghz (each processor), running Mac OS X (10.4.4). The box has 2 Gigs of RAM.
user system total real Old callcc Generator 11.490000 17.390000 28.880000 ( 29.853396) lukfugl's FasterGenerator 0.650000 0.000000 0.650000 ( 0.694442)
My machine is obviously painfully slower than James', and it looks like my implementation isn't quite as quick as his, but it was fun figuring out a few tricks. Now I'm gonna go look at the new code and see if my implementation beats it too!
(I'll post my solution after the spoiler period expires tomorrow morning.)
On Sun, 2006-02-12 at 14:24 +0900, Jacob Fugal wrote: > On 2/10/06, Ruby Quiz <ja...@grayproductions.net> wrote: > > This week's Ruby Quiz is to write FasterGenerator, your own > > re-implementation of Generator with an eye towards working faster.
> Here's the benchmark results from my own implementation: [...]
Here's mine up to now. Being greedy I made two versions, because of the infinite block problem mentioned on list. These are with 10000 elements (the posted code) against the old Generator (I'm going to try the new one later on, I want to try a few more optimisations first):
On Feb 10, 2006, at 17:23 , James Edward Gray II wrote:
> On Feb 10, 2006, at 10:08 AM, Matthew Moss wrote:
>> Not being familiar with all the various Ruby packages and libs, I >> first want to thank ya for posting a good set of test cases that I >> can >> review, but was wondering if you also might post the code used to do >> the timing?
> I will, yes, on Sunday. :)
> I pulled down copies on the Generator library, before and after the > change. I then modified the class names so they could all > peacefully coexist, loaded them, and ran a trivial benchemark:
..snip..
> Benchmark.bmbm do |x| > x.report("Current Generator") do > generator = CurrentGenerator.new(enum) > tests.times { generator.next until generator.end? } > end
I have small problem with this. Won't this mean the loop will only run 1 time?
The first time it correctly loops, but the next time "generator.end?" will be true, and the following 999 runs will consequently does not execute generator.next at all, right?
If we change the row to
test.times do generator.rewind generator.next until generator.end? end
> I've heard complaints in the past that this library is slow. One > reason is that > it was implemented with continuations, which have performance > issues in the > current version of Ruby. "Was" is the keyword there though, > because I've just > learned that Generator was recently re-implemented. I learned some > good tricks > reading the new version, so let's try fixing Generator ourselves. > (No peeking > at the new version!)
> This week's Ruby Quiz is to write FasterGenerator, your own re- > implementation of > Generator with an eye towards working faster. (This is a small > library. My > version is 54 lines.) It is possible to go even faster than the new > implementation, with certain trade-offs:
With the same trade-offs as James' version(?) (i.e. no infinite block iterators) and with the hope that I don't have any bugs in the code, I get something like this on a G4 1.67Mhz (after changing the tests somewhat)
### next() ###
Rehearsal ----------------------------------------------------- Current Generator 20.360000 14.330000 34.690000 ( 51.788014) My Generator 0.080000 0.010000 0.090000 ( 0.116704) ------------------------------------------- total: 34.780000sec
user system total real Current Generator 23.200000 14.610000 37.810000 ( 53.145604) My Generator 0.080000 0.010000 0.090000 ( 0.129965)
(and this code)
8<----
tests = 10 enum = (1..1000).to_a
puts puts "### next() ###" puts
Benchmark.bmbm do |x| x.report("Current Generator") do generator = Generator.new(enum) tests.times { generator.rewind generator.next until generator.end? } end x.report("My Generator") do generator = MyGenerator.new(enum) tests.times { generator.rewind generator.next until generator.end? } end end
8<-----
I originally wanted to use the original values for tests and enum, but I got bored waiting.
Generator should suspend current calculation, and resume it when next() is called. (I uses @queue as array, but this is just for insurance. normally @queue should not contain multiple values)
//////////////////////////////////////////
require "test/unit" require "generator"
class TestGenerator < Test::Unit::TestCase
class C def value=(x) @value = x end def each loop do yield @value end end end
def test_realtime c = C.new g = Generator.new(c) 3.times do |i| c.value = i assert_equal(i, g.next()) end end
end
//////////////////////////////////////////
And python supports generator natively, this behaves like HEAD's Generator class.
def generator(): global value while True: yield value
g = generator() for i in xrange(3): value = i print g.next()
////////////////////////////////////////////
And this one is Java version.
abstract class CoRoutine { private final Thread _thread;
private final Object _lock = new Object();
private java.util.LinkedList _list = new java.util.LinkedList();
private boolean _terminated = false;
public CoRoutine() { final Object first_lock = new Object();
_thread = new Thread() { public void run() { synchronized(_lock) { synchronized (first_lock) { first_lock.notify(); }
> x = 1 > g = Generator.new do |g| > 5.times {|i| g.yield [x,i].max} > end
> until g.end? > x = 10 if g.current > 2 > puts g.next > end
> This outputs 1, 1, 2, 3, 10.
> The next question is, why is the API for this class so crappy? I would > have expected the output to be 1, 1, 2, 10, 10. But creating the > generator automatically advances to the first yield, and "next" advances > to the next one while returning the previous one. This is just wrong.
Actually, there's no way (I can think of) to get the output you expect, and it has nothing to do with an initial advance or anything. My implementation doesn't advance to the yields until required, yet has the same output. This is because of these two lines:
x = 10 if g.current > 2 puts g.next
g.current returns the current value. g.current is not greater than two until g.current == 3. g.next then returns the same as g.current, while also advancing the internal counter. The behavior your describing would require changing the value of g.current by assigning to x. Can't be done.
On Feb 12, 2006, at 19:23 , James Edward Gray II wrote:
> On Feb 12, 2006, at 11:10 AM, Jacob Fugal wrote:
>> Ok, since last night I've corrected my implementation to not evaluate >> ahead, as per H. Yamamoto's and Luke Blanshard's comments.
> Mine does, of course, pull all the items in at once and has all the > problems discussed with that approach. Attached is my > implementation and the altered versions I have been using in > benchmarks.
> James Edward Gray II
Mine looks like a clone of James':
class MyGenerator attr_reader :index def initialize(enum = nil) if enum then @array = enum.to_a else @array = Array.new yield self end @index = 0 end def current raise EOFError unless next? @array[@index] end def next value = current @index += 1 return value end def next? return @index < @array.length end def rewind @index = 0 self end def each(&block) @array.each(&block) end def yield(value) @array << value end def pos return @index end def end? return !next? end end
Jacob Fugal wrote: > On 2/12/06, Luke Blanshard <l...@blanshard.us> wrote:
>> x = 1 >> g = Generator.new do |g| >> 5.times {|i| g.yield [x,i].max} >> end
>> until g.end? >> x = 10 if g.current > 2 >> puts g.next >> end
>> This outputs 1, 1, 2, 3, 10.
>> The next question is, why is the API for this class so crappy? I would >> have expected the output to be 1, 1, 2, 10, 10. But creating the >> generator automatically advances to the first yield, and "next" advances >> to the next one while returning the previous one. This is just wrong.
> Actually, there's no way (I can think of) to get the output you > expect, and it has nothing to do with an initial advance or anything. > My implementation doesn't advance to the yields until required, yet > has the same output. This is because of these two lines:
> x = 10 if g.current > 2 > puts g.next
> g.current returns the current value. g.current is not greater than two > until g.current == 3. g.next then returns the same as g.current, while > also advancing the internal counter. The behavior your describing > would require changing the value of g.current by assigning to x. Can't > be done.
You are correct. My bad.
My point is that the API is pointlessly difficult. I think, actually, that the C# iterator API is the cleanest I've seen: there is one operation that attempts to advance to the next position, returning false if it has reached the end, and another that returns the current value, throwing an exception if it's still before the first advance.
And my other point is that, while you're going down the road of coroutines, you might as well go the whole way.
- passes the tests - runs a couple orders faster than the continuation generator using the provided bench - runs the generator block in a separate thread - intended to handle both stateful and stateless (functional) generator blocks - provides a Generator::stateless constructor that allows for the generator block to run for more than one callback to Generator#yield - is uncommented
I'd been wanting to mess with writing ruby extensions so I wrote the whole thing in c. I do realize one thing already: I'm using the 'entries' method to pull out what to iterate over instead of each. This comes from something that I'm totally clear on: The contract for the method is that we receive in an Enumrable object so can we assume that this method will be implemented since it's part of the Enumerable mixin? Anyways here's the code:
#include "ruby.h"
static ID id_entries; typedef struct _gen { int curr; VALUE values;
} Generator;
//Is there a built in way to do this? #define TEST(t) t?Qtrue:Qfalse
static VALUE t_init(int argc, VALUE *argv, VALUE self) { Generator* gen; Data_Get_Struct(self, Generator, gen); if(argc > 0) { VALUE arr = rb_funcall(argv[0], id_entries, 0); gen->values = arr; } else { VALUE arr = rb_ary_new(); gen->values = arr; rb_yield(self); } gen->curr = 0; return self;
}
static VALUE t_end_q(VALUE self) { Generator* gen; Data_Get_Struct(self, Generator, gen); int size = RARRAY(gen->values)->len; int curr = gen->curr; return TEST(curr >= size);
}
static VALUE t_next_q(VALUE self) { return TEST(!t_end_q(self));
}
static VALUE t_pos(VALUE self) { Generator* gen; Data_Get_Struct(self, Generator, gen); int curr = gen->curr; return INT2NUM(curr);
I haven't written much c in a long while (let alone ruby in c) so be kind. It doesn't seem to be much of a speed improvement at all over the same code written in straight ruby, but it was educational for me.
I did mess with trying to get infinite generators to work, but I cheated and looked at the code after a bit so I was tainted after that. Here is a test case I made however if it's of any use. (require 'matrix' of course):
def fib(n) (Matrix[[1,1],[1,0]]**(n-1))[0,0] end def test_fib g = Generator.new do |g| a, b = 0, 1 while true g.yield a a, b = b, a+b end end 100.times do |i| assert_equal(i, g.pos) assert_equal(fib(i), g.next) end end
Here's my solution. Mine is pretty much similar in spirit to Caleb's solution, except that we use different lock mechanisms. My first take was to replace the Continuation with a thread and a SizedQueue. That is, in the #initialize, I create a new thread with the given block (or the one I generate with the given enum), which will eventually be blocked when writing to the queue with the size of 1 in #yield. Once #next dequeues the head, the #yield thread continues, etc.
This passed James's TC_Generator test suite, but miserably failed on Luke's "shared state" test, although the code was supposed to handle the case.
It turned out, if SizedQueue of size one is the only blocking mechanism, you have two values waiting at the queue's door; one in the queue, the other right before the queue, waiting to be unlocked. This made the reponse to Luke's test 1, 1, 2, 3, 4 (and then 10, 10, 10, ... if I increase the repetition). I needed to make the thread stop right after it enqueued the value until #next consumes it.
My solution was to get rid of SizedQueue and to use a Mutex and a ConditionVariable to accomplish just that. At that point I saw Caleb's solution and thought that starting and stopping a thead should be much slower than using Mutexes and Cond_Vars. To my surprise, that wasn't the case. Mutex mechanism was much slower than Caleb's thread switching solution.
Anyways, here's the code. Benchmark follows the code (I ran on a slower notebook).
Thanks James for the nice quiz.
Jesse
#!/usr/bin/env ruby require 'thread'
class Generator include Enumerable
def initialize(enum = nil, &block) if enum @block = proc { |g| enum.each { |x| g.yield x } } else @block = block end
def yield(value) @q_access.synchronize { @queue << value @q_consumed.wait(@q_access) }
self end
def end?() Thread.pass while @queue.empty? && @thread.alive? @queue.empty? && !...@thread.alive? end
def next?() !end? end
def index() @index end
def pos() @index end
def next() if end? raise EOFError, "no more elements available" end ret = nil @q_access.synchronize { @index += 1 ret = @queue.shift @q_consumed.signal }
ret end
def current() if end? raise EOFError, "no more elements available" end
@queue.first end
def rewind() initialize(nil, &@block) if @index.nonzero?
On 2/12/06, Ross Bamford <ros...@roscopeco.co.uk> wrote:
> (Incidentally, Jacob, Under 1.8 yours was testing about twice as fast as > mine, but it seems to deadlock about half of the time? I have to > interrupt after leaving it for up to a minute. I'm on Ruby 1.8.4 > i686-linux).
Hmm, while I didn't run into any deadlocks with my testing, I'm not too surprised. I'm not very threading-savvy and could easily have made some critical mistake. :)
I also realized that the implementation I posted to the list doesn't rewind correctly. Currently it just drops back the @position marker, but doesn't reset the @values array or generating block. This is bad if the generator is rewound and the block should generate different values on different runs (e.g. it's time dependent or sensitive to the environment).