Idiomatic conversion of yielding block to array

David Brady

unread,

Sep 2, 2005, 4:15:11 PM9/2/05

to

So I have a function that generates like 300 lines of text and I want to
unit test it. If it fails, it barfs a dozen pages of text then says
there's a difference in there somewhere and we wish you the best of luck
trying to find it.

So, yeah.

I'd like to break this out into a line-by-line test. The following code
doesn't work because each_line is a generator function, not an array:

def test.to_html
# test the Calendar.to_html method
cal = Calendar.new(@date_test)
line_no = 0
@html_reference.each_line.zip(cal.to_html.each_line).each do |test_line|
line_no += 1
assert_equal( test_line[0], cal_line[1], "Calendar#to_html output
incorrect at line #{line_no}"
# Test::Unit will then emit the reference line and the failed test
line.
end

Okay, so I want these each_lines done up as arrays.

ref_lines = []; @html_reference.each_line {|line| ref_lines << line }
cal_lines = []; cal.to_html.each_line {|line| cal_lines << line }
line_no = 0
ref_lines.zip(cal_lines).each do |test_line|
line_no += 1
assert_equal( test_line[0], test_line[1], "Calendar.to_html
incorrect on line #{line_no}")
end

This smells very kludgy and procedural. I'm using an index/counter
variable, explicit Array declarations, and zipping two arrays together
only to split them apart again. Ugh!

Can someone help me make this more elegant? C++ has pooped Ruby's pants.

-dB

--
David Brady
ruby...@shinybit.com
C++ Guru. Ruby nuby. Apply salt as needed.

Levin Alexander

unread,

Sep 2, 2005, 4:26:17 PM9/2/05

to

David Brady <ruby...@shinybit.com> wrote:

> cal_lines = []; cal.to_html.each_line {|line| cal_lines << line }

require 'enumerator'
cal_lines = cal.to_html.to_enum(:each_line).to_a

-Levin

Simon Kröger

unread,

Sep 2, 2005, 5:56:11 PM9/2/05

to

You don't need the to_a but still zip will create 3 arrays.

interesting...

Simon

David Brady

unread,

Sep 2, 2005, 6:22:55 PM9/2/05

to

Levin Alexander wrote:

> require 'enumerator'
> cal_lines = cal.to_html.to_enum(:each_line).to_a
>
>

Innnteresting. This condenses the loop initialization into a single
line... but it's pretty hairy:

@html_reference.to_enum(:each_line).zip(cal.to_html.to_enum(:each_line)).each_with_index
do |test_line, line_no|
assert_equal( test_line[0], test_line[1], "Calendar#to_html
incorrect on line #{line_no+1}")
end

I don't think we've really cleaned much up, though. We've moved from
C++ pooping Ruby's pants to Ruby pooping its own pants. It's idiomatic,
but we haven't reached elegance yet.

Simon Kröger

unread,

Sep 2, 2005, 6:26:31 PM9/2/05

to

Simon Kröger wrote:
> You don't need the to_a but still zip will create 3 arrays.

this does not, but its still not very ruby like:
-------------------------------------------------------------
s1 = ['this', 'is', 'a', 'test'].join("\n")
s2 = ['this', 'is', 'the', 'test'].join("\n")

q1, q2 = SizedQueue.new(1), SizedQueue.new(1)
Thread.new {s1.each_line {|l| q1 << l}; q1 << nil}
Thread.new {s2.each_line {|l| q2 << l}; q2 << nil}

line_no = 0
while true
line_no += 1
t1, t2 = q1.pop, q2.pop
break if !t1 || !t2
puts "difference on line #{line_no}" if t1 != t2
end
-------------------------------------------------------------

perhaps better, but still kind of hardcore:

-------------------------------------------------------------
require 'enumerator'
require 'thread'

module Enumerable
def each_zip(other)
q = SizedQueue.new(1)
Thread.new {other.each{|l| q << l}}
self.each{|a| yield a, q.pop}
end
end

s1 = ['this', 'is', 'a', 'test'].join("\n")
s2 = ['this', 'is', 'the', 'test'].join("\n")

s1.to_enum(:each_line).enum_for(:each_zip,
s2.to_enum(:each_line)).each_with_index do |t, l|
puts "difference on line: #{l+1}" if t[0] != t[1]
end
-------------------------------------------------------------

food for thoughts...

Simon

Levin Alexander

unread,

Sep 2, 2005, 6:40:56 PM9/2/05

to

On 9/3/05, Simon Kröger <SimonK...@gmx.de> wrote:

> food for thoughts...

require 'generator'

s1 = %w{this is the first line}.join("\n")
s2 = %w{this is the second line}.join("\n")

compare = SyncEnumerator.new(s1.to_enum(:each_line), s2.to_enum(:each_line))
compare.each { |a,b|
# ...
}

-Levin

Simon Kröger

unread,

Sep 2, 2005, 6:49:18 PM9/2/05

to

Levin Alexander wrote:

Yes, perfekt, that's what i was looking for (and implemented :) )

the trivial solution is of course:

----------------------------------------------------------
s1 = ['this', 'is', 'a', 'test'].join("\n")
s2 = ['this', 'is', 'the', 'test'].join("\n")

(0...s1.size).inject(1) do |l, i|
puts "difference on line: #{l}" or break if s1[i] != s2[i]
l = (s1[i] == 10) ? l+1 : l
end
----------------------------------------------------------

buts that no fun..

Simon

Jacob Fugal

unread,

Sep 2, 2005, 6:54:32 PM9/2/05

to

Good heavens, no! Neither of those are thread safe. Criminy!

Why not something simple like so:

require 'enumerator'

def compare_by_line( expected, actual, message )
expected_lines = expected.to_enum(:each_line)
actual_lines = actual.to_enum(:each_line)
expected_lines.zip(actual_lines).each_with_index do |pair, index|
assert_equal( *pair, "#{message} on line #{index + 1}")
end
end

compare_by_line( @html_reference, cal.to_html, "Calendar.to_html incorrect" )

There's no need for threads...

Jacob Fugal

unread,

Sep 2, 2005, 6:59:36 PM9/2/05

to

On 9/2/05, Jacob Fugal <luk...@gmail.com> wrote:
> Good heavens, no! Neither of those are thread safe. Criminy!

(Sorry, for those in non-tree-threading clients, that was in reply to
ruby-talk:154785)

Jacob Fugal

Simon Kröger

unread,

Sep 2, 2005, 7:02:59 PM9/2/05

to

Jacob Fugal wrote:

because zip converts each argument to an array, which isn't what you
want if there are many lines.

> There's no need for threads...

True SyncEnumerator does it with continuations but i wasn't ready for
this mind bending stuff (yet). (and now i don't need to make knots in my
brain, because Levin was so nice to tell us about SyncEnumerator)

But what exactly isn't thread safe? SizedQueue is.

> Jacob Fugal

Threads aren't evil.

cheers

Simon

Jacob Fugal

unread,

Sep 2, 2005, 7:10:10 PM9/2/05

to

On 9/2/05, Simon Kröger <SimonK...@gmx.de> wrote:
> s1 = ['this', 'is', 'a', 'test'].join("\n")
> s2 = ['this', 'is', 'the', 'test'].join("\n")
>
> (0...s1.size).inject(1) do |l, i|
> puts "difference on line: #{l}" or break if s1[i] != s2[i]
> l = (s1[i] == 10) ? l+1 : l
> end

What's the second to last line for?

l = (s1[i] == 10) ? l+1 : l

That will always return false (since s1[i] is always a string, and
string/integer comparison is always false), so your line count will
always be 1.

Jacob Fugal

Simon Kröger

unread,

Sep 2, 2005, 7:18:40 PM9/2/05

to

Jacob Fugal wrote:

> [...]

> What's the second to last line for?
>
> l = (s1[i] == 10) ? l+1 : l
>
> That will always return false (since s1[i] is always a string, and
> string/integer comparison is always false), so your line count will
> always be 1.

s1[i] is in fact never a string:

str[fixnum] => fixnum or nil
http://www.ruby-doc.org/core/classes/String.html#M001328

so it is for counting newline chars. It could be written shorter

(s1[i] == 10) ? l+1 : l

without the assignment.

> Jacob Fugal

cheers

Simon

Jacob Fugal

unread,

Sep 2, 2005, 7:20:15 PM9/2/05

to

On 9/2/05, Simon Kröger <SimonK...@gmx.de> wrote:

> Jacob Fugal wrote:
> > Why not something simple like so:

> > ... <snip> ...

>
> because zip converts each argument to an array, which isn't what you
> want if there are many lines.

Ah, yes. I'd probably do it with SyncEnumerator rather than zip as
well, then[1]. I just used zip since its what you started with and you
didn't specify performance as an issue.

> True SyncEnumerator does it with continuations but i wasn't ready for
> this mind bending stuff (yet). (and now i don't need to make knots in my
> brain, because Levin was so nice to tell us about SyncEnumerator)

Very true. Thanks, Levin.

> But what exactly isn't thread safe? SizedQueue is.

Unless SizedQueue is built with threads in mind and does alternating
blocking on pushes (<<) and pops. If it does, I apologize.

Assuming I'm not wasting time by not looking at the SizedQueue
documentation, the problem is that there's no guarantee that the main
thread won't run several pops before returning control to the thread
that's pushing into the queue.

> Threads aren't evil.

No, but naive non-thread-safe code is. Well, it's broken, at least...
not necessarily *evil*. :)

Jacob Fugal

unread,

Sep 2, 2005, 7:21:41 PM9/2/05

to

On 9/2/05, Jacob Fugal <luk...@gmail.com> wrote:

> On 9/2/05, Simon Kröger <SimonK...@gmx.de> wrote:
> > But what exactly isn't thread safe? SizedQueue is.
>
> Unless SizedQueue is built with threads in mind and does alternating
> blocking on pushes (<<) and pops. If it does, I apologize.

Again, I need to proofread my posts better. That should read "Only if"
instead of "Unless".

Jacob Fugal

unread,

Sep 2, 2005, 7:27:22 PM9/2/05

to

On 9/2/05, Simon Kröger <SimonK...@gmx.de> wrote:

> Jacob Fugal wrote:
>
> > [...]
> > What's the second to last line for?
> >
> > l = (s1[i] == 10) ? l+1 : l
> >
> > That will always return false (since s1[i] is always a string, and
> > string/integer comparison is always false), so your line count will
> > always be 1.
>
> s1[i] is in fact never a string:

Ah, sorry, I missed the fact that you were iterating over characters
instead of lines. 10 == ?\n. Gotcha.

But isn't that different than the original functionality? He
originally wanted a warning (specifically assert_equal as part of
Test::Unit) for each line that differed. Your code will only match the
first such case then break out of the loop.

Jacob Fugal

Simon Kröger

unread,

Sep 2, 2005, 7:30:43 PM9/2/05

to

Jacob Fugal wrote:

It is a _sized_ queue wich i initailized to be of size 1, so there is
only 1 item max in the queue at any given time. This comes from a lib
called 'thread' so i would assume that it is in fact "built with threads
in mind". It blocks if there is an item in it and you want to write and
it block if there is no such item and you want to read.

cheers

Simon

Simon Kröger

unread,

Sep 2, 2005, 7:39:04 PM9/2/05

to

> [...]

> Ah, sorry, I missed the fact that you were iterating over characters
> instead of lines. 10 == ?\n. Gotcha.
>
> But isn't that different than the original functionality? He
> originally wanted a warning (specifically assert_equal as part of
> Test::Unit) for each line that differed. Your code will only match the
> first such case then break out of the loop.
>
> Jacob Fugal

hmm, true .. on the other hand assert_equal throws an
AssertionFailedError, so this is kind of breaking out of the loop also.
(But right, its not the same thing)

cheers

Simon

Levin Alexander

unread,

Sep 2, 2005, 7:57:08 PM9/2/05

to

On 9/2/05, Jacob Fugal <luk...@gmail.com> wrote:

> Ah, yes. I'd probably do it with SyncEnumerator rather than zip as
> well, then[1]. I just used zip since its what you started with and you
> didn't specify performance as an issue.

Performance would probably be the wrong motivation to switch to
SyncEnumerator. On my machine it is about 100 times slower then
Enumerable#zip.

sync-enum: 1.676
zip-enum: 0.018
sizedqueue: 0.226

-Levin

compare.bm.rb

Jacob Fugal

unread,

Sep 2, 2005, 8:05:23 PM9/2/05

to

On 9/2/05, Simon Kröger <SimonK...@gmx.de> wrote:

> hmm, true .. on the other hand assert_equal throws an
> AssertionFailedError, so this is kind of breaking out of the loop also.
> (But right, its not the same thing)

Indeed, I forgot about that as well. Most my programming has been in
PHP lately (*shudder*) and the testing framework I use continues past
failed assertions, so you can have many failed assertions in one test.
So if you replaced the puts "..." or break if ... with an assertEqual,
you'd pretty much have it right on. :) And since you fail at the first
mismatched character, your prior newlines are sure to have matched up.
Cool beans.

Jacob Fugal

unread,

Sep 2, 2005, 8:08:15 PM9/2/05

to

On 9/2/05, Levin Alexander <levin.a...@gmail.com> wrote:
> Performance would probably be the wrong motivation to switch to
> SyncEnumerator. On my machine it is about 100 times slower then
> Enumerable#zip.
>
> sync-enum: 1.676
> zip-enum: 0.018
> sizedqueue: 0.226

Yeah, but that performance is probably linear (again, I'm shooting
from my hip here). So in the extreme cases, SyncEnumerable would
outperform #zip. And even before it takes the lead in execution speed,
it definitely has an advantage in memory, which I was more concerned
about. SizedQueue however seems to be a much better choice than
SyncEnumerable for those situations.

Jacob Fugal.

Jacob Fugal

unread,

Sep 2, 2005, 8:09:07 PM9/2/05

to

On 9/2/05, Simon Kröger <SimonK...@gmx.de> wrote:
> It is a _sized_ queue wich i initailized to be of size 1, so there is
> only 1 item max in the queue at any given time. This comes from a lib
> called 'thread' so i would assume that it is in fact "built with threads
> in mind". It blocks if there is an item in it and you want to write and
> it block if there is no such item and you want to read.

Yeah, I should have looked into that before my first post. Again, my
apologies :)

Jacob Fugal

William James

unread,

Sep 2, 2005, 10:54:08 PM9/2/05

to

David Brady wrote:
> Levin Alexander wrote:
>
> > require 'enumerator'
> > cal_lines = cal.to_html.to_enum(:each_line).to_a
> >
> >
> Innnteresting. This condenses the loop initialization into a single
> line... but it's pretty hairy:
>
>
> @html_reference.to_enum(:each_line).zip(cal.to_html.to_enum(:each_line)).each_with_index
> do |test_line, line_no|
> assert_equal( test_line[0], test_line[1], "Calendar#to_html
> incorrect on line #{line_no+1}")
> end
>
> I don't think we've really cleaned much up, though. We've moved from
> C++ pooping Ruby's pants to Ruby pooping its own pants. It's idiomatic,
> but we haven't reached elegance yet.

class Array
# The differences between 2 arrays of equal size.
def dif( other )
( (0...self.size).to_a.zip( self ) -
(0...self.size).to_a.zip( other ) ).map{|x| x<<other[x[0]] }
end
end

@html_reference.to_enum(:each_line).dif(
cal.to_html.to_enum(:each_line) ).each{ |x|
# Line numbering starts with 1.
x[0] += 1
print "Incorrect at line "; puts x; puts
}

Simon Kröger

unread,

Sep 3, 2005, 7:23:18 AM9/3/05

to

Thanks for bringing this up.

The variant with each_zip in Enumerable needs only one thread:

user system total real
sync-enum: 1.594000 0.500000 2.094000 ( 2.110000)
zip-enum: 0.016000 0.000000 0.016000 ( 0.015000)
sizedqueue: 0.250000 0.000000 0.250000 ( 0.250000)
each_zip: 0.125000 0.000000 0.125000 ( 0.125000)

but no match against zip-enum in terms of speed.

Still interesting that threads are 10 times faster in this case than
continuations.

cheers

Simon

compare.bm.rb

Eric Hodel

unread,

Sep 4, 2005, 3:51:12 PM9/4/05

to

On 02 Sep 2005, at 13:15, David Brady wrote:

> So I have a function that generates like 300 lines of text and I
> want to unit test it. If it fails, it barfs a dozen pages of text
> then says there's a difference in there somewhere and we wish you
> the best of luck trying to find it.

Use unit_diff in ZenTest.

http://rubyforge.org/projects/zentest

--
Eric Hodel - drb...@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04