The `each`, `map` and `filter` functions use an enumeration pattern that is both extremely slow and incorrect when run on JRuby. The pattern is roughly:
```ruby
enum = object.each

object.size.times do
  yield(enum.next)
end
```

Because this calls `next` only as many times as there are elements in the enumerator, it never triggers `StopIteration`. As a result, the Fiber backing the enumerator is never cleaned up. Since JRuby maps Fibers to native threads, each such enumerator leaves a native thread alive, which can push the server to its thread limit. The Fiber also acts as a GC root, so any objects it references stay uncollected. A more correct pattern is:
```ruby
enum = object.each

begin
  loop do
    yield(enum.next)
  end
rescue StopIteration
end
```

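To see why the first pattern never raises `StopIteration`, here is a minimal standalone sketch that uses a plain Hash as a hypothetical stand-in for `object`. It also shows that `Kernel#loop` rescues `StopIteration` by itself, so the explicit `begin`/`rescue` in the pattern above is redundant, though harmless. This sketch does not attempt to reproduce the JRuby thread leak itself.

```ruby
# Hypothetical stand-in for `object`: any collection whose #each
# returns an external Enumerator when called without a block.
object = { 'one' => 'foo', 'two' => 'bar' }

# Pattern 1: call #next exactly object.size times.
enum = object.each
seen = []
object.size.times { seen << enum.next }

# Every element has been produced, yet no StopIteration has been raised;
# only one further #next call would raise it and finish the enumerator.
extra_call_raises =
  begin
    enum.next
    false
  rescue StopIteration
    true
  end

# Pattern 2: Kernel#loop rescues StopIteration on its own and exits quietly.
enum2 = object.each
seen2 = []
loop { seen2 << enum2.next }

p seen              # => [["one", "foo"], ["two", "bar"]]
p seen2 == seen     # => true
p extra_call_raises # => true
```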
The `loop` version avoids letting Fibers (and, on JRuby, their native threads) accumulate, but it is still very expensive. The most correct version is simply:
```ruby
object.each do |value|
  yield(value)
end
```

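If these functions live on a wrapper class, one way to apply this is to delegate `each` straight to the wrapped collection and let `Enumerable` build `map` and `filter` on top of it. The sketch below assumes a hypothetical `Wrapper` class; it is an illustration, not the project's actual code.

```ruby
# Hypothetical wrapper around a collection (not the project's real class).
class Wrapper
  include Enumerable # supplies map, filter, etc. in terms of #each

  def initialize(object)
    @object = object
  end

  # Internal iteration: hand each element straight to the caller's block.
  # No external Enumerator (and so no Fiber) is created on this path.
  def each(&block)
    # Without a block, return an Enumerator, as core Ruby classes do.
    return enum_for(:each) unless block

    @object.each(&block)
    self
  end
end

w = Wrapper.new([1, 2, 3])
doubled = w.map { |x| x * 2 }  # => [2, 4, 6]
evens   = w.filter(&:even?)    # => [2]
```

`Enumerable#map` and `Enumerable#filter` call `each` with a block internally, so they also stay on the Fiber-free path.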
I used the following benchmark to compare direct iteration with a block against the `loop`/`StopIteration` version:
```ruby
require 'benchmark'

# 5,000 small frozen hashes to iterate over.
a = 5000.times.map do
  {
    'one' => 'foo',
    'two' => 'bar',
    'three' => 'baz',
    'four' => 'quux',
  }.freeze
end

# A no-op proc that destructures each [key, value] pair.
my_proc = proc do |(k, v)|
end

Benchmark.bm do |x|
  # Internal iteration: each_pair with a block.
  x.report do
    10.times.each do
      a.each do |h|
        h.each_pair do |pair|
          my_proc.call(pair)
        end
      end
    end
  end

  # External iteration: Enumerator#next until StopIteration.
  x.report do
    10.times.each do |i|
      a.each_with_index do |h, j|
        enum = h.each_pair
        begin
          loop do
            my_proc.call(enum.next)
          end
        rescue StopIteration
        end
      end
    end
  end
end
```

with the following results:

```
❯ ruby test.rb
       user     system      total        real
   0.030373   0.000491   0.030864 (  0.030925)
   0.217605   0.018319   0.235924 (  0.235980)
❯ jruby test.rb
       user     system      total        real
   0.800000   0.020000   0.820000 (  0.091818)
  20.310000   2.370000  22.680000 ( 17.027952)
```

On MRI the wall-clock difference is about 7.6x (0.031 s vs. 0.236 s), close to an order of magnitude, but both versions are fast enough that the cost is negligible either way. On JRuby the wall-clock difference is more than 150x (0.092 s vs. 17.03 s).
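As a sanity check on the benchmark, a small sketch can confirm that both variants hand the proc the same pairs, so the timings compare like with like. It assumes a smaller, hypothetical data set with the same shape as `a`.

```ruby
# Smaller, hypothetical data set with the same shape as `a` in the benchmark.
data = Array.new(3) { { 'one' => 'foo', 'two' => 'bar' }.freeze }

# Variant 1: internal iteration, each_pair with a block.
internal = []
data.each do |h|
  h.each_pair { |pair| internal << pair }
end

# Variant 2: external iteration with Enumerator#next until StopIteration.
external = []
data.each do |h|
  enum = h.each_pair
  begin
    loop { external << enum.next }
  rescue StopIteration # mirrors the benchmark; Kernel#loop already rescues it
  end
end

p internal == external # => true
p internal.size        # => 6
```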