Hi all,
I've been experimenting with improving Ohm's index handling with
pipelining. I've had particular success with
Ohm::Model#add_to_indices, the core of which I've wrapped in a
pipeline as follows:
def add_to_indices
  db.pipelined do
    indices.each do |att|
      next add_to_index(att) unless collection?(send(att))
      send(att).each { |value| add_to_index(att, value) }
    end
  end
end
I've also tried to do something similar with #delete_from_indices, but
with less success. Here are some numbers from a benchmark run on a
project I'm working on. Amongst other things, this benchmark creates
2000 instances of models subclassing Ohm::Model, hits #add_to_indices
3000 times, and #delete_from_indices 1000 times:
Rehearsal ---------------------------------------------------------------------
no pipelining                       7.860000   1.880000   9.740000 ( 13.856807)
add_to_indices pipelining           7.280000   1.620000   8.900000 ( 12.299852)
delete_from_indices pipelining      7.690000   2.140000   9.830000 ( 13.957360)
no pipelining 2nd run               7.810000   2.000000   9.810000 ( 13.946544)
add and delete indices pipelining   7.280000   1.840000   9.120000 ( 12.613177)
----------------------------------------------------------- total: 47.400000sec

                                        user     system      total        real
no pipelining                       7.470000   2.130000   9.600000 ( 13.590801)
add_to_indices pipelining           7.470000   1.630000   9.100000 ( 12.634070)
delete_from_indices pipelining      7.770000   2.060000   9.830000 ( 13.957237)
no pipelining 2nd run               7.830000   2.050000   9.880000 ( 14.006379)
add and delete indices pipelining   7.300000   1.780000   9.080000 ( 12.569478)
What these numbers seem to show is a 7-11% performance improvement
when #add_to_indices is pipelined. This is probably because
#add_to_indices spends its entire time generating a bunch of SADD
commands (via #add_to_index), which is a primary use case for
pipelining in Redis. Applying pipelining to #delete_from_indices
doesn't seem to show any benefit, and may in fact be slowing things
down. I've run the above benchmark a number of times and the results
are pretty consistent.
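The intuition behind the SADD result can be sketched with a toy cost
model: unpipelined, every command pays a full network round trip,
while a pipeline pays one round trip for the whole batch. The latency
figures below are made-up illustrative numbers, not measurements:

```ruby
# Toy cost model of pipelining (hypothetical latencies, in microseconds).
ROUND_TRIP_US = 500 # assumed client<->Redis round-trip cost
COMMAND_US    = 50  # assumed per-command processing cost

# Each unpipelined command pays its own round trip.
def unpipelined_cost(n_commands)
  n_commands * (ROUND_TRIP_US + COMMAND_US)
end

# A pipeline pays one round trip for the whole batch.
def pipelined_cost(n_commands)
  ROUND_TRIP_US + n_commands * COMMAND_US
end

# Eight SADDs, e.g. from a class with four indices:
puts unpipelined_cost(8) # => 4400
puts pipelined_cost(8)   # => 900
```

Note that in this model a "pipeline" of a single command costs exactly
the same as the unpipelined command, which is why the gain should grow
with the number of indices.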
More targeted benchmarking is needed, particularly to measure
performance against the number of indices. The project I'm working on
has a mix: one class has four indices (=> 8 SADDs), which I suspect
benefits the most. Classes with only one or two indices will only
generate a few SADDs, so the benefit of putting these in a pipeline
might be outweighed by the overhead of setting up the pipeline.
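A skeleton for that index-count benchmark, in the same Benchmark.bmbm
style as the runs above, might look like the following. The report
blocks are stubbed with no-op lambdas so the skeleton runs standalone;
in a real run they'd create instances of hypothetical one-index and
four-index Ohm model classes:

```ruby
require "benchmark"

# Stubs standing in for real Ohm model creation, e.g.
# OneIndexModel.create(...) / FourIndexModel.create(...) (hypothetical names).
create_one_index_model  = -> { } # would hit #add_to_indices with 1 index
create_four_index_model = -> { } # would hit #add_to_indices with 4 indices

Benchmark.bmbm do |b|
  b.report("one index")    { 2000.times { create_one_index_model.call } }
  b.report("four indices") { 2000.times { create_four_index_model.call } }
end
```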
Conclusion: pipelining Ohm::Model#add_to_indices looks like a win,
particularly for models with large numbers of indices, but more
testing is required to ensure single index models don't suffer.
If anyone wants to try the benchmarks I've used above, take a look at
https://github.com/LichP/Porp, check out the pipeline-bench tag, and
poke around in the benchmarks directory. WARNING: the benchmark
scripts will flush Redis DB 0 by default, so be careful where you run
them.
--
Phil Stewart