Performance

36 views
Skip to first unread message

Fonsan

unread,
Apr 7, 2011, 5:12:05 PM4/7/11
to spreadsheet
Hi

Installed the rack-perftools_profiler and it turns out that when using
spreadsheet with excel,calling methods on nil objects and catching the
exception is really expensive and so is running a regex on every
single cell looking for dates. I lost the exact figures but the nil
raising was about 50% time spent and regex was around 20 after that.

It was hard to find info on how to contribute to this project, and the
lack of github repo adds to that equation.

Heres a monkeypatch that I am using for now in a initalizer file.
Notice that this is not safe since I have no understanding of how
multiples work but they are not used in my use case. I am also
completely disabling the date search which could be undesired. There
should be a flag for that.

Since I am streaming the data to the client I end up with creating a
StringIO object in which I store the full string of the sheet. A nicer
implementation would be if the sheet would expose a IO object that I
could pass to send_file which would truly stream data to the client

Regards

module Spreadsheet
module Excel
module Writer
def write_cellblocks row
# BLANK ➜ 6.7
# BOOLERR ➜ 6.10
# INTEGER ➜ 6.56 (BIFF2 only)
# LABEL ➜ 6.59 (BIFF2-BIFF7)
# LABELSST ➜ 6.61 (BIFF8 only)
# MULBLANK ➜ 6.64 (BIFF5-BIFF8)
# MULRK ➜ 6.65 (BIFF5-BIFF8)
# NUMBER ➜ 6.68
# RK ➜ 6.82 (BIFF3-BIFF8)
# RSTRING ➜ 6.84 (BIFF5/BIFF7)
multiples, first_idx = nil
row = row.formatted
row.each_with_index do |cell, idx|
cell = nil if cell == ''
unless cell
write_blank(row, idx)
else

## it appears that there are limitations to RK precision,
both for
# Integers and Floats, that lie well below 2^30
significant bits, or
# Ruby's Bignum threshold. In that case we'll just write
a Number
# record
need_number = need_number? cell
if multiples && (!multiples.last.is_a?(cell.class) ||
need_number)
write_multiples row, first_idx, multiples
multiples, first_idx = nil
end
nxt = idx + 1
case cell
when NilClass
if multiples
multiples.push cell
elsif nxt < row.size && row[nxt].nil?
multiples = [cell]
first_idx = idx
else
write_blank row, idx
end
when TrueClass, FalseClass, Error
write_boolerr row, idx
when String
write_labelsst row, idx
when Numeric
## RK encodes Floats with 30 significant bits, which is
a bit more than
# 10^9. Not sure what is a good rule of thumb here, but
it seems that
# Decimal Numbers with more than 4 significant digits
are not represented
# with sufficient precision by RK
if need_number
write_number row, idx
elsif multiples
multiples.push cell
elsif nxt < row.size && row[nxt].is_a?(Numeric)
multiples = [cell]
first_idx = idx
else
write_rk row, idx
end
when Formula
write_formula row, idx
when Date, Time
write_number row, idx
end
end
end
write_multiples row, first_idx, multiples if multiples
end

end
end
end
class Workbook
def sanitize_worksheets sheets
found_selected = false
sheets.each do |sheet|
found_selected ||= sheet.selected
#sheet.format_dates!
end
unless found_selected
sheets.first.selected = true
end
sheets
end
end

Zeno Davatz

unread,
Apr 8, 2011, 2:16:19 AM4/8/11
to rubyspr...@googlegroups.com
Dear Fonsan

Thanks for the patch.

What I would be interested in is how you use spreadsheet. Can you
provide me with some more details about yourself and your project
where you use spreadsheet. Then I can understand better why you would
want that patch applied.

Alexandre Bini has a clone of spreadsheet in his git repo. So if we
find some common ground for some important use cases between the three
of us then we may be able to fasttrack a patch.

It would also help if you would paste your patch as a gist on github,
so we can start commenting on your patch there. Please link your gist
from github to this mailing list.

I like to see the identities of the people who contribute to spreadsheet.

Best
Zeno

Reply all
Reply to author
Forward
0 new messages