Cell Values return nil from malformed file

121 views
Skip to first unread message

Andy Davis

unread,
Jan 23, 2013, 9:53:14 AM1/23/13
to rubyspr...@googlegroups.com
I have a file that I suspect is somewhat invalid (it is produced by a third party analysis program).  

require 'spreadsheet'
ss = Spreadsheet.open('./tmp/analysis.summary.xls');nil
sheet = ss.worksheet 0
row = sheet.row 8
row[0]

The row will come back as an empty array.  If I open the file in Excel and then re-save it, then the row comes back with the correct data.

I am trying to read this spreadsheet as part of a zero-touch processing pipeline and was curious if anyone had encountered a similar issue and was able to work around it.

Andy

Zeno Davatz

unread,
Jan 23, 2013, 10:35:30 AM1/23/13
to rubyspr...@googlegroups.com
Dear Andy

Have you tried to open

analysis.summary.xls

with LibreOffice and save it using Libre/OpenOffice as MS XLS and then
try to run your script again. Do you still get the error? If so, can
you share the file?

Excel-Generating Softwares are a bit of a pain as they do not always
implement every bit of the Microsoft Excel Manual ;) and sometimes
take undocumented shortcuts.

Best
Zeno

Andy Davis

unread,
Jan 23, 2013, 10:57:29 AM1/23/13
to rubyspr...@googlegroups.com
I haven't tried it with Libre or Open Office, but certainly re-saving with Excel gives a file that works.  I am sure that it is the software generating the file that is to blame.  I am really trying to see if there's a way that I can do this without having to have a human step involved.

Unfortunately, I cannot post the file.

Andy

Javix

unread,
Jan 24, 2013, 8:43:55 AM1/24/13
to rubyspr...@googlegroups.com
Try to execute the below code to see if you get the right output (adapt it your file location):

# coding: utf-8
require 'spreadsheet'

Spreadsheet.open(File.join(folder, file_name)) do |book|
  puts "sheets: #{book.worksheets.size}"
  sheet = book.worksheet(0)   
  puts "First sheet name: #{sheet.name}"
end

Serguei

Andy Davis

unread,
Jan 24, 2013, 10:56:11 AM1/24/13
to rubyspr...@googlegroups.com
Frustrating thing is that I can do that and more:

wb = Spreadsheet.open('./tmp/analysis.summary.xls')
puts "Opened Spreadsheet, #{wb.sheet_count} sheets encountered"

(0..wb.sheet_count-1).each do |sheet_number|
  ws = wb.worksheet(sheet_number)
  puts "\tWorksheet #{ws.name} has #{ws.row_count} rows and #{ws.column_count} columns}"
end

This works fine.  I get the right number of sheets, the right titles for the sheets, and the right dimensions for all sheets.  It only seems to be once I get into the sheets that the rows are apparently empty.

Zeno Davatz

unread,
Jan 24, 2013, 11:23:48 AM1/24/13
to rubyspr...@googlegroups.com
Dear Andy

On Thu, Jan 24, 2013 at 4:56 PM, Andy Davis <andy....@koanhealth.com> wrote:
> Frustrating thing is that I can do that and more:
>
> wb = Spreadsheet.open('./tmp/analysis.summary.xls')
> puts "Opened Spreadsheet, #{wb.sheet_count} sheets encountered"
>
> (0..wb.sheet_count-1).each do |sheet_number|
> ws = wb.worksheet(sheet_number)
> puts "\tWorksheet #{ws.name} has #{ws.row_count} rows and
> #{ws.column_count} columns}"
> end
>
> This works fine. I get the right number of sheets, the right titles for the
> sheets, and the right dimensions for all sheets. It only seems to be once I
> get into the sheets that the rows are apparently empty.

What happens if you save the file

analysis.summary.xls

using Ruby and spreadsheet to a new name, like:

analysis.summary_.xls

And then you try to open it using spreadsheet.

Can you then reach the contents of the file?

Best
Zeno

Andy Davis

unread,
Jan 24, 2013, 11:52:51 AM1/24/13
to rubyspr...@googlegroups.com
That doesn't seem to help.  I can open the alternative file in excel and it looks correct, but whatever issues the original had are getting carried forward.

One thing I did notice is that the reported dimensions are almost correct.  Every sheet is off by one on the column count.  So, the first sheet has two columns, but column_count returns 1.  However, I do think that dimensions is correct:

Opened Spreadsheet, 7 sheets encountered
Worksheet 'Summary Statistics' has 77 rows and 1 columns}; and Dimensions: [0, 77, 0, 1]
Worksheet 'Summary Statistics' row 5: []
Worksheet 'Summary Statistics' cell(5, 0): 

That is produced by:

wb = Spreadsheet.open('./tmp/analysis.summary.alt.xls')

puts "Opened Spreadsheet, #{wb.sheet_count} sheets encountered"

ws = wb.worksheet(0)
puts "\tWorksheet '#{ws.name}' has #{ws.row_count} rows and #{ws.column_count} columns}; and Dimensions: #{ws.dimensions}"

row_index = 5
row = ws.row row_index
puts "\tWorksheet '#{ws.name}' row #{row_index}: #{row.to_s}"

puts "\tWorksheet '#{ws.name}' cell(#{row_index}, 0): #{ws.cell(row_index, 0)}"

Zeno Davatz

unread,
Jan 24, 2013, 12:56:46 PM1/24/13
to rubyspr...@googlegroups.com
Dear Andy

On Thu, Jan 24, 2013 at 5:52 PM, Andy Davis <andy....@koanhealth.com> wrote:

> One thing I did notice is that the reported dimensions are almost correct.
> Every sheet is off by one on the column count. So, the first sheet has two
> columns, but column_count returns 1. However, I do think that dimensions is
> correct:

This is interesting! Try playing around with the Sheet and column
numbers. You may then get some output with your original file.

Maybe the Software that generates the XLS files just numbers the
sheets in a slightly different order (or starts from a different
number), that spreadsheet gem does not recognize if you do not "step
on it". The binary formatting seems to be correct.

Does the file size change if you save your sample input file with OpenOffice?

Best
Zeno

Andy Davis

unread,
Jan 24, 2013, 5:06:08 PM1/24/13
to rubyspr...@googlegroups.com
Yes, significantly.

Original size is 2419712 bytes.  OO resave is 1449984 bytes.  Excel resave is 1398784 bytes.  I can see cell values in both the Excel and Open Office resaves.

BTW - I really appreciate your help on this.  I am sorry I can't post the spreadsheet.

Andy

Zeno Davatz

unread,
Jan 25, 2013, 2:31:48 AM1/25/13
to rubyspr...@googlegroups.com
Dear Andy

On Thu, Jan 24, 2013 at 11:06 PM, Andy Davis <andy....@koanhealth.com> wrote:
> Yes, significantly.
>
> Original size is 2419712 bytes. OO resave is 1449984 bytes. Excel resave
> is 1398784 bytes. I can see cell values in both the Excel and Open Office
> resaves.

How many worksheets does the original file have?

Best
Zeno

Javix

unread,
Jan 25, 2013, 4:37:08 AM1/25/13
to rubyspr...@googlegroups.com
1.Taking your post from the very beginning, you put nil just after getting the excel sheet (see marked in blue). Is it a typo ?
2. Try to get the details of the row as explained in the spreadsheet Guide (https://groups.google.com/forum/?hl=en&fromgroups=#!topic/rubyspreadsheet/vkHOulBnjMc):

require 'spreadsheet'

book = Spreadsheet.open '/path/to/an/excel-file.xls'
sheet1 = book.worksheet(0)
row = sheet1.row(3) # index (0-based):#getting the values
puts "Row values: #{row.inspect}"

And post what you have as result.

3. One more important details. DO NOT LEAVE YOUR file open when trying to read it or write to. ALWAYS CLOSE IT.

Regards

Zeno Davatz

unread,
Jan 25, 2013, 5:12:57 AM1/25/13
to rubyspr...@googlegroups.com
Dear Javix

On Fri, Jan 25, 2013 at 10:37 AM, Javix <s.ca...@gmail.com> wrote:
> 1.Taking your post from the very beginning, you put nil just after getting
> the excel sheet (see marked in blue). Is it a typo ?
> 2. Try to get the details of the row as explained in the spreadsheet Guide
> (https://groups.google.com/forum/?hl=en&fromgroups=#!topic/rubyspreadsheet/vkHOulBnjMc):
>
> require 'spreadsheet'
>
> book = Spreadsheet.open '/path/to/an/excel-file.xls'
> sheet1 = book.worksheet(0)
> row = sheet1.row(3) # index (0-based):#getting the values
> puts "Row values: #{row.inspect}"
>
> And post what you have as result.
>
> 3. One more important details. DO NOT LEAVE YOUR file open when trying to
> read it or write to. ALWAYS CLOSE IT.

Awesome, thanks for helping out here!

Best
Zeno

Andy Davis

unread,
Jan 25, 2013, 9:47:43 AM1/25/13
to rubyspr...@googlegroups.com
I appreciate your help as well.  To answer  your questions.

1. Not a typo.  In irb for Ruby 1.9.3, inspecting the workbook will exhaust the memory on my machine (starting from about 8GiB free).  The nil afterwards prevents irb from automatically inspecting the value.

2. That's pretty close to what I have posted, but here goes:

andy@Andy-MBP:Crucible $irb
1.9.3-p194 :001 > require 'spreadsheet'
 => true 
1.9.3-p194 :002 > wb = Spreadsheet.open('./tmp/analysis.summary.xls');nil
 => nil 
1.9.3-p194 :003 > sheet1 = wb.worksheet(0)
 => #<Spreadsheet::Excel::Worksheet:0x003ff71d085184 @row_addresses= @default_format= @selected= @dimensions= @name=Summary Statistics @workbook=#<Spreadsheet::Excel::Workbook:0x007fee3b1b3d30> @rows=[] @columns=[] @links={} @merged_cells=[] @protected=false @password_hash=0 @changes={} @offsets={} @reader=#<Spreadsheet::Excel::Reader:0x007fee3b1aa870> @ole=#<Ole::Storage::RangesIOMigrateable:0x007fee3b1b20e8> @offset=255158 @guts={} @rows[77]> 
1.9.3-p194 :004 > row = sheet1.row(3) # index (0-based):#getting the values
 => #<Spreadsheet::Excel::Row:0x003ff71d85f65c @default_format= @worksheet=#<Spreadsheet::Excel::Worksheet:0x007fee3a10a308> @idx=3 @formats=[] @height= @outline_level=0 @hidden=false []> 
1.9.3-p194 :005 > puts "Row values: #{row.inspect}"
Row values: #<Spreadsheet::Excel::Row:0x003ff71d85f65c @default_format= @worksheet=#<Spreadsheet::Excel::Worksheet:0x007fee3a10a308> @idx=3 @formats=[] @height= @outline_level=0 @hidden=false []>
 => nil 

And yes, the fourth row of the first spreadsheet does contain information.

Andy

Andy Davis

unread,
Jan 25, 2013, 9:49:57 AM1/25/13
to rubyspr...@googlegroups.com
And:
3. The file is not open in any other program.

Andy Davis

unread,
Jan 25, 2013, 9:53:08 AM1/25/13
to rubyspr...@googlegroups.com
There are seven sheets in the file.  The sheets, their names, and their dimensions are correctly reported by the gem:
Message has been deleted

Andy Davis

unread,
Jan 25, 2013, 12:09:18 PM1/25/13
to rubyspr...@googlegroups.com
We were able to generate a report based upon sample (not real) data.  It shows the issues that I have been having.
sample.xls

Zeno Davatz

unread,
Jan 25, 2013, 12:47:38 PM1/25/13
to rubyspr...@googlegroups.com
Dear Andy

Please link this sample file to your gist with the testing code.

Best
Zeno
<sample.xls>

Andy Davis

unread,
Jan 25, 2013, 1:40:09 PM1/25/13
to rubyspr...@googlegroups.com
OK, I created a gist that has my code and links to a downloadable file in the description:

Javix

unread,
Jan 25, 2013, 4:13:44 PM1/25/13
to rubyspr...@googlegroups.com
I tried the below code to read the supplied excel file and it works:

require 'spreadsheet'

book = Spreadsheet.open('sample.xls')
puts "file found" if book
puts "Sheets found: #{book.worksheets.size}"
book.worksheets.each do |sheet|
  puts "Got sheet: #{sheet.name}"
end

sheet1 = book.worksheets[0]
puts "Got the first sheet: #{sheet1.name}"
puts "cell 1: #{sheet1[2,1]}"

I can't figure out out why the standard way to read rows values like that:

sheet1.rows each { | row| puts row }

does not work.
It fails even when I tried just to count rows number, it was always 0:

sheet.rows.size

Regards 

Andy Davis

unread,
Jan 25, 2013, 4:28:40 PM1/25/13
to rubyspr...@googlegroups.com
Are you saying that the line:
puts "cell 1: #{sheet1[2,1]}"
outputs a value? 

Zeno Davatz

unread,
Jan 25, 2013, 4:41:19 PM1/25/13
to rubyspr...@googlegroups.com
Dear Javix

Am 25.01.2013 um 22:13 schrieb Javix <s.ca...@gmail.com>:

> I tried the below code to read the supplied excel file and it works:
>
> require 'spreadsheet'
>
> book = Spreadsheet.open('sample.xls')
> puts "file found" if book
> puts "Sheets found: #{book.worksheets.size}"
> book.worksheets.each do |sheet|
> puts "Got sheet: #{sheet.name}"
> end
>
> sheet1 = book.worksheets[0]
> puts "Got the first sheet: #{sheet1.name}"
> puts "cell 1: #{sheet1[2,1]}"
>
> I can't figure out out why the standard way to read rows values like that:
>
> sheet1.rows each { | row| puts row }
>
> does not work.
> It fails even when I tried just to count rows number, it was always 0:
>
> sheet.rows.size

I guess the reason is simple. The XLS generator is creating a new order of Workbook / Worksheet.

Best
Zeno

Javix

unread,
Jan 26, 2013, 6:26:41 AM1/26/13
to rubyspr...@googlegroups.com
I always read excel files in a 'normal' way as described in  spreadsheet Guide. But in case of sample.xls file, I also discovered that (after selecting all the cells in a sheet) and looking in the properties, they were locked. That's why I'm more convinced that the problem is in the supplied excel file and not in the way we read it.

Zeno Davatz

unread,
Jan 26, 2013, 6:59:54 AM1/26/13
to rubyspr...@googlegroups.com
Dear Javix

On Sat, Jan 26, 2013 at 12:26 PM, Javix <s.ca...@gmail.com> wrote:
> I always read excel files in a 'normal' way as described in spreadsheet
> Guide. But in case of sample.xls file, I also discovered that (after
> selecting all the cells in a sheet) and looking in the properties, they were
> locked. That's why I'm more convinced that the problem is in the supplied
> excel file and not in the way we read it.

Lets see if we can somehow get this insight into the the Guide.txt.

Did you see that the cells where locked by locking at the file in MS
Office / OpenOffice?

We still need a confirmation from Andy that it is now working for him.

Best
Zeno

Javix

unread,
Jan 26, 2013, 10:04:02 AM1/26/13
to rubyspr...@googlegroups.com
@Zeno: The cells seem to be locked in MS Excel as well in Excel for Mac.

Zeno Davatz

unread,
Jan 26, 2013, 10:14:54 AM1/26/13
to rubyspr...@googlegroups.com
Am 26.01.2013 um 16:04 schrieb Javix <s.ca...@gmail.com>:

> @Zeno: The cells seem to be locked in MS Excel as well in Excel for Mac.

Nice! I can't seem to see that in LibreOffice.

Best
Zeno

Andy Davis

unread,
Jan 26, 2013, 12:25:10 PM1/26/13
to rubyspr...@googlegroups.com
My situation has not changed.  I can see the dimensions of the sheet, but I cannot read any of the values.

Is there a way I can unlock the cells and therefore read the values?

Andy

Zeno Davatz

unread,
Jan 26, 2013, 1:08:52 PM1/26/13
to rubyspr...@googlegroups.com
Am 26.01.2013 um 18:25 schrieb Andy Davis <andy....@koanhealth.com>:

> My situation has not changed. I can see the dimensions of the sheet, but I cannot read any of the values.
>
> Is there a way I can unlock the cells and therefore read the values?

Ask the creator of the file that. It may be a setting in the software that creates the file.

Best
Zeno

Zeno Davatz

unread,
Jan 26, 2013, 1:09:56 PM1/26/13
to rubyspr...@googlegroups.com
Am 26.01.2013 um 18:25 schrieb Andy Davis <andy....@koanhealth.com>:

> My situation has not changed. I can see the dimensions of the sheet, but I cannot read any of the values.
>
> Is there a way I can unlock the cells and therefore read the values?

What is the name of the software that creates the file?

They must have some settings options as well.

Best
Zeno

Andy Davis

unread,
Jan 27, 2013, 10:08:18 PM1/27/13
to rubyspr...@googlegroups.com
It's a population medical claims analysis program.  The spreadsheet is a very small part of what they output.  Most of the output is in tab delimited text files.  However, they put some control information like, "your input completely baffles us, therefore we didn't do anything" into this spreadsheet.  While options abound for what the analysis does, there are none that control generating the spreadsheet.

I will follow up with the vendor.  You guys really went above and beyond for this one.  Thank you again.

Andy

Zeno Davatz

unread,
Jan 28, 2013, 2:30:10 AM1/28/13
to rubyspr...@googlegroups.com
Dear Andy

On Mon, Jan 28, 2013 at 4:08 AM, Andy Davis <andy....@koanhealth.com> wrote:
> It's a population medical claims analysis program. The spreadsheet is a
> very small part of what they output. Most of the output is in tab delimited
> text files. However, they put some control information like, "your input
> completely baffles us, therefore we didn't do anything" into this
> spreadsheet. While options abound for what the analysis does, there are
> none that control generating the spreadsheet.
>
> I will follow up with the vendor. You guys really went above and beyond for
> this one. Thank you again.

Lets see how easy it is for the vendor to unlock those cells.

Best
Zeno

Javix

unread,
Jan 28, 2013, 4:45:56 AM1/28/13
to rubyspr...@googlegroups.com
@Andy: In case if you will not be able to 'unlock' the cells, you can try to use ask to generate a CSV file and use CSV gem to parse the data, - even it is a little bit 'sado' solution. :)

Andy Davis

unread,
Jan 29, 2013, 2:12:54 PM1/29/13
to rubyspr...@googlegroups.com
With the supplied spread sheet:

andy@Andy-MBP:spreadsheet $ruby spreadsheet_test.rb 
sheets: 3
First sheet name: Sheet-1-persons
John
20.0

With the sample spreadsheet:
andy@Andy-MBP:spreadsheet $ruby spreadsheet_test.rb 
sheets: 7
First sheet name: Summary Statistics


On Fri, Jan 25, 2013 at 9:37 AM, Javix <s.ca...@gmail.com> wrote:
Try to run the below script with the attached excel file. If it works fine, try to put your excel file in the same folder as the below ruby file to be executed:

require 'spreadsheet'

Spreadsheet.open(File.join(File.dirname(__FILE__),'excel_test.xls')) do |book|
  puts "sheets: #{book.worksheets.size}"
  sheet = book.worksheet(0)   
  puts "First sheet name: #{sheet.name}"
  row = sheet.row(1)
  row.each { |cell| puts cell}
end

And see what happens.

--
You received this message because you are subscribed to the Google Groups "spreadsheet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyspreadshe...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.
 
 

Javix

unread,
Feb 1, 2013, 8:06:46 AM2/1/13
to rubyspr...@googlegroups.com
So we can resume that it worked for you ?

Andy Davis

unread,
Feb 1, 2013, 10:01:08 AM2/1/13
to rubyspr...@googlegroups.com
No, it did not work for me.

For the spreadsheet that comes from my program, I have always been able to read the worksheet names, dimensions, and total count, but never any cell values.  

For spreadsheets that come from Open Office, Excel, or the one that you posted, I am able to read everything including values.

I don't think that the spreadsheet gem is the problem.  I have concluded that the issue is entirely with the program that generates the spreadsheet that I cannot read.

Javix

unread,
Feb 4, 2013, 6:56:45 AM2/4/13
to rubyspr...@googlegroups.com
Hi Andy, I've managed to read the values from the supplied sample.xls file with win32ole library (http://ruby-doc.org/stdlib-1.9.3/libdoc/win32ole/rdoc/WIN32OLE.html) as follows:

#encoding: utf-8

#See more details and examples at
#http://rubyonwindows.blogspot.com/2007/03/automating-excel-with-ruby-workbook.html
#http://rubyonwindows.blogspot.com/2007/04/automating-excel-with-ruby-worksheet.html
#http://rubyonwindows.blogspot.com/search/label/excel

require 'win32ole'

file = 'C:\Documents and Settings\Username\ruby_drafts\lib\sample.xls'
excel = WIN32OLE.new('Excel.Application')
workbook = excel.Workbooks.Open(file)

#selecting a sheet by name
worksheet = workbook.Worksheets(1)
worksheet.select    

rowcount = worksheet.UsedRange.Rows.Count

section_index = nil
puts "rows count: #{rowcount}"
for row in 1..rowcount do
  #referencing the cell by column index: worksheet.Cells(row,2) or as below, by its letter
  if(worksheet.Cells(row,'B').Value)   
      puts "found the text at row: #{row} => #{worksheet.Cells(row,'B').Value}"               
  end
end

Hope this helps.
Regards

Zeno Davatz

unread,
Feb 4, 2013, 8:28:57 AM2/4/13
to rubyspr...@googlegroups.com
Awesome! Thanks for the Tip!

Best
Zeno

Andy Davis

unread,
Feb 4, 2013, 1:51:57 PM2/4/13
to rubyspr...@googlegroups.com
I appreciate your ongoing efforts.

I expect that would work, but unless I am mistaken, this solution would be limited to Windows boxes with Excel installed.  Does that sound correct to you?

Andy

Javix

unread,
Feb 4, 2013, 3:06:23 PM2/4/13
to rubyspr...@googlegroups.com
Unfortunately, yes, you are right.
Reply all
Reply to author
Forward
0 new messages