On Wed, Dec 26, 2012 at 2:47 AM, Dylan Herman <d.her...@gmail.com> wrote:
> Presently, I am building a file system crawler using the spreadsheet gem and
> receive the following error when I run the script:
>
> /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/ruby-ole-1.2.11.6/lib/ole/storage/base.rb:377:in `validate!': OLE2 signature is invalid (Ole::Storage::FormatError)
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/ruby-ole-1.2.11.6/lib/ole/storage/base.rb:369:in `initialize'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/ruby-ole-1.2.11.6/lib/ole/storage/base.rb:110:in `new'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/ruby-ole-1.2.11.6/lib/ole/storage/base.rb:110:in `load'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/ruby-ole-1.2.11.6/lib/ole/storage/base.rb:77:in `initialize'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/ruby-ole-1.2.11.6/lib/ole/storage/base.rb:83:in `new'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/ruby-ole-1.2.11.6/lib/ole/storage/base.rb:83:in `open'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/spreadsheet-0.7.5/lib/spreadsheet/excel/reader.rb:1179:in `setup'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/spreadsheet-0.7.5/lib/spreadsheet/excel/reader.rb:121:in `read'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/spreadsheet-0.7.5/lib/spreadsheet/excel/workbook.rb:32:in `open'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/spreadsheet-0.7.5/lib/spreadsheet.rb:62:in `open'
>   from /Users/Dylan/.rvm/gems/ruby-1.9.3-p327/gems/spreadsheet-0.7.5/lib/spreadsheet.rb:68:in `open'
>   from sscrawler.rb:9:in `block in <main>'
>   from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:41:in `block in find'
>   from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `catch'
>   from /Users/Dylan/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `find'
>   from sscrawler.rb:7:in `<main>'
>
> Below is the code to my crawler:
>
> require 'find'
> require 'spreadsheet'
> Spreadsheet.client_encoding = 'UTF-8'
>
> count = 0
>
> Find.find('/Users/Dylan/') do |file|  # '/' for root directory on OS X
>   if file =~ /\b\.xls\b/              # check if filename ends in desired format
>     contents = Spreadsheet.open(file).worksheets
>     contents.each do |row|
>       if row =~ /\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})\b/
>         puts file
>         count += 1
>       end
>     end
>   end
> end
>
> puts "#{count} sensitive files were found"
>
> Please let me know your thoughts. Thanks!
The answer is here:
http://stackoverflow.com/questions/3321011/parsing-xls-and-xlsx-ms-excel-files-with-ruby/14072528#14072528
and here:
https://gist.github.com/4399750
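
In short, and only as a rough sketch rather than the exact code behind those links: the trace shows ruby-ole rejecting a file whose first bytes are not the OLE2 signature. One plausible cause, which I am assuming here, is that the crawler reaches files or directories whose names match the .xls pattern but which are not real Excel workbooks. Separately, the loop in the script iterates over worksheets rather than rows, so the =~ test never looks at cell contents. The helpers below, ole2_file? and row_matches?, are hypothetical names of my own and use only the standard library:

```ruby
# Magic bytes at the start of every OLE2 (Compound File Binary) container,
# the format used by legacy .xls workbooks.
OLE2_SIGNATURE = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1].pack('C*').freeze

# Hypothetical helper: true if +path+ is a regular file whose first eight
# bytes are the OLE2 signature. Directories, empty files and unreadable
# files all yield false.
def ole2_file?(path)
  return false unless File.file?(path)
  header = File.open(path, 'rb') { |f| f.read(OLE2_SIGNATURE.bytesize) }
  header == OLE2_SIGNATURE
rescue SystemCallError
  false # e.g. permission denied, or the file vanished mid-crawl
end

# Hypothetical helper: true if any cell in +row+ (anything Array-like)
# matches +pattern+ once the cell value is converted to a string.
def row_matches?(row, pattern)
  row.any? { |cell| cell.to_s =~ pattern }
end
```

In the crawler this would mean calling Spreadsheet.open(file) only when ole2_file?(file) is true, then walking each worksheet's rows and testing row_matches?(row, pattern) instead of matching the worksheet object itself. Wrapping the open call in a rescue for Ole::Storage::FormatError would be an alternative way to skip such files.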
Best
Zeno