Importing multiple Excel workbooks, each with multiple sheets

Colin Van Alstine

Nov 13, 2017, 11:35:06 AM
to OpenRefine
Hello,

I'm having trouble working with a set of spreadsheets that I'm processing. During the import process, I am selecting several Excel files that contain the data I want to process (four in the example below, seventeen in total for this project).


Each file has a variable number of sheets, from 1 to 15 per file. On the "Configure Parsing Options" screen, I can select the specific sheets that I want to import, using checkboxes. Note that I'm only being prompted for the sheets in the first of the workbooks, which has four sheets.


After the import is successful, I can do a text facet on the filename column and I see four sheets per file.


Unfortunately, this does not match the contents of the Excel files. Each of the other files has eight sheets, but Refine only brought in the first four.


After a little testing, I found that if I select fewer sheets, each file only brings in that many sheets (select three in Parsing Options, get three per file), discarding any additional sheets that might be in the workbook. Additionally, if I select four sheets and one of the files doesn't have that many sheets, the import process will hang and not complete.


There are some manual steps to the cleanup that I'm doing in Refine, so it won't be much of an issue to import each of the seventeen spreadsheets in turn.  I wanted to bring this to folks' attention to a) report this as a bug or b) have someone point out a solution.  I appreciate your feedback!

-Colin


Thad Guidry

Nov 13, 2017, 1:12:05 PM
to openr...@googlegroups.com


> After a little testing, I found that if I select fewer sheets, each file only brings in that many sheets (select three in Parsing Options, get three per file), discarding any additional sheets that might be in the workbook.

This is expected and how we designed the handling. 

> Additionally, if I select four sheets and one of the files doesn't have that many sheets, the import process will hang and not complete.

Of course, that makes sense: you're telling OpenRefine that the file has four sheets, but that was not the case.

I guess you would probably like a better error message rather than OpenRefine just hanging, right? If so, we can certainly improve that experience so that it does not hang but instead gives you a clearer error message and returns to the importer dialog.

Let me know and I can open an issue for you on that last point, to provide a proper error message and not just hang.

Colin Van Alstine

Nov 13, 2017, 1:55:26 PM
to OpenRefine
Thad,

What you are saying makes sense for importing a single Excel workbook containing multiple sheets. The situation that I'm describing is when you have multiple workbooks, each one containing a variable number of sheets. I've attached four of the workbooks that I am processing as an example. If you create a new project and select the four files from your computer, when you get to Configure Parsing Options, you are only presented with the names of the sheets from the first workbook. In my case, those are the sheets in Al-Ahram, of which there are only four.

I'm not given an option to select more sheets, even though Alamphon and Al-chark both have eight sheets each.  I put Concert Record (with only two sheets) as an example, because when you try to create the project, having selected the four sheets that are present in Al-Ahram, the project won't be created - it will just hang.  Try it with all four first to see the hanging, then try to create a project with just Al-Ahram, Alamphon and Al-chark.  For the three files, my goal is to see the data from all 20 sheets in Refine.  I'm currently only able to create a project with 16 sheets.
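
For anyone checking what a full import should contain, a small script can list the sheets in each workbook before it goes into Refine. The sketch below is a minimal one using only the Python standard library. It assumes the usual .xlsx layout, in which the sheet list lives in xl/workbook.xml inside the zip archive, and the function names list_sheet_names and count_sheets are made up for illustration. It does not touch OpenRefine itself.

```python
import zipfile
import xml.etree.ElementTree as ET


def list_sheet_names(xlsx_path):
    """Return the sheet names of an .xlsx workbook, in workbook order."""
    # An .xlsx file is a zip archive; this assumes the sheet list is
    # stored in xl/workbook.xml, which is the usual location.
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Compare local tag names only, so the XML namespace does not matter.
    return [
        element.attrib["name"]
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "sheet"
    ]


def count_sheets(paths):
    """Map each workbook path to the number of sheets it contains."""
    return {path: len(list_sheet_names(path)) for path in paths}
```

Calling count_sheets(["Al-Ahram.xlsx", "Alamphon.xlsx", "Al-chark.xlsx"]) on the attached files would give a per-file sheet count to compare against a text facet on the filename column after import.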

Thanks,
Colin
Alamphon.xlsx
Al-chark.xlsx
Concert Record.xlsx
Al-Ahram.xlsx

qi cui

Nov 13, 2017, 3:26:15 PM
to OpenRefine
Colin,
I just tried it. It seems to be a bug. Would you mind registering an issue at:

We will be working on that soon.

Regards
Jacky

Ettore Rizza

Nov 14, 2017, 5:33:51 AM
to OpenRefine
If you plan to work on this possible bug, it would be a good idea to consider adding a "select all the sheets" checkbox. It is not uncommon for OCR tools to produce Excel files containing several dozen sheets.

But we will probably talk about that again when the issue is created.

Colin Van Alstine

Nov 14, 2017, 10:13:42 AM
to OpenRefine
Thank you Jacky and Ettore.  I've created an issue here: https://github.com/OpenRefine/OpenRefine/issues/1328  Please tag and add any context you feel would be helpful.  I appreciate you considering this and I'm excited to see all the updates to the refine milestones!

-Colin

qi

Nov 16, 2017, 2:15:35 PM
to OpenRefine
Colin, this will be released with version 2.7.2. If you cannot wait, you can check whether the fix has been merged and try the tip of the repo.

Jacky