How to transform excel file that is close to a CSV, but includes global values

30 views
Skip to first unread message

Daniel White

unread,
Aug 9, 2017, 1:33:26 PM8/9/17
to OpenRefine
I have a ton of excel files in this format:

year: 2015
quarter: 1
otherGlobalValue: bla bla
 
headerA    headerB    headerC
valueA1    valueB1     valueC1
valueA2    valueB2     valueC2
valueA3    valueB3     valueC3

I need to convert that into something useable like:

headerA    headerB    headerC    year    quarter    otherGlobalValue
valueA1    valueB1     valueC1    2015    1             bla bla
valueA2    valueB2     valueC2    2015    1             bla bla
valueA3    valueB3     valueC3    2015    1             bla bla

Obviously the formats are a bit more complicated than that, but this is my boiled down example. "year", "quarter", and "otherGlobalValue" are consider global values that apply to every record, which works fine for visuals, but is not easily machine readable. If I can convert this format into a standard CSV then it would be great. Any ideas if this is something I can do within OpenRefine?

Thad Guidry

unread,
Aug 9, 2017, 3:37:41 PM8/9/17
to OpenRefine
1. Export your Excel files as CSV.
2. Use Transform -> Transform cells in rows into columns...  and set to value of 4 (or however many of those metadata lines the CSV file has)
3. Refine and polish the data as necessary.  (ask for more help here if need be)

-Thad

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ettore Rizza

unread,
Aug 9, 2017, 4:37:00 PM8/9/17
to OpenRefine
Hi Daniel, 

You can import all your Excel files at once and execute some transformations. I do not have time to explain all the steps, but here is an example of solution with two files (screencast).

Daniel White

unread,
Aug 10, 2017, 12:10:38 PM8/10/17
to OpenRefine
Thanks for your response!

I'm a little confused by the results.

1. Import the CSV, this works great


2. Transpose cells in rows into columns (value of 4)



Results:



Probably I didn't understand you correctly, did I do something wrong?

Thank you!
-Daniel



Thad Guidry

unread,
Aug 10, 2017, 1:21:58 PM8/10/17
to openr...@googlegroups.com
Ah, Daniel,

Your data is aligned differently than how you described... I see that now.

OK, so you really just have some metadata at the top 3 or 4 rows.  And your real data is after that.  Similar to others that have a need for keeping notes, etc

Going to lunch, but I'll be back with an answer...or someone else might beat me to it :)

-Thad

Thad Guidry

unread,
Aug 10, 2017, 2:49:00 PM8/10/17
to openr...@googlegroups.com
For that, we don't have a one-click way to perform the magic needed. But several steps can be done and then repeated as necessary with Applying the JSON operations to a new imported file, only if the structure is exactly identical and the metadata row/columns are identical  (I.E. like your example, 4 rows of metadata with 2 columns for the key and value).

When you have metadata in rows within certain columns, you basically just need COPIES of those columns first so that you can isolate them with facet by star... then perform other work on the data.  That is the basic approach for that kinda use case typically... just copy things...so that you don't lose the original column data. :)

1. Add 2 new columns based on your headerA and headerB and call them meta1 and meta2 (to hold your key/value metadata) based on the GREL expression of just value
2. Move meta1 and meta2 to the beginning columns
3. Star your rows 1 - 4
4. On meta 1 and meta2 do Edit cells -> common transformations - Blank out cells
5. unstar the rows
6. Columnize by key column meta 1 and value column meta 2
7. Fill down on columns year, quarter, otherGlobalValue
8. perform small individual cell edits on your year, quarter, otherGlobalValue to make it look refined and polished.

-Thad

Thad Guidry

unread,
Aug 10, 2017, 2:51:10 PM8/10/17
to openr...@googlegroups.com
sorry... forgot to tell you to facet on star... at step 3.5

Thad Guidry

unread,
Aug 10, 2017, 2:56:04 PM8/10/17
to openr...@googlegroups.com
For clarity...Its facet by star and click on "false" ... you basically want to blank out  cells in meta1 and meta2 that do NOT have the metadata... because you already have the value data in the other 3 header columns.

That should do it for you.

Daniel White

unread,
Aug 10, 2017, 4:30:00 PM8/10/17
to OpenRefine
Thank you for your help! Although I wish it was easier, I'm glad I didn't waste your time on a basic use case

Thad Guidry

unread,
Aug 10, 2017, 4:36:28 PM8/10/17
to OpenRefine
This is actually quite a common case....because business folks and analysts create those kind of file structures all the time in Excel.

We should probably create an enhancement one-click that could do a shift of cells value into new columns... basically...all your data rows (not the metadata rows) move those into new columns..leaving the meta rows.  Because that's really all your doing.

I'll create that enhancement request now.  Quite useful I think.


On Thu, Aug 10, 2017 at 3:30 PM Daniel White <lights...@gmail.com> wrote:
Thank you for your help! Although I wish it was easier, I'm glad I didn't waste your time on a basic use case

--

Thad Guidry

unread,
Aug 10, 2017, 4:58:38 PM8/10/17
to OpenRefine
DONE !


Thanks for the use case reminder Daniel !

Happy refining,
-Thad

Ettore Rizza

unread,
Aug 11, 2017, 2:00:06 AM8/11/17
to OpenRefine
Hello Daniel.

I think you've skipped some of the operations. My fault, the GIF was probably too fast. Recap:

1° Import all your Excel files at once into a single Open Refine project.

2° Uncheck the box "parse next 1 line (s) as column header"

3° Use a text filter in the column that contains the words "year", "quarter", etc., and select all the lines that contain these words using a regular expression: year|quarter|myOther... (the pipe operator | means OR)

4° Create a new column based on these filtered rows using this formula: 

value + "," + cells ['Column name that contains the value of year, quarter'].value


The goal is to concatenate year and 2015 like this : "year, 2015".

5° Move the newly created column to the end of the file.

6° Use "split into several columns" to reconstruct two columns based on this column.

7° Use "columnize by key-value" to create as many new columns as you had metadata.

8° Fill down on these new columns

9° Delete the cells in excess by using a "facet by blank" then a "All > remove all matching rows".

Hope this helps.

Disclaimer: If these explanations have saved you time, or if you want to see new features in Open Refine, a small donation is always welcome to pay the developers. ;)
Reply all
Reply to author
Forward
0 new messages