Joining multi-valued cells/splitting column problems

420 views
Skip to first unread message

olivia solis

unread,
Apr 10, 2017, 7:53:00 PM4/10/17
to OpenRefine
Hello all,

Sorry if this is a known problem, but I am having some difficulty combining some data within a project. I am dealing with some extent units and numbers within records. I have extent numbers in one column and units in another. The extent units vary and records with multiple numbers/units have each listed on a separate row in the record: 



I want to change some of the records, those that have "feet" and "inches" only (always in that order) to be listed on one row in terms of feet. These records are have an "x" in "ft_in_only". For example:

2     feet
6     inches

in the example above for id_001 would become

2.5    feet

My plan:
1) isolate records with just feet/inches as in units by filtering for "x" in a text facet on "ft_in_only".
2)  isolate records with "inches" in "extent_type"
3) convert extent to a data type of number and divide value by 12
4) Join these isolated rows with a separator of ", ".
5) Split the columns based on the separator.
6) Make sure values in extent_number corresponding to feet are numbers (not text) 
7) Add the two now split columns "extent_number 1" and "extent_number 2".
8) remove the rows with inches in the "unit" but nothing in "number"

Things aren't going according to plan.
Steps 1-3 went according to plan:

However, when I join the isolated cells using Edit cells--> Join multivalued cells with a separator of ", ", it joins all the cells in the column, not just the ones in the isolated rows. Is this expected? I thought a feature of OpenRefine (or LODRefine in my case) is that it edits only isolated rows. See that the isolated rows were joined:

But so were all the others:



Next, I tried isolating the rows with "x" (those with just "feet" and "inches" only in "Unit") and splitting the columns using Edit column --> Split into several columns. This also did not go as I expected. It indeed split the columns I wanted it to:




But it also deleted the values in every other cell in the column that was not isolated:


This seems like a bug. Is this a known problem? I see this workaround:

 https://groups.google.com/forum/#!topic/openrefine/gZXJztLOcYk

But this nonetheless seems like suspect behavior. While I'm using LODRefine, I have also noticed this behavior occurring in OpenRefine 2.6. Attaching my example data.


Thanks!

Olivia


Extents.csv

Thad Guidry

unread,
Apr 10, 2017, 8:47:36 PM4/10/17
to OpenRefine
Olivia,

From your screenshots, It looks like its flipping back to rows mode...instead of staying in records mode ?
Or is it that you forgot to change back to records mode after doing some all rows operations ?
Or is it magically changing to rows mode from records mode, after your doing some specific operations ?

-Thad



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thad Guidry

unread,
Apr 10, 2017, 8:53:07 PM4/10/17
to OpenRefine
and YES, OpenRefine performs mass edits ONLY to isolated cells, that you are faceted against.  Recognizing the difference between Rows mode and Records mode however ! :)

-Thad

olivia solis

unread,
Apr 10, 2017, 9:22:57 PM4/10/17
to OpenRefine
Hi Thad,
I may not be understanding how OpenRefine is supposed to work in this instance, but here is a screencast of my manipulations:

I am switching back and forth between records and rows depending on what I want to do. For instance, when I only want to change the inches for the selected records (those with "x"), I switch to rows mode. When I want to join the rows within a record, I switch back to records mode. I would think regardless OpenRefine shouldn't manipulate anything that is not isolated on screen.

Does this help?

Thanks!
Olivia

Thad Guidry

unread,
Apr 10, 2017, 11:48:18 PM4/10/17
to openr...@googlegroups.com
Split and Join for multivalued cells is for MULTIVALUED cells.  like "George W. Bush, Washington, D.C, US President"

Both operations are not intended to operate as your intending.  However, we probably could somehow perform some magic function with an enhancement like that. :)
The VIB-BITS extension might have something along those lines...since Herzog worked a lot with Bioinformatics records.  So you can certainly download that and play with it or ask him about it.

Your use case is a perfect fit for Transpose by Key/Value columns however !

1. Create the project.
2. Stay in Rows mode
3. Fill down on your column ID
4. Columnize by key column UNIT and value column NUMBER with no note column.
5. Perform operations to modify your FEET column and then your INCHES column.
6. Perform concatenation by adding a new column that adds the numeric values in column FEET and column INCHES.
    Example. cells.feet.value + cells.inches.value

Attached is your example Extents.csv with the Undo/Redo history of how I performed it, included in the OpenRefine project.

-Thad
Extents.openrefine.tar.gz

Thad Guidry

unread,
Apr 10, 2017, 11:53:45 PM4/10/17
to openr...@googlegroups.com
P.S.
You can give back by writing up a Recipe page for our Wiki, Or blog about your solution and tweet #OpenRefine rocks :)  Either way, helps the community knowledge base.

-Thad

Ettore Rizza

unread,
Apr 11, 2017, 5:37:47 AM4/11/17
to OpenRefine
Hi Olivia,

You're right, Open Refine manipulate only things isolated on screen. But there is a subtlety in this case: the "join multivalued cells" function always applies to all the values of the record for the column in question, even those that are not visible. To say it another way, the function only applies to the records displayed on the screen, ok, but on all its lines. Thad is right, you should use "columnize by key value" to get what you want. But here is a little screencast that shows how to do the same with the method you have chosen.

olivia solis

unread,
Apr 11, 2017, 9:51:41 AM4/11/17
to OpenRefine
Thank you both for your explanations! The columnize by key value is definitely something I can see a use for in cleaning up a lot of the data I work with, and it is making me rethink how I am dealing with my data. In general, Ettore's solution was how I was thinking because I need to export the units as well as the number as JSONs through the Export -> Templating feature. Some if the IDs will have multiple nested JSON records with extents subrecords: e.g. {"id":"5","extents": [{"number":"12","unit":"reels"},{"number": "3","unit":"photographic_prints"}...]}. However I will need to put each number/unit combo on the same row as the ID, so I could modify the template to include unit values based on the column header if I use columnize by key value. 

I can certainly post the solution Thad devised on your GitHub/recipes page.

I am still not clear on why OpenRefine deleted some of the values in the "Number" columns in the records that I was not displaying when I split the column. 

Thank you for pointing me to the VIB-bits plugin, Thad!

Best,
Olivia

Thad Guidry

unread,
Apr 11, 2017, 10:02:29 AM4/11/17
to OpenRefine
Your welcome,

VIB-BITS has a Pivot function that works similarly to Transpose with other options
It also has a visual selector Cross function...unlike our built-in Cross function with no visual selector other than the GREL expression box.

-Thad


Reply all
Reply to author
Forward
0 new messages