Issue with Columnize by key/value column

86 views
Skip to first unread message

Melanie Belisle

unread,
Jul 18, 2017, 11:00:04 AM7/18/17
to OpenRefine
Hi,

I imported data from several data sets and get something like this:

I need each source file to represent one column with the data in Column2 being the corresponding data in each column.  I applied "Columnize by key/value column to Column1, expecting to get something like this:



but instead, I got this:


I would like to know if this is a bug, or if I've just misused/misunderstood Columnize by key/value.  Any suggestions on how to get the result in the second screen shot?

Merci,

Melanie

Owen Stephens

unread,
Jul 18, 2017, 12:55:12 PM7/18/17
to OpenRefine
Hi Melanie,

I *think* this is a bug - I think my expectation is the same as yours in terms of what outcome I'd expect here. 
There was an issue with key/value transponse reported a while back, but that was fix. I don't know if this is a variation on this issue (https://github.com/OpenRefine/OpenRefine/issues/796) or something different.

I'd suggest creating a new Issue at https://github.com/OpenRefine/OpenRefine/issues with the information you've put in your post here (what happened and what you expected to happen). Then someone can look and confirm whether this is a bug.

Owen

Melanie Belisle

unread,
Jul 19, 2017, 9:10:06 AM7/19/17
to OpenRefine
Cheers Owen.  Will do.

Mélanie

Also, edit from my original post : Any suggestions on how to get the result in the **first** screen shot?

Owen Stephens

unread,
Jul 20, 2017, 4:39:28 AM7/20/17
to OpenRefine
It seems tricky I'm afraid.

Depending on the size of your data set it might be easier to do this in another tool (e.g. export to a spreadsheet then just move the values in the incorrectly aligned columns up/down)

If you want to do it in OpenRefine the only way I've come up with is to force OR to treat this dataset as a single record, then to join/split the cells in your columns. You can do this by:

Adding a new column (use 'Add column based on this column') with the same value (e.g. "1") in all cells. I'll call this column 'record'
Move this column to be the first column in the project, and then use 'Edit cells->Blank down' to remove all values except in the first cell
Force OR into Records mode by clicking the 'Records' link
Then on each of your columns use 'Edit cells->Join multivalued cells', with an appropriate separator (e.g. | )
Use 'Edit cells->Split multi-valued cells' on the columns
Remove your 'records' column

Ettore Rizza

unread,
Jul 20, 2017, 12:28:55 PM7/20/17
to OpenRefine
Definitely a bug. If you do not want to fix it in a spreadsheet, as Owen recommends, here's a way to do it in OpenRefine. The "ff" column I have created for the demo only serves to create a record, which then allows join and split multivalued cells. 

Melanie Belisle

unread,
Jul 21, 2017, 4:53:33 PM7/21/17
to OpenRefine
Thanks both for your replies.  It was a lot of data, so I ended up just writing up a python script to format it.

Hopefully the bug is fixed soon..

Cheers

Antonin Delpeuch (lists)

unread,
Jul 22, 2017, 10:03:30 AM7/22/17
to openr...@googlegroups.com
Bonjour Mélanie,

It would be very helpful if you could provide a small example dataset
(as a .csv for instance) where the bug occurs. This will help the dev
team to find a fix for your bug.

Merci!
Antonin

On 21/07/2017 20:08, Melanie Belisle wrote:
> Thanks both for your replies. It was a lot of data, so I ended up just
> writing up a python script to format it.
>
> Hopefully the bug is fixed soon..
>
> Cheers
>
> On Thursday, July 20, 2017 at 12:28:55 PM UTC-4, Ettore Rizza wrote:
>
> Definitely a bug. If you do not want to fix it in a spreadsheet, as
> Owen recommends, here's a way to do it in OpenRefine. The "ff"
> column I have created for the demo only serves to create a record,
> which then allows join and split multivalued cells.
>
> <https://lh3.googleusercontent.com/-X7WBYTuXZbA/WXDaC_VQNuI/AAAAAAAAUIQ/iiYU3uQp-zcJiCy0DtVuyxatrunVlgURQCLcBGAs/s1600/screencast.gif>
>
>
>
> Le mardi 18 juillet 2017 17:00:04 UTC+2, Melanie Belisle a écrit :
>
> Hi,
>
> I imported data from several data sets and get something like this:
>
> I need each source file to represent one column with the data in
> Column2 being the corresponding data in each column. I applied
> "Columnize by key/value column to Column1, expecting to get
> something like this:
>
>
>
> but instead, I got this:
>
>
> I would like to know if this is a bug, or if I've just
> misused/misunderstood Columnize by key/value. Any suggestions
> on how to get the result in the second screen shot?
>
> Merci,
>
> Melanie
>
> --
> You received this message because you are subscribed to the Google
> Groups "OpenRefine" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to openrefine+...@googlegroups.com
> <mailto:openrefine+...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Ettore Rizza

unread,
Jul 22, 2017, 10:12:31 AM7/22/17
to OpenRefine
Salut Antonin, 

I did a test with the same data as Melanie (file attached) and I have the same bug with open refine 2.7. No time today, but will have to mention it in Github, of course.
clipboard.csv

Antonin Delpeuch (lists)

unread,
Jul 22, 2017, 11:01:31 AM7/22/17
to openr...@googlegroups.com
Sorry, I did not realize Mélanie already opened an issue here:
https://github.com/OpenRefine/OpenRefine/issues/1214
Thank you!

Antonin
> > an email to openrefine+...@googlegroups.com <javascript:>
> > <mailto:openrefine+...@googlegroups.com <javascript:>>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "OpenRefine" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to openrefine+...@googlegroups.com
> <mailto:openrefine+...@googlegroups.com>.

Owen Stephens

unread,
Jul 28, 2017, 7:42:49 AM7/28/17
to OpenRefine, li...@antonin.delpeuch.eu
Thanks to Melanie for filing this as an issue - it generated quite a bit of a discussion but finally concluded that this is not a bug, but the intended (although undocumented) behaviour.

Basically 'order matters' when using "Columnize by key/value columns". The outcome for:

KEY COL,VALUE COL
key
1,value 1
key
2,value 2
key
3,value 3
key
1,value 4
key
2,value 5
key
3,value 6

is different to

KEY COL,VALUE COL
key
1,value 1
key
1,value 4
key
2,value 2
key
2,value 5
key
3,value 3
key
3,value 6

(same data, different order)

The reason for this is that the behaviour of the "Columnize by key/value columns" is designed to cope with a 'record' based situation. It does this by assuming that the values associated with the first 'key' it encounters are primary key values, and then groups subsequent values under this primary key (using the OpenRefine record approach). This means that the function works for examples like:

KEY COL,VALUE COL
merchant
,Katie
fruit
,apple
price
,1.2
fruit
,pear
price
,1.5
merchant
,John
fruit
,banana
price
,3.1

However, the same assumptions in the transpose function that make this example work OK, lead to the outcome Melanie has described for her data set.

I've undertaken to write some documentation for the Transpose functions to make it clearer how these work. I'll add a link from this thread to the documentation once I've completed it. Please feel free to ask questions here or help out with the documentation (once I've made a start)! 

Owen


Ettore RIZZA

unread,
Jul 28, 2017, 7:46:05 AM7/28/17
to openr...@googlegroups.com
Here is probably a more classic dataset to test "columnize by key-value" and its counterpart, "transpose cells accross columns into rows"

country,year,key,value
Afghanistan,1999,cases,745
Afghanistan,2000,cases,2666
Brazil,1999,cases,37737
Brazil,2000,cases,80488
China,1999,cases,212258
China,2000,cases,213766
Afghanistan,1999,population,19987071
Afghanistan,2000,population,20595360
Brazil,1999,population,172006362
Brazil,2000,population,174504898
China,1999,population,1272915272
China,2000,population,1280428583

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+unsubscribe@googlegroups.com.

qi cui

unread,
Jul 29, 2017, 11:16:06 AM7/29/17
to OpenRefine
There is one from Unit Test:

If you want to play with the different date set,it is a good start.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages