Shift values in multiple rows

918 views
Skip to first unread message

gg_16

unread,
Mar 3, 2016, 2:56:15 PM3/3/16
to OpenRefine

Hi,

I have a large dataset in OpenRefine and I need to shift a value down by one cell - in multiple rows.  For example:

Name        Age
Mary          20
Sam
Abigail           
Kim           10
Tom
Brown           
Sue           30
Allen 
Cary          40
Michelle     

I want to shift all the values for Age - down - by one row as follows: 


Name        Age
Mary          
Sam          20
Abigail           
Kim           
Tom          10
Brown           
Sue           
Allen          30 
Cary            
Michelle     40


How can I do this in OpenRefine?
Thank you.


Jonathon Paarlberg

unread,
Mar 4, 2016, 3:00:12 PM3/4/16
to OpenRefine
I don't think you should really use OpenRefine to do that sort of thing. There might be a way to do it, but it will be rather complicated as compared to just editing the file in a spreadsheet software like Excel before opening it in OpenRefine.

Jonathon Paarlberg

unread,
Mar 4, 2016, 3:02:56 PM3/4/16
to OpenRefine
On second thought, you may have a good reason for using OpenRefine to do it. I apologize for my knee-jerk reaction. It can probably be done by utilizing the row index. Let me look into it.

gg_16

unread,
Mar 4, 2016, 3:19:53 PM3/4/16
to OpenRefine
yes, I have been working on extracting certain pieces of information from a large html source page. Open Refine has been an excellent tool for me to transform the source data to columns of information that I can easily export to our database for further analysis. Except that when I did the transformations, there was this one problem of values in the wrong row. Its been several months of trying different approaches including manually fixing it. But, in the long run, I would love to have it fixed with OpenRefine itself. And I came to the same conclusion as yourself that it cannot be done. So, I thought I could post and see if someone had already figured this one out. I was also thinking along the lines of using row index but didn't know how that could be done. Thank you for trying and no problem it can't be figured out. Atleast we tried!

Thad Guidry

unread,
Mar 4, 2016, 4:12:16 PM3/4/16
to openrefine

John Little

unread,
Mar 7, 2016, 9:55:59 AM3/7/16
to openr...@googlegroups.com
Try this recipe based on the sample data you listed.  The process requires shifting from rows to records mode and back.

1) Shift to "records" mode:  Show as > records

2) Make a copy of the Age column, new name = "record ID", and move the "record ID" column to the first column:
Age > Edit column > Add column based on this column...  > New column name = record ID > OK
record ID > Edit column > Move column to beginning

3) Join multi-valued cells in the Name column:
Name > Edit cells > Join multi-valued cells...  (use a unique character separator to separate the values, e.g. | )

4) Split Name column into several columns.  Use the same separator as used in step 3
Name > Edit column > Split into several columns...  > by separator  ; Separator = |

5) Move Name 2 column in front of Name 1 column
Name 2 > Edit column > Move column left

6) Transpose all Name columns back into a multi-valued record
Name 2 > Transpose > Transpose cells across columns into rows...
   From Column = Name 2
   To Column = Name 3
   One Column = Name

7) Remove the record ID column and Show as "rows" mode
record ID > Edit column > Remove this column
Show as > rows

I hope this helps....

--John




--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Little

unread,
Mar 7, 2016, 9:59:30 AM3/7/16
to openr...@googlegroups.com
Ooops. Sorry.  That doesn't do quite what you want.  I'll have another look.

John Little

unread,
Mar 7, 2016, 12:34:27 PM3/7/16
to openr...@googlegroups.com
Here's a revised recipe.  

1) Age > Edit column > Add column based on this column...  (column name = rec1 ID)
2) rec1 ID > Edit column > Move column to beginning
3) "Show as:"  records
4) rec1 ID > Edit column > Add column based on this column... 
   New column name = "record ID"
   Expression = row.record.index
5) record ID > Edit cells > Transform ...  
   Expression = value + 1
6) record ID > Edit column > Add column based on this column... 
   New column name = "record ID 2"
   Expression = value
7) record ID 2 > Edit cells > Blank down
8) record ID 2 > Edit cells > Transform...
   Expression = value -1
9) record ID 2 > Edit cells > Transform...
   Expression = if(isNull(value),cells["record ID"].value,value)
10) Name > Edit column > Add column based on this column...
   New column name = Names 2
   Expression = value
11) Names 2 > Edit Cells > Join mulit-valued cells
   Separator = |
12) Names 2 > Edit column > Split into several columns...
   Separator = |
13) Names 2 2 > Edit Cells > Transform...
   Expression = value + "|" + cells["Age"].value
14) Names 2 1 > Transpose > Transpose cells across columns into rows...
   From Column = Names 2 1
   To Column = Names 2 3
   One column = Name 2
15) record ID > Edit cells > Fill down
16) record ID 2 > Edit cells > Transform...
   Expression = if(isNull(value),cells["record ID"].value,value)
17) "Show as:" rows
18) Name 2 > Customized facets > Facet by blank
   select "true" facet
19) All > Edit rows > Remove all matching rows
20) Close sidebar facet "Name 2"
21) Name 2 > Text filter
   Expression = |
22) Name 2 > Edit column > Add column based on this column...
   New column name = Age 2
   Expression = value.split("|")[1]
23) Name 2 > Edit cells > Transform ...
   Expression = value.split("|")[0]
24) Go to the "Facet / Filter" sidebar and click the "Remove All" button


I can send you the "Operation History" if you want but I wrote it out because the switch from rows to records (step 3) and back (step 17) is important to seeing how this works.

I hope this helps.

--John

Owen Stephens

unread,
Mar 7, 2016, 2:01:26 PM3/7/16
to OpenRefine
As has been noted, this isn't entirely straightforward in OpenRefine. As John notes you'll need to move to Records mode to achieve this (as it is only in Records mode that you have access to values from rows other than the one you are working on).

My recipe is given below. The first few of steps are essentially the same as John's I think, but then I've got a quicker way of shifting the values across rows I think:

1) Add a new column based on the Name col:
Name > Edit column > Add column based on this column... 
Call the column 'index' (or similar)
 In expression type:
""
This creates a new but empty column

2) Move the new 'index' column to be the first column in the project
index > Edit column > Move column to beginning

3) Edit the cell in first row of the 'index' column to have some value in it - could be anything - e.g. 'first'

4) Change to 'Record' mode by clicking 'Show as: records' link (top left of project)
This should create you a single record based on the index column only having a value in the first row.
 If this doesn't work, then use 'Fill down' and then 'Blank down' on the Index column, and you should find this forces the shift to a single record

5) Modify the cells in the 'Age' column so empty cells have a placeholder value in them - this is to make sure the next step (which ignores empty cells) works correctly
Age > Edit cells > Transform, and in the Expression box type
if(isBlank(value),"null",value) 
 
6) Move the values in the Age column down one cell
Age > Edit cells > Transform
In expression type:
row.record.cells["Age"].value[rowIndex-1]

The 'row.record.cells["Age"].value' part of this creates an array of values using all the values in the Age column - because they are now all part of the same Record (which is what steps 1-5 achieved). You can then extract the value from the row above using 'rowIndex' which gives you a row number

Note that the first row in the project will get the value from the last row in the project using this expression. If this is 'null' you don't need to worry about this.

7) Remove the dummy values in the Age column
Age > Edit cells > Transform
In expression type:
if(value=="null","",value)

8) Remove the index column
index > Edit column > Remove this column
9) Make sure you are back in Row mode by clicking 'Show as: rows' link 


Gretsi George

unread,
Mar 7, 2016, 4:00:02 PM3/7/16
to openr...@googlegroups.com
Thank you so much.. I am in a bit of  crunch with time for other projects. But I will respond to this post with updates once I implement your solutions on my dataset. Thank you again!

Gretsi George

unread,
Mar 10, 2016, 11:13:47 AM3/10/16
to openr...@googlegroups.com
Hi.

I've tried both John and Owen's solutions and they both work.  Both solutions begin with using the same idea of changing rows into records. That was interesting and new for me.

I am gonna use Owen's solution for my dataset since it is shorter but John's solution shows some interesting ways to manipulate data in Google Refine which is also something new that I learnt.

Thank you again for your effort and time on this.

Andrea Zanni

unread,
Mar 11, 2016, 6:37:21 AM3/11/16
to openr...@googlegroups.com
Thanks everyone.
Sorry to bother about this, but could you post both the question and answer on StackOverflow? I find it the best place to store this kind of technical but helpful information.

Aubrey

Owen Stephens

unread,
Mar 11, 2016, 8:09:16 AM3/11/16
to OpenRefine
If people ask questions on Stackoverflow I post the answers there, but I'm not sure about posting questions simply to post an answer. I think the OpenRefine wiki would probably be a better place to post information for future reference? Perhaps on the 'Recipes' page https://github.com/OpenRefine/OpenRefine/wiki/Recipes - I'd be happy to post something there.

The technique of taking advantage of the 'record' mode to work across rows that I describe in my answer is already described on Stackoverflow in two answers I've posted:


Owen

Tom Morris

unread,
Mar 11, 2016, 2:44:06 PM3/11/16
to openr...@googlegroups.com
On Fri, Mar 11, 2016 at 6:36 AM, Andrea Zanni <zanni.a...@gmail.com> wrote:

Sorry to bother about this, but could you post both the question and answer on StackOverflow? I find it the best place to store this kind of technical but helpful information.

We're very grateful to everyone who contributes here and helps out the community and don't want to impose on them even more.

Rather than burdening the experts, who are already making significant suggestions, I'd suggest someone else step up to help curate answers into a more reusable form. I agree with Owen that the OpenRefine wiki is the most logical place for this.

StackOverflow is a great resource, but there's lots of information on the Web in other places -- perfectly appropriately.

Tom

Andrea Zanni

unread,
Mar 11, 2016, 4:16:56 PM3/11/16
to openr...@googlegroups.com

On Fri, Mar 11, 2016 at 8:44 PM, Tom Morris <tfmo...@gmail.com> wrote:
We're very grateful to everyone who contributes here and helps out the community and don't want to impose on them even more.

Rather than burdening the experts, who are already making significant suggestions, I'd suggest someone else step up to help curate answers into a more reusable form. I agree with Owen that the OpenRefine wiki is the most logical place for this.

StackOverflow is a great resource, but there's lots of information on the Web in other places -- perfectly appropriately.


I agree, I didn't mean to ask yet another thing to the experts who are giving their time to help others. The point is that SO is a community with its own rules, and especially reputation: it wouldn't be appropriate for me to make the question or the answer... That's why I asked.

Aubrey

Andrea Zanni

unread,
Mar 12, 2016, 4:38:25 AM3/12/16
to openr...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages