Open Refine Bulk Editing Question

249 views
Skip to first unread message

Cecily Walker

unread,
Jun 6, 2021, 4:45:33 PM6/6/21
to OpenRefine
I am working with a data set of library catalog information in OpenRefine. It contains a column of call numbers that I have changed from numbers to text, because many of the call numbers begin with zero.

When I try to edit all instances of a call number in open refine by clicking the blue 'Edit' button in the column, OpenRefine automatically returns to the first page of the data. How do I stop it from doing this? Please see the attached gif for a visual.

CleanShot 2021-06-06 at 13.42.19.gif

Thad Guidry

unread,
Jun 6, 2021, 5:10:57 PM6/6/21
to openr...@googlegroups.com
The easiest way to edit "all instances" of a particular number/text is to use a Text Facet, then use the blue edit links in the Facet to perform bulk edits for all rows that contain that number/text.
What you are currently doing is an individual cell edit (like in Excel or LibreOffice and only modifying an individual cell).

So... OpenRefine has multiple ways to change values of cells, individually as you are doing, and explained here at the very bottom (notice table of contents on right hand side of page): https://docs.openrefine.org/manual/cellediting#edit-one-cell-at-a-time
And massively (bulk) through other mechanisms all of which are detailed on that same doc page beginning at the top here: https://docs.openrefine.org/manual/cellediting



On Sun, Jun 6, 2021 at 3:45 PM Cecily Walker <cecily...@gmail.com> wrote:
I am working with a data set of library catalog information in OpenRefine. It contains a column of call numbers that I have changed from numbers to text, because many of the call numbers begin with zero.

When I try to edit all instances of a call number in open refine by clicking the blue 'Edit' button in the column, OpenRefine automatically returns to the first page of the data. How do I stop it from doing this? Please see the attached gif for a visual.

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/f49c92b9-c27e-460b-b7bc-cd8101b483b5n%40googlegroups.com.

Owen Stephens

unread,
Jun 7, 2021, 9:53:56 AM6/7/21
to OpenRefine
Hi Cecily,

I think the straight answer to your question is that it isn't possible to prevent OpenRefine returning to the first page of the data after clicking the "Apply to all identical cells".
In the OpenRefine 3.5 beta release (just out now) a new "jump to page" option is added to the OpenRefine interface which will allow for faster navigation back to your place in the project - but unfortunately it doesn't change the behaviour of jumping back to the start of the project after using "Apply to all identical cells"

 As Thad has said, it might well be better to do these edits via a Facet - although if you have a very large number of values then this could cause performance issues.

Another option is to write a custom text facet which isolates all the rows you need to edit - so for example if you know that all the rows that need sorting have a decimal point in them, you could write a custom text fact (Facet -> Custom text facet) like 
value.contains(".")

If you use the Facet to filter the rows you are seeing to only those that have the issue you need to fix, as you fix lines and click "Apply to all identical cells" you still jump back to the first line but the list of filtered rows also updates - so the first row you see is always the next one to fix

However I wonder if you could say a little more about what you are trying to achieve with these edits as it's possible there might be even more efficient ways of doing it. For example if you are trying to get rid of the zeros to the right of the decimal point, it would be possible to do that for all the call numbers at once rather than tackle them one at a time.

If you are able to share a bit more about the data set and where you want to get to with the set in the end, I (or others) might be able to suggest some approaches.

Owen

Cecily Walker

unread,
Jun 7, 2021, 10:07:01 AM6/7/21
to OpenRefine
Thank you for this explanation, Thad!

Cecily Walker

unread,
Jun 7, 2021, 10:07:44 AM6/7/21
to OpenRefine
Thank you so much for these suggestions, Owen! I'll give them a try right now.

Cecily Walker

unread,
Jun 7, 2021, 10:20:52 AM6/7/21
to OpenRefine
Hi again Owen, 

With the call numbers, I'm trying to remove any that have a single ".0" at the end, as call numbers in our library (or perhaps even generally) don't end in single zeroes. So a call number like 135.0 would be incorrect; 135 is the preferred number. After cleaning the data, I'll be doing some collections analysis looking at performance of particular call numbers over the last five years. I have several spreadsheets that contain similar data from 2016-2020, and each of these spreadsheets all contain the exact same errors, which occured when the raw data was then imported into an Excel spreadsheet for reuse/distribution. 

On Monday, June 7, 2021 at 6:53:56 AM UTC-7 Owen Stephens wrote:

Owen Stephens

unread,
Jun 7, 2021, 10:35:19 AM6/7/21
to OpenRefine
OK - there are probably a few ways to tackle all of these in one go. Essentially the first thing is to find all the call numbers that finish with ".0" and then keep only the numbers to the left of the decimal point. I'll suggest the first option which immediately come to mind, but there are definitely other approaches as well

Using a custom facet, then split
--------------------------------------------------
Create a custom text facet on the call number column with the expression:
value.endsWith(".0")

Then select "true" in that facet to select only the rows which end with ".0"
Then in the call number column use Edit cells -> Transform with the expression
value.split(".")[0]

The `split(".")` splits each string into two parts - the part to the left of the decimal point, and the part to the right. By using `[0]` you are selecting the first part (that to the left of the decimal in the original string)

If this doesn't quite achieve what you need let me know what the issues are and I can try to suggest improvements to this approach or different approaches to avoid the issues

Owen

Thad Guidry

unread,
Jun 7, 2021, 12:49:32 PM6/7/21
to openr...@googlegroups.com
Chomp is a great way to chomp and chew up the end of a string by a particular pattern.  Always loved our name.
It used to only work with a single char but long ago I asked David to allow it to work for any string separator pattern and since then it works with any sequence of characters... like your ".0"

So if you know that in your particular column you want to hack off any of the ".0" characters from the end of the string...
It will do this for all values in the column if you don't have any Facets and using Edit cells -> Transform...

value.chomp(".0")

image.png
--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
Message has been deleted

Beckett, Michael

unread,
Jun 7, 2021, 2:39:06 PM6/7/21
to openr...@googlegroups.com
Cecily,
From your screenshot it looks like you're still previewing the facet; you haven't yet applied it to all the data. Click OK in the lower-left corner of the expression window to apply the facet; once you do that, it should work as Owen described.

-- Michael

On Mon, Jun 7, 2021 at 1:06 PM Cecily Walker <cecily...@gmail.com> wrote:
Hi Owen, 

Thanks for the help. This is what happens when I try your first suggestion: only 10 of the rows are returned. I'm not sure where to go from this point, as you can see from the image below. It doesn't seem to be catching any of the values that end with .0. 

CleanShot 2021-06-07 at 09.47.43.png

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages