Roman numeral conversion


Thad Guidry

May 27, 2021, 9:04:41 PM
to openr...@googlegroups.com
Hi OpenRefine Users!

This came up today helping uw.edu with some library MARC21 data cleaning.
Here's the recipe snippet that I found online and borrowed to add to our Recipes wiki page.
It uses Clojure instead of GREL or Python because there's no quick and easy direct support in those languages:

Column1    Output
vii        7
v          5
xix        19
iv         4

Enjoy!

Owen Stephens

May 28, 2021, 3:33:13 AM
to OpenRefine
Thanks Thad

Interestingly, conversion of Roman to Arabic numerals is exactly the example I use when I train librarians with OpenRefine to show how to use languages other than GREL. I use Python as the alternative, though, and it's a lot less compact than the Clojure you have here!
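
For readers wondering what a Python version might look like, here is a minimal sketch. It is not Owen's training example or the Clojure recipe from the wiki; the names ROMAN_VALUES and roman_to_int are made up for illustration, and it assumes the input is a well-formed Roman numeral (any other character raises a KeyError).

```python
# Hypothetical helper, not the recipe discussed in this thread.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral string such as 'xix' to an integer."""
    total = 0
    largest_seen = 0
    # Walk right to left: a symbol smaller than the largest symbol
    # already seen to its right is subtracted, otherwise it is added.
    for char in reversed(numeral.strip().lower()):
        current = ROMAN_VALUES[char]
        if current < largest_seen:
            total -= current
        else:
            total += current
            largest_seen = current
    return total


# The sample rows from the first message in this thread.
for cell in ["vii", "v", "xix", "iv"]:
    print(cell, roman_to_int(cell))
```

In OpenRefine's Python / Jython expression box, the same function body could be pasted in and finished with `return roman_to_int(value)`, since the cell content is exposed there as `value`.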

Owen

Dobbs, Zoe

May 28, 2021, 8:26:24 AM
to openr...@googlegroups.com

Thanks, I can definitely see a recipe like this coming in handy.

 

This reminds me of something I’ve been thinking about while preparing library metadata for Wikidata upload – in your (or anyone on this list’s) MARC21 data cleaning experience, have you ever come across ways to parse MARC 300$a (extent) fields into something more… useful? I’m thinking something akin to the absolute number of pages script found here: http://www.aurochs.org/mashcat/pages.html but more OpenRefine friendly. However, I realize that Wikidata’s number of pages (P1104) is not exactly the most important property in there, so this is just one of the many fields I’m looking at.

 

Zoe

 


--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/CAChbWaNWBxniPHnSaUVC%2Bh4_RGiejqSYr5iinm9n7xfcMeKUPg%40mail.gmail.com.

Thad Guidry

May 28, 2021, 9:20:52 AM
to openr...@googlegroups.com
Hi Zoe!

Sometimes it is helpful to look at subpatterns in a string (such as a MARC field).
Reverse partition, rpartition(), is a common GREL utility function that I use a lot for data extraction by subpatterns, because it starts from the end of a string and looks backwards, whereas partition() starts from the beginning and looks forwards.
Subpatterns typically show up towards the end of MARC fields, so rpartition() comes in handy.
For example...  XXX pages,  XXX leaves, XXX linear ft.
As you can see, in English the type or context comes at the end of each phrase, so we can extract by some "whitespace PATTERN STRING".

  1. Get page numbers out:
    value.rpartition(" pages")[0].rpartition(/\d+/)[1]

  2. Get Roman numbers out:
    value.rpartition(" pages")[0].rpartition(/\d+/)[0].rpartition("(")[2]


You'll have to adjust some of those examples but it gives you an idea of how to partition by subpatterns...pages, leaves, etc. then take the first [0] part and then look again for more subpatterns by digits, etc.
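
To make the chaining easier to follow outside OpenRefine, here is a rough Python emulation of the two expressions above. It is only a sketch: the helper rpartition() below is hypothetical and written for this illustration, the sample extent string is invented, and the "not found" behaviour simply copies Python's own str.rpartition rather than anything GREL guarantees.

```python
import re


def rpartition(s, frag):
    """Split s around the LAST occurrence of frag.

    frag may be a plain string or a compiled regex. Returns
    [before, matched fragment, after], loosely mimicking GREL's rpartition().
    """
    if isinstance(frag, str):
        return list(s.rpartition(frag))
    matches = list(frag.finditer(s))
    if not matches:
        # Assumption: copy Python's str.rpartition result when nothing matches.
        return ["", "", s]
    last = matches[-1]
    return [s[:last.start()], last.group(0), s[last.end():]]


DIGITS = re.compile(r"\d+")

extent = "1 volume (xii, 345 pages)"  # invented sample 300$a value

# 1. value.rpartition(" pages")[0].rpartition(/\d+/)[1]
before_pages = rpartition(extent, " pages")[0]
page_count = rpartition(before_pages, DIGITS)[1]

# 2. value.rpartition(" pages")[0].rpartition(/\d+/)[0].rpartition("(")[2]
before_digits = rpartition(before_pages, DIGITS)[0]
roman_part = rpartition(before_digits, "(")[2]

print(page_count)                # 345
print(repr(roman_part))          # 'xii, '
print(roman_part.strip(" ,"))    # xii
```

Note that the second result still carries the trailing comma and space, which is the kind of adjustment each dataset will need.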
If you really want or need some GREL utility functions, or better yet a complete OpenRefine extension or GREL macros designed to parse out MARC fields, then we would gladly accept donations or IMLS grants to begin work on that.
That would save you the trouble of keeping lots of GREL recipes around.
Imagine seeing a drop-down menu that says Extract Marc Fields... and a nice dialog with drop-down field selection like you'd have in MarcEdit, but for data extraction.
It would be driven by Recipe macros that the community themselves could maintain and that the dialog automatically incorporates!

Owen Stephens

May 28, 2021, 9:40:36 AM
to OpenRefine
On Friday, May 28, 2021 at 1:26:24 PM UTC+1 Zoe Dobbs wrote:

I’m thinking something akin to the absolute number of pages script found here: http://www.aurochs.org/mashcat/pages.html but more OpenRefine friendly

Apparently based on that original script by Tom, "Owen Stephens [me] managed to plug an adaptation of this script into Blacklight to create an index of book sizes" - which I have no recollection of 9 years later (http://www.aurochs.org/aurlog/2012/07/10/how-big-is-my-book-mashcat-session/)

But I don't think it would be too difficult to do the same thing in Python (or Clojure, if I wrote Clojure)

Dobbs, Zoe

May 28, 2021, 10:03:30 AM
to openr...@googlegroups.com

I see I shall be spending some quality time with rpartition()! Thank you, and thanks as well for that vision of “Extract MARC Fields” our support could make possible.

 

Cheers, and happy Friday!

Zoe

 


Thad Guidry

May 28, 2021, 10:32:56 AM
to openr...@googlegroups.com

Joe Wicentowski

May 31, 2021, 12:10:15 AM
to OpenRefine
Thanks, Thad and Owen! I was astounded at the brevity of the Clojure function. Having dabbled in scripts that perform this conversion in XSLT and XQuery, I adapted the key insights from the Clojure version into my XQuery version. While this isn't directly useful in an OpenRefine context, perhaps someone will find the write-up to be food for thought:


Cheers,
Joe
