Roman numeral | Integer |
---|---|
vii | 7 |
v | 5 |
xix | 19 |
iv | 4 |
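The table pairs each Roman numeral with its integer value. Below is a minimal sketch of that conversion in plain Python 3. It illustrates the technique and is not the recipe from this thread. `roman_to_int` is a hypothetical name. The sketch assumes well-formed numerals: an invalid character raises a KeyError, and non-canonical forms such as "iiii" are not rejected. OpenRefine can also evaluate Python (Jython) expressions, but this is written as standalone Python.

```python
# Values of the seven standard Roman numeral symbols (lowercase keys).
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xix' to an integer (case-insensitive).

    Assumes a well-formed numeral; unknown characters raise KeyError.
    """
    digits = [ROMAN_VALUES[ch] for ch in numeral.strip().lower()]
    total = 0
    for i, value in enumerate(digits):
        # Subtractive rule: a symbol smaller than the next one ('i' in 'iv')
        # is subtracted; otherwise it is added.
        if i + 1 < len(digits) and value < digits[i + 1]:
            total -= value
        else:
            total += value
    return total


for numeral in ["vii", "v", "xix", "iv"]:
    print(numeral, roman_to_int(numeral))
```

A single left-to-right scan handles the subtractive pairs (iv, ix, xl, and so on), so no separate table of pairs is needed.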
Thanks, I can definitely see a recipe like this coming in handy.
This reminds me of something I’ve been thinking about while preparing library metadata for Wikidata upload – in your (or anyone on this list’s) MARC21 data cleaning experience, have you ever come across ways to parse MARC 300$a (extent) fields into something more… useful? I’m thinking something akin to the absolute number of pages script found here: http://www.aurochs.org/mashcat/pages.html but more OpenRefine friendly. However, I realize that Wikidata’s number of pages (P1104) is not exactly the most important property in there, so this is just one of the many fields I’m looking at.
Zoe
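One possible reading of "absolute number of pages" is the preliminary Roman-numbered pages plus the main Arabic-numbered sequence. That reading is an assumption. It is not a description of what the aurochs.org script does. Under that assumption, here is a minimal Python sketch for the common "[roman], [arabic] pages" shape of 300$a. The regex and function names are hypothetical. Other extent forms (leaves, plates, bracketed pages) are not handled, and a word made only of Roman-numeral letters before a comma could be misread.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer using the subtractive rule."""
    digits = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    return sum(
        -v if i + 1 < len(digits) and v < digits[i + 1] else v
        for i, v in enumerate(digits)
    )


# Hypothetical pattern for one 300$a shape: "<roman>, <arabic> pages".
EXTENT_RE = re.compile(r"\b([ivxlcdm]+),\s*(\d+)\s+pages\b", re.IGNORECASE)


def total_pages(extent: str):
    """Return Roman prelim pages + Arabic pages, or None if the shape doesn't match."""
    match = EXTENT_RE.search(extent)
    if match is None:
        return None
    return roman_to_int(match.group(1)) + int(match.group(2))


print(total_pages("xii, 345 pages ;"))
print(total_pages("1 online resource (xix, 200 pages)"))
print(total_pages("3 linear ft."))
```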
From: openr...@googlegroups.com <openr...@googlegroups.com> On Behalf Of Thad Guidry
Sent: Thursday, May 27, 2021 9:04 PM
To: openr...@googlegroups.com
Subject: [OpenRefine] Roman numeral conversion
--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
openrefine+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/openrefine/CAChbWaNWBxniPHnSaUVC%2Bh4_RGiejqSYr5iinm9n7xfcMeKUPg%40mail.gmail.com.
Get page numbers out:
value.rpartition(" pages")[0].rpartition(/\d+/)[1]
Get Roman numbers out:
value.rpartition(" pages")[0].rpartition(/\d+/)[0].rpartition("(")[2]
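Each step of those two GREL expressions can be mirrored in plain Python to show what it returns. Python's str.rpartition() only accepts a literal separator. For that reason, `rpartition_regex` below is a hypothetical helper that stands in for GREL's rpartition(/\d+/). The sample value "1 online resource (xii, 345 pages)" is an assumed 300$a string. When nothing is found, the sketch follows Python's convention, which is not assumed to match GREL's.

```python
import re


def rpartition_regex(s: str, pattern: str):
    """Split s around the LAST regex match: (before, match, after).

    Hypothetical helper; like str.rpartition, returns ("", "", s) on no match.
    """
    matches = list(re.finditer(pattern, s))
    if not matches:
        return ("", "", s)
    last = matches[-1]
    return (s[: last.start()], last.group(0), s[last.end():])


def page_number(value: str) -> str:
    # value.rpartition(" pages")[0].rpartition(/\d+/)[1]
    before_pages = value.rpartition(" pages")[0]
    return rpartition_regex(before_pages, r"\d+")[1]


def roman_part(value: str) -> str:
    # value.rpartition(" pages")[0].rpartition(/\d+/)[0].rpartition("(")[2]
    before_number = rpartition_regex(value.rpartition(" pages")[0], r"\d+")[0]
    return before_number.rpartition("(")[2]


extent = "1 online resource (xii, 345 pages)"
print(page_number(extent))       # 345
print(repr(roman_part(extent)))  # 'xii, '
```

The Roman part comes back as 'xii, ' with its trailing comma and space. It needs trimming (for example `.strip(", ")` in Python) before it can be converted to a number.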
I see I shall be spending some quality time with rpartition()! Thank you, and thanks as well for that vision of “Extract MARC Fields” our support could make possible.
Cheers, and happy Friday!
Zoe
From: openr...@googlegroups.com <openr...@googlegroups.com> On Behalf Of Thad Guidry
Sent: Friday, May 28, 2021 9:21 AM
To: openr...@googlegroups.com
Subject: Re: [OpenRefine] Roman numeral conversion
Hi Zoe!
Sometimes it is helpful to look at subpatterns in a string (such as a MARC field).
Reverse partition, rpartition(), is a common GREL utility function that I use in a lot of data extraction by subpatterns, because it starts from the end of a string and looks backwards, whereas partition() starts from the beginning and looks forwards.
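Python's built-in str.partition() and str.rpartition() make the same three-way split (before, separator, after). That makes them a quick way to see the forward-versus-backward difference. This is Python, not GREL, and the extent string is a made-up example:

```python
extent = "xii, 345 pages, 16 pages of plates"

# partition() splits at the FIRST " pages", searching from the start.
first = extent.partition(" pages")
# rpartition() splits at the LAST " pages", searching back from the end.
last = extent.rpartition(" pages")

print(first)  # ('xii, 345', ' pages', ', 16 pages of plates')
print(last)   # ('xii, 345 pages, 16', ' pages', ' of plates')
```

When plates are present, the two calls land on different numbers. Which end to search from therefore depends on which subpattern you want. When the separator is missing, Python's partition() returns (s, '', '') and rpartition() returns ('', '', s).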
Subpatterns typically show up in MARC fields towards the end, so rpartition() comes in handy.
For example: XXX pages, XXX leaves, XXX linear ft.
As you can see, in English the type or context comes at the end of each phrase, so we can extract by a "whitespace PATTERN STRING" such as " pages".
i. Get page numbers out:
value.rpartition(" pages")[0].rpartition(/\d+/)[1]
ii. Get Roman numbers out:
value.rpartition(" pages")[0].rpartition(/\d+/)[0].rpartition("(")[2]
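The "whitespace PATTERN STRING" idea extends naturally to other units. Here is a minimal Python sketch. It assumes the count is the last number before the last occurrence of " <unit>". The UNITS list and the function names are hypothetical and would need extending for real 300$a data.

```python
import re

# Hypothetical list of extent units to look for; extend for your data.
UNITS = ["pages", "leaves", "linear ft."]


def count_before(extent: str, unit: str):
    """Return the last number before the last ' <unit>', or None if absent."""
    before, sep, _ = extent.rpartition(" " + unit)
    if not sep:
        return None  # this unit does not occur in the string
    numbers = re.findall(r"\d+", before)
    return int(numbers[-1]) if numbers else None


def parse_extent(extent: str) -> dict:
    """Map each unit found in a 300$a-style string to its count."""
    counts = {}
    for unit in UNITS:
        n = count_before(extent, unit)
        if n is not None:
            counts[unit] = n
    return counts


for extent in ["xii, 345 pages", "120 leaves", "3 linear ft."]:
    print(extent, "->", parse_extent(extent))
```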