Custom text facet by uppercase?

Jordon Steele

Feb 18, 2022, 12:09:26 PM
to OpenRefine
Hello, I'm trying to create a custom text facet that groups results whose text string is in all caps versus not all caps (i.e. Titlecase or lowercase). I tried value.isUppercase but that does not seem to work. Any thoughts? Thank you!

Jordon

Jevon, Graham

Feb 18, 2022, 1:50:53 PM
to openr...@googlegroups.com

Hi Jordon

I would think something like this might work:

if(value.contains(/^\p{Upper}{1,}$/),"ALL UPPER","Everything else")

But there might be a more robust or more elegant solution.

This list of regular expressions is really helpful for building things like this: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Thanks

Graham

Thad Guidry

Feb 18, 2022, 2:44:32 PM
to openr...@googlegroups.com
Hi Jordon,

In GREL we organized the general functions into two broad categories: those that perform some transformation (do something) and those that ask questions (often returning a boolean).
We've kept direct pattern matching functions out of GREL syntax, i.e. those that ask a question about a specific pattern and return a boolean.

Instead, we offer wider, broader functions for asking those questions with any pattern you can conceive in RegEx:

contains()
endsWith()
startsWith()
find()
match()

This is not to say that "specific" pattern matching functions would not be useful in GREL directly.
I could certainly see us having new functions that ask the questions for our already included transformation functions:

toUppercase()
toLowercase()
toTitlecase()

Feel free to open an enhancement request if you think having corollary boolean functions for those would be useful:

isUppercase()
isLowercase()
isTitlecase()



Owen Stephens

Feb 21, 2022, 6:00:05 AM
to OpenRefine
It may depend on what you mean when you say "whose text string is in all caps versus not all caps". Graham's solution looks for values that only contain uppercase characters - which would mean if there were any non-alpha characters in the string it would automatically be counted as "everything else" (since e.g. a space or a comma are not uppercase characters).

I'd suggest a simpler approach is to compare the original string to the string converted to uppercase which you can do with:

value==value.toUppercase()

This is a slightly different test to the one Graham suggests as the result of converting (e.g.) a comma 'to uppercase' is that you still get a comma - so this approach will treat a string like "UPPER, CASE" as 'all uppercase' even though it contains a comma and a space, whereas Graham's approach would treat that string as not all uppercase because of the comma and space.

This test will give 'true' if all the alpha characters in the original value are already uppercase and 'false' if some of them were not already uppercase.
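
To make the difference between the two tests concrete, here is a rough sketch in plain Java rather than GREL. It rests on a few assumptions: that GREL's contains() with a regex behaves like Java's Matcher.find(), and that toUppercase() behaves like String.toUpperCase(). The class and method names (CaseFacetDemo, regexTest, compareTest) are made up for illustration and are not part of OpenRefine.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class CaseFacetDemo {
    // Same pattern as the regex suggestion: one or more uppercase letters and nothing else.
    // In Java, \p{Upper} covers only ASCII A-Z unless Unicode character classes are enabled.
    private static final Pattern ALL_UPPER = Pattern.compile("^\\p{Upper}{1,}$");

    // Assumed analogue of value.contains(/^\p{Upper}{1,}$/)
    public static boolean regexTest(String value) {
        return ALL_UPPER.matcher(value).find();
    }

    // Assumed analogue of value==value.toUppercase()
    public static boolean compareTest(String value) {
        return value.equals(value.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] samples = {"UPPER", "UPPER, CASE", "Title Case", "lower", "123"};
        for (String s : samples) {
            System.out.println(s + " | regex: " + regexTest(s) + " | compare: " + compareTest(s));
        }
    }
}
```

In this sketch the two tests agree on "UPPER", "Title Case" and "lower", but differ on "UPPER, CASE" and on "123": the comparison treats both as uppercase because converting them to uppercase changes nothing, while the regex rejects them because they contain characters that are not uppercase letters. If values with no letters at all should not count as "all caps", the comparison test would need an extra condition.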

Hope this is helpful

Best wishes

Owen