Hello OpenRefine Community!
A lot of us who deal with open data often struggle with accented characters (diacritics) and need to replace them with their plain equivalents for reconciling, equivalence checking, or general English usage. For example:
Marc Márquez  ->  Marc Marquez
I've added a simple recipe to the Jython tutorial that can replace diacritic characters.
Note: The original strings need to be in Unicode (UTF-8), so ensure that your data is encoded properly when you first import it into OpenRefine.
(This might land as a new Common Transformation in OpenRefine later on, depending on what works better overall for our community. I'm unsure whether Apache Commons Lang3 StringUtils.stripAccents or the recipe's use of Python's unidecode library will work better. So far, it seems that Python's unidecode library converts better, but I'd like to gather opinions.)
Happy Easter! (In all languages, with or without diacritics!)
-Thad