non-breaking spaces and trim


fabius pocus

Sep 10, 2022, 4:22:41 PM
to OpenRefine
Hi. I have a txt file with a particular string:

Ciao.  

It has a non-breaking space at the end. I checked the option 'Trim leading & trailing whitespace from strings' before creating my project, but the space was not removed. I think the option can't recognize this type of space. Once the project is created, the trim() command removes it without any problems. Can you confirm this? Thanks a lot,

Fabio 

Owen Stephens

Sep 11, 2022, 5:13:51 AM
to OpenRefine
I can confirm that 'Trim leading & trailing whitespace from strings' on import and the GREL trim() command work differently.
The detail:

Trim leading & trailing whitespace from strings -> uses the Java String 'trim()' function, which defines whitespace as any character with a code point less than or equal to the space code point (\u0020)

GREL 'trim()' -> uses com.google.common.base.CharMatcher.whitespace(), which defines whitespace as any Unicode whitespace character (see https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bwhitespace%7D)
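
To make the difference concrete, here is a minimal sketch using only the JDK. It assumes the trailing character in the file is U+00A0 (the usual non-breaking space). The `unicodeTrim` helper is hypothetical: it uses the regex property `\p{IsWhite_Space}` as a stand-in for Guava's `CharMatcher.whitespace()`, so it approximates the GREL behaviour rather than reproducing OpenRefine's actual code.

```java
public class TrimComparison {

    // Hypothetical helper (not OpenRefine code): removes leading and trailing
    // characters that carry the Unicode White_Space property, which includes U+00A0.
    public static String unicodeTrim(String s) {
        return s.replaceAll("\\A\\p{IsWhite_Space}+|\\p{IsWhite_Space}+\\z", "");
    }

    public static void main(String[] args) {
        // Assumption: the string ends with a non-breaking space, U+00A0.
        String value = "Ciao.\u00A0";

        // String.trim() only strips code points <= U+0020, so U+00A0 survives.
        System.out.println(value.trim().length());        // prints 6

        // The Unicode-aware trim removes the non-breaking space.
        System.out.println(unicodeTrim(value).length());  // prints 5
    }
}
```

The first result matches what the import option does, and the second matches what trim() does once the project exists.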

I don't know why these are different, but it seems likely to me that this is just some code history rather than something specifically planned. A GitHub issue could be created to bring these two things into line with each other if that was desirable (and that could be a place to discuss the impact, if any, of making such a change).

Owen

fabius pocus

Sep 11, 2022, 5:17:01 AM
to openr...@googlegroups.com
Thanks Owen
