Unicode Hex values against API call


Parthasarathi Mukhopadhyay

Jun 18, 2021, 4:16:51 AM
to openr...@googlegroups.com
Dear All

I'm trying to use OR 3.4.1 to fetch machine translation results in Indian languages from https://api.mymemory.translated.net (limited calls), but the results are stored in OR as Unicode hex escape values (column 1 in the picture):

Screenshot from 2021-06-18 13-29-56.png
When I copy and paste the results manually, they display properly (columns 2 & 3 in the picture).

I tried the transform value.reinterpret("utf-8"), but it had no effect. My locale is en_IN.UTF-8 and refine.ini includes JAVA_OPTIONS=-Dfile.encoding=UTF-8.

After reading a bit on the issue here, I understood that OR 3.4.1 may have a related bug and that the issue has been addressed in OR 3.5 Beta.

So I installed OR 3.5 Beta side by side and fetched again:

This time the fetch always produced an empty column (even the "store error" option shows nothing).

You may check it here:


What am I missing?

Thanks & regards


-----------------------------------------------------------------------
Parthasarathi Mukhopadhyay
Professor, Department of Library and Information Science,
University of Kalyani, Kalyani - 741 235 (WB), India
-----------------------------------------------------------------------

Owen Stephens

Jun 18, 2021, 7:04:15 AM
to OpenRefine
Hi,

In OR 3.4.1, if I use "parseJson()" on the returned JSON, the Unicode escape values are correctly interpreted as characters:

Screen Shot 2021-06-18 at 11.59.49.png
I can then use the usual approach to extracting the required values e.g.
value.parseJson().responseData.translatedText
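
To illustrate why parsing helps: the \uXXXX sequences are standard JSON string escapes, so any JSON parser turns them back into characters. Below is a minimal Python sketch of the same idea outside OpenRefine. The sample response body is made up for illustration and only mirrors the responseData.translatedText path used above; it is not an actual API response.

```python
import json

# Hypothetical response body (an assumption for illustration): the text is
# carried as JSON \uXXXX escapes, as seen in the fetched column.
raw = '{"responseData": {"translatedText": "\\u09ac\\u09be\\u0982\\u09b2\\u09be"}}'


def translated_text(body: str) -> str:
    """Parse the JSON body and return the decoded translatedText field."""
    # json.loads decodes \uXXXX escapes into real Unicode characters.
    return json.loads(body)["responseData"]["translatedText"]


if __name__ == "__main__":
    # The raw string still contains the literal backslash-u escapes ...
    print(raw)
    # ... while the parsed value contains the actual characters.
    print(translated_text(raw))
```

The raw string is unchanged by reinterpreting its encoding, since the escapes are plain ASCII text; only parsing it as JSON decodes them.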

In OpenRefine 3.5.beta1 I see the same issue as you: nothing retrieved, no errors shown. This looks like a bug to me. I'll do a bit more investigation and, if it proves to be a bug, will add an issue on GitHub.

Owen

Owen Stephens

Jun 18, 2021, 7:28:56 AM
to OpenRefine
OK - the issue with OpenRefine 3.5.beta1 seems to be about non-escaped characters in the URL. The URL you shared is like:

Using this URL as is fails for me in OpenRefine. I'm not sure at what point it fails, but given the speed of the failure my guess is that it's at the OpenRefine end. However, if I URL (percent) encode the | and @ symbols to give a URL like:


Then the request succeeds.

Since @ is a 'reserved' character in URLs and | is an 'unsafe' character in URLs (https://www.ietf.org/rfc/rfc1738.txt), they should both be encoded in this particular situation, so I think requiring them to be encoded is the right thing. However, OpenRefine needs to give better feedback to the user if this is going to fail before the request is made.

Owen
N.B. the "@" is a more complex case: if it is being used for its reserved purpose it should be left unencoded, but I don't have a test case for this, so I'm not sure how to check whether 3.5.beta1 is doing the correct thing here.
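
As a rough sketch of the percent-encoding described above, here is how the | and @ characters come out when encoded as query-string values in Python. The sample values ("en|bn" and "user@example.com") and the helper name are assumptions for illustration only; they are not taken from the URL in this thread.

```python
from urllib.parse import quote


def encode_query_value(value: str) -> str:
    """Percent-encode a single query-string value.

    safe="" means no reserved characters are exempt from encoding, so both
    "|" and "@" are escaped; unreserved characters (letters, digits,
    "-", ".", "_", "~") are left as they are.
    """
    return quote(value, safe="")


if __name__ == "__main__":
    # Hypothetical values containing the two problem characters.
    print(encode_query_value("en|bn"))             # "|" becomes %7C
    print(encode_query_value("user@example.com"))  # "@" becomes %40
```

Only the individual values are encoded here, not the whole URL, so the separators that give the URL its structure (?, &, =) stay intact.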

Parthasarathi Mukhopadhyay

Jun 18, 2021, 8:16:07 AM
to openr...@googlegroups.com
Thanks Owen.

Both solutions worked like a charm for me.

Heartfelt thanks and best regards
