Using TextRazor REST API in Open Refine


Miquel Centelles

May 20, 2017, 1:00:12 PM
to OpenRefine
Hi,

I'm thinking about using TextRazor services in Open Refine projects. One use I'm planning is named entity recognition on text columns. Specifically, I would use "Add column by fetching URLs based on column...". The problem is that I don't have enough information about the syntax for building the URL pattern for TextRazor, including the API_KEY. Has anybody ever used this API in this way? Any tips will be appreciated.

Thank you in advance.

Ettore Rizza

May 20, 2017, 2:12:21 PM
to OpenRefine
Hi Miquel,

I have not worked with this API for more than a year, but last time it only had an HTTP POST method. The traditional method used by Open Refine only accepts GET requests, but it should be possible to use the POST method via Jython. I just read a nice tutorial about it.

It should also be possible to install the Python SDK of the API in Jython using this method (a bit complicated the first time).

Thad Guidry

May 20, 2017, 2:21:43 PM
to OpenRefine
Agreed with Ettore.

REMEMBER: When you need something really special in OpenRefine, it's best to just drop to Python as your expression language.
REMEMBER: Always search our mailing list when you get stumped :) https://groups.google.com/forum/#!forum/openrefine

TextRazor needs the API key in an HTTP header, as shown in their curl example with -H:

curl -X POST \
    -H "x-textrazor-key: YOUR_API_KEY" \
    -d "extractors=entities,entailments" \
    -d "text=Spain's stricken Bankia expects to sell off its vast portfolio of industrial holdings that includes a stake in the parent company of British Airways and Iberia." \
    https://api.textrazor.com/
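
For reference, the curl call above could be reproduced from a Jython expression roughly as follows. This is only a sketch: `build_request` is a hypothetical helper name (not part of OpenRefine or TextRazor), and the try/except import is there only so the same code also runs on Python 3 outside OpenRefine.

```python
try:  # Jython 2.7 / Python 2, as bundled with OpenRefine
    from urllib import urlencode
    from urllib2 import Request, urlopen
except ImportError:  # Python 3, e.g. when testing outside OpenRefine
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen

API_URL = "https://api.textrazor.com/"


def build_request(text, api_key, extractors="entities"):
    """Build (but do not send) a POST request equivalent to the curl example.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Cell values are unicode in Jython; encode them before URL-encoding
    if not isinstance(text, bytes):
        text = text.encode("utf-8")
    body = urlencode({"extractors": extractors, "text": text})
    if not isinstance(body, bytes):
        body = body.encode("ascii")
    # Passing data makes this a POST; the key travels in the header (curl -H)
    return Request(API_URL, data=body, headers={"x-textrazor-key": api_key})
```

In an OpenRefine Jython expression, the last line would then be something like `return urlopen(build_request(value, "YOUR_API_KEY")).read()`, which should put the raw JSON response in the new column.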

Here's a guy that had a similar need for putting the API in the HTTP Header and my reply to him.



Miquel Centelles

May 21, 2017, 5:52:45 AM
to OpenRefine
Thank you very much to both of you. I'm going to follow your directions.

And I apologize for not searching the earlier posts more carefully. I will know to do so next time.

Miquel Centelles 


Ettore Rizza

Jun 15, 2017, 11:56:13 AM
to OpenRefine
Good news: it looks like one can install the textrazor SDK using the method described in this tutorial: https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules

Once Jython is installed, go to its "bin" folder and run this command:

jython pip install textrazor


Finally, you can use the SDK in Open Refine (replace D:\jython2.7.0\Lib\site-packages with the site-packages folder of your own Jython installation):

Code Jython 

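import sys
# Make the packages installed with pip visible to OpenRefine's Jython
sys.path.append(r'D:\jython2.7.0\Lib\site-packages')

import textrazor

textrazor.api_key = "YOUR API KEY"

client = textrazor.TextRazor(extractors=["entities", "topics"])
response = client.analyze(value)

# Build one ":::"-separated record per entity found in the cell
rows = []
for entity in response.entities():
    # Some entities have no Freebase type, so guard the [0] lookup
    first_type = entity.freebase_types[0] if entity.freebase_types else ""
    rows.append(":::".join([entity.id, str(entity.relevance_score), str(entity.confidence_score), first_type]))

# Records are separated by "|" so that every entity is kept
return "|".join(rows)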
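
If you go the raw POST route instead of the SDK, the JSON that comes back can be flattened into the same ":::"-separated fields. The sketch below assumes the response nests entities under `response.entities` with the keys `entityId`, `relevanceScore`, `confidenceScore` and `freebaseTypes`; check these against an actual response before relying on them. `format_entities` is a hypothetical name.

```python
import json


def format_entities(json_text):
    """Flatten a TextRazor JSON response into ':::' fields, one record per entity.

    Hypothetical helper; the key names are assumptions about the REST payload.
    """
    payload = json.loads(json_text)
    entities = payload.get("response", {}).get("entities", [])
    rows = []
    for entity in entities:
        types = entity.get("freebaseTypes") or []
        rows.append(":::".join([
            entity.get("entityId", ""),
            str(entity.get("relevanceScore", "")),
            str(entity.get("confidenceScore", "")),
            types[0] if types else "",  # first Freebase type, if any
        ]))
    # "|" separates entities; pick any separator that cannot occur in the data
    return "|".join(rows)
```

In OpenRefine the result can then be split into rows or columns on "|" and ":::" with the usual "Split multi-valued cells" and "Split into several columns" operations.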
