Managing POST requests in OpenRefine


Parthasarathi Mukhopadhyay

Mar 7, 2022, 10:45:44 AM
to openr...@googlegroups.com
Dear all

There is an excellent automated subject indexing system, Annif (annif.org), with an API service at api.annif.org. Unfortunately, the API's "suggest" method supports only POST requests, not GET.

I get suggestions for possible subject headings when I send this POST request from the terminal, with the abstract of a journal paper as the content:

curl -X POST --header 'Content-Type: application/x-www-form-urlencoded' --header 'Accept: application/json' -d 'text=The authors evaluated emotional distress among 9th-12th grade students, and examined whether the association between being lesbian, gay, bisexual, and/or transgendered (i.e., "LGBT") and emotional distress was mediated by perceptions of having been treated badly or discriminated against because others thought they were gay or lesbian. Data come from a school-based survey in Boston, Massachusetts (n = 1,032); 10% were LGBT, 58% were female, and ages ranged from 13 to 19 years. About 45% were Black, 31% were Hispanic, and 14% were White. LGBT youth scored significantly higher on the scale of depressive symptomatology. They were also more likely than heterosexual, non-transgendered youth to report suicidal ideation (30% vs. 6%, p < 0.0001) and self-harm (21% vs. 6%, p < 0.0001). Mediation analyses showed that perceived discrimination accounted for increased depressive symptomatology among LGBT males and females, and accounted for an elevated risk of self-harm and suicidal ideation among LGBT males. Perceived discrimination is a likely contributor to emotional distress among LGBT youth.&limit=10&threshold=0.3' 'http://api.annif.org/v1/projects/yso-en/suggest'

Results

{
  "results": [
    {
      "label": "sexual minorities",
      "notation": null,
      "score": 0.4889162480831146,
      "uri": "http://www.yso.fi/onto/yso/p1828"
    },
    {
      "label": "homosexuality",
      "notation": null,
      "score": 0.3382744789123535,
      "uri": "http://www.yso.fi/onto/yso/p1825"
    },
    {
      "label": "bisexuality",
      "notation": null,
      "score": 0.3371562361717224,
      "uri": "http://www.yso.fi/onto/yso/p1830"
    },
    {
      "label": "young people",
      "notation": null,
      "score": 0.3130534589290619,
      "uri": "http://www.yso.fi/onto/yso/p11617"
    }
  ]
}

Is it possible to use this API service in OpenRefine on a dataset of journal papers, using the abstract column as input?

Regards


Parthasarathi Mukhopadhyay

Professor, Department of Library and Information Science,

University of Kalyani, Kalyani - 741 235 (WB), India

hpiedcoq

Mar 7, 2022, 7:59:51 PM
to OpenRefine
Hi!

Yes, it is possible:

- Create a new column named "curl", where you craft your curl commands (one per row).
- Once done, create a new column based on this column using the following Jython script (select Jython, not GREL):

#####
# This Jython 2.7 script has to be executed as Jython, not GREL.
# It lets you execute a command (CLI) in the terminal and retrieve the result.
# H/T to Ettore Rizza
# Import basic libraries
import time
import commands
import random
# Get the exit status and the output of the command held in the cell
status, output = commands.getstatusoutput(value)
# Pause for a random 2 to 5 seconds to avoid overloading the server. Be kind to APIs!
time.sleep(random.randint(2, 5))
# Return the output of the command
return output.decode("utf-8")
#####

- Parse the result using the parseJson function.

Be sure to respect the API's usage limits by adjusting the random pause.
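
For the "curl" column in the first step, the abstract has to be both URL-encoded (for the form body) and shell-quoted (because the command is run through a shell). A minimal sketch of one way to build such a command per row; `build_curl` and `ANNIF_URL` are hypothetical names, the endpoint and parameters are copied from the curl command at the top of the thread, and the `-s` flag is an added assumption to keep curl's progress output out of the captured result:

```python
try:
    from urllib.parse import urlencode   # Python 3
    from shlex import quote
except ImportError:
    from urllib import urlencode         # Python 2 / Jython 2.7
    from pipes import quote

# Endpoint taken from the curl command quoted earlier in the thread.
ANNIF_URL = "http://api.annif.org/v1/projects/yso-en/suggest"

def build_curl(text, limit=10, threshold=0.3):
    """Return a shell-safe curl command that POSTs one abstract (hypothetical helper)."""
    # URL-encode the form fields so '&', '=' and quotes in the abstract are escaped.
    body = urlencode([
        ("text", text.encode("utf-8")),
        ("limit", limit),
        ("threshold", threshold),
    ])
    # Shell-quote every argument that may contain spaces or special characters.
    return " ".join([
        "curl", "-s", "-X", "POST",
        "--header", quote("Content-Type: application/x-www-form-urlencoded"),
        "--header", quote("Accept: application/json"),
        "-d", quote(body),
        quote(ANNIF_URL),
    ])
```

In an OpenRefine Jython expression on the abstract column, the last line would be `return build_curl(value)`.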

Hervé

Parthasarathi Mukhopadhyay

Mar 12, 2022, 1:25:14 PM
to openr...@googlegroups.com
I have found half a solution to this issue.

If we use the Python/Jython option in OpenRefine like this (it takes two to three minutes to get the result):

import urllib2
url = "http://api.annif.org/v1/projects/yso-en/suggest"
data = "text=" + value.encode("utf-8") + "&limit=10&threshold=0.3"
post = urllib2.urlopen(url, data)
response = post.read()
return response

It gives me the results I expect in the Preview (against the abstract field of a dataset of journal papers).
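
One caveat with the expression above: the abstract is concatenated into the form body without URL-encoding, so an abstract containing "&" or "=" would be cut short on the server side. A small sketch of the difference, not a change to the workflow reported here; `naive_body` and `encoded_body` are hypothetical helper names:

```python
try:
    from urllib.parse import urlencode, parse_qs   # Python 3
except ImportError:
    from urllib import urlencode                   # Python 2 / Jython 2.7
    from urlparse import parse_qs

def naive_body(text):
    """Form body built by plain concatenation, as in the expression above."""
    return "text=" + text + "&limit=10&threshold=0.3"

def encoded_body(text):
    """Form body with every field URL-encoded."""
    return urlencode([
        ("text", text.encode("utf-8")),
        ("limit", 10),
        ("threshold", 0.3),
    ])

# parse_qs shows what a server would see in the "text" field.
sample = u"R&D spending"
seen_naive = parse_qs(naive_body(sample))["text"]      # text stops at the '&'
seen_encoded = parse_qs(encoded_body(sample))["text"]  # full text survives
```

In the Jython expression, `data = encoded_body(value)` would replace the concatenation line.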

But I can't figure out why the result is not stored in the column after I click "OK".

Regards

Owen Stephens

Mar 14, 2022, 6:55:36 AM
to OpenRefine
Possibly you need to convert the response to a storable value (in this case a string)?

Felix Lohmeier

Mar 14, 2022, 1:39:02 PM
to OpenRefine
In "Add column by fetching URLs", the expression is used to generate the URLs to be queried, so the value your script returns is treated as a URL rather than stored as the cell value. I guess it works in "Add column based on this column...".

Best regards,
Felix

Parthasarathi Mukhopadhyay

Mar 14, 2022, 4:27:49 PM
to openr...@googlegroups.com
Thanks Owen for the clue.

I tried this:

import urllib2,urllib,json

url = "http://api.annif.org/v1/projects/yso-en/suggest"
data = "text=" + value.encode("utf-8") + "&limit=10&threshold=0.3"
post = urllib2.urlopen(url, data)
response = post.read()
return json.dumps(json.loads(response), ensure_ascii=False)

It shows results in the Preview tab, but I still get null after fetching.

{"results": [{"label": "sexual minorities", "uri": "http://www.yso.fi/onto/yso/p1828", "score": 0.4891691505908966, "notation": null}, {"label": "homosexuality", "uri": "http://www.yso.fi/onto/yso/p1825", "score": 0.33421629667282104, "notation": null}, {"label": "bisexuality", "uri": "http://www.yso.fi/onto/yso/p1830", "score": 0.3291281759738922, "notation": null}, {"label": "young people", "uri": "http://www.yso.fi/onto/yso/p11617", "score": 0.3131211996078491, "notation": null}]}
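
Once a response like the one above is stored in a cell, a second Jython column can reduce it to the suggested labels. A minimal sketch, assuming the response keeps the "results" and "label" structure shown in this thread; the function name and the "; " separator are illustrative choices, not part of OpenRefine or Annif:

```python
import json

def labels_from_response(response_text, separator="; "):
    """Join the 'label' of every suggestion in an Annif-style JSON response."""
    data = json.loads(response_text)
    # A missing "results" key yields an empty string instead of an error.
    return separator.join(item["label"] for item in data.get("results", []))

# Shortened copy of the response shown above.
sample = (
    '{"results": ['
    '{"label": "sexual minorities", "uri": "http://www.yso.fi/onto/yso/p1828", '
    '"score": 0.4891691505908966, "notation": null}, '
    '{"label": "homosexuality", "uri": "http://www.yso.fi/onto/yso/p1825", '
    '"score": 0.33421629667282104, "notation": null}]}'
)
joined = labels_from_response(sample)
```

In OpenRefine the expression would end with `return labels_from_response(value)`, applied to the column holding the stored response.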

Regards



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/6baf5d84-c838-47bb-8c6a-72a144448c26n%40googlegroups.com.

Parthasarathi Mukhopadhyay

Mar 14, 2022, 4:33:59 PM
to openr...@googlegroups.com
Thanks Felix

"Add column based on this column" has worked for me.

Thanks a lot.

Regards

Thad Guidry

Mar 14, 2022, 4:36:42 PM
to openr...@googlegroups.com
What if you explicitly converted the returned object... to a String?

return str( json.dumps(json.loads(response), ensure_ascii=False) )

Parthasarathi Mukhopadhyay

Mar 14, 2022, 4:52:48 PM
to openr...@googlegroups.com
Nope. This modified version also gives me null (though the result is displayed in Preview):

import urllib2,urllib,json
url = "http://api.annif.org/v1/projects/yso-en/suggest"
data = "text=" + value.encode("utf-8") + "&limit=10&threshold=0.3"
post = urllib2.urlopen(url, data)
response = post.read()
return str(json.dumps(json.loads(response), ensure_ascii=False))

Regards

Thad Guidry

Mar 14, 2022, 5:28:37 PM
to openr...@googlegroups.com

Since you are sending a custom header, Fetch URLs won't work (it only works with simple or programmatically constructed URL strings, and it cannot send headers for a POST).
The only header attributes we expose are:

Authorization:
User-Agent:
Accept:

In your case, the web service you are querying expects a POST with a Content-Type header of "application/x-www-form-urlencoded", as in your curl command.
Fetch URLs cannot do POST, only GET.
Fetch URLs cannot send a custom Content-Type header (only the 3 headers listed above).

So "Add column based on this column..." is your only option. There you can use Python or Clojure to construct a full POST or GET request per row value in OpenRefine, with whatever header attributes you need.
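
As a rough illustration of constructing a full POST with explicit headers in Python, here is a sketch that only builds the request object; the helper name `build_request` is hypothetical, and the endpoint, parameters and header values are copied from the curl command at the top of the thread:

```python
try:
    from urllib.request import Request   # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request          # Python 2 / Jython 2.7
    from urllib import urlencode

ANNIF_URL = "http://api.annif.org/v1/projects/yso-en/suggest"

def build_request(text, limit=10, threshold=0.3):
    """Build (but do not send) a POST request for one abstract."""
    body = urlencode([
        ("text", text.encode("utf-8")),
        ("limit", limit),
        ("threshold", threshold),
    ])
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    # Supplying a request body makes urllib issue a POST instead of a GET.
    return Request(ANNIF_URL, data=body.encode("ascii"), headers=headers)
```

In an OpenRefine Jython expression, the request could then be sent with `urllib2.urlopen(build_request(value)).read()` and the result returned as the cell value.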

Parthasarathi, I hope that better explains what is happening under the covers of OpenRefine for you in the future.

Owen Stephens

Mar 15, 2022, 5:35:12 AM
to OpenRefine
Good catch Felix!

Parthasarathi Mukhopadhyay

Mar 15, 2022, 9:48:32 PM
to openr...@googlegroups.com
Thanks Felix, Owen and Thad

The entire workflow is now operational. And now I understand the reasons as explained by Thad.

Regards


Juan Pablo Sánchez

Jul 8, 2022, 11:55:01 AM
to OpenRefine

I spent almost 4 hours on the same problem without figuring out that I was wrong to use "Add column by fetching URLs"!

Thanks a lot for this post.

Parthasarathi Mukhopadhyay

Jul 8, 2022, 2:17:42 PM
to openr...@googlegroups.com
:-)  Glad to know that it helps.

Regards

