API call - fetching wrong ROR IDs

Parthasarathi Mukhopadhyay

Sep 29, 2021, 12:16:21 PM
to ROR Tech Support

Hello All

Thanks for approving my request for group membership. I'm reporting a problem we faced recently while developing a ROR ID dataset for Indian institutes. Most ROR API calls work well for fetching JSON data for a given institute, but in some cases (roughly 23% to 24% of the sampled institutes) the wrong institute is listed first with a high relevance score, and the right institute shows up much later with a comparatively low score.

Here is one example:

Indian Institute of Science: an API call like

https://api.ror.org/organizations?filter=country.country_code:IN&affiliation=Indian+Institute+of+Science

shows Indian Institute of Soil Science (IISS, भाकृअनुप-भारतीय मृदा विज्ञान संस्थान, website: http://www.iiss.nic.in/index.html) first, with a score of 0.92, whereas Indian Institute of Science comes up much later with a score of 0.84.
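For reference, the request above can be reproduced with a minimal Python sketch using only the standard library. The helper names `affiliation_url` and `fetch_json` are hypothetical; the endpoint and the `filter`/`affiliation` parameters are simply the ones in the URL above, as the ROR API accepted them when this thread was written.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the request above (hypothetical constant name).
ROR_ORGS_URL = "https://api.ror.org/organizations"


def affiliation_url(name: str, country_code: str = "IN") -> str:
    """Build an ?affiliation search URL restricted to one country (hypothetical helper)."""
    params = {
        "filter": f"country.country_code:{country_code}",
        "affiliation": name,
    }
    # safe=":" keeps the filter readable (country.country_code:IN); spaces become "+".
    return f"{ROR_ORGS_URL}?{urlencode(params, safe=':')}"


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON response (needs network access; hypothetical helper)."""
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL as in the message above; pass it to fetch_json() to query the API.
    print(affiliation_url("Indian Institute of Science"))
```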

What is the way out?

Best regards


Liz Krznarich

Oct 4, 2021, 7:07:37 PM
to ROR Tech Support, psmukho...@gmail.com
Hi there,

Apologies for the delay; I was out on vacation at the end of last week. This happens because (1) there is no record with the exact name "Indian Institute of Science", and (2) the ?affiliation parameter search tries four different matching algorithms and mixes their results together in the response.
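Because the ?affiliation response mixes results from several algorithms, one client-side workaround is to re-sort the items yourself. Below is a minimal Python sketch, assuming each result item carries a `score`, a `matching_type` (such as "PHRASE" or "COMMON TERMS", the two types discussed in this thread) and an `organization` object with a `name`. The `MATCH_PRIORITY` ordering and the `rerank` helper are hypothetical heuristics, not part of the ROR API.

```python
# Hypothetical preference order: lower number = trusted more.
# Matching types not listed here fall back to the lowest priority.
MATCH_PRIORITY = {"PHRASE": 0, "COMMON TERMS": 1}


def rerank(items: list[dict]) -> list[dict]:
    """Sort affiliation results by matching type first, then by descending score."""
    lowest = len(MATCH_PRIORITY)
    return sorted(
        items,
        key=lambda it: (MATCH_PRIORITY.get(it.get("matching_type"), lowest),
                        -it.get("score", 0.0)),
    )


# Illustrative items shaped like the two results discussed in this thread
# (names, scores and matching types from the messages; all other fields omitted).
sample_items = [
    {"score": 0.92, "matching_type": "COMMON TERMS",
     "organization": {"name": "Indian Institute of Soil Science"}},
    {"score": 0.84, "matching_type": "PHRASE",
     "organization": {"name": "Indian Institute of Science Bangalore"}},
]

for item in rerank(sample_items):
    print(item["organization"]["name"], item["score"], item["matching_type"])
```

Preferring PHRASE matches puts the right record first in this example, but it may demote correct results for other names, so it is worth checking against a sample of institutes whose ROR IDs are already known.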

You'll see in the results that Indian Institute of Soil Science has a matching score of .92 based on "matching_type": "COMMON TERMS", while the next-highest result (Indian Institute of Science Bangalore) has a score of .84 based on "matching_type": "PHRASE". The common terms algorithm searches records for each term in your query ("Indian", "Institute", "Science") separately, which means that records containing one or more of those relatively common terms in multiple fields (in any order) may receive a high score even though no field contains an exact match to your query.
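The difference between the two matching types can be illustrated with a toy Python sketch. This is not ROR's actual scoring, which works across multiple fields of each record; `common_terms_score` and `phrase_match` are hypothetical helpers that compare a query against a single name string.

```python
def tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()


def common_terms_score(query: str, name: str) -> float:
    """Fraction of query terms found anywhere in the name; word order is ignored."""
    q = tokens(query)
    n = set(tokens(name))
    return sum(t in n for t in q) / len(q)


def phrase_match(query: str, name: str) -> bool:
    """True if the query terms appear contiguously and in order in the name."""
    q, n = tokens(query), tokens(name)
    return any(n[i:i + len(q)] == q for i in range(len(n) - len(q) + 1))


query = "Indian Institute of Science"
for name in ["Indian Institute of Soil Science", "Indian Institute of Science Bangalore"]:
    print(name, common_terms_score(query, name), phrase_match(query, name))
```

Both names contain every query term, so a term-by-term score cannot tell them apart, while only the Bangalore record contains the query as a contiguous phrase.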

For this particular case, a ?query parameter search does a better job, since you can specify an exact substring match using quotes, e.g.:

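A minimal Python sketch of building such a quoted ?query URL, assuming the double quotes are sent URL-encoded (`%22`) and that `filter` can be combined with `query` the same way it is combined with `affiliation` in the original request. `exact_phrase_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ROR_ORGS_URL = "https://api.ror.org/organizations"


def exact_phrase_query_url(phrase: str, country_code: str = "IN") -> str:
    """Build a ?query search URL with the phrase wrapped in double quotes (hypothetical helper)."""
    params = {
        "query": f'"{phrase}"',  # quotes ask for an exact substring match
        "filter": f"country.country_code:{country_code}",
    }
    # safe=":" leaves the filter colon unencoded; the quotes become %22, spaces become "+".
    return f"{ROR_ORGS_URL}?{urlencode(params, safe=':')}"


print(exact_phrase_query_url("Indian Institute of Science"))
```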

That said, you may find cases where ?query is far less accurate than ?affiliation. This is something we're working to address, though we're still in the initial phase of gathering feedback on how to improve our search functionality. I'd be curious to hear more about your experience and use case, so please feel free to comment on this GitHub issue: https://github.com/ror-community/ror-api/issues/196

Cheers,
Liz

---
Liz Krznarich, DataCite | ROR adoption manager
l...@ror.org | https://ror.org | @ResearchOrgs

Parthasarathi Mukhopadhyay

Oct 5, 2021, 2:19:34 PM
to Liz Krznarich, ROR Tech Support

Hello Liz

Thanks for the excellent explanation of the behind-the-scenes mechanism and the likely cause of the issue.

Thanks and regards