ERIC thesaurus in Annif


Parthasarathi Mukhopadhyay

May 7, 2023, 12:39:15 PM
to Annif Users
Dear all

This may not be directly related to Annif, but I would like advice from the forum members on the following issues:

1. We are exploring the use of Annif for the domain of education and, to start with, have selected the following resources: a) the ERIC thesaurus as the vocabulary; and b) the ERIC database as the source of our training and test datasets.

2. The problem is that the ERIC thesaurus (almost 12K preferred terms and a comprehensive RT-BT-NT based term network) is currently available in the public domain, but only in XML format, from here: https://eric.ed.gov/eric_thesaurus2023.zip

3. We understand that the ERIC thesaurus in its present form is not compatible with the requirements of Annif. We studied this a bit and now understand that one possible way out may be converting the XML-based ERIC thesaurus to RDF/XML through RML (https://d2s.semanticscience.org/docs/convert-rml/). That appears to be a very complex mapping process, as no ready-made mapping is available for a thesaurus. Is there an easier way to convert the ERIC XML file to TTL or NT format?

4. If this conversion is not an easy one, is there a way to convert the XML file into a CSV format compatible with Annif?

5. Another issue is the identifier. For example, ERIC represents concepts (say, 21st century skills) in this fashion -


How can we solve the issue of URI based concept identifier in this case?
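For readers with the same question, points 4 and 5 could in principle be handled together by a small script. The following is only a rough sketch under stated assumptions: the element names `Term` and `Name` and the base URI `http://example.org/eric/` are placeholders invented for illustration (the real ERIC XML uses its own element names), and the two-column `uri,label_en` header reflects my understanding of what Annif accepts as a CSV vocabulary.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Placeholders only: the real ERIC XML has its own element names, and
# BASE_URI is an invented namespace, not an official ERIC one.
TERM_TAG = "Term"
LABEL_TAG = "Name"
BASE_URI = "http://example.org/eric/"


def slugify(label):
    """Lower-case the label and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def xml_to_annif_csv(xml_text):
    """Return CSV text with one row per term: a minted URI and the label."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["uri", "label_en"])
    for term in root.iter(TERM_TAG):
        label = (term.findtext(LABEL_TAG) or "").strip()
        if label:  # skip terms that have no usable label
            writer.writerow([BASE_URI + slugify(label), label])
    return out.getvalue()
```

Minting URIs from labels is just one possible choice; a stable numeric identifier, if the source file provides one, would survive label changes better.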

Thanks and regards

Parthasarathi Mukhopadhyay

Professor, Department of Library and Information Science,

University of Kalyani, Kalyani - 741 235 (WB), India

Gabriel Kulevicius

May 10, 2023, 11:04:16 AM
to Parthasarathi Mukhopadhyay, Annif Users
Hi Parthasarathi, 

In our service vocabularyserver.com we have a TemaTres implementation of the ERIC thesaurus, updated this week.
You can browse it at this URL
From this tool it is possible to export it in RDF or CSV format. I'm attaching it in RDF; if that does not help and you need it in CSV format, please let me know.

Please confirm whether this solves your issue.
Best
Gabriel Kulevicius



eric-thesaurus.rdf.zip

Parthasarathi Mukhopadhyay

May 10, 2023, 1:35:51 PM
to Gabriel Kulevicius, Annif Users
Dear Gabriel

Thanks a tonne for your guidance.

It worked like magic. I followed these steps (in case anyone else is interested):

1. Downloaded your file in RDF format (eric-thesaurus.rdf)
2. Applied Skosify (https://github.com/NatLibFi/Skosify) to convert into SKOS format (with the following switches - --label "ERIC" --eliminate-redundancy --default-language=en)
3. Uploaded into Annif through load-vocab
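
The steps above can be sketched as a small Python wrapper around the two command-line tools. This is only an illustration under assumptions: the output file name `eric-skos.ttl` and the vocabulary id `eric` are made up for the example, the Skosify switches are the ones listed in step 2 plus `-o` for the output file, and both `skosify` and `annif` are assumed to be installed and on the PATH.

```python
import subprocess


def build_commands(rdf_in="eric-thesaurus.rdf",
                   skos_out="eric-skos.ttl",  # assumed output name
                   vocab_id="eric"):          # assumed Annif vocabulary id
    """Return the two commands (Skosify, then Annif) as argument lists."""
    skosify_cmd = [
        "skosify",
        "--label", "ERIC",
        "--eliminate-redundancy",
        "--default-language=en",
        "-o", skos_out,
        rdf_in,
    ]
    annif_cmd = ["annif", "load-vocab", vocab_id, skos_out]
    return [skosify_cmd, annif_cmd]


def run_pipeline(**kwargs):
    """Run both commands in order, stopping at the first failure."""
    for cmd in build_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the directory containing eric-thesaurus.rdf would run both steps; `build_commands()` alone just shows what would be executed.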

The process generated subjects.csv and subjects.ttl without any problems. I checked that the number of skos:prefLabel descriptors (4751 to be exact) matches the ERIC January 2023 version.
Moreover, it solved the problem of identifiers too:

uri                                           notation  label_en
https://vocabularyserver.com/eric/skos/10103            Moral Criticism (1969 1980)
https://vocabularyserver.com/eric/skos/10190            Mythic Criticism (1969 1980)
https://vocabularyserver.com/eric/skos/10238            Negro Housing (1966 1977)
https://vocabularyserver.com/eric/skos/10298            Northern Schools (1966 1980)
https://vocabularyserver.com/eric/skos/10299            Norwegian
https://vocabularyserver.com/eric/skos/10566            Performance Criteria (1968 1980)
https://vocabularyserver.com/eric/skos/10567            Performance Specifications (1969 1980)
https://vocabularyserver.com/eric/skos/10646            Plane Geometry
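
If anyone wants to repeat the two checks mentioned above (the descriptor count and the URI identifiers), something like the following might do. It is a minimal sketch, assuming subjects.csv is a comma-separated file with a header row containing the columns shown in the listing; the function name is my own.

```python
import csv
import io


def check_subjects(csv_file):
    """Count the rows and collect any uri values that are not http(s) URIs."""
    reader = csv.DictReader(csv_file)
    total = 0
    bad_uris = []
    for row in reader:
        total += 1
        uri = row.get("uri") or ""
        if not uri.startswith(("http://", "https://")):
            bad_uris.append(uri)
    return total, bad_uris


# Tiny in-memory example using two rows from the listing above.
sample = io.StringIO(
    "uri,notation,label_en\n"
    "https://vocabularyserver.com/eric/skos/10299,,Norwegian\n"
    "https://vocabularyserver.com/eric/skos/10646,,Plane Geometry\n"
)
total, bad_uris = check_subjects(sample)
print(total, bad_uris)
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in; the returned total can then be compared with the expected number of descriptors.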


Heartfelt thanks and best regards

Note: I've got some queries related to TemaTres (as a user of TemaTres) and will be writing to you separately (as this forum is meant for Annif).

Sincerely

Parthasarathi

