Transforming data to fit annif


Elisabeth Mecking

Dec 16, 2022, 10:33:00 AM
to Annif Users
Dear annif users,

For my Master's thesis, I plan to train Annif with the German classification system "Regensburger Verbundklassifikation" (RVK). Has anyone done that before and could share their experience? I can't figure out on my own how to transform the vocabulary into TSV/SKOS/TTL (which are the formats that can be used, if I understand correctly). The formats I can get are XML and MARCXML. Unfortunately, my programming skills are still very basic, so I might be missing something very obvious, and I apologize for that. It would be great to just get some pointers on where to look. Thank you for your help. Elisabeth

Jim Hahn

Dec 17, 2022, 8:30:26 AM
to Elisabeth Mecking, Annif Users
Hi Elisabeth,

One low-code approach to parsing MARCXML is to use the MarcEdit tool.

It has a feature that lets you export MARC records as tab-delimited text: https://library.si.edu/workflow-kb/exporting-from-marcedit

Using that feature you can set which tags you would like to be in the TSV.

If you need to do some cleanup or more manipulation on the TSV you can use OpenRefine for that purpose: https://openrefine.org/ 
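For readers who would rather script this step than use MarcEdit, here is a minimal Python sketch of the same idea using only the standard library. It rests on several assumptions that are not confirmed in this thread: that the RVK MARCXML export follows the MARC 21 classification format with the notation in field 153 subfield a and the caption in subfield j, that Annif accepts a TSV vocabulary with one `<URI>` TAB label pair per line, and that `BASE_URI` (a made-up placeholder) is replaced with the real RVK URI pattern. The function name is hypothetical as well.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"
# Hypothetical placeholder: replace with the real URI pattern for RVK classes.
BASE_URI = "http://example.org/rvk/"


def marcxml_to_tsv_lines(source):
    """Return '<uri>\\tlabel' lines for each record that has a notation and a caption.

    `source` may be a file name or a binary file-like object.
    Assumption: notation is in 153 $a and caption in 153 $j.
    """
    lines = []
    # iterparse streams the file, so large dumps do not have to fit in memory.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MARC_NS + "record":
            continue
        notation = caption = None
        for field in elem.iter(MARC_NS + "datafield"):
            if field.get("tag") != "153":
                continue
            for sub in field.iter(MARC_NS + "subfield"):
                code = sub.get("code")
                if code == "a" and notation is None:
                    notation = (sub.text or "").strip()
                elif code == "j" and caption is None:
                    caption = (sub.text or "").strip()
        # Skip records that lack either part instead of guessing.
        if notation and caption:
            uri = BASE_URI + quote(notation)
            lines.append(f"<{uri}>\t{caption}")
        elem.clear()  # free the memory held by the processed record
    return lines
```

Calling `marcxml_to_tsv_lines("rvk.xml")` (file name hypothetical) and writing the returned lines to a UTF-8 text file, one per line, should give a TSV that can then be cleaned up further in OpenRefine if needed.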

Best wishes on the thesis!

-Jim


--
You received this message because you are subscribed to the Google Groups "Annif Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to annif-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/annif-users/76aa615d-4619-4c06-a14a-0ede8f456904n%40googlegroups.com.

Osma Suominen

Dec 19, 2022, 6:52:03 AM
to annif...@googlegroups.com
Hi Elisabeth,

Jim already provided you with good hints on how you could convert RVK
into TSV.

If you want to have SKOS instead, I think it should be possible to use
the mc2skos tool to convert from MARC authority records into SKOS:
https://github.com/scriptotek/mc2skos

I see that its configuration file (vocabularies.yml) already has some
default values for RVK, so apparently this has been done before.

There are also JSKOS dumps of RVK available from the coli-conc project:
https://coli-conc.gbv.de/rvk/data/

JSKOS isn't directly supported by Annif but it's closely related to SKOS
as you can imagine from the name. The README.md file in the above folder
(in German) seems to explain in more detail how the conversion to JSKOS
has been performed - and one of the first steps involves using mc2skos.
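If the JSKOS dumps turn out to be the easiest starting point, a small script can pull out what a TSV vocabulary needs. The sketch below is only an illustration under stated assumptions: that the dump is newline-delimited JSON with one concept per line, that each concept carries a `uri` and a `prefLabel` language map as in the JSKOS data model, and that Annif accepts a TSV with one `<URI>` TAB label pair per line. The function name and the sample URI in the usage note are hypothetical.

```python
import json


def jskos_to_tsv_lines(ndjson_lines, lang="de"):
    """Turn JSKOS concepts (one JSON object per line) into '<uri>\\tlabel' lines.

    Assumption: each concept looks like
    {"uri": "...", "prefLabel": {"de": "..."}}.
    Concepts without a URI or without a label in `lang` are skipped.
    """
    out = []
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the dump
        concept = json.loads(raw)
        uri = concept.get("uri")
        label = (concept.get("prefLabel") or {}).get(lang)
        if uri and label:
            out.append(f"<{uri}>\t{label}")
    return out
```

For example, passing an open file handle for the dump (any iterable of text lines works) returns the lines to write to the vocabulary file; a line such as `{"uri": "http://example.org/rvk/AA", "prefLabel": {"de": "Beispiel"}}` becomes `<http://example.org/rvk/AA>`, a tab, then `Beispiel`.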

I hope these pointers will get you at least closer to your goal.

Best,
Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

Elisabeth Mecking

Dec 21, 2022, 5:47:22 AM
to Annif Users
Dear Jim and Osma,
thank you both for your help. I've been trying the MarcEdit tool and seem to be heading in the right direction with it. mc2skos looks interesting too; however, I ran into some difficulties using it, which I will try to resolve.
So thank you both very much. This is a good group to find help.

Elisabeth