Annif Language error


Paul Gapski

Mar 11, 2025, 10:07:07 AM
to Annif Users
Hej,
I'm Paul and I am testing Annif for my master's thesis.
I set everything up just fine, but when I try to get suggestions with:

for file in ~/Masterthesis/TXTDateien/*.txt; do     echo "Analysiere: $file";     annif suggest Masterthesis < "$file" > "${file}.annif"; done

I get this error message:

Error: Invalid value: language "de" not supported by vocabulary
Analysiere: /home/annif/Masterthesis/TXTDateien/R_9346_I_1013_0002_output.txt
Usage: annif suggest [OPTIONS] PROJECT_ID [PATHS]...
Try 'annif suggest --help' for help

When I list the vocabularies, everything seems fine except that the Languages column is empty.


(annif-venv) annif@annif-tutorial:~/Masterthesis$ annif list-vocabs
Vocabulary ID  Languages  Size   Loaded
---------------------------------------
schlagworte               32978  True  
(annif-venv) annif@annif-tutorial:~/Masterthesis$

The SKOS file looks like this:

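http://example.org/vocab#Albrecht_Dürer_Haus_Nürnberg
Albrecht-Dürer-Haus / Nürnberg@de
http://www.w3.org/2004/02/skos/core#Concept

3.
http://example.org/vocab#Mount_Baker
Mount Baker@de
http://www.w3.org/2004/02/skos/core#Concept

4.
http://example.org/vocab#Konferenz_über_Sicherheit_und_Zusammenarbeit_in_Europa
Konferenz über Sicherheit und Zusammenarbeit in Europa@de
http://www.w3.org/2004/02/skos/core#Concept

5.
http://example.org/vocab#Spielzeugeisenbahn
Spielzeugeisenbahn@de
http://www.w3.org/2004/02/skos/core#Concept

6.
http://example.org/vocab#Wasserschaden
Wasserschaden@de
http://www.w3.org/2004/02/skos/core#Concept

7.
http://example.org/vocab#Volksbund_Deutsche_Kriegsgräberfürsorge
Volksbund Deutsche Kriegsgräberfürsorge@de
http://www.w3.org/2004/02/skos/core#Concept

8.
http://example.org/vocab#Verein_für_Deutsche_Kulturbeziehungen_im_Ausland
Verein für Deutsche Kulturbeziehungen im Ausland@de
http://www.w3.org/2004/02/skos/core#Concept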


Can someone help me to make sense of it?
For now, I just need it to run once ^^

Many thanks and best wishes

Paul

Osma Suominen

Mar 11, 2025, 10:26:49 AM
to annif...@googlegroups.com
Hi Paul,

I think we need a few more details.

Can you please provide your Annif configuration file (usually called
projects.cfg or projects.toml)?

The SKOS file you provided doesn't look like RDF data in any normal
syntax (Turtle, RDF/XML etc.). Can you show how it actually looks inside
the file? Did you manage to load it using the "annif load-vocab" command?
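
For comparison, a concept in Turtle syntax might look like the sample written by the snippet below. This is only a sketch: the file name and the helper count_label_languages are made up for illustration, the two labels are taken from the listing above, the example.org URIs are placeholders, and the language check is a rough text scan rather than a real RDF parse. It assumes each skos:prefLabel sits on its own line.

```shell
# Write a tiny SKOS vocabulary in Turtle syntax to a scratch file
# (hypothetical file name, placeholder URIs).
sample_ttl="${TMPDIR:-/tmp}/sample_vocab.ttl"
cat > "$sample_ttl" <<'EOF'
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/vocab#> .

ex:Mount_Baker a skos:Concept ;
    skos:prefLabel "Mount Baker"@de .

ex:Wasserschaden a skos:Concept ;
    skos:prefLabel "Wasserschaden"@de .
EOF

# Rough check: count the language tags attached to skos:prefLabel literals.
# Prints one "language: count" line per tag found.
count_label_languages() {
    grep 'skos:prefLabel' "$1" \
        | sed -n 's/.*"@\([A-Za-z-]*\).*/\1/p' \
        | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

count_label_languages "$sample_ttl"
```

If a scan like this finds no language tags in the real file, that would be one possible reason for an empty Languages column.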

What annif commands did you use so far, before the suggest command?

Best,
Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

Paul Gapski

Mar 14, 2025, 3:06:16 AM
to Annif Users
Hey Osma and group,

I hope it's OK that I post my projects folder: https://fhpcloud.fh-potsdam.de/s/X6d6KzFDRGee4Pp

I just copied everything I have in there so it's more transparent.

There are multiple TSV files and a SKOS vocabulary. Maybe you can make sense of that.

The TXTDateien folder contains the TXT files I'd like to get suggestions for.

Many thanks in advance

Paul
projects.cfg

Osma Suominen

Mar 17, 2025, 5:19:35 AM
to annif...@googlegroups.com
Hi Paul,

thanks for posting the files.

The configuration file projects.cfg looks OK. Also, the vocabulary file
at data/vocab/schlagworte_skos.ttl looks fine to me.

I think there might be something wrong with the way you loaded the
vocabulary from the SKOS file. Try running this command:

annif load-vocab schlagworte data/vocab/schlagworte_skos.ttl

Then check the output of annif list-vocabs. It should look like this,
with "de" shown under Languages:

Vocabulary ID  Languages  Size   Loaded
---------------------------------------
schlagworte    de         32978  True
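
If you want to automate that check, something along these lines could work. It is a minimal sketch, not part of Annif: the function name vocab_has_language is made up, and it assumes the list-vocabs output keeps the whitespace-separated columns shown above, with several languages joined by commas (that last part is an assumption I have not verified).

```shell
# Succeeds if the `annif list-vocabs` output on stdin lists language $2
# for vocabulary $1. Hypothetical helper, column layout assumed as above.
vocab_has_language() {
    awk -v id="$1" -v lang="$2" '
        # A row with a non-empty Languages column has four fields.
        $1 == id && NF == 4 {
            n = split($2, langs, ",")
            for (i = 1; i <= n; i++)
                if (langs[i] == lang) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example use (needs Annif installed):
#   annif list-vocabs | vocab_has_language schlagworte de \
#       || annif load-vocab schlagworte data/vocab/schlagworte_skos.ttl
```

The example in the comment reloads the vocabulary only when "de" is missing from the Languages column.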


Hope this helps!

-Osma

Paul Gapski

Mar 18, 2025, 9:22:31 AM
to Annif Users
Hey Osma,

yes, now it's working. Thank you for your help.

Now I have a different question. Can I train with just keywords? For example, could the training data contain two identical columns with the same keywords, so that when a keyword comes up in the texts I want to process, it is automatically suggested?
Also, how long can the training text examples be? Just a few words or a sentence, or can it be a whole paragraph?

Thanks again!

Paul

Osma Suominen

Mar 19, 2025, 4:33:44 AM
to annif...@googlegroups.com
Hi Paul!

Good that you got the vocabulary working!

I think it would be helpful if you gave some more background about what
you are trying to do with Annif, what kind of training data you have
available, what kind of documents you wish to apply Annif on and what
kind of results you are aiming for. Then it would be easier to guide you
in the right direction.

Also, I suggest that you take a closer look at the Annif tutorial videos
and exercises. They explain many of the ideas and choices, for example
what kind of data sets you can use for training and what kind of
backends (different algorithms) are available:
https://github.com/NatLibFi/Annif-tutorial

To answer your questions:

1. Yes, you could create a training data set where the text is the same
as the keyword (subject label) and train for example a TFIDF model using
that. Not sure how useful this would be though, as I suspect the quality
wouldn't be very good if the model is applied on longer real world
documents. If you are looking for a solution for lexical subject
indexing (matching words and phrases in the text directly to subject
terms in your vocabulary), then I would suggest that you instead look at
using the MLLM backend (which does require some training data - for
example 50 or 100 documents with manually assigned subjects would be a
good start) or the YAKE backend, which doesn't need any training data.

2. The training documents can be almost any length, from a few words
(e.g. just document titles) to longer abstracts, tables of contents or
full text documents with over a hundred pages of text. It depends on
what you have available and what kind of documents you want to apply
Annif on later on.
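
To illustrate point 1, a keyword-only training file could be generated roughly like this. It is a sketch under assumptions: the helper name labels_to_training_tsv is made up, the input is assumed to be a tab-separated list of `<URI>` and label, and the output follows the TSV corpus layout as I understand it (text, a tab, then the subject URI in angle brackets). Please check the corpus format documentation before relying on it.

```shell
# Turn "<URI><TAB>label" lines on stdin into "label<TAB><URI>" training lines,
# i.e. each training "document" is just the label of its own subject.
# Lines without both columns are skipped.
labels_to_training_tsv() {
    awk -F '\t' 'NF >= 2 { print $2 "\t" $1 }'
}

# Example with one concept (placeholder URI):
printf '<http://example.org/vocab#Mount_Baker>\tMount Baker\n' \
    | labels_to_training_tsv
```

The resulting file could then be passed to annif train together with the project ID, but as said above, a model trained only on labels may not work well on longer documents.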

Best,
Osma