Tutorial: Problems with Exercise 2

Tobias Bülte

Aug 13, 2024, 9:12:03 AM
to Annif Users
I started the tutorial.
I am using Linux Mint LMDE 6, 32 GB RAM, Python 3.11.2, and Annif installed via Linux.

When I tried to train one of the projects in Exercise 2 (Set up and train a TFIDF project), I ran into two problems:

1) Python complained that the NLTK package punkt_tab was missing. The README and the tutorial only say to install with `python -m nltk.downloader punkt`.

After running `python -m nltk.downloader punkt_tab` I was able to train.
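The manual download step above could also be scripted so that the data is fetched only when it is missing. The following is a minimal sketch, not something the tutorial or Annif prescribes. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical; by default they fall back to `nltk.data.find` and `nltk.download`, and the lookup path `tokenizers/punkt_tab` is an assumption about where NLTK stores this package.

```python
def ensure_nltk_resource(path, name, find=None, download=None):
    """Download the NLTK package `name` only if `path` is not found locally.

    `find` and `download` are hypothetical injection points; when omitted they
    default to nltk.data.find and nltk.download. Returns True if a download
    was attempted, False if the resource was already present.
    """
    if find is None or download is None:
        import nltk  # imported lazily so this module loads without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(name)
        return True
```

Calling `ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab")` before `annif train` would then have the same effect as the manual `python -m nltk.downloader punkt_tab` step, assuming the path above is correct for the installed NLTK version.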

But then:

2) The tutorial says the small training sets take around 1-2 min and the bigger ones 10-15 min.
But on my computer the small ones take around 15-20 min (e.g. `annif train yso-tfidf-en data-sets/yso-nlf/yso-finna-small.tsv.gz`), and the larger ones take so long that I was not able to finish training them.

When the small ones are done, Annif is able to `suggest` subjects. But Annif being that slow seems odd to me.

Has anybody had the same problem? Or does anybody have a solution for Annif being that slow?

juho.i...@helsinki.fi

Aug 13, 2024, 9:41:34 AM
to Annif Users
Thanks for the information!

A new NLTK release (v3.8.2) was made last week, which changed how the download of some auxiliary packages is handled (the package name was changed from punkt to punkt_tab). That is the reason for problem (1); see this GitHub issue: https://github.com/nltk/nltk/issues/3293

The performance regression (problem 2) looks like a new observation, but I think it too is due to NLTK v3.8.2.

Because of the breaking change, the NLTK release should not have been a patch version release; they seem to be planning a new release in the near future: https://github.com/nltk/nltk/issues/3293#issuecomment-2285641302. For now, to get Annif working with its previous performance, please downgrade NLTK to v3.8.1. When a new NLTK release is made, we will check what changes, if any, are needed for Annif.
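To find out whether an environment has the affected release, the installed NLTK version can be checked before training. This is a rough sketch using only the standard library. The helper names (`parse_version`, `is_affected`, `installed_nltk_version`) are made up for illustration, and treating exactly v3.8.2 as the affected version is an assumption based on the description above.

```python
import re
from importlib import metadata

# Release identified above as causing the problems (assumption: only this one).
AFFECTED = (3, 8, 2)


def parse_version(text):
    """Return up to the first three numeric components of a version string."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def is_affected(version_text):
    """True if the version string denotes the affected NLTK release."""
    return parse_version(version_text) == AFFECTED


def installed_nltk_version():
    """Return the installed NLTK version string, or None if NLTK is absent."""
    try:
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        return None
```

If `is_affected(installed_nltk_version() or "")` returns True, pinning the package with something like `pip install nltk==3.8.1` is one way to downgrade, assuming a pip-based installation.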

-Juho