Having trouble while doing inference in Annif 1.3.0


Javier 1882

Feb 26, 2025, 10:15:27 AM
to Annif Users
Hello, 
I wrote some Python code a while ago that worked with Annif 0.60.0, but now, with Annif 1.3.0, it crashes.
I used to get my labels and their scores like this:
labels = [
    {
        "term": self.vocabulary.loc[item[0], 'label_en'],
        "score": item[1]
    }
    for item in self.annif.suggest(text).as_list()
    if item[1] >= threshold
]
But it doesn't work anymore; I get an exception: 'SuggestionBatch' object has no attribute 'as_list'.
What's more, when I use a model to do inference on a text from Python, I get labels for the same text in the terminal but nothing in Python.
How do I fix this? How can I do inference in Python with a model trained with Annif 1.3.0?
Kind regards,
Javier

juho.i...@helsinki.fi

Feb 27, 2025, 4:23:19 AM
to Annif Users
Hi Javier!

First, please note that Annif is not developed to be used as a Python library, and that projects trained with old versions of Annif may not keep working in subsequent minor releases. See the policy we have followed since release 1.0: https://github.com/NatLibFi/Annif/wiki/Backward-compatibility-between-Annif-releases.

We could make this clearer in our documentation.

If you want to try hacking via the Python code, you can see the changes made from v0.60.0 to v1.3.0 in this GitHub diff; especially the changes in the file backend.py could be relevant for your case.

Also consider Annif-client, which can be used from Python code.

-Juho

Javier 1882

Feb 28, 2025, 8:32:23 AM
to Annif Users
Hello Juho,
I replied twice but it wasn't showing up; it turns out I wasn't clicking the right button. Sorry for the two extra notifications you must have gotten, my bad.
Thank you for your reply!
I see... we'll be using Annif with the subprocess module, which is technically using Annif from the command line (just from within Python).
However, the model is pretty slow. We trained it on 300 texts, which are very large and often full of junk... We're considering cleaning and summarizing the texts before passing them to Annif. How does that sound? Also, how many training texts would you say are needed for a good result? 500? 1000?
As for the API, I've done some research and it looks good; sadly, users can't upload custom models... will this change in the future?
Thank you again, 
Javier

juho.i...@helsinki.fi

Feb 28, 2025, 10:39:04 AM
to Annif Users
Hi!

Using the Annif CLI via subprocess calls is unconventional, but I guess it can be made to work. :)

However...

I assume the slowness you mention is caused by the time it takes to load a model from disk, which happens every time you run the "annif suggest" command from a terminal (for large models this can take half a minute). Note that since Annif 0.61 it has been possible to give paths to documents as optional arguments to "annif suggest", so you can get suggestions for multiple documents in one run; see the sketch below.
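
Since you are calling Annif via subprocess anyway, something like the following should batch the documents into a single run, so the model is loaded from disk only once (a minimal sketch; the project id "my-project" and the docs/ directory are placeholders):

    # Batch several documents into one "annif suggest" run so the model
    # is loaded from disk only once. Project id and paths are placeholders.
    import subprocess
    from pathlib import Path

    paths = [str(p) for p in Path("docs").glob("*.txt")]
    result = subprocess.run(
        ["annif", "suggest", "my-project", *paths],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)  # suggested URIs, labels and scores per document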

Still, even better is to start Annif's web server (Flask/Uvicorn), which provides the API and web UI that you can access from your machine. With it, the models you have trained can be used directly (there is no need to upload them anywhere). The models are loaded only once (on the first use of a project) and remain loaded until you quit the server. For simple projects, it should take clearly less than a second to get subject suggestions for a document.

Just use the command "annif run", and you'll see some log messages, including

    INFO:     Uvicorn running on http://127.0.0.1:5000 (Press CTRL+C to quit)

and then you can open the web UI in your browser at http://127.0.0.1:5000

The Annif-tutorial has these exercises, which may be useful:

- https://github.com/NatLibFi/Annif-tutorial/blob/main/exercises/03_web_ui.md
- https://github.com/NatLibFi/Annif-tutorial/blob/main/exercises/OPT_rest_api.md
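
If you prefer to stay in Python, you can also call the REST API of the running server; a rough sketch using the requests library (the project id "my-project" is a placeholder, and the server is assumed to be at the default address http://127.0.0.1:5000):

    # POST a text to the suggest endpoint of a local "annif run" server.
    # "my-project" is a placeholder project id.
    import requests

    resp = requests.post(
        "http://127.0.0.1:5000/v1/projects/my-project/suggest",
        data={"text": "text to get subject suggestions for", "limit": 10},
    )
    resp.raise_for_status()
    for hit in resp.json()["results"]:
        print(hit["uri"], hit["label"], hit["score"])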


The number of training documents required depends on the backend, your vocabulary, and the project configuration: MLLM does not need very many documents (1000 may be quite enough), while the Omikuji backend requires more (at least several times the number of terms in your vocabulary). The best approach is to try it out and see; please check out the extra section of this exercise, and the evaluation sketch after it:

- https://github.com/NatLibFi/Annif-tutorial/blob/main/exercises/05_mllm_project.md
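
For the "try it out and see" part, the "annif eval" command scores a trained project against gold-standard documents; a sketch, assuming a placeholder project id "my-project" and a hypothetical test-docs/ directory of .txt files with matching .tsv subject files:

    # Evaluate a trained project on held-out documents; prints precision,
    # recall, F1 and other metrics. Project id and path are placeholders.
    import subprocess

    result = subprocess.run(
        ["annif", "eval", "my-project", "test-docs/"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)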

Regards,
-Juho

Javier 1882

Mar 11, 2025, 3:44:18 AM
to Annif Users
Hi Juho!
I didn't know you could process several documents in one run; that's very useful.
As for the Annif API, it looks really good!! Right now I'm curating my training data (aiming for around 1000, as you suggested). When I'm done and have trained the new model, I'll implement this approach and get back to you.
Thanks a lot!
Best,
Javier

Javier 1882

Mar 19, 2025, 12:17:33 PM
to Annif Users
Hi Juho!
I trained the model on 350 docs and the performance is okay.
As for the inference part, the Annif API works great, but I can't seem to get it running in Docker. When I run docker compose up, I get:
WARN[0000] /mnt/c/Users/jdetorre/workspace/cyclops/annif-docker/docker-compose.yaml: `version` is obsolete
WARN[0000] Found orphan containers ([annif-docker-annif-1]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 0/2
 ⠙ Container annif-docker-annif_app-1  Recreated                                                                                                        0.1s
 ⠋ Container annif-docker-nginx-1      Recreated                                                                                                        0.0s
Attaching to annif_app-1, nginx-1
annif_app-1  | [2025-03-19 16:12:14 +0000] [1] [INFO] Starting gunicorn 23.0.0
annif_app-1  | [2025-03-19 16:12:14 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
annif_app-1  | [2025-03-19 16:12:14 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
annif_app-1  | [2025-03-19 16:12:14 +0000] [7] [INFO] Booting worker with pid: 7
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: using the "epoll" event method
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: nginx/1.27.4
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: built by gcc 12.2.0 (Debian 12.2.0-14)
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: OS: Linux 5.15.167.4-microsoft-standard-WSL2
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: getrlimit(RLIMIT_NOFILE): 1048576:1048576
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker processes
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 10
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 11
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 12
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 13
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 14
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 15
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 16
nginx-1      | 2025/03/19 16:12:14 [notice] 9#9: start worker process 17
annif_app-1  | INFO:annif:finished initializing projects
annif_app-1  | [2025-03-19 16:12:16 +0000] [7] [INFO] Started server process [7]
annif_app-1  | [2025-03-19 16:12:16 +0000] [7] [INFO] Waiting for application startup.
annif_app-1  | [2025-03-19 16:12:16 +0000] [7] [INFO] Application startup complete.

docker-compose.yaml:

version: "3"

services:

  annif_app:
    image: quay.io/natlibfi/annif:latest
    volumes:
      - .:/annif-projects
    # user: jdetorre:sample_pw
    command: ["gunicorn", "annif:create_app()", "--bind", "0.0.0.0:8000", "--timeout", "600"]

  nginx:
    image: nginx
    ports:
      - "80:80"
    depends_on:
      - annif_app
    command: |
      bash -c 'bash -s <<EOF
        cat > /etc/nginx/conf.d/default.conf <<EON
          server {
              listen 80;
              server_name localhost;
              location / {
                  proxy_pass http://annif_app:8000;
              }
          }
      EON
      nginx -g "daemon off;";
      EOF'

The directory that is being mounted:
(annif-venv) mnt/c/Users/jdetorre/workspace/cyclops/annif-docker# ls -R
.:
Dockerfile  data  docker-compose.or  docker-compose.yaml  projects.cfg  requirements.txt

./data:
projects  vocabs

./data/projects:
eurovoc-mllm-en

./data/projects/eurovoc-mllm-en:
mllm-model.gz  mllm-train.gz

./data/vocabs:
eurovoc

./data/vocabs/eurovoc:
subjects.csv  subjects.dump.gz  subjects.tsv  subjects.ttl

How do I fix this? I want to be able to access Annif in my browser, running in Docker with the model I trained locally... What do you think?
Kind regards,
Javier

juho.i...@helsinki.fi

Mar 20, 2025, 4:15:59 AM
to Annif Users
Hi!

Thanks for the details! I assume your problem is that when you access the address "localhost" in your browser while docker compose is running, your project eurovoc-mllm-en is not shown in the menu(?).

That is probably because the user id and group id inside the Docker container are wrong (by default their numerical ids are 998): they should be the same as on your host computer so that the files mounted from your host can be used inside the container. To do that, the environment variables MY_UID and MY_GID can be set when starting docker compose, e.g. with the command

  MY_UID=$(id -u) MY_GID=$(id -g) docker compose up

and then their values should be referenced in the docker-compose.yml setting (in services.annif_app.user), so just restore your commented-out line as

  user: ${MY_UID}:${MY_GID}

Hopefully this helps!
-Juho

Javier 1882

Mar 20, 2025, 4:54:57 AM
to Annif Users
Hello!
The problem is that when I enter http://localhost:8000 in the browser, the connection is refused.
I don't know what the user id and group id are. Do I have them? How can I find out what my user id and group id are? When I set both to 998, I get "yaml: line 5: did not find expected key".
How can I get the container up so that the connection isn't refused in my browser?
Kind regards,
Javier

juho.i...@helsinki.fi

Mar 20, 2025, 6:38:02 AM
to Annif Users
I see; with your NGINX and Annif combination you don't need (and cannot) use port 8000, but port 80 (the default for HTTP connections).

The NGINX configuration sets NGINX to accept connections on port 80 ("listen 80") and to forward all requests to the annif_app service on port 8000 ("proxy_pass http://annif_app:8000;").
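
A quick way to check the setup from Python is to list the projects through the proxy on port 80; if the mounted files are readable inside the container, your project should appear (a sketch, assuming the compose setup above):

    # Sanity check through NGINX on port 80: list the configured projects.
    import requests

    resp = requests.get("http://localhost/v1/projects")
    resp.raise_for_status()
    for project in resp.json()["projects"]:
        print(project["project_id"])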

-Juho

Javier 1882

Mar 20, 2025, 12:11:00 PM
to Annif Users
Hello Juho!
Yes, that fixed the issue! Thank you so much!
Now I'm having some problems with inference, but let's see if I can solve them before asking you anything...
Again, thanks!! Have a nice afternoon!
-Javier

Javier 1882

Mar 20, 2025, 12:25:57 PM
to Annif Users
Okay, something really weird is happening.
I train my model and do inference on a text. Okay, I get labels.
However, after I run docker compose up -d and do inference again, it doesn't work. What could possibly be going wrong?
Here's the directory where Docker is running:
(annif-venv) root:/mnt/c/Users/jdetorre/workspace/cyclops/annif-docker# ls -R

.:
Dockerfile  data  docker-compose.or  docker-compose.yaml  projects.cfg  requirements.txt

./data:
projects  vocabs

./data/projects:
eurovoc-mllm-en

./data/projects/eurovoc-mllm-en:
mllm-model.gz  mllm-train.gz

./data/vocabs:
eurovoc

./data/vocabs/eurovoc:
subjects.csv  subjects.dump.gz  subjects.tsv  subjects.ttl

docker-compose.yaml:
(annif-venv) root@:/mnt/c/Users/jdetorre/workspace/cyclops/annif-docker# cat docker-compose.yaml
# version: "3"


services:

  annif_app:
    image: quay.io/natlibfi/annif:latest
    volumes:
      - .:/annif-projects:ro
    # user: 0:0

    command: ["gunicorn", "annif:create_app()", "--bind", "0.0.0.0:8000", "--timeout", "600"]

  nginx:
    image: nginx
    ports:
      - "80:80"
    depends_on:
      - annif_app
    command: |
      bash -c 'bash -s <<EOF
        cat > /etc/nginx/conf.d/default.conf <<EON
          server {
              listen 80;
              server_name localhost;
              location / {
                  proxy_pass http://annif_app:8000;
              }
          }
      EON
      nginx -g "daemon off;";
      EOF'

The mllm-model.gz and mllm-train.gz files were copied there from the working model I described earlier. What do you think? What could be going wrong?
Kind regards,
Javier

juho.i...@helsinki.fi

Mar 24, 2025, 5:23:20 AM
to Annif Users
Hi Javier!

I suspect the issue might now be caused by the commented-out user setting in the docker-compose.yaml file ("# user: 0:0"), but I cannot be sure. Could you clarify how you are performing inference when you do get labels? Are you using the CLI suggest command or the API suggest method (via the web UI or a direct API request)?

If you do not need to use Docker or docker compose, it is probably easier to use Annif without them. Also, please see the Annif-tutorial, which has exercises for the web UI and REST API as well.

Regards
-Juho

Javier 1882

Apr 2, 2025, 8:42:53 AM
to Annif Users
Hi Juho,
Apologies for the late reply; it was a really busy week.
For inference I'm using the CLI command via Python's subprocess module, but as I have more time now, I'm going to dig into the source code and try to do inference from Python directly. Hopefully that will make it unnecessary to use Docker or anything else external to the Python code. I'll try this and then get back to you.
Kind regards,
Javier

Javier 1882

Apr 4, 2025, 3:57:02 AM
to Annif Users
Hi Juho,
I'm happy to tell you that I managed to do inference with Annif 1.3.0 from Python, so I'm posting the Python code here in case anyone finds it useful! I figured this out by looking at what gets called when running the annif suggest command, mainly the run_suggest(project_id, paths, limit, threshold, language, backend_param, docs_limit) function in cli.py.

Given an Annif project (<class 'annif.project.AnnifProject'>) and a text, run the following to do inference on the text:
>>> suggestions = mllm_project.suggest([text], {}).filter(10, 0.0)[0]
- {} is backend_params; inference works just fine with this set to an empty dict.
- 10 is limit, the maximum number of labels the model will return.
- 0.0 is the threshold, i.e. you're telling the model "don't return any labels that score below this value"; e.g. 0.5 means you'll only get labels with a confidence score higher than 0.5.
- [0] is because of the returned object: "mllm_project.suggest([text], {}).filter(10, 0.0)" returns a <class 'annif.suggestion.SuggestionBatch'>, so you need to index it to get a <class 'annif.suggestion.SuggestionResult'>, which is what the suggestions object is here. If you were doing inference on several texts, you'd access them with [0], [1], [2], etc.

Given a <class 'annif.suggestion.SuggestionResult'>, run the following to see the labels:
>>> print([
    (
        mllm_project.subjects[hit.subject_id].labels["en"],  # the label in your language of choice, English in my case
        mllm_project.subjects[hit.subject_id],  # the full Subject object, which contains the URI
        hit.score,  # the confidence score of the label
    )
    for hit in suggestions
])
# [('fishing', Subject(uri='http://eurovoc.europa.eu/1372', labels={'en': 'fishing'}, notation=None), 0.8497293591499329), ('ecology', Subject(uri='http://eurovoc.europa.eu/632', labels={'en': 'ecology'}, notation=None), 0.6631121635437012), ('environment', Subject(uri='http://eurovoc.europa.eu/100155', labels={'en': 'environment'}, notation=None), 0.5274575352668762)]
mllm_project.subjects is a <class 'annif.corpus.subject.SubjectIndex'>; it's basically where the vocabulary is stored. You can see where it gets its information from in the .ttl file that was created when you loaded your vocabulary. In my case, here's the content of said .ttl file for "fishing":
<http://eurovoc.europa.eu/1372> a skos:Concept ;
    skos:notation "None" ;
    skos:prefLabel "fishing"@en .
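
For convenience, the same calls can be wrapped into one small helper that returns the same dict shape as my original 0.60.0 code (just a repackaging of the lines above; how you obtain the AnnifProject instance is up to you):

    # Wraps the calls above: suggest on one text, then map subject ids to
    # labels, URIs and scores. "project" is an annif.project.AnnifProject.
    def suggest_labels(project, text, limit=10, threshold=0.0, language="en"):
        suggestions = project.suggest([text], {}).filter(limit, threshold)[0]
        return [
            {
                "term": project.subjects[hit.subject_id].labels[language],
                "uri": project.subjects[hit.subject_id].uri,
                "score": hit.score,
            }
            for hit in suggestions
        ]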

So everything is working fine in the end! Thank you for all the help, Juho!
Best,
Javier