Fail to run XTransformer using pre-trained gnd-all-xtransformer-en on colab

Mohamad Mmdouh

Jan 5, 2026, 1:55:12 AM
to Annif Users

Hi all,

I am trying to run the pre-trained gnd-all-xtransformer-en project with Annif (invoked with annif suggest -p projects.d -l 20 gnd-all-xtransformer-en) on Google Colab (the link is at the end). I'm running into a two-step failure. I wanted to check whether anyone has seen the same thing, and to ask for recommended dependency versions or a patch.

Environment:

  • annif 1.5.0.dev0 (source checkout in /content/Annif)

  • transformers 4.49.0

What I did / command:

echo "Deep learning methods for multilingual information retrieval and neural ranking." \
  | annif suggest -p projects.d -l 20 gnd-all-xtransformer-en

Sequence of issues:

  1. Initial failure — missing tokenizer config

terminate called after throwing an instance of 'std::runtime_error'
  what(): Unable to open config file at data/projects/gnd-all-xtransformer-en/vectorizer/tokenizer/config.json

I resolved this by fixing a linking / path problem (some model files/configs weren’t on the expected path in my projects.d layout). After correcting that, the command proceeded further.
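A quick way to confirm that kind of path fix is to check the expected files before calling annif. The following is a minimal sketch: the only path taken from the error message above is vectorizer/tokenizer/config.json under data/projects/gnd-all-xtransformer-en; the helper name missing_files and the EXPECTED list are hypothetical, and the list would need extending with whatever other files the model ships.

```python
from pathlib import Path

# Only this relative path is known from the error message; extend as needed.
EXPECTED = ["vectorizer/tokenizer/config.json"]


def missing_files(project_dir, expected=EXPECTED):
    """Return the expected relative paths that are not readable regular files."""
    root = Path(project_dir)
    missing = []
    for rel in expected:
        # is_file() follows symlinks, so a dangling link is reported as missing.
        if not (root / rel).is_file():
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Project data directory as it appears in the error message.
    for rel in missing_files("data/projects/gnd-all-xtransformer-en"):
        print(f"missing: {rel}")
```

Because the check follows symlinks, it also catches links that exist but point to a location that is not mounted in the current Colab session.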

  2. New failure (after resolving the above)
    Short stderr/log noise about XLA/CUDA plugin registration (which I believe is unrelated), then an abort with:

terminate called after throwing an instance of 'nlohmann::detail::type_error'
  what(): [json.exception.type_error.302] type must be number, but is null

This happens immediately after the noisy XLA messages. Disabling GPU (CUDA_VISIBLE_DEVICES="") and setting TF_XLA_FLAGS="--tf_xla_enable_xla_devices=false" did not avoid the crash for me.
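The type_error.302 message means that native code tried to read a JSON value as a number and found null instead. One way to narrow this down is to list every null in the JSON files under the project directory. This is only a diagnostic sketch: it assumes the offending value lives in one of those files, the function names null_paths and scan_project are hypothetical, and it cannot detect a key that is absent altogether (which may produce the same error).

```python
import json
from pathlib import Path


def null_paths(value, prefix=""):
    """Yield the dotted path of every null (None) inside a parsed JSON value."""
    if value is None:
        yield prefix or "<root>"
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from null_paths(item, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from null_paths(item, f"{prefix}[{index}]")


def scan_project(project_dir):
    """Map each *.json file under project_dir to the null paths it contains."""
    report = {}
    for path in sorted(Path(project_dir).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON; skip rather than abort
        hits = list(null_paths(data))
        if hits:
            report[str(path)] = hits
    return report


if __name__ == "__main__":
    # Project data directory as it appears in the first error message.
    for file, hits in scan_project("data/projects/gnd-all-xtransformer-en").items():
        print(file, hits)
```

Comparing the reported keys against the same files from a working installation would show whether a config written by one library version is being read by another.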

Colab link with editor access, for any suggestions:
https://colab.research.google.com/drive/11MHIlLDUmh6_UMprITtT__oEldNrW0zu?usp=sharing

Thanks in advance.


Kähler, Maximilian

Jan 5, 2026, 4:37:30 AM
to Mohamad Mmdouh, Annif Users

Dear Mohamad,

 

Great to hear that there is interest in working with the XTransformer backend.

 

Unfortunately, there are currently a number of version issues and incompatibilities with the underlying PECOS library, which no longer seems to be maintained by its creator.

Looking at your notebook, I see a lot of install errors. You definitely need to pin or downgrade some of the libraries. A configuration that worked for me (I haven't had time to test it recently) was:

 

      - huggingface-hub==0.21.3
      - libpecos==1.2.4
      - numpy==1.26.4
      - …
      - safetensors==0.4.2
      - sentencepiece==0.2.0
      - tokenizers==0.15.2
      - torch==1.13.1
      - transformers==4.38.2
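
To see how far a Colab runtime is from these pins, the installed versions can be compared against them. The following is a minimal sketch using the standard importlib.metadata module; the PINS dictionary copies only the versions listed above (the elided entries are deliberately not guessed), and the helper names check_pins and requirements_lines are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above; the elided entries ("…") are not filled in.
PINS = {
    "huggingface-hub": "0.21.3",
    "libpecos": "1.2.4",
    "numpy": "1.26.4",
    "safetensors": "0.4.2",
    "sentencepiece": "0.2.0",
    "tokenizers": "0.15.2",
    "torch": "1.13.1",
    "transformers": "4.38.2",
}


def check_pins(pins, get_version=version):
    """Return {package: (wanted, installed or None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build labels such as "+cu117" when comparing.
        base = installed.split("+")[0] if installed else None
        if base != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


def requirements_lines(pins):
    """Render the pins as lines for a pip requirements file."""
    return [f"{name}=={ver}" for name, ver in pins.items()]


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins(PINS).items():
        print(f"{name}: want {wanted}, have {installed or 'not installed'}")
```

The output of requirements_lines(PINS) can be written to a requirements file and installed with pip install -r. Whether pip can actually resolve these versions on a given Colab Python version is untested here.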

 

I am not saying this is the only solution.

 

The backend is still in development status, and I don't think anyone can promise right now how soon we will get it working for everyone. The dependency chaos caused by PECOS is not easy to resolve for Annif main.

 

Hope this helps.

Best,

Maximilian

 

Maximilian Kähler

German National Library
Metadata, Automation

Deutscher Platz 1

D-04103 Leipzig

Germany

Phone: + 49 341 2271- 133
mailto:m.ka...@dnb.de
https://www.dnb.de/ki-projekt  

 

 

 

From: annif...@googlegroups.com <annif...@googlegroups.com> On behalf of Mohamad Mmdouh
Sent: Wednesday, December 31, 2025 19:22
To: Annif Users <annif...@googlegroups.com>
Subject: Fail to run XTransformer using pre-trained gnd-all-xtransformer-en on colab

 


 
