Hi all,
I am working on a research project that runs Annif with the X-Transformer (PECOS) backend from the PR linked above, and I am very interested in helping to develop support for this classifier.
For comparison, evaluation results on my dataset:

| Metric | LFO/nn | X-Transformer |
| --- | --- | --- |
| F1@5 | 0.1393 | 0.2472 |
| NDCG | 0.1928 | 0.3204 |
| NDCG@5 | 0.2003 | 0.2867 |
| NDCG@10 | 0.1938 | 0.3216 |
| Precision@1 | 0.3233 | 0.2860 |
| Precision@3 | 0.2016 | 0.2698 |
| Precision@5 | 0.1502 | 0.2633 |
| True positives | 391 | 845 |
| False positives | 3859 | 3455 |
| False negatives | 2070 | 1616 |
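For anyone unfamiliar with the ranking metrics above, here is a minimal pure-Python sketch of Precision@k and nDCG@k for a single document, using the standard binary-gain formulation (Annif averages these across all evaluation documents, and its exact implementation may differ; the function names are my own):

```python
import math

def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predicted subjects that are truly relevant."""
    return sum(1 for s in predicted[:k] if s in relevant) / k

def ndcg_at_k(predicted, relevant, k):
    """Normalized discounted cumulative gain over the top-k predictions."""
    dcg = sum(1 / math.log2(i + 2)
              for i, s in enumerate(predicted[:k]) if s in relevant)
    # Ideal DCG: all relevant subjects ranked first
    idcg = sum(1 / math.log2(i + 2)
               for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0

predicted = ["cats", "dogs", "birds", "fish", "mice"]
relevant = {"cats", "fish"}
print(precision_at_k(predicted, relevant, 5))  # 0.4
print(round(ndcg_at_k(predicted, relevant, 5), 4))
```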
However, there were a few modifications I had to make to the current code in the PR to make it work:
- This commit is required for updated Keras support (`/annif/backend/nn_ensemble.py`) -- more on this below.
- In `/annif/backend/xtransformer.py:L90`, `distilbert-base-multilingual-uncased` is no longer available on Hugging Face, so I used `distilbert-base-uncased` instead.
- In `/annif/backend/fasttext.py:L70`, I had to add `fasttext.FastText.eprint = print` to suppress 'missing method' errors.
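Sketched in code, the last two tweaks look roughly like this (an illustration only, not the PR's actual diff; `MODEL_SHORTCUT` and `patch_fasttext` are names I made up for this snippet):

```python
# Illustrative sketch of my two local tweaks -- not the actual PR diff.

# Replacement model shortcut for xtransformer.py, since the
# multilingual checkpoint was unavailable for me on Hugging Face
MODEL_SHORTCUT = "distilbert-base-uncased"

def patch_fasttext():
    """Monkey-patch fastText's eprint to silence 'missing method' noise.

    Returns True if the patch was applied, False when fasttext
    is not installed in the current environment.
    """
    try:
        import fasttext
    except ImportError:
        return False
    fasttext.FastText.eprint = print  # my workaround; a no-op lambda also works
    return True
```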
There is an excellent discussion in the PR thread, in particular about the apparent conflict between the PyTorch and TensorFlow dependencies. I don't know enough to speak to this in depth, although Keras seems to be an attempt to abstract over both frameworks. What I can say is that I am unable to feed the results of the xtransformer backend into the nn_ensemble backend due to these compatibility issues. A simple ensemble does work, although I haven't found a combination with other backends that improves the scores.
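For reference, a simple-ensemble setup in Annif's `projects.cfg` looks roughly like this (project and source names here are hypothetical; the source projects must already be trained):

```ini
# projects.cfg sketch -- section and source names are hypothetical
[ensemble-en]
name=Simple ensemble with X-Transformer
language=en
backend=ensemble
sources=xtransformer-en,tfidf-en
vocab=my-vocab
```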
My current development environment is an Apple M2 with 24 GB RAM; xtransformer uses about 10-12 GB of memory on my dataset and definitely pushes the hardware to its limit. The next environment I'll be working with is an Intel i5-13600K with 64 GB RAM and an RTX 4070 (about 5000 CUDA cores) in an x86_64 VM. CUDA support for this backend would be very helpful!
However, using xtransformer on its own is a big enough step forward that I'm likely to use it exclusively and try to improve scoring on my dataset with hyperparameter tuning, and possibly by varying the transformer model. PECOS is a very exciting architecture, and I'd like to learn as much about it as possible. Again, there is a great discussion in the PR thread about how best to incorporate it into Annif, but I believe a higher-level discussion of transformer-based backends makes more sense here.
Best,
MJ