change 'nlp.max_length' limit

Elisabeth Mecking

Jul 11, 2023, 2:04:43 AM
to Annif Users
Hi, I have another question. I've been running some tests with rather long full texts. With the spacy analyzer I get error E088, which says the text exceeds the maximum length and suggests increasing the 'nlp.max_length' limit. Where can I change this? I tried setting it in the projects file like this:
analyzer=spacy(de_core_news_sm,lowercase=1,nlp.max_length=1500000)
That doesn't seem to be the right place; the error is still there.
Thank you for your help
Elisabeth

juho.i...@helsinki.fi

Jul 11, 2023, 5:01:29 AM
to Annif Users
Hi!

Unfortunately, the only parameters the spacy analyzer accepts in the project settings are the model name and the lowercase option; any other parameters are ignored and not passed to spaCy itself. So I think there is no way to change the default value of nlp.max_length for the spacy analyzer.

I wonder how Annif algorithms handle documents of nearly 1500000 characters. Have you got the project working with some other analyzer?

Also, it crossed my mind to mention the limit transform, which we use in our Finto AI Omikuji models to truncate long documents to 5000 characters. That way the algorithm considers only the abstract and introduction, which in scholarly documents contain the most representative information about the full document.
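
For illustration, a project entry using the limit transform together with the spacy analyzer could look roughly like the sketch below. The section name, project name, backend and vocabulary are placeholders I made up for the example, not your actual settings; only the analyzer line is taken from your message and the 5000 from our own setup. As far as I understand, the transform is applied to the text before it reaches the analyzer, so the truncated text should stay well below spaCy's default nlp.max_length of 1000000 characters.

```
# Hypothetical project section; name, backend and vocab are placeholders.
[omikuji-spacy-de]
name=Omikuji with spaCy analyzer (example)
language=de
backend=omikuji
vocab=my-vocab
analyzer=spacy(de_core_news_sm,lowercase=1)
# Keep only the first 5000 characters of each document.
transform=limit(5000)
```

The limit value can of course be larger than 5000 if you want the algorithm to see more of each document.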

-Juho