PR available to fix indexing of large text documents

Brian Keese

Oct 22, 2024, 9:44:32 AM
to DSpace Technical Support
FYI, I recently became aware of a bug in the indexing of large text documents. It is present in 7.6.2 and was introduced in February 2024. I created a small PR that fixes it: https://github.com/DSpace/DSpace/pull/9893

The bug manifests when indexing text files that are larger than the configured character limit (default 100000). A message is logged about the large file, suggesting that the character limit be raised and stating that the first 100000 characters are indexed. In fact, those characters are never indexed, so the document will not be found in search results.
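
To illustrate the behavior the log message promises, here is a minimal sketch in Python. It is not the DSpace code (which is Java) and not the change made in the PR; the function name `read_for_indexing` and the `char_limit` parameter are hypothetical, chosen only for this example. The point is that the truncated text should still be returned for indexing after the warning is logged.

```python
import io
import logging

log = logging.getLogger("indexing_sketch")  # hypothetical logger name


def read_for_indexing(stream, char_limit=100000):
    """Return at most char_limit characters from a text stream.

    Hypothetical helper for illustration only; not part of DSpace.
    """
    text = stream.read(char_limit)
    # Probe for one more character to detect that the document exceeds the limit.
    if stream.read(1):
        log.warning(
            "Document exceeds %d characters; only the first %d will be indexed.",
            char_limit,
            char_limit,
        )
    # The truncated text is returned whether or not the warning fired,
    # so the first char_limit characters still reach the index.
    return text


# Usage: a 25-character document with a limit of 10 characters.
indexed = read_for_indexing(io.StringIO("a" * 25), char_limit=10)
```

The bug described above corresponds to the case where the warning is logged but nothing is passed on to the index for documents over the limit.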