Differences in search result on items between DSpace 6.3 / DSpace 7.5


Erivelto Henrique

Mar 2, 2023, 8:49:00 AM
to DSpace Community
Hi everyone;

I have a DSpace 6.3 installation and am deploying a new document repository on version 7.5.
We have already installed version 7.5 on a new server and imported some documents into it.
We ran some search tests and noticed a large difference in search results between the two versions.
When I search for a term in version 6.3, I get 14 results; when I search for the same term in version 7.5, I get only 9.
Search result in version 6.3: Screenshot_8.png (attachment)
Search result in version 7.5: Screenshot_10.png (attachment)

Some of the PDF documents are very large, and with the text extractor limit set to 100k characters, those files were not being fully converted to TXT. I changed the setting to textextractor.max-chars = -1, but the search results remain the same.

Can anyone help with this?

Thanks

Erivelto

Tim Donohue

Mar 2, 2023, 11:50:02 AM
to Erivelto Henrique, DSpace Community
Hi Erivelto,

I'd recommend looking more closely at the 5 items which were matched in DSpace 6.3 but not in 7.5. Is there something in common among those 5 items? Is the search match occurring in the metadata of those items or in the full text?
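
One way to identify those items is to copy the handles of the hits from each result page and take the set difference. A minimal Python sketch, assuming the handles are collected by hand; the function name and the sample handles are hypothetical and not part of DSpace:

```python
def missing_in_new(old_hits, new_hits):
    """Return handles found by the old search but not by the new one, sorted."""
    return sorted(set(old_hits) - set(new_hits))


# Hypothetical handles copied from the two result pages.
hits_63 = ["123456789/10", "123456789/11", "123456789/12"]
hits_75 = ["123456789/10", "123456789/12"]

print(missing_in_new(hits_63, hits_75))  # ['123456789/11']
```

The resulting list is the set of items worth inspecting for a common trait, such as unusually large PDFs or matches that occur only in the full text.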

If you can narrow things down, it'd be much easier to provide support/ideas.  There have been a lot of changes in the search engine of DSpace 7.5... including a move to a later version of Solr.  It's possible you've found a bug, or it could be a misconfiguration, or simply a change in the behavior of Solr.  It's difficult to narrow down without more information about the differences in the results that you are seeing.

If you can send more information to this list or to your thread on dspace-tech (as I see you sent the same email to both lists), that might provide others with more clues as to what might be going on.

Tim


Bill Tantzen

Mar 2, 2023, 12:27:36 PM
to Tim Donohue, Erivelto Henrique, DSpace Community
There is a companion setting in discovery.cfg (discovery.solr.fulltext.charLimit) which limits the number of characters that are actually stored in the fulltext field of the Solr index; by default, it is also set to 100000 characters. Simply set this to a higher count, or -1 for unlimited.
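
To see why raising only one of the two limits does not change the search results, here is a small Python illustration. It is a sketch of the idea only, not DSpace's or Solr's actual code: it assumes each limit simply keeps the first N characters and that -1 means unlimited. The function names and the sample document are made up.

```python
def apply_limit(text, char_limit):
    """Keep only the first char_limit characters; -1 (any negative) means unlimited."""
    if char_limit < 0:
        return text
    return text[:char_limit]


def is_findable(term, full_text, extract_limit, index_limit):
    """Apply the extraction limit, then the index limit, and check the term survives."""
    extracted = apply_limit(full_text, extract_limit)
    stored = apply_limit(extracted, index_limit)
    return term in stored


# A made-up document whose search term appears after 150,000 characters.
document = "x" * 150_000 + " needle"

print(is_findable("needle", document, 100_000, 100_000))  # False: both limits at 100k
print(is_findable("needle", document, -1, 100_000))       # False: index limit still cuts it
print(is_findable("needle", document, -1, -1))            # True: both unlimited
```

Under these assumptions, a term that appears late in a large PDF stays unsearchable until both limits are raised.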

Hope that helps!
~~Bill



--
Human wheels spin round and round
While the clock keeps the pace... -- John Mellencamp
________________________________________________________________
Bill Tantzen    University of Minnesota Libraries
612-626-9949 (U of M)    612-325-1777 (cell)

Erivelto Alves

Mar 2, 2023, 8:43:53 PM
to Bill Tantzen, Tim Donohue, DSpace Community
Hi Bill & Tim!

This is the server spec:

Server DSpace App
Ubuntu Server 22.04.2 LTS
openjdk version "11.0.18" 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)
Tomcat 9.0.72
Solr 8.11.2
Apache Maven 3.6.3
Apache Ant 1.10.12
NodeJS v16.19.1
npm 8.19.3
-----------------------------
Server Database
Ubuntu Server 22.04.2 LTS
PostgreSQL 14.6 (Ubuntu 14.6-0ubuntu0.22.04.1)

The DSpace installation completed without errors.
The PDF files compared in DSpace 6.3 and DSpace 7.5 are the same and were batch imported from a SAF (Simple Archive Format) package, with no import errors.
I changed the text extractor configuration in dspace.cfg to textextractor.max-chars = -1. All files were 100% converted to TXT.

Bill, I do not see the companion setting (discovery.solr.fulltext.charLimit) in discovery.cfg.

The search problem remains.

Thanks.

Erivelto

Erivelto Alves

Mar 2, 2023, 9:53:21 PM
to Bill Tantzen, Tim Donohue, DSpace Community
Hi Bill!

I was wrong: the companion setting (discovery.solr.fulltext.charLimit) does exist in discovery.cfg. I changed it to -1, and now DSpace finds all occurrences in the search.

But now I have found another problem. I uploaded more PDF files, and when I run the command ./dspace index-discovery, some of the files are erased.
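
Since the setting was easy to overlook, a quick check of the configuration text can help. The following Python sketch is a hypothetical helper, not a DSpace tool: it assumes simple "key = value" lines and only inspects the text it is given, so it does not account for values overridden in other configuration files.

```python
def finite_limits(cfg_text, keys):
    """Return {key: value} for the given keys whose value is not -1.

    Parses simple "key = value" lines and skips blank lines and # comments.
    """
    found = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in keys and value != "-1":
            found[key] = value
    return found


# Sample text combining the two settings discussed in this thread.
sample = """
# extraction limit
textextractor.max-chars = -1
discovery.solr.fulltext.charLimit = 100000
"""

limits = {"textextractor.max-chars", "discovery.solr.fulltext.charLimit"}
print(finite_limits(sample, limits))  # {'discovery.solr.fulltext.charLimit': '100000'}
```

An empty result would mean both limits in the given text are already set to -1.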

Thank you.

Erivelto



