How to configure Solr to do partial word matching


Tianyi Gu

May 3, 2022, 1:34:50 PM
to DSpace Community

Hello everyone,

I have a Solr-related issue, described below.

In out-of-the-box DSpace, a search keyword has to match a whole word exactly; otherwise no results are listed (unless wildcards are used).

Steps to reproduce the issue:

  • Log in as an admin, or stay anonymous
  • Search for "Sta"
  • No search results are returned.

Desired results examples:

Search Term: sta

Desired Results:

Texas State University
Stanford University

Search Term: stan

Desired Results:

Stanford University

Search Term: st un

Desired Results:

Texas State University
Stanford University


This is what I've tried so far:

Standard Tokenizer

This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions:

Periods (dots) that are not followed by whitespace are kept as part of the token, including Internet domain names.

The "@" character is among the set of token-splitting punctuation, so email addresses are not preserved as single tokens.

Note that words are split at hyphens.

Files edited:

\solr-8.11.1\server\solr\configsets\search\conf\schema.xml

\DSpace\solr\search\conf\schema.xml

Code:

  1. Replace the existing "text" fieldType element with the following:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
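For comparison, here is a minimal sketch of a pared-down prefix-matching field type. It is an illustration under stated assumptions, not a confirmed fix. The name "text_prefix" is hypothetical and not part of the stock DSpace schema. StandardFilterFactory and the side attribute are left out on the assumption that a Solr 8 release may not accept them; the Solr log should show whether the original definition loaded at all. The Porter stemmer is also dropped, because stemming before the n-gram filter changes which prefixes get indexed: a stemmed "university" becomes "univers", so the gram "universi" is never produced.

```xml
<!-- Hypothetical field type; the name "text_prefix" is an assumption for illustration. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace and punctuation, then lowercase each token. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index the leading fragments of each token: "stanford" -> "st", "sta", "stan", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter at query time: the typed fragment is compared as-is with the indexed grams. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A definition like this only has an effect if the field that is actually searched uses the type, the running core has picked up the edited schema, and the content has been re-indexed afterwards. Whether a query such as "st un" requires both fragments to match depends on the default query operator in effect.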

  2. Update the index: dspace index-update

If you want to run a full re-index (which actually removes your existing index and rebuilds it from scratch), do the following:

  • Stop Tomcat
  • dspace index-init
  • Restart Tomcat

After changing the schema and re-indexing, I still don't get the desired search results. Please advise.

Thank you,

Tianyi