While working my way through the Solr manual and the Sunspot docs, I came
across an odd point about n-grams. It is on this wiki page about edge
n-grams:
https://github.com/outoftime/sunspot/wiki/Wildcard-searching-with-ngrams
The page suggests adding the EdgeNGram filter to the text analyzer, which
is a good idea. However, the configuration it shows applies the filter at
both index time and query time, and it should only be applied at index
time. If you also use it as a query filter, every query term is expanded
into its prefixes. With minGramSize="3" and maxGramSize="15", a term of 15
or more characters becomes 13 terms (one per length from 3 to 15), which
amounts to running about a dozen extra queries. It also dilutes your
scoring, because all those shorter (and presumably more common) prefixes
match unrelated documents. A match on the full query term is all you need
to get a sufficiently high score anyway.
My suggestion is to split the field type into two analyzers in
schema.xml, one with type="index" and one with type="query". That way only
the index analyzer includes the n-gram filter. It would look something
like this:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <!-- Index time: tokenize, normalize, then emit front edge n-grams (3 to 15 chars) -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="15" side="front"/>
  </analyzer>
  <!-- Query time: same chain without the n-gram filter, so query terms stay whole -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
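The effect of splitting the analyzers can also be sketched in Python. This is a rough approximation under stated assumptions:

* A regex split plus lowercasing stands in for StandardTokenizerFactory and LowerCaseFilterFactory.
* The small `STOPWORDS` set is a hypothetical stand-in for stopwords.txt.
* All helper names are made up for illustration.

The index side stores every prefix from 3 to 15 characters. The query side keeps terms whole, so a query term matches when it equals one of the indexed prefixes.

```python
import re

# Hypothetical stand-in for stopwords.txt; a real list is much longer.
STOPWORDS = {"a", "an", "and", "of", "the", "with"}


def edge_ngrams(token, min_gram=3, max_gram=15):
    """Front edge n-grams of token (simplified model of the n-gram filter)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def base_chain(text):
    """Rough approximation of tokenizer + lowercase + stopword filter."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def index_terms(text):
    """Index analyzer: base chain, then edge n-grams for every token."""
    terms = set()
    for token in base_chain(text):
        terms.update(edge_ngrams(token))
    return terms


def query_terms(text):
    """Query analyzer: base chain only, no n-gram expansion."""
    return base_chain(text)


def matching_terms(query, document):
    """Query terms that appear among the document's indexed terms."""
    indexed = index_terms(document)
    return [t for t in query_terms(query) if t in indexed]


if __name__ == "__main__":
    doc = "Wildcard searching with ngrams"
    print(query_terms("Wildcard"))       # ['wildcard']: one term, not six
    print(matching_terms("Wild", doc))   # ['wild']
    print(matching_terms("wi", doc))     # []: no 2-character grams indexed
```

In this model, "Wild" matches a document containing "Wildcard" through the indexed prefix `wild`, while the query itself stays a single term. The gram sizes create two edge cases:

* A two-letter query such as "wi" finds nothing, because no 2-character grams are indexed.
* A query term longer than 15 characters also finds nothing, because the sketch does not keep the original token.

For the second case, it is worth checking whether your Solr version's n-gram filter keeps the original token.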
While writing this up, I noticed that the substring matching example
already does this. It should be carried over to this page as well.