I wonder if anyone knows how Sphinx goes about constructing the snippets that are returned along with the matches to a search term. This page illustrates a wild variety of examples of how one search term can be interpreted:
https://oll.libertyfund.org/search/results?q=power+corrupts
Note the first hit, from Alvis on Shakespeare. The exact phrase appears in the third line of the snippet (on a desktop screen, YMMV), but it is not highlighted. The third hit comes from deep in the weeds of the footnotes; it hits on the word "power", and does highlight it. The fifth hit highlights both "power" (twice) and "corrupts", but misses the stem of "corrupts" in "corrupt". The second-to-last hit on that page, in Liberty, Order, and Justice, goes on for several screens (208,135 words), with a single snippet that has grown to encompass 725 individual keyword hits in one "paragraph" of source text.
I'm using Thinking Sphinx 3.1.2 with Sphinx 2.2.9.
Here's the controller method that constructed this page:
@results = ThinkingSphinx.search "\"#{ThinkingSphinx::Query.escape(params[:q].to_s)}\"",
  :page => params[:page],
  :star => true,
  :excerpts => {
    :limit => 1000,
    :around => 40,
    :force_all_words => true,
    :chunk_separator => '</li><li>'
  }.reject{ |r| r.class.to_s == 'NilClass' } rescue Kaminari::paginate_array []
@results.context[:panes] << ThinkingSphinx::Panes::ExcerptsPane
@hits = @results.total_entries rescue 0
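To make the first argument easier to read: it is just the escaped user input wrapped in literal double quotes. Here is a minimal sketch of that string construction in plain Ruby. Note that stand_in_escape and phrase_query are names I made up for illustration, and stand_in_escape is a simplified placeholder, not the real ThinkingSphinx::Query.escape.

```ruby
# Simplified placeholder for ThinkingSphinx::Query.escape (hypothetical):
# it only backslash-escapes double quotes and backslashes.
def stand_in_escape(text)
  text.gsub(/["\\]/) { |char| "\\#{char}" }
end

# Mirrors the first argument built in the controller: the escaped
# input (nil becomes "") wrapped in literal double quotes.
def phrase_query(raw)
  "\"#{stand_in_escape(raw.to_s)}\""
end

puts phrase_query("power corrupts") # prints "power corrupts" (quotes included)
```

So the query text handed to the search call carries its own surrounding quotes, and a missing q parameter becomes an empty quoted string.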
These results come mostly from titles, with some from pages. Here are the index definitions for both:
# titles_index.rb
ThinkingSphinx::Index.define :title, :with => :active_record do
  set_property :group_concat_max_len => 10.megabytes
  indexes :title, :sortable => true
  indexes teaser
  indexes content.plain, :as => :plain_text
  indexes author_name, :sortable => true
  has roles(:person_id), :as => :people_ids
  has :id, :as => :title_id
  has author_id, created_at, updated_at
  has set, :as => :title_set
  where sanitize_sql(["publish", true])
end
# pages_index.rb
ThinkingSphinx::Index.define :page, :with => :active_record do
  indexes :title, :sortable => true
  indexes teaser
  indexes body
  has created_at, updated_at
end
In the view, I'm using this tortured bit of ERB:
<%= content_tag( :ol, "<li>#{result.excerpts.plain_contents}</li>".gsub(/<li>\s*<\/li>/,'').html_safe ) if result.respond_to?(:plain_contents) %>
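Outside of ERB, the string handling in that line boils down to roughly the following sketch. The helper name excerpt_list_items and the sample excerpt are made up for illustration; the real excerpt comes back from Sphinx with the '</li><li>' chunk separator set in the controller.

```ruby
# Hypothetical helper showing only the string handling from the view:
# wrap the excerpt in one <li>, then drop any empty or whitespace-only
# items left behind by the '</li><li>' chunk separator.
def excerpt_list_items(excerpt)
  "<li>#{excerpt}</li>".gsub(/<li>\s*<\/li>/, '')
end

# Made-up excerpt with one blank chunk in the middle.
sample = "first chunk</li><li>  </li><li>second chunk"
puts excerpt_list_items(sample)
# prints <li>first chunk</li><li>second chunk</li>
```

An empty excerpt collapses to an empty string, so the ol ends up with no items at all in that case.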
I can't explain why some matches are wrapped in <span class="match"> in the output from Sphinx, while others (nearby, in the same set of results) are not.
Thanks in advance if anyone can enlighten me or point me toward documentation of this feature. This is all very old code, maybe 6 or 8 years since I last touched it. I've moved it to a newer server since I wrote all this, but nothing much changed when I did that. My client would like to know, and I don't have any good answers.
Walter