Using Sphinx as a search engine

Nikita Zhiltsov

Nov 1, 2012, 4:57:49 PM
to ic...@googlegroups.com
Azat,

we would like you to investigate the following Sphinx features:

1) Indexing questions as multi-field documents. Some potential fields are:
- question title
- question body
- body of the best answer
- aggregated body of the remaining associated answers

Recall that, for our current task, we'd like to use syllabus chapter titles as queries to retrieve promising candidate questions for the syllabus.

2) BM25F ranking function

3) Execution time: how fast Sphinx search is (use a reasonably large dump to measure it properly).
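Items 1 and 2 fit together: BM25F is designed for exactly this kind of multi-field document. Below is a minimal Python sketch, not Sphinx's implementation, of one common simplified BM25F formulation: each field's term frequency is length-normalized, the fields are combined with per-field weights, and saturation is applied once to the combined frequency. The field names follow item 1; the helper names, the regex tokenizer (no stemming), the field weights, the single shared `b`, `k1 = 1.2`, and the `log(1 + ...)` IDF are all illustrative assumptions.

```python
import math
import re
from typing import Dict, List

# A document maps field name -> tokens (hypothetical representation).
Doc = Dict[str, List[str]]

# Fields suggested in item 1.
FIELDS = ("title", "body", "best_answer", "other_answers")


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase word characters, no stemming (an assumption)."""
    return re.findall(r"\w+", text.lower())


def make_doc(title: str, body: str, answers: List[str], best: int) -> Doc:
    """Build a multi-field question document; `best` indexes the best answer."""
    others = [a for i, a in enumerate(answers) if i != best]
    return {
        "title": tokenize(title),
        "body": tokenize(body),
        "best_answer": tokenize(answers[best]) if answers else [],
        # Remaining answers aggregated into a single field.
        "other_answers": tokenize(" ".join(others)),
    }


def bm25f(query: str, doc: Doc, corpus: List[Doc], weights: Dict[str, float],
          b: float = 0.75, k1: float = 1.2) -> float:
    """Simplified BM25F score of `doc` for `query` (illustrative parameters)."""
    if not corpus:
        return 0.0
    n = len(corpus)
    avg_len = {f: sum(len(d[f]) for d in corpus) / n for f in FIELDS}
    score = 0.0
    for term in set(tokenize(query)):
        # Document frequency: documents containing the term in any field.
        df = sum(1 for d in corpus if any(term in d[f] for f in FIELDS))
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        # Weighted sum of length-normalized per-field term frequencies.
        tf = 0.0
        for f in FIELDS:
            count = doc[f].count(term)
            if count and avg_len[f] > 0:
                norm = 1 - b + b * len(doc[f]) / avg_len[f]
                tf += weights[f] * count / norm
        # Saturate once over the combined frequency (unlike summing per-field BM25).
        score += idf * tf / (k1 + tf)
    return score


corpus = [
    make_doc("How to balance a binary search tree?", "My tree degenerates.",
             ["Use an AVL tree.", "Rotate nodes."], best=0),
    make_doc("Hash table collisions", "Why do keys collide?",
             ["Use chaining.", "A tree per bucket also works."], best=0),
]
weights = {"title": 3.0, "body": 1.0, "best_answer": 1.5, "other_answers": 0.5}
chapter = "Binary search trees"  # a syllabus chapter title used as the query
scores = [bm25f(chapter, d, corpus, weights) for d in corpus]
print(scores)
```

Here a syllabus chapter title is used directly as the query, as in the task described above. Weighting the title field higher than the aggregated answers is only a starting guess and would need tuning on real data.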

 

Азат Хасаншин

Nov 13, 2012, 1:38:30 PM
to iCQA
Hello,

Sorry for the late response.

1) Sphinx supports multiple full-text fields per document.

2) I couldn't find any indication in the documentation that BM25F is supported, but it is mentioned in this revision: http://code.google.com/p/sphinxsearch/source/detail?r=3369, so they may plan to add it in the future.

3) They claim indexing speeds of 10-15 MB/sec per core and search speeds of 150-250 queries/sec per core on an internal benchmark against 1,000,000 documents (1.2 GB of data).
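For reference, the claimed search rate of 150-250 queries/sec per core corresponds to roughly 4.0-6.7 ms per query, and indexing 1.2 GB at 10-15 MB/sec per core would take roughly 80-120 seconds. To check such figures on our own reasonably large dump, a simple single-threaded timing harness could look like the sketch below. `search` stands for whatever client call actually issues a query; `benchmark`, `toy_search`, and `DOCS` are hypothetical stand-ins so the example runs on its own.

```python
import statistics
import time
from typing import Callable, List


def benchmark(search: Callable[[str], object], queries: List[str],
              repeats: int = 3) -> dict:
    """Run each query `repeats` times in one thread and summarize the timings."""
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")
    latencies = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            search(q)
            latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "queries": len(latencies),
        # Throughput over time spent inside `search` only (loop overhead excluded).
        "qps": len(latencies) / total if total > 0 else float("inf"),
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


# Hypothetical stand-in corpus and engine: a linear substring scan.
DOCS = ["binary search trees", "hash tables", "graph traversal"] * 1000


def toy_search(query: str) -> List[str]:
    return [d for d in DOCS if query in d]


stats = benchmark(toy_search, ["tree", "hash", "graph"])
print(stats)
```

Because the harness runs in a single thread, its `qps` is roughly comparable to the per-core figures claimed above. The first repeat includes cold-cache effects, so with a large dump it may be worth discarding it.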



 

--
Азат Хасаншин

Nikita Zhiltsov

Nov 14, 2012, 12:33:21 PM
to ic...@googlegroups.com
Thanks, Azat,

since BM25F isn't supported in Sphinx out of the box, switching to Apache Lucene 4.0 seems reasonable. I guess Yu will soon describe his experiments with Lucene and the imported SO data here.

