Hi Pat, friendly folk,
I've got a 375_000 row geo data set indexed by lat and lon. Searching
works great, but queries usually take about 650ms on our production
hardware. I found a huge speed boost from pointing the main
distributed index at a split set of sources and indexes. So instead of:
index datapoint
{
type = distributed
local = datapoint_core
}
I have:
index datapoint
{
type = distributed
local = datapoint_core_0
local = datapoint_core_1
local = datapoint_core_2
local = datapoint_core_3
}
Each source then has a range by IDs:
source datapoint_core_0
sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1)
FROM `restaurant_inspection_datapoints` WHERE `id` BETWEEN 0 AND 93878
Here 93878 is 1/4 of the records, and each index covers 1/4 of the
total IDs. On my laptop alone, this gave me massive gains: queries
that could take a second took 300ms.
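To make the split concrete, here is a rough sketch in Python of how the
per-index ID ranges and the distributed index block could be generated
rather than written by hand. The function names (split_id_ranges,
render_distributed_index) are made up for illustration and are not part
of riddle, thinking_sphinx, or Sphinx itself; the min/max IDs would come
from your own table.

```python
def split_id_ranges(min_id, max_id, shards):
    """Split the inclusive ID range [min_id, max_id] into `shards`
    contiguous, non-overlapping chunks of near-equal size."""
    total = max_id - min_id + 1
    if shards < 1 or total < shards:
        raise ValueError("need at least one ID per shard")
    base, extra = divmod(total, shards)
    ranges = []
    start = min_id
    for i in range(shards):
        # The first `extra` shards each take one leftover ID.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def render_distributed_index(name, core_prefix, shards):
    """Render a distributed index block with one `local` line per shard
    (hypothetical helper; mirrors the hand-written config above)."""
    lines = ["index %s" % name, "{", "type = distributed"]
    for i in range(shards):
        lines.append("local = %s_%d" % (core_prefix, i))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example bounds only; substitute the real MIN(id) and MAX(id).
    for i, (lo, hi) in enumerate(split_id_ranges(1, 1000, 4)):
        print("datapoint_core_%d: `id` BETWEEN %d AND %d" % (i, lo, hi))
    print(render_distributed_index("datapoint", "datapoint_core", 4))
```

Each (lo, hi) pair would go into the BETWEEN clause of the matching
source's sql_query_range. Note this splits by ID value, so the shards
only hold equal row counts if the IDs have no large gaps.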
I would love to get this into riddle or thinking_sphinx, but the
riddle configuration code is really complex. Is anyone interested in
working with me on this or giving me a starting point?
Pat, does that use of "local" make sense? All the other examples out
there use agents.
See:
http://www.sphinxsearch.com/docs/current.html#distributed
http://www.sphinxsearch.com/bugs/view.php?id=407 <-- I didn't
encounter that bug
http://blog.wasimasif.com/sphinx-distributed-searching/
Thanks all, I'm very excited to get this up!
--
Matthew Beale ::
607 227 0871
Resume & Portfolio @
http://madhatted.com