regarding delta indexes with remote sphinx servers


agib

May 1, 2009, 2:51:14 AM
to Thinking Sphinx
I'm not sure I understand how to get the deltas working on a 2+ server
environment... let's say I have server A (app + sphinx) and server B
(app). If a request to server B updates a model that has :delta =>
true, how does the sphinx index on server A get updated? Do I have to
set up some sort of shared filesystem? I'm on EC2 and I'm not sure
that's possible... I used to have A (app + sphinx) and B (app +
sphinx) but then I realized that it was possible for both servers to
return different results (i.e. I could refresh a search result page
and get alternating results). Is there any good solution for remote
delta indexes?

Josh

May 1, 2009, 7:59:26 AM
to Thinking Sphinx
There are a few ways to do this, though I'm not sure which will work on
EC2. Check out this thread:

http://groups.google.com/group/thinking-sphinx/browse_thread/thread/bc2e78469537dfcc/cade5de854dcd2a9

-Josh

agib

May 1, 2009, 10:26:22 AM
to Thinking Sphinx
Hi Josh, thank you for the response, but I still don't see how that
fixes the deltas issue...

Is anyone using sphinx's built-in distributed searching feature?
Wouldn't that be the best solution to this problem?


Josh

May 2, 2009, 7:16:46 AM
to Thinking Sphinx
Sorry, I neglected half of your question. In our case, we run both a
daily full-index and a more frequent delta index on one machine.
Regardless of which type of index we are running, we rename the
resulting files and push them to each server that runs searchd, and
send the SIGHUP signal to get the indexes refreshed.
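
The push-and-rotate steps described above could be sketched roughly as follows. This is only an illustration, not Josh's actual script: it assumes the "rename" step means giving each file the `.new` name that searchd looks for when it rotates indexes on SIGHUP, and the host names, paths, and helper names (`rotation_name`, `push_commands`, `signal_searchd`) are all made up for the example.

```python
import os
import signal
from pathlib import Path


def rotation_name(filename: str) -> str:
    """Insert ".new" before the extension, e.g. core.spd -> core.new.spd.

    Assumption: searchd picks up files named this way when it receives SIGHUP.
    """
    path = Path(filename)
    return f"{path.stem}.new{path.suffix}"


def push_commands(index_files, hosts, remote_dir):
    """Build (but do not run) one scp command per index file per search host."""
    commands = []
    for host in hosts:
        for src in index_files:
            target = f"{host}:{remote_dir}/{rotation_name(Path(src).name)}"
            commands.append(["scp", str(src), target])
    return commands


def signal_searchd(pid_file: str) -> int:
    """Run on each search host (POSIX only): send SIGHUP to the pid in pid_file."""
    pid = int(Path(pid_file).read_text().strip())
    os.kill(pid, signal.SIGHUP)
    return pid
```

Each command returned by `push_commands` can be handed to `subprocess.run`; `signal_searchd` would then run on every search host once its files have arrived, pointed at whatever pid file that host's searchd is configured to write.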

The downside of this is that we can't use thinking_sphinx's spiffy
indexing tasks, but it does work well. Again, I'm not sure how easy
this is under EC2, I don't have any experience there.

- Josh

wbharding

May 3, 2009, 6:56:35 PM
to Thinking Sphinx
We build our indexes on a remote machine (that uses a slave version of
our DB), then sftp the resulting index files to our web servers, each
of which runs its own TS instance that uses cron to send a SIGHUP
that refreshes the search, similar to what it sounds like Josh is
describing.

Two weeks ago, I spent a couple of days trying to update this
configuration so we could use time-based delta indexing on that remote
machine to rebuild our indexes more frequently. However, we ran
into a number of instances where this broke search in a variety of
interesting ways... everything from only parts of the search string
being used, to partial results being returned (i.e., only items older
than 3 months).

Ultimately, we reverted to just doing full indexes and sftping
them (as described in the first paragraph). I'm not entirely sure which
aspect of the delta process is to blame for our troubles (was it the
Sphinx merging? The Thinking Sphinx time-stamp delta indexing? Or
just our own code?), but we went through a lot of pain when we tried
to combine delta indexing with multiple servers.

Seeing as how our indexing now takes almost two hours (and ideally our
main site search would be updated once/hour or more), we'll surely
have to revisit this before too much longer. I'll post the results
if/when I manage to crack this nut.

Bill

agib

May 3, 2009, 7:03:39 PM
to Thinking Sphinx
Hmm... interesting, and thank you for the feedback! Still seems like
there isn't an ideal setup for this. May I ask how many rows and what
kind of rows lead to 2-hour indexing? Right now I have 20,000 rows being
indexed and it only takes a few seconds to run.

I really don't know too much about sphinx itself, but I wonder if
there's a way to use its built-in distributed index like this:

server A1: app + sphinx (delta index only)
server A2: app + sphinx (delta index only)
server An: ...
server B: db + sphinx (cron full indexing + clear all app server delta
indexes)
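
For what it's worth, the layout above might translate into a `type = distributed` index definition on each app server. The sketch below just renders such a sphinx.conf block as text. The index names, host name, and port are placeholders, and it says nothing about how the deltas would be cleared or kept consistent between app servers, which is the open question in this thread.

```python
def distributed_index(name, local_indexes, agents):
    """Render a sphinx.conf index block of type "distributed".

    local_indexes: names of indexes served by this machine's searchd.
    agents: (host, port, index_name) tuples for indexes on remote searchd instances.
    """
    lines = [f"index {name}", "{", "    type = distributed"]
    lines += [f"    local = {index}" for index in local_indexes]
    lines += [f"    agent = {host}:{port}:{index}" for host, port, index in agents]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical app server: local delta index, full index served by server B.
print(distributed_index(
    "article",
    ["article_delta"],
    [("server-b", 3312, "article_core")],
))
```

Searches against `article` on an app server would then combine the local delta with results from server B's full index, assuming both searchd instances are reachable and their schemas match.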

Maybe I'll post a question on the sphinx forums too.

-ajg-

wbharding

May 3, 2009, 7:07:52 PM
to Thinking Sphinx
We've got a lot of data!

Those 2 hours cover indexing two tables, one of which has about 2
million rows and the other about 1 million rows. The table with two
million rows also has to index about 20 different attributes, many of
which are accessed through multi-model associations.

I remember the days of indexing being possible within a few seconds.
If your indexing is that fast, it may be a workable hackaround to just
re-index every 5-10 minutes using cron...?
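
As a rough illustration of the cron idea, the helper below renders a crontab line that runs a full Thinking Sphinx reindex every N minutes. The application path and the function name are invented for the example, and it only makes sense while a full index run finishes well inside the chosen interval.

```python
def reindex_cron_line(minutes, app_dir, rails_env="production"):
    """Return a crontab entry that runs `rake thinking_sphinx:index` every N minutes."""
    if not 1 <= minutes <= 59:
        raise ValueError("minutes must be between 1 and 59")
    return (
        f"*/{minutes} * * * * "
        f"cd {app_dir} && RAILS_ENV={rails_env} rake thinking_sphinx:index"
    )


# Hypothetical deployment path; adjust to wherever the Rails app lives.
print(reindex_cron_line(10, "/var/www/app"))
```

With several app servers each running their own searchd, every one of them would need the same entry, and results could still differ briefly between servers while their runs are out of step.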

Bill

agib

May 4, 2009, 1:14:57 AM
to Thinking Sphinx
OK, I have a good discussion going on the sphinx forums:
http://www.sphinxsearch.com/forum/view.html?id=3475