[ts] strategies for delta indexing with more than 1 app server

agibralter

Apr 19, 2010, 12:12:50 PM
to Thinking Sphinx
I use Sphinx for displaying filtered, sortable, and searchable lists
of items in my web app. I use Sphinx even when there is no searching
involved because ThinkingSphinx's sphinx_scopes make it very easy to
chain attribute constraints together and because Sphinx is very fast.
There is an obvious disadvantage though: lists are not displayed in
realtime. This means I have to leverage delta indexing as much as
possible...

Because I have multiple app servers but only one Sphinx/searchd
server, I cannot use the standard delta indexer. I have seen the
caveat at the bottom of http://freelancing-god.github.com/ts/en/deltas.html
but I can't figure out a good way to send asynchronous delta update
jobs to my application server with searchd on it. Has anyone figured
out a good way to do this? Also, I'm not using delayed_job in my
app... I'm using Workling/Starling, but thinking of moving to
Resque... has anyone seen a delayed delta that uses Resque? If so, is
there a way to send the "async_update_deltas" jobs to my app server
with searchd?

Anyway, as an alternative, I've been using ts-datetime-delta to keep
the lists as up-to-date as possible, running the rake ts:index:delta
task every 2 minutes with cron on my server with searchd. I'm thinking
of moving to 1 minute to be more realtime... This seems much more
frequent than any examples of ts-datetime-delta I've seen... does this
sound crazy? Should I try and figure out a way to use the delayed
delta instead?
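
For anyone weighing the cron interval: as far as I understand it, a datetime-based delta rebuilds the delta index from rows whose updated_at falls inside a recent window, so the threshold needs to be at least as long as the gap between cron runs. The sketch below only illustrates that selection in plain Ruby, with no Thinking Sphinx involved; `Record`, `delta_candidates`, and the sample times are made up for the example.

```ruby
# Hypothetical stand-in for a database row; only the fields the sketch needs.
Record = Struct.new(:id, :updated_at)

# Return the records a datetime-style delta would pick up: everything
# modified within the last threshold_seconds before `now`.
def delta_candidates(records, now, threshold_seconds)
  cutoff = now - threshold_seconds
  records.select { |record| record.updated_at > cutoff }
end

now = Time.at(1_000_000)
records = [
  Record.new(1, now - 30),  # changed 30 seconds ago
  Record.new(2, now - 90),  # changed 90 seconds ago
  Record.new(3, now - 300)  # changed 5 minutes ago
]

# With a 2-minute threshold, records 1 and 2 fall inside the window.
two_minute_ids = delta_candidates(records, now, 120).map(&:id)
# With a 1-minute threshold, only record 1 does.
one_minute_ids = delta_candidates(records, now, 60).map(&:id)
```

If cron fires every minute but the threshold is shorter than that, a change made just after one run could fall outside the window by the next run, which is why the threshold is usually kept a bit longer than the interval.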

Sorry for all the questions! I appreciate any advice!

Best,
Aaron

--
You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
To post to this group, send email to thinkin...@googlegroups.com.
To unsubscribe from this group, send email to thinking-sphi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.

James Healy

Apr 19, 2010, 9:17:48 PM
to thinkin...@googlegroups.com
agibralter wrote:
> Because I have multiple app servers but only one Sphinx/searchd
> server, I cannot use the standard delta indexer. I have seen the
> caveat at the bottom of http://freelancing-god.github.com/ts/en/deltas.html
> but I can't figure out a good way to send asynchronous delta update
> jobs to my application server with searchd on it. Has anyone figured
> out a good way to do this? Also, I'm not using delayed_job in my
> app... I'm using Workling/Starling, but thinking of moving to
> Resque... has anyone seen a delayed delta that uses Resque? If so, is
> there a way to send the "async_update_deltas" jobs to my app server
> with searchd?

An asynchronous work queue is definitely the way to go. If you're
already using workling/starling it should be possible to trigger delta
rebuilds using that.

The trick is that you can pass any class to the delta property in your
index definition:

set_property :delta => MyWorklingDelta

To set up MyWorklingDelta, you could use the delayed job delta gem as a
base; it's a very small piece of code.
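
To give a rough idea of the shape such a class might take, here is a self-contained sketch. It does not load Thinking Sphinx: `StubDefaultDelta` and `StubQueue` are stand-ins invented for the example, and the `index(model, instance = nil)` hook is an assumption about what the real delta base class expects, so check the delayed job delta gem for the actual interface.

```ruby
# Stand-in for the delta base class that Thinking Sphinx ships; in a real
# app MyWorklingDelta would inherit from the library's class instead.
class StubDefaultDelta
  # Called when a record changes; the stock behaviour would index inline.
  def index(model, instance = nil)
    raise NotImplementedError, "subclasses decide how to index"
  end
end

# Hypothetical in-memory queue standing in for Workling/Starling.
class StubQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(job)
    @jobs << job
  end
end

class MyWorklingDelta < StubDefaultDelta
  def initialize(queue)
    @queue = queue
  end

  # Push a job describing the work instead of running the indexer here,
  # so any app server can request a rebuild without touching Sphinx.
  def index(model, instance = nil)
    @queue.enqueue(:task => "index_delta", :model => model.to_s)
    true
  end
end

queue = StubQueue.new
MyWorklingDelta.new(queue).index("Article")
```

The point of the design is that the web request only records that a rebuild is needed; whichever worker later takes the job is the one that actually runs the indexer.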

There's a gem out there that used to support deltas via workling, you
could also see if that still works:

http://github.com/dpickett/workling_delta_indexer

-- James Healy <ji...@deefa.com> Tue, 20 Apr 2010 11:16:59 +1000

Nick Sellen

Apr 20, 2010, 4:14:21 AM
to Thinking Sphinx
> multiple app servers but only one Sphinx/searchd

the caveat is "it will only work for a single searchd instance" - so
you should be fine to use the delayed delta with multiple app servers
(but it looks like you're not using delayed job anyway).

I use sphinx in a similar way to you and the most useful information
in my index is the attributes (not fields), and you can actually live
update the index for these. I've stopped doing delta updates where
possible and do the live update instead. I've added a small extension
to thinking sphinx which makes it easier to run: http://pastebin.com/KgwWREnj

(You need sphinx 0.9.9 if you want to live update MVA attributes
though)

Hope this is helpful.

Nick

agibralter

Apr 20, 2010, 9:16:12 AM
to Thinking Sphinx
Here's the problem with async/delayed deltas: so let's say I have 2
app servers: App1 and App2. App1 has searchd running and App2 simply
connects to App1 for searches. App1 and App2 both have workling/
starling (I'm pretty sure something similar would happen with delayed
job)... So... if a request comes in on App2 that updates a record with
a delta index, App2 will add a job to its queue to update the delta
indexes. Now, if App2 tries to process the job, it will fail -- only
App1, the one with searchd running can run the delta indexer.

James Healy

Apr 20, 2010, 9:39:17 AM
to thinkin...@googlegroups.com
agibralter wrote:
> Here's the problem with async/delayed deltas: so let's say I have 2
> app servers: App1 and App2. App1 has searchd running and App2 simply
> connects to App1 for searches. App1 and App2 both have workling/
> starling (I'm pretty sure something similar would happen with delayed
> job)... So... if a request comes in on App2 that updates a record with
> a delta index, App2 will add a job to its queue to update the delta
> indexes. Now, if App2 tries to process the job, it will fail -- only
> App1, the one with searchd running can run the delta indexer.

You have async workers on both app servers? I only have a single
DelayedJob worker in my setup, so regardless of which app server adds
a delta index job to the queue, the job is always executed on the
sphinx server.

Does workling support multiple queues? Can you set up a separate work
queue just for the delta tasks that is only processed on your sphinx
server?
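
To make the idea concrete, here is a minimal sketch of queue routing in plain Ruby. Nothing in it is Workling's or DelayedJob's API: `QueueBroker`, `Worker`, and the queue names are hypothetical, and the only thing it shows is that a job pushed from any server is run solely by a worker subscribed to the dedicated queue.

```ruby
# Hypothetical broker holding several named queues in memory.
class QueueBroker
  def initialize
    @queues = Hash.new { |queues, name| queues[name] = [] }
  end

  def push(queue_name, job)
    @queues[queue_name] << job
  end

  def pop(queue_name)
    @queues[queue_name].shift
  end
end

# A worker only drains the queues it was told to listen on.
class Worker
  attr_reader :processed

  def initialize(broker, queue_names)
    @broker = broker
    @queue_names = queue_names
    @processed = []
  end

  def work_off
    @queue_names.each do |name|
      while (job = @broker.pop(name))
        @processed << job
      end
    end
  end
end

broker = QueueBroker.new
app1_worker = Worker.new(broker, ["default", "sphinx_delta"]) # runs beside searchd
app2_worker = Worker.new(broker, ["default"])                 # no Sphinx here

# A request handled on App2 enqueues the delta job on the dedicated queue.
broker.push("sphinx_delta", "index Article delta")

app2_worker.work_off # finds nothing it is allowed to run
app1_worker.work_off # picks up the delta job
```

Whether this is possible in practice depends on the queue system letting workers choose which queues they consume.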

-- James Healy <ji...@deefa.com> Tue, 20 Apr 2010 23:38:43 +1000

Jay Zeschin

Apr 20, 2010, 6:07:59 PM
to thinkin...@googlegroups.com
Or what about setting up separate queues for each app server and putting a job in both for every delta?
--
Jay Zeschin
j...@zeschin.org
720.273.9549

agibralter

Apr 21, 2010, 5:52:01 PM
to Thinking Sphinx
Hi Nick,

This looks awesome! I'm using Sphinx 0.9.9 -- so how exactly would I
use it? Set up a callback for my models that calls the method in your
pastie when their attributes of interest get updated? Or is there a
way to hook into ThinkingSphinx's DSL for define_index such that `has
attribute` could take an option like :update => true? Does anyone know
how Pat feels about incorporating this into TS proper?

Also, does that update have to happen on the app server with searchd
running?

Thanks a ton!

-Aaron

Nick Sellen

Apr 22, 2010, 6:10:05 AM
to Thinking Sphinx
My extension is only a very small addition to reveal the attribute
update functionality. It would need quite a bit more work to
incorporate it into sphinx proper (someone was talking about doing it
somewhere I read).

It suits my needs well enough for now but it's a fairly manual process
to run it though. I added it to get around the process of doing
updates to a lot of records and avoiding this kind of process:
. add a delayed job for each record to remove it from the main
index (well, set the "sphinx_deleted" attribute using client.update)
. sql to update each single record to delta = 1
. and then the job to run the delta index

I was using Model.suspended_delta { # do stuff with Model records }
but that doesn't set the "sphinx_deleted" attribute so there were
problems with duplicate entries in the main and delta indexes.

Currently I use Model.update_all calls (which doesn't instantiate the
model therefore no callbacks) and manually calculated
ThinkingSphinx.update_index calls. But it's not ideal still because I
still need to read quite a lot of individual rows from the db to
update the MVA attributes (as you can't make a call to add or remove a
value from the array you need to set the whole thing which requires
fetching it first).
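
The read-then-replace pattern for MVA attributes can be sketched as follows. This is an illustration only: `StubAttributeStore` is a made-up stand-in for searchd's attribute storage, and `add_mva_value` is a hypothetical helper, not part of Thinking Sphinx or Riddle.

```ruby
# Hypothetical stand-in for searchd's attribute storage: an MVA can only be
# replaced wholesale, mirroring the limitation described above.
class StubAttributeStore
  def initialize
    @values = {}
  end

  def replace_mva(document_id, values)
    @values[document_id] = values.dup
  end

  def mva(document_id)
    @values.fetch(document_id, [])
  end
end

# Adding one value means: fetch the current list (from the database in a
# real app), append, then send the complete list back.
def add_mva_value(store, current_values, document_id, new_value)
  updated = (current_values + [new_value]).uniq
  store.replace_mva(document_id, updated)
  updated
end

store = StubAttributeStore.new
tag_ids_from_db = [3, 7] # pretend this came from a SELECT
add_mva_value(store, tag_ids_from_db, 42, 9)
```

The extra database read per record is the cost being described; it is what makes this approach less attractive for mass updates.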

So I think I'm going to go back to running the delta sql for mass
updates going forward and using the attribute update methods for
smaller updates, e.g.:
Model.update_all("some things I want to update", update_conditions)
Model.update_all('delta = 1', update_conditions)
Model.update_delta_index_and_wait # another addition I've made to
                                  # perform the delta update synchronously

agibralter

Apr 22, 2010, 11:22:59 AM
to Thinking Sphinx
Yeah I'm thinking I need to switch to Resque... :)

agibralter

Apr 22, 2010, 12:16:41 PM
to Thinking Sphinx
Hmm I'm not sure I follow... As far as integration with the DSL I was
thinking of something like: http://gist.github.com/375411

But in general, delta indexes aside, if I wanted to do live updates on
attributes I could create some ActiveRecord before_save callbacks that
check for changed attributes and update the Sphinx indexes like so:
http://gist.github.com/375427 ?
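
Roughly, the "check for changed attributes" step could look like the sketch below. It is plain Ruby with no Rails loaded: the `changes` hash imitates what ActiveRecord's dirty tracking returns, while `SPHINX_ATTRIBUTES` and `sphinx_attribute_updates` are names invented for the example. Sending the result to searchd is left out.

```ruby
# Hypothetical list of columns that are also Sphinx attributes.
SPHINX_ATTRIBUTES = [:status_id, :price]

# `changes` mimics the hash ActiveRecord's dirty tracking returns:
# { "column" => [old_value, new_value] }.
def sphinx_attribute_updates(changes)
  changes.each_with_object({}) do |(name, (_old_value, new_value)), updates|
    key = name.to_sym
    updates[key] = new_value if SPHINX_ATTRIBUTES.include?(key)
  end
end

changes = {
  "status_id" => [1, 2],
  "title"     => ["Old", "New"], # a field, not an attribute: ignored
  "price"     => [10, 12]
}
updates = sphinx_attribute_updates(changes)
```

A callback would call something like this and skip the Sphinx update entirely when the resulting hash is empty; changes to fields would still need a delta reindex.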

Now, my question is, does ThinkingSphinx.update_index need to be
called on a server running searchd? Or can it be called remotely? It
seems like it just uses the Riddle client and can be called from
anywhere... no need to have access to the actual index core files.

Thanks again for the help!

Best,
Aaron

Pat Allan

Apr 22, 2010, 11:07:24 PM
to thinkin...@googlegroups.com
Hi guys

Seems there's a few topics being covered here, so it's taking a bit to grok it all, but with regards to MVA attribute updating: if it's something that Sphinx supports, then I'd like TS to support it natively as well. So Nick, if you want to wrap your changes into a patch with specs, that'd be fantastic.

Obviously, you'll want to check what version of Sphinx is being used, but that's do-able via configuration.version.

Aaron, what you're wanting to do is a little bit more complex, but I like your approaches... maybe the more general hook is better, allowing for complex situations. A patch for that would be fantastic, too. You can invoke updates for Sphinx from any machine - all it actually does is update the attributes within searchd's memory, not the index files - and it's all over the socket anyway, so even if it did change the file, it can be remote.

Hope this helps clear a couple of things up - would love to see what you guys come up with. And if there's other questions I've not answered, let me know :)

--
Pat

agibralter

Apr 26, 2010, 12:58:13 PM
to Thinking Sphinx
Hi Pat, thank you for the response. I'll try to take a stab at a
patch. Just to be clear, when you say updates just take place in
searchd's memory, does that mean if searchd restarts, the attributes
will be stale until the next time the indexer is run?

Also, on a separate note: do you have any thoughts on whether
delayed-delta or datetime-delta would be more efficient in a system that
updates very often? I.e., I'd like to have index updates reflected in a
minute if possible. Are there any tradeoffs between

"set_property :delta => true" with delayed-delta

and

"set_property :delta => :datetime, :threshold => 2.minutes" with
datetime-delta?

Thanks again!

-ajg-

agibralter

Apr 26, 2010, 2:01:03 PM
to Thinking Sphinx
Also, I came across update_attribute_values in lib/thinking_sphinx/
active_record/attribute_updates.rb... how exactly is that used?

Pat Allan

Apr 28, 2010, 5:00:57 AM
to thinkin...@googlegroups.com
On 27/04/2010, at 2:58 AM, agibralter wrote:

> Hi Pat, thank you for the response. I'll try to take a stab at a
> patch. Just to be clear, when you say updates just take place in
> searchd's memory, does that mean if searchd restarts, the attributes
> will be stale until the next time the indexer is run?

That's correct. It's not a perfect implementation...

> Also, on a separate note: do you have any thoughts on whether
> delayed-delta or datetime-delta would be more efficient in a system that
> updates very often? I.e., I'd like to have index updates reflected in a
> minute if possible. Are there any tradeoffs between
>
> "set_property :delta => true" with delayed-delta
>
> and
>
> "set_property :delta => :datetime, :threshold => 2.minutes" with
> datetime-delta?

Delayed Delta won't repeat a delta index job if there's already one there for that index, so it could be a bit more reliable for getting changes in there quickly without overloading the system. Also, maybe using Model.suspended_delta is worthwhile if you're doing bulk updates?

Beyond that, either should do the job, really.
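
The "won't repeat a job that's already queued" behaviour can be pictured with a small sketch. This is not the delayed delta gem's code: `DedupingDeltaQueue` and the index names are hypothetical, and it only shows why a burst of saves ends up as a single pending reindex per index.

```ruby
# Hypothetical queue that refuses to hold two pending jobs for one index.
class DedupingDeltaQueue
  def initialize
    @pending = []
  end

  # Returns true if the job was added, false if one was already waiting.
  def enqueue(index_name)
    return false if @pending.include?(index_name)
    @pending << index_name
    true
  end

  # Take the next index to rebuild; it may be enqueued again afterwards.
  def next_job
    @pending.shift
  end

  def size
    @pending.size
  end
end

queue = DedupingDeltaQueue.new
results = 5.times.map { queue.enqueue("article_delta") } # a burst of saves
queue.enqueue("comment_delta")
```

With a cron-driven datetime delta the indexer runs on a fixed schedule whether or not anything changed; with this kind of queue it runs only when something is pending, and at most once per index at a time.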

Cheers

--
Pat

Pat Allan

Apr 28, 2010, 5:04:06 AM
to thinkin...@googlegroups.com
That's the code that manages the attribute updates in searchd's memory - it's all handled by the update API call to Sphinx (lines 26-28 of that file).

Cheers

--
Pat

Nick Sellen

Apr 28, 2010, 6:51:01 AM
to Thinking Sphinx
> updates just take place in searchd's memory, does that mean if searchd restarts, the attributes will be stale until the next time the indexer is run?

that's true for MVA attributes, but normal single-value attributes will
be persisted to disk on a clean shutdown of searchd:

"They are very fast because they're working fully in RAM, but they can
also be made persistent: updates are saved on disk on clean searchd
shutdown initiated by SIGTERM signal." (from
http://www.sphinxsearch.com/docs/current.html#api-func-updateatttributes)

Pat Allan

Apr 28, 2010, 7:03:50 AM
to thinkin...@googlegroups.com
Ah, well how about that! Thanks for sharing Nick, I didn't know that.

--
Pat