Deltas do not appear in search


Elad Meidar

May 21, 2009, 10:52:46 AM
to Thinking Sphinx
i have sphinx 0.9.8.1 installed and for some reason, it refuses to
take deltas into consideration when searching, until i re-index

here is an example from the console:

#<StatusUpdate id: 31109, content: "Microsoft just patented extortion",
  author: "xxxx", source_id: "1", direct_link: "hxxxx", uid: "234234233",
  created_at: "2009-05-21 14:14:55", updated_at: "2009-05-21 14:14:55",
  handler: "t", posted_at: "1242915290", delta: true>

but searching for "microsoft":

StatusUpdate.search("@content microsoft", :per_page => 1000, :match_mode => :extended)

does not return this result.

here is my index:

define_index do
  indexes content, :sortable => true
  has posted_at, handler, author

  has users(:id), :as => :user_ids
  set_property :delta => true # for records that are added between index runs
end



another thing is that when i run 'ts:in:delta', nothing seems to
happen: the delta flag stays set as it should on new records, but they
still don't appear in search (actually the rake task does not even
return any output).



aitrus

May 21, 2009, 12:34:06 PM
to Thinking Sphinx
Did you stop, index, and start the server?

Also, since I began using TS, I've always set the infix stuff. Have
you tried searching using the exact string in the content attribute
(Microsoft just patented extortion)?

Furthermore, that Patent would not hold up in court as IBM was in the
extortion business long before Bill Gates was even germinated.

Scott

Pat Allan

May 21, 2009, 2:44:09 PM
to thinkin...@googlegroups.com
Hi Elad

The ts:in:delta task is only for datetime deltas, and you're using the
default approach instead, so you don't need to run that.

When you create a new status update in console, do you also see
several lines from the Sphinx indexer task being output?

--
Pat

Elad Meidar

May 21, 2009, 3:08:24 PM
to Thinking Sphinx
Hi guys,

I tried to wrestle with it a little bit, tried to create a new
StatusUpdate record to see what the log says... but nothing there, no
sphinx message of any kind.
I decided to change my model to use set_property :delta => :datetime
instead of :delta => true and it appears to be working at the
moment... but i would still like to find out why the :delta => true
fails.

i attach my production.conf so you might be able to point out where
i'm wrong (generated with ts:conf only).
http://pastie.org/485658

Thanx guys..

Scott.. yeah, it's not extortion of course... at least not as bad
as it used to be :)

Pat Allan

May 21, 2009, 3:18:19 PM
to thinkin...@googlegroups.com
Ah, if you're not seeing any output, then it could be that the Sphinx
binaries - indexer and searchd - aren't referenced in the PATH set
within your web server (I'm guessing passenger).

You'll need to add the following to your config/sphinx.yml (or create
it):

production:
  bin_path: /usr/local/bin

The bin_path should be set to the output of `which indexer` from the
shell - minus /indexer, of course.

You're not seeing the problem with the datetime approach because
you're invoking the rake task yourself, not via the web server.

... of course, I may be completely wrong about the problem :)

--
Pat
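
A minimal sketch of the `which indexer` lookup described above, in plain Ruby, so it can be run from whichever environment is under suspicion (script/console, or inside the app under passenger) to compare what each one sees. The helper name bin_path_for is made up for illustration and is not part of Thinking Sphinx; it only walks the PATH of the current process.

```ruby
# Hypothetical helper: return the directory that holds +executable+,
# i.e. the output of `which indexer` minus the trailing /indexer.
# Returns nil when the executable is not on the given search path.
def bin_path_for(executable, search_path = ENV.fetch("PATH", ""))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, executable)
    return dir if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

dir = bin_path_for("indexer")
puts(dir ? "bin_path: #{dir}" : "indexer not found on this PATH")
```

If this prints a directory in script/console but reports "not found" when run inside the web server, the PATH explanation is the likely one.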

Elad Meidar

May 21, 2009, 3:43:49 PM
to Thinking Sphinx
you nailed it.

on development it's working...

but i don't see the bin_path setting in the generated development
conf... where should it appear?
this is my sphinx.yml:


development: &my_settings
  min_prefix_len: 0
  min_infix_len: 1
  min_word_len: 1
  max_results: 70000
  bin_path: /usr/local/bin
  morphology: none
  charset_table: ** huge charset.. i removed it for readability **
test:
  <<: *my_settings
production:
  <<: *my_settings

Pat Allan

May 21, 2009, 3:46:21 PM
to thinkin...@googlegroups.com
You probably don't need bin_path set on your development environment -
I've never had problems, because it's all run by my own user, not
passenger.

And it's not something that appears in the configuration file, it's
just how Thinking Sphinx makes calls to the binaries.

Cheers

--
Pat
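
For illustration only, here is roughly how a bin_path setting could end up in an indexer call without ever appearing in the generated conf file. This is a hypothetical helper, not Thinking Sphinx's actual code; the command shape simply mirrors the indexer invocations quoted elsewhere in this thread.

```ruby
# Hypothetical sketch, NOT Thinking Sphinx's real implementation.
# Builds the shell command for indexer; bin_path only affects which
# binary gets called, it is never written into the Sphinx config.
def indexer_command(bin_path, config_file, *args)
  binary = bin_path.to_s.empty? ? "indexer" : File.join(bin_path, "indexer")
  [binary, "--config", config_file, *args].join(" ")
end

# With bin_path set, the absolute binary path is used:
puts indexer_command("/usr/local/bin", "config/production.sphinx.conf", "status_update_delta")
# Without it, the call relies on the PATH of whoever runs the process:
puts indexer_command(nil, "config/production.sphinx.conf", "--all", "--rotate")
```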

Elad Meidar

May 21, 2009, 4:25:03 PM
to Thinking Sphinx
it makes no difference....

i still can't see any sphinx messages when new items are created, and
delta => true status_updates aren't in the results either...

is there anything else i need to do in order to enable access to the
indexer bin? maybe permissions for the www user? that seems to be the
problem.

Pat Allan

May 21, 2009, 7:21:40 PM
to thinkin...@googlegroups.com
Hm, wait, just re-thinking - if you can't see the output from
script/console, then it's not a path issue - because script/console is
run by yourself, not passenger.

What's the output of `indexer` (typed with the backticks, so Ruby
shells out to it) in script/console?

--
Pat

Elad Meidar

May 21, 2009, 10:49:32 PM
to Thinking Sphinx
this http://pastie.org/486059

which seems like the right output for the command...

i tried to stop/start the server....nothing helps.

Elad Meidar

May 21, 2009, 11:04:46 PM
to Thinking Sphinx
ok,
i think i have a misconception somewhere and would VERY much appreciate
it if someone could clarify it.

i killed the 'searchd' process manually and restarted it with
ts:start.

i manually invoked 'indexer -c path/to/conf --rotate'

and now it appears that deltas ARE showing in results, but only a
couple of moments after the actual record is created.

My questions are:

1. is this delay normal?
2. i don't want to manually kill searchd every time... so i'm pretty
sure the problem is somewhere in my deployment process... can someone
point me to the most up-to-date tutorial (until the homepage is
finished, that is)?

Elad Meidar

May 21, 2009, 11:31:23 PM
to Thinking Sphinx
Sorry bout the typos... i am dead tired and was on this issue all
day. :)

Elad Meidar

May 22, 2009, 11:05:07 PM
to Thinking Sphinx
this is from searchd.log:

[Sat May 23 03:01:43.300 2009] [12951] WARNING: rotating index 'status_update_delta': cur to old rename failed: rename /var/www/statussearch2/releases/20090523013634/db/sphinx/production/status_update_delta.spl to /var/www/statussearch2/releases/20090523013634/db/sphinx/production/status_update_delta.old.spl failed: No such file or directory
[Sat May 23 03:01:43.300 2009] [12951] rotating finished

there really isn't a file like that in that folder

Pat Allan

May 23, 2009, 12:35:53 AM
to thinkin...@googlegroups.com
What are the contents of db/sphinx/production?

This isn't something I've seen before... but just to confirm, have you
stopped Sphinx since enabling deltas, re-indexed, and then restarted
Sphinx?

If not, then it looks like it's getting a bit confused about file
names. I'd recommend stopping Sphinx, deleting db/sphinx/production
(after you confirm what's in that directory :), re-indexing, and then
restarting.

--
Pat

Elad Meidar

May 23, 2009, 12:46:32 AM
to Thinking Sphinx
That's what i am doing at the moment...
hopefully i'll see the end of it :)

Elad Meidar

May 23, 2009, 1:38:43 AM
to Thinking Sphinx
ok, so i deleted everything from db/sphinx

here's the output: http://pastie.org/487140

it looks like all the *_delta* files are owned by root, and not
by the user that i run ts:index with (web).
so that probably explains why they are inaccessible... any idea what
i can do?


Pat Allan

May 23, 2009, 1:43:25 AM
to thinkin...@googlegroups.com
You need the web server and the rake tasks to be run by the same user
- either both by root, or some other user of your choice. This should
avoid any permissions issues.

The *easiest* way is probably to run the rake tasks with sudo - not
convinced that's the *best* way though. Others may know better :)

--
Pat
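
One way to see at a glance whether the index files are split between users is to list the owner of each file in the index directory. This is only a sketch: the helper name is hypothetical, and the directory and the 'web' user are the values mentioned in this thread, so adjust both for another setup.

```ruby
require "etc"

# Hypothetical helper: map each regular file in +dir+ that is NOT owned
# by +expected_user+ to the name of its actual owner.
def foreign_owned_files(dir, expected_user)
  mismatches = {}
  Dir.glob(File.join(dir, "*")).sort.each do |path|
    next unless File.file?(path)
    uid   = File.stat(path).uid
    entry = Etc.getpwuid(uid)
    owner = entry ? entry.name : uid.to_s # fall back to the numeric uid
    mismatches[File.basename(path)] = owner unless owner == expected_user
  end
  mismatches
end

# Assumed values taken from this thread: index directory and the 'web' user.
foreign_owned_files("db/sphinx/production", "web").each do |file, owner|
  puts "#{file} is owned by #{owner}, expected web"
end
```

Any file it prints was written by a process running as a different user than the one expected to own the indexes.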

James Healy

May 23, 2009, 2:19:04 AM
to thinkin...@googlegroups.com
Pat Allan wrote:
> You need the web server and the rake tasks to be run by the same user
> - either both by root, or some other user of your choice. This should
> avoid any permissions issues.
>
> The *easiest* way is probably to run the rake tasks with sudo - not
> convinced that's the *best* way though. Others may know better :)

As a general rule you really don't want to run internet accessible
daemons as root.

I personally use the Debian convention of www-data user and group for my
webserver, mongrels and cron triggered rake tasks. It doesn't matter too
much which user you use, just pick or create one with reduced
privileges. You want to minimise the impact of a malicious user finding
an exploitable bug in the process.

-- James Healy <jimmy-at-deefa-dot-com> Sat, 23 May 2009 16:14:36 +1000

Elad Meidar

May 23, 2009, 10:20:47 AM
to Thinking Sphinx
well, the rake tasks are run by the deploying user, which is 'web'

but i think that there are some cron tasks (--rotate for example) that
are run by 'root'

i'll move everything to 'web' and i'll see where it's heading.


Thnx.

Elad Meidar

May 23, 2009, 4:34:08 PM
to Thinking Sphinx
Well, i moved everything to web
(ts:stop, ts:index, ts:start after clearing the whole db/sphinx folder)

but still all the delta files are created under the root ownership, i
really don't know why.. i am sure that only the web user is doing any
kind of thinking_sphinx related actions.
when i manually chown the files to be under the "web" user, deltas
appear on search and everything is awesome.

this is my crontab for the web user... any idea what or who is changing
those files' ownership?

*/2 * * * * cd /var/www/statussearch2/current/ && rake RAILS_ENV=production ts:index --rotate
* */5 * * * cd /var/www/statussearch2/current/ && rake RAILS_ENV=production ts:index

Pat Allan

May 23, 2009, 4:37:30 PM
to thinkin...@googlegroups.com
Are your mongrels running as root? Or passenger? This is the process
that will invoke delta indexing, and thus replace the existing files
with new ones owned by root.

--
Pat

Elad Meidar

May 23, 2009, 6:23:57 PM
to Thinking Sphinx
i'm running passenger on the default apache user www-data, i didn't
change anything from the default apache/passenger installations.

i tried a little test....

i chown'ed the *delta* files to web:web, just like the *core* files,
and checked that it really happened.
then, i ran "rake RAILS_ENV=production ts:index --rotate" and listed
the files again.

owner was again root.

Pat Allan

May 23, 2009, 6:34:28 PM
to thinkin...@googlegroups.com
How are you running the rake task? Via capistrano? Or ssh'd into your
production machine?

--
Pat

Elad Meidar

May 23, 2009, 8:59:05 PM
to Thinking Sphinx
right now, over SSH. i wanted to test the configuration and run the
process manually before deploying with it.

Pat Allan

May 23, 2009, 9:25:03 PM
to thinkin...@googlegroups.com
I guess what I was wondering is whether you were using the 'run'
command or the 'sudo' command in your capistrano tasks - I know I've
made the mistake of using the latter when 'run' would have been the
better choice.

--
Pat

Elad Meidar

May 24, 2009, 10:58:42 AM
to Thinking Sphinx
run... and i also set the :admin_runner option to 'web'...

So the user that runs 'searchd' in the first place is the same
user that should be running the 'indexer' calls too?

Pat Allan

May 24, 2009, 1:40:49 PM
to thinkin...@googlegroups.com
Yup, you want that web user to do everything. If the delta indexes are
still ending up as owned by root, I'm confused. The TS code doesn't
invoke any users, it just makes the call to indexer.

--
Pat

Elad Meidar

May 24, 2009, 6:31:53 PM
to Thinking Sphinx
Ok,

i re-did my entire deployment all over again, making sure that the
'web' user is responsible for all actions taken in the deployment
process, including thinking sphinx related tasks.

Now, deltas *DO* appear in search, but i can't re-index:

web@socialninjaz:/var/www/statussearch2/current$ rake RAILS_ENV=production ts:index
(in /var/www/statussearch2/releases/20090523013634)
Generating Configuration to /var/www/statussearch2/releases/20090523013634/config/production.sphinx.conf
/usr/local/bin/indexer --config /var/www/statussearch2/releases/20090523013634/config/production.sphinx.conf --all --rotate
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/var/www/statussearch2/releases/20090523013634/config/production.sphinx.conf'...
indexing index 'status_update_core'...
collected 62039 docs, 5.7 MB
collected 0 attr values
sorted 0.1 Mvalues, 100.0% done
sorted 22.3 Mhits, 97.9% done
total 62039 docs, 5703116 bytes
total 3146.338 sec, 1812.62 bytes/sec, 19.72 docs/sec
indexing index 'status_update_delta'...
FATAL: failed to open /var/www/statussearch2/releases/20090523013634/db/sphinx/production/status_update_delta.tmp.spl: Permission denied, will not index. Try --rotate option.

The file exists but under root ownership again.

web@socialninjaz:/var/www/statussearch2/current$ ls -l db/sphinx/production/
total 133328
-rw-r--r-- 1 web  web   2479160 May 24 20:25 status_update_core.spa
-rw-r--r-- 1 web  web  83895816 May 24 20:25 status_update_core.spd
-rw-r--r-- 1 web  web       367 May 24 20:25 status_update_core.sph
-rw-r--r-- 1 web  web   6302754 May 24 20:25 status_update_core.spi
-rw------- 1 web  web         0 May 24 20:26 status_update_core.spl
-rw-r--r-- 1 web  web   1266960 May 24 20:25 status_update_core.spm
-rw-r--r-- 1 web  web  40304468 May 24 20:25 status_update_core.spp
-rw-r--r-- 1 root root    30960 May 24 22:30 status_update_delta.spa
-rw-r--r-- 1 root root  1165980 May 24 22:30 status_update_delta.spd
-rw-r--r-- 1 root root      367 May 24 22:30 status_update_delta.sph
-rw-r--r-- 1 root root   364375 May 24 22:30 status_update_delta.spi
-rw------- 1 web  web         0 May 24 22:30 status_update_delta.spl
-rw-r--r-- 1 root root    15476 May 24 22:30 status_update_delta.spm
-rw-r--r-- 1 root root   514466 May 24 22:30 status_update_delta.spp
-rw-r--r-- 1 root root        0 May 24 22:30 status_update_delta.tmp.spl



i made sure that all capistrano activity and cron jobs are run by
the 'web' user... i really don't know what is going on...

Pat Allan

May 24, 2009, 6:59:38 PM
to thinkin...@googlegroups.com
Getting closer...

The question we need to solve: Why are the delta files owned by root?
Are you running the site via mongrels or passenger?

What's the output of `ps aux | grep mongrel`? (if mongrels are what
you're using, of course)

--
Pat

Elad Meidar

May 24, 2009, 7:40:09 PM
to Thinking Sphinx
Running on passenger, simple install, nothing fancy. i can post
whatever you want :)

i am confused too, probably cause' i don't know what exactly the
'searchd' and 'indexer' commands are doing with these files...

i did 'ps aux | grep apache' (thinking that would be the right command
to invoke in a passenger installation case).

root 3964 0.0 0.7 156428 8104 ? Ss May19 0:44 /usr/sbin/apache2 -k start
root 3985 0.0 0.3 152440 3712 ? Sl May19 0:01 /opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-2.2.2/ext/apache2/ApplicationPoolServerExecutable 0 /opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-2.2.2/bin/passenger-spawn-server /opt/ruby-enterprise-1.8.6-20090201/bin/ruby /tmp/passenger.3964
www-data 4057 0.0 0.5 156428 5572 ? S 17:20 0:00 /usr/sbin/apache2 -k start
www-data 6055 0.0 0.5 156428 5596 ? S 09:34 0:00 /usr/sbin/apache2 -k start
www-data 19524 0.0 0.5 156428 5580 ? S 12:44 0:00 /usr/sbin/apache2 -k start
www-data 22550 0.0 0.5 156432 5728 ? S May23 0:00 /usr/sbin/apache2 -k start
www-data 22557 0.0 0.5 156564 5736 ? S May23 0:00 /usr/sbin/apache2 -k start
www-data 22820 0.0 0.5 156428 5572 ? S 14:28 0:00 /usr/sbin/apache2 -k start
www-data 22946 0.0 0.5 156560 5700 ? S May23 0:00 /usr/sbin/apache2 -k start
www-data 24681 0.0 0.5 156564 5736 ? S May21 0:00 /usr/sbin/apache2 -k start
www-data 25864 0.0 0.5 156428 5588 ? S 14:56 0:00 /usr/sbin/apache2 -k start
web 28277 0.0 0.0 3948 648 pts/2 S+ 23:38 0:00 grep apache
www-data 29885 0.0 0.5 156428 5576 ? S 16:09 0:00 /usr/sbin/apache2 -k start

Pat Allan

May 24, 2009, 7:47:49 PM
to thinkin...@googlegroups.com
Looks like passenger is being run as root - I'm not sure why the first
apache is root and the rest not, but it's probably the passenger
process that talks to Rails.

Something I'm not sure was made clear in previous emails (so sorry if
I'm explaining what's already obvious): delta indexing is invoked by
the rails stack, not a rake task, so that means passenger in this
case, and so it's using passenger's permissions to run indexer. I
think that's the source of the problem.

--
Pat
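
The point about permissions can be seen in a small experiment: a process launched from Ruby runs as the same user as the Ruby process that launched it. The sketch below uses a child Ruby process as a stand-in for indexer (the helper name is made up), so it runs anywhere without Sphinx installed.

```ruby
require "rbconfig"

# Spawn a child Ruby process (a stand-in for indexer) and return the
# effective uid it runs under, as reported by the child itself.
def child_process_euid
  IO.popen([RbConfig.ruby, "-e", "print Process.euid"], &:read).to_i
end

puts "parent euid: #{Process.euid}, child euid: #{child_process_euid}"
```

Run from a shell as 'web', both numbers belong to 'web'. Run from inside an app whose Ruby process is owned by root, both would be 0, and any files the child wrote would be owned by root as well.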

Elad Meidar

May 24, 2009, 8:28:09 PM
to Thinking Sphinx
mmm..... might be.... i'll try to check in the passenger configuration
what is going on...

Elad Meidar

May 24, 2009, 8:55:06 PM
to Thinking Sphinx
yes, you are right.

i did 'ps aux | grep index' to see who is running 'indexer' calls

root@socialninjaz:~# ps aux | grep index
web 29873 0.0 6.5 150724 68792 pts/2 S+ 00:13 0:02 /opt/ruby-enterprise-1.8.6-20090201/bin/ruby /usr/bin/rake RAILS_ENV=production ts:index
root 31432 0.6 0.2 30016 2336 ? S 00:53 0:00 /usr/local/bin/indexer --config /var/www/statussearch2/releases/20090523013634/config/production.sphinx.conf status_update_delta
root 31434 0.0 0.0 3944 604 pts/0 R+ 00:53 0:00 grep index


so root is running the delta updates... any idea why? ts:start was
run as 'web', as you can see here....

Pat Allan

May 24, 2009, 8:57:56 PM
to thinkin...@googlegroups.com
ts:start only controls the search daemon (searchd) process. Delta
indexes are completely separate, and run by the rails app - which in
your case, is done within Passenger (running as root).

--
Pat

Elad Meidar

May 24, 2009, 9:04:49 PM
to Thinking Sphinx
i am not running that command i posted before (indexer with
status_update_delta parameter)... what do you mean 'invoked by the
stack' ? rake ? passenger?

Pat Allan

May 24, 2009, 9:08:14 PM
to thinkin...@googlegroups.com
Rails is running on Passenger, and it's Rails (via Thinking Sphinx)
that makes the call to indexer using the status_update_delta parameter.

If you can get Passenger running on the web user instead of root, that
should probably fix the problem.

Cheers

--
Pat

Elad Meidar

May 24, 2009, 10:21:51 PM
to Thinking Sphinx
ok, so i moved everything to run under root (server + ts:index)...

and now it seems that the deltas are working fine... i'll try to
re-index everything later on and see if that succeeds.

Pat Allan

May 24, 2009, 10:32:15 PM
to thinkin...@googlegroups.com
Well, I guess if it works, that's great.

Ideally, it's best to have it all running as web, but I guess
pragmatism sometimes comes before potential security issues - I know
my deployment approaches aren't perfect either.

--
Pat

Elad Meidar

May 24, 2009, 11:36:37 PM
to Thinking Sphinx
nope, it's not ok yet...

even when running as root.... the indexer fails..

Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/var/www/statussearch2/releases/20090523013634/config/production.sphinx.conf'...
indexing index 'status_update_delta'...
FATAL: failed to lock /var/www/statussearch2/releases/20090523013634/db/sphinx/production/status_update_delta.tmp.spl: Resource temporarily unavailable, will not index. Try --rotate option.

that's the automated indexer instance... i made sure that nothing else
is running..

Pat Allan

May 25, 2009, 2:38:00 AM
to thinkin...@googlegroups.com
I'm running out of ideas, I'm afraid - I've not seen .tmp in an index
name before (only .old and .new).

--
Pat

Elad Meidar

May 25, 2009, 1:43:56 PM
to Thinking Sphinx
well, right now... i am running a cron job myself with
'rake ts:index --rotate',
which loads the deltas and makes them available in search, but i am
positive it's not the right way, because when i try to do a full index i
get the error stated above, that the resource
(status_update_delta.tmp.spl) is not available.

I tried to read the sphinx documentation and see if there is anything
i can read about the file structure and who invokes what.. but
apparently it's not there...

Pat, i appreciate your help and i will wait for you to finish the
'deployment' and 'indexing' sections of the new documentation.

If you can please, pay close attention to the parts where you explain
about users and how you should setup your own environment.... i know
it's kind of rude to ask, but a sample app with thinking_sphinx,
deltas, cron jobs and capistrano instructions would be more than
awesome.

Pat Allan

May 25, 2009, 9:32:44 PM
to thinkin...@googlegroups.com
Heya Elad

Something that I didn't know, but may be useful in further debugging:
Passenger doesn't necessarily invoke Rails apps using root, even
though it's running as root.
http://www.modrails.com/documentation/Users%20guide%20Apache.html#PassengerDefaultUser

I'm out of my depth on this, but sounds useful. Will do my best to get
more documentation for TS done, but no promises as to when.

--
Pat