How to override full index with new data from delta index when merging?

Canvas

Nov 24, 2009, 4:00:33 PM
to Thinking Sphinx
Hi there guys,

I am currently using Sphinx 0.9.8.1. The following is from the Sphinx
0.9.8.1 documentation:

" The basic command syntax is as follows:

indexer --merge DSTINDEX SRCINDEX [--rotate]

Only the DSTINDEX index will be affected: the contents of SRCINDEX
will be merged into it. --rotate switch will be required if DSTINDEX
is already being served by searchd. The initially devised usage
pattern is to merge a smaller update from SRCINDEX into DSTINDEX.
Thus, when merging the attributes, values from SRCINDEX will win if
duplicate document IDs are encountered. Note, however, that the "old"
keywords will not be automatically removed in such cases. For example,
if there's a keyword "old" associated with document 123 in DSTINDEX,
and a keyword "new" associated with it in SRCINDEX, document 123 will
be found by both keywords after the merge. You can supply an explicit
condition to remove documents from DSTINDEX to mitigate that; the
relevant switch is --merge-dst-range:

indexer --merge main delta --merge-dst-range deleted 0 0

This switch lets you apply filters to the destination index along with
merging. There can be several filters; all of their conditions must be
met in order to include the document in the resulting merged index. In
the example above, the filter passes only those records where
'deleted' is 0, eliminating all records that were flagged as deleted
(for instance, using UpdateAttributes() call). "

It seems that I need to call "UpdateAttributes()" to update the full
index before merging. My question is: how do I call "UpdateAttributes()"
on the full index to mark the records that are in the delta index as
deleted?
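
For readers following along, here is a minimal Ruby sketch of the merge behaviour as the quoted manual text describes it. It is a toy model, not Sphinx code: the toy_merge method and the Hash layout are made up for illustration, and real indexes are on-disk structures, not Hashes.

```ruby
# Toy model of "indexer --merge" as described in the quoted manual text.
# NOT Sphinx code. An "index" here is a Hash of
# document ID => { :keywords => [...], :attrs => { ... } }.

# Merge src into dst. dst_range is an optional [attr, min, max] filter that
# mimics "--merge-dst-range attr min max": only dst documents whose
# attribute value lies within min..max are carried into the result.
def toy_merge(dst, src, dst_range = nil)
  merged = {}
  dst.each do |id, doc|
    if dst_range
      attr, min, max = dst_range
      value = doc[:attrs].fetch(attr, 0)
      next unless value >= min && value <= max
    end
    merged[id] = { :keywords => doc[:keywords].dup, :attrs => doc[:attrs].dup }
  end

  src.each do |id, doc|
    if merged.key?(id)
      # Duplicate ID: the old keywords linger, attribute values from src win.
      merged[id][:keywords] |= doc[:keywords]
      merged[id][:attrs] = merged[id][:attrs].merge(doc[:attrs])
    else
      merged[id] = { :keywords => doc[:keywords].dup, :attrs => doc[:attrs].dup }
    end
  end
  merged
end

main  = { 123 => { :keywords => ["old"], :attrs => { "deleted" => 0 } } }
delta = { 123 => { :keywords => ["new"], :attrs => { "deleted" => 0 } } }

# Plain merge: document 123 ends up findable by both "old" and "new".
plain = toy_merge(main, delta)

# Flag document 123 as deleted in main first, then merge with the filter:
# the stale copy is dropped and only the delta copy survives.
flagged  = { 123 => { :keywords => ["old"], :attrs => { "deleted" => 1 } } }
filtered = toy_merge(flagged, delta, ["deleted", 0, 0])
```

In this model the filter only helps if the "deleted" flag is actually visible to the merge, which is exactly what the question is about.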

By the way, I am using a view that contains an "update_at" column,
which serves as the timestamp for picking up data in the delta index.


Any suggestion is appreciated. Thanks.

Best wishes,

Canvas

Pat Allan

Nov 25, 2009, 3:26:45 AM
to thinkin...@googlegroups.com
Hi Canvas

Are you using Thinking Sphinx? It adds an internal attribute called
sphinx_deleted, and sets records' values to 1 when they are deleted in
Ruby code. Then, if you're using the datetime deltas, the merge
automatically uses the --merge-dst-range option to remove deleted
items from the index.

However, if you're using Riddle, then the equivalent call is
client.update, not UpdateAttributes. I recommend looking at the source
code to get a good understanding of how it all works. Riddle
documentation is pretty thin on the ground, but I'd like to improve it
over time.

Hope this helps.

--
Pat

Canvas

Nov 26, 2009, 8:48:51 PM
to Thinking Sphinx
Hi Pat,

Thanks for your timely reply. client.update() does work, but I still
have a weird problem. These are the steps I use to reproduce it.

Step 1: build full index
# rake thinking_sphinx:index RAILS_ENV=development

Step 2: make some changes to file 4013

Step 3: build delta index ==> the file 4013 is searchable with both
old and new data
# rake thinking_sphinx:index:delta RAILS_ENV=development

Step 4: flag the file in core index as "sphinx_deleted" ==> the file
is searchable only by new data, so far so good.
>> client.update('buy_sell_file_core', ['sphinx_deleted'], { 4013 => [1] })

Step 5: merge delta to core
# /usr/local/bin/indexer \
    --config '/workspace/CA/BETA_2/EconveyancePro/config/development.sphinx.conf' \
    --rotate --merge buy_sell_file_core buy_sell_file_delta \
    --merge-dst-range sphinx_deleted 0 0

And now the problem occurs: I can search by both new and old data
again. It doesn't make any sense to me. Why can I search by the old
data again? Wasn't the old entry supposed to be removed by the flag set
in Step 4? Did I do anything wrong in Step 5?
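
A small Ruby sketch of how the Step 5 command line could be assembled programmatically, to make the order of switches explicit. It uses only the indexer switches that appear in this thread (--config, --rotate, --merge, --merge-dst-range); the merge_command method and its keyword arguments are hypothetical names, and the relative config path is a placeholder.

```ruby
require "shellwords"

# Build the argument list for "indexer --merge DSTINDEX SRCINDEX".
# merge_command and its keyword arguments are hypothetical; only the
# switches themselves come from the thread.
def merge_command(dst, src, config: nil, rotate: false, dst_ranges: [], indexer: "indexer")
  argv = [indexer]
  argv += ["--config", config] if config
  argv << "--rotate" if rotate
  argv += ["--merge", dst, src]
  # Each filter is [attribute, min, max]; several filters may be given.
  dst_ranges.each do |attr, min, max|
    argv += ["--merge-dst-range", attr, min.to_s, max.to_s]
  end
  argv
end

argv = merge_command(
  "buy_sell_file_core", "buy_sell_file_delta",
  config: "config/development.sphinx.conf", # placeholder path
  rotate: true,
  dst_ranges: [["sphinx_deleted", 0, 0]]
)

# Print the command instead of running it, so the sketch has no side effects.
# Passing the array to system(*argv) would execute it without shell quoting issues.
puts Shellwords.join(argv)
```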

Thank you very much for your help.


Best wishes,

Canvas

Pat Allan

Nov 26, 2009, 9:25:04 PM
to thinkin...@googlegroups.com
Hmm, I wonder if Sphinx doesn't delete documents that appear in both
indexes...

Scenarios to try:
- delete a record, add a different record, merge and see if the
deleted record is kept around or not
- flag the core copy as deleted, merge with the empty delta, re-index
delta, merge again, see if only the updates are kept.

--
Pat

Canvas

Nov 27, 2009, 8:37:11 PM
to Thinking Sphinx
Hi Pat,

It turns out that "client.update('buy_sell_file_core',
['sphinx_deleted'], { 4013 => [1] })" updates the index in RAM only; the
original core index on disk stays unchanged. I am now trying Sphinx
0.9.9-rc2, which includes a new feature, sql_query_killlist, to deal
with this issue. I'll let you know once I have tried it out.

http://www.sphinxsearch.com/forum/view.html?id=2552
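
To illustrate the idea behind a kill list, here is a toy Ruby model of search-time suppression. It is a sketch under stated assumptions, not Sphinx code: toy_search and the Hash layout are invented for illustration, and the only behaviour modelled is that a later index's kill list hides matching document IDs found in the indexes searched before it.

```ruby
# Toy model of kill-list suppression at search time. NOT Sphinx code.
# Each index is { :docs => { id => [keywords] }, :kill_list => [ids] }.
# Indexes are searched in the given order; before a later index adds its own
# hits, its kill list removes those IDs from the hits collected so far.
def toy_search(indexes, keyword)
  hits = []
  indexes.each do |index|
    hits -= index.fetch(:kill_list, [])
    index[:docs].each do |id, keywords|
      hits << id if keywords.include?(keyword)
    end
  end
  hits.uniq.sort
end

core  = { :docs => { 4013 => ["old"], 4014 => ["other"] } }
delta = { :docs => { 4013 => ["new"] }, :kill_list => [4013] }

by_old = toy_search([core, delta], "old") # stale core copy of 4013 is hidden
by_new = toy_search([core, delta], "new") # delta copy of 4013 is found

# Without a kill list the stale copy would still match "old".
delta_without_kill_list = { :docs => { 4013 => ["new"] } }
by_old_unfiltered = toy_search([core, delta_without_kill_list], "old")
```

The point of the model is that nothing in the core index has to be rewritten: the suppression comes from data stored with the delta.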

Thanks a lot.

Canvas

Pat Allan

Nov 27, 2009, 9:04:47 PM
to thinkin...@googlegroups.com
Ah, I never knew that it only changed the values in memory... that's
good to know! And I really should get the kill list stuff into
Thinking Sphinx proper.

Thanks for that info, muchly appreciated.

Cheers

--
Pat

Canvas

Nov 30, 2009, 3:36:08 PM
to Thinking Sphinx
Hi Pat,

I just tried it. It works!

Please refer to the following url for further information.

http://www.sphinxsearch.com/docs/manual-0.9.9.html#conf-sql-query-killlist

Pat Allan

Dec 10, 2009, 2:11:45 AM
to thinkin...@googlegroups.com
Quick question: how did you write a select that returns deleted values? :)

--
Pat

adamcooper

Dec 15, 2009, 2:48:02 PM
to Thinking Sphinx
Hi Pat,

It could be integrated with a deleted_at column, and then each delta
mechanism could track which deleted records haven't been merged yet, to
keep the kill list small and up to date. Apart from a deleted_at
column, I have no ideas on how to detect deleted ids.

We may have to disable the delta index merging until we can sort out
the deletion merge issue, although we never had index merging before,
so it's not as if we are losing any existing functionality.

Cheers,
Adam

Canvas

Dec 16, 2009, 5:31:39 PM
to Thinking Sphinx
Hi Pat,

I customized thinking-sphinx. Threshold is not in use at all in my
case.

I created a table, sphinx_delta_index_start_points, which holds one
and only one row. The migration is as follows:

class CreateTableSphinxDeltaIndexStartPoints < ActiveRecord::Migration
  def self.up
    create_table :sphinx_delta_index_start_points do |t|
      t.datetime :delta_index_start_at, :null => false
    end
  end

  def self.down
    drop_table :sphinx_delta_index_start_points
  end
end

Every time a full index or a merge is run, the start time in the
table's single row is updated. The three key settings in the
configuration file for the delta index are as follows.
#{@model.quoted_table_name} stands for whatever table your model maps
to in your application.

sql_query = SELECT ... FROM "#{@model.quoted_table_name}"
  WHERE #{@model.quoted_table_name}.id >= $start
  AND #{@model.quoted_table_name}.id <= $end
  AND #{@model.quoted_table_name}.`updated_at` >=
    ( SELECT MIN(delta_index_start_at) FROM sphinx_delta_index_start_points )
  GROUP BY #{@model.quoted_table_name}.id ORDER BY NULL

sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1)
  FROM #{@model.quoted_table_name}
  WHERE updated_at >=
    ( SELECT MIN(delta_index_start_at) FROM sphinx_delta_index_start_points )

sql_query_killlist = "SELECT id FROM #{@model.quoted_table_name}
  WHERE updated_at >=
    (SELECT MIN(delta_index_start_at) FROM sphinx_delta_index_start_points)"
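
As a rough illustration of what the kill-list query above selects, here is the same condition expressed over in-memory rows in Ruby. Row, kill_list_ids and the sample timestamps are hypothetical stand-ins for the model's table and the sphinx_delta_index_start_points table; no database is involved.

```ruby
# Hypothetical in-memory stand-in for a row of the model's table.
Row = Struct.new(:id, :updated_at)

# Mirrors the shape of the kill-list query: return the IDs of rows whose
# updated_at is at or after the earliest delta_index_start_at value.
def kill_list_ids(rows, start_points)
  cutoff = start_points.min
  return [] if cutoff.nil? # no start point recorded yet
  rows.select { |row| row.updated_at >= cutoff }.map(&:id)
end

# Sample data, invented for the example.
start_points = [Time.utc(2009, 12, 16, 12, 0, 0)]
rows = [
  Row.new(4012, Time.utc(2009, 12, 15, 9, 0, 0)),   # unchanged since last full index
  Row.new(4013, Time.utc(2009, 12, 16, 14, 30, 0)), # changed after the start point
]

ids = kill_list_ids(rows, start_points)
```

Here only 4013 lands in the kill list, so its stale copy in the core index is the one that gets hidden. Note that a row deleted outright no longer exists in the table, so a query of this shape cannot return its ID.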

Hope this helps.

Best wishes,

Canvas

Canvas

Dec 16, 2009, 8:29:24 PM
to Thinking Sphinx
Hi Pat,

I just found out that sql_query_killlist works perfectly with Sphinx
0.9.9 + thinking-sphinx 0.9.9, but does not work at all with Sphinx
0.9.9 + thinking-sphinx 1.3.12. This shows that sql_query_killlist
works as far as Sphinx 0.9.9 is concerned, which means something might
be wrong with thinking-sphinx 1.3.12. I haven't figured it out yet.
Any help will be appreciated.

I also noticed that, when doing a full index, thinking-sphinx 1.3.12 is
much faster than thinking-sphinx 0.9.9, but searching is much slower.
Is it possible to improve the search performance of thinking-sphinx
1.3.12?

Thank you very much.

Best wishes,

Canvas

Canvas

Dec 17, 2009, 2:48:03 PM
to Thinking Sphinx
Hi Pat,

I still cannot figure out why sql_query_killlist does not work with
thinking-sphinx 1.3.12. I am now using Rails 2.3.4, Sphinx 0.9.9 final
release, and thinking-sphinx 1.3.12.

Interestingly, sql_query_killlist works quite well with Rails 2.0.2,
Sphinx 0.9.9 final release, and thinking-sphinx 0.9.9 (Ed's fork).

I am stuck here now. Any advice will be appreciated.

Best wishes,

Canvas

Canvas

Dec 17, 2009, 3:56:53 PM
to Thinking Sphinx
Hi Pat,

I just tried Rails 2.3.4 + thinking-sphinx 0.9.9 (Ed's fork) + Sphinx
0.9.9 final release. Everything works fine: indexing is fast,
searching is fast, and sql_query_killlist works. It's really weird
that thinking-sphinx 1.3.12 is slow in searching and that
sql_query_killlist does not work in my case.

Best wishes,

Canvas

Pat Allan

Dec 25, 2009, 11:00:04 PM
to thinkin...@googlegroups.com
Hi Canvas

Sorry for the delay in getting back to you...

When you're using 1.3.12 (and I recommend switching to 1.3.14 anyway), does the kill list appear in the config file? Is it that the setting is there and not having any effect, or that it's not getting into the conf file?

As for the speed changes, that's interesting to know... there's been a *lot* of changes since Ed's fork appeared, so I guess that means there's plenty of potential reasons why that's now the case. I'll try to investigate when I have some time.

--
Pat

Canvas

Jan 4, 2010, 1:29:38 PM
to Thinking Sphinx
Hi Pat,

I just came back from a long vacation. Sorry for the long delay.

sql_query_killlist does appear in development.sphinx.conf after I run
"rake thinking_sphinx:configure". So the problem I have now is that
the setting is there but has no effect.

Best wishes,

Canvas

Pat Allan

Jan 5, 2010, 3:47:28 AM
to thinkin...@googlegroups.com
This is confusing - if the setting's in the configuration file, then it's up to Sphinx to use it - it's not managed by Thinking Sphinx at all. I'm not quite sure what the next step here is...

If you want to construct a demo rails app which can reproduce the issue, that'd be great, then I can have a look on my machine.

--
Pat

Canvas

Jan 5, 2010, 1:44:22 PM
to Thinking Sphinx
Hi Pat,

I am confused too. The current behaviour doesn't make much sense to me.
I will put together a sample app over the weekend so that you can
reproduce the issue.


Best wishes,

Canvas

Canvas

Jan 8, 2010, 7:44:33 PM
to Thinking Sphinx
Hi Pat,

I extracted the plugin into a simple Rails app, used commands in
script/console to add and modify data through a model, built the
index (full or delta), and then tested Sphinx search in
script/console. It turned out that sql_query_killlist works fine with
thinking_sphinx 1.3.12, Rails 2.3.5, and Sphinx 0.9.9, which makes
sense.

But for some reason my real app still has the problem, so something
must be wrong with my app. I'll let you know when I figure out the
real problem. Sorry for the incorrect information I provided before.


Best wishes,

Canvas
