Delta indexing - seeing lost record *during* reindexing


Eric

May 8, 2013, 1:56:22 PM
to thinkin...@googlegroups.com
I currently have a small enough dataset that I can do a full reindex every 2 hours.  However, for one of my clients a full reindex takes 9 minutes, and they have been complaining about records that disappear and then reappear.

I was able to reproduce this, here's how:

1. Start up the Rails server and the delayed job worker for delta indexing
2. Create a new record AAA
3. After delta indexing completes (which is very fast), AAA shows up in the Sphinx index as expected
4. Kick off a full reindex
5. Watch the reindex output, and note when the model type of record AAA begins indexing.  Note that at this point, Sphinx does an update to set the delta flag to 0 for all records in that table.
6. As soon as you see it start indexing AAA's model type, create a new record BBB of the same model type as AAA
7. When the delayed-job-driven delta indexing of BBB completes, search for AAA in the Sphinx index, and note that it is not there but BBB is

Both AAA and BBB will show up when the full index job completes.  So if AAA's model type takes a long time to fully index, there is a period of time during which searches return incorrect results because AAA is missing.
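
The steps above can be played out in a toy model. This is only a sketch to show why AAA goes missing, not Thinking Sphinx code: the `ToyIndex` class and its method names are hypothetical, and it assumes that a search reads the live core index plus the delta index, and that the delta index is rebuilt from rows whose delta flag is still set.

```python
class ToyIndex:
    """Toy model of a core + delta index pair (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = {}             # record name -> delta flag
        self.core = set()          # live core index, replaced only on rotation
        self.delta = set()         # live delta index
        self._pending_core = None  # core index being built by a full reindex

    def reindex_delta(self):
        # Assumption: the delta index holds exactly the rows with delta flag set.
        self.delta = {name for name, flag in self.rows.items() if flag}

    def create(self, name):
        # A new record is flagged as delta, then the delta index is rebuilt.
        self.rows[name] = True
        self.reindex_delta()

    def begin_full_reindex(self):
        # Step 5: delta flags are reset to 0 when indexing of the table starts.
        for name in self.rows:
            self.rows[name] = False
        self._pending_core = set(self.rows)  # snapshot; not yet searchable

    def finish_full_reindex(self):
        # The new core index goes live and the delta index is rebuilt.
        self.core = self._pending_core
        self._pending_core = None
        self.reindex_delta()

    def visible(self):
        # What a search can see: live core index plus delta index.
        return sorted(self.core | self.delta)


def reproduce():
    sim = ToyIndex()
    seen = []
    sim.create("AAA")
    seen.append(sim.visible())   # after step 3
    sim.begin_full_reindex()
    sim.create("BBB")
    seen.append(sim.visible())   # after step 7, full reindex still running
    sim.finish_full_reindex()
    seen.append(sim.visible())   # after the full reindex completes
    return seen


if __name__ == "__main__":
    for step, names in zip(("step 3", "step 7", "done"), reproduce()):
        print(step, names)
```

In this model AAA drops out at step 7 because its delta flag was already cleared, so the delta rebuild triggered by BBB leaves it out, while the core index that will contain it has not gone live yet.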

I'm wondering if anybody has encountered this and if there's a way around it, or simply something we have to live with?

Thanks,
Eric

Pat Allan

May 9, 2013, 3:15:10 AM
to thinkin...@googlegroups.com
Hi Eric

What you've described makes sense logically, but yes, it's definitely not ideal. I've been thinking about the reindexing/rebuilding process a little lately (for Flying Sphinx, but the idea fits more broadly as well): indexing would happen in a separate location, the new files would then be brought across with .new. added before the file extension, and the live index set would be rotated.

This would avoid the problem you're facing, but it's just a matter of managing the entire process (more moving parts = more complexity to manage = greater potential for bugs and odd edge-cases).
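
As a rough sketch of the staging step in that idea (not how Thinking Sphinx or Flying Sphinx actually does it): the helper names `staged_name` and `stage_index_files`, the directory layout, and the example file name are all assumptions made for illustration. Telling searchd to rotate the live index set is deliberately left out.

```python
import shutil
from pathlib import Path


def staged_name(path):
    """Insert .new. before the file extension, e.g. article_core.spd -> article_core.new.spd."""
    path = Path(path)
    return f"{path.stem}.new{path.suffix}"


def stage_index_files(built_dir, live_dir):
    """Copy index files built elsewhere into the live directory under .new. names.

    The live files are left untouched, so searches keep using the old index
    set until a rotation swaps the staged files in.
    """
    built_dir, live_dir = Path(built_dir), Path(live_dir)
    staged = []
    for src in sorted(built_dir.iterdir()):
        if src.is_file():
            dest = live_dir / staged_name(src)
            shutil.copy2(src, dest)
            staged.append(dest)
    return staged
```

Because nothing in the live directory is modified until rotation, the old core and delta indexes stay consistent with each other for the whole duration of the rebuild.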

If you're particularly keen to see this change in place, can you log a ticket on GitHub, and then I've got something to keep it in my head :)

--
Pat



Eric

May 9, 2013, 7:15:55 PM
to thinkin...@googlegroups.com
Thanks a ton, Pat, for looking into this.


--> Eric

Pat Allan

May 9, 2013, 7:23:29 PM
to thinkin...@googlegroups.com

Eric Hansen

May 9, 2013, 7:24:17 PM
to thinkin...@googlegroups.com
Ack, sorry!

--> Eric