Identical tables, one returns search results the other does not

14 views
Skip to first unread message

stasch

unread,
Oct 8, 2013, 1:28:14 PM10/8/13
to thinkin...@googlegroups.com
I've got two tables in the same database that are identical in structure - columns, indexes, permissions (one is an archive of the other).  The model files are not identical but the associations and index declarations are.  The archive model contains a subset of the functionality of the original, but I have already gone so far as to reduce the model file for the archive to just the associations and index declarations.  When I rebuild the indexes after deleting all files in the index directory both models are  processed without complaint.  The index files are created with identical permissions.  But when I do a simple search (for the word 'worried') on each the original file returns many results, the archive file returns no results.  When I use the error method to check for errors on the result set it returns nil.  When I look in the query log it shows that the query was successfully executed and returned 0 results.  But when I do a sql query on the indexed field in the archive file it returns over 3000 results (select count(*) from contact_archive where message like '% worried %').

My environment is Ruby 1.8.7, Rails 2.3.15 and Thinking Sphinx 1.4.10.  Any help would be greatly appreciated.

Pat Allan

unread,
Oct 9, 2013, 11:39:56 PM10/9/13
to thinkin...@googlegroups.com
Can you show us the index definitions for both models, and the queries you're running?

On 09/10/2013, at 4:28 AM, stasch <stas...@gmail.com> wrote:

I've got two tables in the same database that are identical in structure - columns, indexes, permissions (one is an archive of the other).  The model files are not identical but the associations and index declarations are.  The archive model contains a subset of the functionality of the original, but I have already gone so far as to reduce the model file for the archive to just the associations and index declarations.  When I rebuild the indexes after deleting all files in the index directory both models are  processed without complaint.  The index files are created with identical permissions.  But when I do a simple search (for the word 'worried') on each the original file returns many results, the archive file returns no results.  When I use the error method to check for errors on the result set it returns nil.  When I look in the query log it shows that the query was successfully executed and returned 0 results.  But when I do a sql query on the indexed field in the archive file it returns over 3000 results (select count(*) from contact_archive where message like '% worried %').

My environment is Ruby 1.8.7, Rails 2.3.15 and Thinking Sphinx 1.4.10.  Any help would be greatly appreciated.

--
You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to thinking-sphi...@googlegroups.com.
To post to this group, send email to thinkin...@googlegroups.com.
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/groups/opt_out.

Stan Shore

unread,
Oct 11, 2013, 12:09:52 PM10/11/13
to thinkin...@googlegroups.com
It doesn't get any simpler.  The index definitions are identical:

  define_index do
    indexes :first_name, :sortable => true
    indexes :last_name, :sortable => true
    indexes :message

    has created_at
    has dist90038

    where "type in ('ContactShow','DoctorContact','ProcedureRequest','TellUs')"

  end

By the way, I tried removing the has statements and the where statement and reconfiguring and reindexing with the same results.  Even if I try the simplest search:

ContactInfo.search('worried')
ContactArchive.search('worried')

The first one works, the second one doesn't.

Seems crazy but there it is.

I'm wondering about versions.  What gem version for Thinking Sphinx and Riddle should I be using for Rails 2.3.15?

Regards,

Stan



--
You received this message because you are subscribed to a topic in the Google Groups "Thinking Sphinx" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/thinking-sphinx/tVmRxNDkM3M/unsubscribe.
To unsubscribe from this group and all its topics, send an email to thinking-sphi...@googlegroups.com.

Stan Shore

unread,
Oct 11, 2013, 12:41:01 PM10/11/13
to thinkin...@googlegroups.com
I wanted to give you a little more info.  I recreated this archived table on my test server (I was originally working on my development machine) and encountered the same results!  The Thinking Sphinx version was the same, the Sphinx version was different (2.0.9 on my dev machine, 0.9.9 on the test server).  I don't know that this info helps, but I just wanted you to know.

Regards,

Stan

Pat Allan

unread,
Oct 11, 2013, 7:57:43 PM10/11/13
to thinkin...@googlegroups.com
Hi Stan

Is there any chance you could create a basic test app that reproduces the issue? I can't see any reason why the behaviour would be different, and it's a bit hard to debug any further without access to either your app or a test app that has the same problem.

With regards to gem versions, you can't use anything that's 2.x or 3.x - so, the latest that's friendly for Rails 2.3 is 1.5.0.

-- 
Pat

Stan Shore

unread,
Oct 14, 2013, 11:24:56 AM10/14/13
to thinkin...@googlegroups.com
Are there any limitations on the version of Riddle for 2.3?  Any of the other gem dependencies?

Pat Allan

unread,
Oct 14, 2013, 7:38:29 PM10/14/13
to thinkin...@googlegroups.com
I don't think so - but 1.4.10 was released a *long* time ago, so it's hard to be certain. I'm pretty sure the latest versions of dependencies will work with 1.5.0 though.

Stan Shore

unread,
Oct 15, 2013, 2:21:45 PM10/15/13
to thinkin...@googlegroups.com
I set up a test table with one row in it to try and simplify things (as well as eliminate the effects of pt-archiver, which was use to archive the records).  After indexing the table I ran Sphinx's indextool on the word worried and got the following:

indextool --config ~/Applications/doctvadm/config/development.sphinx.conf --dumphitlist contact_test_core worried
Sphinx 2.0.9-release (rel20-r4115)
Copyright (c) 2001-2013, Andrew Aksyonoff
Copyright (c) 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/home/stan/Applications/doctvadm/config/development.sphinx.conf'...
dumping hitlist for index 'contact_test_core' keyword 'worried'...
keyword=worried, tok=worried, wordid=443981280
doc=12, hit=0x02000014

One hit -- exactly correct, so the index was built properly (using rake ts:index).

When I tried to search for worried, however:

search --config ~/Applications/doctvadm/config/development.sphinx.conf --index contact_test_core worried
Sphinx 2.0.9-release (rel20-r4115)
Copyright (c) 2001-2013, Andrew Aksyonoff
Copyright (c) 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/home/stan/Applications/doctvadm/config/development.sphinx.conf'...
index 'contact_test_core': search error: .

I got a search error.  The error message ('.') was kinda useless.  But I'm not sure that this search was meaningful because if I run the same search on the table that is working in the console I still get the error:

search --config ~/Applications/doctvadm/config/development.sphinx.conf --index contact_info_core worried
Sphinx 2.0.9-release (rel20-r4115)
Copyright (c) 2001-2013, Andrew Aksyonoff
Copyright (c) 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/home/stan/Applications/doctvadm/config/development.sphinx.conf'...
index 'contact_info_core': search error: .

In fact I get the same error if I don't specify any index at all (forcing a search on all indexes):

search --config ~/Applications/doctvadm/config/development.sphinx.conf worriedSphinx 2.0.9-release (rel20-r4115)
Copyright (c) 2001-2013, Andrew Aksyonoff
Copyright (c) 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/home/stan/Applications/doctvadm/config/development.sphinx.conf'...
index 'ce_forum_core': search error: .

Which leads to two questions:

- Am I constructing the sphinx search command incorrectly?
- Is the version of sphinx that I'm using (2.0.9) incompatible with Thinking Sphinx 1.5.0?

I'm sorry that I cannot set up a stripped-down example for you but there are client-related restrictions that would make that extremely difficult.  I am absolutely mystified by these problems.  It seems that since I upgraded Sphinx and Thinking Sphinx only tables whose indexes were defined before the upgrades can be re-indexed.  I know that make no sense as I have been deleted the configuration file and recreating it from scratch but it also make no sense that for three table with identical setups (contact_infos, contact_archives and contact_test) only the original can be searched (although clearly all 3 are getting indexed).

Thanks again for all your help.


Stan Shore

unread,
Oct 15, 2013, 2:38:05 PM10/15/13
to thinkin...@googlegroups.com
I've got more for you.  I looked up that weird error and found out that it is a known bug in the search command.  One person said they got around it by adding the --noinfo option.  Sure enough when I ran it that way I got results (I've switched back to the contact_archives table which was the one that I originally contacted you about):

search --noinfo --config ~/Applications/doctvadm/config/development.sphinx.conf --index contact_archive_core worried

Sphinx 2.0.9-release (rel20-r4115)
Copyright (c) 2001-2013, Andrew Aksyonoff
Copyright (c) 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/home/stan/Applications/doctvadm/config/development.sphinx.conf'...
index 'contact_archive_core': query 'worried ': returned 1000 matches of 4284 total in 0.001 sec

displaying matches:
1. document=649145, weight=2590, sphinx_internal_id=72127, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Annonymous, last_name_sort=Worried, created_at=Mon Nov 24 15:49:53 2008, dist90038=999999.000000
2. document=2610353, weight=2590, sphinx_internal_id=290039, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=really, last_name_sort=worried, created_at=Sun May  3 15:15:53 2009, dist90038=999999.000000
3. document=624035, weight=1610, sphinx_internal_id=69337, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Louise Evans, last_name_sort=Evans, created_at=Fri Nov 21 14:45:05 2008, dist90038=999999.000000
4. document=899012, weight=1610, sphinx_internal_id=99890, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Rachelle , last_name_sort=Petitclerc, created_at=Mon Dec 15 19:25:37 2008, dist90038=999999.000000
5. document=1166717, weight=1610, sphinx_internal_id=129635, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=tammy, last_name_sort=lessard, created_at=Sat Jan 10 15:40:16 2009, dist90038=999999.000000
6. document=3604520, weight=1610, sphinx_internal_id=400502, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=sharon, last_name_sort=lopez, created_at=Sat Aug 29 00:37:52 2009, dist90038=999999.000000
7. document=216290, weight=1603, sphinx_internal_id=24032, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Amanda, last_name_sort=Simmonds, created_at=Fri Oct 10 18:00:24 2008, dist90038=999999.000000
8. document=217055, weight=1603, sphinx_internal_id=24117, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Anna-Maria, last_name_sort=Martin, created_at=Fri Oct 10 19:35:08 2008, dist90038=999999.000000
9. document=352919, weight=1603, sphinx_internal_id=39213, sphinx_deleted=0, class_crc=2362243532, sphinx_internal_class=ProcedureRequest, first_name_sort=Melanie, last_name_sort=Roy, created_at=Wed Oct 29 19:48:12 2008, dist90038=999999.000000
10. document=611759, weight=1603, sphinx_internal_id=67973, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Jackie, last_name_sort=Vella, created_at=Thu Nov 20 16:26:23 2008, dist90038=999999.000000
11. document=867350, weight=1603, sphinx_internal_id=96372, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=natashia, last_name_sort=mews, created_at=Thu Dec 11 13:44:31 2008, dist90038=999999.000000
12. document=1090136, weight=1603, sphinx_internal_id=121126, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Darren, last_name_sort=Hamilton, created_at=Mon Jan  5 18:37:01 2009, dist90038=999999.000000
13. document=1092827, weight=1603, sphinx_internal_id=121425, sphinx_deleted=0, class_crc=1829651295, sphinx_internal_class=DoctorContact, first_name_sort=Daphne, last_name_sort=Sheehan, created_at=Mon Jan  5 19:55:36 2009, dist90038=999999.000000
14. document=1285616, weight=1603, sphinx_internal_id=142846, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Shannon, last_name_sort=Berger, created_at=Thu Jan 22 10:27:17 2009, dist90038=999999.000000
15. document=1382537, weight=1603, sphinx_internal_id=153615, sphinx_deleted=0, class_crc=1829651295, sphinx_internal_class=DoctorContact, first_name_sort=Ginny, last_name_sort=Naviello, created_at=Wed Jan 28 21:43:01 2009, dist90038=999999.000000
16. document=1705232, weight=1603, sphinx_internal_id=189470, sphinx_deleted=0, class_crc=1829651295, sphinx_internal_class=DoctorContact, first_name_sort=Amy, last_name_sort=Edwards, created_at=Mon Feb 23 06:41:22 2009, dist90038=999999.000000
17. document=1838666, weight=1603, sphinx_internal_id=204296, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Amanda, last_name_sort=Abraham, created_at=Thu Mar  5 23:38:42 2009, dist90038=999999.000000
18. document=2045162, weight=1603, sphinx_internal_id=227240, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Shanen, last_name_sort=Phillips, created_at=Tue Mar 24 16:48:04 2009, dist90038=999999.000000
19. document=2542052, weight=1603, sphinx_internal_id=282450, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Ramona, last_name_sort=Dominguez, created_at=Tue Apr 28 15:16:32 2009, dist90038=999999.000000
20. document=2565101, weight=1603, sphinx_internal_id=285011, sphinx_deleted=0, class_crc=1240653847, sphinx_internal_class=TellUs, first_name_sort=Camilla, last_name_sort=Salomonsson, created_at=Wed Apr 29 17:40:17 2009, dist90038=999999.000000

words:
1. 'worried': 4284 documents, 4668 hits

As you can see, Sphinx is successfully finding the search term that Thinking Sphinx is not (after starting the demon and loading the console):

>> ContactArchive.search 'worried'
=> []

And, again, a mysql query returns similar results:

mysql> select count(*) from contact_archives where message like '%worried%';
+----------+
| count(*) |
+----------+
|     4287 |
+----------+
1 row in set (0.18 sec)

Thoughts?

Pat Allan

unread,
Oct 16, 2013, 8:36:35 AM10/16/13
to thinkin...@googlegroups.com
Okay, so this is odd: class_crc for each of those records is 1240653847 - and yet, 'ContactArchive'.to_crc32 is 3196374234.

… do you have a type column in this ContactArchive model?

Stan Shore

unread,
Oct 16, 2013, 11:15:26 AM10/16/13
to thinkin...@googlegroups.com
Yes, the original model was used for single table inheritance so it required a column called 'type'.  You can see it in the index definition:


define_index do
    indexes :first_name, :sortable => true
    indexes :last_name, :sortable => true
    indexes :message

    has created_at
    has dist90038
   
    where "type in ('ContactShow','DoctorContact','ProcedureRequest','TellUs')"

end


Pat Allan

unread,
Oct 16, 2013, 12:54:58 PM10/16/13
to thinkin...@googlegroups.com
Ah, this will be why. Thinking Sphinx tracks the type of each result, and uses the type column to determine this. It'll pay attention to known subclasses, and it would make sense that all of those classes in your where clause are subclassed from Contact, not ContactArchive.

To be honest, I'm kinda surprised the tool you're using for archiving is okay with this… and I'm not sure what the best approach would be to get around it. Rails presumes that if the type column is there, it's for STI.

Stan Shore

unread,
Oct 16, 2013, 3:12:32 PM10/16/13
to thinkin...@googlegroups.com
I think I understand what you're saying.  Thinking Sphinx is looking for the subclasses in the type column which are descended from ContactInfo not ContactArchive.  A couple of strategies occur to me, but I'll have to discuss them with my client.  I'm pretty sure we can live without the type column in the archive but will need to think it through.

Thanks again for all your help.

Pat Allan

unread,
Oct 16, 2013, 9:12:41 PM10/16/13
to thinkin...@googlegroups.com
I guess you could rename the type column? That way the information's still there, just not getting in the way of Rails' and Thinking Sphinx's expectations.

Stan Shore

unread,
Oct 16, 2013, 9:29:26 PM10/16/13
to thinkin...@googlegroups.com
Already did it and the search now works.  Unfortunately pt-archiver, which is a database level utility written in Python, cannot map to a different column name as part of the archiving process.  I'm going to have use ActiveRecord to write an archiving utility so I have control over column names.  Not a huge deal but I was hoping to avoid writing that code.  Oh well.  Thanks again for all your help.
Reply all
Reply to author
Forward
0 new messages