Associations indexing

12 views
Skip to first unread message

Aditya Naik

unread,
May 28, 2013, 6:46:54 PM5/28/13
to thinkin...@googlegroups.com
Hey team!

here is the scenario.. 

I have a Restaurant model and HABTM to Cuisine Model.. here are the console queries.. 

r = Restaurant.find 18405

r.cuisine_ids
 => [3, 9, 1, 4, 2, 23, 36, 24, 25, 30, 32, 38, 39, 37, 10, 8, 12] 

r.cuisines.pluck(:name).join(",")
 => "North West Frontier,Oriental,North Indian,Mughlai,Punjabi,South Indian,Bengali,Chettinad,Hyderabadi,Coastal,Konkani,Gujarati,Rajasthani,Maharashtrian,Thai,Malaysian,Vietnamese" 

Now if i search for the 8th cuisine.. i get results fine.. but if i search for 9th cuisine.. no results.. 

here is the query log
[Wed May 29 04:11:17.536 2013] 0.001 sec [ext2/1/ext 0 (0,10)] [restaurant_core] @cuisine (Hyderabadi) @city (Bangalore) @locality (St. John's Road) @sphinx_internal_class_name (Restaurant)
[Wed May 29 04:12:17.289 2013] 0.002 sec [ext2/1/ext 1 (0,10)] [restaurant_core] @cuisine (Chettinad) @city (Bangalore) @locality (St. John's Road) @sphinx_internal_class_name (Restaurant)

the 8th/9th is not consistent number.. usually depends on the characters in the cuisines before that.. i am assuming the indexes are getting cut off somewhere.. 

also If i just remove all the cuisines from the restaurant and just add Hyderabadi.. i can search for it.. so it looks like it has to do with the number os cuisines or the length of the index.. I doubt this has anything to do with pagination.. 

here is the Index definition

ThinkingSphinx::Index.define :restaurant, :with => :active_record do
  indexes :name, description, :sortable => true
  indexes cuisines.name, :as => :cuisine
end 

any ideas on how to debug and/or fix this?

Aditya Naik

unread,
May 28, 2013, 7:25:45 PM5/28/13
to thinkin...@googlegroups.com
Ok I found a problem..  i dont have an elegant solution yet. 

I didnt add the full index the index looks something like this

ThinkingSphinx::Index.define :restaurant, :with => :active_record do
  indexes :name, description, :sortable => true
  indexes locality.name, :as => :locality, :sortable => true
  indexes cuisines.name, :as => :cuisine
  indexes features.name, :as => :feature
  indexes restaurant_type.label, :as => :restaurant_type, :sortable => true
  indexes locality.city.name, :as => :city
  indexes taggings.tag.name, :as => :tag
  has rank
  has locality_id
  has :created_at
end

So there are 3 HABTM associations cuisine, feature and taggings. so the sql in the conf file looks something like this.. 

GROUP_CONCAT(cuisines.`name` SEPARATOR ' ') AS `cuisine`, 
GROUP_CONCAT(features.`name` SEPARATOR ' ') AS `feature`, 
GROUP_CONCAT(tags.`name` SEPARATOR ' ') AS `tag`, 

this creates a problem.. the cuisine is indexes repeatedly.. same with feature and tags because the joins will not be unique.. 

the query result for cuisine looks like this

Punjabi Malaysian Oriental Rajasthani Coastal Bengali Chettinad Malaysian Oriental Rajasthani Coastal Bengali Punjabi Malaysian Oriental Rajasthani Coastal Bengali Punjabi Thai Oriental Rajasthani Coastal Bengali Mughlai Thai Oriental Rajasthani Coastal Bengali Mughlai Oriental Gujarati Coastal Bengali Mughlai Thai Oriental Gujarati Coastal Bengali Mughlai Thai North West Frontier Gujarati Coastal Bengali Mughlai Thai North West Frontier Gujarati Coastal Bengali Mughlai Thai North West Frontier Gujarati Hyderabadi Bengali Mughlai Thai North West Frontier Gujarati Hyderabadi Bengali Mughlai Thai North West Frontier Gujarati Hyderabadi South Indian Vietnamese Mughlai Thai Gujarati Hyderabadi South Indian Vietnamese Mughlai Thai North West Frontier Gujarati Hyderabadi South Indian Vietnamese Mughlai Maharashtrian Thai North West Frontier Hyderabadi South Indian Vietnamese Mughlai Maharashtrian Thai North West Frontier Gujarati Hyderabadi South Indian Vietnamese North Indian Maharashtrian North West Frontier Guja

that means query is getting concatenated after 1024 characters.. but even with that the index seems to be getting cut off even before that.. i am guessing 255.. 

I went into the conf file and added DISTINCT in the GROUP_CONCAT and things work as expected at least for the timebeing.. 

fact still remains that after certain length the GROUP CONCAT string was getting chopped off.. 

thoughts?

Pat Allan

unread,
May 28, 2013, 7:35:16 PM5/28/13
to thinkin...@googlegroups.com
This is essentially a MySQL issue, but there are ways around it:
http://pat.github.io/thinking-sphinx/common_issues.html#mysql_large_fields

> --
> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to thinking-sphi...@googlegroups.com.
> To post to this group, send email to thinkin...@googlegroups.com.
> Visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>



Aditya Naik

unread,
May 28, 2013, 7:40:17 PM5/28/13
to thinkin...@googlegroups.com
ok.. what about the fact that even with the 1024 character limit the sphinx is not reading a word that is within the 1024 chracters "Hyderabadi"

Also do you think adding multiple HABTM fields would create this repetition or am i doing something wrong?

Pat Allan

unread,
May 28, 2013, 9:23:56 PM5/28/13
to thinkin...@googlegroups.com
Each extra join is going to cause duplicate values - as you've found, with the DISTINCT helping the situation a bit, but given that's not what Thinking Sphinx generates, consistently adding in DISTINCT is going to be annoying, so I wouldn't recommend that.

I'd bump up the character limit to something well beyond 1024 - perhaps by a factor of 10 - and that should help. That said, if you've got many joins, then you'll need to factor that in, and maybe a larger number is necessary.

Also: the group_concat_max_len value is bytes, not string length (which matters if you're using UTF).

Aditya Naik

unread,
May 29, 2013, 12:07:53 AM5/29/13
to thinkin...@googlegroups.com

understood.. thanks!


--
Aditya Naik
You received this message because you are subscribed to a topic in the Google Groups "Thinking Sphinx" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/thinking-sphinx/2aNdozhEyaU/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to thinking-sphi...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages