Index attributes creating redundant joins

84 views
Skip to first unread message

John Barker

unread,
Dec 28, 2012, 9:15:27 PM12/28/12
to thinkin...@googlegroups.com
I have an index in my User model that is creating redundant joins for attributes I have defined. My model looks as follows:

class User < ActiveRecord::Base
    has_many :relationships
    has_many :groups, through: :relationships, conditions: Proc.new { ['relationships.type = ?', 'Group'] }
    has_many :organizations, through: :relationships, conditions: Proc.new { ['relationships.type = ?', 'Organization'] }
    has_many :roles, through: :relationships
 
    
    define_index do
        # fields
        indexes first_name, sortable: true
        indexes last_name, sortable: true
        indexes email, sortable: true
        indexes relationships.group(:name), as: :groups, sortable: true
        indexes relationships.role(:name), as: :roles

        set_property field_weights: {
           first_name: 15,
           last_name: 15,
           email: 7,
           groups: 10
        }

        has relationships(:type), as: :relationship_type, crc: true
        has groups(:id), as: :group_id
        has organizations(:id), as: :organization_id
        has roles(:name), as: :role_name, crc: true
        has created_at, updated_at 
     end  

This produces the following query in the sphinx configuration:

SELECT 
    SQL_NO_CACHE `users`.`id` * CAST(5 AS SIGNED) + 4 AS `id` , 
    `users`.`first_name` AS `first_name` 
    `users`.`last_name` AS `last_name`, 
    `users`.`email` AS `email`, 
    GROUP_CONCAT(DISTINCT IFNULL(`groups`.`name`, '0') SEPARATOR ' ') AS `groups`, 
    GROUP_CONCAT(DISTINCT IFNULL(`roles`.`name`, '0') SEPARATOR ' ') AS `roles`, 
    `users`.`id` AS `sphinx_internal_id`, 0 AS `sphinx_deleted`, 
    765557111 AS `class_crc`, IFNULL('User', '') AS `sphinx_internal_class`, 
    IFNULL(`users`.`first_name`, '') AS `first_name_sort`, 
    IFNULL(`users`.`last_name`, '') AS `last_name_sort`, 
    IFNULL(`users`.`email`, '') AS `email_sort`, 
    GROUP_CONCAT(DISTINCT IFNULL(IFNULL(`groups`.`name`, ''), '0') SEPARATOR ' ') AS `groups_sort`, 
    GROUP_CONCAT(DISTINCT IFNULL(CRC32(`relationships`.`type`), '0') SEPARATOR ',') AS `relationship_type`, 
    GROUP_CONCAT(DISTINCT IFNULL(`groups_users`.`id`, '0') SEPARATOR ',') AS `group_id`, 
    GROUP_CONCAT(DISTINCT IFNULL(`organizations`.`id`, '0') SEPARATOR ',') AS `organization_id`, 
    GROUP_CONCAT(DISTINCT IFNULL(CRC32(`roles_users`.`name`), '0') SEPARATOR ',') AS `role_name`, 
    UNIX_TIMESTAMP(`users`.`created_at`) AS `created_at`, 
    UNIX_TIMESTAMP(`users`.`updated_at`) AS `updated_at` 
FROM `users` 
    LEFT OUTER JOIN `relationships` ON `relationships`.`user_id` = `users`.`id` 
    LEFT OUTER JOIN `groups` ON `groups`.`id` = `relationships`.`relationship_id` 
    LEFT OUTER JOIN `roles` ON `roles`.`id` = `relationships`.`role_id` 
    LEFT OUTER JOIN `relationships` `relationships_users_join` ON `relationships_users_join`.`user_id` = `users`.`id` 
    LEFT OUTER JOIN `groups` `groups_users` ON `groups_users`.`id` = `relationships_users_join`.`relationship_id` AND relationships.type = 'Group' 
    LEFT OUTER JOIN `relationships` `relationships_users_join_2` ON `relationships_users_join_2`.`user_id` = `users`.`id` 
    LEFT OUTER JOIN `organizations` ON `organizations`.`id` = `relationships_users_join_2`.`relationship_id` AND relationships.type = 'Organization' 
    LEFT OUTER JOIN `relationships` `relationships_users_join_3` ON `relationships_users_join_3`.`user_id` = `users`.`id` 
    LEFT OUTER JOIN `roles` `roles_users` ON `roles_users`.`id` = `relationships_users_join_3`.`role_id` 
WHERE (`users`.`id` >= $start AND `users`.`id` <= $end) 
GROUP BY `users`.`id` ORDER BY NULL 

Everything is great up until the joins; it creates three extra unnecessary joins (relationships_users_join_n) for the Group, Organization, and Roles attributes. I have tried changing the attribute syntax from groups(:id) to relationships.group(:id) but it fails to acknowledge the conditions that is applied to the join that is defined by the has_many in the model. Other than failing to acknowledge the condition, it solves my redundant join issue.

Any help is greatly appreciated!

Pat Allan

unread,
Dec 28, 2012, 11:51:16 PM12/28/12
to thinkin...@googlegroups.com
Hi John

I would avoid using has_many :through shortcuts in your index definition - so, you're doing the right thing with fields, but not with attributes. Thinking Sphinx uses ActiveRecord to generate the SQL, and it appears ActiveRecord isn't as smart as it could be with joins for those associations.

Of course, then you have the catch that you're applying conditions to the has_many :through versions of the associations… I would probably define those as additional associations on Relationship instead if necessary?

--
Pat

> --
> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/JvcwXKlb4MgJ.
> To post to this group, send email to thinkin...@googlegroups.com.
> To unsubscribe from this group, send email to thinking-sphi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.



John Barker

unread,
Dec 29, 2012, 12:06:14 AM12/29/12
to thinkin...@googlegroups.com
Hi Pat,

Thanks for such a prompt response! So moving the conditions from the User model to the Relationship model and changing the attributes to relationships.model(:column) seems to work exactly as I need it to. I tried a similar thing earlier, but must have overlooked something.

Thanks for such a great library (and a great service, Flying Sphinx)!

Cheers

Pat Allan

unread,
Dec 29, 2012, 12:11:07 AM12/29/12
to thinkin...@googlegroups.com
No worries - it helps that I had someone else dealing with the same issue on Flying Sphinx recently :)

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/IJE3e6jUKT8J.

John Barker

unread,
Jan 1, 2013, 7:34:32 PM1/1/13
to thinkin...@googlegroups.com
So I thought this problem was fixed, after moving the needed conditions to the relationship table, but come to find out, I can't have the conditions there; they have to be in my other models. Consider me stumped.

Pat Allan

unread,
Jan 1, 2013, 10:30:49 PM1/1/13
to thinkin...@googlegroups.com
Why do the conditions need to be in the other models? Can you run me through the associations in all models related to this issue?

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/c1sORwQ8_ssJ.

John Barker

unread,
Jan 2, 2013, 10:43:19 AM1/2/13
to thinkin...@googlegroups.com
Hi Pat,

So, to begin, I have an application that is membership-centric; users can have 'memberships' to both 'groups' and 'organizations', which is handled through a 'relationships' table/model to handle all the associations/relationships.

The relationships table handles all the relationships for the user for both organization and group, and simply distinguishes between which by the 'type' column. For example, here's my current 'Relationship' model (you'll see I have the conditioned commented out):

class Relationship < ActiveRecord::Base
  belongs_to :role
  belongs_to :user
  belongs_to :organization, foreign_key: 'relationship_id'#, conditions: { relationships: { type: 'Organization', active: true } }
  belongs_to :group, foreign_key: 'relationship_id'#, conditions: { relationships: { type: 'Group', active: true } }

  has_many :lists

  attr_accessible :user_id, :role_id, :relationship_id, :role, :user, :created_at, :updated_at, :type, :begin_date, :end_date, :active

  validates_uniqueness_of :relationship_id, scope: [ :type, :role_id, :user_id ]

  scope :page_with_cached_total_count, lambda { |page_number, total_count_value |
    page(page_number).extending {
      define_method(:total_count) { total_count_value }
    }
  }

  define_index do
    indexes user.first_name, as: :first_name, sortable: true
    indexes user.middle_name, as: :middle_name, sortable: true
    indexes user.last_name, as: :last_name, sortable: true
    indexes user.email, as: :email, sortable: true
    indexes group(:name), as: :group, sortable: true


    set_property field_weights: {
      first_name: 15,
      last_name: 15,
      middle_name: 2,
      email: 7,
      groups: 10
    }

    has organization(:id), as: :organization_id
    has created_at, updated_at
    has role(:name), as: :role_name, crc: true
    has :type, crc: true
  end

  # Override ActiveRecord's inheritance column method since we use a `type` column
  def self.inheritance_column
    nil
  end
end

I need those conditions on the User, Organization and other models so that they will respect the conditions when I call something like Organization.find(1).members, which would do a join on the relationships table and search by type 'organization' and the and ensure that the relationship is in fact active (active=1).

For example, my Organization model's relationship definition looks like this:

has_many :relationships, foreign_key: 'relationship_id', conditions: { relationships: { type: 'Organization', active: true } }
has_many :members, through: :relationships, source: :user

If I remove the conditions from the Organization model, and simply leave it on the Relationships model, then it doesn't respect the condition when I call something like Organization.find(1).members. There are some other errors, weirdness I experience as well, but that is the major reason.

If you have any help, or insight as to how I could better structure this and retain the needed index performance with TS, I would be forever grateful! I hope I made this clear enough.

Thank you.

Pat Allan

unread,
Jan 3, 2013, 3:50:19 AM1/3/13
to thinkin...@googlegroups.com
Hi John

Perhaps there's details I've forgotten (also: I'm a little short on sleep), but you can keep those conditions you've commented out. It's just that you can't use the has_many :through associations reliably in Thinking Sphinx index definitions - and I had noticed that you had conditions on those that perhaps weren't on the underlying associations.

If you've got conditions on non-:through associations and you want to use those associations with Thinking Sphinx, that should be fine.

Cheers

--
Pat

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/BgBwJl6miUwJ.

John Barker

unread,
Jan 3, 2013, 7:34:35 PM1/3/13
to thinkin...@googlegroups.com
Hi Pat,

So bottom line is, I can't have the conditions on the relationship model, they have to remain on the user/organization model as it breaks the creation of relationships. Is there a way to explicitly define the join in the index/attribute definition?

Pat Allan

unread,
Jan 5, 2013, 12:02:01 AM1/5/13
to thinkin...@googlegroups.com
Hi John

There wasn't, but there is now in the edge branch (3.x releases).

--
Pat

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/VhUvzzEzfwEJ.

John Barker

unread,
Jan 5, 2013, 5:29:45 PM1/5/13
to thinkin...@googlegroups.com
Great! Is TS 3 compatible with Flying Sphinx?

Pat Allan

unread,
Jan 5, 2013, 8:16:10 PM1/5/13
to thinkin...@googlegroups.com
Yes, the ts3 branch of github.com/flying-sphinx/flying-sphinx supports TS 3 - I've been using it for some of my own projects for the last couple of months and it's been fine.

But if you are upgrading, make sure you read these two pages:
https://groups.google.com/d/msg/thinking-sphinx/QM0BlS3gg3k/s61pfCBBTUoJ
https://github.com/pat/thinking-sphinx/blob/edge/README.textile

Cheers

--
Pat

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/cTUlW78tBYcJ.

Reply all
Reply to author
Forward
0 new messages