Associations, repeated values in index?

30 views
Skip to first unread message

Roger Kind Kristiansen

unread,
Jul 29, 2012, 11:06:26 AM7/29/12
to thinkin...@googlegroups.com
Hi again,

I just noticed that the SQL generated to my sphinx config returns a bit more than I expected. I'll not dig to deep into my data model or concrete indices first, as I assume this is yet another silly newbie mistake which is easy to pinpoint.

I've got one model which has a few associations (most are through a join table, but not all). I've set up some indices and some attributes making use of these associations. Now when I manually run the SQL generated by TS, I notice the same values from the associated models are repeated multiple times in each column. Is this expected behaviour, or is there some obvious thing I'm missing which TS doesn't do for me automatically, like some explicit grouping or unique constraints or something?

This hasn't caused me any trouble up until now, but now I'm trying to sort by counting the number of entries in one of the associated models and the numbers get all screwy. As far as I can understand it's related to the mentioned duplication.

Cheers,
Roger

Pat Allan

unread,
Jul 30, 2012, 4:52:28 PM7/30/12
to thinkin...@googlegroups.com
Hi Roger

Not sure exactly what you're referring to - can you provide the SQL statement and point out what you think could be more efficient?

Cheers

--
Pat

> --
> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/C_gW49zbm6kJ.
> To post to this group, send email to thinkin...@googlegroups.com.
> To unsubscribe from this group, send email to thinking-sphi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.



Roger Kind Kristiansen

unread,
Jul 31, 2012, 3:50:17 PM7/31/12
to thinkin...@googlegroups.com
Sure!

I've narrowed down the problem slightly by eliminating all parts of my index which are unnecessary. I'll start with the parts of the datamodel I suppose are relevant:

Doctor
   has_many :doctors_practices, :inverse_of => :doctor
   has_many :practices, :through => :doctors_practices
   has_many :counties,   :through => :doctors_practices

* I've got an explicit doctors_practices relation to be able to access various fields on the relation.
* The doctors_practices table contain the county id for convenience, to minimize the number of joins necessary to get to the counties a doctor has practices in.

Perhaps this design is just stupid and I should just be stopped right here. But if not, here goes...

If I set up the index in Doctor like this:

    indexes [lastname, firstname], :as => :name, :sortable => true
    indexes practices.name,     :as => :practice_name,      :sortable => true

And rebuild the index, the SQL of my doctor_core_0 index is something like this:

CAST(COALESCE("doctors"."lastname"::varchar, '') as varchar) || ' ' || CAST(COALESCE("doctors"."firstname"::varchar, '') as varchar) AS "name",
array_to_string(array_agg(COALESCE("practices"."name", '0')), ' ') AS "practice_name",
FROM "doctors"
LEFT OUTER JOIN "doctors_practices" ON "doctors_practices"."doctor_id" = "doctors"."id"
LEFT OUTER JOIN "practices" ON "practices"."id" = "doctors_practices"."practice_id"
GROUP BY "doctors"."id",
         "doctors"."lastname",
         "doctors"."firstname",
         "doctors"."id",
         "doctors"."lastname",
         "doctors"."firstname"


Which works just fine. If a doctor is connected to one practice, a single practice name is returned. If connected to multiple practices, each practice name is displayed only once.

But once I include an index on another relation through my join table, for example county, so my index looks something like:

    indexes [lastname, firstname], :as => :name, :sortable => true
    indexes practices.name,     :as => :practice_name,      :sortable => true
    indexes counties.name,       :as => :county_name,       :sortable => true

Then the resulting joins in the doctor_core_0 index start playing tricks on me. SQL is as follows:

SELECT
CAST(COALESCE("doctors"."lastname"::varchar, '') as varchar) || ' ' || CAST(COALESCE("doctors"."firstname"::varchar, '') as varchar) AS "name",
array_to_string(array_agg(COALESCE("practices"."name", '0')), ' ') AS "practice_name"
array_to_string(array_agg(COALESCE("counties"."name", '0')), ' ') AS "county_name"
FROM "doctors"
LEFT OUTER JOIN "doctors_practices" ON "doctors_practices"."doctor_id" = "doctors"."id"
LEFT OUTER JOIN "practices" ON "practices"."id" = "doctors_practices"."practice_id"
LEFT OUTER JOIN "doctors_practices" "doctors_practices_doctors_join" ON "doctors_practices_doctors_join"."doctor_id" = "doctors"."id"
LEFT OUTER JOIN "counties" ON "counties"."id" = "doctors_practices_doctors_join"."county_id"
GROUP BY "doctors"."id",
         "doctors"."lastname",
         "doctors"."firstname",
         "doctors"."id",
         "doctors"."lastname",
         "doctors"."firstname"

Now, if doctors are connected to multiple practices, all practice names and county names are repeated twice. See?

Not being very proficient in neither thinking sphinx or rails I'm having trouble seeing how to fix this. Perhaps you see a sensible solution? Or perhaps I need to rethink my design.. or both. :-)

Cheers,
Roger



kl. 22:52:28 UTC+2 mandag 30. juli 2012 skrev Pat Allan følgende:
Hi Roger

Not sure exactly what you're referring to - can you provide the SQL statement and point out what you think could be more efficient?

Cheers

--
Pat

On 29/07/2012, at 5:06 PM, Roger Kind Kristiansen wrote:

> Hi again,
>
> I just noticed that the SQL generated to my sphinx config returns a bit more than I expected. I'll not dig to deep into my data model or concrete indices first, as I assume this is yet another silly newbie mistake which is easy to pinpoint.
>
> I've got one model which has a few associations (most are through a join table, but not all). I've set up some indices and some attributes making use of these associations. Now when I manually run the SQL generated by TS, I notice the same values from the associated models are repeated multiple times in each column. Is this expected behaviour, or is there some obvious thing I'm missing which TS doesn't do for me automatically, like some explicit grouping or unique constraints or something?
>
> This hasn't caused me any trouble up until now, but now I'm trying to sort by counting the number of entries in one of the associated models and the numbers get all screwy. As far as I can understand it's related to the mentioned duplication.
>
> Cheers,
> Roger
>
> --
> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/C_gW49zbm6kJ.

> To post to this group, send email to thinking-sphinx@googlegroups.com.
> To unsubscribe from this group, send email to thinking-sphinx+unsubscribe@googlegroups.com.

Roger Kind Kristiansen

unread,
Aug 25, 2012, 3:10:20 AM8/25/12
to thinkin...@googlegroups.com
Sorry to drag this up again, but I still haven't managed to find a solution. Is my explanation still unclear, or is there no easy way around this?

Cheers,
Roger

Pat Allan

unread,
Aug 28, 2012, 2:22:48 PM8/28/12
to thinkin...@googlegroups.com
Sorry for not getting back to you sooner Roger. My inbox has been suffering some serious neglect over the last month. I blame the Edinburgh Fringe.

One thing that may be worth trying is being a little more explicit with your association references:

indexes doctors_practices.practices.name, :as => :practice_name
indexes doctors_practices.counties.name, :as => :counties

This should hopefully avoid the doubling up on joins. Also, I'd suggest there's not much point making these columns sortable, given they're aggregated values.

If that doesn't help, let me know - although the SQL joins are something managed by Rails, so there's limitations in how much they can be modified.

Cheers

--
Pat

> > To post to this group, send email to thinkin...@googlegroups.com.
> > To unsubscribe from this group, send email to thinking-sphi...@googlegroups.com.


> > For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
>
>
>
>
>
>
>
>
>
>

> --
> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/9_XhUmvM8NQJ.
> To post to this group, send email to thinkin...@googlegroups.com.
> To unsubscribe from this group, send email to thinking-sphi...@googlegroups.com.

Roger Kind Kristiansen

unread,
Sep 8, 2012, 8:11:30 AM9/8/12
to thinkin...@googlegroups.com
No worries about the delay. All work and no play is no good. :-) As usual your suggestion was right on the spot. Thank you!

Cheers,
Roger
Reply all
Reply to author
Forward
0 new messages