multiple group by

345 views
Skip to first unread message

J. Garcia

unread,
Feb 25, 2014, 12:47:53 PM2/25/14
to thinkin...@googlegroups.com
Hi,
According to Sphinx, it is possible to do a group_by on multiple fields since version 2.1.2: http://sphinxsearch.com/bugs/view.php?id=1453

I haven't found a way to do a search on multiple fields with Thinking Sphinx.
Is it possible to group by multiple fields on Thinking Sphinx?
I have tried by including several group_by but does not seem to work.
And passing an array of attributes does not seem to be accepted by Thinking Sphinx:

ThinkingSphinx::SyntaxError: sphinxql: syntax error, unexpected $undefined, expecting IDENT (or 5 other tokens) near '`[:user_id,` :type] LIMIT 0, 20; SHOW META'
    from .../gems/thinking-sphinx-3.1.0/lib/thinking_sphinx/connection.rb:90:in `rescue in query'

J.

Pat Allan

unread,
Feb 25, 2014, 5:59:58 PM2/25/14
to thinkin...@googlegroups.com
Can you try passing in a string of comma-separated attributes, instead of an array of symbols?

— 
Pat

--
You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to thinking-sphi...@googlegroups.com.
To post to this group, send email to thinkin...@googlegroups.com.
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/groups/opt_out.

J. Garcia

unread,
Feb 26, 2014, 7:36:30 AM2/26/14
to thinkin...@googlegroups.com
Pat,
Thanks for your response.
Just passing any of the two fields in a string worked fine. When I tried to pass the two in a comma-separated string I got this stacktrace:


ThinkingSphinx::SyntaxError: sphinxql: syntax error, unexpected $undefined, expecting IDENT (or 5 other tokens)  near '`user_id,` type LIMIT 0, 20; SHOW META'
    from .../gems/thinking-sphinx-3.1.0/lib/thinking_sphinx/connection.rb:90:in `rescue in query'

--
You received this message because you are subscribed to a topic in the Google Groups "Thinking Sphinx" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/thinking-sphinx/bZpFqr1-iTU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to thinking-sphi...@googlegroups.com.

J. Garcia

unread,
Feb 26, 2014, 8:45:36 AM2/26/14
to thinkin...@googlegroups.com
I've been debugging the code and found the culprit:

class Riddle::Query::Select
...
  def to_sql
    sql = "SELECT #{ extended_values } FROM #{ @indices.join(', ') }"
    sql << " WHERE #{ combined_wheres }" if wheres?
    sql << " #{group_prefix} #{escape_column(@group_by)}" if !@group_by.nil?
    unless @order_within_group_by.nil?
      sql << " WITHIN GROUP ORDER BY #{escape_columns(@order_within_group_by)}"
    end
    sql << " HAVING #{@having.join(' AND ')}" unless @having.empty?
    sql << " ORDER BY #{escape_columns(@order_by)}" if !@order_by.nil?
    sql << " #{limit_clause}"   unless @limit.nil? && @offset.nil?
    sql << " #{options_clause}" unless @options.empty?

    sql
  end

It calls escape_column for @group_by, instead of escape_columns as for @order_within_group_by.


J. Garcia

unread,
Feb 26, 2014, 1:36:42 PM2/26/14
to thinkin...@googlegroups.com
Pat,
I do not know if it has further implications for older Sphinx versions, but if I patch the code, I still cannot get the group counts.

When issuing the raw SQL command for two group fields, Sphinx will return a specific column for each grouped field, plus a groupby() column with what seems like a CRC. I still cannot get this info via each_with_group_and_count, as it returns the groupby() column with some CRC value.

Pat Allan

unread,
Feb 26, 2014, 6:05:45 PM2/26/14
to thinkin...@googlegroups.com
Yeah, I don’t think you’re going to be able to use `each_with_group_and_count` when grouping by multiple columns - you’ll just have to use `each_with_count` or just `each`, and refer to those columns specifically.

Appreciate the suggestion for the patch to Riddle - if you want to submit a pull request (on the develop branch) that’d be brilliant, but otherwise I’ll sort it out soon.

J. Garcia

unread,
Feb 27, 2014, 5:15:54 AM2/27/14
to thinkin...@googlegroups.com
Trouble with each is I just get the rehydrated ActiveRecord objects, and no info on the grouping. With each_with_count, same plus the count. I would need the grouped field values as well, as returned from Sphinx raw query. I'm not sure how to get them or how to patch Riddle in order to obtain them. Any hint would be appreciated.

Pat Allan

unread,
Feb 27, 2014, 6:27:53 AM2/27/14
to thinkin...@googlegroups.com
Can you provide a full example of the index definition and the search call? I’m thinking what you’re after should be possible, just easier to be certain with a bit more context :)

J. Garcia

unread,
Feb 27, 2014, 7:32:18 AM2/27/14
to thinkin...@googlegroups.com
ThinkingSphinx::Index.define :list_item, :with => :active_record do
  has user_item.artifact_id
  has user_item.artifact.name
  has user_item.user_id
  has list.type
  has state
end

For a given artifact_id, I need to count per user_id and type, how many list_items are.

a = ListItem.search  :with => {:artifact_id => 85}, group_by: [user_id, type]

a.each_with_group_and_count do |i, g, c|
   # g could be an array for group_by multiple fields, or a single element for group_by one field
end


Pat Allan

unread,
Feb 27, 2014, 10:32:32 PM2/27/14
to thinkin...@googlegroups.com
Okay, let’s try the following:

  search = ListItem.search :with => {:artifact_id => 85}, :group_by => [:user_id, :type]; ‘'
  search.context[:panes] << ThinkingSphinx::Panes::AttributesPane
  search.each do |list_item|
    list_item.sphinx_attributes[‘count(*)’] # count
    list_item.sphinx_attributes[‘user_id’]
    list_item.sphinx_attributes[‘type’]
  end

The ; ‘’ at the end of the first line is only required if you’re running this in the console, as you don’t want the search to have inspect called (that will populate results, and you need to add the AttributesPane *before* populating).

— 
Pat

J. Garcia

unread,
Feb 28, 2014, 4:46:59 AM2/28/14
to thinkin...@googlegroups.com
Just this will do, to keep things simpler and similar to order_by option: ListItem.search :with => {:artifact_id => 85}, :group_by => "user_id, type"; ‘'

Works great!

I'll submit a patch for Riddle.
Thanks.


J. Garcia

unread,
Mar 2, 2014, 1:34:01 PM3/2/14
to thinkin...@googlegroups.com
Hey Pat,
I was playing a bit with group_by and order and I haven't been able to make it work smoothly, so that group_by results are ordered by count(*):

search = ListItem.search  :with => {:artifact_id => 85}, group_by: "user_id", :order => "count(*) desc"

ThinkingSphinx::SphinxError: index list_item_core: sort-by attribute 'count(' not found

    from .../gems/thinking-sphinx-3.1.0/lib/thinking_sphinx/connection.rb:90:in `rescue in query'
    from .../gems/thinking-sphinx-3.1.0/lib/thinking_sphinx/connection.rb:93:in `query'

The raw SQL query that  is submitted to Sphinx is:

"SELECT *, groupby(), count(*) FROM `list_item_core` WHERE `artifact_id` = 85 AND `sphinx_deleted` = 0 GROUP BY `user_id` ORDER BY count(*) desc LIMIT 0, 20"


If I launch this query under the sql console, I get this:
ERROR 1064 (42000): index list_item_core: sort-by attribute 'count(' not found

However, if I modify the raw query to something like this, it works well:
SELECT *, groupby(), count(*) as total FROM `list_item_core` WHERE `artifact_id` = 85 AND `sphinx_deleted` = 0 GROUP BY `user_id` ORDER BY total desc LIMIT 0, 20;

If I try the option order_group_by, I get the same error.

Am I doing something wrong?

Pat Allan

unread,
Mar 3, 2014, 8:15:37 AM3/3/14
to thinkin...@googlegroups.com
The following should work around this issue:

  ListItem.search :with => {:artifact_id => 85}, :group_by => ‘user_id’, :select => ‘*, count(*) as total’, :order => ‘total DESC’

It’s annoying, but I’m currently loath to alias count(*) by default when grouping by attributes, as the alias may end up being the name of an existing attribute.

J. Garcia

unread,
Mar 3, 2014, 12:39:42 PM3/3/14
to thinkin...@googlegroups.com
That worked fine, almost. What about using internally __count or similar as alias?

I explain the "almost":
At present I only have 4 ListItem indexed, and when grouped there should only be two of them.
But if I use

   groups = []
   search.each_with_group_and_count do |i,g,c|
      groups << [g,c]
   end

   I get:

[[1, 6], [2, 4], [nil, nil], [nil, nil]]

And I can see the query launched against mysql to recover 2 ListItems:
 ListItem Load (0.5ms)  SELECT `list_items`.* FROM `list_items` WHERE `list_items`.`id` IN (17, 1)

The more verbose alternative implementation works fine:
    search = ListItem.search with: {artifact_id: 47}, group_by: 'user_id', select: '*, count(*) as total', order: 'total DESC'

    search.context[:panes] << ThinkingSphinx::Panes::AttributesPane
    res = []
    search.each do |item|
      res << [item.sphinx_attributes['user_id'], item.sphinx_attributes['total']]
    end
    res

[[1, 6], [2, 4]]

Pat Allan

unread,
Mar 4, 2014, 9:28:52 AM3/4/14
to thinkin...@googlegroups.com
I’ve just pushed a patch to the develop branch to add aliases for both group and count: sphinx_internal_group and sphinx_internal_count. Such a smart and obvious idea - thanks for the suggestion :)

That doesn’t explain the nil records you’re seeing, though… just to confirm, what’s the search query you’re making to get those search results?

J. Garcia

unread,
Mar 17, 2014, 6:52:24 AM3/17/14
to thinkin...@googlegroups.com
Pat,
Got back to this.

I have been trying your update of alias sphinx_internal_count
I have been changing my indexing and query a bit.
Now, what I index is:


ThinkingSphinx::Index.define :list_item, :with => :active_record do
  has user_item.artifact_id
  has user_item.user_id
  has user_item.user.username

  has list.type
  has state
end

If I do:
  search = ListItem.search  :with => {:artifact_id => 47}, group_by: "user_id", order_by: 'sphinx_internal_count';
  groups = []
  search.each_with_group_and_count { |i,g,c| groups << [g, c] }
  groups

I get the result ordered by count as wished, with the group_by user_id:
  [[6, 14], [7, 9]]

If I do the same with username (a string) instead of user_id:
  search = ListItem.search  :with => {:artifact_id => 47}, group_by: "username", order_by: 'sphinx_internal_count';
  groups = []
  search.each_with_group_and_count { |i,g,c| groups << [g, c] }
  groups

I get the result ordered by count as wished, with a calculated integer field (a username crc?):
  [[-1448043994804220252, 14], [-1050173307187172618, 9]]

Then, in order to get the username, I need to use the panes, right?




Pat Allan

unread,
Mar 17, 2014, 8:01:54 AM3/17/14
to thinkin...@googlegroups.com
Yup, annoyingly Sphinx doesn’t return string attributes properly if they’re the group value, so you’ll need to use the pane to get the proper attribute value instead (or just list_item.user_item.user.username - though that’s not ideal from a database perspective).

For more options, visit https://groups.google.com/d/optout.

J. Garcia

unread,
Mar 17, 2014, 5:15:50 PM3/17/14
to thinkin...@googlegroups.com

Ok. Thanks for your support!

El dia 17/03/2014 13.02, "Pat Allan" <p...@freelancing-gods.com> va escriure:
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages