String attributes


Clemens Kofler

Apr 17, 2011, 5:38:07 PM
to Thinking Sphinx
I've been thinking about string attributes again lately, especially in
terms of facets, since my previous approach (see
http://groups.google.com/group/thinking-sphinx/browse_thread/thread/c8cc4fb1e38f7679/76f353007ff827ef?lnk=gst&q=string+attribute#76f353007ff827ef)
had speed issues. I came up with a much faster solution:
https://gist.github.com/924493. Now I'm wondering whether it would make
sense to port it back to Thinking Sphinx.

I'm thinking the following: when defining a string attribute, Thinking
Sphinx could internally keep two attributes (similar to what it does for
facets now): the original string value and its str2ordinal
counterpart. Sorting, grouping and filtering would use the
str2ordinal value, while things like facet labels would use the actual
string value. This would largely remove the translation
step that does all the reverse lookups, which is the main reason for
the slowdown.
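
To make the idea concrete, here is a rough sketch in plain Ruby. It is not Thinking Sphinx code: `build_ordinals`, `Doc`, `index_documents` and `facet_counts` are hypothetical names, and the 1-based ordinals are an assumption that only imitates what str2ordinal does (numbering strings by their sorted position). The point is that grouping and ordering touch only the integer, while the label comes straight from the stored string, so no reverse lookup is needed.

```ruby
# Hypothetical sketch, not Thinking Sphinx internals.
# Assumption: ordinals are 1-based positions in the sorted list of
# distinct strings, imitating what str2ordinal produces.
def build_ordinals(values)
  values.uniq.sort.each_with_index.map { |value, index| [value, index + 1] }.to_h
end

# Each document keeps both the original string and its ordinal.
Doc = Struct.new(:id, :author, :author_ordinal)

def index_documents(rows)
  ordinals = build_ordinals(rows.map { |row| row[:author] })
  rows.map { |row| Doc.new(row[:id], row[:author], ordinals[row[:author]]) }
end

# Group and order by the integer ordinal only; read the label from the
# string stored alongside it instead of translating the integer back.
def facet_counts(docs)
  docs.group_by(&:author_ordinal)
      .sort_by { |ordinal, _group| ordinal }
      .map { |_ordinal, group| [group.first.author, group.size] }
end

rows = [
  { id: 1, author: "Pat" },
  { id: 2, author: "Clemens" },
  { id: 3, author: "Pat" }
]
docs = index_documents(rows)
facets = facet_counts(docs)
```

With these three rows, `facets` lists each author once with its document count, ordered by ordinal (which here matches alphabetical order).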

The main issue I see is supporting two different flavors: Sphinx
installations older than 1.10-beta would need the current implementation,
whereas 1.10-beta and later could use the new one. In the
near future, there might even be a third variant that drops the extra
sorting/grouping/filtering column, once Sphinx is able to do that
natively on string attributes.
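
As a sketch of what choosing between the two flavors might look like: the helper below takes a plain Sphinx version string and picks a strategy. The method name and both strategy symbols are made up for illustration, and how the version string is obtained is left open. Only the 1.10 cut-off comes from the paragraph above.

```ruby
# Hypothetical sketch: the helper and the strategy names are invented
# for illustration and are not part of Thinking Sphinx or Riddle.
def string_attribute_strategy(version)
  # Take the first two numeric components, e.g. "1.10-beta" -> [1, 10].
  parts = version.scan(/\d+/).map(&:to_i)
  major, minor = (parts + [0, 0]).first(2)

  if ([major, minor] <=> [1, 10]) >= 0
    :string_plus_ordinal    # 1.10-beta and later: the new implementation
  else
    :legacy_reverse_lookup  # older Sphinx: keep the current implementation
  end
end

strategies = ["0.9.9", "1.10-beta", "2.0.1-beta"].map do |version|
  [version, string_attribute_strategy(version)]
end
```

Comparing the `[major, minor]` pairs as arrays avoids the trap of comparing version strings lexically, where "1.10" would sort before "1.9".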

The question is: would it make sense to implement that in Thinking
Sphinx, bearing in mind the additional complexity it brings (mapping
all grouping, sorting and filtering to a custom column internally,
although that happens for facets anyway)? If so, I'm happy to try
to come up with a clean implementation. Otherwise, I'll just adapt a
blog post that I have in my pipeline where I explain the whole issue
and my solution.

WDYT?

Pat Allan

May 3, 2011, 8:41:35 AM
to thinkin...@googlegroups.com
Okay, here's a few thoughts on the matter - finally (sorry for the delay).

So, I definitely think this should all be part of TS proper - I'd love for Thinking Sphinx to opt for smarter string facets when possible (as it removes the need for the class_crc attribute, and tracking which models are being indexed, which has been a pain from the beginning).

And same for string facets - although we're still faced with the existing limitation there when having arrays of strings - MVAs are still integer-only.

Also, 1.10-beta has a useful option of sql_field_string - declaring both a field and attribute from a single column. This is a natural fit for :sortable => true.

And the newly released 2.0.1-beta improves things further with string attributes - they can be used for sorting and grouping - wish they'd hurry up with the filtering as well! (I've had a brief attempt at getting TS working with Sphinx 2.0.1-beta - Sphinx was having problems starting up via Ruby code, but was fine if I called it from the command line. Will investigate more soon).


The trick will be having it all degrade nicely for older versions of Sphinx. There's already a check or two in Riddle for this - Riddle.loaded_version. If you want to take a stab at it, that'd be wonderful - but otherwise, I'm guessing you'll be in Berlin for Euruko? Perhaps we can work on it together then?

--
Pat


Clemens Kofler

May 3, 2011, 11:08:27 AM
to thinkin...@googlegroups.com
Hey Pat,

thanks for the feedback!

On May 3, 2011, at 2:41 PM, Pat Allan wrote:

> Okay, here's a few thoughts on the matter - finally (sorry for the delay).
>
> So, I definitely think this should all be part of TS proper - I'd love for Thinking Sphinx to opt for smarter string facets when possible (as it removes the need for the class_crc attribute, and tracking which models are being indexed, which has been a pain from the beginning).

Yes, that would be awesome!

> And same for string facets - although we're still faced with the existing limitation there when having arrays of strings - MVAs are still integer-only.

I hadn't even thought about the MVA issue ... Still, maybe it's possible to simplify the lookup code (right now it's quite expensive). Or maybe it makes sense to just not support this directly? Is it used much anyway? While taking a feature out always sucks, it might benefit the overall codebase.

> Also, 1.10-beta has a useful option of sql_field_string - declaring both a field and attribute from a single column. This is a natural fit for :sortable => true.

+1

> And the newly released 2.0.1-beta improves things further with string attributes - they can be used for sorting and grouping - wish they'd hurry up with the filtering as well! (I've had a brief attempt at getting TS working with Sphinx 2.0.1-beta - Sphinx was having problems starting up via Ruby code, but was fine if I called it from the command line. Will investigate more soon).

Gosh, that would be *so* great. However, in terms of facets, sorting and grouping would actually be enough. You could then – in contrast to my quick and dirty implementation for 1.10-beta – even sort by group rather than just count.

> The trick will be having it all degrade nicely for older versions of Sphinx. There's already a check or two in Riddle for this - Riddle.loaded_version. If you want to take a stab at it, that'd be wonderful - but otherwise, I'm guessing you'll be in Berlin for Euruko? Perhaps we can work on it together then?

Sure, I'll be there for a couple of days. I'm assuming you'll stay a while, too? Sounds like a good fit for a pairing session at co_up or Euruko hallway ...

- C.
