Matching "@UserName" and not "UserName"


Neil

Jan 8, 2013, 4:09:25 AM
to thinkin...@googlegroups.com
The plan is to use Thinking Sphinx to search for @Replies and @Mentions within a messages.content column, but at present Sphinx is also returning "UserName" matches alongside "@UserName":

@Replies (Only return Messages where messages.content begins with "@UserName"):
Message.search("^\\@#{user_name}")

@Mentions (Only return Messages where messages.content contains "@UserName" but does not begin with "@UserName"):
Message.search("\\@#{user_name}", conditions: { content: "!^\\@#{user_name}" })

Does anyone know how to filter out the "UserName" matches and to only return "@UserName" in both cases?
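For readers puzzled by the double backslash in the queries above: in Sphinx's extended query syntax the @ character is the field operator, so it has to be escaped with a backslash, and the backslash itself has to be escaped inside a double-quoted Ruby string. The following is only a sketch of the strings that end up being sent; the helper names reply_query and mention_query are hypothetical and not part of Thinking Sphinx.

```ruby
# Hypothetical helpers (not part of Thinking Sphinx). They only build the
# query strings used in the post, so the escaping can be inspected.

# "^" anchors the match to the start of the field; "\\@" in a double-quoted
# Ruby string is one literal backslash followed by "@".
def reply_query(user_name)
  "^\\@#{user_name}"
end

# Same escaped "@", without the start-of-field anchor.
def mention_query(user_name)
  "\\@#{user_name}"
end

puts reply_query("UserName")   # prints ^\@UserName
puts mention_query("UserName") # prints \@UserName
```

Escaping the @ only keeps it from being read as a field operator. Whether it is then matched as part of the word still depends on how the content was indexed.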

Pat Allan

Jan 8, 2013, 5:42:55 AM
to thinkin...@googlegroups.com
Hi Neil

It may be that you need to add the @ symbol to your charset_table to ensure it gets indexed as a word character. I'm guessing that by default Sphinx's indexer ignores it.

See here:
http://sphinxsearch.com/docs/manual-2.0.6.html#conf-charset-table

And two-thirds down this page:
http://pat.github.com/ts/en/advanced_config.html

Thinking Sphinx defaults to using the utf-8 charset_type (and thus, the default utf-8 charset_table values).
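To illustrate why a character missing from charset_table makes "@UserName" and "UserName" indistinguishable, here is a rough sketch. It is not Sphinx's actual tokenizer: it simply assumes that any character outside the allowed set acts as a separator, and the tokenize helper and the two character sets are made up for this example.

```ruby
# Simplified stand-in for index-time tokenization: characters outside
# word_chars are treated as separators. This is an assumption for
# illustration, not Sphinx's real implementation.
def tokenize(text, word_chars)
  text.downcase.scan(/[#{word_chars}]+/)
end

DEFAULT_CHARS = "a-z0-9_"   # rough approximation of the default Latin rules
WITH_AT       = "a-z0-9_@"  # the same set with the @ symbol added

p tokenize("@UserName hello", DEFAULT_CHARS) # => ["username", "hello"]
p tokenize("@UserName hello", WITH_AT)       # => ["@username", "hello"]
```

With the first set, both the indexed text and the query lose the @, so a search for "@UserName" also matches plain "UserName". With the second set, "@username" is indexed as its own keyword.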

Cheers

--
Pat

Neil

Jan 9, 2013, 1:24:17 PM
to thinkin...@googlegroups.com
Thanks Pat

I tried tinkering with the charset before posting but without much luck.

Is the TS charset_table setting supplementary to Sphinx's default charset rules? i.e. if the sphinx.yml looks as follows, will Sphinx merge the # and the @ symbols into its charset rules?

charset_table: "#, @"

The Sphinx config docs list an ignore_chars setting - it would be great if there were an include_chars setting as well.

Pat Allan

Jan 9, 2013, 11:58:25 PM
to thinkin...@googlegroups.com
Hi Neil

The charset_table setting is not supplementary, so you need to include all values. Also, it's worth noting that # is Sphinx's configuration comment character, so you'll need to put its Unicode code point in the list instead (U+0023). Not sure if Sphinx prefers the Unicode notation for the @ as well - it's U+0040.
http://en.wikipedia.org/wiki/List_of_Unicode_characters

So, a full set could be this:
0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+023, U+040
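Because the setting replaces the defaults rather than extending them, the full list has to be spelled out each time. As a minimal sketch, the value above could be assembled in Ruby by appending code points to a base list. BASE_CHARSET is copied from the list above; the helper names codepoint_entry and charset_table are hypothetical.

```ruby
# Base mappings copied from the list in this message (Latin plus Cyrillic).
BASE_CHARSET = [
  "0..9", "A..Z->a..z", "_", "a..z",
  "U+410..U+42F->U+430..U+44F", "U+430..U+44F"
].freeze

# Hypothetical helper: write a single character in U+XXX hex notation,
# which avoids putting a literal "#" (the comment character) in the config.
def codepoint_entry(char)
  format("U+%03X", char.ord)
end

# Hypothetical helper: base list plus extra characters, comma-separated.
def charset_table(extra_chars)
  (BASE_CHARSET + extra_chars.map { |c| codepoint_entry(c) }).join(", ")
end

puts charset_table(["#", "@"])
# prints 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+023, U+040
```

The resulting string is what would go into the charset_table entry in sphinx.yml.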

Give that a spin, let us know how you go.

Cheers

--
Pat

Neil

Jan 10, 2013, 7:08:10 AM
to thinkin...@googlegroups.com
Thanks Pat. It seems to be working great on development:

charset_table: '0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+023, U+040'

The production app (on Heroku) seems to be having some issues related to indexing and authentication at the moment, but hopefully it isn't related to adding the charset_table to sphinx.yml. 