> On Feb 23, 2021, at 12:02 AM, Pat Allan <p...@freelancing-gods.com> wrote:
>
> Having the setting in the default block should be fine - you should be able to see the charset_table setting in the generated Sphinx configuration files.
>
> Also: I generally recommend just using ts:rebuild, as that handles both real-time indices and SQL-backed indices (i.e. it’s running the same things as ts:rt:rebuild) - if you’re finding ts:rebuild is not working well for you, I’m keen to hear why!
While I was fighting with this, and fiddling with the configuration to use "has" instead of "indexes", I got myself into a state where ts:rebuild would blow up with a SQL error (I think it was a Sphinx SQL error) while ts:rt:rebuild worked fine. But with the current configuration that I shared with you, both work.
>
> All that said, doesn’t sound like you’re doing anything wrong. I wonder if html_strip is somehow filtering out the octothorps? Though I’m pretty sure it’s looking just for HTML tags… still, may be worth turning that off to double-check.
>
> And I’ve just run some quick tests locally - without the custom charset_table value, I find the string “#test” is found by Sphinx when searching by “#test” or “test” (because # is ignored, given it’s not an indexable character - so the two searches are actually identical). Adding in the charset_table setting, rebuilding - searching for #test returns a result, but test doesn’t (as that now doesn’t exist as a standalone word in what’s indexed).
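To make sure I'm reading that right, here is a rough sketch of the behaviour in plain Ruby. It's a toy model only, not Sphinx's actual tokenizer: the tokenize and matches? helpers and the two character sets are names I made up, and real charset_table handling does far more (case folding maps, ranges, and so on).

```ruby
# Toy model: any character outside the allowed set acts as a separator.
# DEFAULT_CHARS, HASHTAG_CHARS, tokenize and matches? are hypothetical.
DEFAULT_CHARS = "a-z0-9_"   # '#' is not indexable
HASHTAG_CHARS = "a-z0-9_#"  # '#' added, as a custom charset_table would do

# Splits text into runs of indexable characters.
def tokenize(text, chars)
  text.downcase.scan(Regexp.new("[#{chars}]+"))
end

# A query matches when every query token is present in the document.
def matches?(document, query, chars)
  query_tokens = tokenize(query, chars)
  doc_tokens = tokenize(document, chars)
  !query_tokens.empty? && (query_tokens - doc_tokens).empty?
end

doc = "a post tagged #test"
puts matches?(doc, "#test", DEFAULT_CHARS)  # '#' dropped on both sides
puts matches?(doc, "test", DEFAULT_CHARS)   # same tokens as the line above
puts matches?(doc, "#test", HASHTAG_CHARS)  # '#test' indexed as one word
puts matches?(doc, "test", HASHTAG_CHARS)   # 'test' no longer a standalone word
```

In this model the first three searches match and the last one doesn't, which lines up with what you saw locally.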
>
> I doubt it matters, but: which version of Sphinx are you using?
Sphinx 2.2.11-id64-release (95ae9a6), TS 5.0.0.
It's definitely odd. I'm not sure that re-indexing picks up the tag names when it runs en masse, and it seems to have something to do with Gutentag. If I find a document in the console, the object I get back shows tag_names as nil, but if I then call tag_names on that object, I get back the array of strings I'm expecting. The nil is only what I see inside the <> brackets when irb first prints the found object, so I don't know whether it's significant or whether it's getting in the way of Sphinx extracting the values. Again, when I test in the console by calling my tags_for_indexing method on a found object, I get back the expected string value.
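If the nil is just a value that hasn't been loaded yet, it would behave something like the toy class below. This is only my guess at the shape of the problem, not how Gutentag actually works: the Document class, its stored_tags argument and the tag values are all made up for illustration.

```ruby
# Toy stand-in, NOT Gutentag's real implementation. Document, stored_tags
# and the sample tags are hypothetical.
class Document
  def initialize(stored_tags)
    @stored_tags = stored_tags
    @tag_names = nil # not populated until the reader is called
  end

  # Loads the tag names on first access and remembers them.
  def tag_names
    @tag_names ||= @stored_tags.dup
  end

  # Joins the tag names into one string, going through the reader.
  def tags_for_indexing
    tag_names.join(" ")
  end

  # Mimics what irb prints: the raw stored state, not the reader's result.
  def inspect
    "#<Document tag_names: #{@tag_names.inspect}>"
  end
end

doc = Document.new(["#ruby", "#sphinx"])
puts doc.inspect           # raw state before the reader has been called
puts doc.tags_for_indexing # goes through the reader, so the tags are loaded
puts doc.inspect           # raw state after the reader has been called
```

In that case the nil in the printed object would be harmless, because anything that goes through the reader (as tags_for_indexing does here) still sees the tags.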
I've told the client that she may need to get rid of her beloved hashtags in the tagging interface, or use Gutentag in place of Sphinx to get "everything tagged with this tag". I'm not convinced that's a bad idea, either.
Walter