Hi Michael,
You can write your own tokenizer and pass it either to the indexing method (to use it for the whole index) or to the tokenizer option (to use it for a single category).
A simple test for it is here:
If you don't want a whole new tokenizer like the one in the test, just create a new tokenizer instance and override its preprocess and split methods:
class << tokenizer
  def preprocess array
    # Does essentially nothing anymore. This also skips character
    # substitution, illegal character removal and stopword removal.
    array
  end
  def split array
    array # Already split.
  end
end
Then, pass this into the index:
index = Picky::Index.new ... do
  indexing tokenizer # Use your custom tokenizer that handles arrays.
  category :text, tokenizer: tokenizer # Use your custom tokenizer just on a single category.
end
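Or rewrite the preprocess step so that it can handle the array:
def preprocess array
  # Run each element through the usual steps, chaining the return values
  # so that the substituted string is not thrown away.
  array.collect! do |element|
    remove_non_single_stopwords remove_illegals(substitute_characters(element))
  end
end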
(Please note that I wrote this from memory.)
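If you want to try the two variants without a Picky index at hand, here is a small standalone sketch. SketchTokenizer is a made-up stand-in that I am assuming for illustration, not Picky's actual tokenizer class, and its preprocess step is much simpler than the real one. It only shows how overriding methods on a single instance with class << changes that instance and nothing else.

```ruby
# Hypothetical stand-in for a string tokenizer. NOT Picky's actual class.
class SketchTokenizer
  ILLEGAL = /[^a-z0-9\s]/

  # Lowercases the text and strips everything except a-z, 0-9 and whitespace.
  def preprocess text
    text.downcase.gsub ILLEGAL, ''
  end

  # Splits a string on whitespace.
  def split text
    text.split
  end

  def tokenize input
    split preprocess(input)
  end
end

# Variant 1: the input is already an array, so skip preprocessing entirely.
passthrough = SketchTokenizer.new
class << passthrough
  def preprocess array
    array # Left untouched.
  end
  def split array
    array # Already split.
  end
end

# Variant 2: keep the preprocessing, but apply it to each element.
elementwise = SketchTokenizer.new
class << elementwise
  def preprocess array
    array.collect { |element| super(element) } # Reuse the string version.
  end
  def split array
    array # Already split.
  end
end

p passthrough.tokenize(['Hello,', 'World!'])   # => ["Hello,", "World!"]
p elementwise.tokenize(['Hello,', 'World!'])   # => ["hello", "world"]
p SketchTokenizer.new.tokenize('Hello, World!') # => ["hello", "world"]
```

The last line shows that other instances of the class still tokenize plain strings as before, which is why the singleton approach is handy when only one index or category receives arrays.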
Does this help?
Cheers,
Florian