String Tokenizer from AQL?

da...@become.education

Oct 4, 2017, 6:32:01 PM

to ArangoDB

Hello everyone. I'm quite new to ArangoDB and hope to avoid asking silly questions.

I'm building an AQL query that takes a multi-word phrase (user input) as a search term. It will be applied against a Title attribute on 300k+ documents.

For performance reasons I'm avoiding FILTER REGEX_TEST (4 seconds) and FILTER LIKE (2 seconds, and case sensitive), and I have struggled to get either of them to use an index.

By contrast, FOR doc IN FULLTEXT(..., "|prefix:Word1, |prefix:Word2, ...") generally runs in under 10ms and is case insensitive.
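A minimal sketch, in Python, of how a query string in that format might be assembled from raw user input on the client side. The function name `build_fulltext_query` and the minimum word length of 3 are assumptions for illustration only; the splitting is a simple regex and is not equivalent to ArangoDB's ICU tokenizer.

```python
import re

# Assumed threshold: words shorter than this are dropped from the search.
MIN_WORD_LENGTH = 3


def build_fulltext_query(phrase, min_len=MIN_WORD_LENGTH):
    """Turn a user phrase into an OR-combined prefix query string.

    Hypothetical helper: splits on anything that is not a letter or digit,
    lowercases, drops short words and duplicates, and prefixes every
    remaining word with "|prefix:".
    """
    words = re.findall(r"[^\W_]+", phrase.lower())
    kept = []
    for word in words:
        if len(word) >= min_len and word not in kept:
            kept.append(word)
    return ",".join("|prefix:" + word for word in kept)


if __name__ == "__main__":
    print(build_fulltext_query("The Quick brown fox of X"))
```

The resulting string would then be passed as the query argument of FULLTEXT, ideally through a bind parameter rather than string concatenation.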

Questions:

1) Is it possible to access the String Tokenizer SYS_SPLIT_WORDS_ICU() from AQL? I'm currently using SPLIT on the user input and will need to filter out short words, etc.

2) Is it possible to order the FOR FULLTEXT results by the number of search words found? What would the AQL SORT clause look like?

3) A big improvement to 2) would be to rank results based on search token proximity, order, and count. Are there any statistics available from FULLTEXT that could be useful for this?
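To make the ranking in 2) concrete, here is a minimal client-side sketch in Python of the ordering I have in mind. It is not AQL and not a proposed solution; it assumes the results arrive as a list of dicts with a `Title` key, and the helper names `count_matched_words` and `rank_by_matches` are made up for illustration.

```python
import re


def count_matched_words(title, search_words):
    """Count how many search words occur as a word prefix in the title."""
    title_tokens = re.findall(r"[^\W_]+", title.lower())
    return sum(
        1
        for word in search_words
        if any(token.startswith(word) for token in title_tokens)
    )


def rank_by_matches(docs, search_words):
    """Sort documents so that titles matching more search words come first.

    Python's sort is stable, so documents with the same count keep
    their original relative order.
    """
    words = [w.lower() for w in search_words]
    return sorted(
        docs,
        key=lambda doc: count_matched_words(doc["Title"], words),
        reverse=True,
    )


if __name__ == "__main__":
    docs = [
        {"Title": "Brown bears"},
        {"Title": "The quick brown fox"},
        {"Title": "Foxes"},
    ]
    for doc in rank_by_matches(docs, ["quick", "brown", "fox"]):
        print(doc["Title"])
```

Doing this inside the query itself, rather than after fetching the results, is what I'm hoping is possible.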

Regards,

David
