Support more word delimiters in search

Yotam Leibovici

Nov 4, 2025, 10:48:12 AM
to inception-users
Hi Richard,

It seems that the search bar currently retrieves only tokens that are separated by regular spaces or newlines.
However, in many languages and text formats, words can also be separated by other delimiters, such as the zero-width non-joiner (ZWNJ, U+200C) or hyphens (-).

At the moment, tokens that are separated this way cannot be properly retrieved by the search bar.
It would be helpful if the search supported recognizing and splitting tokens based on a broader set of word delimiters.

Thanks,
Yotam

Richard Eckart de Castilho

Nov 4, 2025, 11:47:22 AM
to incepti...@googlegroups.com
Hi Yotam,
The search sidebar uses the tokens defined in the document.

If the document you imported did not have a tokenization, INCEpTION will use a basic tokenization scheme to create tokens.

If you need more control over token boundaries, you can import pre-tokenized files.
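
As a rough illustration of pre-tokenizing outside INCEpTION, the sketch below splits text on whitespace, ZWNJ, and hyphens and writes one sentence per line with space-separated tokens. This is not INCEpTION code: the delimiter set, the function names, and the output layout are assumptions for the example, and which import format fits that layout should be checked against the INCEpTION documentation. Note that the delimiters themselves are dropped, so the resulting text differs from the original.

```python
import re

# Assumed delimiter set: any whitespace, ZWNJ (U+200C), and the ASCII hyphen.
DELIMITERS = re.compile(r"[\s\u200c\-]+")


def pretokenize(text: str) -> list[list[str]]:
    """Treat each non-empty input line as a sentence and split it into tokens."""
    sentences = []
    for line in text.splitlines():
        # Drop the empty strings that re.split yields at leading/trailing delimiters.
        tokens = [tok for tok in DELIMITERS.split(line) if tok]
        if tokens:
            sentences.append(tokens)
    return sentences


def to_space_separated(sentences: list[list[str]]) -> str:
    """Render one sentence per line with tokens separated by single spaces."""
    return "\n".join(" ".join(tokens) for tokens in sentences) + "\n"


if __name__ == "__main__":
    sample = "a well-known word\u200cform\nsecond line"
    print(to_space_separated(pretokenize(sample)), end="")
```

If the hyphen or ZWNJ must be preserved in the text, a format that stores explicit token offsets would be the better fit than rewriting the text this way.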

Does that help?

Cheers,

-- Richard
