Hello,
Currently, text search support for Arabic is only available in MongoDB Enterprise. It depends on a third-party proprietary component that requires a separate license: Basis Technology's Rosette Linguistics Platform (RLP). RLP performs normalization, word breaking, sentence breaking, and stemming or tokenization, depending on the language.
I would like to champion a free, open-source implementation of Arabic text search for MongoDB. It would include normalization, stemming, word breaking, etc.
To that end, I would appreciate some basic guidance on how this could be done in MongoDB:
1. Which implementation languages are possible: C++, JavaScript?
2. What is the required interface (API/ABI)?
3. Is there an existing language implementation I can use as a skeleton, e.g., English?
4. How can I set up MongoDB to use a custom language-support extension so that I can test it in practice before submitting it?
Others could then easily extend the implementation to support additional languages such as Farsi (Persian) and Urdu.
Thank you in advance for your help and guidance.
Regards,
- Kefah.