Hi Kenney,
I must admit that we currently don't have documentation for how to enable Chinese full text indexing in DSpace.
However, if you are storing primarily Chinese full text documents in your DSpace, I don't think it would be too difficult to change the current Solr indexing settings to support that.
What I think you'd want to do in DSpace is to add a new fieldType called "text_mandarin" (or similar) to the 'search' schema:
<fieldType name="text_mandarin" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Then, if you want the "fulltext" field (which stores the fulltext of documents) to always do indexing/parsing of Chinese, you'd change its type to be "text_mandarin" (instead of just "text") here:
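As a rough illustration, the edited field definition might look something like the snippet below. Only the name ("fulltext") and the new type ("text_mandarin") come from the description above. The remaining attributes are placeholders I'm assuming for the sake of the example, so keep whatever attributes your existing "fulltext" definition already has and change only the type.

```xml
<!-- Sketch only: change just the type attribute on the existing "fulltext" field. -->
<!-- The indexed/stored/multiValued values here are assumed placeholders; -->
<!-- leave your schema's current values for those attributes as they are. -->
<field name="fulltext" type="text_mandarin" indexed="true" stored="true" multiValued="true"/>
```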
Then you'd have to reindex everything in Solr (./dspace index-discovery -b).
I think this would work, but I'll admit I've never tried it. So, it's always possible I'm overlooking a step to get this working.
Keep in mind, this would only change the behavior of full text indexing/searching... and it would change that behavior globally (so all documents in DSpace would be assumed to contain Chinese text). Unfortunately, at this time, DSpace doesn't have any smart way to detect the language of documents and index each language differently.
If this sounds like what you need & you find it works for you, please let us know. That way we can more formally document similar instructions for others who may need them.
Tim