Has anyone configured Solr to search Chinese text? Using the Solr configuration files from
https://github.com/discoverygarden/basic-solr-config, our tests show that single-character searches (e.g., 蘇) work; multi-character searches with no spaces between the characters (e.g., 蘇美) do not work; and searches with spaces between the characters (e.g., 蘇 美) work. These characters were copied from the OCR text that was indexed in Solr on ingest, and we ran our tests using the default simple search form.
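For anyone who wants to reproduce the three cases outside the search form, here is a minimal Python sketch that only builds the equivalent select URLs. The Solr base URL and the field name `ocr_text` are placeholders (assumptions, not values taken from the configuration above), so substitute whatever host, core, and OCR field your install uses.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with the real Solr select URL and OCR field.
SOLR_SELECT_URL = "http://localhost:8080/solr/select"
OCR_FIELD = "ocr_text"

# The three cases described above: single character, two characters
# with no space, and the same two characters separated by a space.
TEST_QUERIES = ["蘇", "蘇美", "蘇 美"]


def build_select_url(term, field=OCR_FIELD, base=SOLR_SELECT_URL):
    """Return a select URL that searches `field` for `term`.

    The parentheses keep space-separated terms scoped to the same field.
    """
    params = {"q": f"{field}:({term})", "wt": "json", "rows": 0}
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    for term in TEST_QUERIES:
        print(repr(term), "->", build_select_url(term))
```

Pasting each printed URL into a browser (or fetching it with curl) and comparing `numFound` across the three responses should show whether the behaviour comes from the index analysis or from the search form itself.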
If anyone has suggestions for making "phrase" searching work on Chinese text, I'd love to hear them. The OCR transcripts contain mainly Traditional Chinese text, with a much smaller amount of English (the Chinese text is the full text of newspaper pages; the English text is the ads on those pages).