Another rebuild. This one adds:
* Some more data points (unusual kanji)
The actual numbers from the program look like this:
25677 OK, 1890 failed, total 27567.
Here "OK" is the number of kanji I have data for, "failed" is the number of Unicode plane 0 kanji I have no data for, and the total is the sum of these two numbers.
This increase in the number of kanji actually decreases the recognition accuracy slightly, since more false positives occur.
* Better server error messages
There have been a few problems with the server involving a "panic" caused by some kind of input, and the typical problem of web programming occurred: trying to reconstruct the input which caused the error. There are about 100,000 inputs a day to qhanzi.com, so just searching through them is an issue in itself. You can't even run command-line "grep" on them all at once, because there are too many files to pass as arguments. Up to now I was just using the Go language http server handler, but what I've done here is to add more detail to the error message produced when a panic occurs, and also to mark the log file generated as being one which caused an error. I've never used this facility of the language before, so I'm bracing myself for it to go wrong somehow when put into action.
* A slight improvement in matching
I've added a matching improvement for some kinds of input.
I wish all members of this group a happy new year.