I can add the bulk of these characters to the qhanzi.com search quite easily, just by changing one line of code. The problem is not just obtaining recognition data for these characters, but the following issues:
1. If I add these characters, the search space becomes enormous, so results take much longer to appear.
2. Search results get swamped with characters which people may not want or expect. This has already happened as I expanded to cover more and more of Unicode plane 0, and odd things started to appear in search results. If I add 40,000 more characters, it will happen even more.
I am trying to guess whether people want these results or not from looking at the search results. I would say that once or twice in every few hundred inputs someone wants a plane 2 character. Sometimes they draw it again and again, to the extent that I start to feel a bit sorry for them. Sometimes it seems as if they are drawing a plane 2 character, but they might just be confused about a plane 0 character. It's difficult to decide.
Yeah, these will run into the false positive problem (issue 2 above) for characters with low stroke counts. Suppose someone wants katakana ナ and they get 𠂇 instead. Similarly for 手 and 𠂌.
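To make the plane 0 versus plane 2 distinction concrete, here is a minimal Python sketch. It is not the code qhanzi.com uses; `plane_of`, `filter_candidates`, and the `(character, score)` candidate format are assumptions made up for illustration. The idea is that a character's plane is its code point divided by 0x10000, so supplementary-plane candidates can be hidden unless the user asks for them.

```python
def plane_of(ch: str) -> int:
    """Return the Unicode plane (0 to 16) of a single character."""
    return ord(ch) >> 16


def filter_candidates(candidates, include_supplementary=False):
    """Drop candidates outside plane 0 unless the user opts in.

    `candidates` is a hypothetical list of (character, score) pairs,
    best match first; the original order is preserved.
    """
    if include_supplementary:
        return list(candidates)
    return [(ch, score) for ch, score in candidates if plane_of(ch) == 0]
```

With this, katakana ナ gives plane 0 and 𠂇 gives plane 2, so by default only ナ would be shown. Filtering the output like this only helps with issue 2; to help with issue 1 as well, the same check would have to be applied before matching, so that plane 2 characters never enter the search space.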
As I mentioned above, I actually have much of the data, and I'm deliberately not using it on qhanzi.com at the moment. I'm thinking of making a new site with a new search over all of the Unicode characters. I might opt to have a "search" button like shapecatcher.com rather than automatically search on mouse up.
My main work on the website at the moment is to try to integrate the shape match with the other match, and also get the kanji dictionary improved, so this is not the top priority, but I am thinking about it.
Thank you for an interesting discussion.