Matthew said:
I think that in general we're unlikely to find that "missing" characters
are variants of characters with separate code points. See, for
instance, this photo I took in Maibara last summer:
http://ansuz.sooke.bc.ca/gallery/index.php?display=2011-jtrip-17%2FP1000971.jpg
Out of four characters, three fail to match Unicode's Japanese example
glyphs:
湯 - nonstandard simplification. On the sign the upper right component
looks like 口, but Unicode's examples for all languages have it
looking like 日.
谷 - the sign uses the standard form.
神 - Unicode's Japanese example glyph has the small stroke at upper left
vertical, but the sign has it diagonal, typical of Chinese style.
社 - Unicode's Japanese example uses the newer form of the left-side
radical, which looks like 礻, but the sign uses the form that
looks like 示, typical of Korean style.
I don't think any of these have separate code points for the other forms.
In general, if a glyph really is a "variant" form of a character, then
under Unicode's unification policy it won't have a separate code point;
Unicode would only assign one if they made a mistake (which sometimes
happens) or if some other Unicode policy (in particular, round-trip
compatibility) took precedence over that one.
It is even less likely that such a separate code point would be numerically close.
I don't think this is really a big problem. There's no particular reason
that KanjiVG needs a separate code point for every database entry.
Already we have multiple XML files for some code points; for instance,
there are three for U+4FDA 俚. In principle, there's no reason we
couldn't have a database entry for a kanji that had no code point at all.
We just need to have some other way of knowing what to name the file.
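To make the naming question concrete, here is a minimal Python sketch. Everything about the convention in it is an assumption for illustration, not something settled in this thread: the five-digit lowercase hex stem, the "-Variant" suffix, the ".xml" extension, the "x-" prefix for entries with no code point, and the function name entry_filename are all hypothetical.

```python
from typing import Optional


def entry_filename(char: Optional[str] = None,
                   variant: Optional[str] = None,
                   fallback_id: Optional[str] = None) -> str:
    """Build a file name for one database entry.

    Assumed (hypothetical) convention:
      - entries with a code point: five-digit lowercase hex, e.g. "04fda"
      - several entries for one code point: append "-<variant>"
      - entries with no code point: "x-<fallback_id>"
    """
    if char is not None:
        if len(char) != 1:
            raise ValueError("expected exactly one character")
        stem = f"{ord(char):05x}"
    elif fallback_id:
        # No code point exists, so a hand-assigned identifier names the file.
        stem = f"x-{fallback_id}"
    else:
        raise ValueError("need either a character or a fallback id")
    if variant:
        stem += f"-{variant}"
    return stem + ".xml"


if __name__ == "__main__":
    print(entry_filename("\u4fda"))                    # U+4FDA
    print(entry_filename("\u4fda", variant="Kaisho"))  # a second entry for the same code point
    print(entry_filename(fallback_id="sign-yu"))       # an entry with no code point
```

The point is only that the file name does not have to be derived from a code point; any stable identifier would do for entries that have none.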
5F50 彐
6220 戠
(only found in 213)
342C 㐬
34C1 㓁
4EBB 亻
590D 复
5C03 尃
8002 耂
8279 艹
98E0 飠
9EC3 黃
9ED1 黑
200A4 𠂤
26951 𦥑
Since I had a list of the most frequent Chinese characters, I checked that as well. Of the 3000 most frequent simplified characters, 1006 are missing from KanjiVG. If they are converted to traditional characters, only 242 are missing.
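For anyone who wants to repeat this kind of check, a rough Python sketch follows. It is not the script I used; the function name missing_chars, the toy character sets, and the tiny simplified-to-traditional table are placeholders made up for illustration. A real conversion table would also have to deal with simplified characters that map to more than one traditional form.

```python
def missing_chars(frequency_list, covered, convert=None):
    """Return the characters of frequency_list that are not in covered.

    frequency_list: iterable of single characters, most frequent first
    covered:        set of characters that have a database entry
    convert:        optional dict applied before the lookup, e.g. a
                    (hypothetical) simplified -> traditional table
    Order is preserved and duplicates are reported once.
    """
    convert = convert or {}
    seen = set()
    missing = []
    for ch in frequency_list:
        ch = convert.get(ch, ch)
        if ch in covered or ch in seen:
            continue
        seen.add(ch)
        missing.append(ch)
    return missing


if __name__ == "__main__":
    # Toy data only; these are not real coverage figures.
    covered = {"書", "馬", "学"}
    frequent = ["书", "马", "学"]
    to_traditional = {"书": "書", "马": "馬"}

    print(missing_chars(frequent, covered))                  # before conversion
    print(missing_chars(frequent, covered, to_traditional))  # after conversion
```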
> By the way, the glyphs for all kanji in your list with Unicode code points starting with F### were not correct
They look okay in my output but wrong in the gist.
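One possible explanation, and this is only a guess: many code points in the U+F900 to U+FAFF block are CJK compatibility ideographs, and Unicode normalization replaces those with their unified counterparts, so any tool that normalizes text along the way would silently change them. The Python sketch below uses the standard unicodedata module to list which characters in a string NFC would rewrite; the function name changed_by_nfc is made up, and I have not verified that the gist is actually normalized.

```python
import unicodedata


def changed_by_nfc(text):
    """Return (original, normalized) code point labels for every
    character in text that NFC normalization would rewrite."""
    pairs = []
    for ch in text:
        norm = unicodedata.normalize("NFC", ch)
        if norm != ch:
            before = f"U+{ord(ch):04X}"
            # The result may be more than one code point in general.
            after = " ".join(f"U+{ord(c):04X}" for c in norm)
            pairs.append((before, after))
    return pairs


if __name__ == "__main__":
    # U+F900 is a compatibility ideograph; U+8C48 is its unified form.
    for before, after in changed_by_nfc("\uf900\u8c48"):
        print(before, "->", after)
```

Running the list through something like this before and after it passes through a given tool would show whether that tool is where the F### characters get altered.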
I think adding radicals and strokes is most important. That would also cover most of the missing references in the KanjiVG attributes and make KanjiVG self-contained.
Jan