Stroke markup reusage

Jan Eichhorn


Jul 5, 2012, 2:30:53 PM

to kan...@googlegroups.com

Looking at the stroke markup, I noticed that existing strokes and groups are reused relatively rarely.

Here are some statistics about the stroke markup:

Total strokes = 142211

Unique strokes (identical SVG markup) = 101131

If every stroke is first translated to the origin (i.e. its start point is removed): 95781 unique strokes.
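A minimal sketch of how these first counts could be computed, assuming each stroke is available as the `d` attribute string of its SVG path, and that this string starts with one absolute moveto (`M x,y`) followed only by relative (lowercase) commands. Under that assumption, removing the start point amounts to replacing the leading moveto with `M0,0`. The function names are hypothetical; a path that uses absolute commands later on would need a real path parser.

```python
import re

# Assumption: a path's "d" string starts with one absolute moveto such as
# "M31.5,15.88" and all later commands are relative (lowercase), so the
# start point is the only absolute coordinate in the string.
_LEADING_MOVETO = re.compile(r"^\s*M\s*-?[\d.]+\s*,?\s*-?[\d.]+")


def translate_to_origin(d: str) -> str:
    """Drop the start point by replacing the leading moveto with M0,0."""
    return _LEADING_MOVETO.sub("M0,0", d, count=1)


def stroke_statistics(paths: list[str]) -> tuple[int, int, int]:
    """Return (total, unique by identical markup, unique after translation)."""
    exact = set(paths)
    translated = {translate_to_origin(d) for d in paths}
    return len(paths), len(exact), len(translated)


paths = [
    "M10,20c1,2,3,4,5,6",
    "M10,20c1,2,3,4,5,6",  # identical markup
    "M50,60c1,2,3,4,5,6",  # same shape, different start point
    "M0,0c9,9,9,9,9,9",
]
print(stroke_statistics(paths))  # (4, 3, 2)
```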

Next I tried to identify strokes that are almost identical. I treat two strokes A and B as near-identical if

A fits inside B's outline when B is drawn with a stroke width of 3 + n points, and vice versa:

B fits inside A's outline when A is drawn with a stroke width of 3 + n points.

Here 3 points is the default stroke width and n is a tolerance of 1, 2 or 3 points.
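One way to implement this test, sketched under several assumptions: each stroke has already been flattened into a densely sampled polyline; a point lies inside a stroke's outline when its distance to the stroke's centerline is at most half the stroke width (round caps and joins); and "A fits inside B" is read as A's centerline fitting inside B's widened outline. Checking both directions makes it a discrete, symmetric Hausdorff-distance test. All names are hypothetical.

```python
import math

Point = tuple[float, float]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment's line and clamp to the end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def dist_to_polyline(p: Point, line: list[Point]) -> float:
    """Distance from p to the nearest point on a polyline (a centerline)."""
    if len(line) == 1:
        return math.hypot(p[0] - line[0][0], p[1] - line[0][1])
    return min(dist_to_segment(p, line[i], line[i + 1]) for i in range(len(line) - 1))


def fits_inside(a: list[Point], b: list[Point], width: float) -> bool:
    """Assumed reading: every sample point of a's centerline lies inside
    b's outline when b is drawn with `width` (round caps and joins)."""
    return all(dist_to_polyline(p, b) <= width / 2.0 for p in a)


def near_identical(a: list[Point], b: list[Point], n: float,
                   default_width: float = 3.0) -> bool:
    """Symmetric test: a fits in b and b fits in a at default_width + n."""
    width = default_width + n
    return fits_inside(a, b, width) and fits_inside(b, a, width)
```

The symmetric check matters: a short stroke lying along a longer one fits inside it, but not the other way round, so the two are not counted as the same stroke.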

Doing so results in the following numbers of unique strokes:

Unique strokes (near identical paths) = 11161 (default stroke width + 3 points)

Unique strokes (near identical paths) = 17952 (default stroke width + 2 points)

Unique strokes (near identical paths) = 36909 (default stroke width + 1 point)
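Near-identity as defined above is not transitive (A may match B and B may match C while A does not match C), so a count of "unique" strokes depends on how the strokes are grouped. A simple greedy scheme, shown here as an assumption rather than the method actually used for the numbers above, keeps a list of representatives and adds a stroke only if it matches none of them. `same` would be the stroke-width criterion described above; numbers stand in for strokes in the usage lines.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def greedy_representatives(strokes: list[T], same: Callable[[T, T], bool]) -> list[T]:
    """Keep each stroke that matches none of the representatives kept so far.

    Because `same` need not be transitive, the result depends on input order.
    """
    reps: list[T] = []
    for stroke in strokes:
        if not any(same(stroke, rep) for rep in reps):
            reps.append(stroke)
    return reps


def close(a: float, b: float) -> bool:
    """Toy tolerance relation standing in for the stroke comparison."""
    return abs(a - b) <= 1


print(greedy_representatives([0, 0.8, 1.6], close))  # [0, 1.6]
print(greedy_representatives([0.8, 0, 1.6], close))  # [0.8]
```

The same three items give two or one representatives depending on their order, so counts like the ones above are best read as approximate.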

Next I also included uniform scaling: before comparison, all strokes were scaled so that the longest side of their bounding rectangle is 30 points. Doing so yields:
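The scaling step could look like the following minimal sketch, again assuming strokes are polylines. It also moves the corner of the bounding rectangle to the origin, so position is ignored as in the translated counts; that, and keeping the comparison width at the default instead of scaling it with the stroke, are assumptions here. The function name is hypothetical.

```python
Point = tuple[float, float]


def scale_to_longest_side(points: list[Point], target: float = 30.0) -> list[Point]:
    """Uniformly scale a polyline so its bounding rectangle's longest side is `target`.

    The rectangle's top-left corner (minimum x and y) is also moved to the origin.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    longest = max(max(xs) - min_x, max(ys) - min_y)
    if longest == 0:  # a single (or repeated) point cannot be scaled
        return [(0.0, 0.0) for _ in points]
    k = target / longest  # one factor for both axes keeps the aspect ratio
    return [((x - min_x) * k, (y - min_y) * k) for x, y in points]


print(scale_to_longest_side([(10.0, 10.0), (20.0, 15.0)]))  # [(0.0, 0.0), (30.0, 15.0)]
```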

Unique strokes (near identical paths) = 3841 (default stroke width + 3 points)

Unique strokes (near identical paths) = 6675 (default stroke width + 2 points)

Unique strokes (near identical paths) = 17866 (default stroke width + 1 point)

In conclusion, there is a relatively high potential for memory savings and reuse, which would also make maintenance easier.

I am curious how the original KanjiVG files were created. By automatic conversion of an existing font? Which editor is used for manual editing? In the archives I found that KanjiVG is based on schoolbook fonts. What does "based on" mean? Is that legal?

Jan

Karl Rosvold


Jul 5, 2012, 10:40:20 PM

to kan...@googlegroups.com

Hi again Jan,

Ulrich can give you the precise details, but from what I know, all the characters were entered manually. It wasn't an automated conversion of a previously existing font.

"Schoolbook font" is usually called "kyokasho-tai" in Japanese, and it is the style of writing kanji that is meant to match handwriting. It's quite close to "kaisho" (regular script), but the bottom of 令 (4EE4), for example, differs between the two. It's a bit like saying "hand-printed" rather than referring to some proprietary shape data. I don't think there are any legal issues. And I think "based on" means "used as a model": the characters were written (i.e. input into the system) with the intention that they conform to the shapes identified as "kyokasho-tai" (schoolbook font).

If I have made any mistakes Ulrich may correct what I said, but I think this is all accurate.

Karl


--
You received this message because you are subscribed to the "KanjiVG" group.
For options and unsubscribing, visit this group at
http://groups.google.com/group/kanjivg

Jan Eichhorn


Jul 8, 2012, 2:23:56 PM

to kan...@googlegroups.com

On Friday, July 6, 2012 04:40:20 UTC+2, Karl Rosvold wrote:

> Hi again Jan,
>
> Ulrich can give you the precise details, but from what I know, all the characters were entered manually. It wasn't an automated conversion of a previously existing font.

That is a huge amount of work. I suspected an automatic conversion because identical elements are not reused as much as one would expect.

> "Schoolbook font" is usually called "kyokasho-tai" in Japanese, and it is the style of writing kanji that is meant to match handwriting. It's quite close to "kaisho" (regular script), but the bottom of 令 (4EE4), for example, differs between the two. It's a bit like saying "hand-printed" rather than referring to some proprietary shape data. I don't think there are any legal issues. And I think "based on" means "used as a model": the characters were written (i.e. input into the system) with the intention that they conform to the shapes identified as "kyokasho-tai" (schoolbook font).