Stroke markup reusage


Jan Eichhorn

Jul 5, 2012, 2:30:53 PM
to kan...@googlegroups.com
Looking at the stroke markup, I noticed that existing strokes and groups are reused relatively rarely.

Here are some statistics about the stroke markup:
Total strokes = 142211
Unique strokes (identical svg markup) = 101131
If every stroke is translated to the origin (remove its start point): 95781 unique strokes.
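The two counts above could be reproduced roughly as follows. This is only a minimal sketch: it assumes each path's `d` attribute starts with an absolute moveto (`M x,y`) and continues with relative commands only, so that dropping the start point is the same as translating the stroke to the origin. Paths that use absolute commands such as `C` would need a real coordinate translation instead. The names `strip_start_point` and `count_unique` are made up for illustration.

```python
import re

# Matches a leading absolute moveto such as "M52.25,14" (assumed path form).
_LEADING_MOVETO = re.compile(r"^\s*M\s*-?[\d.]+[\s,]*-?[\d.]+\s*")


def strip_start_point(d):
    """Drop the leading 'M x,y' so that only the remaining commands are compared."""
    return _LEADING_MOVETO.sub("", d, count=1)


def count_unique(paths):
    """Return (unique by identical markup, unique after removing the start point)."""
    exact = len(set(paths))
    translated = len({strip_start_point(d) for d in paths})
    return exact, translated


# Toy data, not taken from the real files.
paths = [
    "M10,10c1,2,3,4,5,6",
    "M20,30c1,2,3,4,5,6",  # same shape as the first, different start point
    "M10,10c1,2,3,4,5,6",  # exact duplicate of the first
    "M0,0c9,9,9,9,9,9",
]
print(count_unique(paths))
```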
Next I tried to identify strokes that are almost identical. Two strokes A and B count as near identical if
A fits inside B's outline when B is drawn with a stroke width of 3 + n points (the default stroke width plus n), and vice versa:
B fits inside A's outline when A is drawn with a stroke width of 3 + n points.
Doing so results in the following numbers of unique strokes:
Unique strokes (near identical paths) = 11161 (default stroke width + 3 points)
Unique strokes (near identical paths) = 17952 (default stroke width + 2 points)
Unique strokes (near identical paths) = 36909 (default stroke width + 1 point)
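One way to approximate the mutual "fits inside the outline" test is sketched below. It is not the code used for the numbers above. It assumes each stroke has already been flattened into a densely sampled polyline (a list of `(x, y)` points), and it treats "inside the outline" as "within half the stroke width of the centre line", which corresponds to round caps and joins. Only the sample points are checked, so the result is an approximation. All function names are made up for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def fits_inside(a, b, width):
    """True if every sample point of a lies within width / 2 of polyline b."""
    half = width / 2.0
    segments = list(zip(b, b[1:])) or [(b[0], b[0])]
    return all(
        min(point_segment_distance(p, s, e) for s, e in segments) <= half
        for p in a
    )


def near_identical(a, b, default_width=3.0, n=1.0):
    """Mutual containment test with stroke width default_width + n."""
    width = default_width + n
    return fits_inside(a, b, width) and fits_inside(b, a, width)


a = [(0, 0), (10, 0)]
b = [(0, 1), (10, 1)]  # parallel to a, 1 point away
c = [(0, 3), (10, 3)]  # parallel to a, 3 points away
print(near_identical(a, b, n=1.0), near_identical(a, c, n=1.0))
```

Note that a relation like this is not transitive (A can match B and B can match C without A matching C), so the resulting number of "unique" strokes depends on how matches are grouped.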
Next I also included uniform scaling. All strokes were scaled before comparison so that the longest side of their bounding rectangle is 30 points. Doing so yields:
Unique strokes (near identical paths) = 3841 (default stroke width + 3 points)
Unique strokes (near identical paths) = 6675 (default stroke width + 2 points)
Unique strokes (near identical paths) = 17866 (default stroke width + 1 point)
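The scaling step could look something like the sketch below. It assumes a stroke is given as a list of `(x, y)` sample points, and it also moves the start point to the origin, in line with the translation step mentioned earlier. The function name `normalize` and the handling of degenerate strokes are illustrative choices, not taken from the actual analysis.

```python
def normalize(points, target=30.0):
    """Move the start point to the origin and scale uniformly so that the
    longest side of the bounding rectangle equals target."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    longest = max(max(xs) - min(xs), max(ys) - min(ys))
    x0, y0 = points[0]
    if longest == 0:
        # Degenerate stroke (a single point): nothing to scale.
        return [(0.0, 0.0) for _ in points]
    scale = target / longest
    return [((x - x0) * scale, (y - y0) * scale) for x, y in points]


# A stroke with a 10 x 5 bounding rectangle is scaled by a factor of 3.
print(normalize([(10, 10), (20, 10), (20, 15)]))
```

Because the scaling is uniform, the aspect ratio of each stroke is preserved; only its size and position are normalized before comparison.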

In conclusion, there is considerable potential for memory savings and reuse, which would also make maintenance easier.

I am curious to know how the original KanjiVG files were created. Was it an automatic conversion of an existing font? Which editor is used for manual editing? In the archives I found that KanjiVG is based on schoolbook fonts. What does "based on" mean? Is it legal?

Jan

Karl Rosvold

Jul 5, 2012, 10:40:20 PM
to kan...@googlegroups.com
Hi again Jan,

Ulrich can tell you the precise information, but from what I know, I think all the characters were entered manually. It wasn't some automated conversion of a previously existing font.

"Schoolbook font" is usually called "kyokasho-tai" in Japanese, and it's the style of writing kanji that is supposed to be the same as handwriting. It's pretty close to "kaisho", but for example the bottom of 令 (4EE4) is different in the two versions. It's a bit like saying "hand-printed" rather than some proprietary shape data. I don't think there are any legal issues. And I think "based on" means "used as a model", so that characters were written (i.e. input into the system) with the intention that they conform to the shapes identified as "kyokashotai = schoolbook font".

If I have made any mistakes Ulrich may correct what I said, but I think this is all accurate.

Karl


Jan Eichhorn

Jul 8, 2012, 2:23:56 PM
to kan...@googlegroups.com
On Friday, July 6, 2012 at 04:40:20 UTC+2, Karl Rosvold wrote:
> Hi again Jan,
>
> Ulrich can tell you the precise information, but from what I know, I think all the characters were entered manually. It wasn't some automated conversion of a previously existing font.

A huge amount of work. I suspected an automatic conversion because identical elements were not reused as much as one would expect.

> "Schoolbook font" is usually called "kyokasho-tai" in Japanese, and it's the style of writing kanji that is supposed to be the same as handwriting. It's pretty close to "kaisho", but for example the bottom of 令 (4EE4) is different in the two versions. It's a bit like saying "hand-printed" rather than some proprietary shape data. I don't think there are any legal issues. And I think "based on" means "used as a model", so that characters were written (i.e. input into the system) with the intention that they conform to the shapes identified as "kyokashotai = schoolbook font".

Thanks for the explanation.
 
Jan