Looking at the stroke markup, I noticed that existing strokes and groups are reused relatively rarely.
Here are some statistics about the stroke markup:
Total strokes = 142211
Unique strokes (identical svg markup) = 101131
If every stroke is translated to the origin (remove its start point): 95781 unique strokes.
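For illustration, the two counts above could be computed roughly as follows. This is only a sketch: it assumes every stroke has already been converted to a list of (x, y) points, and the names `translate_to_origin` and `count_unique` are made up for this example. The real data are SVG path strings, which would need parsing first.

```python
def translate_to_origin(points):
    """Shift a stroke so that its start point lies at (0, 0)."""
    x0, y0 = points[0]
    # Rounding hides floating-point noise introduced by the subtraction.
    return tuple((round(x - x0, 2), round(y - y0, 2)) for x, y in points)


def count_unique(strokes, normalise=tuple):
    """Count the distinct strokes after applying `normalise` to each one."""
    return len({normalise(stroke) for stroke in strokes})


# Hypothetical sample data: two horizontal strokes that differ only in
# position, plus one vertical stroke.
strokes = [
    [(10.0, 20.0), (30.0, 20.0)],
    [(10.0, 50.0), (30.0, 50.0)],
    [(10.0, 20.0), (10.0, 60.0)],
]

print(count_unique(strokes))                       # identical markup
print(count_unique(strokes, translate_to_origin))  # translated to origin
```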
Next I tried to identify strokes that are almost identical. Two strokes A and B count as near identical if
A fits inside B's outline when B is drawn with a stroke width of 3 + n points, and vice versa:
B fits inside A's outline when A is drawn with a stroke width of 3 + n points.
Doing so results in the following numbers of unique strokes:
Unique strokes (near identical paths) = 11161 (default stroke width + 3 points)
Unique strokes (near identical paths) = 17952 (default stroke width + 2 points)
Unique strokes (near identical paths) = 36909 (default stroke width + 1 point)
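The mutual "fits inside the outline" test can be sketched like this. It rests on several assumptions that are not part of the numbers above: strokes are flattened to polylines with at least two points, the widened outline has round caps and joins (so "inside" means "within half the stroke width of the centre line"), and containment is checked on points sampled along the stroke. The greedy grouping in `count_near_unique` is also just one possible way to turn the pairwise test into a count, and its result depends on the order of the strokes.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def sample(polyline, step=0.5):
    """Return points along the polyline, at most `step` apart."""
    samples = [polyline[0]]
    for a, b in zip(polyline, polyline[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        for i in range(1, n + 1):
            t = i / n
            samples.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return samples


def fits_inside(a, b, width):
    """True if every sampled point of a lies within width / 2 of b."""
    segments = list(zip(b, b[1:]))
    return all(
        min(point_segment_distance(p, s, e) for s, e in segments) <= width / 2.0
        for p in sample(a)
    )


def near_identical(a, b, n, default_width=3.0):
    """Mutual containment at stroke width default_width + n."""
    width = default_width + n
    return fits_inside(a, b, width) and fits_inside(b, a, width)


def count_near_unique(strokes, n):
    """Greedy count: keep a stroke only if it matches no earlier one."""
    representatives = []
    for stroke in strokes:
        if not any(near_identical(stroke, r, n) for r in representatives):
            representatives.append(stroke)
    return len(representatives)
```

Because near identity is not transitive, a different grouping strategy may give somewhat different totals.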
Next I also included uniform scaling. All strokes were scaled before comparison so that the longest side of their bounding rectangle is 30 points. Doing so yields:
Unique strokes (near identical paths) = 3841 (default stroke width + 3 points)
Unique strokes (near identical paths) = 6675 (default stroke width + 2 points)
Unique strokes (near identical paths) = 17866 (default stroke width + 1 point)
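The scaling step might look like the sketch below. It again assumes strokes are lists of (x, y) points; the function name `normalise_stroke` is made up, and moving the start point to the origin first is carried over from the earlier translation step rather than something the scaling itself requires.

```python
def normalise_stroke(points, target=30.0):
    """Move the start point to the origin and scale uniformly so that the
    longest side of the bounding rectangle equals `target`."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    longest = max(max(xs) - min(xs), max(ys) - min(ys))
    # A degenerate stroke (a single repeated point) cannot be scaled.
    scale = target / longest if longest > 0 else 1.0
    x0, y0 = points[0]
    return [((x - x0) * scale, (y - y0) * scale) for x, y in points]


# Two hypothetical strokes with the same shape but different size and position.
small = [(10.0, 10.0), (20.0, 15.0)]
large = [(0.0, 0.0), (60.0, 30.0)]
print(normalise_stroke(small) == normalise_stroke(large))
```

Note that the tolerance of n points is then applied to the scaled strokes, so it is relative to the 30 point box rather than to the original stroke size.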
In conclusion, there is considerable potential for memory savings through reuse, which would also make maintenance easier.
I am curious to know how the original kanjivg files were created. Was an existing font converted automatically? Which editor is used for manual editing? In the archives I found that kanjivg is based on schoolbook fonts. What does "based on" mean? Is it legal?
Jan