Animated & styled kanjis vs kanjivg errors

TGara

Jun 11, 2012, 7:25:44 AM
to kan...@googlegroups.com
While working on a script to style and animate the kanjivg database (see the 'good_kanjivg' example in the attached zip), I got bitten by multiple kanjivg errors. They become most apparent when trying to style each stroke from a script. I can detect some of them with some parsing and dump the results.
Now, which of these do the mods consider actual errors and would be willing to start a however slow process of correcting, given an error listing?

- wrong stroke subtype (say, '㇐' instead of '㇐a')
- wrong stroke (e.g., says '㇑a' but is a '㇚', see the attached 'bad_kanjivg' file in the zip)
- same stroke type drawn with different numbers of points (e.g., '㇒' can be found drawn with 2, 3 or 4 points)
- multiple (80+!) stroke types, some of them appearing just a few times, many describing the same shape and stroke style
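
A rough sketch of the kind of parsing that could produce such a listing, here for the point-count item. It assumes each stroke is a `<path>` element carrying `kvg:type` and `d` attributes, and that path data uses plain decimal numbers without exponents; the function names are made up for illustration.

```python
import re
from collections import defaultdict

TAG_RE = re.compile(r'<path\b[^>]*>')
TYPE_RE = re.compile(r'kvg:type="([^"]*)"')
D_RE = re.compile(r'\sd="([^"]*)"')  # leading \s so that id="..." is not matched
NUM_RE = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)')

# Number of parameters each SVG path command consumes per on-curve point.
PARAMS = {'M': 2, 'L': 2, 'C': 6, 'S': 4, 'Q': 4, 'T': 2, 'H': 1, 'V': 1}


def count_points(d):
    """Count on-curve points in path data, including implicit repeats."""
    total = 0
    for cmd, args in re.findall(r'([A-Za-z])([^A-Za-z]*)', d):
        n = PARAMS.get(cmd.upper())
        if n is None:
            continue  # commands with no points of their own, such as Z
        total += len(NUM_RE.findall(args)) // n
    return total


def tally(svg_text):
    """Map each stroke type to the set of point counts it is drawn with."""
    counts = defaultdict(set)
    for tag in TAG_RE.findall(svg_text):
        t = TYPE_RE.search(tag)
        d = D_RE.search(tag)
        if t and d:
            counts[t.group(1)].add(count_points(d.group(1)))
    return counts


def inconsistent(counts):
    """Stroke types that appear with more than one point count."""
    return sorted(t for t, c in counts.items() if len(c) > 1)
```

Running `inconsistent(tally(text))` over the concatenated files would list every stroke type drawn with more than one point count, and `len(tally(text))` gives the number of distinct stroke types.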

Cheers, Ted
kanjivg_examples.zip

msk...@ansuz.sooke.bc.ca

Jun 11, 2012, 10:47:43 AM
to kan...@googlegroups.com
On Mon, 11 Jun 2012, TGara wrote:
> Now, which of these do the mods consider actual errors and would be willing
> to start a however slow process of correcting, given an error listing?

I'm all for having consistent values of stroke types, but I think that has
to begin with at least a strong first draft of what the list of allowed
values for the field actually is. To my knowledge, no such list currently
exists. The closest thing we have is the "strokes.txt" file, which
doesn't match the current database, and so your description of some values
as "wrong" isn't terribly meaningful - there's nothing we could really say
would be "right." Unicode's list of strokes in the CJK Strokes range
(U+31C0 to U+31E3) seems to be the basis for what's currently in the
database, but it's not the only reasonable possibility on how to classify
strokes, and different approaches have different advantages and
disadvantages making it not trivial to choose the best one. I wrote about
this a bit in my March 12 message here about consistency rules; and my
test suite for KanjiVG could easily check for consistency of the stroke
type field at such time as there's a definition of what the correct values
are allowed to be.
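
To illustrate what such a check might look like once a definition exists, here is a minimal sketch. The rule it encodes is purely hypothetical (one character from the CJK Strokes range U+31C0 to U+31E3, optionally followed by a single lowercase subtype letter such as the 'a' in '㇐a'); it is not an agreed list and not the actual test suite.

```python
import re

# Hypothetical rule: one CJK Strokes character plus an optional subtype letter.
STROKE_RE = re.compile(r'[\u31C0-\u31E3][a-z]?')


def invalid_types(types):
    """Return the distinct type values that do not fit the assumed rule."""
    return sorted({t for t in types if not STROKE_RE.fullmatch(t)})
```

For example, `invalid_types(['㇐', '㇐a', '一'])` would flag only the last value, which is the kanji U+4E00 rather than the stroke character U+31D0.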

Having a consistent number of points (and also a consistent pattern of
things like which points are "corner" points) per stroke type would
certainly be nice, but I don't think there's much point even starting on
that piece of the puzzle until we have a list of stroke types in the first
place. Once we have a list of stroke types, and choose the pattern of
control points (how many, and which ones are corners) for each stroke type,
then there's some reasonable possibility of fixing just that one thing in
an automated way by computing the standard-pattern control points that
best match the whatever-pattern curves already in the database.
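
As a sketch of what "computing the best match" could mean in the simplest case, the following fits a single cubic Bezier segment to points sampled along an existing stroke by least squares, keeping the first and last samples as endpoints. The single-segment pattern, the chord-length parameterisation, and the function name are all assumptions made for illustration; a real standard pattern could well have more segments or corner points.

```python
import math


def fit_cubic(points):
    """Least-squares cubic Bezier with endpoints fixed at the first and last sample.

    points: list of (x, y) samples along the stroke, in drawing order.
    Returns the four control points (p0, p1, p2, p3).
    """
    p0, p3 = points[0], points[-1]

    # Chord-length parameterisation: t grows with the distance travelled.
    dists = [0.0]
    for a, b in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(b[0] - a[0], b[1] - a[1]))
    if dists[-1] == 0.0:
        raise ValueError('all samples coincide')
    ts = [d / dists[-1] for d in dists]

    # Accumulate the 2x2 normal equations for the two inner control points.
    a11 = a12 = a22 = 0.0
    rhs1 = [0.0, 0.0]
    rhs2 = [0.0, 0.0]
    for t, p in zip(ts, points):
        b0 = (1 - t) ** 3
        b1 = 3 * t * (1 - t) ** 2
        b2 = 3 * t * t * (1 - t)
        b3 = t ** 3
        a11 += b1 * b1
        a12 += b1 * b2
        a22 += b2 * b2
        for k in range(2):
            r = p[k] - b0 * p0[k] - b3 * p3[k]  # part the endpoints do not explain
            rhs1[k] += b1 * r
            rhs2[k] += b2 * r

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError('need at least two distinct interior samples')
    p1 = tuple((a22 * rhs1[k] - a12 * rhs2[k]) / det for k in range(2))
    p2 = tuple((a11 * rhs2[k] - a12 * rhs1[k]) / det for k in range(2))
    return p0, p1, p2, p3
```

The samples would come from evaluating the whatever-pattern curve already in the database at a handful of positions; that sampling step is not shown here.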
--
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/

TGara

Jun 11, 2012, 12:13:11 PM
to kan...@googlegroups.com


On Monday, June 11, 2012 5:47:43 PM UTC+3, Matthew wrote:
> On Mon, 11 Jun 2012, TGara wrote:
> > Now, which of these do the mods consider actual errors and would be willing
> > to start a however slow process of correcting, given an error listing?
>
> I'm all for having consistent values of stroke types, but I think that has
> to begin with at least a strong first draft of what the list of allowed
> values for the field actually is. To my knowledge, no such list currently
> exists. The closest thing we have is the "strokes.txt" file, which
> doesn't match the current database, and so your description of some values
> as "wrong" isn't terribly meaningful - there's nothing we could really say
> would be "right."

I've read your March 12 post and fully agree. Most of that would make my life much easier while trying to design a robust algorithm to style the strokes.
How about no. 3 on the list? If kvg:type says it's a vertical line, the path should draw a vertical line of some sort, not a horizontal one. I *might* be able to detect these with some tolerable false positive rate.
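
A minimal sketch of such a detector, comparing the overall direction of a path with what its type implies. The expectation table, the 2:1 threshold, and the function names are assumptions chosen for illustration, and only the m, l, c, s, q and t path commands (absolute or relative) are handled.

```python
import re

ARGC = {'m': 2, 'l': 2, 'c': 6, 's': 4, 'q': 4, 't': 2}
NUM_RE = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)')

# Assumed expectations for the two plain line strokes only.
EXPECTED = {'㇑': 'vertical', '㇐': 'horizontal'}


def endpoints(d):
    """Return the first and last on-curve points of an SVG path string."""
    x = y = 0.0
    start = None
    for cmd, args in re.findall(r'([A-Za-z])([^A-Za-z]*)', d):
        n = ARGC.get(cmd.lower())
        if n is None:
            raise ValueError(f'unsupported path command: {cmd}')
        nums = [float(v) for v in NUM_RE.findall(args)]
        for i in range(0, len(nums) - n + 1, n):
            # The last two parameters of each group are the segment endpoint.
            ex, ey = nums[i + n - 2], nums[i + n - 1]
            if cmd.islower():
                x, y = x + ex, y + ey
            else:
                x, y = ex, ey
            if start is None:
                start = (x, y)
    if start is None:
        raise ValueError('empty path')
    return start, (x, y)


def orientation(d, ratio=2.0):
    """Classify a path as 'vertical', 'horizontal' or 'other' by its endpoints."""
    (x0, y0), (x1, y1) = endpoints(d)
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    if dx == 0 and dy == 0:
        return 'other'
    if dy >= ratio * dx:
        return 'vertical'
    if dx >= ratio * dy:
        return 'horizontal'
    return 'other'


def looks_wrong(kvg_type, d):
    """True if the type implies a direction that the path does not follow."""
    expected = EXPECTED.get(kvg_type[:1])
    return expected is not None and orientation(d) != expected
```

A check this coarse would not catch a '㇑a' that is really a '㇚', since both run mostly top to bottom; that case would need a look at the end of the stroke as well.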

Alexandre Courbot

Jun 12, 2012, 7:28:39 PM
to kan...@googlegroups.com
Hi,
I think points 1, 2 and 4 could be fixed rather safely. For 3
we'd need more precise descriptions of the strokes, and so far only
Ulrich could give these.

If you want to submit such fixes and make sure they are integrated in
a timely manner, you can clone the git repository, make the fixes in
your copy (one commit per change please), and then submit a merge
request. It will be much easier and faster for us if we only
have to review the change instead of making it ourselves.

Thanks,
Alex.