> I made a primitive online viewer for kanjivg (not sure what the VG
> stands for?).
Vector Graphics, same as in SVG, since that is the format used.
> It is available here: http://kanjivg.lemoda.net/
Amazing. This is working great and the rendered images are really cool already.
> If you downloaded the previous parser, note that it had a bug whereby
> it didn't parse parts ("smooth curves") of the SVG and so if you tried
> parsing "学" bits would be missing. I'm not sure how the SVG was
> created but the format is a little untidy (dare I say it?). But the
> kanji images seem to be very high quality.
I think it's Adobe Illustrator. Indeed, most paths could be cleaned up
and simplified - that would also make the file smaller.
> If anyone wants the updated parser/db interface for kanjivg, I can
> send it by email. I think the "files" section here is not the best
> place for this so I'll think about making it a project on Source Forge
> or CPAN or something like that.
Actually I'm planning to do something similar for KanjiVG's website -
each kanji would have its own page where the user can see the data in
a user-friendly form as well as different renderings (image, stroke
order animation, etc.) and submit fixes. I was initially planning to
do that in Python (and a little bit of PHP for the front-end), as it's
the language my lib is written in so far (and I don't know about Perl
anyway). Would you like to join in? We could build the most complete
(although not most error-free at the moment) site proposing stroke
order diagrams!
In the main repository, each kanji is split into its own set of
files (one for the XML, one for the SVG), so I don't think the SQLite
database would be useful there.
Alex.
>> I made a primitive online viewer for kanjivg (not sure what the VG
>> stands for?).
>
> Vector Graphics, same as in SVG, since that is the format used.
Oh, I should have guessed!
>> It is available here: http://kanjivg.lemoda.net/
>
> Amazing. This is working great and the rendered images are really cool already.
Thanks.
>> If you downloaded the previous parser, note that it had a bug whereby
>> it didn't parse parts ("smooth curves") of the SVG and so if you tried
>> parsing "学" bits would be missing. I'm not sure how the SVG was
>> created but the format is a little untidy (dare I say it?). But the
>> kanji images seem to be very high quality.
>
> I think it's Adobe Illustrator. Indeed, most paths could be cleaned up
> and simplified - that would also make the file smaller.
I haven't looked into simplification, but it seems odd that relative
and absolute path commands are mixed together in the same kanji
stroke.
I have also noticed that about twenty strokes have no path information.
>> If anyone wants the updated parser/db interface for kanjivg, I can
>> send it by email. I think the "files" section here is not the best
>> place for this so I'll think about making it a project on Source Forge
>> or CPAN or something like that.
>
> Actually I'm planning to do something similar for KanjiVG's website -
> each kanji would have its own page where the user can see the data in
> a user-friendly form as well as different renderings (image, stroke
> order animation, etc.) and submit fixes.
I think it's better to start that project at the back end and think
about what you are going to do about version control, applying fixes
and so on. I think the user interface (web site or something) is the
easy part of this problem.
> I was initially planning to
> do that in Python (and a little bit of PHP for the front-end), as it's
> the language my lib is written in so far (and I don't know about Perl
> anyway).
The site I made is in fact mostly JavaScript; there are only a
few lines of Perl, which just pull the SVG from the database and
write it to a file.
Once the data is in a database, it can be accessed from any other
language. If I get around to making a PNG version of the
stroke diagrams I might write it in C, since there is a fairly
easy-to-use C library called Cairo which I already use to make PNGs. SVG is
obviously much better than PNG for storing the kanji stroke data, but
it is not a very good format for presentation since Internet Explorer
doesn't support it and there are (really bad) errors in all the
renderers I know about. Inkscape, libsvg on Linux, Chrome, and Firefox
are all definitely buggy.
> Would you like to join in? We could build the most complete
> (although not most error-free at the moment) site proposing stroke
> order diagrams!
I'm sorry but unfortunately I already have a lot of other things to
do, so I don't have enough free time to make a commitment. The above
viewer is something I made to visually check that the curve
information was inserted into the database correctly.
Hi Ben,
Could you try referencing the SVG with object tags instead of img
tags? That works fine in Firefox at least.
<object id="strokediagram" data="img/kanji39340.svg" type="image/svg+xml"/>
For reference:
http://wiki.svg.org/SVG_and_HTML
The last section seems the proper way of handling this type of data.
See attached screenshot as well.
Kind regards,
Jeroen Hoek
> Could you try referencing the SVG with object tags instead of img
> tags? That works fine in Firefox at least.
>
> <object id="strokediagram" data="img/kanji39340.svg" type="image/svg+xml"/>
I changed it as you suggest and it is viewing OK now in both Firefox
and Chrome. Thanks for this fix.
Update: I changed it back to the way it was before, because it was not
working very well with Google Chrome (my default browser) like that. I
thought it was a bug on the Linux version of Chrome but it's the same
on Windows.
Obviously SVG just isn't well supported in browsers so the next step
is to make PNG from the data rather than trying to show SVG.
Right, I know about that too. These are the few I need to fix in order
to have a perfect match between the XML and SVG data. About 50 kanji
are affected.
> I think it's better to start that project at the back end and think
> about what you are going to do about version control, applying fixes
> and so on. I think the user interface (web site or something) is the
> easy part of this problem.
Yeah, this part has been a headache, but we're getting through it.
We already have version control (SVN), which splits the data between
XML descriptions and complete, editable SVG files, one of each per
kanji. This makes editing easy. Every day, a script generates the
release files if a commit has been made. This keeps things easy
and practical.
In an ideal world, users would be able to submit fixes straight from
the kanji page, but that seems difficult as of now. Maybe we will just
make the SVN public so that anyone can submit patches. I have to clean
it up first.
> I'm sorry but unfortunately I already have a lot of other things to
> do, so I don't have enough free time to make a commitment. The above
> viewer is something I made to visually check that the curve
> information was inserted into the database correctly.
Sure, I understand that. Still your program gave me some good ideas on
how to do the per-kanji pages.
Alex.
> Obviously SVG just isn't well supported in browsers so the next step
> is to make PNG from the data rather than trying to show SVG.
Another update: now it makes both png and svg files.
Just for fun, I made the PNG one have random stroke colours.
Works like a charm. Great!
By the way, KanjiVG is not my project. I just contribute some code here
and there and use it in my own project (Tagaini Jisho). Credit for
KanjiVG is due to Ulrich.
Alex.
There is at least this one:
http://www2005.org/cdrom/docs/p1152.pdf
And probably a couple of others, although KanjiVG is not referenced by
its present name. I'm pretty sure there is some research potential to
be exploited there, so I hope Ulrich will come up with a couple of
perspectives for KanjiVG!
Alex.
I am very excited about the progress of KanjiVG over the last few months,
thanks to Alex and all of you.
> Is Ulrich Apel planning on publishing an article on KanjiVG? I would
> love to read it.
Thank you, Jeroen, for the question!
There are some papers on the topic, written together with Julien Quint,
who programmed the animations and did much of the theoretical work
on the information science side of the project, much as
Alex is doing now. Unfortunately, Julien became very busy with
another job and other projects and we couldn't continue together.
One paper by Julien and me from SVGOpen 2004 is: Teaching and
Reference Material on Japanese Kanji in SVG
Stroke Order, Animated Drawing of Characters, Kanji Components and
their Relationships.
<http://www.svgopen.org/2004/papers/svgopen/>.
A similar paper from Coling 2004 is part of:
<http://acl.ldc.upenn.edu/coling2004/W10/pdf/proceedings.pdf>
Julien and I even won the Best Poster Award at the 14th World Wide Web
Conference (WWW 2005): <http://www.www2005.org/award.html>,
<http://delivery.acm.org/10.1145/1070000/1062914/p1152-quint.pdf?key1=1062914&key2=3106415421&coll=GUIDE&dl=GUIDE&CFID=39765720&CFTOKEN=95415220>
I am also planning a research project about the extension of KanjiVG at
my new employer, Tuebingen University.
By the way, the original Illustrator data also contains information
about numbering the starting points of the strokes. These numbers
should be placed in such a way that overlapping and misunderstanding are avoided.
Perhaps a new version of KanjiVG might include this information too.
Best wishes
Ulrich
Thanks for the list of papers!
> By the way, the original Illustrator data also contains information
> about numbering the starting points of the strokes. These numbers
> should be placed in such a way that overlapping and misunderstanding are avoided.
> Perhaps a new version of KanjiVG might include this information too.
I did not include them because this information would be better
computed, IMHO. It wouldn't hurt much to add an attribute with the
position of the number, but then we must not forget to
update it every time we edit a file, which is rather error-prone.
Alex.
Nor is it to me, to be honest. I guess that even manually, it is
difficult to always find a comprehensible layout where there are no
ambiguities as to which path a number belongs to. Still, using
bounding boxes for paths and numbers and ensuring they don't overlap
sounds like a way to do that.
Alex.
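
The bounding-box idea above could be sketched in C roughly as follows. This is only an illustration, not KanjiVG code: the `point` and `box` types and both function names are made up here, and the box is the conservative one around a curve's four control points (a cubic Bezier curve always lies inside the convex hull of its control points), not a tight fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch. */
typedef struct { double x, y; } point;
typedef struct { double x0, y0, x1, y1; } box; /* min corner, max corner */

/* Conservative bounding box of one cubic Bezier segment: the box
   around its four control points contains the whole curve, although
   it may be larger than strictly necessary. */
box bezier_box(const point p[4])
{
    box b;
    int i;
    assert(p != NULL);
    b.x0 = b.x1 = p[0].x;
    b.y0 = b.y1 = p[0].y;
    for (i = 1; i < 4; i++) {
        if (p[i].x < b.x0) b.x0 = p[i].x;
        if (p[i].x > b.x1) b.x1 = p[i].x;
        if (p[i].y < b.y0) b.y0 = p[i].y;
        if (p[i].y > b.y1) b.y1 = p[i].y;
    }
    return b;
}

/* Nonzero if the two boxes share any area or touch at an edge. */
int boxes_overlap(box a, box b)
{
    return a.x0 <= b.x1 && b.x0 <= a.x1 &&
           a.y0 <= b.y1 && b.y0 <= a.y1;
}
```

A candidate position for a stroke number would then be tested by building the number's own box and calling `boxes_overlap` against the box of every stroke segment; a position is acceptable only if no call reports an overlap.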
>> Do you have an algorithm for this? It's not obvious to me how to compute it.
>
> Nor is it to me, to be honest. I guess that even manually, it is
> difficult to always find a comprehensible layout where there are no
> ambiguities as to which path a number belongs to. Still, using
> bounding boxes for paths and numbers and ensuring they don't overlap
> sounds like a way to do that.
My vote is to include the information on the location of the number in
the data file.
Okay, I'll try to update the file then. Note that the information is
still subject to human errors.
Alex.
Actually KanjiVG does not impose any color scheme. It just gives you
the stroke paths, so you can apply whatever shape or color that
pleases you.
> For example I am using a scheme inspired by the colors of the rainbow
> for a kanji learning game I'm developing.
> I uploaded a couple samples (to Picasa for convenience):
> http://picasaweb.google.com/esaulgd/Kanji?authkey=Gv1sRgCKb3mIykpK3KFA&feat=directlink
Looks interesting. Be careful as the licence for KanjiVG prohibits
commercial use.
> BTW, you'll see the shapes themselves are pretty rough. This is
> because I'm currently using the data from the Tomoe project. I'd like
> to use the KanjiVG data instead, and for this purpose it would be
> great if I could take a look at the code for the online viewer. Would
> it be possible to obtain a copy?
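Ben posted the code of the viewer he used for
http://kanjivg.lemoda.net/ . You can find it here:
http://kanjivg.googlegroups.com/web/kanjivg.tar?gda=mJNv2T0AAACZ1HcjKkQOwo5IfNM_r-emiwmoZe8lhpcIAhudhKj761dp9oANqoIL0POiyte4AGLlNv--OykrTYJH3lVGu2Z5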
Anyway, parsing SVG paths is not very hard. You can find documentation
about that here: http://www.w3.org/TR/SVG/paths.html#PathData
Have fun,
Alex.
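
As a rough illustration of how little is needed to read path data, here is a minimal tokenizer sketch in C. It is not the code of any existing KanjiVG tool: `path_cmd`, `MAX_ARGS` and `tokenize_path` are names invented for this example. It only splits the "d" string into command letters and the numbers that follow each one; it does not validate argument counts, and it does not expand implicitly repeated commands (several argument sets after one letter), which the SVG specification allows.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ARGS 8

/* One path command with its numeric arguments (hypothetical type). */
typedef struct {
    char cmd;              /* command letter, e.g. 'M', 'c', 'S' */
    double args[MAX_ARGS]; /* numbers following the letter */
    int nargs;             /* how many entries of args are used */
} path_cmd;

/* Split SVG path data into commands. Returns the number of commands
   stored in out (at most max_out). Numbers that appear before the
   first command letter, or beyond MAX_ARGS, are dropped. */
int tokenize_path(const char *d, path_cmd *out, int max_out)
{
    int n = 0;
    assert(d != NULL && out != NULL);
    while (*d != '\0') {
        if (isalpha((unsigned char)*d)) {
            if (n == max_out)
                break;
            out[n].cmd = *d++;
            out[n].nargs = 0;
            n++;
        } else if (isspace((unsigned char)*d) || *d == ',') {
            d++; /* separators carry no information */
        } else {
            char *end;
            double v = strtod(d, &end);
            if (end == d)
                break; /* not a number: stop at malformed input */
            if (n > 0 && out[n - 1].nargs < MAX_ARGS)
                out[n - 1].args[out[n - 1].nargs++] = v;
            d = end;
        }
    }
    return n;
}
```

Because `strtod` stops at the first character that cannot continue a number, compact data such as "c1-2,3.5,4" is split correctly even without separators before the minus sign.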
> Incidentally, the data could prove to be useful for handwriting
> recognition as well. They could be used either as training data for
> learning algorithms or as test data.
The original reason I requested the data from Ulrich Apel was in order
to use it to test against a data set for "handwriting" recognition.
The current release of the data sprang from a discussion on Jim
Breen's dictionary mailing list. I'm also very glad that the data was
released.
> It would be just a matter of
> sampling points from the lines and bézier curves.
FYI, all the data in KanjiVG is cubic Bezier curves.
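
For anyone who wants to try the sampling idea, a minimal C sketch could look like this. The `point` type and the function names are made up for the example. It evaluates the standard cubic Bezier polynomial at evenly spaced parameter values; note that equal steps in t do not give points equally spaced along the curve, which may or may not matter for recognition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for this sketch. */
typedef struct { double x, y; } point;

/* Point on a cubic Bezier curve at parameter t in [0, 1].
   p[0] is the start point, p[1] and p[2] are the control points,
   p[3] is the end point. */
point bezier_at(const point p[4], double t)
{
    double u = 1.0 - t;
    double b0 = u * u * u;
    double b1 = 3.0 * u * u * t;
    double b2 = 3.0 * u * t * t;
    double b3 = t * t * t;
    point r;
    r.x = b0 * p[0].x + b1 * p[1].x + b2 * p[2].x + b3 * p[3].x;
    r.y = b0 * p[0].y + b1 * p[1].y + b2 * p[2].y + b3 * p[3].y;
    return r;
}

/* Fill out[0..n-1] with n points evenly spaced in t, including both
   end points of the curve. Requires n >= 2. */
void sample_bezier(const point p[4], point *out, int n)
{
    int i;
    assert(p != NULL && out != NULL && n >= 2);
    for (i = 0; i < n; i++)
        out[i] = bezier_at(p, (double)i / (n - 1));
}
```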
> BTW, you'll see the shapes themselves are pretty rough. This is
> because I'm currently using the data from the Tomoe project. I'd like
> to use the KanjiVG data instead, and for this purpose it would be
> great if I could take a look at the code for the online viewer. Would
> it be possible to obtain a copy?
I don't plan to release this source code at the moment. The basis of
the viewer is converting the SVG curves (or paths) from KanjiVG into
calls to the Cairo graphics library. The Cairo routines
"cairo_curve_to" and "cairo_rel_curve_to" correspond exactly to the
"C" and "c" curve information in KanjiVG. For the "S" and "s" curve
information you need to go back to the previously drawn curve and get
its second control point, since Cairo doesn't offer anything directly
equivalent. In my implementation I only use "cairo_curve_to", so I
convert all the c/s/S stuff. If you can read C (the computer
language), here is the algorithm:
>>>>>>>> Start C snippet
/* If the curve is relative to the current point ("c" or "s" in SVG
   path notation), add the current point to its coordinates to make
   them absolute. */
if (curve_type == 'c' || curve_type == 's') {
    int i;
    for (i = 0; i < 3; i++) {
        c.pt[i].x += last_point.x;
        c.pt[i].y += last_point.y;
    }
    /* The curve is now absolute, so switch to the upper-case type. */
    curve_type = (curve_type == 'c') ? 'C' : 'S';
}
/* A smooth curve ("S") supplies only its second control point and its
   end point. Its first control point is the reflection, about the
   current point, of the previous curve's second control point. */
if (curve_type == 'S') {
    point second_point;
    /* Get the second control point of the previously drawn curve. */
    secondpoint (k, & second_point);
    /* Shift the two supplied points up to make room for the first
       control point. */
    c.pt[2] = c.pt[1];
    c.pt[1] = c.pt[0];
    c.pt[0].x = 2.0*last_point.x - second_point.x;
    c.pt[0].y = 2.0*last_point.y - second_point.y;
}
<<<<<<<<<<<< end snippet
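
For readers who want something they can compile on its own, here is a self-contained sketch of the same conversion. It is not the viewer's source: the `point` and `cubic` types and the function `to_absolute_cubic` are invented for this example, and the handling of a missing previous curve (the first control point coincides with the current point) follows the SVG path specification rather than anything stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch. */
typedef struct { double x, y; } point;
typedef struct { point c1, c2, end; } cubic; /* absolute cubic segment */

static point add(point a, point b)
{
    point r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    return r;
}

/* Convert one SVG curve command into an absolute cubic segment.
   type:    'C', 'c', 'S' or 's'
   a:       the command's arguments as points (3 for C/c, 2 for S/s)
   cur:     the current point before the command
   prev_c2: second control point of the previous curve, or NULL if
            the previous command was not a curve
   Returns 0 on success, -1 for any other command type. */
int to_absolute_cubic(char type, const point *a, point cur,
                      const point *prev_c2, cubic *out)
{
    point off = {0.0, 0.0};
    assert(a != NULL && out != NULL);
    if (type == 'c' || type == 's')
        off = cur; /* relative commands are offsets from cur */
    if (type == 'C' || type == 'c') {
        out->c1 = add(a[0], off);
        out->c2 = add(a[1], off);
        out->end = add(a[2], off);
    } else if (type == 'S' || type == 's') {
        if (prev_c2 != NULL) {
            /* Reflect the previous second control point about cur. */
            out->c1.x = 2.0 * cur.x - prev_c2->x;
            out->c1.y = 2.0 * cur.y - prev_c2->y;
        } else {
            out->c1 = cur;
        }
        out->c2 = add(a[0], off);
        out->end = add(a[1], off);
    } else {
        return -1;
    }
    return 0;
}
```

The three points of the result are in the order that cairo_curve_to expects after the context argument (first control point, second control point, end point), so the segment can be drawn with cairo_curve_to alone, as described above.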
On Tue, Jun 23, 2009 at 4:00 PM, Alexandre Courbot <gnu...@gmail.com> wrote:
>
> Actually KanjiVG does not impose any color scheme. It just gives you
> the stroke paths, so you can apply whatever shape or color that
> pleases you.
>
I was talking about the online viewer, so this would be mostly a
suggestion to Ben, I think.
> Looks interesting. Be careful as the licence for KanjiVG prohibits
> commercial use.
This is a purely academic development (my master's thesis, actually). Of
course I will give full credit and follow all licensing restrictions.
I can post more info on the project as it develops if that's okay.
> Ben posted the code of the viewer he used for
> http://kanjivg.lemoda.net/ . You can find it here:
>
> http://kanjivg.googlegroups.com/web/kanjivg.tar?gda=mJNv2T0AAACZ1HcjKkQOwo5IfNM_r-emiwmoZe8lhpcIAhudhKj761dp9oANqoIL0POiyte4AGLlNv--OykrTYJH3lVGu2Z5
>
> Anyway, parsing SVG paths is not very hard. You can find documentation
> about that here: http://www.w3.org/TR/SVG/paths.html#PathData
Thanks a lot for the references. They'll be quite useful. Same goes for Ben.
I'm developing in XNA (Visual C#). There doesn't seem to be native SVG
support, but hopefully the graphics libraries support Bézier curves.
Thanks.
-- Enrique