New feature of Tagaini - kanji selector built on KanjiVG component information


Alexandre Courbot

Jan 2, 2010, 11:48:42 AM
to tagain...@googlegroups.com, kan...@googlegroups.com
Hey everybody,

I have wanted to do this one for a loooong time (ever since I learned
about KanjiVG, actually), so I gave myself a Christmas treat and coded
it during the holidays. :p

Basically, this is the "new" (as far as I know - that is not very far)
kanji input method that is based on selecting components that make the
kanji. Think of it as a radical selector on steroids. Or rather,
forget about radicals - thanks to the very rich component information
of KanjiVG, any part of the kanji that you are able to recognize can
be the end of the thread that will lead you to it. Once you have found
that end, other components that can complement this one to make a
kanji are displayed, and you can incrementally build your way to the
kanji you are looking for. Jeroen had explained a similar and more
complete concept in his great thesis
(http://handle.jeroenhoek.nl/hoek_2009 ), I think the present work can
constitute the first step to its implementation.
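
As a rough illustration of the incremental selection described above, here is a minimal Python sketch. It is not Tagaini's code: the `KANJI_COMPONENTS` table is a hypothetical toy mapping (real KanjiVG data is much richer), and the function names `candidates` and `complements` are made up for the example.

```python
# Hypothetical toy data: each kanji maps to the set of components it contains.
# These entries are for illustration only and are not taken from KanjiVG.
KANJI_COMPONENTS = {
    "明": {"日", "月"},
    "時": {"日", "寺", "土", "寸"},
    "寺": {"土", "寸"},
    "晴": {"日", "青", "月"},
    "村": {"木", "寸"},
}


def candidates(selected):
    """Return the kanji that contain every selected component."""
    return {k for k, parts in KANJI_COMPONENTS.items() if selected <= parts}


def complements(selected):
    """Return the components that can still be added to reach a candidate."""
    remaining = set()
    for kanji in candidates(selected):
        remaining |= KANJI_COMPONENTS[kanji] - selected
    return remaining


# Pick one recognizable component, then narrow down with a second one.
picked = {"日"}
print(candidates(picked), complements(picked))
picked.add("寸")
print(candidates(picked), complements(picked))
```

Each selection shrinks the candidate set, and only the components that still lead somewhere are offered as the next step.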

To demonstrate it I have put a short video on YouTube. I think it
explains it better than any description I could write.

http://www.youtube.com/watch?v=RhuhMpKv7BU

The feature should make it into 0.2.5 - 0.3.0 (or in the meantime into
a development build, if people ask for it). It is still rough around
the edges, and I realized some data is missing, but even in the current
state it is the fastest way for me to find a totally unknown kanji. 4
clicks max, provided I can identify a component. I believe any student
with a very basic knowledge of kanji can use it to its full
power.

I'm curious to know what you guys think about it, whether it is
actually something new (at least, I haven't seen anything similar in
any software or web site), and how it could be improved.

Alex.

Ben Bullock

Jan 2, 2010, 11:53:44 PM
to kan...@googlegroups.com
2010/1/3 Alexandre Courbot <gnu...@gmail.com>:

> Basically, this is the "new" (as far as I know - that is not very far)
> kanji input method that is based on selecting components that make the
> kanji. Think of it as a radical selector on steroids.

I looked at the selector you showed. It seems quite similar to a lot
of existing things. This is in gjiten and in web pages like

http://tangorin.com/mr-kanji
http://jisho.org/kanji/radicals/

and last my own poor effort:

http://kanji.sljfaq.org/mr.html

and new version

http://kanji.sljfaq.org/kanji13/mr.html

(This new version is going to replace the old one at the end of January.)

> Or rather,
> forget about radicals - thanks to the very rich component information
> of KanjiVG, any part of the kanji that you are able to recognize can
> be the end of the thread that will lead you to it.

I first made something like the above in about 1998, in a totally
defunct XWindows dictionary. In 1994 or so I had made a radical
selection program written using curses on HP-UX which ran inside
kterm. I sent it to Jim Breen, who seemed to really like it, and he
was going to incorporate it into xjdic, but he gave up because of the
problems with using curses across different computers. (This was in
the days before Linux-type systems became ubiquitous.) Eventually the
radical selector made it into wwwjdic, as you might know. The above
systems with lookahead blanking out the buttons which are unreachable
are quite similar to the 1998 dictionary I made for XWindows. As far
as I know, I invented that idea.

> Once you have found
> that end, other components that can complement this one to make a
> kanji are displayed, and you can incrementally build your way to the
> kanji you are looking for. Jeroen had explained a similar and more
> complete concept in his great thesis
> (http://handle.jeroenhoek.nl/hoek_2009 )

I've tried this "component" method where you get a list of components,
and I came to the conclusion that it's faster to have a fixed "bank"
of radicals, like the one at

http://kanji.sljfaq.org/kanji13/mr.html

and have the buttons which can't be reached go "dead" as components
are selected. That way you don't have to look through a list to find
radicals, but can just remember the position. After looking up a lot
of characters, I've found it is much easier.
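
The fixed-bank behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code behind any of the linked pages: `RADICAL_BANK`, `KANJI_RADICALS`, and `button_states` are hypothetical names, and the data is a toy example rather than actual kradfile content.

```python
# Hypothetical fixed bank of radicals, always displayed in the same order.
RADICAL_BANK = ["日", "月", "木", "土", "寸", "口"]

# Hypothetical toy index: kanji -> radicals (not actual kradfile data).
KANJI_RADICALS = {
    "明": {"日", "月"},
    "時": {"日", "土", "寸"},
    "村": {"木", "寸"},
    "唱": {"口", "日"},
}


def button_states(selected):
    """Map each radical in the bank to True (live) or False (dead).

    A button stays live only if its radical appears in at least one kanji
    that also contains every radical selected so far.
    """
    matches = [parts for parts in KANJI_RADICALS.values() if selected <= parts]
    reachable = set().union(*matches)
    return {radical: radical in reachable for radical in RADICAL_BANK}


# Selecting one radical leaves the layout unchanged; only the states differ.
print(button_states({"木"}))
```

The bank never reorders, so positions can be memorized; the lookahead only changes which buttons are clickable.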

As a usability test, I tried using jisho.org's radical selector for
quite a while, and got frustrated because the radicals are in
different places depending on how you resize the screen. I found it
easier to keep them in a block of fixed size. My old system at

http://kanji.sljfaq.org/mr.html

also frustrated me because although I intended to hide a lot of
radicals which aren't very common, sometimes I actually needed them or
I wanted to eliminate them from my search, so I have decided to scrap
that "hidden radical" thing.

> The feature should make it into 0.2.5 - 0.3.0 (or in the meantime into
> a development build, if people ask for it). It is still rough around
> the edges, and I realized some data is missing, but even in the current
> state it is the fastest way for me to find a totally unknown kanji. 4
> clicks max, provided I can identify a component. I believe any student
> with a very basic knowledge of kanji can use it to its full
> power.

I guess you need a few students with a very basic knowledge to test it for you.

> I'm curious to know what you guys think about it, whether it is
> actually something new (at least, I haven't seen anything similar in
> any software or web site), and how it could be improved.

I haven't extensively tested the kanjivg data versus the kradfile
data. I am often finding small errors in kradfile as I test the above
web pages. I send them to Jim Breen as I find them. It would be nice
if some kind of global compatibility check could take place to finally
eliminate all the niggles.
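
One possible shape for such a compatibility check is sketched below in Python. It assumes both data sets have already been loaded and normalized into plain dictionaries mapping each kanji to a set of components drawn from the same inventory, which is a big assumption since KanjiVG components and kradfile radicals are not the same set. The function name `compare_decompositions` and the sample data are hypothetical.

```python
def compare_decompositions(first, second):
    """Report kanji whose component sets differ between two sources.

    Returns a dict: kanji -> (components only in first, only in second).
    A kanji missing from one source is treated as having no components there.
    """
    differences = {}
    for kanji in set(first) | set(second):
        parts_first = first.get(kanji, set())
        parts_second = second.get(kanji, set())
        if parts_first != parts_second:
            differences[kanji] = (
                parts_first - parts_second,
                parts_second - parts_first,
            )
    return differences


# Hypothetical sample data, not taken from either real file.
source_a = {"明": {"日", "月"}, "村": {"木", "寸"}, "時": {"日", "寺"}}
source_b = {"明": {"日", "月"}, "村": {"木", "十"}, "休": {"人", "木"}}

for kanji, (only_a, only_b) in compare_decompositions(source_a, source_b).items():
    print(kanji, "only in A:", only_a, "only in B:", only_b)
```

The output is a list of disagreements to review by hand; the sketch cannot tell which source is the correct one.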

Mathieu Blondel

Jan 3, 2010, 12:01:40 PM
to kan...@googlegroups.com, tegaki-hwr
On Sun, Jan 3, 2010 at 1:53 PM, Ben Bullock <benkasmi...@gmail.com> wrote:

> http://tangorin.com/mr-kanji
> http://jisho.org/kanji/radicals/
>
> and last my own poor effort:
>
> http://kanji.sljfaq.org/mr.html
>
> and new version
>
> http://kanji.sljfaq.org/kanji13/mr.html
>
> (This new version is going to replace the old one at the end of January.)

Very nice interfaces indeed! I like the fact that components that
can't be reached are grayed out.

Having such an input method available at hand directly in SCIM or IBus
would be very useful. I wonder if that would fit into Tegaki, since we
have SCIM support and plan to support IBus too.

The above web-based electronic dictionaries are very nice and with the
offline capabilities coming in HTML5, there may even no longer be the
need for desktop dictionaries...

Mathieu

Ben Bullock

Jan 3, 2010, 11:14:28 PM
to KanjiVG
On Jan 4, 2:01 am, Mathieu Blondel <mblon...@gmail.com> wrote:

> The above web-based electronic dictionaries are very nice and with the
> offline capabilities coming in HTML5, there may even no longer be the
> need for desktop dictionaries...

Somewhere, I have a JavaScript-only version of the multiradical
selection program. The only real disadvantage is that it requires a
fairly big data file (over 100,000 bytes), and it seems that this is a
problem for some people, not to mention that it costs me bandwidth
charges, so I've stuck with the Ajax-based system for now. It's feasible
to make a JavaScript-only recognition system too, but the data size
there is going to be ten times bigger. One solution might be to have a
hybrid of JavaScript for common characters, and Ajax for the rest.
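
The hybrid idea could look roughly like the sketch below. It is written in Python rather than JavaScript purely for illustration, and everything in it is an assumption: `hybrid_lookup` is a hypothetical name, `common_index` stands for the small table shipped with the page, and `fetch_rare` is a placeholder callable standing in for the Ajax request.

```python
def hybrid_lookup(selected, common_index, fetch_rare, include_rare=False):
    """Find kanji containing all selected radicals.

    Common characters are matched against the local table. Rare ones are
    requested from the server only when include_rare is set, so ordinary
    lookups cost no extra round trip.
    """
    matches = {k for k, parts in common_index.items() if selected <= parts}
    if include_rare:
        matches |= set(fetch_rare(frozenset(selected)))
    return matches


# Hypothetical local table and a stand-in for the server call.
common = {"明": {"日", "月"}, "時": {"日", "寸"}}


def fake_server(radicals):
    # A real implementation would send the radicals to the server here.
    return ["曜"] if radicals == frozenset({"日"}) else []


print(hybrid_lookup({"日"}, common, fake_server))
print(hybrid_lookup({"日"}, common, fake_server, include_rare=True))
```

The trade-off is between the size of the local table and how often the server has to be asked.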

The big advantage of web dictionaries is that the user doesn't have to
keep downloading the data over and over. With a permanent web
connection, and given the size and number and frequency of updates of
all the component files of kanjivg, edict, and so on, I would prefer
to use someone's web dictionary rather than have one on my computer. I
also like the idea of users being able to give feedback to the
content creator via the web dictionary. There is a system for that in
WWWJDIC but I think there is a little room for improvement.

Incidentally, the kanji web page I made above originally started as an
attempt to update gjiten, an open-source Japanese dictionary for
Linux. From user feedback on sci.lang.japan, I added links to WWWJDIC,
and hence it became a kind of unofficial interface of WWWJDIC.

Alexandre Courbot

Jan 4, 2010, 11:29:33 PM
to kan...@googlegroups.com
Hi Ben, thanks for your comments.

> I looked at the selector you showed. It seems quite similar to a lot
> of existing things. This is in gjiten and in web pages like
>
> http://tangorin.com/mr-kanji
> http://jisho.org/kanji/radicals/
>
> and last my own poor effort:
>
> http://kanji.sljfaq.org/mr.html
>
> and new version
>
> http://kanji.sljfaq.org/kanji13/mr.html

I was trying to highlight the difference with a multi-radical
selector, but I have to admit that both the look and the functionality
make this difficult. From a functional point of view, the difference is
that while a multi-radical selector proposes a fixed set of building
blocks, the selector I wrote adapts the list of blocks to make it
relevant for the current context, proposing complements from almost
6500 blocks seen in kanji (which is why displaying them all would not
be practical). Whether this approach is efficient compared to a
classical selector is what I'd like to figure out.

> and have the buttons which can't be reached go "dead" as components
> are selected. That way you don't have to look through a list to find
> radicals, but can just remember the position. After looking up a lot
> of characters, I've found it is much easier.

Indeed, being able to remember the position of frequently used
radicals is an advantage that is lost with my selector. On the other
hand, it is possible to directly input components using the system
input method.

I guess I'll integrate this method and see the users' reactions. After
all, this is still a first draft.

Alex.
