| Re: Handwritten kanji recognition as WS? | Ben Bullock | 14/05/10 03:33 | Hi,
(I am CCing the discussion group, so excuse me for replying as a "blind CC" to hide your email.) Provisionally, you are welcome to use the engine for your phone application. Can you work out the format? Currently, the input should look like this. You send a POST request to kanji.sljfaq.org/kanji16/kanji-0.016.cgi like this:

H ae34567
  ^^^^^^^^
  X Y X Y

for the first line, followed by successive lines like

az567890
^^^^^^^^^
X Y X Y

where each X or Y coordinate is an integer written as two base-36 characters (two bytes). Scale is not important here, since internally everything is scaled to 100x100, but obviously you need to keep X and Y within the range 0 to 36x36 - 1, i.e. 0 to 1295 ("00" to "zz"). X runs from left to right and Y from top to bottom. Finally, there should be a blank line at the end. You will get a response which looks like this:

Content-Type: text/plain; charset=UTF-8

{"status":"OK","input_id":"test","total_results":20, "results":["恋","奕","変","蛮","亦","弯","迹","卒","廱","旁","刻","廠","峭","崢","劾","奐","褒","衣","亥","廏"]}

The content is in JSON format. If it fails, you'll get "status":"error" and maybe a message. If you want lookahead (as per the button on the web page), use HL at the front instead of H. Without the L, it matches only kanji with the same number of strokes as the input. There is also S for "save", which you add like this: HSL. Don't use HLS; use HSL. If possible, it's better for you to save the user's input on the phone; that turned out not to be possible with JavaScript, so it is saved on the server. If you save on the server, the JSON will contain an ID to use, like "input_id":"S@0iQcCoAgIAAAynDSg". You can get the input back from the server with a POST request like this (no H at the start):

L S@kP7sCoAgIAAAymCp8

This sends back the drawing as JSON, which you need to reload into your application; it is exactly the internal format used by the JavaScript on the web page. It will be sent gzipped if you ask for it. Old inputs are only kept for a few hours.
If it can't find the old input it sends an error message which your program will need to deal with. If you decide you want to use it, please watch the Google Group sljfaq.org for updates to the format and engine. The system is still under development and so I don't guarantee to keep the format stable. Also, note that I have to pay for bandwidth so if your phone application starts to use a lot of bytes we might have to discuss this some more. Best wishes, Ben Bullock |
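The request format described in the first message can be sketched in Python as follows. This is a minimal client-side sketch, not Ben's code: the names `encode_coord`, `encode_stroke`, `build_request` and `build_reload_request` are hypothetical, lowercase base-36 digits and `\n` line endings are assumptions based on the sample inputs later in this thread, and the trailing blank line on the reload request is a guess.

```python
# Minimal sketch of a request builder for the recognizer's plain-text format.
# Assumptions (not from Ben's code): lowercase base-36 digits, "\n" line
# endings, and the exact placement of the trailing blank line.

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
MAX_COORD = 36 * 36 - 1  # two base-36 digits cover "00".."zz", i.e. 0..1295


def encode_coord(value: int) -> str:
    """Encode one coordinate as exactly two base-36 characters."""
    if not 0 <= value <= MAX_COORD:
        raise ValueError(f"coordinate {value} is outside 0..{MAX_COORD}")
    return DIGITS[value // 36] + DIGITS[value % 36]


def encode_stroke(points: list[tuple[int, int]]) -> str:
    """One stroke becomes one line: X then Y for every sampled point."""
    return "".join(encode_coord(x) + encode_coord(y) for x, y in points)


def build_request(strokes: list[list[tuple[int, int]]],
                  lookahead: bool = False, save: bool = False) -> str:
    """Header letters in the order H, S, L ("HSL", never "HLS"), then one line per stroke."""
    if not strokes:
        raise ValueError("need at least one stroke")
    header = "H" + ("S" if save else "") + ("L" if lookahead else "")
    lines = [encode_stroke(stroke) for stroke in strokes]
    lines[0] = f"{header} {lines[0]}"
    return "\n".join(lines) + "\n\n"  # last stroke line, then a blank line


def build_reload_request(input_id: str) -> str:
    """Ask for a saved drawing back: 'L <input_id>', with no H in front."""
    return f"L {input_id}\n\n"
```

For example, `build_request([[(0, 0), (36, 1)]], lookahead=True, save=True)` gives `"HSL 00001001\n\n"`. Because the engine rescales everything to 100x100, raw device coordinates can be sent as they are, provided they stay within 0..1295.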
| Re: Handwritten kanji recognition as WS? | nick | 16/05/10 18:27 | On Fri, May 14, 2010 at 7:33 PM, Ben Bullock <benkasmi...@gmail.com> wrote:
Thank you. I am working on it :) Are the formatted versions of the JS files available somewhere? Thank you for the detailed explanation. I still haven't had a chance to try it; I will probably post more questions once I do. I am now subscribed to sljfaq.org. It's a free application, so it's OK if it breaks. There are no guarantees :) There are now a little over 500 active users, so this shouldn't be a problem, I think. I can't control usage, though, so I will set the user agent to something easily recognizable ('WWWJDIC for Android 1.0' or some such), so that you can throttle or block it if it uses too much bandwidth. How does that sound? |
| [kanji.sljfaq.org] Re: Handwritten kanji recognition as WS? | Ben Bullock | 16/05/10 21:37 | On 17 May 2010 10:27, N.E. wrote:
I'm attaching a formatted version of the file you need, draw-<version>.js. I made it by indenting draw-0.016.js using Emacs. The original JavaScript file is generated from a template, so it's completely unusable without all the code that creates it. I'm also attaching an example in Perl of accessing the API. It's a very simple format. At the moment you won't get much of an error message if your format is wrong, so you'd better check it. In the future, we should think about API keys and sending the POST data in a compressed format, but for the time being it's OK not to worry. Well, even if it's free, we can hope that people will use it. OK, that's great. Thanks. |
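Since the server gives little feedback on malformed input, a client may want to check its own requests before sending them. A rough Python sketch of such a check, assuming the grammar implied by this thread's examples (H, optional S and L in that order, a space, then groups of four lowercase base-36 characters per stroke line, and a final blank line). `check_request` is a hypothetical name, and the grammar is a reconstruction, not an official specification.

```python
import re

# Grammar reconstructed from the examples in this thread (an assumption, not a spec):
#   line 1:      H, optionally S, optionally L (in that order), a space, a stroke
#   other lines: a stroke
#   stroke:      one or more 4-character groups of lowercase base-36 digits (X, Y)
#   the body ends with a blank line
STROKE = r"(?:[0-9a-z]{4})+"
FIRST_LINE = re.compile(rf"HS?L? {STROKE}")
OTHER_LINE = re.compile(STROKE)


def check_request(body: str) -> list[str]:
    """Return a list of problems with a request body; an empty list means it looks OK."""
    problems = []
    if not body.endswith("\n\n"):
        problems.append("body does not end with a blank line")
    lines = body.rstrip("\n").split("\n")
    if not FIRST_LINE.fullmatch(lines[0]):
        problems.append(f"line 1 is malformed: {lines[0][:20]!r}")
    for number, line in enumerate(lines[1:], start=2):
        if not OTHER_LINE.fullmatch(line):
            problems.append(f"line {number} is malformed: {line[:20]!r}")
    return problems
```

The pattern `HS?L?` accepts H, HS, HL and HSL but rejects HLS, matching the ordering rule in the first message.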
| Re: [kanji.sljfaq.org] Re: Handwritten kanji recognition as WS? | nick | 17/05/10 01:10 | On Mon, May 17, 2010 at 1:37 PM, B.B. wrote:
>> I am working on it :) Are the formatted versions of the JS files
Thank you. Thanks. My first (crude) attempt at this actually works :) I'm passing the x-y values I got from the device as is, just converting them to your format. Since the internal format I use to draw on screen is pretty much the same, it was easier than I thought. I might change the user agent later, but for now it's Android-WWWJDIC/0.x. I still need to work out a few things and clean it up a bit, but since it mostly works, I will probably release it soon. So, a couple of questions:
* How do you refer to your web application (in the code)? I'm currently using 'kanji recognizer', but what would be the 'proper' name?
* How do you want me to attribute you? I'll put a notice in the about screen and on the project site, something like: 'Uses Kanji recognizer by Ben Bullock'.
Btw, what software are you using on the back end? Thanks for all your help. |
| Re: [kanji.sljfaq.org] Re: Handwritten kanji recognition as WS? | Ben Bullock | 17/05/10 02:13 | On 17 May 2010 17:10, Nikolay Elenkov <nikolay...@gmail.com> wrote:
I got some input which looks like this:

# Input:
HSL 4v2j4u2j4t2j4r2j4r2k4p2m4o2n4j2s4h2v4e304937453h423q403w4041414a434h464n4a4t4c4y4f534j584l5a4n5d4s5g4u5i4w5k4z5l515l555m575m585m5b5l5c5k5d5k5e5j5g5h5k5g5l5g5n5f5p5e5q5e5s5d5t5d5t5c5v5c
2s5z2s602s632s662s682s6b2s6f2s6i2t6m2t6s2t6u2t702t732s752s772s7b2s7c2s7d2s7e2s7f
2v5x2w5x2x5x2z5x315w325w335w345w375w385w3c5v3j5v3o5v3v5w425y4h624o645268576b5g6c5i6d5l6e5m6f5n6g5o6g5o6i5o6j5p6j5p6m5p6n5o6o5o6p5n6r5n6s5l6u5k6w5j6x5i6y5h705f725e735d745c755c765b76
317f337f357e4a744h734k734l734n734o734q734r734s734u734x735272567259725b715c715d71

# Response:
{"status":"OK","input_id":"S-Ct@tBedcsAAQlgPWo","total_results":20, "results":["白","或","轅","車","的","自","魄","皖","甫","皚","皓","皎","皈","吏","束","禹","軟","軋","兎","守"]}
# No errors.

Could you please use "kanji recognizer at kanji.sljfaq.org"? As above, I'd prefer you used the website name, not my name. That way, people will know where to go when they are on their PC rather than on the phone. Even better would be a complete URL, or link. The kanji recognizer underlying the website is a heavily modified version of KanjiPad; see fishsoup.net for more about KanjiPad. But the kanji data comes entirely from the KanjiVG project. |
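Going the other way, inspecting a logged request like the one above means decoding each stroke line back into coordinates, and the client needs to read the JSON reply. A Python sketch, assuming the reply shape shown in the sample response; `decode_stroke` and `parse_response` are hypothetical helpers. The thread says an error reply has "status":"error" and "maybe a message" without naming the message key, so the error branch simply reports the whole reply.

```python
import json
from typing import Optional


def decode_stroke(line: str) -> list[tuple[int, int]]:
    """Split a stroke line into 2-character base-36 numbers and pair them as (x, y)."""
    if len(line) % 4:
        raise ValueError("a stroke line must contain whole 4-character X/Y groups")
    values = [int(line[i:i + 2], 36) for i in range(0, len(line), 2)]
    return list(zip(values[0::2], values[1::2]))


def parse_response(text: str) -> tuple[list[str], Optional[str]]:
    """Return (candidate kanji, input_id or None); raise if status is not "OK"."""
    reply = json.loads(text)
    if reply.get("status") != "OK":
        # The key carrying an error message is not documented in the thread,
        # so report the whole reply instead of guessing it.
        raise RuntimeError(f"recognizer error: {reply}")
    return reply.get("results", []), reply.get("input_id")
```

For instance, the first point of the second stroke above, `decode_stroke("2s5z")`, comes out as `[(100, 215)]`.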
| Re: [kanji.sljfaq.org] Re: Handwritten kanji recognition as WS? | nick | 17/05/10 02:33 | On Mon, May 17, 2010 at 6:13 PM, Ben Bullock wrote:
> HSL 4v2j4u2j4t2j4...
Hm, that's strange, I only use the 'H' option. Look ahead might be too expensive on a mobile connection, so I don't use it, and I'm saving state in my own program, so there's no need for the 'S'. Here's one request I just sent:

H 2t2r2t2t2t3g2p4j2n4y2d622b6q287c267m267p267p
2v2u2w2t3g2q462n4z2n592o5n2x5u365y3p5y495w4l5r5g5f6b5d6g5a6q5a6x5a715a725b745b74
2x4z314y3b4x464u564t5i4t5s4w5u4x5u4x
2l7b2o7934753g71406x4j6w4u6w4w6w4y6w4y6w

OK, will do. Sure, I have clickable links to WWWJDIC and WeOCR; I will do the same for your site. OK, thanks. I see KanjiVG is used for the stroke order diagrams too. |
| Re: Handwritten kanji recognition as WS? | Ben Bullock | 17/05/10 02:56 |
> > On 17 May 2010 17:10, Nikolay Elenkov <nikolay.elen...@gmail.com> wrote:
OK, I see I have a whole lot of hits from your browser at the same time. The kanji log doesn't record the user agent, so I was cross-referencing. I don't know if you mean that your program only runs on user request, but as far as my program goes, look-ahead is almost the same speed as no look-ahead. The average running time of the recognizer is about 1/30 of a second, which is "instantly" for human beings. The main delay is always going to be network delay, unless you are very near the server, which is in Arizona. The slow part of the algorithm is analyzing the shape of the strokes, and this has to be done whether or not you look ahead. The CGI library I use for the recognizer has gzip and deflate compression built in, so I think it would be a good idea to be able to accept compressed requests at my end. Let's talk about it at a later date. |
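On the transport side, a Python sketch using only the standard library: it asks for a gzip-compressed reply, which Ben offers in his first message, and sends an easily recognizable User-Agent, as Nikolay suggests. The `http://` scheme, the 30-second timeout, and the names `recognize` and `decode_body` are assumptions. The thread does not say which Content-Type the CGI script expects, so the sketch leaves urllib's default for POST data (`application/x-www-form-urlencoded`) in place. Compressing the request body is not attempted, since at this point the server does not accept it.

```python
import gzip
import urllib.request
from typing import Optional

# Endpoint as given at the start of this thread; the scheme is an assumption,
# and the versioned script name may change as the engine is updated.
ENDPOINT = "http://kanji.sljfaq.org/kanji16/kanji-0.016.cgi"


def decode_body(raw: bytes, content_encoding: Optional[str]) -> str:
    """Undo gzip compression if the server applied it, then decode as UTF-8."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


def recognize(body: str, user_agent: str = "Android-WWWJDIC/0.x") -> str:
    """POST a request body and return the JSON text of the reply."""
    request = urllib.request.Request(
        ENDPOINT,
        data=body.encode("ascii"),  # the request format is plain ASCII
        headers={"Accept-Encoding": "gzip", "User-Agent": user_agent},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

Decoding is kept in its own function so that it can be checked without touching the network.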
| Re: Handwritten kanji recognition as WS? | nick | 17/05/10 04:49 | On Mon, May 17, 2010 at 6:56 PM, Ben Bullock <benkasmi...@gmail.com> wrote:
>> Look ahead might be to expensive
I mean there is a 'recognize' button that you have to press once you've drawn the kanji (unlike your site, where strokes are sent as they are drawn, in true Ajax style). Since network delay is going to be even worse on a mobile connection, I don't want to send too many requests; that's why I said I don't want to use look ahead. But maybe I misunderstood what look ahead does. I thought it meant 'send strokes as they become available and show possibly matching kanji', but it seems to be a variation of the matching algorithm. So what does it do exactly? Sure. I'm using httpclient, which supports compression (unless they left it out of the Android version...). P.S. The 'new canvas' is indeed smoother on Chrome. |
| Re: Handwritten kanji recognition as WS? | Ben Bullock | 17/05/10 05:49 | On May 17, 8:49 pm, Nikolay Elenkov <nikolay.elen...@gmail.com> wrote:
> On Mon, May 17, 2010 at 6:56 PM, Ben Bullock <benkasminbull...@gmail.com> wrote:
OK, I see. It saves a few wasted sends to the server. If you look at http://kanji.sljfaq.org/draw.html, you'll notice that it only matches characters with exactly the same number of strokes as the number of lines the user has drawn. That is also the behaviour of http://kanji.sljfaq.org/kanji16/draw.html if you switch off "look ahead". If you switch on "look ahead", it will also look for characters which have more strokes than you have drawn so far. This is a new feature of the "kanji16" version. See this announcement: http://groups.google.com/group/sljfaqorg/browse_thread/thread/af5e8b657b39e57f There are some videos where you can see it beat the Microsoft IME pad (only in some cases, though!). The original program it is based on, KanjiPad, doesn't have any lookahead. I put a lookahead into it a long time ago, but it didn't work. It involved some big changes to the algorithms used; with KanjiPad, if you tried to look ahead you would just get complete garbage as results. Good, I'll announce here when compression is up. I've tested the Taku Kudo version (draw.html) with "real people", and the truth is that some people just can't use it because of the way they use the mouse. I think the canvas version solves this problem, but I haven't done any "people tests" yet. |
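The difference between the two modes can be pictured as a filter on stroke counts. This is a toy illustration only, assuming (from the 本 example later in the thread) that lookahead still admits characters with exactly the drawn number of strokes; the real recognizer ranks candidates by stroke shape, and `STROKE_COUNTS` is a hypothetical five-entry dictionary rather than KanjiVG data.

```python
# Toy model of which stroke counts each mode may return; not the real matcher.
# Hypothetical mini-dictionary: kanji -> number of strokes.
STROKE_COUNTS = {"本": 5, "白": 5, "虫": 6, "蛮": 12, "蝠": 15}


def allowed_candidates(strokes_drawn: int, lookahead: bool) -> list[str]:
    """Without lookahead: same stroke count only. With it: that count or more."""
    return [
        kanji
        for kanji, count in STROKE_COUNTS.items()
        if count == strokes_drawn or (lookahead and count > strokes_drawn)
    ]
```

In this model, lookahead only ever widens the candidate set; which candidate ends up on top is decided by the shape matching.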
| Re: Handwritten kanji recognition as WS? | nick | 18/05/10 18:54 | On Mon, May 17, 2010 at 9:49 PM, Ben Bullock <benkasmi...@gmail.com> wrote:
Thanks for the explanation. I added an option to toggle look ahead via a checkbox. You can see how it looks at http://code.google.com/p/wwwjdic/ (I botched the screenshot resizing; I will add better ones later). I pushed the new version to the Market yesterday, so you might start getting more requests. One thing I noticed with look ahead is that when it's on, simple (radical-level) kanji are sometimes not recognized. You can easily reproduce this with the web interface too. Draw 虫, for example, and you get a list of kanji that have it as their left-side radical, but not '虫' itself. If you switch off look ahead, it matches the simple kanji. Since people are likely to try it with simple kanji first, I left look ahead off by default. Is this the intended behaviour? Btw, with some kanji, say '本', it matches the simple kanji even with look ahead on. P.S. People are already saying kanji recognition rocks :) Thanks for all your work. |
| Re: Handwritten kanji recognition as WS? | Ben Bullock | 18/05/10 20:36 | On 19 May 2010 10:54, Nikolay Elenkov <nikolay...@gmail.com> wrote:
I couldn't see the pictures on that page. I got 130 requests between yesterday morning and today, GMT. I understand the phenomenon you mean; see this video: http://www.youtube.com/watch?v=5hSWD0HXfs8 As you can see from the video, it mostly happens if one writes the 虫 bit tall and thin. If the 虫 is tall and thin, it looks like part of another character. Also, a bug in the software that builds the recognition data from the KanjiVG data contributes to this. I discussed this a bit on the KanjiVG mailing list. Basically, the KanjiVG data for the final stroke of 虫 is curved, and due to a poor algorithm, the program that creates the recognition data biases that stroke towards a very curved line. Meanwhile, the corresponding stroke in, say, 蝠 is not curved in the KanjiVG data, so the recognizer thinks you have drawn 蝠 rather than 虫 if your final stroke is straight. If Alexandre is reading this, this kind of bug is why I haven't responded yet about automatically recognizing KanjiVG stroke shapes. I really need to fix this problem. To some extent it's the intended behaviour. I made a design decision here to assume that the person writing the character starts on the left and works rightwards, or starts at the top and works downwards. When a character is read in, it is normalized to a size of 100x100, but the horizontal and vertical directions are not scaled separately. If it is tall and thin, it's normalized to the left, and if it is short and fat, it is normalized to the top. That matches the normal order of drawing kanji, top to bottom and left to right, though sometimes it will be wrong. Now that I come to mention this, the lookahead is an experimental feature, and I still haven't set up any way of testing how accurately it works on real user data.
I think the next thing to do is to work out some kind of testing regimen for lookahead and see what parameters or algorithms give the best matches. There is no kanji with 本 as a left-hand component, so this doesn't happen as much with 本. If you draw 本 very tall and thin, though, you'll get other candidates before 本. I'm glad to hear it. BTW, could I ask you to change your link to my site to the following, please? http://kanji.sljfaq.org/kanji16/draw-canvas.html The Android browser should be able to cope with this, I think. At least one person came to the site via your link yesterday, then went to http://kanji.sljfaq.org/draw.html and drew one picture before leaving :). I don't know if it works on the phone. I know that the Nintendo DSi browser doesn't work with the web page, unfortunately, since the touch pen insists on scrolling. The reason I ask is that I think it makes sense for them to come to the nearest approximation of what they're seeing on their phone, rather than the kanji.sljfaq.org top page, which I put there only for people who type in the URL. Actually, the top page was originally there only to carry the "privacy policy". There aren't any links to the top page from anywhere else on the site, and the draw-canvas page is a better jumping-off page for users. |
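The normalization Ben describes, with one scale factor for both axes so that tall input sits at the left of the 100x100 box and wide input at the top, can be sketched in Python as below. This assumes the scale factor comes from the longer side of the bounding box; `normalize` is a hypothetical name, and the recognizer's actual code, derived from KanjiPad, may handle this differently.

```python
# Sketch of aspect-preserving normalization into a 100x100 box. Uniform
# scaling from the longer side of the bounding box is an assumption; the
# recognizer's real code (derived from KanjiPad) may differ.

Point = tuple[float, float]


def normalize(strokes: list[list[Point]], size: float = 100.0) -> list[list[Point]]:
    """Scale all strokes by one factor so the longer side spans `size`, anchored top-left."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1  # a single dot has no extent
    scale = size / extent
    # Tall, thin input fills the height and stays at the left;
    # short, fat input fills the width and stays at the top.
    return [[((x - min_x) * scale, (y - min_y) * scale) for x, y in stroke]
            for stroke in strokes]
```

With a tall, thin input, the normalized drawing occupies only the left part of the box, which is why a narrow 虫 can end up resembling the left-hand component of a wider character.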
| Re: Handwritten kanji recognition as WS? | nick | 19/05/10 01:24 | 2010/5/19 Ben Bullock <benkasmi...@gmail.com>:
> On 19 May 2010 10:54, Nikolay Elenkov <nikolay...@gmail.com> wrote:
Hm, the screenshots are at the bottom and should be publicly visible. Here's a direct link: http://wwwjdic.googlecode.com/svn/trunk/site/screenshots/hkr-draw.png I see. Holding the device in portrait mode kind of predisposes you to writing it this way, or maybe it's just my fat fingers :) I did change it on the Google Code page, and will update the app in the next release. The Android browser is supposed to support canvas, but it didn't work on my phone (Nexus One): I couldn't draw anything, because it tries to scroll. It might work if the page were small enough to fit on the screen. Sure, we just have to figure out why it doesn't work. |
| Re: Handwritten kanji recognition as WS? | nick | 24/05/10 22:51 | On Wed, May 19, 2010 at 5:24 PM, Nikolay Elenkov
I upgraded to Android 2.2 (Froyo), and it still doesn't work with the new browser. The Flash ads work, though :) |
| Re: Handwritten kanji recognition as WS? | Ben Bullock | 25/05/10 00:24 | > I upgraded to Android 2.2 (Froyo), and it still doesn't work with the
I think it's OK if it doesn't work on the phone, since the users can use your software instead. The link is just there to point out that the site exists for when they are not using the phone. BTW, there were 371 accesses via your software up to Saturday 20th, and 148 accesses since then. |