On Jan 17, 9:43 pm, 74yrs old <withblessi...@gmail.com> wrote:
> Dear Debayan Banerjee,
>
> I followed every step mentioned in your blog on the above subject to
> test it for Kannada.
>
> *An extract of the Ubuntu terminal session is reproduced below for your information.*
> sriranga@ubuntu:~$ cd tesseractindic-0.2/
> sriranga@ubuntu:~/tesseractindic-0.2$ wordlist2dawg test.txt dawg
> Building DAWG from word list in file, 'test.txt'
> Compacting the DAWG
> Compacting node from 9990280 to 1000234 (2)
> 100 nodes reduced
> Writing squished DAWG file, 'dawg'
> 118 nodes in DAWG
> 118 edges in DAWG
> sriranga@ubuntu:~/tesseractindic-0.2$ sudo cp dawg /usr/local/share/tessdata/utf.
> [sudo] password for sriranga:
> sriranga@ubuntu:~/tesseractindic-0.2$ sudo cp dawg /usr/local/share/tessdata/utf.freq-dawg
> sriranga@ubuntu:~/tesseractindic-0.2$ sudo cp dawg /usr/local/share/tessdata/utf.word-dawg
> sriranga@ubuntu:~/tesseractindic-0.2$ tesseract sampletif.tif test1 -l utf
> Tesseract Open Source OCR Engine
> Image has 8 * 3 bits per pixel, and size (800,600)
> Resolution=300
> sriranga@ubuntu:~/tesseractindic-0.2$ cat test1.txt
> ಕೆಫ}॥ಡೆಹಘಃಕೆಲಿಯರಿ
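The first command in this session, wordlist2dawg, turns the word list into a DAWG (directed acyclic word graph). In general terms, a DAWG is a trie in which identical suffix sub-trees are stored only once, so it is usually much smaller than the plain trie; the "nodes reduced" line in the log suggests a merging step of this kind, though that is an assumption about Tesseract's internals. The Python below is a minimal, generic sketch of the idea, not Tesseract's wordlist2dawg code. The names Node, build_trie, minimize and count are hypothetical, and its node and edge counts are not expected to match the figures Tesseract prints.

```python
from typing import Dict, List, Tuple


class Node:
    """One graph state: outgoing edges keyed by character, plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.final = False


def build_trie(words: List[str]) -> Node:
    """Insert every word into a plain prefix tree (trie)."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True
    return root


def minimize(root: Node) -> Node:
    """Merge equivalent sub-trees bottom-up, turning the trie into a DAWG.

    Two nodes are treated as equivalent when they have the same end-of-word
    flag and the same labelled edges to the same (already merged) children.
    """
    registry: Dict[Tuple, Node] = {}

    def canon(node: Node) -> Node:
        # Merge the children first (post-order traversal).
        for ch, child in list(node.children.items()):
            node.children[ch] = canon(child)
        key = (node.final,
               tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
        # Reuse an identical node if one is already registered.
        return registry.setdefault(key, node)

    return canon(root)


def count(root: Node) -> Tuple[int, int]:
    """Return (distinct nodes, edges) reachable from root."""
    seen = set()
    edges = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        edges += len(node.children)
        stack.extend(node.children.values())
    return len(seen), edges


if __name__ == "__main__":
    words = ["cat", "cats", "bat", "bats"]  # hypothetical word list
    trie = build_trie(words)
    print("trie:", count(trie))   # (9, 8)
    dawg = minimize(trie)
    print("dawg:", count(dawg))   # (5, 5)
```

Here the shared suffixes "at" and "ats" collapse, so the graph shrinks from 9 nodes to 5 while still accepting exactly the same four words.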
>
> sriranga@ubuntu:~/tesseractindic-0.2$ echo 'ಕ ನ್ನ ಡ ವ ನ್ನು ಕ ಲಿ ಯಿ ರಿ'>list
> sriranga@ubuntu:~/tesseractindic-0.2$ cat list
> ಕ ನ್ನ ಡ ವ ನ್ನು ಕ ಲಿ ಯಿ ರಿ
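The list file above appears to hold the expected text with a space between aksharas (syllable clusters): a consonant, any further consonants joined to it by the virama (್), and any trailing vowel sign. A minimal Python sketch of that segmentation follows. It assumes simplified rules (a combining mark attaches to the current cluster; a letter following the virama U+0CCD joins it) and ignores ZWJ/ZWNJ and other edge cases, so it is not a full Unicode grapheme segmenter. The function name aksharas is hypothetical.

```python
import unicodedata
from typing import List

VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA (್)


def aksharas(word: str) -> List[str]:
    """Split a Kannada word into simplified aksharas (syllable clusters)."""
    clusters: List[str] = []
    for ch in word:
        # Vowel signs, anusvara, visarga and the virama are combining marks (M*).
        is_mark = unicodedata.category(ch).startswith("M")
        # A letter right after a virama forms a conjunct with the previous cluster.
        joins_previous = bool(clusters) and (is_mark or clusters[-1].endswith(VIRAMA))
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    text = "ಕನ್ನಡವನ್ನು ಕಲಿಯಿರಿ"
    print(" / ".join(" ".join(aksharas(w)) for w in text.split()))
    # ಕ ನ್ನ ಡ ವ ನ್ನು / ಕ ಲಿ ಯಿ ರಿ
```

Working on aksharas rather than raw code points makes it easier to say which visible syllable the OCR got wrong.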
> From the above it can be seen that, of the output ಕೆಫ}॥ಡೆಹಘಃ ಕೆಲಿಯರಿ, the part
> ಕೆಫ}॥ಡೆಹಘಃ is wrong, while ಕೆಲಿಯರಿ is OK except that ಕೆ should be ಕ.
> Based on the above, I feel that your dictionary approach will work (about 50%)
> for Indic scripts, provided the relevant Tesseract code is improved by
> conducting similar experiments on other Indic languages. In any case, I
> appreciate your wonderful idea.
> I look forward to your post on further research, since you wrote: "I intend
> to analyse the output and pinpoint the problem in the next post. In this post,
> let's concentrate on the results."
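The word-by-word comparison above (which parts of ಕೆಲಿಯರಿ are right, and that ಕೆ should be ಕ) can be made mechanical by diffing the OCR output against the expected text code point by code point. A minimal sketch using Python's standard difflib and unicodedata modules follows, assuming the expected word is ಕಲಿಯಿರಿ as written in the list file. The helper names char_diff, edit_distance and names are hypothetical, and a code-point edit distance is only one common way to score OCR accuracy.

```python
import difflib
import unicodedata
from typing import List, Tuple


def char_diff(expected: str, ocr: str) -> List[Tuple[str, str, str]]:
    """List (operation, expected part, OCR part) for every non-matching span."""
    matcher = difflib.SequenceMatcher(a=expected, b=ocr, autojunk=False)
    return [(op, expected[i1:i2], ocr[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over code points (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def names(s: str) -> str:
    """Spell out code points by their Unicode names for readable reports."""
    return " + ".join(unicodedata.name(ch, hex(ord(ch))) for ch in s) or "(nothing)"


if __name__ == "__main__":
    expected, ocr = "ಕಲಿಯಿರಿ", "ಕೆಲಿಯರಿ"
    for op, exp_part, ocr_part in char_diff(expected, ocr):
        print(f"{op}: expected {names(exp_part)} / OCR {names(ocr_part)}")
    print("edit distance:", edit_distance(expected, ocr))
```

On this pair the diff reports an inserted KANNADA VOWEL SIGN E (the ಕೆ for ಕ error noted above) and a deleted KANNADA VOWEL SIGN I on ಯ, for an edit distance of 2.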
>
> With Regards,
> -sriranga(77yrsold)
>
> Attachments: sample.txt.txt (< 1K), sampletif.tif (1900K), test.txt (< 1K), test1.txt (< 1K), list (< 1K), -lutf.txt (< 1K)
Dear Nishad,
Sorry to disturb you. I am forwarding, for your information, Debayan's research on using a dictionary with OCR; it may be useful in your professional work on OCR.
With regards,
-sriranga(77yrsold)