I have created a package with Brazilian Portuguese data. I do not know
if it is the best approach, but I have 300 words in freq-dawg and
260,000 words in the full dictionary. Is that an appropriate ratio?
As soon as I learn how to do the job as well as possible, I would
like to submit it to the official distribution. How should I
proceed?
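For reference, here is a minimal sketch of how a split like the one described above (a short frequent-word list plus a full word list) might be derived from a text corpus. The function name `split_word_lists`, the sample text, and the cutoff `top_n` are assumptions for illustration only; this is not necessarily how the package was built, and the resulting lists would still need to be converted to DAWG files with Tesseract's own tools.

```python
from collections import Counter
import re


def split_word_lists(text, top_n=300):
    """Return (frequent_words, all_words) extracted from raw text.

    top_n is a hypothetical cutoff for the frequent-word list.
    """
    # Lowercase, then keep runs of letters (accented letters included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # The top_n most common words form the frequent-word list.
    frequent = [word for word, _ in counts.most_common(top_n)]
    # Every distinct word, sorted, forms the full word list.
    all_words = sorted(counts)
    return frequent, all_words


# Tiny made-up sample; a real run would read a large corpus instead.
sample = "o gato e o cão e o pássaro não são o mesmo"
frequent, all_words = split_word_lists(sample, top_n=2)
print(frequent)
print(len(all_words))
```

With the numbers quoted above, the frequent-word list would hold roughly 0.1% of the full dictionary (300 out of 260,000).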
P.S.: The post about Tesseract-OCR on my blog, in Portuguese, is by
far the most-read post. I would just like to say thanks by giving
something back to the project.
If you just send the data to me, I will add it to the downloads.
Thanks,
Ray.
Cheers, Jayme
You can get a preliminary version at
http://profs.if.uff.br/tjpp/blog/entradas/ocr-de-qualidade-no-linux
(the package and instructions on how I built it are described there). I
assume you can read Portuguese :)
--
Thadeu Penna
Prof.Associado - Instituto de Física
Universidade Federal Fluminense
http://profs.if.uff.br/tjpp/blog