Re: Tesseract does not identify local words written in English

Nick White

Aug 9, 2012, 12:34:59 PM

to tesser...@googlegroups.com

On Thu, Aug 09, 2012 at 08:32:17AM -0700, Chathuri Gunawardhana
wrote:
> Do I need to train tesseract for local words written in English
> like Matara, Galle? If so How can I do that?

Which version of tesseract are you using? If v2.x, follow the advice
here:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary?

Otherwise, I think you have to unpack the .traineddata file, copy in
your word list, then repack. Something like this should work (from
your tessdata directory):

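# Unpack eng.traineddata into eng.* component files
combine_tessdata -u eng.traineddata eng.
# Copy your word list in alongside the unpacked files
cp /path/to/new/eng.user-words .
# Repack the components into eng.traineddata
combine_tessdata eng.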

The new eng.traineddata will now include your words.
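
In case it helps, here is a rough sketch of the same steps as a script. The file name my-words.txt, the LANG_CODE variable and the sample words are only placeholders for illustration, and I am assuming the word list is plain text with one word per line. Whether the repacked file actually picks up the list may depend on your tesseract version, so keep the backup.

```shell
#!/bin/sh
# Sketch only: LANG_CODE, my-words.txt and the sample words are illustrative assumptions.
LANG_CODE=eng
STAGED=my-words.txt

# Write the word list: one word per line, sorted, duplicates removed.
printf '%s\n' Matara Galle Matara | sort -u > "$STAGED"

# Only touch the traineddata when the tool and the data file are both present.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$LANG_CODE.traineddata" ]; then
    # Keep a backup so the original can be restored.
    cp "$LANG_CODE.traineddata" "$LANG_CODE.traineddata.bak"
    # Unpack into component files named eng.*
    combine_tessdata -u "$LANG_CODE.traineddata" "$LANG_CODE."
    # Copy the staged list in next to the unpacked components.
    cp "$STAGED" "$LANG_CODE.user-words"
    # Repack everything matching the eng. prefix.
    combine_tessdata "$LANG_CODE."
else
    echo "combine_tessdata or $LANG_CODE.traineddata not found; wrote $STAGED only" >&2
fi
```

Run it from the tessdata directory. If either the tool or eng.traineddata is missing, it only writes the word list and leaves everything else alone.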

Hope this helps, and is clear enough.

Nick

Chathuri Gunawardhana

Aug 10, 2012, 12:43:56 PM

to tesser...@googlegroups.com

Actually, I'm using tesseract 3.02.

On Fri, Aug 10, 2012 at 10:12 PM, Chathuri Gunawardhana <lanch.gun...@gmail.com> wrote:

Dear sir,

With your help I was able to unpack it, but there is no eng.user_words file among the unpacked files. Can you please help me fix this?

Thanks!

--
Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka

Chathuri Gunawardhana

Aug 10, 2012, 12:30:42 PM

to tesser...@googlegroups.com

When I run combine_tessdata -u eng. I get an error saying "Assert fail in file" (a C file). Can you please help me fix that?

Thanks a lot!


Chathuri Gunawardhana

Aug 10, 2012, 11:15:21 PM

to tesser...@googlegroups.com

Dear sir,
I unpacked and repacked after adding these words as you said, but tesseract still didn't recognize them. I added the words to both the userwords and freqwords files. Any suggestions?

Thanks a lot!

Chathuri Gunawardhana

Aug 10, 2012, 12:42:29 PM

to tesser...@googlegroups.com

Dear sir,

With your help I was able to unpack it, but there is no eng.user_words file among the unpacked files. Can you please help me fix this?

Thanks!
