On Thu, Aug 09, 2012 at 08:32:17AM -0700, Chathuri Gunawardhana
wrote:
> Do I need to train tesseract for local words written in English
> like Matara, Galle? If so How can I do that?
Which version of tesseract are you using? If v2.x, follow the advice
here:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary?
Otherwise, I think you have to unpack the .traineddata file, copy in
your word list, then repack. Something like this should work (run from
your tessdata directory):
  combine_tessdata -u eng.traineddata eng.
  cp /path/to/new/eng.user-words .
  combine_tessdata eng.
The new eng.traineddata will now include your words.
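In case it is useful, here is a rough sketch of preparing the word list
itself. It assumes the list is a plain-text file with one word per line,
which you should check against what your tesseract version expects. The
temporary location and the words (taken from your question) are only
placeholders.

```shell
# Sketch only: build a word list with one word per line, sorted and
# de-duplicated. The one-word-per-line layout is an assumption.
wordlist="${TMPDIR:-/tmp}/eng.user-words"

# "Matara" is repeated on purpose to show that sort -u drops duplicates.
printf '%s\n' Matara Galle Matara | sort -u > "$wordlist"

# Show the result before copying it into the tessdata directory.
cat "$wordlist"
```

That file is what you would then pass to the cp step, in place of
/path/to/new/eng.user-words.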
Hope this helps, and is clear enough.
Nick