I need to extract city names from a map. For maps showing cities in Sri Lanka, Tesseract fails to identify words such as Matara and Galle correctly, although on foreign maps with the same font sizes the words are identified correctly.
Do I need to train Tesseract for local words written in English, like Matara and Galle? If so, how can I do that?
Otherwise, I think you have to unpack the .traineddata file, copy in
your word list, then repack. Something like this should work (from
your tessdata directory):
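The commands from the original message are not preserved in this copy. As a minimal sketch of the unpack / add words / repack cycle, assuming the Tesseract 3.x training tools (`combine_tessdata`, `dawg2wordlist`, `wordlist2dawg`) are on the PATH and that the language is `eng`, something like the following might work. The helper names (`merge_words`, `repack_commands`, `add_words`) and the `wordlist.txt` file name are made up for illustration, and `dawg2wordlist` is only available from 3.02 onwards, so treat this as an outline rather than the exact procedure from the reply.

```python
import subprocess
from pathlib import Path


def merge_words(existing, extra):
    """Return the existing words followed by any new ones, without duplicates."""
    seen = set()
    merged = []
    for word in list(existing) + list(extra):
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def repack_commands(lang="eng", wordlist="wordlist.txt"):
    """Build the command lines for each step (file names are assumptions)."""
    prefix = f"{lang}."
    return {
        # Unpack every component of eng.traineddata into eng.* files.
        "unpack": ["combine_tessdata", "-u", f"{lang}.traineddata", prefix],
        # Dump the main dictionary DAWG back to a plain word list.
        "dump": ["dawg2wordlist", f"{lang}.unicharset",
                 f"{lang}.word-dawg", wordlist],
        # Rebuild the DAWG from the edited word list.
        "build": ["wordlist2dawg", wordlist,
                  f"{lang}.word-dawg", f"{lang}.unicharset"],
        # Combine the eng.* files back into eng.traineddata.
        "repack": ["combine_tessdata", prefix],
    }


def add_words(new_words, lang="eng", tessdata_dir="."):
    """Run the whole cycle inside tessdata_dir (work on a backup copy)."""
    cmds = repack_commands(lang)
    workdir = Path(tessdata_dir)
    subprocess.run(cmds["unpack"], cwd=workdir, check=True)
    subprocess.run(cmds["dump"], cwd=workdir, check=True)
    wordlist = workdir / "wordlist.txt"
    current = wordlist.read_text(encoding="utf-8").splitlines()
    merged = merge_words(current, new_words)
    wordlist.write_text("\n".join(merged) + "\n", encoding="utf-8")
    subprocess.run(cmds["build"], cwd=workdir, check=True)
    subprocess.run(cmds["repack"], cwd=workdir, check=True)


# Example call, run against a copy of the tessdata directory:
# add_words(["Matara", "Galle"], lang="eng", tessdata_dir="tessdata-copy")
```

Keeping the word-merging step separate from the tool invocations makes it easy to check the edited word list by hand before rebuilding the DAWG.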
On Thu, Aug 9, 2012 at 10:04 PM, Nick White <nick.wh...@durham.ac.uk> wrote:
> On Thu, Aug 09, 2012 at 08:32:17AM -0700, Chathuri Gunawardhana
> wrote:
> > Do I need to train tesseract for local words written in English
> > like Matara, Galle? If so How can I do that?
> Which version of tesseract are you using? If v2.x, follow the advice
> here:
> Otherwise, I think you have to unpack the .traineddata file, copy in
> your word list, then repack. Something like this should work (from
> your tessdata directory):
> The new eng.traineddata will now include your words.
> Hope this helps, and is clear enough.
> Nick
-- Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka
Dear sir,
I unpacked the file, added these words, and repacked it as you suggested, but
Tesseract still does not recognize them. I added the words to both the
userwords and freqwords files. Any suggestions?
Thanks a lot!
On Fri, Aug 10, 2012 at 10:13 PM, Chathuri Gunawardhana <