Tesseract does not identify local words written in English
Chathuri Gunawardhana  
From: Chathuri Gunawardhana <lanch.gunawardh...@gmail.com>
Date: Thu, 9 Aug 2012 08:32:17 -0700 (PDT)
Local: Thurs, Aug 9 2012 11:32 am
Subject: Tesseract does not identify local words written in English

I need to extract city names from a map. For maps showing some cities in Sri Lanka, Tesseract fails to identify the words correctly. The words include Matara, Galle, etc. With the same font sizes in foreign maps, the words are identified correctly.

Do I need to train Tesseract for local words written in English, like Matara and Galle? If so, how can I do that?

Thanks a lot!


 
Nick White  
From: Nick White <nick.wh...@durham.ac.uk>
Date: Thu, 9 Aug 2012 17:34:59 +0100
Local: Thurs, Aug 9 2012 12:34 pm
Subject: Re: Tesseract does not identify local words written in English
On Thu, Aug 09, 2012 at 08:32:17AM -0700, Chathuri Gunawardhana
wrote:

> Do I need to train tesseract for local words written in English
> like Matara, Galle? If so How can I do that?

Which version of Tesseract are you using? If v2.x, follow the advice
here:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_o...

Otherwise, I think you have to unpack the .traineddata file, copy in
your word list, then repack. Something like this should work (from
your tessdata directory):

combine_tessdata -u eng.traineddata eng.
cp /path/to/new/eng.user-words .
combine_tessdata eng.

The new eng.traineddata will now include your words.
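
The unpack, copy, and repack steps above can be sketched as a small script. This is only an illustration under assumptions: the function names make_user_words and repack_with_words are hypothetical, the word list is assumed to be plain text with one word per line, and whether the repacked file actually picks up the list may depend on the Tesseract version.

```shell
#!/bin/sh
# Sketch only. Assumes the Tesseract 3.x training tool combine_tessdata is on
# PATH and that this is run from the tessdata directory. The function names
# below are illustrative and are not part of Tesseract.

# Write the given words to a file, one per line, without blanks or duplicates.
make_user_words() {
    out=$1
    shift
    printf '%s\n' "$@" | sed '/^$/d' | sort -u > "$out"
}

# Unpack <lang>.traineddata, copy in the word list, and repack.
repack_with_words() {
    lang=$1
    words=$2
    if ! command -v combine_tessdata >/dev/null 2>&1; then
        echo "combine_tessdata not found on PATH" >&2
        return 1
    fi
    # Keep a backup of the original data file before touching it.
    cp "$lang.traineddata" "$lang.traineddata.bak" || return 1
    # Unpack the components into files named <lang>.<component>.
    combine_tessdata -u "$lang.traineddata" "$lang." || return 1
    # Add the word list under the name used in the reply above.
    cp "$words" "$lang.user-words" || return 1
    # Repack every <lang>.* component into <lang>.traineddata.
    combine_tessdata "$lang."
}
```

A possible invocation would be `make_user_words /tmp/local-words.txt Matara Galle` followed by `repack_with_words eng /tmp/local-words.txt`. The backup copy is there so the original eng.traineddata can be restored if the repacked file misbehaves.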

Hope this helps, and is clear enough.

Nick


 
Chathuri Gunawardhana  
From: Chathuri Gunawardhana <lanch.gunawardh...@gmail.com>
Date: Fri, 10 Aug 2012 22:13:56 +0530
Local: Fri, Aug 10 2012 12:43 pm
Subject: Re: Tesseract does not identify local words written in English

Actually, I'm using Tesseract 3.02.

On Fri, Aug 10, 2012 at 10:12 PM, Chathuri Gunawardhana <lanch.gunawardh...@gmail.com> wrote:
> Dear sir,

> With your help I was able to unpack it. But in unpacked files there is no
> eng.user_words file. Can you please help me to fix it?

> Thanks!

--
Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka

 
Chathuri Gunawardhana  
From: Chathuri Gunawardhana <lanch.gunawardh...@gmail.com>
Date: Fri, 10 Aug 2012 22:00:42 +0530
Local: Fri, Aug 10 2012 12:30 pm
Subject: Re: Tesseract does not identify local words written in English

When I ran combine_tessdata -u eng. I got an error saying that an assert
failed in a file (a C source file). Can you please help me fix that?

Thanks a lot!

--
Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka

 
Chathuri Gunawardhana  
From: Chathuri Gunawardhana <lanch.gunawardh...@gmail.com>
Date: Sat, 11 Aug 2012 08:45:21 +0530
Local: Fri, Aug 10 2012 11:15 pm
Subject: Re: Tesseract does not identify local words written in English

Dear sir,
I unpacked and repacked after adding these words as you said, but Tesseract
still didn't recognize them. I added the words to both the userwords and
freqwords files. Any suggestions?

Thanks a lot!

--
Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka

 
Chathuri Gunawardhana  
From: Chathuri Gunawardhana <lanch.gunawardh...@gmail.com>
Date: Fri, 10 Aug 2012 22:12:29 +0530
Local: Fri, Aug 10 2012 12:42 pm
Subject: Re: Tesseract does not identify local words written in English

Dear sir,

With your help I was able to unpack it. But among the unpacked files there
is no eng.user_words file. Can you please help me fix this?

Thanks!


 