Tessnet2 DLL with C#.NET 2.0, bad OCR results


benj588

Oct 9, 2008, 12:51:19 PM
to tesseract-ocr
Hello! I'm very interested in using Tesseract with C# and have
downloaded and started experimenting with the Tessnet2 DLL for .NET.
I've followed all of the instructions for placing the tessdata folder
and its files under the folder from which my EXE runs (currently
'ProjectName\bin\Debug\project.exe'). I'm developing in Visual Studio
2005 on Windows XP Pro SP3.

I have a number of bitmap images that are just screenshots of message
boxes from my system: a normal Windows MessageBox with message text
and an OK button, using the default Windows fonts and system colors.
I was using these to test with and am getting very bad results. In
some cases I get 0 results, and in others I get 2 or 3 words, but they
don't match any text I see in the image.

I've gone over my configuration many times and searched the forums for
settings I can tweak or known issues I may need to account for, with
no luck so far. I've also tried converting the bitmaps to GIF and TIFF
in MS Paint, but the results were identical regardless of the image
format, which at least shows the behavior is consistent.

I appreciate the time that has been put into this project and would
greatly appreciate any assistance. The following code is in a form. I
can provide copies of the images if needed, though I may need a
pointer to where they can be hosted.

Bitmap img = (Bitmap)Bitmap.FromFile(fileName);
Tesseract ocr = new Tesseract();
ocr.Init("eng", false);
List<Word> results = ocr.DoOCR(img, Rectangle.Empty);
string hold = String.Empty;
foreach (Word word in results)
{
    hold += "Word: " + word.Text + "(" + word.Confidence.ToString() + ")" + Environment.NewLine;
}
MessageBox.Show(hold);

Thanks!
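
A minimal sketch of a sanity check for the setup described above: it looks for a tessdata folder under a given base directory and lists the language files it finds there. The class and method names (TessdataCheck, FindLanguageFiles) are hypothetical, and the assumption that language files are named with the language code as a prefix (such as "eng.") is not confirmed in this thread. It uses only System.IO and does not call Tessnet2.

```csharp
using System;
using System.IO;

static class TessdataCheck
{
    // Returns the files in <baseDir>\tessdata whose names start with
    // "<lang>." (for example "eng."), or an empty array when the
    // tessdata folder does not exist.
    // Assumption: language data files are prefixed with the language code.
    public static string[] FindLanguageFiles(string baseDir, string lang)
    {
        string tessdata = Path.Combine(baseDir, "tessdata");
        if (!Directory.Exists(tessdata))
        {
            return new string[0];
        }
        return Directory.GetFiles(tessdata, lang + ".*");
    }
}
```

Calling TessdataCheck.FindLanguageFiles(AppDomain.CurrentDomain.BaseDirectory, "eng") before ocr.Init would show whether any "eng" files sit next to the running EXE. An empty result points at a path problem rather than at the images.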

Ngu Soon Hui

Oct 10, 2008, 9:14:37 AM
to tesseract-ocr
I see. Did you try the euro.txt that comes with Tesseract, and does
it work well?

Benjamin Fair

Oct 10, 2008, 11:53:07 AM
to tesser...@googlegroups.com
Hi, thanks for the response. No, I don't have a euro.txt file; I'm assuming it would go into the tessdata directory. Where would I get this file, and what do I do with it once I have it?

Thanks,
Ben Fair



benj588

Oct 14, 2008, 3:01:22 PM
to tesseract-ocr
Thanks for the reply Ngu, but I don't know where to find the euro.txt
file. Can you tell me where to get it and what to do with it?


Ngu Soon Hui

Oct 15, 2008, 11:12:33 AM
to tesseract-ocr
Hi Ben,

I sent you an email with the eurotext.tif attachment.

benj588

Oct 16, 2008, 12:45:12 PM
to tesseract-ocr
Ngu, thanks for the file. OK, using the eurotext.tif image it
correctly found all of the words. So, it seems there is some problem
in the way I created the images I was using. Thanks!

nguyenq

Oct 18, 2008, 8:43:52 AM
to tesseract-ocr
I read somewhere that screenshots are not good enough for OCR because
of their low resolution, 96 DPI I think. Try images at 200 DPI or
higher.
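
If rescanning is not an option, as with a screenshot, one possible workaround is to enlarge the bitmap before passing it to DoOCR. The sketch below is a minimal illustration of that idea using System.Drawing, not something confirmed in this thread to fix the problem. The class and method names (OcrPreprocess, ScaleFactor, Upscale) are hypothetical, and the choice of bicubic interpolation is an assumption.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

static class OcrPreprocess
{
    // Factor by which pixel dimensions must grow so that an image
    // captured at sourceDpi keeps its physical size at targetDpi.
    public static float ScaleFactor(float sourceDpi, float targetDpi)
    {
        return targetDpi / sourceDpi;
    }

    // Returns an enlarged copy of the source bitmap.
    // The caller is responsible for disposing both bitmaps.
    public static Bitmap Upscale(Bitmap source, float factor)
    {
        int width = (int)Math.Round(source.Width * factor);
        int height = (int)Math.Round(source.Height * factor);

        Bitmap result = new Bitmap(width, height);
        // Record the higher resolution in the image metadata as well.
        result.SetResolution(source.HorizontalResolution * factor,
                             source.VerticalResolution * factor);

        using (Graphics g = Graphics.FromImage(result))
        {
            // Assumption: bicubic resampling gives smoother glyph edges
            // than the default mode; other modes may work better.
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, new Rectangle(0, 0, width, height));
        }
        return result;
    }
}
```

For a 96 DPI screenshot, OcrPreprocess.Upscale(img, OcrPreprocess.ScaleFactor(96f, 200f)) would roughly double each dimension. Interpolation adds no real detail, so results may still be worse than with a genuinely higher-resolution source.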

Ray Smith

Oct 18, 2008, 3:13:22 PM
to tesser...@googlegroups.com
Also see the FAQ in the wiki pages.
Ray.