Re: Image to txt wrong output / C# eng.unicharset issue

Quan Nguyen

Mar 21, 2013, 10:57:01 PM

to tesser...@googlegroups.com

tessnet2 is a .NET wrapper based on Tesseract 2.04, while you're using Tesseract 3.0x language data. The two are not compatible.
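
A minimal sketch of how one might check which generation of language data a folder holds. It assumes that Tesseract 2.x data ships as loose files such as eng.unicharset (the file named in the error below), while 3.0x data ships as a single eng.traineddata. The class TessdataCheck, the helper GuessDataVersion, and the default path are hypothetical names for illustration, not part of tessnet2 or Tesseract, and the file-name test is only a heuristic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TessdataCheck
{
    // Hypothetical helper: guesses the language-data generation from file names alone.
    // Assumption: 2.x data includes <lang>.unicharset, 3.0x data is <lang>.traineddata.
    public static string GuessDataVersion(IEnumerable<string> fileNames, string lang)
    {
        var names = new HashSet<string>(fileNames, StringComparer.OrdinalIgnoreCase);
        if (names.Contains(lang + ".unicharset"))
            return "2.x";
        if (names.Contains(lang + ".traineddata"))
            return "3.x";
        return "unknown";
    }

    static void Main(string[] args)
    {
        // Assumed default location; pass your own tessdata path as the first argument.
        string dir = args.Length > 0
            ? args[0]
            : @"C:\Program Files (x86)\Tesseract-OCR\tessdata";

        if (!Directory.Exists(dir))
        {
            Console.WriteLine("Folder not found: " + dir);
            return;
        }

        // Compare on bare file names, not full paths.
        var files = Directory.GetFiles(dir).Select(f => Path.GetFileName(f));
        Console.WriteLine("eng data looks like: " + GuessDataVersion(files, "eng"));
    }
}
```

If this reports anything other than 2.x for the folder passed to ocr.Init, the mismatch described above is a likely cause of the load error.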

On Tuesday, March 19, 2013 10:45:12 AM UTC-5, Micael Leal wrote:

Hello,

After installing Tesseract:

http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-setup-3.02.02.exe&can=2&q=

http://code.google.com/p/tesseract-ocr/wiki/ReadMe

I tried to run:

C:\Program Files (x86)\Tesseract-OCR>tesseract C:\Users\admin\AppData\Local\Temp\Untitled.png C:\Users\admin\AppData\Local\Temp\out

I get: Tesseract Open Source OCR Engine v3.02 with Leptonica

Everything seemed to go fine, but the output is wrong.

My image file contains the English word "Test" in dark text on a white background.

http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.eng.tar.gz&can=2&q=

The output in out.txt is "rssr" instead of "Test".

When I wrote "My variable" in the image file, I got "My vaname" as output.

Is it me, or did I miss something?

Next issue: I want to do some manipulations with C# in VSO.

Bitmap image = new Bitmap(@"C:\Users\admin\AppData\Local\Temp\image.bmp");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(@"c:\temp", "eng", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);

This line does not work:

ocr.Init(@"c:\temp", "eng", false); // To use correct tessdata

I have tokensz.exe, tokensz.htm, and usertokenestimate.ps1 inside C:\temp. Instead of C:\temp, I also tried C:/Program Files (x86)/Tesseract-OCR/, which is where my tessdata folder is located.

I get the following error:

Unable to load unicharset file C:/Program Files (x86)/Tesseract-OCR/\eng.unicharset

http://code.google.com/p/tesseract-ocr/wiki/FAQ#Can%27t_open_eng.unicharset?

But I installed all the tessdata correctly, because C:/Program Files (x86)/Tesseract-OCR/tessdata contains the following files:
eng.cube.bigrams, eng.cube.fold, eng.cube.lm, eng.cube.lm_, eng.cube.nn, eng.cube.params, eng.cube.size,
eng.cube.word-freq, eng.tesseract_cube.nn, osd.traineddata, tessconfigs

I hope this information helps to track down the issue.

Thanks,
M.Leal
