OCR text problem using -psm 6 hocr


Gunasekaran Velu

Jan 28, 2016, 4:24:26 AM
to tesseract-ocr
Hi

I am using the following tesseract command to produce an hOCR file for a BMP image:

>tesseract.exe "test.bmp" Test -l eng -psm 6 hocr

The marked area in the attached image does not appear in the Test.hocr.html file. All text outside the marked area is present in the hOCR file.

Is anything wrong with my command line arguments?

Please help.


Regards
Guna
Test.bmp

Tom Morris

Jan 28, 2016, 12:35:28 PM
to tesseract-ocr
If you run tesseract with no arguments, you'll get its help message which explains what all the arguments and flags do.

For PSM, it says:

   6 = Assume a single uniform block of text.

That page doesn't look like a single uniform block of text to me.

Tom

Gunasekaran Velu

Jan 29, 2016, 4:58:49 AM
to tesseract-ocr
Hi Tom

Thanks for your reply.

If I use >tesseract.exe "test.bmp" Test -l eng -psm 1 hocr, it works for the attached file, but it does not work for other files (the majority of them).

Is there any method to get the full OCR text for the attached file and for other files too?

Looking forward to your reply.


Regards
Guna
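
One way to explore the question above is to run the same image through several page segmentation modes and compare how many words each hOCR output contains. The following is only a rough sketch under stated assumptions, not a recommended fix: the helper names (build_command, count_hocr_words, run_modes) are made up for illustration, the word count is a crude heuristic for "more text was recognized", and the -psm spelling matches the commands in this thread (newer tesseract releases spell it --psm, so pass psm_flag="--psm" there). The output extension differs between tesseract versions, so the sketch accepts both .hocr and .html.

```python
import re
import subprocess
from pathlib import Path


def build_command(image, out_base, psm, lang="eng", psm_flag="-psm"):
    """Build the tesseract argument list for one page segmentation mode.

    Mirrors the command used in this thread; psm_flag is an assumption
    that depends on the installed version ("-psm" or "--psm").
    """
    return ["tesseract", str(image), str(out_base),
            "-l", lang, psm_flag, str(psm), "hocr"]


def count_hocr_words(hocr_text):
    """Count word-level spans (class ocrx_word) in an hOCR document."""
    return len(re.findall(r"class=['\"]ocrx_word['\"]", hocr_text))


def run_modes(image, modes=(1, 3, 4, 6), workdir=".", psm_flag="-psm"):
    """Run tesseract once per mode and return {psm: word_count}.

    Modes, per tesseract's help message: 1 = automatic segmentation with
    OSD, 3 = fully automatic without OSD, 4 = single column of text,
    6 = single uniform block of text.
    """
    counts = {}
    for psm in modes:
        out_base = Path(workdir) / f"{Path(image).stem}_psm{psm}"
        subprocess.run(
            build_command(image, out_base, psm, psm_flag=psm_flag),
            check=True,
        )
        # The hOCR file is written as <out_base>.hocr or <out_base>.html,
        # depending on the tesseract version.
        outputs = [p for p in sorted(Path(workdir).glob(out_base.name + ".*"))
                   if p.suffix in (".hocr", ".html")]
        text = (outputs[0].read_text(encoding="utf-8", errors="replace")
                if outputs else "")
        counts[psm] = count_hocr_words(text)
    return counts


if __name__ == "__main__":
    counts = run_modes("test.bmp")
    best = max(counts, key=counts.get)
    print(counts, "-> most words with psm", best)
```

A higher word count does not guarantee better text, so the output of the winning mode is still worth checking by eye on a few representative files.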