tesseract 4 skips over some text


Chris Hawley

Jul 18, 2017, 4:34:43 PM
to tesseract-ocr
The file that I am running OCR on:

https://drive.google.com/file/d/0B-iKKP8eIvdgZkhObUVXUVJ1N28/view?usp=sharing

Before anyone asks, it's part of the CIA's CREST dataset. I noticed that tesseract seems to skip over some text. The command that I am using is:

E:\Tesseract\build\bin\Release\tesseract.exe --psm 1 --oem 1  "D:\split\Folder 001\1946-06-21.tiff" test.txt 

The output is:

21 June 1946

MEMORANDUM For SUPERVISING AGENT,
U. S. SECRET SERVICE,
WHITE Hous®.

 

1. - It is requested that a White House pass be issued to
Lieutenant General Hoyt S. VANDENBERG, Director of Central Intel-

ligence.

 

2. - In connection with his official duties, it is necessary
for General Vandenberg to visit the White House frequently,.

 

 

 

3% His physical description is:

Height =-- 6 feet.
Hair «-- _ @FAY ,
Eyes -- _- blue.

Enclosed herewith is his photograph.

THOMAS F, CULLEN
Captain, USNR
Asgistant to the Director.

 

Notice that it skips over the "weight -- 165 lbs" line. I wasn't sure whether this qualifies as a bug. Is there anything that I can do to improve the results so that the line is included?
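
One way to narrow this down might be to run the same image through several `--psm` values and check each output for the word "weight". The following is only a minimal sketch, assuming Python 3 is available. The helper names (`build_command`, `contains_line`, `sweep`) and the list of modes to try are hypothetical choices for illustration; the executable and image paths are the ones from the command above.

```python
import subprocess
from pathlib import Path

# Paths taken from the command in this post; adjust for your machine.
TESSERACT = r"E:\Tesseract\build\bin\Release\tesseract.exe"
IMAGE = r"D:\split\Folder 001\1946-06-21.tiff"


def build_command(psm, out_base, exe=TESSERACT, image=IMAGE, oem=1):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    # Same flag order as the original command. Tesseract appends ".txt"
    # to the output base, so pass the base name without an extension.
    return [exe, "--psm", str(psm), "--oem", str(oem), image, out_base]


def contains_line(text, needle="weight"):
    """True if any line of the OCR text contains needle (case-insensitive)."""
    return any(needle.lower() in line.lower() for line in text.splitlines())


def sweep(psm_values=(1, 3, 4, 6, 11)):
    """Run tesseract once per page segmentation mode; report where the line shows up."""
    results = {}
    for psm in psm_values:
        out_base = f"test_psm{psm}"
        subprocess.run(build_command(psm, out_base), check=True)
        text = Path(out_base + ".txt").read_text(encoding="utf-8")
        results[psm] = contains_line(text)
    return results
```

Calling `sweep()` returns a dict mapping each mode to `True` or `False`, which shows whether the missing line depends on the page segmentation mode or is dropped regardless.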

ShreeDevi Kumar

Jul 18, 2017, 11:25:03 PM
to tesser...@googlegroups.com

You can try changing those constants to see if you get any improvement.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
