Issue 1052 in tesseract-ocr: Dotted lines are converting text into random characters

tesser...@googlecode.com

Dec 26, 2013, 10:51:18 AM
to tesserac...@googlegroups.com
Status: New
Owner: ----

New issue 1052 by rkiran...@gmail.com: Dotted lines are converting text
into random characters
http://code.google.com/p/tesseract-ocr/issues/detail?id=1052

What steps will reproduce the problem?
1. Read a TIFF file with VietOCR to convert it into text.

What is the expected output? What do you see instead?
File is attached

What version of the product are you using? On what operating system?
Tesseract 3.02, testing with VietOCR on Windows 7

Please provide any additional information below.


Attachments:
Tesseract dotted line problem.docx 32.2 KB

--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings

tesser...@googlecode.com

Dec 27, 2013, 3:32:22 PM
to tesserac...@googlegroups.com
Updates:
Status: WontFix

Comment #1 on issue 1052 by zde...@gmail.com: Dotted lines are converting
text into random characters

The dotted line is noise. Remove it [1].

[1]
https://code.google.com/p/tesseract-ocr/wiki/FAQ#Output_without_result_or_bad_output
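
One possible way to do the cleanup suggested above, as a rough sketch only: the comment does not say how to remove the dotted line, so the approach here (erasing very small connected ink components from a binarized image) is an assumption, as are the function name `remove_small_components` and the `min_pixels` threshold. The image is represented as a plain list of rows with 1 for ink and 0 for background.

```python
from collections import deque


def remove_small_components(image, min_pixels=4):
    """Return a copy of a binary image (1 = ink, 0 = background) in which
    every 8-connected ink component smaller than min_pixels is erased.

    Hypothetical helper for illustration; not part of Tesseract or VietOCR.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill to collect one connected component of ink pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the threshold are treated as dots and erased.
            if len(component) < min_pixels:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```

Note that a size threshold like this also erases legitimate small marks such as periods and the dots on "i", so it would need tuning against the actual scan, or restricting to the rows where the dotted line sits.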

tesser...@googlecode.com

Jan 2, 2014, 11:43:39 AM
to tesserac...@googlegroups.com

Comment #2 on issue 1052 by rkiran...@gmail.com: Dotted lines are
converting text into random characters
http://code.google.com/p/tesseract-ocr/issues/detail?id=1052

I removed the noise, and the results are now better with psm 6, but I still
have issues. The numbers are read accurately with psm 6 but are ignored
entirely with psm 4. Some characters are read accurately with psm 4 but not
with psm 6. Overall, psm 6 reads much better. Any advice?

tesser...@googlecode.com

Jan 2, 2014, 12:20:17 PM
to tesserac...@googlegroups.com

Comment #3 on issue 1052 by rkiran...@gmail.com: Dotted lines are
converting text into random characters
http://code.google.com/p/tesseract-ocr/issues/detail?id=1052

Correction: I meant psm 3, not psm 4, in the above comment.
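
For comparing the two page segmentation modes side by side, a small sketch like the following could build one Tesseract command per mode so the outputs can be diffed. The helper name `build_tesseract_commands` and the file names are hypothetical. The single-dash `-psm` flag matches the 3.x command line; newer releases spell it `--psm`, which the `legacy_flag` parameter (also an assumption of this sketch) switches to.

```python
def build_tesseract_commands(image_path, output_prefix,
                             psm_modes=(3, 6), legacy_flag=True):
    """Build one tesseract argument list per page segmentation mode.

    psm 3 is fully automatic page segmentation without OSD (the default);
    psm 6 assumes a single uniform block of text.
    Hypothetical helper for illustration only.
    """
    flag = "-psm" if legacy_flag else "--psm"
    commands = {}
    for mode in psm_modes:
        # Tesseract appends ".txt" to the output base name itself.
        output_base = f"{output_prefix}_psm{mode}"
        commands[mode] = ["tesseract", image_path, output_base, flag, str(mode)]
    return commands
```

Each argument list can be passed to `subprocess.run(...)` on a machine with Tesseract installed, for example `build_tesseract_commands("cleaned.tif", "out")[6]`, and the resulting text files compared line by line to see which mode drops the numbers.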