Convert image to text shows arrow instead of empty string

AutobotRyszard

Oct 8, 2018, 10:25:34 AM
to tesseract-ocr
After updating to version 4.0, a blank (empty) image is converted to an arrow character instead of an empty string.

Has anyone heard of this problem and can help me? :)

Thank you in advance!

Zdenko Podobny

Oct 8, 2018, 12:49:13 PM
to tesser...@googlegroups.com
Please provide your input image so somebody can test it ;-)

Zdenko


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/26542448-fe18-41b2-afd6-1b7bd5bf7336%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

AutobotRyszard

Oct 9, 2018, 5:39:00 AM
to tesseract-ocr
Hello :)
Generally, it is an empty page, and I want to check that an orange message is not there.
The check converts a cropped image to text. If the orange message is not there, it should return an empty string.
Before the update I got an empty string. Now the output looks like an arrow.

Soumik Ranjan Dasgupta

Oct 11, 2018, 12:18:38 PM
to tesser...@googlegroups.com
I tried to reproduce the error and it did not occur here. Could you be a bit more specific? What do you mean by "orange message"?
Just to clarify, I used tesseract image.jpg stdout and I got an empty string in return.

Magdalena Orzechowska

Oct 12, 2018, 5:05:47 AM
to tesser...@googlegroups.com
Hello again :)
On 3.05, when I ran the command
tesseract C:/tmp\rip_message.png C:\automation\ext\ocr\out
I received an empty file (the out file). Now, with v4.0.0-rc2.20181008, the out file looks like the one in the attachment.


out.txt
rip_message.png

Soumik Ranjan Dasgupta

Oct 14, 2018, 6:30:58 AM
to tesser...@googlegroups.com
The image you provided does not contain any text to OCR in the first place, so of course you got an empty output file. What problem are you having?


--
Regards,
Soumik Ranjan Dasgupta

Magdalena Orzechowska

Oct 15, 2018, 7:07:39 AM
to tesser...@googlegroups.com
Actually, when you open the out.txt file in Notepad, it is not empty. There is an arrow in it. The same arrow appears in the PyCharm output. Previously the file was empty.

Soumik Ranjan Dasgupta

Oct 15, 2018, 7:15:39 AM
to tesser...@googlegroups.com
I don't see any arrow when opening it with gedit, just a symbol.
I tried opening the file with Python and reading the contents. The results are pasted below.

>>> f = open("out.txt",'r')
>>> s = f.readline()
>>> s
'\x0c'

Let me know if this helps. Can anyone else confirm this?

Zdenko Podobny

Oct 15, 2018, 7:28:02 AM
to tesser...@googlegroups.com
It is the page separator, a form feed character. See https://en.wikipedia.org/wiki/Page_break#Form_feed

Zdenko
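
For anyone whose test compares the OCR result against an empty string, here is a minimal sketch of handling the form feed on the Python side. The helper names are hypothetical and are not part of Tesseract or any wrapper library. It relies only on the fact that Python's str.strip() treats form feed ('\x0c') as whitespace.

```python
FORM_FEED = "\x0c"  # ASCII 12, the character the REPL session above printed as '\x0c'


def is_blank_ocr_output(text: str) -> bool:
    """Return True when the OCR output holds nothing but whitespace.

    str.strip() removes form feeds along with spaces and newlines,
    so a lone page separator counts as blank.
    """
    return text.strip() == ""


def drop_trailing_page_separator(text: str) -> str:
    """Remove a single trailing form feed and leave everything else untouched."""
    return text[:-1] if text.endswith(FORM_FEED) else text


if __name__ == "__main__":
    # Simulated content of out.txt for a blank image under 4.0.
    raw = "\x0c"
    print(is_blank_ocr_output(raw))
    print(repr(drop_trailing_page_separator(raw)))
```

Checking is_blank_ocr_output() instead of comparing with "" keeps the same test working whether or not the output ends with a page separator.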


AutobotRyszard

Nov 6, 2018, 8:10:05 AM
to tesseract-ocr
Actually, it doesn't matter whether it is a separator, an arrow, or some other symbol. The problem is that there is a symbol at all. The output should be empty,
as it was in Tesseract 3.05.

Zdenko Podobny

Nov 6, 2018, 8:30:15 AM
to tesser...@googlegroups.com
It is up to you.
By default, each page ends with a page separator. An empty page is also a page ;-)

Zdenko


Shree Devi Kumar

Nov 6, 2018, 9:21:08 AM
to tesser...@googlegroups.com
You are probably referring to the form feed symbol, which is the new default page separator. You can change this setting with a config variable, which will make the output similar to 3.05. Look at the FAQ page in the wiki.

@stweil, what about not outputting the page separator symbol if the output is just a single page?
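
As a rough illustration of the config-variable approach, the sketch below builds a command line that sets the page separator to an empty string. It assumes the variable is named page_separator and is passed with the -c option, as I understand the 4.0 FAQ describes; please verify against your installed version. The function names and paths are hypothetical.

```python
import subprocess


def build_tesseract_command(image_path, output_base, page_separator=""):
    """Build the argument list for a tesseract run with a custom page separator.

    Assumption: the config variable is called page_separator and is set
    through -c NAME=VALUE. An empty value asks for no separator at all.
    """
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        f"page_separator={page_separator}",
    ]


def run_ocr(image_path, output_base):
    """Run tesseract and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)


if __name__ == "__main__":
    # Hypothetical paths. Only the command is printed, so this runs
    # even on a machine without tesseract installed.
    print(build_tesseract_command("rip_message.png", "out"))
```

Passing the arguments as a list avoids shell quoting problems with the empty value and with Windows paths.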

Zdenko Podobny

Nov 6, 2018, 11:26:47 AM
to tesser...@googlegroups.com
4.0 is a new major version with a lot of changes compared to 3.0x, so incompatibility is fine and expected. The 4.0 code has been available for 2 years. The release process (from beta3 to final) took considerable time, and I asked several times on both forums (users and developers) for testing.

We should not change the default behavior within the 4.0 version.

Zdenko

