Sometimes different output for the same text in an image

Gokcer Gunes

Jan 5, 2015, 1:16:12 PM
to tesser...@googlegroups.com

The examples are above. What am I doing wrong?

Gokcer Gunes

Jan 5, 2015, 1:36:19 PM
to tesser...@googlegroups.com
By the way, I'm using Charles's wrapper for Tesseract 3.02.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/27dbe2d0-3183-4f05-910a-eb612f29d40f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Allistair C

Jan 7, 2015, 4:56:49 PM
to tesser...@googlegroups.com
Your question is not self-evident, what are you trying to ask? Can you present your OCR results for each test you are conducting?

Gokcer Gunes

Jan 7, 2015, 5:29:27 PM
to tesser...@googlegroups.com
Yeah, the result pictures are in the message. Can't you see them?

Gokcer Gunes

Jan 7, 2015, 5:48:26 PM
to tesser...@googlegroups.com
I uploaded them as pictures.

Allistair

Jan 7, 2015, 6:01:16 PM
to tesser...@googlegroups.com
I see the pictures, yes, but I don't understand the question. Can you elaborate in detail what the issue is?

Gokcer Gunes

Jan 7, 2015, 6:05:08 PM
to tesser...@googlegroups.com
They are the input images and the output results. The inputs have the same first line, but in the first output the M is correctly recognized as M, while in the second output the M is recognized as H.

Allistair C

Jan 7, 2015, 6:21:46 PM
to tesser...@googlegroups.com
Ah, I see, interesting :) The 2nd example isn't quite the same; it seems to have some noise on the left-hand edge. Also, what PSM (page segmentation mode) are you using? Can you send the original-size images that you pass to Tesseract?

Gokcer Gunes

Jan 7, 2015, 6:24:32 PM
to tesser...@googlegroups.com
Ah, no, it's not noise. There is no noise in the original image; it's just the result of cropping in Paint.


Gokcer Gunes

Jan 7, 2015, 6:24:54 PM
to tesser...@googlegroups.com
Yes, I will get them and upload them ASAP.

Gokcer Gunes

Jan 7, 2015, 7:01:55 PM
to tesser...@googlegroups.com
First original image, as sent to Tesseract:
Inline image 1
Second original image, as sent to Tesseract:

Inline image 2
The outputs are in the image you already saw.

Allistair

Jan 8, 2015, 7:31:20 AM
to tesser...@googlegroups.com
Well, I must say I am unsure why one image works for that line and not the other. Clearly the addition of other lines of text is causing Tesseract's classification system to go slightly awry; it must be learning something from the new text that it feeds back to the first line, and then it gets that line wrong.

However, I was easily able to get a perfect read on both of them by performing a simple threshold filter (turning all text dark black) before providing to Tesseract.
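For illustration only: a global (fixed-value) threshold, like the one applied by hand in Photoshop here, maps every pixel darker than a cutoff to black and everything else to white. The sketch below assumes the image is already a grayscale grid of 0-255 integers (loading and saving it is left to whatever imaging library you use), and the cutoff of 160 is an arbitrary example value, not one taken from this thread.

```python
def threshold(gray, cutoff):
    """Global threshold: pixels darker than `cutoff` become black (0), the rest white (255)."""
    return [[0 if px < cutoff else 255 for px in row] for row in gray]


if __name__ == "__main__":
    # Made-up 3x3 patch: light background (230) with two grey,
    # anti-aliased text pixels (90 and 140).
    gray = [
        [230, 230, 230],
        [230, 90, 140],
        [230, 230, 230],
    ]
    for row in threshold(gray, 160):
        print(row)
```

Both grey text pixels end up solid black, which is the "turn all text dark black" effect described above.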

Gokcer Gunes

Jan 8, 2015, 8:47:34 AM
to tesser...@googlegroups.com
I tried threshold filters too, but I didn't get good results; the best result for me was this way. Can I ask what threshold value you used for these images?

Allistair

Jan 8, 2015, 9:16:42 AM
to tesser...@googlegroups.com
This is the image I am providing to Tesseract


It's your image with a quick threshold performed in Photoshop. Notice that you need to ensure the threshold makes all text black.
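The thread never settles on a threshold number. One common way to choose it automatically is Otsu's method, swapped in here as an alternative (it is not what was done in Photoshop): it tries every cutoff and keeps the one that best separates the grey-level histogram into two classes. A minimal pure-Python sketch, assuming a grayscale grid of 0-255 integers with dark text on a light background:

```python
def otsu_threshold(gray):
    """Pick a cutoff with Otsu's method for a grid of 0-255 ints.

    Pixels < cutoff form one class (assumed text), pixels >= cutoff the
    other (background). Returns 0 if the image has only one grey level.
    """
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_cutoff, best_score = 0, -1.0
    w0 = 0    # number of pixels below the current cutoff
    sum0 = 0  # sum of their grey levels
    for cutoff in range(1, 256):
        w0 += hist[cutoff - 1]
        sum0 += (cutoff - 1) * hist[cutoff - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, nothing to separate
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_cutoff, best_score = cutoff, score
    return best_cutoff


def binarize(gray):
    """Threshold `gray` at the Otsu cutoff: text -> 0 (black), background -> 255."""
    cutoff = otsu_threshold(gray)
    return [[0 if px < cutoff else 255 for px in row] for row in gray]


if __name__ == "__main__":
    gray = [[40, 220, 220], [40, 40, 220]]
    print(otsu_threshold(gray), binarize(gray))
```

Whatever cutoff you end up with, it still has to render all text fully black; an automatic choice can fail on unevenly lit or low-contrast images, so checking the result visually is still worthwhile.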

Allistair

Jan 8, 2015, 9:17:15 AM
to tesser...@googlegroups.com
Oh, I also scaled the image up to 2000px wide. Larger images often work better with Tesseract, so a pre-filter or upsampling can help.
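Scaling up to a fixed width means computing a factor and resampling. The sketch below uses nearest-neighbour resampling purely because it is short; that choice is an assumption, not what Photoshop does, and a bilinear or bicubic resize from an imaging library usually gives smoother glyph edges. Note that the pixel count grows with the square of the factor (going from 1000px to 2000px wide is 2x per side but 4x the pixels), which is one reason larger inputs take longer.

```python
def scale_factor(width, target_width=2000):
    """Factor that brings an image `width` pixels wide up to `target_width`."""
    return target_width / width


def upscale_nearest(gray, factor):
    """Nearest-neighbour resize of a 2-D pixel grid by `factor` (>= 1 assumed)."""
    h, w = len(gray), len(gray[0])
    new_h, new_w = round(h * factor), round(w * factor)
    return [
        # Each output pixel copies the source pixel it falls on;
        # min() guards against stepping past the last row/column.
        [gray[min(int(y / factor), h - 1)][min(int(x / factor), w - 1)]
         for x in range(new_w)]
        for y in range(new_h)
    ]


if __name__ == "__main__":
    f = scale_factor(1000)  # a hypothetical 1000px-wide input
    print(f, "x per side,", f * f, "x the pixels")
    for row in upscale_nearest([[0, 255], [255, 0]], 2):
        print(row)
```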

Gokcer Gunes

Jan 8, 2015, 9:22:25 AM
to tesser...@googlegroups.com
Yeah, with a larger size it works better and better, but it also takes more time; that's why I'm only scaling around 2x. And can I ask whether you tried with only the threshold, or threshold plus scaling?

Gokcer Gunes

Jan 8, 2015, 10:16:57 AM
to tesser...@googlegroups.com
And can I also ask whether Tesseract can detect the points between lines, so I can crop the lines and send each line to Tesseract as a separate picture?

Allistair

Jan 8, 2015, 10:18:03 AM
to tesser...@googlegroups.com
It works for me with just the threshold and no scaling (however, there is an error in the 2nd Kaliko where the l is misrecognised as a 1):


---

Memorystream memstream = new MemoryStream();

bitmap.Save(memstream, System.Drawing.Imaging.ImageFormat.Png);

Kalikolmage imgklk = new Ka1ikoImage(memstream);


Gokcer Gunes

Jan 8, 2015, 10:24:12 AM
to tesser...@googlegroups.com
And also, why is that? I mean, on the third line there are two KalikoImage words, which are identical, but the first is correctly converted to text and the second one has one mistake.

Allistair

Jan 8, 2015, 10:57:22 AM
to tesser...@googlegroups.com
:) I hear you, I do. From the input image perspective I looked at the 2 Kaliko words and they match at the pixel level. Some part of Tesseract's internal classification system is falling over a bit here, I cannot tell you why, I can only say that OCR is not a perfect technology. It works on machine learning and classification of probabilities. What seems simple to us is not necessarily so to the OCR engine. It could be that by the time it gets to the 2nd Kaliko it thinks it has improved the 'l' recognition from the previous text to be a '1' - Tesseract has this adaptive classifier - but I am only guessing, maybe others have better ideas.

Gokcer Gunes

Jan 8, 2015, 11:02:46 AM
to tesser...@googlegroups.com
I applied a threshold too and got the same results as you for these inputs, but for some other inputs the threshold makes them get recognized wrongly, for example * being recognized as '. My images are recognized very well as single lines, but with 1-2 mistakes for multiple lines, so I need to solve that. Thanks for the great response, and I would be very happy if you let me know if you ever get an idea of what causes that problem :). And one last thing: can I detect the points between lines with Tesseract, to separate the image into multiple single-line images?

Allistair

Jan 8, 2015, 11:10:21 AM
to tesser...@googlegroups.com
No problem. Not sure I understand what you mean by detect points; a point to me is an (x, y) coordinate. Do you mean space? In any case, I don't think Tesseract is going to do anything like that; you will need to perform image preprocessing to do that using OpenCV or something.

Gokcer Gunes

Jan 8, 2015, 11:22:39 AM
to tesser...@googlegroups.com
Yes, I mean any point in the space between lines, so I can say "crop the area between y1 and y2". Because I think Tesseract is able to detect lines, I thought I could get the bottom point of the first line's words and the top point of the second line's words.
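Outside Tesseract, the "crop between y1 and y2" idea can be done with a horizontal projection profile: after thresholding, scan the rows and treat each run of rows containing ink as one text line, with the blank rows in between as the gaps. A minimal sketch, assuming a clean, unskewed, single-column binary image (0 = black, 255 = white); `find_line_bands` and `crop_rows` are hypothetical helper names. Skewed text, touching ascenders/descenders, or speckle noise would merge or split bands, so a real version would likely need padding or a minimum-gap rule.

```python
def find_line_bands(binary, ink=0):
    """Return (y1, y2) row ranges that contain ink, split at fully blank rows.

    `binary` is a thresholded grid (0 = black text, 255 = white background).
    y2 is exclusive, so binary[y1:y2] is one text line.
    """
    bands = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(px == ink for px in row)
        if has_ink and start is None:
            start = y                   # first inked row of a new line
        elif not has_ink and start is not None:
            bands.append((start, y))    # a blank row closes the current line
            start = None
    if start is not None:               # last line touches the bottom edge
        bands.append((start, len(binary)))
    return bands


def crop_rows(binary, band):
    """Cut one horizontal strip (all columns, rows y1..y2-1) out of the grid."""
    y1, y2 = band
    return binary[y1:y2]


if __name__ == "__main__":
    W, B = 255, 0
    page = [
        [W, W, W],
        [B, W, B],   # line 1
        [W, B, W],   # line 1
        [W, W, W],   # gap
        [B, B, W],   # line 2
        [W, W, W],
    ]
    for band in find_line_bands(page):
        print(band, crop_rows(page, band))
```

Each cropped strip could then be saved and sent to Tesseract as its own image, for example with page segmentation mode 7 ("treat the image as a single text line").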

Gokcer Gunes

Jan 8, 2015, 5:16:20 PM
to tesser...@googlegroups.com
So it's not possible with Tesseract?

Allistair C

Jan 8, 2015, 5:21:43 PM
to tesser...@googlegroups.com
No. The best you could do is look at one of the many config params to see if you can influence the engine, e.g. disable the adaptive classifier (not sure if that's possible or even beneficial). Whatever is causing your issue might be targetable via config, but the config is not very well documented. If you wanted to feed the image line by line, you would need to do that outside of Tesseract, and that may be easier than the config method.

Gokcer Gunes

Jan 8, 2015, 5:37:12 PM
to tesser...@googlegroups.com
Okay, understood. Thanks for the great help, and have a nice day :)

Gokcer Gunes

Jan 12, 2015, 5:05:39 AM
to tesser...@googlegroups.com
Hi, me again. Is there any other forum where I can ask about turning off that adaptive classifier? I mean people who deal with config parameters, etc.

Allistair

Jan 12, 2015, 5:17:47 AM
to tesser...@googlegroups.com
Try turning it off to see if anything useful happens ...

classify_enable_learning 0
classify_enable_adaptive_matcher 0
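Besides calling SetVariable from code, Tesseract also reads variables from a plain-text config file with one `name value` pair per line, which can be handy for trying parameters from the command line. The sketch below writes the two variables above into such a file and builds a `tesseract image outputbase configfile` command without running it; the file name `noadapt` and the names `input.png` and `out` are hypothetical. Config names are normally looked up in tessdata's `configs` directory; passing a full path is assumed to work here, and if it does not, copy the file there instead.

```python
import os
import tempfile

# The two variables suggested in the thread for switching off adaptation.
NO_ADAPT_PARAMS = {
    "classify_enable_learning": "0",
    "classify_enable_adaptive_matcher": "0",
}


def write_config(params, path):
    """Write a Tesseract-style config file: one 'name value' pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for name, value in params.items():
            f.write(f"{name} {value}\n")


def tesseract_argv(image, out_base, config_path):
    """Argument list for the tesseract CLI: image, output base name, config file."""
    return ["tesseract", image, out_base, config_path]


if __name__ == "__main__":
    cfg = os.path.join(tempfile.gettempdir(), "noadapt")  # hypothetical location
    write_config(NO_ADAPT_PARAMS, cfg)
    with open(cfg, encoding="utf-8") as f:
        print(f.read(), end="")
    # Printed rather than executed; pass it to subprocess.run() to actually OCR.
    print(" ".join(tesseract_argv("input.png", "out", cfg)))
```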

Gokcer Gunes

Jan 12, 2015, 5:22:22 AM
to tesser...@googlegroups.com
I'm using a C# WinForms application. Can I ask how to do that in C#? :)

Allistair

Jan 12, 2015, 5:25:48 AM
to tesser...@googlegroups.com
Look up your platform's way of calling SetVariable.

Gokcer Gunes

Jan 12, 2015, 7:03:51 AM
to tesser...@googlegroups.com
Finally I was able to change it, but no luck even with it turned off :( Thanks for your time :) And if I send an email to Ray Smith about this problem, is there any chance he will reply?

Allistair

Jan 12, 2015, 7:12:19 AM
to tesser...@googlegroups.com

Gokcer Gunes

Jan 12, 2015, 7:23:17 AM
to tesser...@googlegroups.com
Okay, thanks again. Then I will play with some parameters to see if anything changes, with any luck :) Have a nice day :)
