How to use osd?

ogorman

Jun 22, 2011, 12:24:08 AM
to tesseract-ocr
I have Tesseract 3.01 installed and am trying to read some faxes that
are upside down. I have tried to use the new OSD feature to detect
that the page is upside down, but for the life of me I can't get the
OSD training data to produce anything but garbage. If I use ImageMagick
to rotate the image 180 degrees, it reads the document flawlessly.

thanks

Matt

patrickq

Jun 22, 2011, 3:08:23 AM
to tesseract-ocr
DetectOS is working quite well actually, in about 95% of the cases in
my experience. Care to share the images in question? You can also play
with it directly with ScanBizCards: if you have an iPhone, iPad, iPod
Touch or Android device just install it and use its import from saved
images function. The first thing it does is rotate the image so that
text is upright, using the Tess 3.01 DetectOS.

Note that DetectOS declares failure when there isn't enough text to
make a call. I haven't looked at the source code, but I'd say a minimum
of about 10 words is expected.

Patrick

ogorman

Jun 22, 2011, 4:34:03 AM
to tesseract-ocr
Sorry if this is a double post; I think Google ate my last reply.

The image is at http://home.rldn.net/test.tif and, if I rotate it
by hand, the OCR gets the majority of the text, so I imagine I am
somehow not enabling OSD. It also fails on all ten pages of the fax,
not just the cover page; the rest of the pages are just walls of
text. Also, is there a way to turn on logging in the tesseract CLI
program, so that I can see, for example, that it rotated an image, or
how confident it was about the words it recognized?

Thanks for the fast reply.

Matt

ogorman

Jun 22, 2011, 4:26:52 AM
to tesseract-ocr
Thank you for the quick reply. The file is 1 MB, so I'd rather not
spam the whole list with it; it is located at http://home.rldn.net/test.tif
The only alteration is that the personal information was blacked out;
before that, there was just some handwritten text there. I have
also tried altering the image so that the very top line, which is right
side up, was removed. If you rotate it 180 degrees, it is read with a
very high success rate. Also, is there a way with the CLI tool to have
Tesseract tell me that the image was upside down? That way I could
fix the image as well as have the correct OCR text for my users.

patrickq

Jun 22, 2011, 7:48:08 AM
to tesseract-ocr
I tested it via ScanBizCards and indeed OSD has no issues whatsoever
getting it right: there is 10 times the amount of text it needs and
the image is very sharp, so it's guaranteed to get it right. I am not
familiar with the command-line tools, however, so I can't help there.
I'll just say that it should be very easy to write your own little
utility that makes a call to DetectOS.

Another easy solution: why don't you run Tesseract twice, first on the
original image, then on the image rotated 180 degrees? I assume you only
need these two possibilities because it's a fax, hence the page is
taller than it is wide. Then pick the one that yields the most
sensible text and the fewest gibberish characters.

Patrick
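
A minimal sketch of the run-twice idea above, assuming the OCR text of
both orientations is already available as strings. The scoring rule
(share of letters and digits among non-space characters) and the names
AlnumRatio and RotatedLooksBetter are illustrative assumptions, not
anything Tesseract provides:

```cpp
#include <cctype>
#include <string>

// Fraction of non-space characters that are ASCII letters or digits.
// Assumed heuristic: OCR of an upside-down page tends to be rich in
// punctuation and symbols, so a higher ratio suggests more sensible text.
double AlnumRatio(const std::string& text) {
  int total = 0;
  int alnum = 0;
  for (unsigned char c : text) {
    if (std::isspace(c)) continue;  // whitespace is ignored entirely
    ++total;
    if (std::isalnum(c)) ++alnum;
  }
  return total == 0 ? 0.0 : static_cast<double>(alnum) / total;
}

// Returns true when the OCR text of the 180-degree-rotated page scores
// strictly higher than the OCR text of the page as received.
bool RotatedLooksBetter(const std::string& as_received,
                        const std::string& rotated) {
  return AlnumRatio(rotated) > AlnumRatio(as_received);
}
```

A dictionary lookup or a word-length check would probably separate real
text from noise better; the character ratio is only the simplest thing
that fits the description.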

ogorman

Jun 22, 2011, 9:05:38 AM
to tesseract-ocr
That is my current method. It has just produced some edge cases where
there isn't any text, a graph for example, and both orientations
produce the same amount of false-positive noise. In those cases I just
keep the page the way it came in. But I was hoping for a more efficient
method. I am glad the software works, though; I guess I might need to
invest time in building a tool to detect orientation using Tesseract.
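
The fallback described above (keep the page as received when neither
orientation clearly wins) could be sketched as follows. The two scores
stand for whatever text-quality measure is in use, and min_margin is an
assumed tuning parameter; neither comes from Tesseract:

```cpp
enum class Decision { kKeepAsReceived, kRotate180 };

// Rotates only when the rotated page beats the received page by more
// than min_margin. Ties and near-ties (for example two pages of pure
// noise from a graph) keep the page the way it came in.
Decision ChooseOrientation(double score_as_received,
                           double score_rotated,
                           double min_margin) {
  if (score_rotated - score_as_received > min_margin) {
    return Decision::kRotate180;
  }
  return Decision::kKeepAsReceived;
}
```

For example, ChooseOrientation(0.42, 0.45, 0.10) keeps the page as
received because the difference is within the margin.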

Dmitri Silaev

Jun 23, 2011, 12:35:29 AM
to tesser...@googlegroups.com, ogo...@gmail.com
This is an interesting case. If you take a closer look at the image
you've shown us, you can notice a small text line at the top of the
fax page (a fax header line) which is upright, in contrast to the other
text in this document. This very text line fools Tesseract's
orientation detection algorithm. If you crop the image to exclude this
line, everything works fine.

I used the following command line:
tesseract test_osd_cr.tif test_osd -psm 1

"-psm 1" stands for "Use automatic page segmentation with orientation
and script detection. (OSD)"

I used a copy of "eng.traineddata" as "osd.traineddata"

HTH

Warm regards,
Dmitri Silaev
www.CustomOCR.com

patrickq

Jun 23, 2011, 3:47:16 AM
to tesseract-ocr
Hummm:
1. It would make little sense for Tesseract to get it wrong because of
so little text oriented wrongly, while all the rest of the text points
in another direction (although Tess certainly does stupid things
sometimes)
2. At least within ScanBizCards (running Tess 3.01), DetectOS DOES
work properly, test for yourself on Android or iPhone

ogorman

Jun 23, 2011, 9:17:28 AM
to tesseract-ocr

Thanks, Dmitri, for the advice. I had seen this as well and tested
cropping the image so it does not include the fax page number. I also
just tried moving the OSD data around like you suggested. Either way,
when I run the tesseract command on the image it just comes back with
garbage, whereas if I rotate it 180 degrees with ImageMagick I get a
99% success rate. I am using a version I pulled out of SVN 3 days ago;
it reports itself as tesseract-3.01. Is there anything I am missing to
get the results you did? Once again, thanks for your help.

Matt

ogorman

Jun 23, 2011, 11:03:12 AM
to tesseract-ocr
I also tried changing baseapi.cpp in the Tesseract CLI program. In the
ProcessPage function, after the SetImage line, I added the following:

    /// MY CHANGES
    OSResults *orientationStruct = new OSResults();
    bool gotOrientation = this->DetectOS(orientationStruct);
    int bestOrientation = -1;
    float bestOrientationScore = 0;
    if ((gotOrientation) && (orientationStruct->orientations != NULL)) {
      for (int i=0; i<4; i++) {
        printf("i tried what does %d and %d\n",
               bestOrientationScore, orientationStruct->orientations[i]);
        if (orientationStruct->orientations[i] > bestOrientationScore) {
          printf("how am i never called\n");
          bestOrientation = i;
          bestOrientationScore = orientationStruct->orientations[i];
        }
      }
    }
    printf("orientation %d\n", bestOrientation);
    /// MY CHANGES

and I always get something like this:

    i tried what does -1274961920 and 0
    i tried what does 0 and 1
    i tried what does 0 and 2
    i tried what does 0 and 3
    orientation -1

With the image flipped either way, I get some garbage.
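
One thing that likely explains the odd numbers in that debug output:
the snippet prints bestOrientationScore, declared as a float, with the
%d conversion. Passing a floating-point argument to %d is undefined
behavior in C and C++, so the printed values need not be the scores at
all. The sketch below prints with %f and picks the highest of four
scores without the fixed "greater than 0" threshold. It assumes the
scores are plain floats in a four-element array, as the posted code
suggests; BestOrientation and PrintScores are hypothetical helper
names, and nothing here calls Tesseract:

```cpp
#include <cstdio>

// Returns the index (0..3) of the highest score. Starting from element 0
// rather than from a fixed 0.0 threshold means an index is always
// returned, even if every score is zero or negative. Ties go to the
// lowest index.
int BestOrientation(const float scores[4]) {
  int best = 0;
  for (int i = 1; i < 4; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Prints each score with %f; a float argument is promoted to double,
// which is what %f expects.
void PrintScores(const float scores[4]) {
  for (int i = 0; i < 4; ++i) {
    std::printf("orientation %d: score %f\n", i, scores[i]);
  }
}
```

This does not explain why no orientation was selected in the run above
(DetectOS may have returned false, or no score may have exceeded 0); it
only makes the debug output trustworthy enough to find out.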

Dmitri Silaev

Jun 23, 2011, 1:14:33 PM
to tesser...@googlegroups.com, ogo...@gmail.com
In general, I wouldn't advise always using the latest SVN revision,
for stability reasons. My results were obtained by running an
executable built from revision 580. You should try it too.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

Dmitri Silaev

Jun 23, 2011, 1:21:19 PM
to tesser...@googlegroups.com, patrick.q...@gmail.com
Patrick,

Here you confuse the "DetectOS" function with the processing pipeline
invoked via the command line. The truth is that "DetectOS" *is not* (!)
called when OSD is requested from the command line; it's only an
API wrapper with its own logic. The command-line OSD logic is somewhat
different from DetectOS's, hence the discrepancies under seemingly equal
conditions.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

ogorman

Jun 23, 2011, 3:26:38 PM
to tesseract-ocr


Taking your advice, I checked out revision 580:
svn checkout -r 580 http://tesseract-ocr.googlecode.com/svn/trunk tesseract-ocr-580
I then built and installed it, making no changes,
and ran it over these two files:
http://home.rldn.net/upsidedown.tif
http://home.rldn.net/rightsideup.tif

Notice I cropped them so the fax header is gone. Running
tesseract ~/junk/upsidedown.tif test -l eng -psm 1
produces garbage.
Changing to rightsideup.tif reads fine. Can you please tell me what I'm
doing wrong? I believe my setup is just like yours now.

thanks

Matt

ogorman

Jun 23, 2011, 4:53:36 PM
to tesseract-ocr, m...@rldn.net
Ugh, I think Google Groups ate my message again.

I downgraded to revision 580:
svn checkout -r 580 http://tesseract-ocr.googlecode.com/svn/trunk tesseract-ocr-580
and installed it. Then I ran
tesseract ~/junk/upsidedown.tif blah -l eng -psm 1
tesseract ~/junk/rightsideup.tif blah -l eng -psm 1

The files were these:
http://home.rldn.net/upsidedown.tif
http://home.rldn.net/rightsideup.tif

The right-side-up one came in just fine. Can you please tell me what
I'm doing wrong?

ogorman

Jun 24, 2011, 9:27:34 AM
to tesseract-ocr
Well, with a lot of help from Dmitri I was able to get the upside-down
page to read. Using this version of Tesseract, http://home.rldn.net/tesseract.exe
and copying eng.traineddata over osd.traineddata, it works. It
even works under Wine, so at least my immediate problem is solved.
But I was hoping to find out what is different between his build of
revision 580 and the version I built on my GNU/Linux machine. Also, is
there any way to turn on debug output to tell me if the file was upside
down, so that I can rotate it with ImageMagick? I plan on displaying
the OCR text and the image side by side.

thanks
Matt