Different Results on Linux vs Windows


Patrick Nichols

Nov 4, 2013, 2:08:16 PM
to tesser...@googlegroups.com
Why would tesseract behave differently on Linux vs Windows? I am using the same version on both systems (3.01), the exact same configs (everything in tessdata is identical between the two systems) and the same command. And yet on Windows I get this result:

1P'1'SC)\BH6PT2'72437

Whereas on Linux I get this:

IPTSCABHGPTZV4 37

1) Is it normal for tesseract to give different results on different operating systems?
2) If so, what sorts of things account for the differences?
3) Is it possible to get more consistent results through configuration?
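
Before looking at compilers, it can help to rule out data differences by confirming that tessdata really is byte-for-byte identical on both machines. The following is a minimal sketch, not part of Tesseract: the function names `directory_digests` and `differing_files` are hypothetical helpers introduced here for illustration, and the sketch assumes each tessdata directory can be read from Python.

```python
import hashlib
from pathlib import Path


def directory_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(digests_a, digests_b):
    """Return the sorted names that are missing on one side or differ in content."""
    names = set(digests_a) | set(digests_b)
    return sorted(n for n in names if digests_a.get(n) != digests_b.get(n))
```

One possible workflow is to run `directory_digests` on each system, save the resulting dictionary (for example as JSON), and pass both dictionaries to `differing_files`. An empty list means the two directories hold the same files with the same contents.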



Multi.tif

Greg Dunkel

Nov 4, 2013, 11:06:48 PM
to tesser...@googlegroups.com
#2.  Even if the source code is the same, the object code -- the instructions the computer executes -- is different, as are the input-output libraries and other system calls.

#3.  I have only used tesseract on Solaris and Linux. I didn't notice much difference, and it was a few years ago, but what I remember is that fine details such as commas, periods and single quotes were treated differently.


--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
 



--
/greg

Nick White

Nov 5, 2013, 5:56:00 AM
to tesser...@googlegroups.com
Hi Patrick,

> 1) Is it normal for tesseract to give different results on different operating
> systems?
> 2) If so, what sort of things accounts for the differences?
> 3) Is it possible to get more consistent results through configuration?

There are examples of it behaving differently according to the
compiler and platform on the TestingTesseract wiki page, too:
http://code.google.com/p/tesseract-ocr/wiki/TestingTesseract

I very much doubt that you could make configuration changes to get
rid of the differences. I suppose it's largely due to different
choices the compilers make when presented with code that could be
interpreted and optimised more than one way.

Patrick Nichols

Nov 5, 2013, 9:28:26 AM
to tesser...@googlegroups.com
Thanks for the info! 

Robert Komar

Nov 5, 2013, 12:38:57 PM
to tesser...@googlegroups.com
In the past, I worked on trying to get consistent behaviour
in Monte Carlo physics simulations across platforms. There
were some differences in floating point behaviour across
architectures, but such effects are small and shouldn't
have too large an effect on the results here. If differences
in the 6th or 7th significant digit are causing totally
different results, then it's an indication that the
code is not well thought out in places.
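
To make the floating point argument concrete, here is a small sketch of how a
last-digit difference can turn into a different answer when code compares a
score against a hard cutoff. It is purely illustrative and is not taken from
the Tesseract source: the function `pick_character`, the cutoff value and the
"6" versus "G" labels are all assumptions made up for this example.

```python
def pick_character(score, cutoff=0.6):
    """Hypothetical decision rule: choose "6" only if the score beats the cutoff."""
    return "6" if score > cutoff else "G"


# The same three numbers added in two different orders. A compiler that is
# free to reorder or fuse floating point operations can produce either one.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two sums differ only in the last binary digit or so...
print(repr(left_to_right), repr(right_to_left))

# ...but the hard cutoff turns that into two different characters.
print(pick_character(left_to_right), pick_character(right_to_left))
```

The arithmetic itself is behaving as specified in both cases. The fragility
comes from the decision rule, which has no tolerance around the cutoff, and
that is the kind of spot worth looking for.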

The largest effects were caused by variable initialization.
Again, this is more of an issue with sloppy code. The
flag for warning about uninitialized variables should be
set, and any warnings should be hunted down and fixed.
From that wiki page, it looks like the code was audited
for this back in the 2.0 days, but enough code has been
added since then that new bugs may have crept in again.

So, I would say that totally different results with code
compiled with different options or different compilers
is a sign of bad code. I don't think it should just be
shrugged off as unavoidable.

Cheers,
Rob Komar

Tom Morris

Nov 6, 2013, 12:02:11 PM
to tesser...@googlegroups.com
On Tuesday, November 5, 2013 12:38:57 PM UTC-5, rkomar wrote:
> So, I would say that totally different results with code
> compiled with different options or different compilers
> is a sign of bad code.  I don't think it should just be
> shrugged off as unavoidable.

I agree with Rob here.  Numerical/algorithm instability isn't just going to be an issue between platforms.  It indicates a fragility that could be broken by compiler upgrades and other changes on a single platform.

The test results that Nick pointed out report on gross error rates, but at the detail level, there is going to be a set of individual characters which are (mis)recognized differently, as well as specific reasons for those differences.  It'll take work to isolate the problems, but it's certainly not an intractable problem.
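
As a possible starting point for that detail-level comparison, the two outputs can be aligned character by character so that only the disagreeing spans are listed. This is a rough sketch using Python's standard `difflib` module. The helper name `character_differences` is hypothetical, and the two strings are the results quoted at the top of this thread.

```python
from difflib import SequenceMatcher


def character_differences(a, b):
    """List (operation, text_in_a, text_in_b) for every span where a and b disagree."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Results reported in the original question.
windows_output = "1P'1'SC)\\BH6PT2'72437"
linux_output = "IPTSCABHGPTZV4 37"

for tag, win_text, linux_text in character_differences(windows_output, linux_output):
    print(f"{tag:8} windows={win_text!r} linux={linux_text!r}")
```

The alignment is a heuristic, so it only shows where the two builds disagree, not which one is right. Comparing each against a ground-truth transcription of the image would be needed for that.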

Speaking optimistically, Ray edited that wiki page just a few days ago, so perhaps he's already on the task.

Tom

zdenko podobny

Nov 6, 2013, 3:13:52 PM
to tesser...@googlegroups.com
I don't think so - he just removed the outdated link to UNLV tests.