Is Tesseract 3.03's source tarball available? Need to compile on CentOS 5.6


Jing JC

Jul 11, 2014, 7:22:41 PM
to tesser...@googlegroups.com
Google's Tesseract download page only lists versions up to 3.02.

I need to compile Tesseract on CentOS 5.6. Where is the download link for Tesseract 3.03, or is it not available yet?

Thank you

Nick White

Jul 13, 2014, 9:57:31 AM
to tesser...@googlegroups.com
It isn't available yet. There is an -rc1 version available in the
project's Google Drive (look on the project's homepage for the link).

Nick

universal reseller

Jul 13, 2014, 10:08:45 AM
to tesser...@googlegroups.com
Hi Nick,
Does Google Drive use Tesseract 3.03?
I checked one English PDF document with both my compiled Tesseract 3.02
and Google Drive OCR, and the results are amazingly different!

How can Google reach a 100% result?

Nick White

Jul 13, 2014, 10:25:56 AM
to tesser...@googlegroups.com
On Sun, Jul 13, 2014 at 06:38:11PM +0430, universal reseller wrote:
> Does Google Drive use Tesseract 3.03?

It's -rc1, meaning release candidate 1. So it isn't an official
release, but rather a "testing preview" release, which should be close
to what the final 3.03 will be.

> I checked one English PDF document with both my compiled Tesseract 3.02
> and Google Drive OCR, and the results are amazingly different!

Different better, or worse? (hope it's better!)

> How can Google reach a 100% result?

Good preprocessing, and a bit of luck. ;)


universal reseller

Jul 13, 2014, 10:50:37 AM
to tesser...@googlegroups.com
Unforgettably different, and bad for 3.02...
For English, the best accuracy I get with 3.02 is about 90%,
but with Google Drive OCR it's 100% in my checks.

Also, for other languages like Arabic, the official 3.02 language data gives results of about 50%.

I checked a lot of files in Arabic,
and the results are not good enough for production yet!

My question is: who built the Arabic trained data file?
Maybe I can help them train a better one!
Christopher Smeenk

Jul 14, 2014, 10:38:19 AM
to tesser...@googlegroups.com
Hello Jing,

I found the source for v3.03 here: http://packages.ubuntu.com/trusty/tesseract-ocr
As I recall, you also need to update the Leptonica image processing library.
I compiled it on Ubuntu 12.04 using the Tesseract compiling instructions.

Good luck,

Chris

Nick White

Jul 14, 2014, 12:09:45 PM
to tesser...@googlegroups.com
On Mon, Jul 14, 2014 at 07:38:19AM -0700, Christopher Smeenk wrote:
> I found the source for v3.03 here: http://packages.ubuntu.com/trusty/tesseract-ocr

The version called "3.03" in Ubuntu is an -rc; there is no official
3.03 release yet. As I understand it, Ray & Jeff called it 3.03 so
that Ubuntu would take it for their LTS release, and it could then
be updated later in Ubuntu (when 3.03 is actually released).

It's confusing, but such is life ;)

Nick

Jing JC

Jul 14, 2014, 6:01:55 PM
to tesser...@googlegroups.com
Hi Chris,

Thanks for the link. 

I will try compiling this on my CentOS 5.7 server. 

Jing JC

Jul 14, 2014, 6:03:57 PM
to tesser...@googlegroups.com
I uploaded receipt images; Google Drive didn't do a good job on them.

I am planning to compile the 3.03 version on the server and then compare the results again.
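One rough way to put numbers like "about 90%" or "about 50%" on such a comparison is character accuracy against a hand-typed transcription of the same image. The Python sketch below is an illustration, not something posted in this thread: it assumes you already have the ground-truth text and the OCR output as strings (for example, the .txt file that `tesseract image.png outputbase` writes), and it computes 1 - (edit distance / ground-truth length) using a plain Levenshtein distance. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Hypothetical helper: 1 - edit_distance / len(ground_truth), floored at 0."""
    if not ground_truth:
        # Nothing to recognise: perfect only if the OCR output is empty too.
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(ocr_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))
```

Running the same function over the 3.02 output, the 3.03 output and the Google Drive text for each receipt gives comparable per-file scores. Note that this metric counts every whitespace and punctuation difference as an error, so normalising line breaks first may be worthwhile.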