tesseract 4.1.1 slow in aws instance centos7


James Lian

Nov 9, 2022, 9:31:50 AM
to tesseract-ocr
Hi all,

I have installed Tesseract 4.1.1 on an AWS EC2 instance running CentOS 7.

We have noticed that it is slower there than on our personal laptop.

Is there anything I can do to make it run faster?

The EC2 instance has 4 vCPUs and 16 GB of memory.

tesseract 4.1.1
 leptonica-1.78.0
  libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0

 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE



Giuseppe Coniglio

Nov 28, 2022, 9:49:40 AM
to tesseract-ocr
Hi, I have the same problem on Oracle Linux Server 8.6:

tesseract 4.1.1
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 1.0.0

 Found AVX2
 Found AVX
 Found FMA
 Found SSE

Zdenko Podobny

Nov 28, 2022, 9:54:26 AM
to tesser...@googlegroups.com
Is this version information from the server or from the laptop?

General rule: use the latest stable version (5.2; 4.x is unsupported).

Zdenko
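
One way to compare the two machines is to diff the SIMD lines that `tesseract --version` prints on each. The following is only a rough sketch: the class and method names (`VersionDiff`, `simdFeatures`, `onlyInFirst`) are made up for illustration, and the sample strings are the two outputs posted earlier in this thread rather than a laptop/server pair.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Tesseract or Tess4J.
class VersionDiff {

    // Collects the feature names from lines of the form " Found AVX2".
    static Set<String> simdFeatures(String versionOutput) {
        Set<String> features = new LinkedHashSet<>();
        for (String line : versionOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Found ")) {
                features.add(trimmed.substring("Found ".length()).trim());
            }
        }
        return features;
    }

    // Features reported by the first machine but not by the second.
    static Set<String> onlyInFirst(String first, String second) {
        Set<String> diff = new LinkedHashSet<>(simdFeatures(first));
        diff.removeAll(simdFeatures(second));
        return diff;
    }

    public static void main(String[] args) {
        // Sample data: the two version outputs posted in this thread.
        String centos = String.join("\n",
                "tesseract 4.1.1", " leptonica-1.78.0",
                " Found AVX512BW", " Found AVX512F", " Found AVX2",
                " Found AVX", " Found FMA", " Found SSE");
        String oracle = String.join("\n",
                "tesseract 4.1.1", " leptonica-1.76.0",
                " Found AVX2", " Found AVX", " Found FMA", " Found SSE");

        System.out.println("Only on CentOS 7: " + onlyInFirst(centos, oracle));
        System.out.println("Only on Oracle Linux: " + onlyInFirst(oracle, centos));
    }
}
```

Pasting the laptop's output into one string and the server's into the other shows at a glance whether the two builds detect different CPU features. A difference there does not by itself explain the slowdown, so treat it as one data point alongside the version numbers.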



Giuseppe Coniglio

Nov 29, 2022, 9:43:12 AM
to tesseract-ocr
On my Oracle Linux Server 8.6, the newest version available is Tesseract 4.1.1. My Spring Boot microservice has these dependencies in pom.xml:

        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>4.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.22</version>
        </dependency>

Both PDDocument.load (org.apache.pdfbox.pdmodel) and Tesseract.doOCR are slow.
Thanks
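
To see which of the two calls accounts for most of the time, it may help to time each stage separately. Below is a minimal sketch of such a timer. `StageTimer` and `time` are hypothetical names, and the lambdas in `main` are stand-in workloads so that the example runs without PDFBox or Tess4J on the classpath. The comments show where the real calls would go, and the variables `pdfFile`, `tesseract`, and `pageImage` in them are assumed.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for timing individual stages of a pipeline.
class StageTimer {

    // Runs one stage, prints how long it took, and returns the stage's result.
    static <T> T time(String label, Callable<T> stage) {
        long start = System.nanoTime();
        try {
            return stage.call();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Wrap checked exceptions (e.g. IOException) so callers stay simple.
            throw new IllegalStateException(label + " failed", e);
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + ": " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // In the real service the lambdas would wrap the actual calls, e.g.
        //   PDDocument doc = time("PDDocument.load", () -> PDDocument.load(pdfFile));
        //   String text    = time("Tesseract.doOCR", () -> tesseract.doOCR(pageImage));
        String doc = time("load (stand-in)", () -> {
            Thread.sleep(20);
            return "document";
        });
        String text = time("ocr (stand-in)", () -> {
            Thread.sleep(50);
            return "text of " + doc;
        });
        System.out.println(text);
    }
}
```

Running the same timed code on the laptop and on the server should show whether the gap comes from loading the PDF, from the OCR call, or from both.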