Multiple pages in parallel?

Matthew Lai

Mar 10, 2018, 2:50:42 PM
to tesseract-ocr
Hello!

According to the FAQ[1], if I run tesseract on a multi-page image, it should process the pages in parallel.

I am converting a single 10-page TIFF file into a PDF, and according to top, tesseract never uses more than about 250% CPU (my machine has 16 cores / 32 threads).

Am I doing something wrong?

tesseract combined.tif out pdf
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
Page 2
Page 3
Page 4
Page 5
Page 6
Page 7
Page 8
Page 9
Page 10
OSD: Weak margin (6.98) for 914 blob text block, but using orientation anyway: 0

tesseract -v (from Debian Testing):
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.1) : libpng 1.6.28 : libtiff 4.0.8 : zlib 1.2.8 : libwebp 0.5.2 : libopenjp2 2.1.2

 Found AVX
 Found SSE
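
For reference, one possible workaround is to run one tesseract process per page instead of relying on tesseract to parallelize internally. The sketch below is only an illustration under stated assumptions: the TIFF has already been split into single-page files (the names page_01.tif and so on are hypothetical), the helper names build_command and ocr_pages are made up for this example, and setting OMP_THREAD_LIMIT=1 assumes this build uses OpenMP and honors that standard variable. It reuses the same "tesseract <image> <outputbase> pdf" form as the command above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def build_command(page: Path, out_dir: Path) -> List[str]:
    """Build the tesseract command line for one single-page image."""
    # Same shape as the command in the post: tesseract <image> <outputbase> pdf
    return ["tesseract", str(page), str(out_dir / page.stem), "pdf"]


def ocr_pages(
    pages: Iterable[str],
    out_dir: str,
    workers: Optional[int] = None,
    runner: Callable = subprocess.run,
) -> List[List[str]]:
    """Run one tesseract process per page, several at a time.

    `pages` are hypothetical single-page image files. `runner` is only
    injectable so the function can be exercised without tesseract installed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Assumption: the build uses OpenMP; limiting each process to one thread
    # keeps the processes from competing with each other for cores.
    env = dict(os.environ, OMP_THREAD_LIMIT="1")

    commands = [build_command(Path(p), out) for p in pages]

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # list() forces completion and re-raises any failure from a worker.
        list(pool.map(lambda cmd: runner(cmd, env=env, check=True), commands))

    return commands
```

Calling ocr_pages(["page_01.tif", "page_02.tif"], "out") would write one PDF per page into out/. Splitting the original TIFF and merging the per-page PDFs back into a single file are not covered by this sketch.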

Thanks!
Matthew
