Trying to use "-c tessedit_page_number=1" at the end of command to process only page two of a multipage tiff


Laurent Sabourin

Jul 2, 2019, 3:22:20 PM
to tesseract-ocr
I am using tesseract to extract text from a multi page tiff image, but I only want to process the second page. I am using the following command:

tesseract.exe FILE.TIF OUT --tessdata-dir ."\tessdata" -l eng --psm 1 --oem 1 -c tessedit_page_number=1

For some reason it always processes the first page no matter what page number I put in the option. If I remove that option, it processes all the pages.

Is it a known issue? Am I doing it wrong?

Thank you. 

See below for details on my environment:

Operating system:
Windows 10 Enterprise 1903 10.0.18362.175 Client

Here is my version output:
tesseract --version
tesseract 4.0.0
 leptonica-1.76.0 (May 30 2019, 11:18:56) [MSC v.1916 LIB Release x86]
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 2.0.1) : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11
 Found AVX
 Found SSE

Zdenko Podobny

Jul 2, 2019, 3:36:19 PM
to tesser...@googlegroups.com
I guess you have a TIFF with JPEG compression... You need to use the latest tesseract code and leptonica > 1.77.

Zdenko


On Tue, Jul 2, 2019 at 21:22, Laurent Sabourin <laurent.s...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/3aa29162-5364-4cc2-9d37-44fe96248433%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Laurent Sabourin

Jul 2, 2019, 3:58:29 PM
to tesseract-ocr
The file uses G4 TIFF compression, not JPEG...



Laurent Sabourin

Jul 2, 2019, 4:14:45 PM
to tesseract-ocr
I am attaching a sample TIFF that reproduces the issue. This one uses G3 compression and shows the same problem...
18.TIF

Zdenko Podobny

Jul 3, 2019, 1:36:25 AM
to tesser...@googlegroups.com
I see the same behaviour on Windows. Can you please create an issue?

Zdenko



Laurent Sabourin

Jul 3, 2019, 9:02:51 AM
to tesseract-ocr
Just created issue #2537

