Passing config to image_to_pdf_or_hocr for pytesseract


Raphael Alabi

Sep 18, 2019, 12:37:20 AM

to tesseract-ocr

How does one pass the hocr_font_type 1 parameter in config to get font type information through OCR?

I am a bit lost as to how this is done.

Zdenko Podobny

Sep 18, 2019, 2:45:19 AM

to tesser...@googlegroups.com

Hi,

This should work (as an example of how to use configs in pytesseract):

import os
import pytesseract
from PIL import Image

# configuration
pytesseract.pytesseract.tesseract_cmd = r"f:\win64_llvm\bin\tesseract.exe"
os.environ["TESSDATA_PREFIX"] = r"f:\Project-Personal\tessdata_best\tessdata"
custom_configs = r'-c hocr_font_type=1'

# OCR
img = Image.open(r"image.png")
hocr = pytesseract.image_to_pdf_or_hocr(img, extension='hocr', config=custom_configs)

But as far as I remember, Tesseract 4.x does not provide font type information.
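
To check whether a given build actually emits font information, one option is to scan the returned hOCR for font properties. The sketch below is only an illustration and rests on assumptions: that the font data, when present, appears as `x_font` and `x_fsize` entries inside the `title` attribute of the hOCR elements, and that the output is UTF-8 encoded bytes. The helper name `find_font_properties` is hypothetical, not part of pytesseract. Also note that, as far as I know, the variable listed in Tesseract's parameters for this is `hocr_font_info`; running `tesseract --print-parameters` should show which names your build accepts.

```python
import re

# Matches title='...' or title="..." attributes in the hOCR markup.
TITLE_RE = re.compile(r"""title=(["'])(.*?)\1""")


def find_font_properties(hocr_bytes):
    """Collect x_font / x_fsize properties from hOCR title attributes.

    Assumes hocr_bytes is UTF-8 encoded hOCR. Returns a list of dicts,
    one per element that carries at least one font property.
    """
    found = []
    for _quote, title in TITLE_RE.findall(hocr_bytes.decode("utf-8")):
        props = {}
        # hOCR properties are separated by ';', e.g. "bbox 1 2 3 4; x_wconf 95"
        for part in title.split(";"):
            key, _, value = part.strip().partition(" ")
            if key in ("x_font", "x_fsize"):
                props[key] = value
        if props:
            found.append(props)
    return found
```

Calling `find_font_properties(hocr)` on the `hocr` value from the snippet above returns an empty list when no font properties were written, which would be consistent with Tesseract 4.x not providing them.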

Zdenko


