Is there a way to scan only the first word of a page?

Vikas Sharma

Apr 19, 2019, 8:49:18 AM
to tesseract-ocr

Hello guys,

I am trying to identify a page's category by recognizing only the first word on the page, but the pages can contain much more text, so OCR takes a long time. I just want to limit scanning to one word. I have tried the psm option, but no luck there.

Zdenko Podobny

Apr 19, 2019, 9:02:58 AM
to tesser...@googlegroups.com
The simple answer is no: you cannot limit OCR to the first word.

But you can restrict the area for OCR via a uzn file (search the forum for that). If you know that the text you need is always in a particular part of the image, you can define that area of interest in the uzn file.
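
For illustration, here is a minimal Python sketch of generating such a zone file. It rests on assumptions that are not stated in this thread and should be checked against your Tesseract version: that the zone file shares the image's base name with a .uzn extension, that each line has the form "left top width height label" in pixels, and that Tesseract only reads it in certain page segmentation modes (psm 4 is the one usually mentioned). The helper names are hypothetical.

```python
from pathlib import Path


def format_uzn(zones):
    """Build zone-file text from (left, top, width, height, label) tuples.

    Assumed line layout: "left top width height label" (pixels).
    """
    lines = []
    for left, top, width, height, label in zones:
        if width <= 0 or height <= 0:
            raise ValueError("zone width and height must be positive")
        lines.append(f"{left} {top} {width} {height} {label}")
    return "\n".join(lines) + "\n"


def write_uzn(image_path, zones):
    """Write the zone file next to the image (assumed name: <image base>.uzn)."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text(format_uzn(zones))
    return uzn_path
```

For example, write_uzn("page.png", [(0, 0, 800, 120, "Header")]) would describe a single 800x120 pixel zone at the top-left corner of page.png.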

Zdenko

Lorenzo Bolzani

Apr 19, 2019, 1:17:16 PM
to tesser...@googlegroups.com
Hi,
If the page has a fixed, simple format, you can crop the image so that only the upper part is left. You can use ImageMagick, a Python script, etc.
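
A rough sketch of the Python route, assuming the Pillow and pytesseract packages are installed and that the first word always sits in the top 15% of the page (both are assumptions, as are the function names and the default "--psm 6" setting; adjust them to your layout):

```python
def top_strip_box(width, height, fraction=0.15):
    """Return a (left, upper, right, lower) box covering the top part of a page."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one pixel row so the crop is never empty.
    return (0, 0, width, max(1, round(height * fraction)))


def first_word(path, fraction=0.15, config="--psm 6"):
    """OCR only the top strip of the image and return its first word, or None."""
    # Imported here so top_strip_box works even without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as page:
        strip = page.crop(top_strip_box(*page.size, fraction))
        text = pytesseract.image_to_string(strip, config=config)
    words = text.split()
    return words[0] if words else None
```

Calling first_word("page.png") then runs OCR on the strip only, so the cost should depend on the strip size rather than on the whole page.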

Lorenzo