Hyphenation postprocessing


Lars Aronsson

Feb 5, 2023, 9:57:51 PM
to tesser...@googlegroups.com
Is it possible to instruct Tesseract, given the image:

 Let us build a snow-
 man on the lawn.

to output in txt format:

 Let us build a
 snowman on the lawn.

This would almost preserve line breaks, while at
the same time making hyphenated words whole
and searchable.

It seems to me that the source has code to recognize
hyphenated words, and it should be possible to
implement this behaviour as an option.
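
Until such an option exists, the same effect can be approximated by
post-processing the txt output. Below is a minimal sketch in Python, not
Tesseract code: the function name join_hyphenated is hypothetical, and it
assumes that a line ending in a hyphen, followed by a line starting with a
lowercase letter, is a word split across the line break. Genuine hyphens
at line ends (such as "well-" followed by "known") would be joined
incorrectly, so this is only a heuristic.

```python
import re


def join_hyphenated(text: str) -> str:
    """Move the first half of a word split by a line-end hyphen to the next line."""
    lines = text.split("\n")
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        nxt = lines[i + 1].lstrip()
        # Word characters followed by a hyphen at the very end of the line.
        m = re.search(r"(\w+)-$", current)
        # Only join when the next line continues in lowercase (heuristic).
        if m and nxt[:1].islower():
            indent = lines[i + 1][: len(lines[i + 1]) - len(nxt)]
            # Drop the word fragment and the hyphen from the current line.
            lines[i] = current[: m.start()].rstrip()
            # Prepend the fragment to the next line, keeping its indentation.
            lines[i + 1] = indent + m.group(1) + nxt
    return "\n".join(lines)


if __name__ == "__main__":
    print(join_hyphenated(" Let us build a snow-\n man on the lawn."))
```

Applied to the example above, this moves "snow" down to the second line
and leaves the number of lines unchanged.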


--
Lars Aronsson (la...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/


Zdenko Podobny

Feb 7, 2023, 2:54:58 PM
to tesser...@googlegroups.com
There is a (similar) feature request:


Zdenko


On Mon, Feb 6, 2023 at 3:57, Lars Aronsson <la...@aronsson.se> wrote: