ocropus doesn't detect word breaks correctly in Fraktur text


wasi99

Jan 14, 2014, 7:36:51 PM
to ocr...@googlegroups.com
I use Ocropus to recognize a large amount of early modern text set in Fraktur, and I'm very impressed by its accuracy. While it recognizes individual characters quite well, it doesn't seem to separate words correctly in my use cases.

Example (pre-processed):

Output: MittlcindischenMeer gehabt-welcherso bald die Sonn untergangen kein siickgesehenxund aber durchdas essenrauer Leberen von Hünerenist zu rechtgebrachtworden ) von diserLeberdes Fisches
What I'd expect: Mittlcindischen Meer gehabt-welcher so bald die Sonn untergangen kein siick gesehen und aber durch das essen rauer Leberen von Hünerenist zu recht gebracht worden ) von diser Leber des Fisches

The spaces look quite clear to me relative to the character size.
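
To check whether the word gaps really are measurably wider than the gaps between letters, one option is to measure them directly on a binarized line image. The sketch below is not how Ocropus itself decides on spaces; it is an independent check under stated assumptions. It takes a line image as a list of rows with 1 = ink, uses the inked line height as a crude proxy for character size, and treats any blank-column gap at least `min_gap_ratio` times that height as a word break. The function names and the 0.3 ratio are hypothetical choices for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels, 1 = ink, 0 = background


def ink_height(img: Image) -> int:
    """Rows spanned from the first to the last row containing ink (crude character-size proxy)."""
    rows = [i for i, row in enumerate(img) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def column_gaps(img: Image) -> List[Tuple[int, int]]:
    """(start, end) ranges (end exclusive) of blank columns lying between inked columns."""
    width = len(img[0]) if img else 0
    gaps = []
    start = None
    seen_ink = False
    for c in range(width):
        if any(row[c] for row in img):
            # A blank run is only a gap if ink appeared before it (skip the left margin).
            if start is not None and seen_ink:
                gaps.append((start, c))
            start = None
            seen_ink = True
        elif start is None:
            start = c
    # A trailing blank run (right margin) is never appended.
    return gaps


def word_spans(img: Image, min_gap_ratio: float = 0.3) -> List[Tuple[int, int]]:
    """Split the line into word column spans at gaps wider than min_gap_ratio * ink height."""
    width = len(img[0]) if img else 0
    inked_cols = [c for c in range(width) if any(row[c] for row in img)]
    if not inked_cols:
        return []
    threshold = min_gap_ratio * ink_height(img)  # hypothetical heuristic threshold
    spans = []
    start = inked_cols[0]
    for gap_start, gap_end in column_gaps(img):
        if gap_end - gap_start >= threshold:
            spans.append((start, gap_start))
            start = gap_end
    spans.append((start, inked_cols[-1] + 1))
    return spans


# Toy line: two "letters" with a 1-column gap, then a 4-column word gap, then a third letter.
line = [[int(ch) for ch in "0110110000111"] for _ in range(4)]
print(word_spans(line))
```

On this toy line the 1-column gap stays inside the first word, while the 4-column gap exceeds 0.3 × 4 = 1.2 columns and splits the line into two words. If real Fraktur lines show a clean separation between these two gap populations, the missing spaces are more likely a recognizer or model issue than an image-quality one.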

Is there anything I can do to improve how Ocropus handles spaces?


Thanks in advance.