Hyphenated words across line boundaries in text files


Maximilian Hadersbeck

Apr 7, 2017, 7:14:42 AM
to Unitex-GramLab
Hi all users of Unitex

We have problems with words that are hyphenated across line boundaries in text files.
How can they be joined into one word? They always end up as two (unknown) tokens.
For example:

be-
come


is split into two tokens:

be
come

Thank you for your help.

Max Hadersbeck, Munich, CIS



eric.laporte

May 23, 2017, 4:18:42 AM
to Unitex-GramLab
Dear Max,
You could try a preprocessing graph in the Preprocessing/Replace directory that targets patterns of the form <WORD>- " " <WORD> and glues the two words together.
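
If building the graph is not convenient, the same idea can be approximated outside Unitex by cleaning the text file before it is loaded. The following is only a rough sketch in Python, not a Unitex graph: the function name join_hyphenated and the regular expression are illustrative assumptions, and the expression stands in for the <WORD>- " " <WORD> pattern by matching a letter sequence, a hyphen, a line break, and another letter sequence.

```python
import re

# Hypothetical helper, not part of Unitex.
# Pattern: letters, a hyphen, optional trailing blanks, a line break,
# optional leading blanks, then letters again.
# [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
_HYPHEN_BREAK = re.compile(r"([^\W\d_]+)-[ \t]*\r?\n[ \t]*([^\W\d_]+)")


def join_hyphenated(text: str) -> str:
    """Glue word halves that were split by a hyphen at the end of a line."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "They be-\ncome one token."
    print(join_hyphenated(sample))
```

Note that this sketch cannot tell a line-break hyphen from a genuine compound that happens to be broken at its hyphen (for example "well-" followed by "known" on the next line would be glued as well), so a dictionary check on the joined form may be needed for such cases.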
Best wishes,
Eric