Hello,
Is it possible to get ALTO output in which hyphens at the end of a line are marked up as hyphenation?
At the moment, a word hyphenated across two lines comes out as:
<String ID="string_15" HPOS="1002" VPOS="606" WIDTH="88" HEIGHT="30" WC="0.92" CONTENT="quin -"/>
</TextLine>
<TextLine ID="line_3" HPOS="493" VPOS="624" WIDTH="560" HEIGHT="48">
<String ID="string_16" HPOS="493" VPOS="624" WIDTH="51" HEIGHT="38" WC="0.92" CONTENT="tal,"/> <SP WIDTH="21" VPOS="624" HPOS="544"/>
What I would need is:
<String ID="string_15" HPOS="1002" VPOS="606" WIDTH="88" HEIGHT="30" WC="0.92" CONTENT="quin-" SUBS_TYPE="HypPart1" SUBS_CONTENT="quintal"/>
<HYP CONTENT="" WIDTH="14" HPOS="..." VPOS="..."/>
</TextLine>
<TextLine ID="line_3" HPOS="493" VPOS="624" WIDTH="560" HEIGHT="48">
<String ID="string_16" HPOS="493" VPOS="624" WIDTH="51" HEIGHT="38" WC="0.92" CONTENT="tal," SUBS_TYPE="HypPart2" SUBS_CONTENT="quintal"/>
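To make the request concrete, here is a rough post-processing sketch in Python of the transformation I have in mind. It is not a feature of the tool, only an illustration. The function name `mark_hyphenation` is my own, and the sketch assumes that every String ending in "-" at the end of a TextLine is a hyphenated word (which is wrong for real dashes), that trailing punctuation should be left out of SUBS_CONTENT, and that the HYP element gets CONTENT="-" with no geometry, since the hyphen's position cannot be derived from the text alone.

```python
import xml.etree.ElementTree as ET

TRAILING_PUNCT = ".,;:!?"


def local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def mark_hyphenation(root):
    """Add SUBS_TYPE / SUBS_CONTENT and a HYP element for line-final hyphens."""
    lines = [el for el in root.iter() if local(el.tag) == "TextLine"]
    for line, next_line in zip(lines, lines[1:]):
        strings = [el for el in line if local(el.tag) == "String"]
        next_strings = [el for el in next_line if local(el.tag) == "String"]
        if not strings or not next_strings:
            continue
        part1, part2 = strings[-1], next_strings[0]
        content = part1.get("CONTENT", "")
        if not content.endswith("-"):
            continue
        stem = content[:-1].rstrip()  # "quin -" and "quin-" both give "quin"
        if not stem:
            continue  # a lone dash is not a hyphenated word
        # Assumption: punctuation after the second half is not part of the word.
        whole = stem + part2.get("CONTENT", "").rstrip(TRAILING_PUNCT)
        part1.set("CONTENT", stem + "-")
        part1.set("SUBS_TYPE", "HypPart1")
        part1.set("SUBS_CONTENT", whole)
        part2.set("SUBS_TYPE", "HypPart2")
        part2.set("SUBS_CONTENT", whole)
        # Reuse the namespace prefix of the String tag (empty if there is none).
        hyp_tag = part1.tag[: -len("String")] + "HYP"
        hyp = ET.Element(hyp_tag, {"CONTENT": "-"})  # geometry left out on purpose
        line.insert(list(line).index(part1) + 1, hyp)
    return root
```

Usage would be something like `mark_hyphenation(ET.parse("page.xml").getroot())`, where `page.xml` is a placeholder file name. Having this produced directly by the OCR output would of course be better, since only the engine knows the real position and width of the hyphen.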
The same question applies to isolating punctuation: a "." at the end of a line, ",", ";", etc. When these characters are attached to a word inside CONTENT, an exact search for that word fails.
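For the punctuation, what I mean could be sketched like this in Python. Again, this is only an illustration under assumptions of my own: the function name `split_trailing_punct` and the `_punct` ID suffix are made up, coordinates are assumed to be integers as in my sample, and the width is divided in proportion to the number of characters, which is only a rough estimate of the real glyph boxes.

```python
import xml.etree.ElementTree as ET

PUNCT = ".,;:!?"


def split_trailing_punct(line):
    """Split e.g. CONTENT="tal," into two String elements, "tal" and ","."""
    for string in list(line):  # iterate over a snapshot, since we insert below
        if not string.tag.endswith("String"):
            continue
        content = string.get("CONTENT", "")
        word = content.rstrip(PUNCT)
        tail = content[len(word):]
        if not word or not tail:
            continue  # nothing to split, or the string is punctuation only
        punct = ET.Element(string.tag, dict(string.attrib))
        punct.set("ID", string.get("ID", "string") + "_punct")  # hypothetical ID scheme
        punct.set("CONTENT", tail)
        # Hyphenation attributes belong to the word, not to the punctuation.
        punct.attrib.pop("SUBS_TYPE", None)
        punct.attrib.pop("SUBS_CONTENT", None)
        string.set("CONTENT", word)
        if "HPOS" in string.attrib and "WIDTH" in string.attrib:
            hpos, width = int(string.get("HPOS")), int(string.get("WIDTH"))
            # Assumption: every character is equally wide (rough estimate only).
            word_width = width * len(word) // len(content)
            string.set("WIDTH", str(word_width))
            punct.set("HPOS", str(hpos + word_width))
            punct.set("WIDTH", str(width - word_width))
        line.insert(list(line).index(string) + 1, punct)
    return line
```

With my example, "tal," (HPOS 493, WIDTH 51) would become "tal" (WIDTH 38) followed by "," (HPOS 531, WIDTH 13). The real boxes from the OCR engine would be more accurate than this estimate, which is why I am asking whether the output itself can do it.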
Thank you for your feedback, and sorry if this has already been answered in the discussions; I could not find it.