Remove Extraneous Returns in OCR'd Text

Cristiano Richers

Dec 3, 2024, 6:07:03 AM
to TextSoap
Hello all. I'm starting to use TextSoap again after a long time.

I need to clean up the formatting of text acquired through OCR. The OCR program inserts a return at the end of each line to preserve the original width of the text. What process should I use to replace these extra returns with a space while keeping the returns that separate paragraphs? A paragraph break is marked by two returns in direct sequence.
Part of the text looks like this in the Clipboard Workspace (attached image: Extra paragraphs.jpg).
What are the double stars in front of the paragraph returns?
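
To make the rule for the returns concrete, here is a minimal sketch in Python rather than in TextSoap itself. It assumes that a return standing alone is a line wrap and that two or more returns in a row are a paragraph break. The function name `unwrap_ocr_lines` is hypothetical, and whether the same pattern can be used as-is in a TextSoap custom cleaner is an assumption not verified here.

```python
import re


def unwrap_ocr_lines(text: str) -> str:
    """Replace lone returns with a space; keep runs of two or more returns."""
    # Normalize Windows (\r\n) and classic Mac (\r) returns to \n first,
    # so the pattern below only has to deal with one kind of return.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Match a return that is neither preceded nor followed by another return.
    # Returns inside a run of two or more are skipped, so paragraph breaks stay.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


if __name__ == "__main__":
    sample = "line one\nline two\n\nnext para\nmore"
    print(unwrap_ocr_lines(sample))
```

The lookbehind `(?<!\n)` and lookahead `(?!\n)` are what limit the replacement to single returns. Spaces already present at the end of a wrapped line are not trimmed in this sketch, so a follow-up pass to collapse double spaces may be needed.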