Feature request


Warwick Lloyd

Jan 21, 2024, 9:57:45 AM
to text...@googlegroups.com
I’m a longtime TextSoap user. I produce transcripts, which are collections of sentences, and I want to break lines at the end of each sentence to improve readability. I can do this manually or with <br>, but I want to automate the process. Another app does this, but its output is unusable.

Warwick

Mark Munz

Jan 21, 2024, 4:01:45 PM
to text...@googlegroups.com
Matching sentences can be tricky. There are a lot of rules and edge cases, so you may need to make additional adjustments.
If your sentences are well-formed, ending in a period, question mark, or exclamation point, and the start of the next sentence is capitalized, you can do something like this:

[image.png: two attached screenshots of the patterns described below]

The first part, (?<! .. ), is a negative look-behind. It helps prevent the period in an abbreviation from adding a return.
The second part, ([.?!]\s), looks for end punctuation followed by whitespace and captures it as $1.
Finally, ([:uppercase:]) looks for and captures the first capitalized character (which should be the start of the next sentence).

The replacement then puts a return between the end of the sentence and the starting character, including the captured strings so no text is lost.
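
For anyone who wants to try the same idea outside TextSoap, here is a rough Python sketch of the first match and its replacement. It is not the exact pattern from the screenshots: the abbreviation list, the names ABBREVIATIONS, SENTENCE_BREAK, and break_sentences, and the use of [A-Z] in place of the uppercase class are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation list; the contents of the look-behind in the
# original screenshots are not reproduced in the thread.
ABBREVIATIONS = ("Mr", "Mrs", "Dr", "vs")

# One fixed-width negative look-behind per abbreviation, because Python's
# re module rejects look-behinds whose alternatives differ in width.
_lookbehinds = "".join(rf"(?<!\b{re.escape(a)})" for a in ABBREVIATIONS)

SENTENCE_BREAK = re.compile(
    _lookbehinds
    + r"([.?!]\s)"  # group 1: end punctuation plus one whitespace character
    + r"([A-Z])"    # group 2: ASCII-only stand-in for an uppercase class
)

def break_sentences(text: str) -> str:
    """Insert a newline between a sentence end and the next capital letter."""
    # Both captured groups are written back, so no text is lost.
    return SENTENCE_BREAK.sub(r"\1\n\2", text)

if __name__ == "__main__":
    print(break_sentences("Dr. Smith arrived late. Was he ready? Yes! He was."))
```

Because the whitespace is part of the first capture, each line keeps its trailing space before the inserted return. Non-ASCII capitals are not handled by [A-Z], so accented sentence starts would need a wider character class.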
The second match looks for a sentence ending within a quotation, something like "Is this the right answer?"
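
A similar hedged sketch for the quotation case, again in Python rather than TextSoap. The pattern is a guess at the general shape (end punctuation, a closing quote, whitespace, then a capital), not a copy of the second screenshot. QUOTED_BREAK and break_after_quotes are made-up names.

```python
import re

# Assumed shape of the second match: end punctuation, a closing straight or
# curly double quote, one whitespace character, then an ASCII capital.
QUOTED_BREAK = re.compile(
    r'([.?!]["”]\s)'  # group 1: punctuation, closing quote, whitespace
    r"([A-Z])"        # group 2: first character of the next sentence
)

def break_after_quotes(text: str) -> str:
    """Insert a newline after a sentence that ends inside a quotation."""
    return QUOTED_BREAK.sub(r"\1\n\2", text)

if __name__ == "__main__":
    print(break_after_quotes('She asked, "Is this the right answer?" Nobody replied.'))
```

This only fires when the closing quote comes directly after the punctuation, so it can be run alongside the first pattern without the two matching the same spot.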

This is meant as a starting point. There may be additional edge cases, depending on the text you are working with.




--
Mark Munz
unmarked software
https://textsoap.com/
