Chopping up text for TTS


Greg Miller

Feb 15, 2026, 9:58:33 AM
to TextSoap
I have a long piece of text that I need to break into chunks for a text-to-speech protocol. I have a rough limit of 10000 characters per chunk, including spaces.

So, can I do the following:

1. Set a marker (we'll say "----BREAK----") at the 10000 character limit, UNLESS it is in the middle of the line/paragraph?

2. If the marker would land in the middle, put it at either the beginning or the end of that line instead (and then continue)?

Mark Munz

Feb 15, 2026, 3:16:17 PM
to text...@googlegroups.com
You may need to tweak the number here, but you'll want to create a custom cleaner and use a Regex Find Replace Action.
[Attachment: image.png]
With a 10000-character limit, I used 9990 (10 fewer) as the number, on the assumption that finishing the current word would take fewer than 10 additional characters. You can lower this number to 9950 or even 9900, depending on how much headroom you want before each break.

What it does:
It matches 9990 characters (. = any character). After that, it matches one or more word characters, so the match runs to the end of the current word.
Then it "replaces" the match with the match itself + the break text.
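
For anyone who wants to try the same idea outside TextSoap, here is a minimal Python sketch of the description above. The exact pattern from the screenshot is not reproduced in this thread, so the pattern, the function name insert_breaks, and the limit/slack parameters are all assumptions for illustration. The re.DOTALL flag is also an assumption: it lets "." cross line breaks, which the thread does not confirm either way for TextSoap.

```python
import re

BREAK = "----BREAK----"

def insert_breaks(text: str, limit: int = 10000, slack: int = 10) -> str:
    """Insert BREAK after roughly (limit - slack) characters, at a word end.

    Hypothetical helper: mirrors the described regex, not TextSoap's internals.
    """
    # (limit - slack) characters of any kind, then the rest of the current word.
    # DOTALL is assumed so that "." can also match newlines.
    pattern = re.compile(r".{%d}\w+" % (limit - slack), re.DOTALL)
    # A function replacement avoids any escaping issues in the break text.
    return pattern.sub(lambda m: m.group(0) + "\n" + BREAK + "\n", text)

# Small-number check, in the spirit of testing with lower character counts.
print(insert_breaks("aaa bbb ccc ddd eee", limit=10, slack=4))
```

Two edge cases are worth knowing about: a word longer than the slack can push a chunk past the limit, and if the character right after the counted run is not a word character, the match starts one position later, so that chunk grows by one character.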

I tested it with lower character counts, and it seems to work as expected, although there may be edge cases.
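
The original question asked for the marker to sit at a line/paragraph boundary rather than mid-line. As a rough alternative sketch (plain Python, not a TextSoap feature; the name chunk_by_lines is made up for illustration), the text can be walked line by line and the marker emitted before any line that would push the current chunk over the limit:

```python
BREAK = "----BREAK----"

def chunk_by_lines(text: str, limit: int = 10000) -> str:
    """Place BREAK only between lines so no chunk exceeds `limit` characters.

    Hypothetical helper. A single line longer than `limit` is left as its
    own oversized chunk rather than being split.
    """
    out = []
    size = 0  # characters in the current chunk, newlines included
    for line in text.splitlines(keepends=True):
        # Start a new chunk if this line would not fit in the current one.
        if size and size + len(line) > limit:
            out.append(BREAK + "\n")
            size = 0
        out.append(line)
        size += len(line)
    return "".join(out)

print(chunk_by_lines("aaaa\nbbbb\ncccc\n", limit=10))
```

This counts line endings toward the limit, which matches "including spaces" only loosely; adjust the count if the TTS service measures length differently.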


--
Mark Munz
unmarked software
https://textsoap.com/
