Help Correcting a "Fast Typing Cleanup"


Joe

Feb 26, 2025, 3:35:04 PM
to TextSoap
I put together a group of regex transformations that I hoped would do the following. The headings are in Markdown format:
  1. Capitalize the First Letter of Each Sentence
  2. Capitalize 'i' When It's a Standalone Word
  3. Sentence Case Headings
  4. Convert Double Capitals to Single Capitals
  5. Add Periods to Sentences
For whatever reason, when I run the cleaner, it converts all my text into 1s and 2s. Very odd. Can I get some help fixing the cleaner? I've attached it.

12d I 1212 12u. UU.I 1212 121212 I 1212 1212. UU.I 1212 12 1212 1212 12 1212e 12u. 121212d 12r 12 12121212 1212 I 121212t I 1212d 1212y 12t 1212 12121212 12.

12e 1212121212 1212y 12s 12e 121212t 121212e I 12t, 1212121212r 12d 1212. 12e 121212 I 121212 1212 1212 12t 121212t, 1212121212 12d 1212 121212s
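Rules #3 and #4 could look something like this. What follows is a minimal Python sketch, not the attached TextSoap cleaner. It assumes that a "heading" is a Markdown line starting with one to six `#` characters, and that a "double capital" is a word whose first two letters are capitals followed by a lowercase letter (for example "THis"). Both function names are hypothetical. Python's `re` has no `$l`-style case modifiers, so small functions handle the case changes.

```python
import re

# Assumption: a heading is a Markdown line that starts with 1-6 "#" characters.
HEADING = re.compile(r"^(#{1,6}[ \t]+)(.+)$", re.MULTILINE)
# Assumption: a "double capital" is two capitals at the start of a word followed
# by a lowercase letter, e.g. "THis" (acronyms such as "NASA" are left alone).
DOUBLE_CAPITAL = re.compile(r"\b([A-Z])([A-Z])(?=[a-z])")


def sentence_case_headings(text: str) -> str:
    # Keep the "## " prefix, uppercase the first letter, lowercase the rest.
    # Caveat: this also lowercases proper nouns and the word "I" in headings.
    def fix(m: re.Match) -> str:
        title = m.group(2)
        return m.group(1) + title[:1].upper() + title[1:].lower()

    return HEADING.sub(fix, text)


def fix_double_capitals(text: str) -> str:
    # "THis" -> "This": keep the first capital, lowercase the second one.
    return DOUBLE_CAPITAL.sub(lambda m: m.group(1) + m.group(2).lower(), text)
```

For example, `sentence_case_headings("## This Is A Heading")` gives `"## This is a heading"`, and `fix_double_capitals("THis is a TEst of NASA")` gives `"This is a Test of NASA"`.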


Joe

Feb 26, 2025, 3:37:01 PM
to TextSoap
#5 is only supposed to apply when there is an earlier sentence in the same paragraph. If it's a single line, it shouldn't add a period.
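A minimal Python sketch of that rule, under two assumptions: a paragraph is text separated by blank lines, and an "earlier sentence" exists whenever a `.`, `?`, or `!` is followed by whitespace and more text. `add_final_periods` is a hypothetical name. Abbreviations such as "Mr. Smith" would be miscounted as sentence breaks.

```python
import re

# Blank lines separate paragraphs; the capturing group keeps the separators.
PARAGRAPH_SPLIT = re.compile(r"(\n\s*\n)")
# Sentence-ending punctuation followed by whitespace and more text.
EARLIER_SENTENCE = re.compile(r"[.?!]\s+\S")


def add_final_periods(text: str) -> str:
    parts = PARAGRAPH_SPLIT.split(text)
    # Even indices are paragraphs, odd indices are the blank-line separators.
    for i in range(0, len(parts), 2):
        para = parts[i]
        stripped = para.rstrip()
        if not stripped or stripped[-1] in ".?!":
            continue  # empty or already terminated
        if EARLIER_SENTENCE.search(stripped):
            # Add the period before any trailing whitespace.
            parts[i] = stripped + "." + para[len(stripped):]
    return "".join(parts)
```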

Joe

Feb 26, 2025, 3:39:20 PM
to TextSoap

Mark Munz

Feb 26, 2025, 5:52:48 PM
to text...@googlegroups.com
I'll break these up, so we can focus on each one.

[screenshot]
I think you can get away with this for capitalizing "i". I don't think you need the lookbehind/lookahead for the word-break matches, and since a space counts as a word break, it will work for:
all i want is to do things this way.
Never have i.
(i want to do this)

If there is a scenario where it doesn't work, you might need a more complicated match.
You could use "I" as the replacement, but I wanted to demonstrate how to use the case modifiers with a capture group.
In this case, $u means "capitalize the next letter," and $1 is the capture group, so the next letter is whatever was captured, which happens to be "i".
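The same idea in Python, as a sketch rather than TextSoap's engine: `\b` provides the word breaks, and because Python's `re` has no `$u` modifier, a small function uppercases capture group 1. `capitalize_standalone_i` is a hypothetical name.

```python
import re

# \b is a word boundary, so a lone "i" matches but the "i" in "this" does not.
STANDALONE_I = re.compile(r"\b(i)\b")


def capitalize_standalone_i(text: str) -> str:
    # Python has no $u case modifier, so a function uppercases group 1.
    return STANDALONE_I.sub(lambda m: m.group(1).upper(), text)
```

One scenario where a more complicated match would be needed: "i.e." also matches, because `.` counts as a word break, so it becomes "I.e.".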







--
Mark Munz
unmarked software
https://textsoap.com/

Mark Munz

Feb 26, 2025, 5:58:13 PM
to text...@googlegroups.com

Mark Munz

Feb 26, 2025, 6:08:49 PM
to text...@googlegroups.com
[screenshot]
One thing to note: TextSoap uses ICU syntax, so capture groups in the replacement are written as $1 and $2 instead of \1 and \2.

[screenshot]
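To illustrate the difference, here is a hypothetical Python helper (`pcre_to_textsoap` is not a TextSoap feature) that rewrites PCRE-style replacement tokens into the `$` form. In ICU replacement text, a backslash only makes the next character literal, so `\1` comes out as a plain "1", which is likely where the 1s and 2s came from. This sketch does not handle an escaped backslash such as `\\1`.

```python
import re

# Hypothetical helper: turn \1, \u, \U, \l, \L, \E into $1, $u, $U, $l, $L, $E.
# Escaped backslashes (e.g. "\\1") are not handled; this is only a sketch.
PCRE_TOKEN = re.compile(r"\\([0-9]|[uUlLE])")


def pcre_to_textsoap(replacement: str) -> str:
    # In Python's replacement template, "$" is literal and \1 is group 1.
    return PCRE_TOKEN.sub(r"$\1", replacement)
```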


Mark Munz

Feb 26, 2025, 6:21:04 PM
to text...@googlegroups.com
[screenshot]
If you want to modify the case of captured groups, TextSoap uses $u, $U, $l, $L, and $E instead of PCRE's backslash syntax.
Note: these are not part of the ICU standard, which does not support this kind of case transformation.

There is a bug where \U and \u are highlighted, because TextSoap used to support \uhhhh and \Uhhhhhhhh escapes, but that proved problematic.
The next update of TextSoap will no longer highlight them in blue.
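Python's `re` has no equivalent of these modifiers, so here is a hypothetical mini-interpreter (`expand` is an illustrative name) for replacement strings such as `$u$1`. It assumes Perl-like semantics: `$u`/`$l` change the case of the next character only, `$U`/`$L` apply until `$E` or the end, and `$0` to `$9` insert capture groups. TextSoap's exact rules may differ in edge cases.

```python
import re

# Matches $0-$9 (capture groups) and the case modifiers $u $U $l $L $E.
TOKEN = re.compile(r"\$([0-9]|[uUlLE])")


def expand(template: str, m: re.Match) -> str:
    out = []
    one_shot = None   # "u" or "l": applies to the next emitted character only
    span_mode = None  # "U" or "L": applies until "$E" or the end of the template

    def emit(text: str) -> None:
        nonlocal one_shot
        for ch in text:
            if span_mode == "U":
                ch = ch.upper()
            elif span_mode == "L":
                ch = ch.lower()
            if one_shot == "u":
                ch = ch.upper()
            elif one_shot == "l":
                ch = ch.lower()
            one_shot = None
            out.append(ch)

    pos = 0
    for tok in TOKEN.finditer(template):
        emit(template[pos:tok.start()])  # literal text before the token
        code = tok.group(1)
        if code.isdigit():
            emit(m.group(int(code)) or "")  # unmatched optional group -> ""
        elif code in "ul":
            one_shot = code
        elif code in "UL":
            span_mode = code
        else:  # "E" ends a $U / $L span
            span_mode = None
        pos = tok.end()
    emit(template[pos:])  # trailing literal text
    return "".join(out)
```

For example, `re.sub(r"\b(i)\b", lambda m: expand("$u$1", m), "never have i.")` gives `"never have I."`.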

For fun, I tweaked this to be more general, so it handles periods and question marks, and any Unicode character considered lowercase.

[screenshot]
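A rough Python equivalent of the generalized lookbehind version. The exact pattern is only in the screenshot, so this is an assumption. Python's `re` has no `\p{Ll}` and only allows fixed-width lookbehinds, so the sketch matches any letter that follows a period or question mark plus exactly one whitespace character, and uppercases it. Letters that are already uppercase pass through unchanged.

```python
import re

# Fixed-width lookbehind: "." or "?" plus exactly one whitespace character.
# [^\W\d_] matches any Unicode letter (Python's re has no \p{Ll}).
AFTER_SENTENCE_END = re.compile(r"(?<=[.?]\s)([^\W\d_])")


def capitalize_sentence_starts(text: str) -> str:
    return AFTER_SENTENCE_END.sub(lambda m: m.group(1).upper(), text)
```

The first word of the text is not touched, since nothing precedes it.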

In general, lookaheads and lookbehinds are a bit slower, so alternatively, you could do it like this:
[screenshot]
Here, we just capture the end character, the space, and the lowercase start of the next sentence.
Then, in the replacement, we use $U, which uppercases all the letters that follow (and the only letter there is the first letter of the sentence).
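The no-lookaround version, sketched the same way in Python: all three pieces are captured and written back, with only the letter uppercased (Python has no `$U`, so a function does it). Because nothing sits inside a lookbehind, `\s+` can match any run of spaces or line breaks. The function name is hypothetical.

```python
import re

# Capture the sentence-ending character, the whitespace, and the first letter.
SENTENCE_BREAK = re.compile(r"([.?])(\s+)([^\W\d_])")


def capitalize_sentence_starts_no_lookaround(text: str) -> str:
    # Rebuild the match, uppercasing only the letter (group 3).
    return SENTENCE_BREAK.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(), text
    )
```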

Mark Munz

Feb 26, 2025, 6:34:34 PM
to text...@googlegroups.com
[screenshot]
I'll be honest: I'm not sure what these two are meant to do. I changed the \L and \U to $L and $U.
#1 appears to change an uppercase letter at the beginning of a line to lowercase.

#2 uses the same match as the capitalization action earlier but adds a period after the capitalized letter.
I don't see how that match would still occur once you've already capitalized the first letter of each sentence.

An example of the text you're trying to change would help.

Also, be sure to check out Help > Regular Expression Syntax in the app for a more complete syntax reference (including $u, $U, $l, $L, $E).

