> On May 29, 2023, at 9:20 AM, Kim Mosley <mrkim...@gmail.com> wrote:
> No, that doesn’t work here because there is a \n at the end of every line… and it is not soft wrapped. When I do what you suggested I get this:
How much text are you dealing with? Is this a repeating task? Do you have any control over the original texts you’re working on?
I would try to remove those line-breaks internal to the paragraphs as early in my processing as possible, or alter the form they come to me in if I can control that. Each paragraph (which might have multiple sentences) should be on one line with soft-wrapping off in the text editor (BBEdit).
Here’s a possible hacky approach, from looking at the sample text you posted:
Hacky Step 1: Insert newlines before “paragraphs”:
Search for an uppercase letter at the start of a line:
^([A-Z])
and replace it with:
\n\1
BBEdit has great documentation of the above find/replace expressions, but briefly,
^ denotes the start of a line;
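[A-Z] stands for one character in the range A to Z (uppercase standard English letters, so I’m making some assumptions here; also make sure “Case sensitive” is checked in the Find window, or [A-Z] will match lowercase letters too);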
() around the [A-Z] “captures” whatever character it found.
In the replace pattern, \n is our friend the newline, and \1 stands for what was captured in the search pattern - that first character of the line.
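That will put a newline “\n” in front of what is likely to be the start of one of your paragraphs, so you won’t have to insert most of them by hand. But you will need to check the result, because this isn’t bullet-proof: any line that happens to start with a capital letter in the middle of a paragraph (a name, or the word “I”) will get a newline too.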
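If you want to try the Step 1 pattern outside BBEdit, or script it, here is a minimal Python sketch. It assumes Python’s `re` module behaves closely enough to BBEdit’s grep for this pattern; one difference is that Python needs `re.MULTILINE` for `^` to match at the start of every line. The function name `mark_paragraph_starts` is just made up for illustration.

```python
import re


def mark_paragraph_starts(text: str) -> str:
    """Hacky Step 1: insert an extra newline before every line that
    starts with an uppercase A-Z letter (a likely paragraph start)."""
    # re.MULTILINE makes ^ match at the start of each line, as in BBEdit.
    # Python's re is case-sensitive by default, so [A-Z] stays uppercase-only.
    return re.sub(r'^([A-Z])', r'\n\1', text, flags=re.MULTILINE)


sample = "The first line\nof a paragraph.\nAnother paragraph\nstarts here.\n"
# repr() shows the newlines, a bit like BBEdit's Show Invisibles.
print(repr(mark_paragraph_starts(sample)))
# prints '\nThe first line\nof a paragraph.\n\nAnother paragraph\nstarts here.\n'
```

Note that the very first line of the file also gets a newline in front of it, which leaves a harmless blank line at the top.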
Step 2: Remove internal line breaks:
Once you have those newlines inserted before each set of lines that you consider a paragraph, you'll want to remove the line breaks within the paragraphs.
After Hacky Step 1, your “paragraphs” have single newlines inside them and are separated from each other by two newlines (a blank line). So the internal newlines you want to find are each a “\n” with a non-newline character (a space-bar space, a letter, or punctuation) on both sides, rather than another newline.
Minimalistic search pattern:
([^\n])\n([^\n])
This finds a \n with any character NOT an \n on either side; the ^ inside the square brackets means “not”.
Replace with:
\1 \2
This puts the two captured characters back, with a space-bar space between them where the \n was. This might not be exactly what you need, depending on how those internal line breaks originally got into your “paragraphs”. There might be some extra space-bar spaces lying around, for instance where a line already ended with a space.
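Here is the same Step 2 search and replace as a minimal Python sketch, again assuming Python’s `re` is a close enough stand-in for BBEdit’s grep. The function name `join_internal_line_breaks` is hypothetical. It shows that blank-line separators (two newlines in a row) survive while single newlines inside a paragraph become spaces.

```python
import re


def join_internal_line_breaks(text: str) -> str:
    """Step 2: replace a newline that has a non-newline character on
    both sides with a single space. Blank lines between paragraphs
    are left alone because [^\\n] can't match a newline."""
    # Group 1 is the character before the break, group 2 the one after;
    # both are put back with a space between them.
    return re.sub(r'([^\n])\n([^\n])', r'\1 \2', text)


after_step_1 = "\nThe first line\nof a paragraph.\n\nAnother paragraph\nstarts here.\n"
print(repr(join_internal_line_breaks(after_step_1)))
# prints '\nThe first line of a paragraph.\n\nAnother paragraph starts here.\n'
```

One quirk: each match also uses up the character after the line break, so a line that is only one character long can keep its break (`"a\nb\nc"` becomes `"a b\nc"`). In Python, replacing the lookaround pattern `(?<=[^\n])\n(?=[^\n])` with a single space avoids that.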
So test on a copy of your file. Turning on BBEdit’s “Show Invisibles” can also be helpful for this work.
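If this turns into a repeating task, both steps can run as one small script that never touches the original file. This is a minimal Python sketch under the same assumptions as above; the `-reflowed` output name and the functions `reflow` and `reflow_file` are made up for illustration, and the file is assumed to be UTF-8.

```python
import re
import sys
from pathlib import Path


def reflow(text: str) -> str:
    # Hacky Step 1: newline before lines that start with an uppercase A-Z letter.
    text = re.sub(r'^([A-Z])', r'\n\1', text, flags=re.MULTILINE)
    # Step 2: join lines inside each paragraph with a single space.
    return re.sub(r'([^\n])\n([^\n])', r'\1 \2', text)


def reflow_file(src: Path) -> Path:
    """Write the reflowed text to a new file next to src (hypothetical
    "-reflowed" suffix), leaving the original untouched."""
    dst = src.with_name(src.stem + "-reflowed" + src.suffix)
    dst.write_text(reflow(src.read_text(encoding="utf-8")), encoding="utf-8")
    return dst


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"Wrote {reflow_file(Path(sys.argv[1]))}")
```

Run it as `python reflow.py notes.txt` (both names hypothetical) and open the new file in BBEdit with Show Invisibles on to check where the paragraph breaks landed.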
> To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/B4C0D58D-A103-4AE3-AD1F-BFA1DE12BB1E%40gmail.com.