Remove line returns from within a paragraph but not between paragraphs

Gavin Brooks

Nov 3, 2021, 6:02:07 AM
to BBEdit Talk
I have a lot of text that was copied from a PDF and that I need to turn into text files for research purposes. When the text was copied over, it ended up with a hard line return at the end of every line and a blank space between paragraphs. I need the lines within each paragraph joined while the paragraphs remain separate, and I am looking for a regex to do that:

For example, the text looks like this:

This is the 1st paragraph. 
With a few lines that
need connected.

This is another paragraph with
a blank line between
it and the previous paragraph.

I want it to look like this:

This is the 1st paragraph. With a few lines that need connected.

This is another paragraph with a blank line between it and the previous paragraph.

The best that I have been able to come up with is this:

([^\r\n])\R(?=[^\r\n])

But that will also remove the last character of each line, so I get this:

This is the 1st paragraph With a few lines tha need connected.

This is another paragraph wit a blank line betwee it and the previous paragraph.

Any suggestions about how to rewrite the above so that it does not remove the final character?

Gavin
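
For readers following along outside BBEdit: the symptom described above is what one would expect if the replacement were a single space with no backreference, because the last character of each line is consumed by group 1. The original post does not show the replacement string, so that is an assumption. The sketch below reproduces it in Python; Python's `re` has no `\R`, so the alternation `(?:\r\n|\r|\n)` stands in for it, and the sample text is shortened.

```python
import re

# Python's re module lacks \R, so this alternation stands in for "any line break".
LINE_BREAK = r"(?:\r\n|\r|\n)"
PATTERN = re.compile(r"([^\r\n])" + LINE_BREAK + r"(?=[^\r\n])")

text = "With a few lines that\nneed connected.\n\nAnother paragraph with\na blank line."

# Assumed replacement: a bare space. Group 1 (the last character of the line) is dropped.
lossy = PATTERN.sub(" ", text)

# Putting group 1 back into the replacement keeps that character.
kept = PATTERN.sub(r"\1 ", text)

print(lossy)
print(kept)
```

If that assumption holds, keeping the original Find pattern and replacing with `\1` followed by a space would likely be enough.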



jj

Nov 3, 2021, 6:42:09 AM
to BBEdit Talk
Hi Gavin,

You could use a capture instead of a positive lookahead assertion and skip the trailing whitespace.

Find: (\S)\h*\R(\S)

Replace: \1 \2

HTH,

Jean Jourdain
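
The Find/Replace pair above can be tried outside BBEdit with a rough Python equivalent. This is only a sketch: Python's `re` supports neither `\h` nor `\R`, so `[ \t]` and `(?:\r\n|\r|\n)` are used as approximations, and the function name `join_wrapped_lines` is made up for the example.

```python
import re

# Approximation of (\S)\h*\R(\S): [ \t] stands in for \h, the alternation for \R.
JOIN_LINES = re.compile(r"(\S)[ \t]*(?:\r\n|\r|\n)(\S)")


def join_wrapped_lines(text: str) -> str:
    """Join a line to the next one when both sides of the break are non-blank."""
    return JOIN_LINES.sub(r"\1 \2", text)


sample = (
    "This is the 1st paragraph. \n"
    "With a few lines that\n"
    "need connected.\n"
    "\n"
    "This is another paragraph with\n"
    "a blank line between\n"
    "it and the previous paragraph.\n"
)

result = join_wrapped_lines(sample)
print(result)
```

Because the character after the line break is captured rather than merely looked at, a line consisting of a single character is used up by one match and cannot begin the next one; text like that may need a second pass.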

Roger Moffat

Nov 3, 2021, 8:40:03 AM
to bbe...@googlegroups.com
When I need to sort out something like this I don’t use GREP, since I’m not familiar enough with it to get it right without some effort, but you can use the basic Find and Replace:

Find

\r \r (there’s a space between them because you said there was a space between the paragraphs. If you just meant there was an empty line, use \r\r)

Replace with

xxxxxxx

This preserves the gap between paragraphs as xxxxxxx, and now you have a whole bunch of lines ending in a return

Then find

\r

replace with

“ “ (a single space)

Then Find

xxxxxxx

Replace with

\r\r

This will separate out all the paragraphs again with a blank line between them. Then, in case you’ve ended up with 2 spaces anywhere:

Find “two spaces” (means 2 spaces, not the words in quotes)

Replace with “one space” (same as above - a single space, not the two words)

Roger
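
The placeholder steps above translate fairly directly into plain string replacements. The Python sketch below assumes the text uses `\n` line endings and an empty line (no stray space) between paragraphs, and that the marker `xxxxxxx` never occurs in the text itself; the function name is invented for the example.

```python
PLACEHOLDER = "xxxxxxx"  # same marker as in the steps above; must not occur in the text


def join_with_placeholder(text: str) -> str:
    text = text.replace("\n\n", PLACEHOLDER)  # 1. protect the gaps between paragraphs
    text = text.replace("\n", " ")            # 2. turn every remaining line break into a space
    text = text.replace(PLACEHOLDER, "\n\n")  # 3. restore the paragraph gaps
    while "  " in text:                       # 4. collapse any doubled spaces
        text = text.replace("  ", " ")
    return text


sample = (
    "This is the 1st paragraph. \n"
    "With a few lines that\n"
    "need connected.\n"
    "\n"
    "This is another paragraph with\n"
    "a blank line."
)

print(join_with_placeholder(sample))
```

Note that step 2 also converts a single trailing line break at the very end of the text into a space, which is why the sample has none.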




Arasmus

Nov 3, 2021, 10:59:43 AM
to BBEdit Talk

And if you are going to need to do it a lot, you can create a text filter that strings the three steps together. 
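
One possible shape for such a filter, as a Python sketch. It assumes BBEdit hands the selection or document to the script on standard input and takes the result from standard output. Rather than chaining the replacements literally, it splits the text on blank lines and rejoins each paragraph, which also collapses repeated spaces and indentation. The names `unwrap` and `run_filter` are invented for the example.

```python
import re
import sys


def unwrap(text):
    """Join the lines of each paragraph; keep one blank line between paragraphs."""
    # A paragraph break is a line break followed by one or more blank lines.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    # str.split() with no arguments splits on any run of whitespace,
    # so joining with a single space removes line breaks and doubled spaces.
    joined = (" ".join(paragraph.split()) for paragraph in paragraphs)
    return "\n\n".join(joined) + "\n"


def run_filter(source=sys.stdin, sink=sys.stdout):
    """Read all text from source, unwrap it, and write the result to sink."""
    sink.write(unwrap(source.read()))
```

A filter script would end with a call to `run_filter()`; it is left out here so the functions can be imported and tried on their own.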

Gavin Brooks

Nov 3, 2021, 10:59:43 AM
to BBEdit Talk
Hello Jean,

Thanks - that worked perfectly.

Gavin

Gavin Brooks

Nov 3, 2021, 10:59:43 AM
to BBEdit Talk
Hello Roger,

Thank you very much for the suggestion. I had thought about doing it this way as well, but I have a lot of files to go through and wanted to see if I could find a way to work things out that didn't involve running find and replace on each file twice. Jean's suggestion worked perfectly. 

Thanks again for the suggestion - I appreciate the help.

Gavin

Jeffrey Jones

Nov 3, 2021, 1:19:21 PM
to bbe...@googlegroups.com
On 2021 Nov 2, at 23:04, Gavin Brooks <gavin...@gmail.com> wrote:

I have a lot of text that was copied from a PDF and that I need to turn into text files for research purposes. When the text was copied over, it ended up with a hard line return at the end of every line and a blank space between paragraphs. I need the lines within each paragraph joined while the paragraphs remain separate, and I am looking for a regex to do that:


I am surprised to see so much discussion of Find & Replace options, but no one has mentioned that there is a built-in command that does exactly what you ask:

Text > Remove Line Breaks

It works perfectly on your example text.

jj

Nov 4, 2021, 11:19:49 AM
to BBEdit Talk
Text > Remove Line Breaks is indeed a very useful command. Thanks Jeffrey for pointing to it.

But Remove Line Breaks doesn't distinguish between left-aligned text and indented text, as the proposed regular expression find/replace does.

In most cases, you want indented text to be left untouched.

Try both solutions on this snippet of text and compare the resulting Code example section:

--
This is some long wrapped text line, 
that is followed by some indented section.
    
    Code example:
    
    A = "a"; // A Comment.
    B = "b";
    
This is another wrapped paragraph with more text, 
blablabla, blablabla, blablabla ...
--

Jean
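
The regex half of that comparison can be checked with a short Python sketch. As before, `[ \t]` and `(?:\r\n|\r|\n)` are approximations of `\h` and `\R`, which Python's `re` does not provide; the behaviour of Remove Line Breaks itself is not reproduced here.

```python
import re

# Approximation of (\S)\h*\R(\S) for Python's re module.
JOIN_LINES = re.compile(r"(\S)[ \t]*(?:\r\n|\r|\n)(\S)")

snippet = (
    "This is some long wrapped text line, \n"
    "that is followed by some indented section.\n"
    "    \n"
    "    Code example:\n"
    "    \n"
    '    A = "a"; // A Comment.\n'
    '    B = "b";\n'
    "    \n"
    "This is another wrapped paragraph with more text, \n"
    "blablabla, blablabla, blablabla ...\n"
)

joined = JOIN_LINES.sub(r"\1 \2", snippet)

# Lines that start with four spaces, before and after the replacement.
indented_before = [line for line in snippet.splitlines() if line.startswith("    ")]
indented_after = [line for line in joined.splitlines() if line.startswith("    ")]

print(joined)
print(indented_before == indented_after)
```

The pattern only joins at a break where the next line begins with a non-whitespace character, so lines that start with spaces (including the whitespace-only separator lines) are never pulled up into the preceding line.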