How to replace contents between two specific words

Nowaki A

Sep 18, 2021, 12:12:21 PM
to BBEdit Talk
I have a file thousands of words long that contains content I want to delete.
All the contents I want to delete are between exactly the same two specified words. 
How do I replace or remove the contents?

For example

AAA
***************
***************
***************
***************
***************
BBB

* = The contents with all sorts of characters. 

I want to remove or replace everything between AAA and BBB. 
How do I do that?
Please help. 

Neil Faiman

Sep 18, 2021, 1:01:50 PM
to BBEdit Talk Mailing List
Cmd-F to open the search dialog.

Use this “Find” string:  (?s)(?<=\bAAA\b).*?(?=\bBBB\b)

Make sure “Grep” and “Case sensitive” are checked.
Make the “Replace” string empty.

Click “Replace All”.

\bAAA\b and \bBBB\b match “AAA” and “BBB” only when they appear as complete words. If that isn’t what you want, omit the \b markers.

(?<=\bAAA\b) means that the matched string must be immediately preceded by AAA, but that the AAA isn’t part of the matched string. Similarly, (?=\bBBB\b) means that the matched string must be immediately followed by BBB, but that the BBB isn’t part of the matched string.

.*? matches the string between the AAA and BBB. Using .*? instead of .* means that if there are multiple BBBs in the file, the match runs only up to the first one, not the last one. The (?s) at the start of the pattern means that . will match line breaks as well as ordinary characters, so the matched string can span multiple lines.
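
For anyone who wants to try the difference between .*? and .* outside BBEdit, here is a minimal sketch in Python, whose re module accepts the same syntax for this particular pattern. Python, the sample text, and the variable names are assumptions made for illustration; none of them come from BBEdit itself.

```python
import re

# Hypothetical sample text containing two AAA ... BBB blocks.
text = "x AAA one\ntwo BBB y AAA three BBB z"

# Same Find string as above, once with the lazy .*? and once with the greedy .*
lazy = r"(?s)(?<=\bAAA\b).*?(?=\bBBB\b)"
greedy = r"(?s)(?<=\bAAA\b).*(?=\bBBB\b)"

# An empty replacement string deletes whatever the pattern matched.
lazy_result = re.sub(lazy, "", text)
greedy_result = re.sub(greedy, "", text)

print(lazy_result)    # each block is emptied separately
print(greedy_result)  # everything from the first AAA to the last BBB is removed
```

With the lazy form both blocks survive as AAABBB; with the greedy form the text between the two blocks is lost as well.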

Example:

Initial file contents:

hello mother
AAA
this
is ?::k
BBB
hello father

After the replacement:

hello mother
AAABBB
hello father

Note that this does exactly what you described: everything between the AAA and BBB is deleted, including all the line breaks, so they end up jammed together. If what you really meant was that the AAA and BBB should each stand alone on a line of its own, and that only the lines between them should be deleted, then this Find string will work: (?<=^AAA\n)(?s:.*?)(?=^BBB$)

(?<=^AAA\n) says that the prefix string is a line containing just AAA, and includes the line break at the end of that line, so the matched string does not include that line break. (?=^BBB$) says that the suffix string is a line containing just BBB, without its trailing line break, so the line break before the BBB line is part of the matched string and is deleted with it.
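
The line-based variant can be tried on the example file with a short Python sketch. This is only an illustration under assumptions: Python is not part of the original answer, and in Python ^ and $ match at line boundaries only when re.MULTILINE is set, so the sketch adds that flag to get the line-by-line behavior the explanation above relies on.

```python
import re

# The example file contents from above, as one string.
text = "hello mother\nAAA\nthis\nis ?::k\nBBB\nhello father\n"

# re.MULTILINE makes ^ and $ match at the start and end of each line in Python.
pattern = re.compile(r"(?<=^AAA\n)(?s:.*?)(?=^BBB$)", re.MULTILINE)

# Delete only the lines between the AAA line and the BBB line.
result = pattern.sub("", text)
print(result)
```

Here AAA and BBB each stay on their own line, and only the two lines between them are removed.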

Regards,

Neil Faiman

Tim A

Sep 19, 2021, 9:52:43 PM
to BBEdit Talk
On Saturday, September 18, 2021 at 10:01:50 AM UTC-7 Neil Faiman wrote:
Use this “Find” string:  (?s)(?<=\bAAA\b).*?(?=\bBBB\b)

Neil's solution encouraged me to learn about "Pattern Modifiers", e.g. (?imsx).
And I am able to parse the lookaround aspects of his solution... but isn't it adequate to just use:
Search:    (?s)AAA\n.+?BBB
Replace:  AAA\nBBB
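
As a rough check of this simpler search and replace pair, here is a minimal Python sketch. Python and the second sample string are assumptions added for illustration; the first sample is the example file from earlier in the thread.

```python
import re

search = r"(?s)AAA\n.+?BBB"
replace = "AAA\nBBB"  # regular string, so \n is a real line break

# The example file from earlier in the thread.
sample = "hello mother\nAAA\nthis\nis ?::k\nBBB\nhello father\n"
cleaned = re.sub(search, replace, sample)
print(cleaned)

# Hypothetical input where AAA is not followed directly by a line break:
# the pattern requires "AAA\n", so nothing matches and the text is unchanged.
inline = "start AAA junk BBB end"
unchanged = re.sub(search, replace, inline)
print(unchanged == inline)
```

The pattern consumes the delimiters and the replacement puts them back, so it gives the same result as the lookaround version on the example file, but only when AAA sits at the end of its line.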

Neil Faiman

Sep 19, 2021, 10:06:05 PM
to BBEdit Talk Mailing List
Only the OP knows exactly what the delimiter rule is — any occurrence of AAA and BBB, or as words, or as complete lines … — so the best way to code the delimiters isn’t clear, but aside from that, I agree completely. Using pre- and post-assertions to match just the string to be removed is certainly overkill for this problem.

Regards,

Neil Faiman

Tim A

Sep 19, 2021, 10:39:03 PM
to BBEdit Talk
Overkill is a great way to learn.
Experimenting further I got my solution to fail when AAA and/or BBB was not by itself. 
Now I am trying to figure out what the ?s: colon business is in your solution (?<=^AAA\n)(?s:.*?)(?=^BBB$)

Page 197 of the manual for version 12.6.7 says:

These options can also be set using the clustering (non-capturing) parentheses syntax defined earlier, by inserting the option letters between the “?” and “:”.

But if it is just to turn off capture then why does the match fail without the colon?



Neil Faiman

Sep 20, 2021, 7:20:26 AM
to BBEdit Talk Mailing List
(?:subpattern) by itself makes the parentheses non-capturing. But (?letters:subpattern) sets GREP options that apply just to the subpattern within the parentheses. (The alternative is (?letters), which sets GREP options that apply to everything that comes after it in the pattern.)

The s option means, “Dot matches anything at all, including line breaks,” as opposed to the default, which is that dot only matches regular characters. Thus, (?s:subpattern) means that the subpattern can match across multiple lines.
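
The difference between the scoped form and the whole-pattern form can be seen in a small Python sketch. Python (3.6 or later, for the scoped syntax), the sample text, and the variable names are assumptions made for illustration, not part of BBEdit.

```python
import re

# Hypothetical sample: the AAA ... BBB block spans lines, then more text follows.
text = "AAA\nmiddle\nBBB end.\nnext line"

# No option: dot stops at line breaks, so nothing spans from AAA to BBB.
plain = re.search(r"AAA.*?BBB", text)

# (?s:...) applies only inside the group; the trailing .* stays on one line.
scoped = re.search(r"AAA(?s:.*?)BBB.*", text)

# (?s) at the start applies to everything after it; the trailing .* runs on.
whole = re.search(r"(?s)AAA.*?BBB.*", text)

print(plain)                  # no match
print(repr(scoped.group(0)))  # stops at the end of the BBB line
print(repr(whole.group(0)))   # continues to the end of the text
print(scoped.groups())        # the (?s:...) group does not capture
```

The scoped form is handy when only one part of a pattern should cross line breaks and the rest should keep the default line-by-line behavior.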

Regards,
Neil Faiman