Changing a text with Grep

mrcmrc

Oct 17, 2024, 6:14:22 AM

to BBEdit Talk

Hi all, I would need help writing a Grep syntax to change a string of text like this:

House, Big Apple, Today, Movie

into this:

[[House]] | [[Big Apple]] | [[Today]] | [[Movie]]

Thanks for any help!

- Marco.

Brian Forte

Oct 17, 2024, 7:39:43 AM

to bbe...@googlegroups.com

On Thu, 17 Oct 2024 03:14:21 -0700 (PDT), mrcmrc wrote:
> a Grep syntax to change a string of text
> like this:
>

> *House, Big Apple, Today, Movie*
>
> into this:
>
> *[[House]] | [[Big Apple]] | [[Today]] | [[Movie]]*

The regex below assumes:

1. There are four word/phrase strings on each line.
2. Each such word/phrase string begins with a majuscule.
3. The first three such strings on each line are end-delimited by a comma.
4. The fourth string is not so delimited.

Given this

Search for
^\*([A-Z].*),\s([A-Z].*),\s([A-Z].*),\s([A-Z].*)\*

Replace with
\*\[\[\1\]\] | \[\[\2\]\] | \[\[\3\]\] | \[\[\4\]\]\*

When I use the above on this, slightly expanded and varied, example:

*House, Big Apple, Today, Movie*
*Word, Yesterday, Peter Piper, Film*
*Content Question, Overmorrow, Helena, Still Photograph*

It returns the following:

*[[House]] | [[Big Apple]] | [[Today]] | [[Movie]]*
*[[Word]] | [[Yesterday]] | [[Peter Piper]] | [[Film]]*
*[[Content Question]] | [[Overmorrow]] | [[Helena]] | [[Still Photograph]]*
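
For readers who want to try the four-field pattern outside BBEdit, here is a minimal sketch using Python's re module. It assumes Python's regex engine treats this pattern the same way BBEdit's does; the names FOUR_FIELDS, REPLACEMENT and link_four_fields are made up for the illustration. Lines that do not have exactly the asterisk-wrapped, comma-separated shape described above may be left unchanged or split differently.

```python
import re

# The four-field search pattern from the post, unchanged.
# re.MULTILINE lets ^ match at the start of every line.
FOUR_FIELDS = re.compile(
    r"^\*([A-Z].*),\s([A-Z].*),\s([A-Z].*),\s([A-Z].*)\*", re.MULTILINE
)

# In a Python replacement string, * and [ ] need no escaping.
REPLACEMENT = r"*[[\1]] | [[\2]] | [[\3]] | [[\4]]*"


def link_four_fields(text: str) -> str:
    """Wrap each of the four fields in [[ ]] and join them with ' | '."""
    return FOUR_FIELDS.sub(REPLACEMENT, text)


sample = (
    "*House, Big Apple, Today, Movie*\n"
    "*Content Question, Overmorrow, Helena, Still Photograph*\n"
)
print(link_four_fields(sample), end="")
```

A line with only three fields does not match and passes through untouched, which is the limitation the next message asks about.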

Hope this helps.

Regards,

Brian Forte.
--
Brian Forte
<bfo...@adelaide.on.net>

mrcmrc

Oct 17, 2024, 9:36:06 AM

to BBEdit Talk

Thanks for your kind help, Brian,
could you change the grep to work with any number of words in the string? In my example, there are four, but in reality, there could be any number of words. The other conditions remain.

- Marco.

Neil Faiman

Oct 17, 2024, 2:00:07 PM

to BBEdit Talk Mailing List

On Oct 17, 2024, at 6:14 AM, mrcmrc <mrc...@gmail.com> wrote:

> Hi all, I would need help writing a Grep syntax to change a string of text like this:
>
> House, Big Apple, Today, Movie
>
> into this:
>
> [[House]] | [[Big Apple]] | [[Today]] | [[Movie]]

Below is a solution which will do almost exactly what you want.

Almost exactly, because it will give you

[[House]]|[[Big Apple]]|[[Today]]|[[Movie]]|

Note the extra vertical bar at the end of the line. The simplest thing is to follow this up by removing the trailing vertical bars with
Find: \|$
Replace: (nothing)
A BBEdit text factory makes it simple to automate doing two or more find-and-replaces.

The problem is that you want to change every “fragment” in a line to “[[fragment]]|” except for the last fragment, which you want to change to “[[fragment]]”, and there is no way to write a single regular expression find-and-replace that chooses among different replacement patterns based on the content or context of the matched pattern.
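
To make the last-fragment problem concrete, here is a small Python sketch that sidesteps it by swapping in a different technique: split and join rather than a regex find-and-replace. It is not a BBEdit solution, and the function name link_fragments is hypothetical. Because join puts the separator between items instead of after each one, no trailing bar appears, however many fragments the line has.

```python
def link_fragments(line: str) -> str:
    """Bracket each comma-separated fragment and join with ' | '.

    Assumes fragments never contain commas; surrounding spaces are dropped.
    """
    fragments = [part.strip() for part in line.split(",")]
    return " | ".join(f"[[{fragment}]]" for fragment in fragments)


print(link_fragments("House, Big Apple, Today, Movie"))
print(link_fragments("Word , Yesterday"))
```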

Your example leaves a lot of details unspecified. Here are the assumptions my solution makes about exactly what you want:

1. Divide each text line into fragments.
   - Each fragment is a string of text (possibly empty) that does not contain any commas, and that does not have any leading or trailing spaces.
   - Adjacent fragments are separated by a comma which might have spaces on either side.
   - Spaces at the beginning or end of the line or around a comma are ignored.
2. Put double square brackets around each fragment and vertical bars between the bracketed fragments.
3. Discard the comma/space separators and leading and trailing spaces.

If that is what you wanted, this pattern will do the job:

Find: (?x) (?# 1: Leading space) [ ]* (?# 2: Fragment) ([^\n,]*?) (?# 3: trailing space) [ ]* (?# 4: separator) (?:,|(\n))

Replace: [[\1]]|\2

It works like this:

1. The leading space component [ ]* matches spaces before the fragment, but doesn’t include them in the fragment. (This matches at the start of a line and also right after a comma, because each match ends at the comma and leaves any following spaces for the next match.)
2. The capture group ([^\n,]*?) defines the actual fragments. It matches a string of characters which are not commas or end-of-lines. Note the use of the non-greedy repetition operator *?. This means that the fragment is the shortest string which matches this sub-pattern, while still allowing the remainder of the pattern to match. Trailing spaces will be matched by component 3 below but won’t be included in the fragment.
3. The trailing space component [ ]* matches spaces after the fragment, but doesn’t include them in the fragment.
4. The separator component (?:,|(\n)) matches either a comma separator or the end of the line.
   - Note the use of (?:…), which means that these are “grouping” parentheses, not “capturing” parentheses. The separator is part of the pattern, but it isn’t part of the fragment.
   - The new-line character is enclosed in capturing parentheses. This means that the pattern match for the last fragment in a line captures the new-line as capture group 2 (which is otherwise empty), and the \2 at the end of the replacement causes the newline to be included following the fragment in the replacement string.

A find-and-replace-all should match the entire text of the input line. The replacement contains each captured fragment, enclosed in doubled square brackets and a trailing vertical bar, and with the captured new-line at the end of the replacement for the last fragment.
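
The two passes described above (bracket every fragment, then strip the trailing bar) can be sketched in Python as follows. This assumes Python's re module handles (?x), (?# …) and an unmatched \2 the way BBEdit's grep does; in Python 3.5 and later an unmatched group is replaced with an empty string. The names FRAGMENT, TRAILING_BAR and link_fragments_two_pass are invented for the example.

```python
import re

# The find pattern from the post, split over two string literals only for width.
FRAGMENT = re.compile(
    r"(?x) (?# 1: Leading space) [ ]* (?# 2: Fragment) ([^\n,]*?) "
    r"(?# 3: trailing space) [ ]* (?# 4: separator) (?:,|(\n))"
)

# Second pass: a literal vertical bar at the end of a line.
# The bar is escaped because a bare | means alternation.
TRAILING_BAR = re.compile(r"\|$", re.MULTILINE)


def link_fragments_two_pass(text: str) -> str:
    """Bracket every fragment, then remove the bar left at each line end.

    Every line, including the last, must end with a newline; otherwise the
    final fragment has no separator to match and is left as it was.
    """
    bracketed = FRAGMENT.sub(r"[[\1]]|\2", text)
    return TRAILING_BAR.sub("", bracketed)


print(link_fragments_two_pass("House, Big Apple, Today, Movie\n"), end="")
```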

mrcmrc

Oct 18, 2024, 3:01:48 PM

to BBEdit Talk

Thank you very much for your help Neil! Great solution to my problem and thanks also for the detailed explanation.

- Marco.

Rob Russell

Oct 18, 2024, 3:56:50 PM

to BBEdit Talk

that's a really good explanation.

What is the first (?x) for in your find string? I read that as searching for an x. My limited knowledge of Grep expects (?\x )

Thanks

Rob

Neil Faiman

Oct 18, 2024, 9:43:03 PM

to BBEdit Talk Mailing List

On Oct 18, 2024, at 3:56 PM, Rob Russell <sum...@gmail.com> wrote:

> that's a really good explanation.
>
> What is the first (?x) for in your find string? I read that as searching for an x. My limited knowledge of Grep expects (?\x )

There are several regular expression modes that you can set globally with (?letter) or for a specified range with (?letter:…). The x mode basically ignores white space in the regular expression, except in character sets and when escaped with a backslash. When used with the comment construct, (?# comment), it can increase the readability of a regex enormously. I wouldn’t bother when throwing together a regex for one-time use, but if I’m going to publish one … well, you be the judge.

Looking at the expression in my previous post

(?x) (?# 1: Leading space) [ ]* (?# 2: Fragment) …

You will notice that

There are a bunch of spaces that are just there for readability, and that aren’t treated as part of the text to be matched as they normally would be, and
The spaces that are part of the text to be matched are written as [ ] (which makes a regex a lot more readable in any case).
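
A quick way to see what x mode does is to compare the commented pattern with a compact version that has the spaces and comments removed. The sketch below uses Python's re module, assuming it treats (?x) and (?# …) like BBEdit does for this pattern; VERBOSE, COMPACT, LITERAL and bracket are names made up for the illustration.

```python
import re

# The commented pattern from the earlier post.
VERBOSE = (
    r"(?x) (?# 1: Leading space) [ ]* (?# 2: Fragment) ([^\n,]*?) "
    r"(?# 3: trailing space) [ ]* (?# 4: separator) (?:,|(\n))"
)

# The same pattern with the readability spaces and comments removed.
COMPACT = r"[ ]*([^\n,]*?)[ ]*(?:,|(\n))"

# The commented pattern with the (?x) switch cut off: its spaces are now literal.
LITERAL = VERBOSE[len("(?x)"):]


def bracket(pattern: str, text: str) -> str:
    """Apply the post's replacement string using the given search pattern."""
    return re.sub(pattern, r"[[\1]]|\2", text)


line = "House, Big Apple\n"
print(bracket(VERBOSE, line) == bracket(COMPACT, line))  # same result either way
print(bracket(LITERAL, line) == line)  # no match without (?x), text unchanged
```

Without (?x), every space in the pattern has to be found in the text, so the sample line no longer matches at all.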

Cheers,

Neil Faiman

Kaveh Bazargan

Oct 19, 2024, 6:36:28 AM

to bbe...@googlegroups.com

The space ignore option is very useful as you say for commenting on complex expressions. I use it a lot, e.g. here.

--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/C97A5C90-3123-4942-9016-A034181FC7B9%40faiman.org.

--

Kaveh Bazargan PhD

Director

River Valley Technologies ● Twitter ● LinkedIn ● ORCID ● @kave...@mastodon.social

Accelerating the Communication of Research

Rob Russell

Oct 20, 2024, 2:47:22 PM

to BBEdit Talk

That web site is a great resource, thanks.

r
