replacing specific characters within a grep match?

iva...@ivanx.com

Jun 26, 2024, 12:32:54 PM
to BBEdit Talk
Hi all,

Longtime fan, first time caller. BBEdit's the best.

I know my way around BBEdit grep searches pretty well, and yet I'm finding myself stumped on what seems like it should be a pretty simple thing.

I've got a bunch of lines that contain hyperlinks to song titles. In simplified form, they look like this:

<a href='index.html#All Day Long'>All Day Long</a>
<a href='index.html#Chosen Time'>Chosen Time</a>
<a href='index.html#Sooner Than You Think'>Sooner Than You Think</a>

I chose these specific examples because they have a variable number of spaces within the song names. What I want to do is replace the spaces in the displayed text (that is, between the ">" and the "<") with "&nbsp;", but leave the spaces within the href= portion of the <a> tag untouched.

I can search with something like:
>.*?</a>
and that will match only the portion I want to change.

Or if it helped, I could chunk the whole line into subpatterns, like:
^(<a.*?)(#.*?>)(.*?)(</a>)
which would then put what I want to change in \3.

But in either case, what would I put in the Replace field that would allow me to replace only the spaces with something else ("&nbsp;" in this case)? Or, is there some strategy for breaking it up into multiple finds that would get the job done? I've really racked my brain for a while on this and I feel like I must be staring at it but not seeing it.

Thanks for any ideas!

Ivan.


flet...@cumuli.com

Jun 26, 2024, 12:58:52 PM
to bbe...@googlegroups.com
There's probably a more clever way to do this but this pattern works with multiple passes. Each Replace All will replace one space per link. Eventually it should have a pass where it doesn't replace anything.

Find: (?<=>)(.*?) (.*?)(?=</a>)
Replace: \1\&nbsp;\2

It uses a look behind and a look ahead assertion to find the text between > and </a> and then replaces the first space inside.
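For anyone who wants to try the repeat-until-nothing-changes idea outside BBEdit, here is a minimal Python sketch of it. It uses the same Find pattern; the function name `replace_until_stable` and the sample line are just for illustration, and Python's `re` module stands in for BBEdit's grep engine, so the replacement string does not need the backslash before "&".

```python
import re

# Same Find pattern as above. In Python's replacement syntax "&" is a
# literal character, so no backslash is needed before it (unlike BBEdit).
PATTERN = re.compile(r"(?<=>)(.*?) (.*?)(?=</a>)")


def replace_until_stable(text):
    """Repeat the substitution until a pass replaces nothing.

    Returns the final text and the number of passes that changed something.
    """
    passes = 0
    while True:
        text, count = PATTERN.subn(r"\1&nbsp;\2", text)
        if count == 0:
            return text, passes
        passes += 1


sample = "<a href='index.html#All Day Long'>All Day Long</a>"
result, passes = replace_until_stable(sample)
print(result)
print(passes)
```

Each pass handles one space per link, so the number of passes equals the largest number of spaces in any one title.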

[fletcher]

Neil Faiman

Jun 26, 2024, 1:02:14 PM
to BBEdit Talk Mailing List
I tend to go for an easy repetitive solution for problems like this, rather than wasting brain power trying to find a closed-form solution. Thus, replace 

(>.*) (.*<)    (open paren, greater than, dot, star, close paren, space, open paren, dot, star, less than, close paren)

with

\1\&nbsp;\2

This will fix one space in the title per pass (the last one, since .* is greedy). Repeat until nothing more gets replaced. Done.

Cheap and ugly, but it works.

Cheers,
Neil Faiman

flet...@cumuli.com

Jun 26, 2024, 1:15:00 PM
to bbe...@googlegroups.com
Another problem with this pattern is that it will replace other spaces on a line that contains an anchor tag and other tags. You can add a quote to the first assertion but it's still going to have some spaces it encodes that it doesn't need to.
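To make that concrete, here is a small Python sketch run against a made-up line where the link sits inside a paragraph with other text. The sample line is hypothetical, and Python's `re` stands in for BBEdit's grep engine, so treat it as an illustration of the behavior rather than a BBEdit test.

```python
import re

# The look-behind/look-ahead pattern from my earlier message.
PATTERN = re.compile(r"(?<=>)(.*?) (.*?)(?=</a>)")

# Hypothetical input: the anchor is wrapped in a <p> with text before it.
line = "<p>See <a href='index.html#All Day Long'>All Day Long</a></p>"

# One Replace All pass. The look-behind is satisfied right after "<p>",
# so the first space found is the one after "See", not one in the title.
one_pass = PATTERN.sub(r"\1&nbsp;\2", line)
print(one_pass)
```

Further passes would keep working left to right through the spaces before `</a>`, including the ones inside the opening tag, which is why this pattern is only safe when the anchor is the first tag on its line.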

[fletcher]

jj

Jun 27, 2024, 4:46:16 AM
to BBEdit Talk
PCRE2's replacement pattern syntax doesn't support conditional replacements.
You could do it with a scripting language (Perl, Python, PHP, JavaScript, etc.) that supports replacement callbacks.
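As a rough sketch of the callback approach in Python: the regex isolates the link text, and a function rewrites only that part. The names `ANCHOR`, `nbsp_link_text` and `fix_links` are made up for this example, and it assumes simple anchors with no nested tags inside the link text.

```python
import re

# Three groups: the opening <a ...> tag, the displayed text, the closing tag.
# Assumption: the link text contains no nested tags (no "<" inside it).
ANCHOR = re.compile(r"(<a\b[^>]*>)([^<]*)(</a>)")


def nbsp_link_text(match):
    """Callback: replace spaces in the link text only, leave the tags alone."""
    opening, text, closing = match.groups()
    return opening + text.replace(" ", "&nbsp;") + closing


def fix_links(html):
    """Apply the callback to every anchor found in the input."""
    return ANCHOR.sub(nbsp_link_text, html)


sample = "<a href='index.html#Sooner Than You Think'>Sooner Than You Think</a>"
print(fix_links(sample))
```

Because the callback sees each match separately, a single pass handles any number of spaces, and text outside the anchors is never touched.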

You will not be able to do it with a single BBEdit find/replace.
But if you want to stick to BBEdit and its rich set of options, you could use a Canonize file that applies multiple regular expressions to do it.

Here is an example of a Canonize file (beware that the __REPLACE_WITH_TAB__ placeholders in the following snippet must be replaced by real TABs, because the forum HTML won't preserve them):

    # -*- x-bbedit-canon-case-sensitive: 0; x-bbedit-canon-match-words: 0; x-bbedit-canon-grep: 1; -*-
    # End:
    # Local Variables:
    # coding: utf-8
    # indent_style: tab
    #===
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;
    (<a[^>]*>([^<\h]*?))\h__REPLACE_WITH_TAB__\1\&nbsp;


If an anchor's text contains more whitespace characters than there are rules in the file (21 above), keep adding occurrences of the regular expression to the Canonize file or run the Canonize command multiple times.

HTH

Jean Jourdain