Grep non-greedy finding is not working for me in this instance


Greg Raven

Dec 18, 2024, 12:03:23 PM
to BBEdit Talk
I use Grep in BBEdit a lot for cleaning up websites, but now there's a situation where I'm missing something.

I'm redoing a site from WordPress to static, so there are a bunch of links that look like:

<a href="/posts/post-name/index.html">Post name</a>

I need to change these so the link looks like:

<a href="/posts/post-name.html">Post name</a>

This seems as though it should be simple to find these instances with:

<a href="/posts/.+?/index\.html">

But if a paragraph contains:

<a href="/posts/post-name/index.html">Post name</a>. <a href="/search/index.html">Search here.</a>

It glumps both links (and anything / everything in between) into one found result.

I sorta got it to work with this:

<a href="/posts/[a-z].+?[a-z]/index\.html">

But there has to be a more straightforward way. What am I missing?

Rick Gordon

Dec 18, 2024, 5:58:41 PM
to bbe...@googlegroups.com

What about:

<a href="/posts/[^>]+/index\.html">

Rick Gordon


From: Greg Raven <greg...@gmail.com>
To: BBEdit Talk <bbe...@googlegroups.com>
Date: Wed, Dec 18, 2024 10:03:23AM -0700
Subject: Grep non-greedy finding is not working for me in this instance
--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
-

_______________________________________
RICK GORDON
_______________________________________

EMAIL: ri...@rickgordon.com
WWW: www.shelterpub.com

GP

Dec 18, 2024, 6:13:38 PM
to BBEdit Talk
Hmm... I don't see what's going wrong with either of your grep patterns (<a href="/posts/.+?/index\.html"> or <a href="/posts/[a-z].+?[a-z]/index\.html">). For me, each one matches only the <a href="/posts/post-name/index.html"> part of the text, even when the same string also contains <a href="/search/index.html">Search here.</a>.

The <a href="/posts/[a-z].+?[a-z]/index\.html"> pattern restricts matches to strings where the first and last characters of post-name are lowercase letters, but that isn't what you're having a problem with.

Could you post a short example containing both an <a href post link and an <a href search link where your grep is glumping the two links together?
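
For anyone who wants to reproduce this outside BBEdit, here is a minimal sketch using Python's re module, which is assumed (not verified) to treat these particular patterns the same way BBEdit's grep does. The first string is the paragraph from the original post. The second string is hypothetical and not from the thread: it shows one way a lazy .+? can still span two links, namely when the first /posts/ link does not end in /index.html.

```python
import re

LAZY = r'<a href="/posts/.+?/index\.html">'

# The paragraph from the original post: the lazy pattern stops at the
# first '/index.html">' it can reach, so only the first link is matched.
sample = ('<a href="/posts/post-name/index.html">Post name</a>. '
          '<a href="/search/index.html">Search here.</a>')
sample_matches = re.findall(LAZY, sample)

# Hypothetical input (not from the thread): the first /posts/ link has no
# /index.html, so a match starting there has to run on until a later link
# supplies one. Lazy quantifiers shorten the match, but not its start.
tricky = ('<a href="/posts/about.html">About</a>. '
          '<a href="/search/index.html">Search here.</a>')
tricky_matches = re.findall(LAZY, tricky)

print(sample_matches)
print(tricky_matches)
```

If the real pages contain /posts/ links of both kinds, that could account for the behavior described, though this is only a guess without seeing the actual text.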

jj

Dec 19, 2024, 3:12:33 AM
to BBEdit Talk
Hi Greg,

Could it be that there is a space or an invisible character before the question mark in your pattern?

<a href="/posts/.+?/index\.html">
                  
If so, that would explain the greediness: the ? would then apply to the stray character instead of to the +.
You can check the characters in your pattern with the menu command Window > Palettes > Character Inspector.
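
The same check can be sketched outside BBEdit. The snippet below is an illustration only: the describe helper and the suspect pattern (with a zero-width space, U+200B, planted before the ?) are hypothetical and not taken from the thread.

```python
import unicodedata

def describe(pattern):
    """Return (character, code point, Unicode name) for each character in pattern."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in pattern]

# Hypothetical pattern fragment with a zero-width space hiding before the "?".
suspect = '.+\u200b?/index'

for ch, code, name in describe(suspect):
    print(repr(ch), code, name)
```

Any entry that is not a plain ASCII character from the pattern you meant to type is worth a closer look.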

HTH,

Jean Jourdain


Mark Bowron

Dec 19, 2024, 9:04:10 AM
to bbe...@googlegroups.com
The search term

<a href="/posts/[^>]+/index\.html">

contains only two special grep constructs; everything else in it is matched literally:

(1) "[^>]+" which translates to "one or more characters that are not >" and

(2) "\." which translates to the period character. (Because the period has a special meaning in grep, the "\" is required to "escape" that meaning when all we want to do is search for a literal period.)
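
To show the two pieces at work, here is a minimal sketch of the whole find-and-replace in Python's re module. It is an assumption, not something stated in the thread, that the parentheses and the \1 backreference are the right way to carry the post name into the replacement; BBEdit's Replace field should accept the same \1 notation, but test on a copy first.

```python
import re

# Group 1 captures the post name. [^>]+ cannot cross the ">" that closes
# the tag, so a match can never spill over into a following link.
PATTERN = r'<a href="/posts/([^>]+)/index\.html">'
REPLACEMENT = r'<a href="/posts/\1.html">'

# The paragraph from the original post.
text = ('<a href="/posts/post-name/index.html">Post name</a>. '
        '<a href="/search/index.html">Search here.</a>')

result = re.sub(PATTERN, REPLACEMENT, text)
print(result)
```

Only the /posts/ link is rewritten; the /search/ link is left alone because the literal prefix <a href="/posts/ does not match it.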

On Thu, Dec 19, 2024 at 5:34 AM Greg Raven <greg...@gmail.com> wrote:
This works, thanks! I don't understand the syntax, but it works. Much appreciated.
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bbedit/0c965ddc-0a84-46b7-8fb1-d919c267c045n%40googlegroups.com.


--
Mark Bowron
1650 S CASINO DR #2202
LAUGHLIN, NV USA 89029-1512