Is this possible to GREP?

Victoria Bampton

Jan 26, 2024, 8:49:20 AM
to BBEdit Talk
There's clearly lots of experience here... can anyone tell me whether this is possible with a GREP find/replace?

Links like those following are spread throughout a series of documents. I need to change the bit after the hash (e.g., #sub_InstallingLightroom) to remove the chunk before the underscore, put a hyphen before each uppercase character, and change the uppercase to lowercase (e.g., it becomes #installing-lightroom). That could be multiple finds/replacements, as I can deal with some of it, but I'm getting stuck on targeting and transforming those uppercase characters. Is it doable? Any bright ideas would be greatly appreciated!

<a href="/premium-classic/before-you-start.xhtml#sub_InstallingLightroom">Installing Lightroom</a>
<a href="/premium-classic/before-you-start.xhtml#box_MultipleComputers">Multiple Computers</a>
 <a href="/premium-classic/before-you-start.xhtml#sub_KeepingLightroomUpdated">Keeping Lightroom Updated</a>
<a href="/premium-classic/managing-your-photos.xhtml#sub_ManagingFoldersInLightroomAndOnTheHardDrive">Managing Folders in Lightroom and on the Hard Drive</a>
<a href="/premium-classic/managing-your-photos.xhtml#sub_ChangingTheFolderStructure">Changing the Folder Structure</a>

MediaMouth

Jan 26, 2024, 10:01:09 AM
to bbe...@googlegroups.com
The grep experts here are always impressive.  If they can't get it done with regex, a JS solution can.
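
For anyone who would rather script it, here is a minimal sketch of a one-pass version in Python (rather than JS). It assumes, based only on the sample links in the question, that every fragment looks like a lowercase prefix, an underscore, and a CamelCase name made of letters. The names `LINK`, `slugify` and `fix_links` are made up for illustration.

```python
import re

# Group 1: the href up to and including '#'. Group 2: the CamelCase name.
# Assumes the prefix before the underscore is lowercase letters only (as in the samples).
LINK = re.compile(r'(href="[^"#]*#)[a-z]+_([A-Za-z]+)(?=")')

def slugify(match: re.Match) -> str:
    camel = match.group(2)
    # Put a hyphen before every uppercase letter except the first, then lowercase it all.
    hyphenated = re.sub(r'(?<!^)(?=[A-Z])', '-', camel)
    return match.group(1) + hyphenated.lower()

def fix_links(html: str) -> str:
    # Rewrite every matching href fragment in the given text.
    return LINK.sub(slugify, html)

if __name__ == "__main__":
    sample = ('<a href="/premium-classic/before-you-start.xhtml'
              '#sub_InstallingLightroom">Installing Lightroom</a>')
    print(fix_links(sample))
```

Because the replacement is computed by a function, there is no limit on the number of words in the fragment.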


jj

Jan 26, 2024, 10:25:46 AM
to BBEdit Talk
Hi Victoria,

Here is a possible two-pass solution:

First:

A case-sensitive Find / Replace:
   
    (?<=href=")([^#]+)#[^_]+_([A-Z][a-z]+)([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?(?=")
   
Replacement:
   
    \1#\L\2-\3-\4-\5-\6-\7-\8-\9-\10-\11
   
That will give you:

    <a href="/premium-classic/before-you-start.xhtml#installing-lightroom--------">Installing Lightroom</a>
    <a href="/premium-classic/before-you-start.xhtml#multiple-computers--------">Multiple Computers</a>
     <a href="/premium-classic/before-you-start.xhtml#keeping-lightroom-updated-------">Keeping Lightroom Updated</a>
    <a href="/premium-classic/managing-your-photos.xhtml#managing-folders-in-lightroom-and-on-the-hard-drive-">Managing Folders in Lightroom and on the Hard Drive</a>
    <a href="/premium-classic/managing-your-photos.xhtml#changing-the-folder-structure------">Changing the Folder Structure</a>    

Second:
   
Find / Replace:

    -+(?=">)
   
Leave the replacement empty.

That should work for camelCased URL fragments of up to 10 words.
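
To try the two passes outside BBEdit, the following is a rough Python sketch. Python's `re` module has no `\L` in replacement strings, so the first replacement is emulated with a function (an adaptation, not part of the solution above); the find patterns themselves are unchanged. The names `PASS_ONE`, `PASS_TWO`, `pass_one` and `two_pass` are invented for the example.

```python
import re

# First find pattern: one mandatory word plus nine optional ones (groups 2..11).
PASS_ONE = re.compile(
    r'(?<=href=")([^#]+)#[^_]+_([A-Z][a-z]+)' + r'([A-Z][a-z]+)?' * 9 + r'(?=")'
)
# Second find pattern: the run of leftover hyphens just before the closing quote.
PASS_TWO = re.compile(r'-+(?=">)')

def pass_one(match: re.Match) -> str:
    # Emulates \1#\L\2-\3-...-\11: unmatched optional groups become empty strings,
    # which is where the trailing hyphens come from.
    words = [(word or '').lower() for word in match.groups()[1:]]
    return match.group(1) + '#' + '-'.join(words)

def two_pass(text: str) -> str:
    intermediate = PASS_ONE.sub(pass_one, text)
    return PASS_TWO.sub('', intermediate)

if __name__ == "__main__":
    line = ('<a href="/premium-classic/before-you-start.xhtml'
            '#sub_KeepingLightroomUpdated">Keeping Lightroom Updated</a>')
    print(PASS_ONE.sub(pass_one, line))  # intermediate result with trailing hyphens
    print(two_pass(line))                # cleaned-up result
```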

HTH

Jean Jourdain

Victoria Bampton

Jan 26, 2024, 10:36:54 AM
to BBEdit Talk
Oh my goodness, you are amazing! That worked perfectly and has saved me days of work. Thank you so much!

Neil Faiman

Jan 26, 2024, 10:56:40 AM
to BBEdit Talk Mailing List
There may be a way of doing this with a single operation. Usually, when approaching a problem like this, I just find a find-and-replace that will do the job if applied repeatedly some smallish number of times.

So, we’ll start out with a pattern that will leave the “hash-mark lowercase-string underscore” prefix intact as a marker, find the first subsequent uppercase letter in the fragment (if any), lowercase it, and insert the hyphen.

Find: (?i)(href="[a-zA-Z/-]+\.x?html#)(?-i)([a-z]+?_[-a-z]*)([A-Z])
Replace: \1\2-\l\3

Breaking this down:
  • (?i) means that the pattern is case-insensitive.
  • (href="[a-zA-Z/-]+\.x?html#) matches (and captures) the initial part of a link URL string, up to and including the hash mark (to make sure that we don’t inadvertently transform arbitrary text in the document).
  • (?-i) switches the remainder of the pattern to case-sensitive.
  • ([a-z]+?_[-a-z]*) matches and captures
    • A string of lowercase letters followed by an underscore
    • A possibly empty string of lowercase letters and hyphens.
  • ([A-Z]) matches and captures the first uppercase letter in the fragment.

The replacement string consists of everything up to the uppercase letter unchanged; a hyphen; and the uppercase letter, lower-cased. (“\l” means “lower-case the next character in the replacement string.”)

If we apply this find-and-replace to your example string

<a href="/premium-classic/before-you-start.xhtml#sub_KeepingLightroomUpdated">Keeping Lightroom Updated</a>


We will get

<a href="/premium-classic/before-you-start.xhtml#sub_-keepingLightroomUpdated">Keeping Lightroom Updated</a>


Applying it repeatedly, we get

<a href="/premium-classic/before-you-start.xhtml#sub_-keeping-lightroom-updated">Keeping Lightroom Updated</a>


So you want to apply this with a find-and-replace-all to the entire document repeatedly until it doesn’t do anything any more. (Since it’s a no-op if it doesn’t match anything, you could just apply it five, or ten, or twenty times, whatever seems like a reasonable upper limit on the number of words in the fragment.)

Now we don’t need the marker any more, and we have an extra hyphen, so we will use a second pattern to clean that up:

Find: (?i)(href="[a-zA-Z/-]+\.x?html#)(?-i)[a-z]+?_-
Replace: \1

Do a find-and-replace-all, and this will remove the “lower-case-string underscore hyphen” from the start of each fragment:

 <a href="/premium-classic/before-you-start.xhtml#keeping-lightroom-updated">Keeping Lightroom Updated</a>
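
The "apply repeatedly until nothing changes" idea can be sketched as a loop in Python. Two adaptations are assumptions of this sketch rather than part of the patterns above: Python does not accept a bare `(?-i)` in the middle of a pattern, so the case-insensitive part is written as a scoped `(?i:...)` group, and `\l` is emulated with a replacement function. The names `STEP`, `CLEANUP`, `lower_first_capital` and `hyphenate` are hypothetical.

```python
import re

# Same structure as the first find pattern, with the case-insensitive part scoped.
STEP = re.compile(r'((?i:href="[a-zA-Z/-]+\.x?html)#)([a-z]+?_[-a-z]*)([A-Z])')
# Same structure as the cleanup pattern: removes the "prefix_-" marker.
CLEANUP = re.compile(r'((?i:href="[a-zA-Z/-]+\.x?html)#)[a-z]+?_-')

def lower_first_capital(match: re.Match) -> str:
    # Equivalent of the replacement \1\2-\l\3
    return match.group(1) + match.group(2) + '-' + match.group(3).lower()

def hyphenate(text: str, max_passes: int = 20) -> str:
    # Each pass converts at most one capital letter per link, so repeat until
    # a pass changes nothing (or the safety limit is reached).
    for _ in range(max_passes):
        text, count = STEP.subn(lower_first_capital, text)
        if count == 0:
            break
    return CLEANUP.sub(r'\1', text)

if __name__ == "__main__":
    line = ('<a href="/premium-classic/before-you-start.xhtml'
            '#sub_KeepingLightroomUpdated">Keeping Lightroom Updated</a>')
    print(hyphenate(line))
```

`subn` returns the number of replacements made, which is what lets the loop stop as soon as a pass is a no-op.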


Regards,
Neil Faiman

Benjamin Irwin

Jan 26, 2024, 12:44:04 PM
to BBEdit Talk
There are a couple of other options for search and replace: sed and awk are designed specifically for that purpose. The following examples are taken from "tldr", a great resource for helpful Linux documentation and examples.

sed

Edit text in a scriptable manner. See also: awk, ed. More information: https://www.gnu.org/software/sed/manual/sed.html.

  • Replace all apple (basic regex) occurrences with mango (basic regex) in all input lines and print the result to stdout:

{{command}} | sed 's/apple/mango/g'

  • Replace all apple (extended regex) occurrences with APPLE (extended regex) in all input lines and print the result to stdout:

{{command}} | sed -E 's/(apple)/\U\1/g'

  • Replace all apple (basic regex) occurrences with mango (basic regex) in a specific file and overwrite the original file in place:

sed -i 's/apple/mango/g' {{path/to/file}}

  • Execute a specific script [f]ile and print the result to stdout:

{{command}} | sed -f {{path/to/script.sed}}

  • Print just the first line to stdout:

{{command}} | sed -n '1p'

  • [d]elete the first line of a file:

sed -i 1d {{/path/to/file}}

  • [i]nsert a new line at the first line of a file:

sed -i '1i\your new line text\' {{/path/to/file}}
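
One caveat for Mac users: the `\U` in the second example is a GNU sed extension, and the sed that ships with macOS does not support it. As a portable alternative, here is a small Python sketch of the "replace in a file and overwrite it in place" idea, with a backup copy kept alongside. The function name `substitute_in_place` and the `.bak` suffix are choices made for this example only.

```python
import re
import shutil
from pathlib import Path

def substitute_in_place(path, pattern, replacement, backup_suffix=".bak"):
    """Replace every match of `pattern` in the file, keeping a backup copy.

    Returns the number of replacements made; the file is left untouched
    (and no backup is written) when nothing matches.
    """
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    updated, count = re.subn(pattern, replacement, original)
    if count:
        # Copy the original next to the file before overwriting it.
        shutil.copy2(path, path.with_name(path.name + backup_suffix))
        path.write_text(updated, encoding="utf-8")
    return count
```

Calling `substitute_in_place("notes.txt", r"apple", "mango")` would rewrite a (hypothetical) `notes.txt` and leave the original in `notes.txt.bak`.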


Victoria Bampton

Jan 27, 2024, 9:24:18 AM
to BBEdit Talk
Thank you very much, especially for the breakdown of what the different elements do. I've learned a few new tricks from this thread, so I'll be watching the group carefully in future. 

Victoria

Kaveh Bazargan

Jan 27, 2024, 6:05:34 PM
to bbe...@googlegroups.com
Hi Victoria

This is a great group indeed. I find regex101 very useful for playing around. I know BBEdit has a great regex playground too. In regex101 you can save a search with a unique URL and it'll always be there, e.g. for others to refer to. I put one of the examples above here. I have a lot of regex patterns that an external program uses, and I just keep a sheet with links to lots of regex101 pages.

All the best.

Regards
Kaveh



--
Kaveh Bazargan PhD
Director
Accelerating the Communication of Research
  https://rivervalley.io/gigabyte-wins-the-alpsp-scholarly-publishing-innovation-award-using-river-valleys-publishing-technology/