Regex help - matching unique URLs and deleting HTML encoding


Sammy Banawan

Jan 26, 2020, 7:56:00 PM
to BBEdit Talk
I have <a href="https://www.facebook.com/somename1?fref=profile_friend_list&amp;hc_location=profile_browser" data-gt="{&quot;engagement&quot;:{&quot;eng_type&quot;:&quot;1&quot;,&quot;eng_src&quot;:&quot;2&quot;,&quot;eng_tid&quot;:&quot;100000180221704&quot;,&quot;eng_data&quot;:[]}}" data-hovercard="/ajax/hovercard/user.php?id=100000180221704&amp;extragetparams=%7B%22hc_location%22%3A%22profile_browser%22%7D" data-hovercard-prefer-more-content-show="1">Some Name 1</a>

and I want to keep only the link text, Some Name 1, and strip the HTML entirely.

Obviously, the file has 
<a href="https://www.facebook.com/somename2?fref=profile_friend_list&amp;hc_location=profile_browser" data-gt="{&quot;engagement&quot;:{&quot;eng_type&quot;:&quot;1&quot;,&quot;eng_src&quot;:&quot;2&quot;,&quot;eng_tid&quot;:&quot;100000180221704&quot;,&quot;eng_data&quot;:[]}}" data-hovercard="/ajax/hovercard/user.php?id=100000180221704&amp;extragetparams=%7B%22hc_location%22%3A%22profile_browser%22%7D" data-hovercard-prefer-more-content-show="1">Some Name 2</a>

etc. as well.

I can't figure out the regex pattern for this. Any help would be appreciated. 

Christian Boyce

Jan 26, 2020, 11:05:49 PM
to bbe...@googlegroups.com
Hi Iain:

This works when there is one link per line (maybe not the cleanest way, but it works):

Search for this:

(<a href=)(.*)>(.*)(</a>)

Replace with this:

\3


--
Christian Boyce
Christian Boyce and Associates
Mac, iPhone, and iPad Consultants

For appointments, please call the office: 424-354-3548.
We do not make appointments by email or text. We're old-fashioned that way.

ctfishman

Jan 27, 2020, 9:25:11 AM
to BBEdit Talk
Find:
<a [^>]+>(.*?)</a>

Replace:
\1

This matches the opening a tag, then the shortest possible string, then the closing tag. Without the question mark the match is greedy, so it won't work correctly if there is more than one link on a line.
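
To see the difference the question mark makes, here is a minimal sketch in Python, whose re module accepts the same pattern and \1 replacement syntax. The sample line and the helper name keep_link_text are made up for illustration, with the URLs shortened from the original post.

```python
import re

# Lazy pattern from the post: capture the shortest text between <a ...> and </a>.
LAZY = re.compile(r"<a [^>]+>(.*?)</a>")
# Same pattern with a greedy capture, for comparison only.
GREEDY = re.compile(r"<a [^>]+>(.*)</a>")

def keep_link_text(line, pattern=LAZY):
    """Replace each matched link with its captured link text."""
    return pattern.sub(r"\1", line)

# Hypothetical sample: two links on one line.
line = (
    '<a href="https://www.facebook.com/somename1">Some Name 1</a>, '
    '<a href="https://www.facebook.com/somename2">Some Name 2</a>'
)

print(keep_link_text(line))          # Some Name 1, Some Name 2
print(keep_link_text(line, GREEDY))  # second opening tag is left behind
```

With the greedy version, the capture runs from the first link's text all the way to the last closing tag, so the markup between the two names survives the replacement.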

Sammy Banawan

Jan 27, 2020, 9:25:16 AM
to BBEdit Talk
I figured out an even easier way - just search for anything in angle brackets and delete it: search for <.*?> and replace with nothing. I should have thought of that before. Thanks for the help!
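
For anyone doing the same cleanup outside BBEdit, the strip-everything-in-angle-brackets approach can be sketched in Python as below. Decoding leftover entities such as &amp; with html.unescape is an extra step not mentioned above, added on the assumption that a name could contain one. The helper name strip_tags and the sample name are hypothetical.

```python
import html
import re

# Anything between "<" and the next ">" on the same line.
TAG = re.compile(r"<.*?>")

def strip_tags(line):
    """Delete every <...> run, then decode HTML entities in what is left."""
    text = TAG.sub("", line)
    return html.unescape(text).strip()

# Hypothetical sample: attribute list shortened from the original post.
line = (
    '<a href="https://www.facebook.com/somename1?fref=profile_friend_list'
    '&amp;hc_location=profile_browser" '
    'data-hovercard-prefer-more-content-show="1">Tom &amp; Jerry</a>'
)

print(strip_tags(line))  # Tom & Jerry
```

The entities inside the attributes disappear along with the tag itself; only entities in the visible text need decoding.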

Ken Corey

Jan 27, 2020, 9:31:34 AM
to BBEdit Talk
Which is great if you don't have angle brackets in your text... And you *know* the html is uniform.

Is there a risk of multiple items on one line? Or malformed html?

Regexps with html can be unpredictable...

-Ken
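
One concrete case where the angle-bracket pattern goes wrong is a ">" inside a quoted attribute value. The sketch below compares the regex with Python's standard html.parser on a made-up line; the class and function names are hypothetical, and this is one possible alternative, not something proposed in the thread.

```python
import re
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects text nodes and ignores all tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def text_via_parser(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered
    return "".join(parser.parts)

def text_via_regex(markup):
    return re.sub(r"<.*?>", "", markup)

# Hypothetical line with a ">" inside a quoted attribute value.
tricky = '<a title="a > b" href="#">Some Name 1</a>'

print(text_via_regex(tricky))   # attribute debris is left in front of the name
print(text_via_parser(tricky))  # Some Name 1
```

The regex stops at the first ">", which here sits inside the title attribute, so part of the tag leaks into the output. For uniform input like the lines in the original question, the regex is still the quicker tool.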

Jeffrey Jones

Jan 27, 2020, 12:39:18 PM
to bbe...@googlegroups.com
On 2020 Jan 26, at 23:18, Sammy Banawan <sban...@gmail.com> wrote:
>
> I figured out an even easier way - just search for anything in angle brackets and delete it. <.*?> and replace with nothing. I should have thought of that before. Thanks for the help!


If you want to remove ALL tags, try Markup > Utilities > Remove Markup

Sammy Banawan

Jan 27, 2020, 12:39:18 PM
to BBEdit Talk
This text is very predictable. It's just a lot of lines where everything except the profile name is unneeded.