Regex help - matching unique URLs and deleting HTML encoding
Sammy Banawan
Jan 26, 2020, 7:56:00 PM
to BBEdit Talk
I have <a href="https://www.facebook.com/somename1?fref=profile_friend_list&hc_location=profile_browser" data-gt="{"engagement":{"eng_type":"1","eng_src":"2","eng_tid":"100000180221704","eng_data":[]}}" data-hovercard="/ajax/hovercard/user.php?id=100000180221704&extragetparams=%7B%22hc_location%22%3A%22profile_browser%22%7D" data-hovercard-prefer-more-content-show="1">Some Name 1</a>
and I just want to keep Some Name 1. The HTML I want to strip entirely.
Obviously, the file has
<a href="https://www.facebook.com/somename2?fref=profile_friend_list&hc_location=profile_browser" data-gt="{"engagement":{"eng_type":"1","eng_src":"2","eng_tid":"100000180221704","eng_data":[]}}" data-hovercard="/ajax/hovercard/user.php?id=100000180221704&extragetparams=%7B%22hc_location%22%3A%22profile_browser%22%7D" data-hovercard-prefer-more-content-show="1">Some Name 2</a>
etc. as well.
I can't figure out the regex pattern for this. Any help would be appreciated.
Christian Boyce
Jan 26, 2020, 11:05:49 PM
to bbe...@googlegroups.com
Hi Iain:
This works (maybe not the cleanest way but it works):
Search for this:
(<a href=)(.*)>(.*)(</a>)
Replace with this:
\3
--
Christian Boyce
Christian Boyce and Associates
Mac, iPhone, and iPad Consultants
For appointments, please call the office: 424-354-3548.
We do not make appointments by email or text. We're old-fashioned that way.
to BBEdit Talk
Find:
<a [^>]+>(.*?)</a>
Replace:
\1
This finds the opening a tag, then the shortest possible string, then the closing tag. Without the question mark, it won’t work correctly when there is more than one link on a line.
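For anyone who wants to see the difference between the two suggestions so far, here is a rough sketch in Python. It assumes Python's re module behaves like BBEdit's grep engine for these particular patterns, and the sample links are shortened stand-ins for the markup in the original post (the attribute values are made up).

```python
import re

# Shortened, hypothetical stand-ins for the lines in the original post.
one_link = '<a href="https://www.facebook.com/somename1?fref=x" data-x="1">Some Name 1</a>'
two_links = (
    one_link
    + ' <a href="https://www.facebook.com/somename2?fref=x" data-x="1">Some Name 2</a>'
)

GREEDY = r'(<a href=)(.*)>(.*)(</a>)'  # first suggestion, replace with \3
LAZY = r'<a [^>]+>(.*?)</a>'           # second suggestion, replace with \1

# With one link per line, both patterns leave just the name.
greedy_one = re.sub(GREEDY, r'\3', one_link)
lazy_one = re.sub(LAZY, r'\1', one_link)

# With two links on one line, the greedy (.*) runs to the last opening tag,
# so everything before the final name is swallowed by the match.
greedy_two = re.sub(GREEDY, r'\3', two_links)
lazy_two = re.sub(LAZY, r'\1', two_links)

print(greedy_one)  # Some Name 1
print(lazy_one)    # Some Name 1
print(greedy_two)  # Some Name 2
print(lazy_two)    # Some Name 1 Some Name 2
```

So if every line holds exactly one link, either pattern should do. The lazy version is the safer choice when that is not guaranteed.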
Sammy Banawan
Jan 27, 2020, 9:25:16 AM
to BBEdit Talk
I figured out an even easier way - just search for anything in angle brackets and delete it. <.*?> and replace with nothing. I should have thought of that before. Thanks for the help!
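As a quick check of this approach outside BBEdit, here is a minimal sketch in Python. It assumes Python's re module is close enough to BBEdit's grep for this pattern, and the two input lines are trimmed, hypothetical versions of the ones in the first post.

```python
import re

# Two trimmed sample lines, one link per line (attributes shortened).
text = (
    '<a href="https://www.facebook.com/somename1?fref=profile_friend_list" '
    'data-hovercard-prefer-more-content-show="1">Some Name 1</a>\n'
    '<a href="https://www.facebook.com/somename2?fref=profile_friend_list" '
    'data-hovercard-prefer-more-content-show="1">Some Name 2</a>'
)

# Delete anything that looks like a tag: "<", the shortest run of
# characters, then ">". The dot does not cross line breaks by default.
names = re.sub(r'<.*?>', '', text)

print(names)
# Some Name 1
# Some Name 2
```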
Ken Corey
Jan 27, 2020, 9:31:34 AM
to BBEdit Talk
Which is great if you don't have angle brackets in your text... And you *know* the HTML is uniform.
Is there a risk of multiple items on one line? Or malformed HTML?
Regexps with HTML can be unpredictable...
-Ken
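To make the concern concrete, here is a small sketch in Python of two inputs where the delete-everything-in-angle-brackets pattern goes wrong. Both inputs are invented for illustration and do not come from the file discussed in this thread. Python's re module is assumed to match BBEdit's grep behavior for this pattern.

```python
import re

STRIP_TAGS = r'<.*?>'

# Case 1: literal angle brackets in the text itself.
# The pattern treats "< 5 and 7 >" as if it were a tag and deletes it.
prose = 'if 3 < 5 and 7 > 2 then <b>yes</b>'
prose_result = re.sub(STRIP_TAGS, '', prose)

# Case 2: a ">" inside an attribute value.
# The match stops at the first ">", so part of the tag is left behind.
attr = '<a title="a > b">Name</a>'
attr_result = re.sub(STRIP_TAGS, '', attr)

print(repr(prose_result))  # 'if 3  2 then yes'
print(repr(attr_result))   # ' b">Name'
```

Neither case appears in the sample lines posted above, so this only matters if the real file is less uniform than the sample.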
Jeffrey Jones
Jan 27, 2020, 12:39:18 PM
to bbe...@googlegroups.com
On 2020 Jan 26, at 23:18, Sammy Banawan <sban...@gmail.com> wrote:
>
> I figured out an even easier way - just search for anything in angle brackets and delete it. <.*?> and replace with nothing. I should have thought of that before. Thanks for the help!
If you want to remove ALL tags, try Markup > Utilities > Remove Markup
Sammy Banawan
Jan 27, 2020, 12:39:18 PM
to BBEdit Talk
This text is very predictable. Just a lot of lines with everything but the profile name unneeded.