Deleting emoji characters

Tim A

Oct 24, 2020, 11:03:21 AM
to BBEdit Talk

How do I strip emoji characters from a body of text?

😳 🤣🤣

♥️

🥰


Gregory Shenaut

Oct 24, 2020, 11:51:35 AM
to BBEdit Talk
I cannot answer this question; however, I believe that this document may provide some answers: <https://www.unicode.org/reports/tr51/tr51-13.html>
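UTS #51 treats emoji not only as single code points but also as sequences, for example a base character followed by U+FE0F VARIATION SELECTOR-16 to request emoji presentation. A quick way to see what a sample actually contains is to list its code points. The short Python sketch below (an illustration, not something posted in the thread) does that with the standard `unicodedata` module; the function name `describe` is hypothetical. On the sample from the question it shows that ♥️ is two code points, U+2665 BLACK HEART SUIT plus U+FE0F.

```python
import unicodedata


def describe(text):
    """Return one "U+XXXX NAME" string per code point in text."""
    rows = []
    for ch in text:
        # unicodedata.name() raises ValueError for unnamed code points
        # unless a default is supplied, so pass one.
        rows.append(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
    return rows


# The sample from the question on one line, spelled with escapes: 😳 🤣🤣 ♥️ 🥰
sample = "\U0001F633 \U0001F923\U0001F923 \u2665\uFE0F \U0001F970"

for row in describe(sample):
    print(row)
```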


--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/d7824fb2-b1f4-4d18-a621-495a281b96f3n%40googlegroups.com.

Rod Buchanan

Oct 24, 2020, 11:51:35 AM
to bbe...@googlegroups.com

Text -> Zap Gremlins… should work, though depending on the content it may remove more than just the emojis.
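To see why zapping can take out more than the emoji, here is a rough Python model of one plausible setting: deleting every non-ASCII character (anything above U+007F). That this is the option doing the work is an assumption, the function `zap_non_ascii` is hypothetical and is not BBEdit's implementation, and the real command has other settings the sketch ignores. Accented letters and superscripts disappear along with the emoji.

```python
def zap_non_ascii(text):
    """Delete every code point above U+007F (rough stand-in for zapping non-ASCII)."""
    return "".join(ch for ch in text if ord(ch) < 0x80)


before = "Ma\u00f1ana \U0001F633 see note\u00b9"  # "Mañana 😳 see note¹"
after = zap_non_ascii(before)
print(repr(after))  # 'Maana  see note'
```

The ñ and ¹ are gone too, and the space on each side of the removed emoji is left behind.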

-- 
Rod


jj

Oct 24, 2020, 3:53:06 PM
to BBEdit Talk
Tim,

BBEdit's grep engine is based on the 16-bit PCRE2 library, so maybe a Unicode regular expression can help you. But Unicode is tricky, and what counts as an emoji depends on many factors and possible combinations.

This regular expression seems to catch many emoticons, emojis, etc., but I don't know whether it is exhaustive.
At least it seems to work in BBEdit, and it can be adjusted to select specific Unicode blocks.

```
(*UTF)(?x)
[\N{U+2700}-\N{U+27BF}]    (?# Dingbats 192)
|[\N{U+1F000}-\N{U+1F02F}] (?# Mahjong Tiles 44)
|[\N{U+1F030}-\N{U+1F09F}] (?# Domino Tiles 100)
|[\N{U+1F0A0}-\N{U+1F0FF}] (?# Playing Cards 82)
|[\N{U+1F100}-\N{U+1F1FF}] (?# Enclosed Alphanumeric Supplement 193)
|[\N{U+1F200}-\N{U+1F2FF}] (?# Enclosed Ideographic Supplement 64)
|[\N{U+1F300}-\N{U+1F5FF}] (?# Miscellaneous Symbols and Pictographs 768)
|[\N{U+1F600}-\N{U+1F64F}] (?# Emoticons 80)
|[\N{U+1F650}-\N{U+1F67F}] (?# Ornamental Dingbats 48)
|[\N{U+1F680}-\N{U+1F6FF}] (?# Transport and Map Symbols 110)
|[\N{U+1F700}-\N{U+1F77F}] (?# Alchemical Symbols 116)
|[\N{U+1F780}-\N{U+1F7FF}] (?# Geometric Shapes Extended 101)
|[\N{U+1F800}-\N{U+1F8FF}] (?# Supplemental Arrows-C 148)
|[\N{U+1F900}-\N{U+1F9FF}] (?# Supplemental Symbols and Pictographs 244)
|[\N{U+1FA00}-\N{U+1FA6F}] (?# Chess Symbols 98)
|[\N{U+1FA70}-\N{U+1FAFF}] (?# Symbols and Pictographs Extended-A 16)
```
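For experimenting outside BBEdit, the same block list can be expressed with Python's `re` module. This is a sketch, not jj's code: it builds one character class from literal characters instead of PCRE2's `\N{U+...}` escapes, and deleting a tuple leaves that block alone, much as dropping an alternative from the regex would. The names `EMOJI_BLOCKS` and `strip_emoji_blocks` are hypothetical. Run on the sample from the question, it also exposes a gap: ♥️ is U+2665 (Miscellaneous Symbols, U+2600-U+26FF) followed by U+FE0F, and neither is in the list, so it survives. The blocks from U+1F000 through U+1FAFF happen to be contiguous, so the list is equivalent to two ranges; listing them separately only makes them easy to toggle.

```python
import re

# Ranges transcribed from the regex above (inclusive Unicode block bounds).
EMOJI_BLOCKS = [
    (0x2700, 0x27BF),    # Dingbats
    (0x1F000, 0x1F02F),  # Mahjong Tiles
    (0x1F030, 0x1F09F),  # Domino Tiles
    (0x1F0A0, 0x1F0FF),  # Playing Cards
    (0x1F100, 0x1F1FF),  # Enclosed Alphanumeric Supplement
    (0x1F200, 0x1F2FF),  # Enclosed Ideographic Supplement
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F650, 0x1F67F),  # Ornamental Dingbats
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F700, 0x1F77F),  # Alchemical Symbols
    (0x1F780, 0x1F7FF),  # Geometric Shapes Extended
    (0x1F800, 0x1F8FF),  # Supplemental Arrows-C
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x1FA00, 0x1FA6F),  # Chess Symbols
    (0x1FA70, 0x1FAFF),  # Symbols and Pictographs Extended-A
]

# One character class with a lo-hi range per block; none of these
# characters is special inside [...], so no escaping is needed.
EMOJI_RE = re.compile(
    "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in EMOJI_BLOCKS) + "]"
)


def strip_emoji_blocks(text):
    """Remove every code point that falls in one of EMOJI_BLOCKS."""
    return EMOJI_RE.sub("", text)


sample = "\U0001F633 \U0001F923\U0001F923 \u2665\uFE0F \U0001F970"  # 😳 🤣🤣 ♥️ 🥰
print(repr(strip_emoji_blocks(sample)))
```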
Best regards,

Jean Jourdain

Tim A

Oct 24, 2020, 10:18:23 PM
to BBEdit Talk
Thanks, all, for the quick and informative responses. Rod's "Text -> Zap Gremlins" did the trick!

@lbutlr

Mar 22, 2021, 5:01:39 AM
to BBEdit Talk
On 24 Oct 2020, at 18:02, Tim A <timaa...@gmail.com> wrote:
> Thanks all for the quick and informative responses . The "Text -> Zap Gremlin" of Rod did the trick!

Be careful: when you are dealing with a UTF-8 document, this may zap more than you want.

For example, I often use ¹ and ² in documents, and simply removing them will cause the footnotes in the text to lose their context.

There are, of course, many other non-ASCII characters that may be zapped, such as ñ, √, and ç, whose removal may cause problems.

If you change 2√x to 2/x by zapping, that could be very bad.
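One way to address this concern, sketched here in Python as an illustration rather than anything proposed in the thread, is to delete by code point range instead of zapping everything non-ASCII. ¹ (U+00B9), ² (U+00B2), ñ (U+00F1), ç (U+00E7) and √ (U+221A) all lie outside the emoji blocks in jj's regex, so they survive. The range list and the names `STRIP_RANGES`, `is_stripped` and `strip_emoji` are assumptions: the list merges jj's blocks (U+2700-U+27BF and the contiguous U+1F000-U+1FAFF) and adds Miscellaneous Symbols (U+2600-U+26FF) plus U+FE0F so that ♥️ is caught too. Miscellaneous Symbols also holds non-emoji symbols, so that addition trades some precision for coverage.

```python
# Hypothetical range list (inclusive bounds): jj's blocks merged into two spans,
# plus two additions assumed here so that ♥️ (U+2665 U+FE0F) is also removed.
STRIP_RANGES = [
    (0x2600, 0x26FF),    # added: Miscellaneous Symbols (contains U+2665)
    (0x2700, 0x27BF),    # jj: Dingbats
    (0x1F000, 0x1FAFF),  # jj: Mahjong Tiles through Symbols and Pictographs Extended-A
    (0xFE0F, 0xFE0F),    # added: VARIATION SELECTOR-16 (emoji presentation)
]


def is_stripped(ch):
    """True if ch falls inside any of STRIP_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in STRIP_RANGES)


def strip_emoji(text):
    """Delete only code points in STRIP_RANGES; other non-ASCII text is kept."""
    return "".join(ch for ch in text if not is_stripped(ch))


text = "note\u00b9, x\u00b2, 2\u221ax, ma\u00f1ana, gar\u00e7on \u2665\uFE0F\U0001F970"
result = strip_emoji(text)
print(repr(result))  # 'note¹, x², 2√x, mañana, garçon '
```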

--
Stupid men are often capable of things the clever would not dare to
contemplate... --Feet of Clay
