Converting to non-latin entities in BBEdit?

Omar KN

Feb 25, 2021, 3:29:25 AM

to BBEdit Talk

I know that UTF-8 will accommodate non-ASCII characters (ä ö ü å …), but if I want to convert a page to HTML entities, how would I do this?

UG has this:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

HTML Entities

When this option is set, the Translate Text to HTML command will convert all extended characters in the current document into HTML entities, using either names or the code (in decimal or hexadecimal). You can specify whether the tool should ignore < and >. This is useful when translating text already marked up as HTML. You can also specify that all Unicode text should be converted to entities.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

But how do I access this?

/

with best regards, Omar KN, Stockholm, Sweden

Omar KN

Feb 25, 2021, 3:34:13 AM

to BBEdit Talk

If I use "Translate Text to HTML" on a page because there are a few non-ASCII characters (ä ö ü å …) somewhere in it, then everything gets converted, including the HTML tags!

jj

Feb 25, 2021, 2:30:15 PM

to BBEdit Talk

Omar,

If you want to selectively convert some characters to HTML entities, you could use BBEdit's Canonize command (menu 'Text' > 'Canonize ...') with a canonize file in this form:

```
# -*- x-bbedit-canon-case-sensitive: 1; x-bbedit-canon-match-words: 0; x-bbedit-canon-grep: 1; -*-
À	\&Agrave;
Á	\&Aacute;
...

```

See the BBEdit manual for more info on the Canonize command.

On each line put a character and its entity separated by a TAB.

Here is an example for latin diacritics that you can edit at your convenience.
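A file in this form could also be generated rather than typed by hand. The following is only a minimal sketch in Python, not part of BBEdit: it uses the standard library's `html.entities.codepoint2name` table, reuses the header line shown above, and assumes the default range U+00C0 to U+00FF is what you want covered. The function names `canonize_lines` and `build_canonize_file` are made up for this illustration.

```python
from html.entities import codepoint2name

# Header copied from the canonize file shown above (grep mode enabled).
HEADER = ("# -*- x-bbedit-canon-case-sensitive: 1; "
          "x-bbedit-canon-match-words: 0; x-bbedit-canon-grep: 1; -*-")


def canonize_lines(first=0x00C0, last=0x00FF):
    """Yield 'character<TAB>\\&name;' for code points with a named HTML entity."""
    for cp in range(first, last + 1):
        name = codepoint2name.get(cp)
        if name:  # skip code points that have no named entity
            # The & is backslash-escaped, as in the example file above.
            yield f"{chr(cp)}\t\\&{name};"


def build_canonize_file(first=0x00C0, last=0x00FF):
    """Return the full text of a canonize file: header plus one mapping per line."""
    return "\n".join([HEADER, *canonize_lines(first, last)]) + "\n"


if __name__ == "__main__":
    print(build_canonize_file(), end="")
```

Redirect the output to a file, save it as UTF-8, and pick that file in the Canonize dialog. Adjust the range arguments if you need characters outside Latin-1.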

Best regards,

Jean Jourdain

Omar KN

Feb 25, 2021, 4:07:06 PM

to BBEdit Talk

This was the solution, thank you Jean!

Letters whose entity includes the hash sign # (numeric entities) have to be escaped twice, once for the & and once for the #:

\&\#7692;
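For characters that have no named entity, the same escaping can be produced mechanically. This is a rough Python sketch under the assumption that both & and # need a backslash in grep mode, as reported above; `canonize_line` is a hypothetical helper name, and the named-entity lookup comes from the standard library's `html.entities.codepoint2name`.

```python
from html.entities import codepoint2name


def canonize_line(ch):
    """Return one canonize mapping line for a single character.

    Uses the named entity when one exists, otherwise a decimal
    numeric entity with both the & and the # backslash-escaped.
    """
    cp = ord(ch)
    name = codepoint2name.get(cp)
    if name:
        return f"{ch}\t\\&{name};"
    return f"{ch}\t\\&\\#{cp};"


if __name__ == "__main__":
    for ch in "äḌ":
        print(canonize_line(ch))
```

For Ḍ (U+1E0C, decimal 7692) this gives the replacement `\&\#7692;` shown above.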

/

with best regards, Omar KN, Stockholm, Sweden
