International WordSet


Carlo Hogeveen

Jan 13, 2023, 5:29:47 AM
to Semware @ GoogleGroups

This email is meant both to inform and to invite corrections if I got something wrong.

My previous topic was about a tool that offers a handier way to change the case of words (upper/lower/flip/capitalize).

TSE allows us to configure which characters it should assume can be part of a word.
It does so in Options -> Full Configuration -> Command/Format Options -> WordSet.

By default, TSE comes configured with the WordSet "0-9A-Z_a-z".
This covers the English letters plus the digits and the underscore, which can be part of words in programming languages.
Those of us who have to deal with old programming languages have added a leading "-" (hyphen/minus), making the WordSet "-0-9A-Z_a-z".

The GUI and Linux versions of TSE come with full ANSI (Windows-1252) compatibility, which TSE by default presents with the "Courier New" font.

ANSI covers the alphabets of these natural languages:
Danish, Dutch, English, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Norwegian, Portuguese, Spanish, and Swedish.

That means that those of us who encounter languages other than English (and who does not, these days?) need to expand TSE's WordSet configuration.

For ANSI I came up with "-0-9A-Z_a-zŠŒŽšœžŸÀ-ÖØ-öø-ÿ" for TSE's WordSet.
TSE changed that to "-0-9A-Z_a-zŠŒŽšœžŸÀ-ÖØ-öø-\d255", which is the same.
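
To make the range notation concrete, here is a rough Python sketch that expands such a WordSet string into the characters it selects. The syntax rules in it are my assumptions, inferred only from the examples in this thread and not from TSE's documentation: "x-y" is a range over Windows-1252 character codes, "\dNNN" is a decimal character code, and a "-" that is not between two characters is a literal hyphen. The function name expand_wordset is made up for this illustration.

```python
import re

def expand_wordset(spec: str, encoding: str = "cp1252") -> set[str]:
    """Expand a WordSet-style string into the set of characters it selects.

    Assumed syntax (inferred from the examples in this thread only):
    "x-y" is a range by character code, "\\dNNN" is a decimal character
    code, and any other "-" is a literal hyphen.
    """
    # Tokenize into (character code, is_literal_hyphen) pairs.
    tokens = []
    for match in re.finditer(r"\\d(\d{1,3})|(.)", spec, re.S):
        if match.group(1) is not None:
            tokens.append((int(match.group(1)), False))
        else:
            char = match.group(2)
            tokens.append((char.encode(encoding)[0], char == "-"))

    # Walk the tokens, treating "x-y" triples as inclusive ranges.
    codes = set()
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1][1]:
            codes.update(range(tokens[i][0], tokens[i + 2][0] + 1))
            i += 3
        else:
            codes.add(tokens[i][0])
            i += 1
    return {bytes([code]).decode(encoding) for code in codes}

typed = expand_wordset("-0-9A-Z_a-zŠŒŽšœžŸÀ-ÖØ-öø-ÿ")
stored = expand_wordset("-0-9A-Z_a-zŠŒŽšœžŸÀ-ÖØ-öø-\\d255")
print(typed == stored)
print("×" in typed, "÷" in typed)
```

Under these assumptions both spellings expand to the same set, because ÿ has character code 255 in Windows-1252. The split into "À-Ö", "Ø-ö" and "ø-ÿ" skips codes 215 (×) and 247 (÷), which are symbols and not letters.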

Carlo



Harald Mezger

Jan 14, 2023, 4:51:24 AM
to sem...@googlegroups.com
Dear Carlo,

That TSE WordSet would need a few more entries if all accented characters in the German language are to be covered.

For comparison, here's the one I am using:  0-9A-Z_a-zÄÖÜßäöü

Hope it helps.



Carlo Hogeveen

Jan 14, 2023, 5:31:37 AM
to sem...@googlegroups.com

Harald,

In a TSE WordSet, "A-Z" means "all the letters from A to Z".
Likewise, "À-Ö" means all the letters from À to Ö.
What "from À to Ö" means can be seen in TSE's Util -> "ASCII Chart ..." menu.
I therefore think my suggested international WordSet "-0-9A-Z_a-zŠŒŽšœžŸÀ-ÖØ-öø-ÿ" covers all German letters too.
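
For anyone who wants to check this outside TSE, here is a small Python sketch. It assumes that the ranges are compared by Windows-1252 character code, and it only tests the three accented ranges of the suggested WordSet against the German letters from Harald's WordSet. The names ACCENTED_RANGES and is_covered are made up for this illustration.

```python
# The three accented ranges from the suggested WordSet.
ACCENTED_RANGES = [("À", "Ö"), ("Ø", "ö"), ("ø", "ÿ")]

def code(char: str) -> int:
    """Return the Windows-1252 character code of a single character."""
    return char.encode("cp1252")[0]

def is_covered(char: str) -> bool:
    """True if the character's code falls inside one of the ranges."""
    return any(code(low) <= code(char) <= code(high)
               for low, high in ACCENTED_RANGES)

german_letters = "ÄÖÜßäöü"
missing = [char for char in german_letters if not is_covered(char)]

for char in german_letters:
    print(char, code(char), is_covered(char))
print("missing:", missing)
```

With this comparison rule, no German letter ends up in the missing list.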

Reminder to all: The context of this topic is the Linux version of TSE and the GUI version with an ANSI-compatible font like Courier New. The Console version of TSE has its own mess.

Harald, thank you for taking the time to check my suggestion.

Carlo


Aside, pet peeve:
TSE's menu item is definitely not an "ASCII Chart", because it shows more than the first 128 character codes.
For GUI and Linux TSE, "ANSI Chart" would be closer, but ANSI only uses 218 of the 256 character codes.
http://www.alanwood.net/demos/ansi.html
I propose "Character Chart" as a nicely neutral term that also covers all the code pages that the Console version of TSE can work with.
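
The figure of 218 can be reproduced with a short Python sketch. The counting rule is my assumption about how that number arises: take the 256 codes, drop the ones that Python's "cp1252" codec treats as unassigned, and drop the control characters. The function name cp1252_characters is made up for this illustration.

```python
import unicodedata

def cp1252_characters():
    """Return (characters, unassigned codes) for Windows-1252.

    Characters are all codes that decode and are not control characters.
    """
    characters = []
    unassigned = []
    for code in range(256):
        try:
            char = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            unassigned.append(code)  # no character assigned to this code
            continue
        if unicodedata.category(char) != "Cc":  # skip control characters
            characters.append(char)
    return characters, unassigned

characters, unassigned = cp1252_characters()
print(len(characters))
print([hex(code) for code in unassigned])
```

That leaves 5 unassigned codes and 33 control codes (0 to 31, plus 127) out of 256, so 218 characters remain.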

