Unicode support: How to read and post with any mix of Unicode characters

Ralph Fox

Feb 3, 2019, 12:10:40 AM

1. REQUIREMENTS

1.1 Agent - recommended Agent 7 or later.
1.2 Notepad++
1.3 The file "cp65001.cod" from below.


2. INITIAL ONE-TIME SETUP

All of these setup steps are required before it will work.
Doing half these steps will not get it half working.

2.1 Place a copy of the file "cp65001.cod" into your Agent
program folder (the folder where agent.exe is).
2.2 Restart Agent so that it loads the new .cod file
"cp65001.cod".
2.3 Go to "Tools >> Options >> Languages" and create a new
language (say) "Unicode English" based on your existing
language (say) "English". For example, if your default
language is "English (International)", then
2.3.1 Select "English (International)"
2.3.2 Click the "Add" button
2.3.3 In the 'Name' box, type
Unicode English (Internat)
2.3.4 Check that the 'Based on' setting is set to your
existing language (say) "English (International)".
2.4 IMPORTANT: Adjust the settings for the new language
"Unicode English" as below:
2.4.1 Set 'Codepage' to "Unicode FULL UTF-8 (Codepage cp65001)"
2.4.2 Set 'Send Usenet As' to "Unicode FULL UTF-8 (us-ascii, UTF-8)"
2.4.3 Set 'Send Email As' to "Unicode FULL UTF-8 (us-ascii, UTF-8)"


3. READING A RECEIVED UNICODE MESSAGE
How to read a received message in Unicode with any mix of Unicode characters

3.1 In Agent, select the received message.
3.2 Use "Edit >> Language" to change the message's language
to "Unicode English".
Any non-ASCII (non-7bit) Unicode characters will now look
funny, like a sequence of accented characters. Do not
worry about this.
3.3 In Notepad++, open a new, empty tab (File >> New)
3.4 Use "Encoding >> Encode in ANSI" to set the new, empty
tab's encoding to ANSI.
3.5 Copy and paste the entire message body from Agent to the
new Notepad++ tab.
The non-ASCII Unicode characters will still look funny.
3.6 In Notepad++, use "Encoding >> Encode in UTF-8" to set the
tab's encoding to UTF-8. DO NOT USE "Convert to UTF-8".
Now all the non-ASCII Unicode characters will appear as
they should.
3.7 Read the message in Notepad++, with all of the Unicode
characters in the message shown correctly.
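
Why the reading steps above work can be sketched in a few lines of
Python. This is only a model, not what Agent or Notepad++ actually run
internally. It assumes the Windows "ANSI" code page is Windows-1252
(cp1252); the names ANSI, shown_as_ansi and shown_as_utf8 are made up
for the illustration.

```python
# Assumption: the Windows "ANSI" code page is Windows-1252 (Western
# European). On other systems it may be a different code page.
ANSI = "cp1252"

def shown_as_ansi(text: str) -> str:
    """The UTF-8 bytes of `text`, displayed one byte per character."""
    return text.encode("utf-8").decode(ANSI)

def shown_as_utf8(funny: str) -> str:
    """Map each ANSI character back to its byte, then read the bytes as UTF-8."""
    return funny.encode(ANSI).decode("utf-8")

original = "caf\u00e9 \U0001F98A"      # "café" followed by the fox emoji
funny = shown_as_ansi(original)         # "cafÃ© ðŸ¦Š", the "funny" form
restored = shown_as_utf8(funny)         # what "Encode in UTF-8" reveals
assert restored == original
```

The "funny" accented characters are simply the UTF-8 bytes shown one
byte at a time, so nothing is lost as long as every byte survives the
copy and paste.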


4. POSTING A NEW UNICODE MESSAGE
How to post a new message in Unicode with any mix of Unicode characters

4.1 In Notepad++, open a new, empty tab (File >> New)
4.2 Use "Encoding >> Encode in UTF-8" to set the new, empty
tab's encoding to UTF-8.
4.3 Write your message in the new tab. You can use any mix of
Unicode characters in your message.
4.4 In Agent, open a new compose window
("New Usenet Post" or "New Email Message").
4.5 IMPORTANT: From the compose window, go to
"Message >> Properties >> Language" and change the message's
language to "Unicode English", the new language you set up.
4.6 Go back to Notepad++ and use "Encoding >> Encode in ANSI" to
change the encoding to ANSI. Do NOT use "Convert to ANSI".
Any non-ASCII (non-7bit) Unicode characters will now look
funny, like a sequence of accented characters. Do not
worry about this.
4.7 Select all the text (Ctrl+A) in Notepad++, then copy and
paste it from Notepad++ into the Agent compose window.
The non-ASCII Unicode characters will still look funny.
4.8 Now send the message from Agent.
Provided (a) the new message's language was set to
"Unicode English", and (b) the language "Unicode English"
was set up correctly, then the non-ASCII characters will be
converted back to Unicode when the message is uploaded.
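
The posting direction can be modelled the same way. The sketch below
is an illustration only and says nothing about Agent's internals. It
assumes the ANSI code page is Windows-1252, and the helper names
text_to_paste and bytes_sent are hypothetical.

```python
# Assumption: the Windows "ANSI" code page is Windows-1252.
ANSI = "cp1252"

def text_to_paste(message: str) -> str:
    """What the Notepad++ tab shows after "Encode in ANSI":
    the UTF-8 bytes of the message, one ANSI character per byte."""
    return message.encode("utf-8").decode(ANSI)

def bytes_sent(pasted: str) -> bytes:
    """Model of the upload: each pasted ANSI character goes out as the
    single byte it stands for, and the message is labelled UTF-8."""
    return pasted.encode(ANSI)

message = "Gr\u00fc\u00dfe"            # "Grüße"
pasted = text_to_paste(message)        # "GrÃ¼ÃŸe", the "funny" form
wire = bytes_sent(pasted)
# The bytes that leave are exactly the UTF-8 encoding of the message.
assert wire == message.encode("utf-8")
```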


5. REPLYING TO A UNICODE MESSAGE
How to post a reply to a message in Unicode with any mix of Unicode characters

5.1 In Agent, select the received message.
5.2 Use "Edit >> Language" to change the received message's
language to "Unicode English".
Any non-ASCII (non-7bit) Unicode characters will now look
funny, like a sequence of accented characters. Do not
worry about this.
5.3 Use Agent's "Reply to Message" to open a compose window with
the received message quoted.
Any non-ASCII Unicode characters will still look funny.
5.4 From the compose window, use "Message >> Properties >> Language"
to check that the reply's language is already set to
"Unicode English". If it is not, close the reply, go back to
step 5.2 and change the received message's language there,
then reply again.
5.5 In Notepad++, open a new, empty tab (File >> New)
5.6 Use "Encoding >> Encode in ANSI" to set the new, empty
tab's encoding to ANSI.
5.7 Copy and paste the entire message body from Agent's compose
window to the new Notepad++ tab.
Any non-ASCII Unicode characters will still look funny.
5.8 In Notepad++, use "Encoding >> Encode in UTF-8" to set the
tab's encoding to UTF-8. DO NOT USE "Convert to UTF-8".
Now all the non-ASCII Unicode characters will appear as
they should.
5.9 Write your reply in the Notepad++ tab. You can use any mix
of Unicode characters in your message.
5.10 When you have finished writing your reply, use
"Encoding >> Encode in ANSI" to change the Notepad++ encoding
back to ANSI. Do NOT use "Convert to ANSI".
Any non-ASCII Unicode characters will now look funny again.
Do not worry about this.
5.11 Select all the text (Ctrl+A) in Notepad++, then copy and
paste it from Notepad++ back into the Agent compose window.
The non-ASCII Unicode characters will still look funny.
5.12 Now send the message from Agent.


SUMMARY

A. Copy and paste between Agent and Notepad++ when
* The Notepad++ encoding is set to ANSI,
* and the message's language in Agent is set to "Unicode English".

B. Read and write in Notepad++ when
* The Notepad++ encoding is set to UTF-8.

C. To change the Notepad++ encoding for this particular purpose
* Use "Encoding >> Encode in [...]";
* Do NOT use "Encoding >> Convert to [...]".
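
The difference between the two menu commands in point C can be
pictured as follows. This is a rough model of the behaviour described
above, not Notepad++ code. It assumes the ANSI code page is
Windows-1252, and the function names are invented for the example.

```python
# Assumption: the Windows "ANSI" code page is Windows-1252.
ANSI = "cp1252"

def encode_in(buffer: bytes, encoding: str) -> str:
    """Model of "Encode in ...": the bytes stay as they are and are
    only interpreted differently."""
    return buffer.decode(encoding)

def convert_to(buffer: bytes, old: str, new: str) -> bytes:
    """Model of "Convert to ...": the characters stay as they are and
    the bytes are rewritten."""
    return buffer.decode(old).encode(new)

# Bytes of "café" in UTF-8, as pasted into a tab set to ANSI.
buffer = "caf\u00e9".encode("utf-8")

seen_in_ansi_tab = encode_in(buffer, ANSI)       # "cafÃ©"
after_encode_in = encode_in(buffer, "utf-8")     # "café", repaired
after_convert = convert_to(buffer, ANSI, "utf-8").decode("utf-8")
# "Convert to" keeps the funny characters: still "cafÃ©".
assert after_convert == seen_in_ansi_tab
```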

D. If you are replying to a message which contains Unicode characters
* Change the message's language to "Unicode English"
_before_ you hit "Reply".
* Do not hit reply first, and then change the reply's
language to "Unicode English". Doing this will not
handle Unicode characters in the quoted text.


OTHER ISSUES

I If you have modified the file "cp65001.cod" and it is not working
properly, try it again with the unmodified "cp65001.cod" below.

II Do not add "utf8" to "cp65001.cod".
* "cp65001" replaces "utf8" in the .cod file "cp65001.cod".
* "cp65001" correctly handles all Unicode characters
(in Win2K and later).
* Agent's built-in "utf8" does not correctly handle Unicode
characters beyond the first 64K code points.
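
To see which characters fall "beyond the first 64K code points", a
small Python check like the one below may help. It is only an
illustration; the helper names are made up. The fox in the signature
of this post (U+1F98A) is one such character.

```python
def beyond_first_64k(ch: str) -> bool:
    """True if the character's code point is above U+FFFF."""
    return ord(ch) > 0xFFFF

def utf16_units(ch: str) -> int:
    """Number of 16-bit units the character needs in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2

fox = "\U0001F98A"    # fox emoji, code point U+1F98A
euro = "\u20ac"       # euro sign, code point U+20AC

assert beyond_first_64k(fox) and utf16_units(fox) == 2
assert not beyond_first_64k(euro) and utf16_units(euro) == 1
```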

III Additional "Outbound-Charsets:" may not work successfully with
messages containing non-ASCII (non-7bit) Unicode characters.
* Agent can have problems converting multibyte characters
to other charsets for sending.
* In codepage 65001 (UTF-8), all non-ASCII (non-7bit) Unicode
characters are multibyte characters.
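
The statement that every non-ASCII character is multibyte in UTF-8
can be checked with a short Python sketch. The function names are
hypothetical and the sample characters are arbitrary.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def is_multibyte(ch: str) -> bool:
    return utf8_length(ch) > 1

# "A", e-acute, euro sign, fox emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F98A"]
lengths = [utf8_length(ch) for ch in samples]   # [1, 2, 3, 4]

# Only the ASCII range (below U+0080) fits in a single byte.
assert all(not is_multibyte(chr(cp)) for cp in range(0x00, 0x80))
assert all(is_multibyte(chr(cp)) for cp in range(0x80, 0x2000))
```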

IV Additional charsets in "Inbound-Charsets:", except for Agent's
"utf8", should be OK.


FILE

File cp65001.cod
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ COPY ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name: cp65001
Description: Unicode FULL UTF-8
Version: 1
Codepage: 65001
Charset: UTF-8, csUTF8, UNICODE-1-1-UTF-8
Inbound-Charsets: ascii, utf7, cp65001, iso-8859-1, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, iso-8859-10, iso-8859-11, iso-8859-13, iso-8859-14, iso-8859-15, cp437, cp850, cp932, cp936, cp949, cp950, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, cp1361, cp54936, iso-2022-cn, iso-2022-jp, koi8, koi8u, koi8ru, macroman, norwegian, swedish, tis-620, viscii
Outbound-Charsets: cp65001, Unicode FULL UTF-8, ascii, cp65001
Outbound-Charsets: utf7, Unicode UTF-7#STR_OCS_UTF7, ascii, utf7
Outbound-Charsets: ascii, ASCII Only#STR_OCS_ASCII, ascii
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ COPY ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


--
Kind regards
Ralph
🦊

Jaimie Vandenbergh

Feb 3, 2019, 1:09:42 PM
On Sun, 03 Feb 2019 18:10:36 +1300, Ralph Fox <-rf-nz-@-.invalid> wrote:

>Unicode support: How to read and post with any mix of Unicode characters

Very cool, thank you Ralph. Neat little fox character!

It is a shame it's not possible to persuade Agent to show unicode
itself, but a round trip via Notepad++ is a lot simpler than just
guessing.

Cheers - Jaimie
--
The square root of rope is string. -- Core 3, Valve