[SvarDOS] UTF8TOCP usage

Roland White

Apr 20, 2022, 11:54:58 AM

Hey, I found UTF8TOCP, a program which looks exactly like what I'm
chasing!

Err... ooops:

UTF8TOCP.COM 437 ANYTEXT.TXT

only /shows/ ANYTEXT.TXT converted, but does not write it (as starry-eyed
supposed #-).

However: ultimately, my goal is to convert e-mail messages like this...
=======================================================================
Return-Path: <roland...@web.de>
Received: from mout.web.de (212.227.15.4 [212.227.15.4])
by firemail.de (b1gMailServer) with ESMTPS id 18B701E9
for <r...@firemail.de>; Wed, 20 Apr 2022 09:53:18 +0200 (CEST)
Received-SPF: Pass
identity=roland...@web.de; client-ip=212.227.15.4;
helo=mout.web.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de;
s=dbaedf251592; t=1650441198;

[...]

X-UI-Sender-Class: c148c8c5-30a9-4db5-a2e7-cb6cc037b8f9
Received: from localhost ([88.69.145.141]) by smtp.web.de (mrweb005
[213.165.67.108]) with ESMTPSA (Nemesis) id 1MtPre-1nursa0DJ0-00uidC for
<r...@firemail.de>; Wed, 20 Apr 2022 09:53:18 +0200
From: Roland White <roland...@web.de>
To: <r...@firemail.de>
Subject: TEST
Date: Wed, 20 Apr 2022 09:53:17 +0200
MIME-Version: 1.0
Message-ID: <04fc8792-9963-43e4...@web.de>
User-Agent: Trojita/0.7; Qt/5.12.7; xcb; Linux;
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Provags-ID: V03:K1:Zxl7Z47RsPdte/pXDZaHnvBb/ZvVzICdHm73VeihrPfIxawC5UK

[...]

=C3=84hnlich =C3=BCberflie=C3=9Fenden =C3=96lfa=C3=9F=C3=BCberl=C3=A4ufen?
R=C3=
=B6tlich?
=======================================================================

... to make it readable with PMAIL 3.50, which is suitable only for
7bit US-ASCII, AFAIS. For a clueless chap like me the line:

Content-Type: text/plain; charset=utf-8; format=flowed

appears obviously wrong to be UTF-8 for the whole message, so UTF8TOCP.COM
does no conversion at all.

Whew! - I've no idea how to make this work...

TIA for hints,

R-

--
Why, what for, how come?
Whoever asks me stays dumb.

Mateusz Viste

Apr 20, 2022, 12:33:00 PM

On 20 Apr 2022 15:54:57 GMT Roland White wrote:

> UTF8TOCP.COM 437 ANYTEXT.TXT
>
> only /shows/ ANYTEXT.TXT converted, but does not write it (as
> starry-eyed supposed #-).

How about using the standard redirector?

utf8tocp 437 anytext.txt > anytext2.txt

> to make it readable with PMAIL 3.50, which is suitable only for
> 7bit US-ASCII,

utf8tocp won't help there, as it converts UTF-8 to 8-bit codepages. Are
you sure that PMAIL handles only 7-bit ("low ASCII") characters?

> Content-Type: text/plain; charset=utf-8; format=flowed
>
> appears obviously wrong to be UTF-8 for the whole message, so
> UTF8TOCP.COM does no conversion at all.

utf8tocp converts UTF-8 text to a codepage (and vice versa); it knows
nothing about header declarations. That's for you to fix manually. Or
leave it as-is, perhaps PMAIL will ignore the declaration if it assumes
a specific codepage anyway.

Mateusz
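
For illustration: the sample body above is quoted-printable on top of UTF-8, so the transfer encoding has to be undone before any codepage conversion can do something useful. A minimal Python sketch of that idea follows. It is not how utf8tocp or PMAIL work internally; the variable names are made up for this example, and CP437 is assumed as the target only because that is the codepage used in the command above.

```python
import quopri

# Body lines copied from the sample message (quoted-printable, charset=utf-8).
qp_body = (
    b"=C3=84hnlich =C3=BCberflie=C3=9Fenden =C3=96lfa=C3=9F=C3=BCberl=C3=A4ufen?\n"
    b"R=C3=\n"
    b"=B6tlich?"
)

# Step 1: undo quoted-printable. "=XX" becomes one byte, and a "=" at the
# end of a line is a soft line break that joins the two lines.
utf8_bytes = quopri.decodestring(qp_body)

# Step 2: interpret the resulting bytes as UTF-8, as the header declares.
text = utf8_bytes.decode("utf-8")

# Step 3: re-encode for the DOS side (assumed target: CP437).
cp437_bytes = text.encode("cp437")

print(text)
print(cp437_bytes)
```

Only after step 1 does the data look like the plain UTF-8 text that a converter such as utf8tocp expects as input.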

Roland White

Apr 22, 2022, 2:19:03 AM

Mateusz Viste <mat...@xyz.invalid> wrote:
> On 20 Apr 2022 15:54:57 GMT Roland White wrote:
>
>> UTF8TOCP.COM 437 ANYTEXT.TXT
>>
>> only /shows/ ANYTEXT.TXT converted, but does not write it (as
>> starry-eyed supposed #-).
>
> How about using the standard redirector?
>
> utf8tocp 437 anytext.txt > anytext2.txt

Of course the first step. But...
anytext.txt:
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
anytext2.txt
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.

Changes only on screen, not written.

>> to make it readable with PMAIL 3.50, which is suitable only for
>> 7bit US-ASCII,
>
> utf8tocp won't help there, as it converts UTF-8 to 8-bit codepages. Are
> you sure that PMAIL handles only 7-bit ("low ASCII") characters?

Mmh. No. However: Incoming E-Mails with umlauts are not really readable.
"Ölfaßüberläufe" is just a chunk of box drawings. And AFAIS there is no
option available to change.

It looks like a file such as PEGASUS.DE is required in that case; quote:
"
Pegasus Mail v3.x is fully internationalizable: all text, strings
and data structures are stored in PEGASUS.RSC and loaded at runtime.
At the time of release for 3.x, translations are under way for the
following languages: Dutch, German, Czech, French, Spanish, Finnish
and Portuguese. [...]
When international versions of Pegasus Mail are made available, they
will appear as replacements for PEGASUS.RSC; these files should be
copied into the same directory as PMAIL.EXE
"
...but these probably do not exist. According to

http://www.dendarii.co.uk/FAQs/pmail-addons.html#dos

things get easier when downgrading to 3.20, wherever that version could be
found in the abyss.

R-

Mateusz Viste

Apr 22, 2022, 7:40:07 AM

On 22 Apr 2022 06:19:02 GMT Roland White wrote:

> Of course the first step. But...
> anytext.txt:
> <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
> anytext2.txt
> <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
>
> Changes only on screen, not written.

You mean that utf8tocp does no data processing at all, ie. data is
left unchanged?

But anytext.txt was supposed to be UTF-8, while what you show above is
not UTF-8... What is it exactly that you are trying to achieve?

> Mmh. No. However: Incoming E-Mails with umlauts are not really
> readable. "Ölfaßüberläufe" is just a chunk of box drawings. And AFAIS
> there is no option available to change.

Sounds simply like a codepage mismatch. Ie. you receive emails in one
codepage, and your DOS system uses a different codepage.

utf8tocp can possibly help converting the codepages, but you need to
know exactly what the source and target codepages are.

Mateusz
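
To make the mismatch concrete, here is a small Python sketch. It assumes the incoming mail body is UTF-8 and the screen is CP858, as in this thread; the variable names are only for illustration. It shows why umlauts turn into box-drawing characters when UTF-8 bytes are displayed unconverted.

```python
# The word from the thread, as a Unicode string.
original = "Ölfaßüberläufe"

# What actually arrives in a UTF-8 mail body: two bytes per umlaut.
as_utf8 = original.encode("utf-8")

# What a CP858 screen makes of those bytes if nothing converts them.
# Byte 0xC3, the lead byte of every umlaut here, is a box-drawing glyph.
on_screen = as_utf8.decode("cp858")

# Reversing the wrong interpretation recovers the text, which confirms
# that the data was intact and only the display codepage was wrong.
recovered = on_screen.encode("cp858").decode("utf-8")

# What a proper conversion would hand to the DOS program instead.
converted = original.encode("cp858")

print(on_screen)
print(recovered)
print(converted)
```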

Roland White

Apr 23, 2022, 3:09:34 AM

Mateusz Viste <mat...@xyz.invalid> wrote:
> On 22 Apr 2022 06:19:02 GMT Roland White wrote:
>
>> Of course the first step. But...
>> anytext.txt:
>> <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
>> anytext2.txt
>> <9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.
>>
>> Changes only on screen, not written.
>
> You mean that utf8tocp does no data processing at all, ie. data is
> left unchanged?

I mean in this example UTF8TOCP does data processing only on screen.

Detailed Elegy:

"type c:\texte\anytext.txt" shows:
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren.

"c:\trullala\utf8tocp.com 437 anytext.txt >anytext2.txt" shows on screen:
Überflüssige Ölfaßüberläufe, Ähren, verölte, ätzende Faßdauben

"type c:\texte\anytext2.txt" shows afterwards:
<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren

> But anytext.txt was supposed to be UTF-8, while what you show above is
> not UTF-8...

Therefore I wonder why it appears correctly converted on screen.

> What is it exactly that you are trying to achieve?

Readable e-mails.

>> Mmh. No. However: Incoming E-Mails with umlauts are not really
>> readable. "Ölfaßüberläufe" is just a chunk of box drawings. And AFAIS
>> there is no option available to change.
>
> Sounds simply like a codepage mismatch. Ie. you receive emails in one
> codepage, and your DOS system uses a different codepage.

My DOS system is SvarDOS. According to autoexec.bat:
MODE CON CP PREPARE=((858) %DOSDIR%\CPI\EGA.CPX)
MODE CON CP SELECT=858

> utf8tocp can possibly help converting the codepages, but you need to
> know exactly what the source and target codepages are.

I expected that a file automatically declared as UTF-8 /is/ UTF-8,
regardless of whether it's an e-mail body, a rubbish text file or Schiller's
Lied von der Glocke.

R-

--
Du hälst Wiederspruch im vorraus für Standart? - Also mir Macht daß
keiner weiß, ich habe Rückrat!

Mateusz Viste

Apr 23, 2022, 3:28:24 AM

On 23 Apr 2022 07:09:32 GMT Roland White wrote:

> I mean in this example UTF8TOCP does data processing only on screen.

That would be very strange, but maybe I am misunderstanding something.
Can you please send me by email the original file ("anytext.txt") so I
can look at it and try to reproduce the exact issue on my side?
mateusz - at - viste - punkt - fr

Mateusz

Mateusz Viste

Apr 23, 2022, 10:20:10 AM

Hello Roland,

Thanks for the file. What you have sent me is a simple UTF-8 encoded
phrase. I passed it through utf8tocp and it did work as expected.

See here: https://imgpile.com/i/5OTVhF

I'm using utf8tocp v0.9.4 under SvarDOS. My system runs under codepage
437, while yours is set to CP 858, but this shouldn't matter since
German glyphs are at the same positions in both these pages.

So I can only wonder what happens on your side...

Mateusz
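
The claim about glyph positions can be checked with a short Python sketch, relying on Python's built-in cp437 and cp858 codecs as a stand-in for the DOS codepage tables (an assumption; the DOS .CPX files are not read here). The names are illustrative only.

```python
# German letters that fall outside 7-bit ASCII.
GERMAN = "ÄÖÜäöüß"

# For each letter, the byte value it gets in CP437 and in CP858.
positions = {
    ch: (ch.encode("cp437")[0], ch.encode("cp858")[0]) for ch in GERMAN
}

for ch, (in_437, in_858) in positions.items():
    print(f"{ch}: CP437=0x{in_437:02X}  CP858=0x{in_858:02X}")

# True if every German letter sits at the same position in both pages.
same_positions = all(a == b for a, b in positions.values())
print("identical positions:", same_positions)

# The two pages are not identical overall; other byte values differ.
differing = [
    b for b in range(0x80, 0x100)
    if bytes([b]).decode("cp437") != bytes([b]).decode("cp858")
]
print("byte values that differ between the pages:", len(differing))
```

So a file converted with target 437 should display German text correctly on a CP858 system, while other characters may not survive the same shortcut.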

Roland White

Apr 24, 2022, 2:24:12 AM

Mateusz Viste <mat...@xyz.invalid> wrote:
...waddadamf^H^HMea culpa!

As an old oafish Linux User I used nothing else but LESS.EXE for viewing.
On the other hand being unsure concerning common knowledge I wrote TYPE in
my posting instead of LESS. And yes, effects are different.

I'm really sorry!

Mateusz Viste

Apr 24, 2022, 3:55:42 AM

On 24 Apr 2022 06:24:11 GMT Roland White wrote:

> As an old oafish Linux User I used nothing else but LESS.EXE for
> viewing. On the other hand being unsure concerning common knowledge I
> wrote TYPE in my posting instead of LESS. And yes, effects are
> different.

Ah, so that's where those weird <9A> <81> etc strings were coming from.

LESS.EXE seems indeed to be a poor viewer, as it does not display 8-bit
characters, only their hex values. I guess the lesson here is that it's
better to use native DOS tools instead of ports from some exotic
systems. ;-)

You might also give FoxType a try, if you'd like to read UTF-8 files
directly.

Mateusz
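
The <9A>-style strings quoted earlier in the thread are consistent with a viewer that prints every 8-bit byte as its hex value. As a hedged Python sketch, assuming the notation is exactly `<XX>` per byte (an assumption about what LESS.EXE displayed, not taken from its documentation), the display can be turned back into bytes and read as CP437. Both function names are hypothetical.

```python
import re

def from_hex_notation(shown: str) -> bytes:
    """Turn text like '<9A>ber' back into the raw bytes it stands for."""
    out = bytearray()
    pos = 0
    for match in re.finditer(r"<([0-9A-Fa-f]{2})>", shown):
        out += shown[pos:match.start()].encode("ascii")  # plain 7-bit part
        out.append(int(match.group(1), 16))              # the escaped byte
        pos = match.end()
    out += shown[pos:].encode("ascii")
    return bytes(out)

def to_hex_notation(raw: bytes) -> str:
    """Mimic the viewer: keep 7-bit bytes, show 8-bit bytes as <XX>."""
    return "".join(chr(b) if b < 0x80 else f"<{b:02X}>" for b in raw)

# The line as it was quoted in the thread.
shown = "<9A>berfl<81>ssige <99>lfa<E1><81>berl<84>ufe, <94>lhaltige <8E>hren."

raw = from_hex_notation(shown)
print(raw.decode("cp437"))
print(to_hex_notation(raw) == shown)
```

Read this way, the hex values are simply the CP437 positions of the German letters, which is what a correctly converted file would contain.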
