
BYTESIZE(CHAR)=1 in FileRd.m3??? WideText interfaces?


Dan Connolly

Nov 17, 1994, 12:18:14 PM

[I posted about this a while ago, but I think my news software
ate it.]

I have been in the middle of discussions about character sets,
encodings, Unicode, UTF-8, ad nauseam in the HTML forums, so I
noticed the following code in FileRd.m3:

http://www.research.digital.com/SRC/m3sources/html/rw/src/Common/FileRd.m3

    n := rd.sourceH.read(
           SUBARRAY(LOOPHOLE(ADR(rd.buff[0]), ByteArrayPtr)^, 0,
                    NUMBER(rd.buff^)), mayBlock := NOT dontBlock)

This code seems to assume that NUMBER(rd.buff^) = BYTESIZE(rd.buff^),
which implies, since rd.buff^ is an ARRAY OF CHAR, that
BYTESIZE(CHAR)=1.

In the language definition, it says:

CHAR An enumeration containing at least 256 elements

The first 256 elements of type CHAR represent characters in the
ISO-Latin-1 code, which is an extension of ASCII.


Nowhere (that I can find) does it say that BYTESIZE(CHAR)=1.

Would someone care to characterize the above code as:

(1) an isolated defect in the libm3 code -- easily fixed.

(2) a pervasive defect in libm3 -- lots of work to fix it.

(3) by design -- libm3 is not designed to work on platforms
where BYTESIZE(CHAR)>1.

(4) correct -- the language definition guarantees BYTESIZE(CHAR)=1,
and I just didn't see it.

My guess is (2) or (3).


So... has anybody worked on applications involving multibyte character
encodings or wide characters? Has anyone developed a WideText, WideWr
interface or some such? How about UTF-8 -> Unicode translations?

Doesn't Windows-NT use a 16-bit character representation in many of
its data structures? I wonder what type is used for those data structures...

(surf surf surf... ah!)

* UNICODE (Wide Character) types

TYPE
WCHAR = Ctypes.unsigned_short; (* wc, 16-bit UNICODE character *)

which is: [-16_8000 .. 16_7fff]


(by the way... doesn't WinNT.i3 border on copyright infringement,
or divulging trade secrets or somesuch?)

--
Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
<conn...@hal.com> http://www.hal.com/%7Econnolly

Bill Kalsow

Nov 18, 1994, 9:38:48 AM
In article <CONNOLLY.94...@ulua.hal.com>, conn...@ulua.hal.com (Dan Connolly) writes:

> n := rd.sourceH.read(
> SUBARRAY(LOOPHOLE(ADR(rd.buff[0]), ByteArrayPtr)^, 0,
> NUMBER(rd.buff^)), mayBlock := NOT dontBlock)
>
> This code seems to assume that NUMBER(rd.buff^) = BYTESIZE(rd.buff^),
> which implies, since rd.buff^ is an ARRAY OF CHAR, that
> BYTESIZE(CHAR)=1.

It also assumes that BYTESIZE(CHAR)=BYTESIZE(File.Byte).

A better version might be

    n := rd.sourceH.read(
           SUBARRAY(LOOPHOLE(ADR(rd.buff[0]), ByteArrayPtr)^, 0,
                    BITSIZE(rd.buff^) DIV BITSIZE(File.Byte)),
           mayBlock := NOT dontBlock)

>
> Would someone care to characterize the above code as:
>
> (1) an isolated defect in the libm3 code -- easily fixed.
>
> (2) a pervasive defect in libm3 -- lots of work to fix it.
>
> (3) by design -- libm3 is not designed to work on platforms
> where BYTESIZE(CHAR)>1.
>
> (4) correct -- the language definition guarantees BYTESIZE(CHAR)=1,
> and I just didn't see it.

It's (1). There isn't much unsafe code and there's even less
that must deal with the impedance mismatch between CHAR and File.Byte.

But, my experience is that much of the unsafe code is buggy.
Programmers who grew up with 8-bit characters on byte-addressed machines
(myself included) are quite sloppy with NUMBER, BYTESIZE, and ADRSIZE.
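
To make the NUMBER-versus-byte-count distinction concrete, here is a minimal sketch. It is not taken from libm3: the types Byte and Wide and the module name SizeDemo are hypothetical stand-ins, chosen only to show an element type wider than one byte. It assumes the standard IO and Fmt interfaces are available.

```modula3
MODULE SizeDemo EXPORTS Main;
(* Sketch only: Byte and Wide are hypothetical stand-ins, not libm3 types. *)

IMPORT IO, Fmt;

TYPE
  Byte = BITS 8 FOR [0 .. 255];       (* stand-in for File.Byte *)
  Wide = BITS 16 FOR [0 .. 16_FFFF];  (* stand-in for a 16-bit character *)

VAR
  narrow: ARRAY [0 .. 63] OF CHAR;
  wide  : ARRAY [0 .. 63] OF Wide;

(* Print the element count next to the count measured in Byte units. *)
PROCEDURE Show (name: TEXT; elems, bytes: INTEGER) =
  BEGIN
    IO.Put(name & ": NUMBER = " & Fmt.Int(elems)
           & ", byte count = " & Fmt.Int(bytes) & "\n");
  END Show;

BEGIN
  Show("narrow", NUMBER(narrow), BITSIZE(narrow) DIV BITSIZE(Byte));
  Show("wide", NUMBER(wide), BITSIZE(wide) DIV BITSIZE(Byte));
END SizeDemo.
```

With an 8-bit CHAR the two figures for narrow should agree, while for wide the byte count should come out at twice the element count. That is exactly the case in which passing NUMBER(buff^) as a byte length would ask for only half of the buffer.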

> So... has anybody worked on applications involving multibyte character
> encodings or wide characters? Has anyone developed a WideText, WideWr
> interface or some such? How about UTF-8 -> Unicode translations?

I haven't heard of any Unicode based work in Modula-3.

> Doesn't Windows-NT use a 16-bit character representation in many of
> its data structures? I wonder what type is used for those data structures...

Windows-NT offers both 8-bit and 16-bit versions of most of its interfaces.
The Modula-3 veneer provides access to both versions. The version 3.4
compiler retains the 8-bit CHAR.

> (by the way... doesn't WinNT.i3 border on copyright infringement,
> or divulging trade secrets or somesuch?)

I don't believe so. It's a translation of a public API. It contains
no trade secrets.

- Bill Kalsow
