Message from discussion RfD: XCHAR wordset
Alex McDonald  
Newsgroups: comp.lang.forth
From: Alex McDonald <b...@rivadpm.com>
Date: Mon, 16 Jul 2007 05:39:13 -0700
Local: Mon, Jul 16 2007 8:39 am
Subject: Re: RfD: XCHAR wordset
On Jul 16, 12:28 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

> Alex McDonald <b...@rivadpm.com> writes:
> >Bernd Paysan wrote:

> >[snipped]

> >Unfortunately, on first analysis, this is one proposal that Win32Forth
> >will not be adopting any time soon.

> >Windows is UTF-16, which is not ASCII compliant. Although Windows
> >provides APIs to translate from locale to locale, there is no method in
> >Win32Forth to automatically identify which parameters would be required
> >to be translated from XCHARS to UTF-16 and back; the programmer would be
> >responsible for coding the conversions.

> I don't see that you are any worse off with xchars in this situation
> than with chars.

The au would be 16 bits, with a maximum of 127 characters in a counted
string. This might be considered too short. It would be a pretty big
change as well, as there are a good few uses of COUNT and C@ in a lot
of Win32Forth code.

I didn't see an X-STRING-SIZE (a poor name, I know) in Bernd's
proposal; for conversion between encodings I would have thought it
useful.
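
As a rough sketch of what I mean (in C rather than Forth, and purely
hypothetical: the name utf16_units_needed is mine and is not part of
Bernd's proposal), a size query for a UTF-8 to UTF-16 conversion could
look like this. It assumes the input has already been checked for
well-formedness.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the XCHAR proposal: returns the number
   of 16-bit code units needed to hold a well-formed UTF-8 string of `len`
   bytes once it is converted to UTF-16. */
size_t utf16_units_needed(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t units = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = s[i];
        if ((b & 0xC0) == 0x80)
            continue;                  /* continuation byte, already counted */
        /* A 4-byte sequence (lead byte F0 or above) encodes a code point
           beyond U+FFFF, which needs a surrogate pair in UTF-16. */
        units += (b >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

The caller would multiply the result by two to get a buffer size in
bytes, plus whatever terminator the receiving API expects.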

As a general note, it's worth following the Unicode 5.0 standard on
malformed Unicode: throw an error in all such cases. The XCHARS
standard should be explicit about which Unicode processing standard it
adheres to (or insist that the implementor name the standard).
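
To make "malformed" concrete for UTF-8, here is a minimal sketch in C
of a strict check based on the well-formed byte sequence ranges in the
Unicode standard. The function name utf8_is_well_formed is my own
invention; a Forth system would presumably THROW where this returns 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker: returns 1 if s[0..len) is well-formed UTF-8,
   0 otherwise. Rejects overlong forms, surrogates, code points above
   U+10FFFF, stray continuation bytes and truncated sequences. */
int utf8_is_well_formed(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;                            /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (b <= 0x7F) {                     /* plain ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            n = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            n = 2;
            if (b == 0xE0) lo = 0xA0;        /* excludes overlong forms */
            if (b == 0xED) hi = 0x9F;        /* excludes surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            n = 3;
            if (b == 0xF0) lo = 0x90;        /* excludes overlong forms */
            if (b == 0xF4) hi = 0x8F;        /* excludes > U+10FFFF */
        } else {
            return 0;                        /* 80..C1 and F5..FF */
        }

        if (len - i <= n)
            return 0;                        /* sequence is truncated */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (size_t k = 2; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += n + 1;
    }
    return 1;
}
```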

> >We would need something like the proposal Anton made at EuroForth 2006
> >(http://dec.bournemouth.ac.uk/forth/euro/ef06/ertl06.pdf, A Portable C
> >Function Call Interface), with extensions to identify string pointers,
> >before implementing this.

> For strings my approach in the C interface is that one needs to
> convert explicitly.  Even without Unicode, you already have the
> problem of needing zero-termination in C and explicit length counts in
> Forth.  Hmm, maybe we need some support words for the conversion.

There's also a Java-style null ("modified UTF-8"), encoded as 0xC0
0x80. It has some advantages, as C won't stop on it when using
strlen(), and strings with embedded nulls can be correctly passed to C
(for instance, when using C to write to file).
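
A minimal sketch of the idea in C, assuming a helper of my own naming
(mutf8_escape_nuls is hypothetical, not a Java or Win32Forth function):
each zero byte in the source is written as the pair 0xC0 0x80, so the
only real zero byte in the result is the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen, for callers of the result */

/* Hypothetical helper: copies src[0..len) into dst, writing every zero
   byte as the two-byte pair C0 80, then appends a terminating zero.
   Returns the number of bytes written (terminator excluded), or
   (size_t)-1 if dst (capacity `cap`) is too small. */
size_t mutf8_escape_nuls(const unsigned char *src, size_t len,
                         unsigned char *dst, size_t cap)
{
    assert(src != NULL || len == 0);
    if (dst == NULL || cap == 0)
        return (size_t)-1;                 /* no room for the terminator */

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        size_t need = (src[i] == 0) ? 2 : 1;
        if (cap - out <= need)             /* keep one byte for the terminator */
            return (size_t)-1;
        if (src[i] == 0) {
            dst[out++] = 0xC0;
            dst[out++] = 0x80;
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = 0;
    return out;
}
```

With the three bytes 'a', 0, 'b' as input, strlen() on the result sees
four bytes rather than stopping after the 'a'. The receiving side has
to know about the convention, since strict UTF-8 treats C0 80 as
malformed.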

Win32Forth makes sure strings are null terminated (and the programmer
needs to be aware of this when allocating buffers for string handling;
they need to be one byte longer than required by the string).
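
Seen from the C side, the buffer rule looks roughly like this. This is
an illustrative sketch only, not Win32Forth's actual code, and the name
cstring_from_counted is made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turns a Forth-style (address, length) string into
   a freshly allocated, zero-terminated C string. The caller frees it. */
char *cstring_from_counted(const char *addr, size_t len)
{
    assert(addr != NULL || len == 0);
    char *z = malloc(len + 1);     /* one byte more than the string needs */
    if (z == NULL)
        return NULL;
    if (len > 0)
        memcpy(z, addr, len);
    z[len] = '\0';                 /* the terminator occupies the extra byte */
    return z;
}
```

Forgetting the extra byte is the usual mistake: a buffer of exactly len
bytes leaves the terminator written one past the end.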

