Thx,
Conor
--
http://groups.google.com/group/Hardhats
-- GT.M - Rock solid. Lightning fast. Secure. No compromises.
[KSB2] No, there is no such built-in function. I'd suggest opening a
pipe to another GT.M process, operating in M mode, to which you pass the
string and have it zwrite the string back.
However, GT.M will tolerate printable non-ASCII characters in string
literals and in comments in M code:
GTM>set x="ϨϩϪϫϬϭϮϯϰϱϲϳϴϵ϶Ϸ"
GTM>zwr x
x="ϨϩϪϫϬϭϮϯϰϱϲϳϴϵ϶Ϸ"
GTM>write "Hello" ; ;zero;eins;deux;tres;quattro;пять;ستة;सात;捌;ஒன்பது
Hello
GTM>
Regards
-- Bhaskar
On 12/03/2010 01:05 PM, fmql wrote:
> Thanks all. Just to clarify, fill in some blanks ...
>
> 1) is there a portable way for a program to know the character set of
> its VM? I see zchset ($ZCHSET="M" or $ZCHSET="UTF-8") is in GT.M but
> this "Z" variable doesn't seem to be in Cache?
[KSB3] I can't speak for other MUMPS implementations, but one sneaky
possibility might be $l($c(256)), which is 0 in M mode and 1 in UTF-8
mode. In M mode:
GTM>write $zchset
M
GTM>w $length($c(255))
1
GTM>w $length($c(256))
0
GTM>
and in UTF-8 mode:
GTM>write $zchset
UTF-8
GTM>w $length($c(255))
1
GTM>w $length($c(256))
1
GTM>
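The probe above can be modelled outside GT.M. The following is a minimal Python sketch, not GT.M code: char_in_mode and detect_chset are hypothetical names, and the M-mode branch simply encodes what the first transcript shows (a code above 255 yields an empty string).

```python
def char_in_mode(code: int, chset: str) -> str:
    """Toy model of $CHAR() for one code, based only on the transcripts above."""
    if chset == "M":
        # M mode: a character is a single byte, so codes above 255 give "".
        return chr(code) if 0 <= code <= 255 else ""
    # UTF-8 mode: a valid Unicode code point gives one character.
    return chr(code)


def detect_chset(char_fn) -> str:
    """The probe itself: $l($c(256)) is 0 in M mode and 1 in UTF-8 mode."""
    return "UTF-8" if len(char_fn(256)) == 1 else "M"


print(detect_chset(lambda code: char_in_mode(code, "M")))
print(detect_chset(lambda code: char_in_mode(code, "UTF-8")))
```

Whether the same probe behaves this way on other MUMPS implementations is not established in this thread.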
> 2) what is the difference between the character sets "ASCII" and "M".
> Is "M" Latin-1? It certainly seems to decode as latin-1 (though Sam
> insists it's not!)
[KSB] ASCII and the official MUMPS character sets only ascribe meanings
to $c(0) through $c(127). In M mode $c(128) through $c(255) are simply
byte values whose meaning is whatever the application chooses to use for
them. Encodings in the ISO-8859 family are commonly used by applications
running on GT.M, but each member of that family assigns different
characters to $c(128) through $c(255), depending on the language it
targets. Latin-1 (ISO-8859-1) is just one such choice.
I hope this clarifies rather than confuses!
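To make the point concrete, here is a small Python illustration (Python rather than M, since it ships with the ISO-8859 tables). It decodes the same byte value, the one an M-mode program would see as $c(228), under three members of the ISO-8859 family. The choice of byte and of the three encodings is arbitrary.

```python
# One byte value, three ISO-8859 encodings, three different characters.
RAW = bytes([0xE4])  # 228, i.e. what M mode would hold as $c(228)

ENCODINGS = ("iso-8859-1", "iso-8859-5", "iso-8859-7")

# Map each encoding name to the character it assigns to this byte.
decoded = {enc: RAW.decode(enc) for enc in ENCODINGS}

for enc, ch in decoded.items():
    print(enc, ch, "U+%04X" % ord(ch))
```

The byte decodes as a Latin letter, a Cyrillic letter and a Greek letter respectively, which is why output that "seems to decode as Latin-1" does not show that M mode is Latin-1.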
> 3) Can someone with character set UTF-8 tell me if $ASCII, despite its
> name, works on characters after 255? Is X=$A($C(X)) when X > 255? I get
> -1 for X if X > 255 in my systems but that's because they are "M".
[KSB] Yes, although the name $ASCII() is something of a misnomer, we
retained it for upward compatibility when we added Unicode support.
GTM>w $ascii("ϵ")
1013
GTM>w $ascii($char(999))
999
GTM>
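For comparison with the Python functions mentioned in the question, the same two checks in Python give the same numbers as the UTF-8 mode transcript above. This is only a side-by-side illustration; the variable name lunate is made up here.

```python
lunate = "\u03f5"  # the character passed to $ascii() in the transcript above

print(ord(lunate))        # 1013, matching $ascii() in UTF-8 mode
print(ord(chr(999)))      # 999, the chr/ord round trip

# One character, but two bytes once encoded as UTF-8.
print(len(lunate), len(lunate.encode("utf-8")))
```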
> On a UTF-8 system, does the (misnamed?) $ASCII keep working à la "ord" in
> python? Python has "chr" ala "$c" and "ord", its opposite, which is
> equivalent to "$a" for the ascii characters. What happens when you go
> to utf-8? Does $a still act like "ord"?
>
> 4) then on serializing the utf for 128 on into the form \u#### ...
> If the answer to 3 is yes - $a keeps working on utf-8 systems - then I
> just need I $A(STRVAL(I)) > 128 S SERIALVALUE="\u"_NEXTHEX. (BTW, I'm
> right in thinking there is no formal support for hex numbers in MUMPS,
> right?)
[KSB] I am sorry to have to take away this excuse for you to write code
(I personally find programming to be therapeutic)! 8-)
The utility functions %UTF2HEX and %HEX2UTF are described in the Unicode
support technical bulletin. For all GT.M documentation, go to
http://fis-gtm.com and click on the User Documentation tab. Source code
is included in the distribution if you want to modify them.
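For reference, the \u#### serialization asked about in (4) can be sketched in Python as follows. This is an illustration of the target format only, not of what %UTF2HEX does; escape_non_ascii is a hypothetical name. It assumes JSON-style escapes, so code points above U+FFFF are written as a surrogate pair, and it escapes everything from 128 upward (the test in the question, "> 128", would let 128 itself through).

```python
import json


def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with JSON-style \\uXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                      # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)          # one escape, four hex digits
        else:
            cp -= 0x10000                       # split into a surrogate pair
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


print(escape_non_ascii("a\u03f5"))
# Round trip through a JSON parser as a sanity check.
print(json.loads('"' + escape_non_ascii("\u043f\u044f\u0442\u044c") + '"'))
```

The sketch does not escape quotes, backslashes or control characters, so it is not a complete JSON string encoder.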
Regards
-- Bhaskar
_____________
Thanks all. Just to clarify, fill in some blanks ...
1) is there a portable way for a program to know the character set of
its VM? I see zchset ($ZCHSET="M" or $ZCHSET="UTF-8") is in GT.M but
this "Z" variable doesn't seem to be in Cache?
2) what is the difference between the character sets "ASCII" and "M".
Is "M" Latin-1? It certainly seems to decode as latin-1 (though Sam
insists it's not!)
3) Can someone with character set UTF-8 tell me if $ASCII, despite its
name, works on characters after 255? Is X=$A($C(X)) when X > 255? I get
-1 for X if X > 255 in my systems but that's because they are "M".
On a UTF-8 system, does the (misnamed?) $ASCII keep working à la "ord" in
python? Python has "chr" ala "$c" and "ord", its opposite, which is
equivalent to "$a" for the ascii characters. What happens when you go
to utf-8? Does $a still act like "ord"?
4) then on serializing the utf for 128 on into the form \u#### ...
If the answer to 3 is yes - $a keeps working on utf-8 systems - then I
just need I $A(STRVAL(I)) > 128 S SERIALVALUE="\u"_NEXTHEX. (BTW, I'm
right in thinking there is no formal support for hex numbers in MUMPS,
right?)
Thanks again,
Conor