
utf-32?


Satoshi Imai

Jul 31, 2003, 12:03:59 PM
Hello.

Do you have any plans to support the UTF-32 encoding?

ID3 tags may use various Unicode encodings at some point in the future.

Does Tcl 8.4.x currently support the UTF-8 and UTF-16 encodings?

-----
Satoshi Imai

Aric Bills

Jul 31, 2003, 12:54:23 PM
8.4 has a utf-8 and a "unicode" encoding--I think the latter is equivalent
to utf-16. Can someone verify that?

[encoding names] will give you a list of the currently supported encodings.
[fconfigure -encoding] is the command you'd use to make sure you were
reading/writing a file in the desired format.
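A minimal Tcl sketch of both commands, assuming a hypothetical scratch file `sample.txt`. The helper names `hasEncoding`, `readFileAs` and `writeFileAs` are illustrative, not part of Tcl; only `encoding names`, `open`, `fconfigure -encoding`, `read`, `puts` and `close` are real commands.

```tcl
# Return 1 if the running interpreter knows the named encoding, else 0.
proc hasEncoding {name} {
    expr {[lsearch -exact [encoding names] $name] >= 0}
}

# Read a whole file, decoding its bytes with the given encoding.
proc readFileAs {path enc} {
    set chan [open $path r]
    fconfigure $chan -encoding $enc
    set data [read $chan]
    close $chan
    return $data
}

# Write a string to a file, encoding it with the given encoding.
proc writeFileAs {path enc text} {
    set chan [open $path w]
    fconfigure $chan -encoding $enc
    puts -nonewline $chan $text
    close $chan
}

# Usage: round-trip a non-ASCII string through a UTF-8 file.
writeFileAs sample.txt utf-8 "caf\u00e9"
set text [readFileAs sample.txt utf-8]
file delete sample.txt
```

The same two procs work for any name that `encoding names` reports, for example `unicode` or `iso8859-1`.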

-Aric


"Satoshi Imai" <s-i...@japan.interq.or.jp> wrote in message
news:71dec71a.03073...@posting.google.com...

Michael Schlenker

Jul 31, 2003, 4:47:58 PM
Satoshi Imai wrote:

If I understand it correctly, Tcl 8.4 supports UTF-8 with up to 4 bytes.
The encoding named "Unicode" is UCS-2, so not full UTF-16, AFAIK.
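To illustrate why a fourth UTF-8 byte only matters beyond U+FFFF, here is a small Tcl sketch (the `utf8Length` name is illustrative) that returns how many bytes UTF-8 uses for one code point. Only code points outside the Basic Multilingual Plane need four bytes.

```tcl
# Number of bytes UTF-8 uses to encode one Unicode code point.
proc utf8Length {cp} {
    if {$cp < 0 || $cp > 0x10FFFF} {
        error "not a Unicode code point: $cp"
    }
    if {$cp >= 0xD800 && $cp <= 0xDFFF} {
        error "surrogate code points are not encoded in UTF-8"
    }
    if {$cp <= 0x7F}   { return 1 }   ;# ASCII
    if {$cp <= 0x7FF}  { return 2 }   ;# e.g. Latin-1 accents, Greek, Cyrillic
    if {$cp <= 0xFFFF} { return 3 }   ;# rest of the BMP, incl. kana and common kanji
    return 4                          ;# supplementary planes (above U+FFFF)
}

foreach cp {0x41 0xE9 0x3042 0x1F600} {
    puts [format "U+%04X -> %d byte(s)" $cp [utf8Length $cp]]
}
```

The loop reports 1, 2, 3 and 4 bytes for U+0041, U+00E9, U+3042 and U+1F600 respectively.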

I don't know of plans to support UTF-32.

Read:
http://sourceforge.net/tracker/index.php?func=detail&aid=578030&group_id=10894&atid=110894

For some clarification of terms see http://www.unicode.org/glossary/
http://czyborra.com/utf/

Michael

Joe English

Jul 31, 2003, 8:01:19 PM
Aric Bills wrote:
>8.4 has a utf-8 and a "unicode" encoding--I think the latter is equivalent
>to utf-16. Can someone verify that?

Tcl's (badly-named) "unicode" encoding is actually UCS-2, not UTF-16.

(The difference is that UTF-16 can represent characters up to
0x10FFFF by using "surrogate pairs", while UCS-2 only supports
the "basic multilingual plane", up to 0xFFFF).
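The surrogate-pair step that separates UTF-16 from UCS-2 is simple arithmetic. This Tcl sketch (helper names are illustrative) splits a code point above U+FFFF into the two 16-bit units UTF-16 would use, and joins them back; a UCS-2 encoder has no equivalent step and cannot represent such a character at all.

```tcl
# Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
# high/low surrogate pair, returned as a two-element list of integers.
proc toSurrogatePair {cp} {
    if {$cp < 0x10000 || $cp > 0x10FFFF} {
        error "only code points U+10000..U+10FFFF use surrogate pairs"
    }
    set v [expr {$cp - 0x10000}]           ;# 20-bit value
    set high [expr {0xD800 + ($v >> 10)}]  ;# top 10 bits
    set low  [expr {0xDC00 + ($v & 0x3FF)}] ;# bottom 10 bits
    return [list $high $low]
}

# Combine a high/low surrogate pair back into a code point.
proc fromSurrogatePair {high low} {
    if {$high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF} {
        error "not a valid surrogate pair"
    }
    expr {0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00)}
}

lassign [toSurrogatePair 0x1F600] hi lo
puts [format "U+1F600 -> %04X %04X" $hi $lo]   ;# U+1F600 -> D83D DE00
```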


--Joe English

jeng...@flightlab.com

Satoshi Imai

Jul 31, 2003, 11:58:33 PM
Hello.

Michael Schlenker <sch...@uni-oldenburg.de> wrote...


> If I understand it correctly, Tcl 8.4 supports UTF-8 with up to 4 bytes.
> The encoding named "Unicode" is UCS-2, so not full UTF-16 AFAIK

If I understand it correctly, there are several variants of UTF-16:

UCS-2 ("Unicode")?
UTF-16 (with BOM)
UTF-16LE (without BOM)
UTF-16BE (without BOM)
etc.?
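The variants in this list differ only in byte order and in whether a byte order mark (BOM) is present. A minimal Tcl sketch of BOM sniffing (the `detectBom` name is illustrative); it also recognizes the UTF-32 marks and checks them first, because the UTF-32LE mark begins with the UTF-16LE one.

```tcl
# Guess a Unicode encoding scheme from a leading byte order mark.
# $bytes is a binary string, e.g. read from a channel configured with
# -translation binary. Returns "none" when no BOM is recognized.
# Caveat: UTF-16LE text whose first character is U+0000 is
# indistinguishable from a UTF-32LE BOM.
proc detectBom {bytes} {
    set hex ""
    binary scan [string range $bytes 0 3] H* hex
    # Longer (UTF-32) marks must be tested before UTF-16 ones.
    switch -glob -- $hex {
        0000feff { return utf-32be }
        fffe0000 { return utf-32le }
        efbbbf*  { return utf-8 }
        feff*    { return utf-16be }
        fffe*    { return utf-16le }
        default  { return none }
    }
}

puts [detectBom "\xff\xfe\x41\x00"]   ;# utf-16le
```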

Which is the encoding named "Unicode"?

How about UTF-32 and UCS-4?

-----
Satoshi Imai

Michael Schlenker

Aug 1, 2003, 3:10:42 AM
Satoshi Imai wrote:
> Hello.
>
> Michael Schlenker <sch...@uni-oldenburg.de> wrote...
>
>>If I understand it correctly, Tcl 8.4 supports UTF-8 with up to 4 bytes.
>>The encoding named "Unicode" is UCS-2, so not full UTF-16 AFAIK
>
>
> If I understand it correctly, there are several variants of UTF-16:
>
> UCS-2 ("Unicode")?
> UTF-16 (with BOM)
> UTF-16LE (without BOM)
> UTF-16BE (without BOM)
> etc.?
>
> Which is the encoding named "Unicode"?
UCS-2, as others have said.

>
> How about UTF-32 and UCS-4?
Read the discussion at the SourceForge link I sent in my last answer.

Michael

Aric Bills

Aug 1, 2003, 1:37:51 PM
Joe and Michael,

Thanks for the clarification.

Aric

"Joe English" <jeng...@flightlab.com> wrote in message
news:bgcak...@enews4.newsguy.com...

Satoshi Imai

Aug 2, 2003, 6:23:24 AM
Hello.

Michael Schlenker <sch...@uni-oldenburg.de> wrote...


> > How about UTF-32 and UCS-4?
> Read the discussion at the SourceForge link I sent in my last answer.

I'm sorry about that.
I understand.
Thank you.

-----
Satoshi Imai

Michael Schlenker

Aug 2, 2003, 7:27:24 AM
Satoshi Imai wrote:
> Hello.
>
> Michael Schlenker <sch...@uni-oldenburg.de> wrote...
>
>>>How about UTF-32 and UCS-4?
>>
>>Read the discussion at the SourceForge link I sent in my last answer.
>
>
> I'm sorry about that.
I didn't want to sound rude. I was just too lazy to look up the link to
the quite long discussion about UCS-4 in the SourceForge bug tracker again.

Regards,
Michael Schlenker

BTW, if you want or need UTF-16/UTF-32 support in Tcl, I think the
maintainers would welcome constructive critique (read: patches)
or some other form of feedback. You could log this as a Feature Request
on SourceForge, for example, perhaps with hints on how to implement it. Or, as
always, you could do it yourself and contribute. If you know C
programming, I urge you to take a look at the Tcl core code and see how
well written and easy to read it is.

Joe English

Aug 2, 2003, 3:06:00 PM
Michael Schlenker wrote:

>BTW. If you want/need UTF-16/UTF-32 support for Tcl, I think the
>maintainers would be happy about constructive critique (read: Patches),
>or some other form of feedback. You could log this as a Feature Request
>on Sourceforge for example, perhaps with hints on how to do it. Or as
>always you could do it yourself and contribute.

Tcl can already be compiled with support for UCS-4.
IIRC there are still a few problems with Tk; these are
being fixed as time goes by.

But Tcl can't really switch to full internal 32-bit Unicode
until 9.0, due to stubs compatibility requirements.


--Joe English

jeng...@flightlab.com

Satoshi Imai

Aug 2, 2003, 8:52:56 PM
Hello.

Michael Schlenker <sch...@uni-oldenburg.de> wrote...


> BTW. If you want/need UTF-16/UTF-32 support for Tcl, I think the
> maintainers would be happy about constructive critique (read: Patches),

At the very least, I want UTF-16 support in Tcl,
because ID3 tags already use UTF-16, UTF-16BE and UTF-16LE.

[ID3 tag version 2.4.0 - Main Structure]
http://www.id3.org/id3v2.4.0-structure.txt
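A hedged Tcl sketch of how an ID3v2.4 text frame could be decoded on a Tcl that has no UTF-16 encodings. Per the ID3v2.4 structure document, the first byte of a text frame selects ISO-8859-1 ($00), UTF-16 with BOM ($01), UTF-16BE without BOM ($02) or UTF-8 ($03). The proc names `decodeUcs2` and `decodeId3Text` are hypothetical. The 16-bit path only handles BMP characters and rejects surrogates, which is exactly the UCS-2 limitation discussed in this thread; string terminators are not stripped.

```tcl
# Decode 16-bit code units in byte order "be" or "le" (BMP only).
proc decodeUcs2 {bytes byteOrder} {
    set fmt [expr {$byteOrder eq "be" ? "Su*" : "su*"}]
    set units {}
    binary scan $bytes $fmt units
    set out ""
    foreach u $units {
        # A UCS-2 decoder cannot combine surrogate pairs, so refuse them.
        if {$u >= 0xD800 && $u <= 0xDFFF} {
            error [format "surrogate U+%04X needs full UTF-16 support" $u]
        }
        append out [format %c $u]
    }
    return $out
}

# Decode the body of an ID3v2.4 text frame (encoding byte + text).
proc decodeId3Text {frameBody} {
    if {[binary scan $frameBody cu encByte] != 1} {
        error "empty frame body"
    }
    set payload [string range $frameBody 1 end]
    switch -- $encByte {
        0 { return [encoding convertfrom iso8859-1 $payload] }
        1 {
            # UTF-16 with BOM: FE FF means big-endian, FF FE little-endian
            set bom ""
            binary scan $payload H4 bom
            set text [string range $payload 2 end]
            switch -- $bom {
                feff    { return [decodeUcs2 $text be] }
                fffe    { return [decodeUcs2 $text le] }
                default { error "UTF-16 text without a BOM" }
            }
        }
        2 { return [decodeUcs2 $payload be] }
        3 { return [encoding convertfrom utf-8 $payload] }
        default { error "unknown ID3v2.4 text encoding byte $encByte" }
    }
}
```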

-----
Satoshi Imai

Jeff Hobbs

Aug 6, 2003, 1:02:17 AM
Satoshi Imai wrote:

>>If I understand it correctly, Tcl 8.4 supports UTF-8 with up to 4 bytes.
>>The encoding named "Unicode" is UCS-2, so not full UTF-16 AFAIK
>
>
> If I understand it correctly, there are several variants of UTF-16:
>
> UCS-2 ("Unicode")?
> UTF-16 (with BOM)
> UTF-16LE (without BOM)
> UTF-16BE (without BOM)
> etc.?
>
> Which is the encoding named "Unicode"?

UCS-2. ucs2le and ucs2be are actually added by default on Tk/unix for
X server compatibility reasons. Perhaps these should be added as general
default encodings.

> How about UTF-32 and UCS-4?

UCS-4 can now be enabled with 8.4.4/8.5, but it is not recommended, as
you will be creating a core that is incompatible with other
distributions. I am not aware of any Japanese characters that require
UCS-4 mode though. What is the need for it?

--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions
