Korean character set info

Kj

Apr 21, 2004, 8:03:03 PM

to perl6-i...@perl.org

Hello folks,

This will be of interest to only a few people, but it will be good to
have it in the archives for when we need it.

Here is a list of Korean character sets that represent hangul (Korean
symbols) and hanja (Sino-Korean):

- EUC-KR (KS C 5601, renamed to KS X 1001) or Microsoft's superset UHC
- ISO-2022 comes in both -JP and -KR versions.
- johab is a legacy 16-bit encoding: a leading bit set to 1, followed by
three 5-bit fields for the leading consonant, the vowel, and the
optional consonant(s) at the end
http://trade.chonbuk.ac.kr/~leesl/code/johap.gif
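
As a rough illustration of the list above, here is a minimal Python sketch
that round-trips the same hangul text through each of these encodings. It
assumes the codec names shipped with CPython (euc_kr, cp949 for UHC,
iso2022_kr, johab); the sample string and the helper name byte_lengths are
made up for the example.

```python
SAMPLE = "한글"  # "hangul" written in hangul; illustrative sample only

# CPython codec names, assumed here to correspond to the list above.
CODECS = ["euc_kr", "cp949", "iso2022_kr", "johab"]


def byte_lengths(text, codecs=CODECS):
    """Encode text with each codec, check the round trip, return sizes."""
    sizes = {}
    for name in codecs:
        raw = text.encode(name)
        if raw.decode(name) != text:
            raise ValueError(f"{name} did not round-trip")
        sizes[name] = len(raw)
    return sizes


sizes = byte_lengths(SAMPLE)
for name, size in sizes.items():
    print(f"{name:12} {size} bytes")
```

ISO-2022-KR should come out longer than the others for the same text,
since it adds a designator escape sequence and shift characters around the
two-byte codes.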

The URL above goes to a useful table for working with johab. I do
know it is a legacy charset, but I don't know how much it is still
used. Technically, ASCII is legacy, too. :)
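
For the archives, the johab bit layout described above can be sketched in
a few lines of Python. This is only a sketch: it relies on CPython's johab
codec, the function name split_johab is hypothetical, and the field split
is only meaningful for hangul syllables (hanja and symbols in johab do not
follow this layout). The numeric values of the fields still have to be
looked up in a table such as the one linked above.

```python
def split_johab(syllable):
    """Split the johab code of one hangul syllable into its bit fields."""
    raw = syllable.encode("johab")
    if len(raw) != 2:
        raise ValueError("expected a two-byte johab code")
    code = int.from_bytes(raw, "big")
    return {
        "code": code,
        "high_bit": code >> 15,          # leading bit, 1 for hangul
        "initial": (code >> 10) & 0x1F,  # leading consonant field
        "medial": (code >> 5) & 0x1F,    # vowel field
        "final": code & 0x1F,            # optional trailing consonant(s)
    }


fields = split_johab("한")  # U+D55C
print(hex(fields["code"]), fields)
```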

Do we have any local experts on Japanese charsets? If not, I can do
a little bit of research there, too.

Cheers,

~kj

Dan Sugalski

Apr 22, 2004, 11:31:21 AM

to kj, perl6-i...@perl.org

At 6:03 PM -0600 4/21/04, kj wrote:
>Hello folks,
>
> This will be of interest to only a few people, but it will be good
>to have it in the archives for when we need it.
>
> Here is a list of Korean character sets that represent hangul
>(Korean symbols) and hanja (Sino-Korean):
>
>- EUC-KR (KSC 5601, renamed to KS X 1001) or Microsoft's superset UHC
>- ISO-2022 comes in both -JP and -KR versions.
>- johab is a legacy 16-bit encoding, leading bit = 1 + 3 * 5 bits
>for leading consonant, vowel, optional consonant(s) at the end
> http://trade.chonbuk.ac.kr/~leesl/code/johap.gif

Ah, cool. Looks like that stuff's in the O'Reilly CJKV book (which I
desperately want a second edition of) but that book's a bit slanted
towards Chinese and Japanese.

> The URL above goes to a useful table for working with johab. I do
>know it is a legacy charset, but I don't know how much it is still
>used. Technically, ASCII is legacy, too. :)

Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
lives, nobody's got much standing to call a character set "Legacy" :)

> Do we have any local experts on Japanese charsets? If not, I can
>do a little bit of research there, too.

There, at least, I can get access to folks who've done work, and I
can get by enough myself that I'm not too worried.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

Jeff Clites

Apr 22, 2004, 11:51:59 AM

to Dan Sugalski, kj, perl6-i...@perl.org

On Apr 22, 2004, at 8:31 AM, Dan Sugalski wrote:

> At 6:03 PM -0600 4/21/04, kj wrote:
>
>> The URL above goes to a useful table for working with johab. I do
>> know it is a legacy charset, but I don't know how much it is still
>> used. Technically, ASCII is legacy, too. :)
>
> Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
> lives, nobody's got much standing to call a character set "Legacy" :)

Unicode is an actively evolving standard. It's far from legacy.

JEff

Dan Sugalski

Apr 22, 2004, 12:01:52 PM

to Jeff Clites, kj, perl6-i...@perl.org

That evolution is what does it--every deployed version of Unicode is
legacy, as there's always something to supplant it. Which arguably
makes things worse in some cases--I'm waiting for us to run into
problems when we start handing Unicode 4.0-compatible text off to
system services expecting 3.x or 2.x code. Made worse in some ways
because almost nobody'll notice, since most everyone we have doing
stuff can get by with what the 2.0 standard provides.

Jarkko Hietaniemi

Apr 22, 2004, 12:17:18 PM

to perl6-i...@perl.org, Dan Sugalski, kj, perl6-i...@perl.org

> Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
> lives, nobody's got much standing to call a character set "Legacy" :)

I suggest Parrot's native character set to be cuneiform.

Jeff Clites

Apr 22, 2004, 12:18:49 PM

to Dan Sugalski, kj, perl6-i...@perl.org

On Apr 22, 2004, at 9:01 AM, Dan Sugalski wrote:

> At 8:51 AM -0700 4/22/04, Jeff Clites wrote:
>> On Apr 22, 2004, at 8:31 AM, Dan Sugalski wrote:
>>
>>> At 6:03 PM -0600 4/21/04, kj wrote:
>>>
>>>> The URL above goes to a useful table for working with johab. I
>>>> do know it is a legacy charset, but I don't know how much it is
>>>> still used. Technically, ASCII is legacy, too. :)
>>>
>>> Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
>>> lives, nobody's got much standing to call a character set "Legacy"
>>> :)
>>
>> Unicode is an actively evolving standard. It's far from legacy.
>
> That evolution is what does it--every deployed version of Unicode is
> legacy, as there's always something to supplant it. Which arguably
> makes things worse in some cases--I'm waiting for us to run into
> problems when we start handing Unicode 4.0-compatible text off to
> system services expecting 3.x or 2.x code. Made worse in some ways
> because almost nobody'll notice, since most everyone we have doing
> stuff can get by with what the 2.0 standard provides.

Take a look at the following two pages for information on how the
Unicode standard deals with change. It's exceedingly conservative, and
designed specifically so that the sorts of problems you seem to be
worrying about do not, in fact, exist. The point of revisions is mainly
to add new characters, and of course a system based on an older
revision of the standard will not know about these characters, but
since day 1 systems have needed to deal gracefully with unassigned code
points. It's a non-problem.

http://www.unicode.org/faq/cope_change.html
http://www.unicode.org/standard/stability_policy.html

Unicode has been carefully designed with this sort of stability to
change (or, backwards-compatibility, if you will) in mind.
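
To make the point about unassigned code points concrete, here is a small
Python sketch of "dealing gracefully": a code point the local Unicode
tables do not know about is labeled and passed along, not rejected. The
function name label is hypothetical, and which code points count as
unassigned depends on the Unicode version bundled with the Python in use
(U+0378 is assumed to be unassigned there).

```python
import unicodedata


def label(ch):
    """Return a printable label for one code point, known or not."""
    if unicodedata.category(ch) == "Cn":
        # Unassigned in this Python's Unicode tables: keep it, do not reject it.
        return f"<unassigned U+{ord(ch):04X}>"
    return unicodedata.name(ch, f"<unnamed U+{ord(ch):04X}>")


print("Unicode tables:", unicodedata.unidata_version)
# U+0378 is assumed to be unassigned in the bundled tables.
print([label(ch) for ch in "한" + chr(0x0378)])
```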

JEff

Chromatic

Apr 22, 2004, 1:00:45 PM

to Jarkko Hietaniemi, perl6-i...@perl.org

... but only for constants.

-- c

George R

Apr 22, 2004, 3:07:12 PM

to Dan Sugalski, kj, perl6-i...@perl.org

Dan Sugalski wrote:

> At 6:03 PM -0600 4/21/04, kj wrote:
>
>> Hello folks,
>>
>> This will be of interest to only a few people, but it will be good
>> to have it in the archives for when we need it.
>>
>> Here is a list of Korean character sets that represent hangul
>> (Korean symbols) and hanja (Sino-Korean):
>>
>> - EUC-KR (KSC 5601, renamed to KS X 1001) or Microsoft's superset UHC
>> - ISO-2022 comes in both -JP and -KR versions.
>> - johab is a legacy 16-bit encoding, leading bit = 1 + 3 * 5 bits for
>> leading consonant, vowel, optional consonant(s) at the end
>> http://trade.chonbuk.ac.kr/~leesl/code/johap.gif
>
>
> Ah, cool. Looks like that stuff's in the O'Reilly CJKV book (which I
> desperately want a second edition of) but that book's a bit slanted
> towards Chinese and Japanese.
>
>> The URL above goes to a useful table for working with johab. I do
>> know it is a legacy charset, but I don't know how much it is still
>> used. Technically, ASCII is legacy, too. :)
>
>
> Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
> lives, nobody's got much standing to call a character set "Legacy" :)
>
>> Do we have any local experts on Japanese charsets? If not, I can
>> do a little bit of research there, too.
>
>
> There, at least, I can get access to folks who've done work, and I can
> get by enough myself that I'm not too worried.

I don't agree with the Unicode legacy comment... :-(

But if you want to see another source of mapping tables, you can try
this one: http://oss.software.ibm.com/icu/charset/index.html

I'm sure Dan and others are aware of ICU's charset repository. It
contains mapping tables that I have been able to collect from various
platforms. Others may find it useful too.

Unicode can also represent the hangul and hanja characters.
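
A quick way to see this, sketched in Python with the standard unicodedata
module (the helper name describe and the two sample characters are just
for illustration): a hangul syllable has its own code point and decomposes
under NFD into conjoining jamo, while a hanja character is a CJK unified
ideograph.

```python
import unicodedata


def describe(ch):
    """Report code point, name, and NFD decomposition of one character."""
    decomposed = unicodedata.normalize("NFD", ch)
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<no name>"),
        "nfd": [f"U+{ord(c):04X}" for c in decomposed],
    }


hangul = describe("한")  # hangul syllable
hanja = describe("韓")   # hanja for the same syllable
print(hangul)
print(hanja)
```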

George

Jarkko Hietaniemi

Apr 22, 2004, 4:21:33 PM

to Chromatic, perl6-i...@perl.org

>>>Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
>>>lives, nobody's got much standing to call a character set "Legacy" :)
>>
>>I suggest Parrot's native character set to be cuneiform.
>
>
> ... but only for constants.

Yeah, I was going to propose the Phaistos disc signs for the variable
variables.

grapheus

Apr 23, 2004, 3:09:47 AM

to

j...@iki.fi (Jarkko Hietaniemi) wrote in message news:<4088294D...@iki.fi>...

The Phaistos Disk's signs were codified by Evans in 1910, and
everybody except ignorant kooks uses them.
(BTW, the Phaistos Disk has also been *definitely* deciphered by J.
Faucounau)

grapheus

Bryan C. Warnock

Apr 24, 2004, 1:40:40 PM

to perl6-i...@perl.org

On Thu, 2004-04-22 at 12:18, Jeff Clites wrote:
> Unicode is an actively evolving standard. It's far from legacy.

On Thu, 2004-04-22 at 15:07, George R wrote:
> I don't agree with the Unicode legacy comment... :-(

Creating tomorrow's legacy today. :-)

--
Bryan C. Warnock
bwarnock@(gtemail.net|raba.com)