This will be of interest to only a few people, but it will be good to
have it in the archives for when we need it.
Here is a list of Korean character sets that represent hangul (Korean
symbols) and hanja (Sino-Korean):
- EUC-KR (KSC 5601, renamed to KS X 1001) or Microsoft's superset UHC
- ISO-2022 comes in both -JP and -KR versions.
- johab is a legacy 16-bit encoding: a leading bit set to 1, then 3 * 5
bits for the leading consonant, the vowel, and the optional consonant(s)
at the end
http://trade.chonbuk.ac.kr/~leesl/code/johap.gif
The URL above goes to a useful table for working with johab. I do
know it is a legacy charset, but I don't know how much it is still
used. Technically, ASCII is legacy, too. :)
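
For the archives, here is a minimal sketch of the johab bit layout described above: one marker bit followed by three 5-bit fields. The function names are made up for illustration, and the sketch only slices bits. It does not check that the field values name real jamo, or that the code is a hangul syllable at all; the table at the URL above is what you would need for that. The sample value 0x8861 is what I believe johab assigns to the first hangul syllable (U+AC00), so treat that pairing as an assumption.

```python
def split_johab(code: int) -> tuple[int, int, int]:
    """Split a 16-bit johab code into (initial, vowel, final) 5-bit fields."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("johab codes are 16 bits wide")
    if not code & 0x8000:
        raise ValueError("leading bit must be 1 for a two-byte johab code")
    initial = (code >> 10) & 0x1F  # bits 14..10: leading consonant
    vowel = (code >> 5) & 0x1F     # bits 9..5: vowel
    final = code & 0x1F            # bits 4..0: optional trailing consonant(s)
    return initial, vowel, final


def pack_johab(initial: int, vowel: int, final: int) -> int:
    """Inverse of split_johab: rebuild the 16-bit code from its fields."""
    for field in (initial, vowel, final):
        if not 0 <= field <= 0x1F:
            raise ValueError("each field must fit in 5 bits")
    return 0x8000 | (initial << 10) | (vowel << 5) | final


if __name__ == "__main__":
    fields = split_johab(0x8861)
    print(fields)
    print(hex(pack_johab(*fields)))
```

The point of the layout is that a syllable's code is composed directly from its parts, so no lookup table is needed to take a syllable apart or put it back together.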
Do we have any local experts on Japanese charsets? If not, I can do
a little bit of research there, too.
Cheers,
~kj
Ah, cool. Looks like that stuff's in the O'Reilly CJKV book (which I
desperately want a second edition of) but that book's a bit slanted
towards Chinese and Japanese.
> The URL above goes to a useful table for working with johab. I do
>know it is a legacy charset, but I don't know how much it is still
>used. Technically, ASCII is legacy, too. :)
Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
lives, nobody's got much standing to call a character set "Legacy" :)
> Do we have any local experts on Japanese charsets? If not, I can
>do a little bit of research there, too.
There, at least, I can get access to folks who've done work, and I
can get by enough myself that I'm not too worried.
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk
> At 6:03 PM -0600 4/21/04, kj wrote:
>
>> The URL above goes to a useful table for working with johab. I do
>> know it is a legacy charset, but I don't know how much it is still
>> used. Technically, ASCII is legacy, too. :)
>
> Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
> lives, nobody's got much standing to call a character set "Legacy" :)
Unicode is an actively evolving standard. It's far from legacy.
JEff
That evolution is what does it--every deployed version of Unicode is
legacy, as there's always something to supplant it. Which arguably
makes things worse in some cases--I'm waiting for us to run into
problems when we start handing Unicode 4.0-compatible text off to
system services expecting 3.x or 2.x code. Made worse in some ways
because almost nobody'll notice, since most everyone we have doing
stuff can get by with what the 2.0 standard provides.
I suggest that Parrot's native character set be cuneiform.
> At 8:51 AM -0700 4/22/04, Jeff Clites wrote:
>> On Apr 22, 2004, at 8:31 AM, Dan Sugalski wrote:
>>
>>> At 6:03 PM -0600 4/21/04, kj wrote:
>>>
>>>> The URL above goes to a useful table for working with johab. I
>>>> do know it is a legacy charset, but I don't know how much it is
>>>> still used. Technically, ASCII is legacy, too. :)
>>>
>>> Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
>>> lives, nobody's got much standing to call a character set "Legacy"
>>> :)
>>
>> Unicode is an actively evolving standard. It's far from legacy.
>
> That evolution is what does it--every deployed version of Unicode is
> legacy, as there's always something to supplant it. Which arguably
> makes things worse in some cases--I'm waiting for us to run into
> problems when we start handing Unicode 4.0-compatible text off to
> system services expecting 3.x or 2.x code. Made worse in some ways
> because almost nobody'll notice, since most everyone we have doing
> stuff can get by with what the 2.0 standard provides.
Take a look at the following two pages for information on how the
Unicode standard deals with change. It's exceedingly conservative, and
designed specifically so that the sorts of problems you seem to be
worrying about do not, in fact, exist. The point of revisions is mainly
to add new characters, and of course a system based on an older
revision of the standard will not know about these characters, but
since day 1 systems have needed to deal gracefully with unassigned code
points. It's a non-problem.
http://www.unicode.org/faq/cope_change.html
http://www.unicode.org/standard/stability_policy.html
Unicode has been carefully designed with this sort of stability under
change (or backwards compatibility, if you will) in mind.
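
To make the "deal gracefully with unassigned code points" part concrete, here is a small sketch using Python's standard unicodedata module. It is only an illustration, not anything Parrot does. The helper name is made up, and U+0378 is used as a sample on the assumption that it is unassigned in the Unicode version bundled with your Python. Note that general category Cn also covers noncharacters, not just code points reserved for future assignment.

```python
import unicodedata


def is_unassigned(ch: str) -> bool:
    """True if this Python's Unicode database has no character at ch.

    Category "Cn" means "not assigned" (it also includes noncharacters).
    """
    return unicodedata.category(ch) == "Cn"


def round_trip(text: str) -> str:
    """Encode to UTF-8 and decode again, as an older system service might."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    sample = "a\u0378b"  # U+0378: assumed unassigned in the bundled database
    print([is_unassigned(ch) for ch in sample])
    # The unknown code point is carried through unchanged rather than
    # being rejected or altered.
    print(round_trip(sample) == sample)
```

A system built against an older revision is in the same position with respect to newer characters: it cannot say anything about their properties, but it can still store and pass them along intact.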
JEff
... but only for constants.
-- c
> At 6:03 PM -0600 4/21/04, kj wrote:
>
>> Hello folks,
>>
>> This will be of interest to only a few people, but it will be good
>> to have it in the archives for when we need it.
>>
>> Here is a list of Korean character sets that represent hangul
>> (Korean symbols) and hanja (Sino-Korean):
>>
>> - EUC-KR (KSC 5601, renamed to KS X 1001) or Microsoft's superset UHC
>> - ISO-2022 comes in both -JP and -KR versions.
>> - johab is a legacy 16-bit encoding, leading bit = 1 + 3 * 5 bits for
>> leading consonant, vowel, optional consonant(s) at the end
>> http://trade.chonbuk.ac.kr/~leesl/code/johap.gif
>
>
> Ah, cool. Looks like that stuff's in the O'Reilly CJKV book (which I
> desperately want a second edition of) but that book's a bit slanted
> towards Chinese and Japanese.
>
>> The URL above goes to a useful table for working with johab. I do
>> know it is a legacy charset, but I don't know how much it is still
>> used. Technically, ASCII is legacy, too. :)
>
>
> Ah, at this point Unicode's legacy too. Besides, as long as RAD-50
> lives, nobody's got much standing to call a character set "Legacy" :)
>
>> Do we have any local experts on Japanese charsets? If not, I can
>> do a little bit of research there, too.
>
>
> There, at least, I can get access to folks who've done work, and I can
> get by enough myself that I'm not too worried.
I don't agree with the Unicode legacy comment... :-(
But if you want to see another source of mapping tables, you can try
this one: http://oss.software.ibm.com/icu/charset/index.html
I'm sure Dan and others are aware of ICU's charset repository. It
contains mapping tables that I have been able to collect from various
platforms. Others may find it useful too.
Unicode can also represent the hangul and hanja characters.
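
As a rough illustration of moving between the legacy Korean charsets listed earlier in the thread and Unicode, here is a sketch using the codecs that ship with Python (not ICU's tables). The codec names euc_kr, cp949 (UHC), iso2022_kr and johab are Python's own. The helper name is made up, and the sample text is hangul only, so hanja are not exercised here.

```python
# Python's bundled names for the charsets mentioned in this thread.
KOREAN_CODECS = ("euc_kr", "cp949", "iso2022_kr", "johab")


def to_unicode(raw: bytes, charset: str) -> str:
    """Decode bytes in a legacy Korean charset into a Unicode string."""
    return raw.decode(charset)


if __name__ == "__main__":
    text = "\ud55c\uae00"  # the word "hangul" written in hangul
    for name in KOREAN_CODECS:
        raw = text.encode(name)
        # Each charset yields different bytes, but all decode to the same text.
        print(name, raw.hex(), to_unicode(raw, name) == text)
```

In a setup like this Unicode acts as the pivot: converting between any two of the legacy charsets is a decode followed by an encode.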
George
Yeah, I was going to propose the Phaistos disc signs for the variable
variables.
The Phaistos Disk's signs were codified by Evans in 1910, and
everybody except ignorant kooks uses his numbering.
(BTW, the Phaistos Disk has also been *definitely* deciphered by J.
Faucounau)
grapheus
On Thu, 2004-04-22 at 15:07, George R wrote:
> I don't agree with the Unicode legacy comment... :-(
Creating tomorrow's legacy today. :-)
--
Bryan C. Warnock
bwarnock@(gtemail.net|raba.com)