XCCS-unknown_Japanese.txt

Ron Kaplan

Apr 7, 2026, 2:09:31 AM
to Medley Interlisp core
As discussed in the 4/6/2026 technical meeting, that file in Unicode/xerox/ appears to contain the character sets 167-172 that are listed in the 1987 and 1990 XCCS standard documents as Fuji Xerox Private characters.

I looked at a handful of glyphs in the file, trying to see if I could see any correspondence to the glyphs in the document tables. I couldn't find any matches, at least not in any of the obvious places.

The comment that I wrote in the file (2021) says that these came from the XCCStoUni binary file, but that I couldn't find a correspondence then either. But it also says that the character set numbers were assigned by virtue of their sequence in that file, which may not be correct.

It would probably be a mistake to throw these into the MCCS space as nominal charsets 167-172, because those are defined in the documents with different glyphs, even if we don't have those mappings. We could renumber these unknowns and add them as new character sets in an undefined part of the MCCS codespace, but that seems rather pointless.

So I propose to do nothing.





Herb Jellinek

Apr 7, 2026, 3:15:05 PM
to lisp...@googlegroups.com
If by "do nothing" you mean that we shouldn't include XCCS codepoints in charsets 167q-172q, I agree.

That aside, it seems the XCCS2Uni mystery deepens!  If possible, I'd like to mention that file as a source of our mappings in our submission to IANA.  Ron, can you contact the person you collaborated with in creating the mapping tables and ask him about the provenance of that file?  My Google searches have been unavailing.

Separately, I'm interested in and puzzled by the fact that there's an expansive Fuji Xerox Private Use Area (167q-172q) distinct from the much smaller "public" Private Use Area, 376q, and further that the FX Private Use Area contains assigned glyphs.  That's a funny kind of "private."

            Herb

Nick Briggs

Apr 7, 2026, 4:27:28 PM
to Herb Jellinek, Lisp Core
Looking at the character images in the XCCS standard, I found one character (the only one) in FX private-use set 173q that has a Unicode mapping:


#  "173" JIS

0x7B21 0xFA4C  #  社


When I quizzed Claude about why it initially said the character was something (graphically) completely different, it said:

U+FA4C is actually 社 again, but this time as a compatibility character from a different source than U+FA2D. Let me be upfront though: I'm not certain enough of the exact legacy standard that U+FA4C maps to off the top of my head, and I may have been somewhat confabulating specific details in my previous answer too.

What I can say reliably:

  • The CJK Compatibility Ideographs block contains multiple separate encodings of the same underlying character precisely because different legacy standards (JIS X 0208, JIS X 0213, Big5, CNS 11643, etc.) each independently encoded variants or duplicates, and Unicode needed a separate codepoint for each source to preserve round-trip fidelity with each one.
  • So it's entirely plausible that 社 appears at U+793E (canonical), U+FA2D, and U+FA4C because three different legacy standards treated it as a distinct encodable entity.
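
The relationship between the compatibility code point and the ordinary unified ideograph can be checked mechanically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `fold_compat` is hypothetical and only illustrates that Unicode normalization replaces U+FA4C with U+793E, which is why a round trip through a normalizing tool would lose the distinction.

```python
import unicodedata

def fold_compat(ch: str) -> str:
    """Return the NFC form of a single character (hypothetical helper)."""
    return unicodedata.normalize("NFC", ch)

compat = "\uFA4C"    # the code point in the 0x7B21 mapping above
unified = "\u793E"   # the ordinary unified ideograph

# Name and canonical decomposition as recorded in the Unicode database.
print(unicodedata.name(compat))
print(unicodedata.decomposition(compat))

# CJK compatibility ideographs decompose canonically to a unified ideograph,
# so NFC does not preserve the compatibility code point.
print(fold_compat(compat) == unified)
```

If the FX private characters really are glyph variants, mapping them to compatibility ideographs would therefore be fragile wherever text gets normalized.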

Looking at 172q (0x7A) in the Unknown file, there are FX private-use entries for the assigned range, as well as many that are NOT in the standard document (there shouldn't be anything outside the left [top] range, 0x21-0x7E).


0x7A21 0x8D0A #  贊

0x7A22 0x7248 #  版

0x7A23 0x6B98 #  殘

0x7A24 0x6D6A #  浪

0x7A25 0x596A #  奪

0x7A26 0x4EC1 #  仁

0x7A27 0x812B #  脫

0x7A28 0x6B32 #  欲

0x7A29 0x594F #  奏

0x7A2A 0x6ED1 #  滑

0x7A2B 0x7246 #  牆

0x7A2C 0x964D #  降

0x7A2D 0x6E21 #  渡

0x7A2E 0x50C5 #  僅

0x7A2F 0x96F6 #  零

0x7A30 0x6A21 #  模

0x7A31 0x5F69 #  彩

0x7A32 0x5509 #  唉

0x7A33 0x5C64 #  層

0x7A34 0x6E1B #  減

0x7A35 0x5E3D #  帽

0x7A36 0x795D #  祝

0x7A37 0x7FFB #  翻

0x7A38 0x5922 #  夢

0x7A39 0x6101 #  愁

0x7A3A 0x6D25 #  津

0x7A3B 0x79FB #  移

0x7A3C 0x5C3C #  尼

0x7A3D 0x5C0A #  尊

0x7A3E 0x5E7C #  幼

0x7A3F 0x8A69 #  詩

0x7A40 0x4ED4 #  仔

0x7A41 0x5805 #  堅

0x7A42 0x8AA0 #  誠

0x7A43 0x80DE #  胞

0x7A44 0x69AE #  榮

0x7A45 0x6DFA #  淺

0x7A46 0x70CF #  烏

0x7A47 0x5DE7 #  巧

0x7A48 0x5510 #  唐

0x7A49 0x96F7 #  雷

0x7A4A 0x7169 #  煩

0x7A4B 0x7164 #  煤

0x7A4C 0x5BF8 #  寸

0x7A4D 0x6D89 #  涉

0x7A4E 0x8F38 #  輸

0x7A4F 0x8332 #  茲

0x7A50 0x95B1 #  閱

0x7A51 0x8836 #  蠶

0x7A52 0x518A #  冊

0x7A53 0x4E4F #  乏

0x7A54 0x85DD #  藝

0x7A55 0x559A #  喚

0x7A56 0x90F5 #  郵

0x7A57 0x7267 #  牧

0x7A58 0x5DE8 #  巨

0x7A59 0x614C #  慌

0x7A5A 0x6843 #  桃

0x7A5B 0x4F5B #  佛

0x7A5C 0x596E #  奮

0x7A5D 0x74F6 #  瓶

0x7A5E 0x8FFD #  追

0x7A5F 0x7D1B #  紛

0x7A60 0x6C57 #  汗

0x7A61 0x5192 #  冒

0x7A62 0x59CA #  姊

0x7A63 0x96EA #  雪

0x7A64 0x585E #  塞

0x7A65 0x57DF #  域

0x7A66 0x6BEB #  毫

0x7A67 0x6DB2 #  液

0x7A68 0x667A #  智

0x7A69 0x5446 #  呆

0x7A6A 0x6E90 #  源

0x7A6B 0x8CDE #  賞

0x7A6C 0x67F4 #  柴

0x7A6D 0x6C60 #  池

0x7A6E 0x51C6 #  准

0x7A6F 0x72D0 #  狐

0x7A70 0x8377 #  荷

0x7A71 0x8A2A #  訪

0x7A72 0x63DA #  揚

0x7A73 0x5049 #  偉

0x7A74 0x81E3 #  臣

0x7A75 0x66AB #  暫

0x7A76 0x5E72 #  干

0x7A77 0x64E1 #  擡

0x7A78 0x632F #  振

0x7A79 0x4F0A #  伊

0x7A7A 0x8805 #  蠅

0x7A7B 0x7387 #  率

0x7A7C 0x62FE #  拾

0x7A7D 0x7B56 #  策

0x7A7E 0x8CA7 #  貧
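
For anyone who wants to audit entries like these in bulk, here is a rough sketch of a line checker. It assumes the `0xXXXX 0xYYYY #  glyph` layout shown above; the names `parse_mapping_line` and `describe` are made up for illustration and are not part of any existing Medley tool. It reports the character set in the octal "q" notation used in this thread, whether the low byte falls in 0x21-0x7E, and whether the glyph in the comment is the character the Unicode value denotes.

```python
import re

# One mapping line: XCCS code, Unicode code point, then a glyph in the comment.
LINE_RE = re.compile(r"^0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4,6})\s*#\s*(\S+)")

def parse_mapping_line(line):
    """Return (xccs, unicode_value, glyph), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16), m.group(3)

def describe(xccs, uni, glyph):
    """Summarize one mapping entry (hypothetical helper)."""
    charset, code = xccs >> 8, xccs & 0xFF
    return {
        "charset_octal": format(charset, "o") + "q",   # e.g. 0x7A -> "172q"
        "code_in_range": 0x21 <= code <= 0x7E,
        "comment_matches": chr(uni) == glyph,
    }

if __name__ == "__main__":
    for line in ["0x7A76 0x5E72 #  \u5e72", "0x7B21 0xFA4C #  \u793e"]:
        parsed = parse_mapping_line(line)
        if parsed is not None:
            print(hex(parsed[0]), describe(*parsed))
```

A check like this only confirms that the file is internally consistent; it cannot say whether the Unicode value matches the glyph printed in the standard.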


I don't believe that the mappings are correct here. Looking at, for example, 0x7A76, the U+5E72 character (干) is NOT what's in the printed standard -- that looks like
PastedGraphic-1.png

about which Claude says:

The image is a bit blurry but the character appears to be 辻 (U+8FBB), a Japanese-specific character (tsuji) meaning "crossroads" or "street corner." It's composed of 辶 (the "movement/road" radical) with 十 (ten/cross) inside, visually representing an intersection.

Notably, 辻 is a kokuji — a character invented in Japan, not inherited from Chinese. It doesn't exist in classical Chinese and has no Chinese reading.

It's extremely common in Japanese surnames, e.g. 辻󠄀 (Tsuji).


However when pushed about the missing 2nd tick on the left, it says:

You're describing a character where the left component — the 辶 radical itself — has two short strokes at the top left instead of one.

This points to 辶 vs 辶 — the distinction between the one-dot and two-dot forms of the "road/movement" radical (辶, Kangxi radical 162). In some script traditions (particularly older or more formal Chinese styles), this radical is written with two dots at the top left rather than one.

So the character you're thinking of would be visually similar to 辻 but with the two-dot form of 辶, giving it an extra tick on the left.

This is indeed a recognised glyph variation in the Unihan world, and it's one of the cases handled by variation selectors or by regional font differences; Japanese fonts typically render 辶 with one dot, while some Chinese and older Japanese sources use two dots.

However I want to be honest here: I'm not certain whether this specific variant of 辻 (two-dot 辶 + 十) has its own dedicated codepoint, or whether it's handled purely as a font/variation-selector matter like the right-side tick variant. It's quite possible it exists somewhere in the CJK Unified Ideographs Extension blocks.


and pushing it further:

This is very illuminating. Looking at the radical 162 table, under +2 residual strokes (not +3 as I initially said — 十 is 2 strokes), we can see:

辻JP appears in the +2 residual strokes row, confirming 辻 is radical 162 + 2 residual strokes.

Now, for your actual question: the two-dot form of radical 162 is ⻍ (U+2ECD, "CJK RADICAL WALK ONE"), and this is the traditional printing form. The one-dot form was introduced in post-war Japanese simplification. The table in the Wikipedia article shows that the two-dot form corresponds to U+8FB6 with variation selector U+E0100 or U+E0102, or the radical codepoint U+2ECD.

So the direct answer to your question is: there does not appear to be a separate standalone codepoint in the CJK Unified Ideographs block for a two-dot-radical version of 辻. The two-dot variant is handled via:

  • U+8FBB (辻) + variation selector U+E0100 or U+E0102 for the two-dot glyph form
  • The distinction is purely a glyph/font matter, not a separate codepoint

The two-dot form ⻍ (4 strokes in dictionaries) was the traditional printing form, replaced in post-war Japan by the one-dot form ⻌ for jōyō kanji, but this change did not uniformly apply to hyōgai kanji like 辻.
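
To make the variation-selector mechanism concrete, here is a small sketch of how such a sequence looks in code. It assumes the pairing quoted above (U+8FBB followed by U+E0100); whether that selector is actually registered for the two-dot glyph, and whether a given font honors it, is not verified here. The helper names are hypothetical.

```python
import unicodedata

BASE = "\u8FBB"          # the base ideograph discussed above
VS17 = "\U000E0100"      # first supplementary variation selector

def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00-FE0F) and VS17-VS256 (U+E0100-E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))

sequence = BASE + VS17

print(len(sequence))                 # two code points, one displayed character
print(unicodedata.name(VS17))
print(strip_variation_selectors(sequence) == BASE)
```

The point for the mapping tables is that a variant expressed this way is a two-code-point sequence, so a one-to-one XCCS-to-Unicode table cannot represent it without some extension.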

So... I think at least one use of the FX private area is historical variants, where it is important to maintain the visual appearance (such as if we had to maintain PastedGraphic-2.png  vs  PastedGraphic-3.png)

-- Nick




--
You received this message because you are subscribed to the Google Groups "Medley Interlisp core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lispcore+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lispcore/50c5164b-b18a-44cf-b7da-fd93b76e3059%40newscenter.com.

Herb Jellinek

Apr 7, 2026, 5:34:41 PM
to lisp...@googlegroups.com
I'm glad you pushed back on the "'crossroads' or 'street corner'" assertion, because, as you saw, the stroke count is one of the things that distinguishes characters.  That the radical may have had a historical form with a different number of strokes certainly complicates things.  Your theory about how FX might have used their PUA is definitely plausible.

That's one of several questions we could clear up if we knew people who worked on the XCCS spec - at FX, RX, or Xerox Corp proper.

            Herb

Matt Heffron

Apr 7, 2026, 6:02:47 PM
to Ron Kaplan, Medley Interlisp core
So, by "do nothing" you mean to ignore those mappings unless and until we can identify the correct Unicode characters from the images in the XCCS documents? I.e., for now they are NOT part of the XCCS and MCCS mappings.

(We probably need a person fluent in reading Japanese to identify those actual characters, and then we might be able to look for them in the Unicode names.)

Were all the _other_ language mappings to Unicode actually verified? Shouldn't that be done before submitting to IANA?

Matt

Matt Heffron

Apr 7, 2026, 6:24:04 PM
to Herb Jellinek, lisp...@googlegroups.com

Since Fuji Photo Film Co., Ltd. was established in 1934, some of the historical variants may have been included to represent corporate documents from before the post-war simplification (that was mentioned for at least one character in Nick’s analysis).

 


Matt Heffron

Apr 7, 2026, 9:12:43 PM
to Medley Interlisp core
While digging around in Unicode for another train of thought, I wondered whether some of these characters might fall into the CJK Compatibility Ideographs.
Claude probably would have found any in the main block U+F900–U+FAFF (512 code points). But what about the [CJK Compatibility Ideographs Supplement](https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs_Supplement) block (U+2F800–U+2FA1F), or the CJK Unified Ideographs Extensions?
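
As a starting point for that search, a candidate code point can be bucketed by block. This is only a sketch: the table lists a handful of CJK blocks rather than all of them, and the function name `block_of` is invented for illustration.

```python
# (first, last, name) for a few CJK blocks; deliberately not exhaustive.
BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]

def block_of(cp: int) -> str:
    """Return the block name for a code point, or 'other' if not in the table."""
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"

if __name__ == "__main__":
    for cp in (0x5E72, 0x8FBB, 0xFA4C, 0x2F800):
        print(f"U+{cp:04X}", block_of(cp))
    # Size of the main compatibility block, matching the figure quoted above.
    print(0xFAFF - 0xF900 + 1)
```

Running the Unicode column of the Unknown file through something like this would at least show whether any of its values already land in a compatibility block.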
