kanjidic regular and nanori readings


Stuart McGraw

Sep 8, 2025, 2:14:45 PM
to edict-...@googlegroups.com
I'm posting here rather than on Github issues because I am not sure if this is an issue or not.

In the current kanjidic2.xml file, there are three character entries that have the same reading listed as both a regular reading and a nanori reading:

十 U+5341 そ
門 U+9580 と
斎 U+658E とき

My understanding is that a nanori reading is one that occurs in names and *does not* occur in normal words as a regular reading. So isn't having a reading listed as both essentially contradictory?

Do those entries need correction with the duplicated reading being assigned as a regular or nanori reading but not both?
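
For anyone who wants to reproduce the check, here is a minimal Python sketch of how such duplicates could be found. It assumes the usual kanjidic2 layout (`<reading r_type="ja_on|ja_kun">` inside `<rmgroup>`, and sibling `<nanori>` elements); the inline sample is hand-written for illustration and is not copied from the file, and the function name `duplicated_readings` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the kanjidic2 layout (illustrative only,
# not the actual contents of kanjidic2.xml).
SAMPLE = """<kanjidic2>
<character>
<literal>門</literal>
<reading_meaning>
<rmgroup>
<reading r_type="ja_on">モン</reading>
<reading r_type="ja_kun">かど</reading>
<reading r_type="ja_kun">と</reading>
</rmgroup>
<nanori>と</nanori>
</reading_meaning>
</character>
<character>
<literal>亜</literal>
<reading_meaning>
<rmgroup>
<reading r_type="ja_on">ア</reading>
<reading r_type="ja_kun">つ.ぐ</reading>
</rmgroup>
<nanori>や</nanori>
</reading_meaning>
</character>
</kanjidic2>"""


def duplicated_readings(root):
    """Yield (literal, reading) where a nanori equals an on/kun reading."""
    for ch in root.iter("character"):
        literal = ch.findtext("literal")
        # Collect the regular Japanese readings for this character.
        regular = {
            r.text
            for r in ch.iter("reading")
            if r.get("r_type") in ("ja_on", "ja_kun")
        }
        # Report every nanori that is an exact string match.
        for n in ch.iter("nanori"):
            if n.text in regular:
                yield literal, n.text


root = ET.fromstring(SAMPLE)
for literal, reading in duplicated_readings(root):
    print(literal, f"U+{ord(literal):04X}", reading)
```

To run it against the real data, replacing the sample with `ET.parse("kanjidic2.xml").getroot()` should be enough, assuming the file is in the working directory. The comparison is an exact string match, so a kun reading written with an okurigana dot would not match an unsplit nanori.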

-- Stuart

Jim Breen

Sep 8, 2025, 7:01:58 PM
to edict-...@googlegroups.com
On Tue, 9 Sept 2025 at 04:14, Stuart McGraw <jmdi...@mtneva.com> wrote:
>
> I'm posting here rather than on Github issues because I am not sure if this is an issue or not.

Correct; it's not a policy/procedure issue. More a matter of data correctness.

> In the current kanjidic2.xml file, there are three character entries that have the same reading listed as both a regular reading and a nanori reading:
>
> 十 U+5341 そ
> 門 U+9580 と
> 斎 U+658E とき
>
> My understanding is that a nanori reading is one that occurs in names and *does not* occur in normal words as a regular reading. So isn't having a reading listed as both essentially contradictory?

Yes, it's an error. Those "nanori" readings were added about 30 years
ago based on the extraction of readings from enamdict entries and
matching them against the regular on/kun set. It was an imperfect
process, and I'm not surprised that some errors happened.

> Do those entries need correction with the duplicated reading being assigned as a regular or nanori reading but not both?

I've removed the nanori versions from the source file.

There are not many good references for name readings. One is Shibano's
"JIS漢字字典", which has name readings for most kanji. If I had a spare
lifetime or two I'd compare them with what's in kanjidic.

Cheers

Jim

>
> -- Stuart
>
> --
> You received this message because you are subscribed to the Google Groups "EDICT-JMdict" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to edict-jmdict...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/edict-jmdict/7de6e9b0-cfb9-4f93-bb64-065887be94bc%40mtneva.com.



--
Jim Breen
https://www.edrdg.org/~jwb/ http://www.jimbreen.org/

Stuart McGraw

Sep 9, 2025, 12:48:17 AM
to edict-...@googlegroups.com
Thanks! And thanks for the clarification about issues. I guess my earlier kanjidic error note would also have been better placed here. Noted for the future.

-- Stuart

Stephen Kraus

Sep 18, 2025, 3:42:16 PM
to edict-...@googlegroups.com
Speaking of nanori readings, I see there are only three in kanjidic2 which are split into stems and okurigana. I don't think they need to be split, and it would make the data more consistent if the splits were removed.

侊 て.る
匩 ただ.す
邁 すす.む

侊 is used, for example, in the name of the actor 柴田 侊彦(しばた てるひこ).
匩 is an itaiji for 匡, for which we have ただ.す as a normal 'ja_kun' reading.
邁 is used, for example, in the name of the writer 西部 邁(にしべ すすむ).
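
A search like this can be sketched in a few lines of Python. This is only an illustration under the assumption that split readings are marked with a "." inside the `<nanori>` element; the sample XML is hand-written rather than taken from the file, and `split_nanori` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the kanjidic2 layout (illustrative only).
SAMPLE = """<kanjidic2>
<character>
<literal>邁</literal>
<reading_meaning>
<nanori>すす.む</nanori>
</reading_meaning>
</character>
<character>
<literal>亜</literal>
<reading_meaning>
<nanori>や</nanori>
</reading_meaning>
</character>
</kanjidic2>"""


def split_nanori(root):
    """Yield (literal, nanori) for nanori readings containing a '.' split."""
    for ch in root.iter("character"):
        for n in ch.iter("nanori"):
            if n.text and "." in n.text:
                yield ch.findtext("literal"), n.text


root = ET.fromstring(SAMPLE)
for literal, reading in split_nanori(root):
    print(literal, reading)
```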


Jim Breen

Sep 19, 2025, 1:11:34 AM
to edict-...@googlegroups.com
I'll fix them when I get back home. In the case of
匩 I think it was a misclassification.

Jim

Stephen Kraus

Sep 28, 2025, 3:06:23 AM
to edict-...@googlegroups.com
I've been working with the kanjidic2 data recently and found another small data error.

There are about 68 thousand <dic_ref> elements in the kanjidic2.xml file. I found that there are only 5 character entries which contain more than one <dic_ref> element with the same "dr_type" attribute. I wondered if these records were incorrect.

character  dr_type       dic_ref  Page # in source
           nelson_n      1664     393
                         3689     734
           nelson_n      1665     393
                         6002     1064
           nelson_n      1665     393
                         6003     1064
           nelson_n      1666     394
                         6005     1064
           oneill_names  1429A    125
                         1496     128

For "nelson_n" (The New Nelson Japanese-English Character Dictionary), I found that some of the <dic_ref> values are indeed incorrect. The numbers 1664, 1665, and 1666 are actually all assigned to the character 弁. The other <dic_ref> values listed above are fine.

For "oneill_names" ("Japanese Names" by P.G. O'Neill), the values are fine. Both 1429A and 1496 are assigned to the character 眞 in the source.
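
In case it is useful, the repeated-"dr_type" check can be approximated with a short Python sketch along these lines. It assumes `<dic_ref>` elements sit under each `<character>` entry as in the kanjidic2 layout; the sample is hand-written (only the 眞 values come from the list above, the 亜 entry is a filler for contrast), and `repeated_dr_types` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the kanjidic2 layout (illustrative only).
SAMPLE = """<kanjidic2>
<character>
<literal>眞</literal>
<dic_number>
<dic_ref dr_type="oneill_names">1429A</dic_ref>
<dic_ref dr_type="oneill_names">1496</dic_ref>
</dic_number>
</character>
<character>
<literal>亜</literal>
<dic_number>
<dic_ref dr_type="nelson_c">43</dic_ref>
<dic_ref dr_type="nelson_n">81</dic_ref>
</dic_number>
</character>
</kanjidic2>"""


def repeated_dr_types(root):
    """Yield (literal, dr_type, values) when a dr_type occurs more than once."""
    for ch in root.iter("character"):
        by_type = {}
        # Group the dic_ref values of this entry by their dr_type attribute.
        for ref in ch.iter("dic_ref"):
            by_type.setdefault(ref.get("dr_type"), []).append(ref.text)
        for dr_type, values in by_type.items():
            if len(values) > 1:
                yield ch.findtext("literal"), dr_type, values


root = ET.fromstring(SAMPLE)
for literal, dr_type, values in repeated_dr_types(root):
    print(literal, dr_type, ", ".join(values))
```

A repeated "dr_type" is only a hint that something may be wrong: as the oneill_names case shows, a source can legitimately index one character more than once, so each hit still needs checking against the dictionary itself.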

It's a pretty small issue, but I figured I'd report what I found since I went through all the effort.

Jim Breen

Sep 29, 2025, 2:11:18 AM
to edict-...@googlegroups.com
Thanks, Stephen. I'll look into those when I get home. The New Nelson codes were a contribution and I didn't check them. It could be due to an editing error in my files.