kanjidic regular and nanori readings

Stuart McGraw

Sep 8, 2025, 2:14:45 PM

to edict-...@googlegroups.com

I'm posting here rather than on Github issues because I am not sure if this is an issue or not.

In the current kanjidic2.xml file, there are three character entries that have the same reading listed as both a regular reading and a nanori reading:

十 U+5341 そ
門 U+9580 と
斎 U+658E とき

My understanding is that a nanori reading is one that occurs in names and *does not* occur in normal words as a regular reading. So isn't having a reading listed as both essentially contradictory?
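For anyone wanting to reproduce the check, here is a minimal Python sketch of how such overlaps might be found. It assumes the usual kanjidic2 layout as I understand it (`<character>` entries containing `<literal>`, `<reading r_type="ja_on|ja_kun">` and `<nanori>` elements). The function name and the sample fragment are hypothetical and written by hand for illustration; the fragment is not copied from the file.

```python
import xml.etree.ElementTree as ET

# Hand-written fixture for illustration only; NOT copied from kanjidic2.xml.
SAMPLE = """<kanjidic2>
  <character>
    <literal>十</literal>
    <reading_meaning>
      <rmgroup>
        <reading r_type="ja_kun">そ</reading>
      </rmgroup>
      <nanori>そ</nanori>
    </reading_meaning>
  </character>
  <character>
    <literal>侊</literal>
    <reading_meaning>
      <nanori>てる</nanori>
    </reading_meaning>
  </character>
</kanjidic2>"""


def find_regular_nanori_overlaps(root):
    """Return (literal, reading) pairs where a nanori reading is also
    listed as a regular ja_on/ja_kun reading of the same character."""
    overlaps = []
    for char in root.iter("character"):
        literal = char.findtext("literal")
        # Collect the regular Japanese readings of this character.
        regular = {
            r.text
            for r in char.iter("reading")
            if r.get("r_type") in ("ja_on", "ja_kun")
        }
        # Report every nanori reading that exactly matches a regular one.
        for nanori in char.iter("nanori"):
            if nanori.text in regular:
                overlaps.append((literal, nanori.text))
    return overlaps


root = ET.fromstring(SAMPLE)
overlaps = find_regular_nanori_overlaps(root)
print(overlaps)
```

To run it against the real data, `ET.parse("kanjidic2.xml").getroot()` should be usable in place of `ET.fromstring(SAMPLE)`, assuming the file is in the current directory. The comparison is an exact string match, so readings that differ only in okurigana markers would not be reported.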

Do those entries need correction with the duplicated reading being assigned as a regular or nanori reading but not both?

-- Stuart

Jim Breen

Sep 8, 2025, 7:01:58 PM

to edict-...@googlegroups.com

On Tue, 9 Sept 2025 at 04:14, Stuart McGraw <jmdi...@mtneva.com> wrote:
>
> I'm posting here rather than on Github issues because I am not sure if this is an issue or not.

Correct; it's not a policy/procedure issue. More a matter of data correctness.

> In the current kanjidic2.xml file, there are three character entries that have the same reading listed as both a regular reading and a nanori reading:
>
> 十 U+5341 そ
> 門 U+9580 と
> 斎 U+658E とき
>
> My understanding is that a nanori reading is one that occurs in names and *does not* occur in normal words as a regular reading. So isn't having a reading listed as both essentially contradictory?

Yes, it's an error. Those "nanori" readings were added about 30 years
ago based on the extraction of readings from enamdict entries and
matching them against the regular on/kun set. It was an imperfect
process, and I'm not surprised that some errors happened.

> Do those entries need correction with the duplicated reading being assigned as a regular or nanori reading but not both?

I've removed the nanori versions from the source file.

There are not many good references for name readings. One is Shibano's
"JIS漢字字典", which has name readings for most kanji. If I had a spare
lifetime or two I'd compare them with what's in kanjidic.

Cheers

Jim

>
> -- Stuart
>
> --
> You received this message because you are subscribed to the Google Groups "EDICT-JMdict" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to edict-jmdict...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/edict-jmdict/7de6e9b0-cfb9-4f93-bb64-065887be94bc%40mtneva.com.

--
Jim Breen
https://www.edrdg.org/~jwb/ http://www.jimbreen.org/

Stuart McGraw

Sep 9, 2025, 12:48:17 AM

to edict-...@googlegroups.com

Thanks! And thanks for the clarification about issues. I guess my earlier kanjidic error note would also have been better placed here. Noted for the future.

-- Stuart

Stephen Kraus

Sep 18, 2025, 3:42:16 PM

to edict-...@googlegroups.com

Speaking of nanori readings, I see there are only three in kanjidic2 which are split into stems and okurigana. I don't think they need to be split, and it would make the data more consistent if the splits were removed.

侊	て.る
匩	ただ.す
邁	すす.む

侊 is used, for example, in the name of the actor 柴田侊彦（しばたてるひこ）.
匩 is an itaiji for 匡, for which we have ただ.す as a normal 'ja_kun' reading.
邁 is used, for example, in the name of the writer 西部邁（にしべすすむ）.
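A rough Python sketch of how the split nanori readings might be located follows. It assumes that `<nanori>` elements sit inside `<character>` entries and that the split is marked with a "." as in the list above. The function name and the hand-written sample are hypothetical, not taken from the file.

```python
import xml.etree.ElementTree as ET

# Hand-written fixture for illustration only; NOT copied from kanjidic2.xml.
SAMPLE = """<kanjidic2>
  <character>
    <literal>侊</literal>
    <reading_meaning>
      <nanori>て.る</nanori>
    </reading_meaning>
  </character>
  <character>
    <literal>邁</literal>
    <reading_meaning>
      <nanori>すすむ</nanori>
    </reading_meaning>
  </character>
</kanjidic2>"""


def find_split_nanori(root):
    """Return (literal, nanori, unsplit) triples for nanori readings
    that contain a '.' separating stem and okurigana."""
    split = []
    for char in root.iter("character"):
        literal = char.findtext("literal")
        for nanori in char.iter("nanori"):
            text = nanori.text or ""
            if "." in text:
                # Suggest the unsplit form by dropping the marker.
                split.append((literal, text, text.replace(".", "")))
    return split


split_readings = find_split_nanori(ET.fromstring(SAMPLE))
print(split_readings)
```

In the sample, only the first entry is reported; the second shows what an already unsplit reading looks like.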


Jim Breen

Sep 19, 2025, 1:11:34 AM

to edict-...@googlegroups.com

I'll fix them when I get back home. In the case of 匩, I think it was a misclassification.

Jim


Stephen Kraus

Sep 28, 2025, 3:06:23 AM

to edict-...@googlegroups.com

I've been working with the kanjidic2 data recently and found another small data error.

There are about 68 thousand <dic_ref> elements in the kanjidic2.xml file. I found that there are only 5 character entries which contain more than one <dic_ref> element with the same "dr_type" attribute. I wondered if these records were incorrect.

character	dr_type	dic_ref	Page # in source
瓣	nelson_n	1664	393
瓣	nelson_n	3689	734
辨	nelson_n	1665	393
辨	nelson_n	6002	1064
辧	nelson_n	1665	393
辧	nelson_n	6003	1064
辯	nelson_n	1666	394
辯	nelson_n	6005	1064
眞	oneill_names	1429A	125
眞	oneill_names	1496	128

For "nelson_n" (The New Nelson Japanese-English Character Dictionary), I found that some of the <dic_ref> values are indeed incorrect. The numbers 1664, 1665, and 1666 are actually all assigned to the character 弁. The other <dic_ref> values listed above are fine.

For "oneill_names" ("Japanese Names" by P.G. O'Neill), the values are fine. Both 1429A and 1496 are assigned to the character 眞 in the source.
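For reference, the duplicate check can be sketched in Python roughly as follows. It relies on the `<dic_ref>` element and its "dr_type" attribute mentioned above; the surrounding `<character>`, `<literal>` and `<dic_number>` elements are my assumption about the file layout. The function name is hypothetical, and the sample fragment is hand-written from the values in the table rather than copied from the file.

```python
import xml.etree.ElementTree as ET

# Hand-written fixture using values from the table; NOT copied from kanjidic2.xml.
SAMPLE = """<kanjidic2>
  <character>
    <literal>瓣</literal>
    <dic_number>
      <dic_ref dr_type="nelson_n">1664</dic_ref>
      <dic_ref dr_type="nelson_n">3689</dic_ref>
    </dic_number>
  </character>
  <character>
    <literal>眞</literal>
    <dic_number>
      <dic_ref dr_type="oneill_names">1429A</dic_ref>
      <dic_ref dr_type="oneill_names">1496</dic_ref>
    </dic_number>
  </character>
  <character>
    <literal>辨</literal>
    <dic_number>
      <dic_ref dr_type="nelson_n">6002</dic_ref>
    </dic_number>
  </character>
</kanjidic2>"""


def find_repeated_dr_types(root):
    """Return (literal, dr_type, values) for every character that has
    more than one <dic_ref> with the same dr_type."""
    repeated = []
    for char in root.iter("character"):
        literal = char.findtext("literal")
        # Group this character's dic_ref values by their dr_type.
        by_type = {}
        for ref in char.iter("dic_ref"):
            by_type.setdefault(ref.get("dr_type"), []).append(ref.text)
        for dr_type, values in by_type.items():
            if len(values) > 1:
                repeated.append((literal, dr_type, values))
    return repeated


repeated = find_repeated_dr_types(ET.fromstring(SAMPLE))
for literal, dr_type, values in repeated:
    print(literal, dr_type, values)
```

A check like this only flags candidates. As the 眞 case shows, a repeated "dr_type" can be legitimate, so each hit still has to be verified against the printed source.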

It's a pretty small issue, but I figured I'd report what I found since I went through all the effort.


Jim Breen

Sep 29, 2025, 2:11:18 AM

to edict-...@googlegroups.com

Thanks, Stephen. I'll look into those when I get home. The New Nelson codes were a contribution and I didn't check them. It could be due to an editing error with my files.

Jim


Jim Breen

Oct 6, 2025, 10:39:53 PM

to edict-...@googlegroups.com

OK, those stray New Nelson codes have been removed from the master file. The next distribution will not have them.