Possible simplification of readings in JMdict


Jim Breen

Jul 26, 2021, 4:20:01 AM
to edict-...@googlegroups.com
When I first designed the JMdict structure in the late 90s I was keen
to provide for back-compatibility with the older EDICT format, as a
number of sites, etc. were using it. One aspect of that was to make
sure the kanji and reading parts matched precisely. For example, in a
(hypothetical) entry with a kanji part of "何を食べる;ナニを食べる" the reading
part had to have both なにをたべる and ナニをたべる, with restrictions tying the
readings to the matching kanji forms.

This approach has led on occasion to some rather complex and ugly
entries, and it's appropriate to ask whether it's really worth doing.
Does it really matter? A recent example of this is the 喉が渇く entry (*),
where some variants were added containing ノド in place of 喉. The reading
part of that entry now contains:
のどがかわく[喉が渇く,のどが渇く,喉が乾く,のどが乾く,喉がかわく];ノドがかわく[ノドが渇く,ノドが乾く]
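To make the structure of that reading field explicit, here is a minimal sketch that parses the bracketed notation quoted above into a map from each reading to the kanji forms it is restricted to. The function name `parse_reading_field` and the assumption that the notation is exactly "reading[kanji,kanji];reading[...]" are illustrative only. This is not an official JMdict or JMdictDB tool, and it does not handle the parenthesised display format used elsewhere in this thread. (In the JMdict XML itself, these restrictions are carried by `re_restr` elements.)

```python
from __future__ import annotations

import re

# One reading, optionally followed by a bracketed list of the kanji forms it is restricted to.
READING_RE = re.compile(r"^([^\[\]]+)(?:\[([^\[\]]*)\])?$")


def parse_reading_field(field: str) -> dict[str, list[str]]:
    """Map each reading to its restricted kanji forms (hypothetical helper).

    An empty list means the reading is unrestricted, i.e. it applies to every kanji form.
    """
    result: dict[str, list[str]] = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue
        m = READING_RE.match(part)
        if m is None:
            raise ValueError(f"malformed reading: {part!r}")
        reading, restr = m.group(1).strip(), m.group(2)
        result[reading] = [k.strip() for k in restr.split(",") if k.strip()] if restr else []
    return result


field = "のどがかわく[喉が渇く,のどが渇く,喉が乾く,のどが乾く,喉がかわく];ノドがかわく[ノドが渇く,ノドが乾く]"
for reading, restr in parse_reading_field(field).items():
    print(reading, restr)
```

Seen this way, the katakana reading exists only to carry the two ノド kanji forms. That bookkeeping is what the proposal would remove.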

Would it really matter if it just had "のどがかわく"? Looking up the entry
using kana alone would/should find it (provided developers match
hiragana and katakana forms as equivalent).
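One way a consumer of JMdict might "match both kana forms" is to fold katakana onto hiragana before indexing and again before querying. The sketch below assumes that approach. The helper names (`kata_to_hira`, `build_index`, `lookup`) are hypothetical, and the entries are keyed by headword purely for illustration. It shifts only the katakana block U+30A1..U+30F6 down by 0x60 onto hiragana U+3041..U+3096, leaving the long-vowel mark ー, iteration marks and half-width katakana unchanged.

```python
from __future__ import annotations


def kata_to_hira(text: str) -> str:
    """Fold katakana U+30A1..U+30F6 onto hiragana U+3041..U+3096 (a fixed offset of 0x60)."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def build_index(entries: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index each entry under the hiragana-folded form of each of its readings."""
    index: dict[str, set[str]] = {}
    for key, readings in entries.items():
        for reading in readings:
            index.setdefault(kata_to_hira(reading), set()).add(key)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Fold the query the same way, so hiragana and katakana spellings hit the same key."""
    return index.get(kata_to_hira(query), set())


# With only "のどがかわく" stored, a search for the katakana-mixed spelling still finds the entry.
index = build_index({"喉が渇く": ["のどがかわく"]})
print(lookup(index, "ノドがかわく"))
```

Folding at both index and query time means the simplified entry loses nothing for kana searches, which is the premise of the question above.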

A simplification like this would only apply to those sorts of mixed
terms. Entries where readings written fully or partly in katakana are
considered appropriate would not be affected.

Any views on this?

Jim

(*) https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1277350
--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/

Chris Vasselli

Jul 26, 2021, 10:05:40 AM
to edict-...@googlegroups.com
Thanks Jim, I can see how this would be a nice simplification.

I want to clarify a little bit about your proposal, especially your last paragraph. Are you saying that, in your hypothetical ナニを食べる for example, if there were real-world usage of the reading ナニをたべる as a surface form then that form would be included in the database, but if not, it would be excluded?

At first blush, I imagine that as long as there is a consistent and well-documented understanding of what the presence/absence of the form means, and all forms that actually occur in real-world text still appear in the database, then as a consumer of JMdict that shouldn’t be too hard to adapt to.

Chris
--
You received this message because you are subscribed to the Google Groups "EDICT-JMdict" group.
To unsubscribe from this group and stop receiving emails from it, send an email to edict-jmdict...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/edict-jmdict/CABHGxq6osUfyXqGP%2BvfJRHAOZPmOhr6ENEqFv0FS_pPN2H1agQ%40mail.gmail.com.

Jim Breen

Jul 28, 2021, 8:17:28 PM
to edict-...@googlegroups.com
On Tue, 27 Jul 2021 at 00:05, Chris Vasselli <clin...@gmail.com> wrote:
[...]
> I want to clarify a little bit about your proposal, especially your last paragraph. Are you saying that, in your hypothetical ナニを食べる for example, if there were real-world usage of the reading ナニをたべる as a surface form then that form would be included in the database, but if not, it would be excluded?

Yes, something like that. The [nokanji] cases would stay, of course.

To give a real example. An entry such as:
鉛筆
【 えんぴつ; エンピツ (nokanji) 】
would stay as it is, but:
鉛筆削り; えんぴつ削り; エンピツ削り
【 えんぴつけずり (鉛筆削り, えんぴつ削り); エンピツけずり (エンピツ削り) 】
would see the reading field change to just:
【 えんぴつけずり 】
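The two examples above can be approximated mechanically, although the actual decision described in this thread is an editorial one. The sketch below rests on assumptions that are not part of the proposal. A reading is folded into another only if it is not flagged nokanji and its katakana-to-hiragana folding equals a reading already in the entry. The restrictions of merged readings are combined, and they are dropped entirely once they cover every kanji form. The `Reading` class and `simplify_readings` function are hypothetical. A fully katakana reading with no hiragana counterpart is left alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def kata_to_hira(text: str) -> str:
    """Fold katakana U+30A1..U+30F6 onto hiragana (offset 0x60); other characters unchanged."""
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)


@dataclass
class Reading:
    text: str
    restr: list[str] = field(default_factory=list)  # empty list = applies to all kanji forms
    nokanji: bool = False


def simplify_readings(kanji: list[str], readings: list[Reading]) -> list[Reading]:
    """Hypothetical sketch: merge readings that differ from an existing one only in kana script."""
    readings = [Reading(r.text, list(r.restr), r.nokanji) for r in readings]  # don't mutate input
    by_text = {r.text: r for r in readings if not r.nokanji}
    result: list[Reading] = []
    for r in readings:
        target = by_text.get(kata_to_hira(r.text))
        if r.nokanji or target is None or target is r:
            result.append(r)  # nokanji readings (e.g. エンピツ for 鉛筆) are kept as-is
            continue
        # r (e.g. エンピツけずり) duplicates target (えんぴつけずり) apart from script: merge it.
        if target.restr and r.restr:
            target.restr += [k for k in r.restr if k not in target.restr]
        else:
            target.restr = []  # either side unrestricted -> merged reading is unrestricted
    for r in result:
        if r.restr and set(kanji) <= set(r.restr):
            r.restr = []  # restrictions covering every kanji form say nothing; drop them
    return result


kanji = ["鉛筆削り", "えんぴつ削り", "エンピツ削り"]
readings = [Reading("えんぴつけずり", ["鉛筆削り", "えんぴつ削り"]), Reading("エンピツけずり", ["エンピツ削り"])]
print(simplify_readings(kanji, readings))
```

Applied to 鉛筆削り, this leaves a single unrestricted えんぴつけずり. For 鉛筆, the nokanji reading エンピツ survives untouched.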

> At first blush, I imagine as long as there is a consistent and well-documented understanding of what the presence/absence of the form means, and all forms that actually occur in real world text still appear in the database, then as a consumer of JMdict that shouldn’t be too hard to adapt to.

Yes, it's a bit of a trade-off between precision and visual clutter.

Jim

Jim Breen

Jul 30, 2021, 6:58:04 PM
to edict-...@googlegroups.com
This has just come up in the edits, with アコヤ貝 being proposed for
addition to the 阿古屋貝 entry. I propose to approve it without adding
アコヤがい as a reading.

Jim

Jim Breen

Oct 11, 2021, 12:03:12 AM
to edict-...@googlegroups.com
Well, there seem to be no objections and it's already being
implemented, so let's go with it.

I have added an explanation in the Editorial Policy page at:
https://www.edrdg.org/wiki/index.php/Editorial_policy#Reading_Field_Simplification

Jim