xref conflicts

Jeremy Legron

Mar 6, 2022, 11:24:35 AM

to EDICT-JMdict

I don't know whether this has already been discussed but there are some ambiguous situations regarding the xref elements.

The document type declaration states that "the target keb or reb must not contain a centre-dot". However, some targets do contain a centre-dot, for example: "ルポルタージュ・ライター", "文禄・慶長の役" and "タックス・ヘイブン". I don't think this is much of a problem for these examples, as they can be handled with a few tweaks, but the wording of the document type declaration is misleading.
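One possible tweak for targets that themselves contain a centre-dot is to enumerate every reading of the xref string and let a later dictionary lookup decide which one actually exists. The following is a minimal Python sketch, assuming the separator is U+30FB and that a trailing all-digit component is the sense number. `xref_candidates` is a hypothetical helper, not part of any JMdict tooling.

```python
import re

DOT = "\u30fb"  # KATAKANA MIDDLE DOT, assumed to be the xref separator


def xref_candidates(text):
    """Return every (first, second, sense) reading of an xref string.

    `first` is the keb (or a lone reb), `second` is the reb or None,
    and `sense` is an int or None.
    """
    parts = text.split(DOT)
    sense = None
    # Assumption: a trailing run of ASCII digits is the sense number.
    if len(parts) > 1 and re.fullmatch(r"[0-9]+", parts[-1]):
        sense = int(parts.pop())
    # First reading: the remainder is a single headword that contains dots.
    candidates = [(DOT.join(parts), None, sense)]
    # Other readings: each dot in turn is treated as the keb/reb separator.
    for i in range(1, len(parts)):
        candidates.append((DOT.join(parts[:i]), DOT.join(parts[i:]), sense))
    return candidates


for reading in xref_candidates("タックス・ヘイブン"):
    print(reading)
```

For "タックス・ヘイブン" this yields two readings (one headword, or a keb/reb pair); only a lookup against the actual headwords can tell which one is meant.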

A more annoying problem is that some xrefs don't have a precise enough target. An example is "<xref>元・もと・1</xref>": there exist several entries with 元 as a keb and もと as a reb.
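To illustrate the ambiguity, here is a small Python sketch that indexes entries by (keb, reb) pair and reports every entry matching a target. The entries and sequence numbers are made-up placeholders, not real JMdict data, and the sketch ignores reading restrictions (re_restr), so it is only an approximation of a real resolver.

```python
from collections import defaultdict


def build_index(entries):
    """Map each (keb, reb) pair to the set of ent_seq values containing it."""
    index = defaultdict(set)
    for ent_seq, kebs, rebs in entries:
        for keb in kebs:
            for reb in rebs:
                index[(keb, reb)].add(ent_seq)
    return index


def resolve(index, keb, reb):
    """Return the sorted list of ent_seq values matching a keb/reb pair."""
    return sorted(index.get((keb, reb), ()))


# Placeholder entries: (ent_seq, kebs, rebs). The numbers are invented.
entries = [
    (9000001, ["元", "本"], ["もと"]),
    (9000002, ["元"], ["もと"]),
    (9000003, ["元"], ["げん"]),
]
index = build_index(entries)

matches = resolve(index, "元", "もと")
if len(matches) > 1:
    print("ambiguous xref target:", matches)
```

Any target that resolves to more than one sequence number cannot be disambiguated from the xref text alone.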

Is there a way to get around this problem?
I guess we could add the targeted ent_seq to the xref element to get rid of these ambiguous situations, but it would take a lot of time considering how large this dictionary is.

Stuart McGraw

Mar 6, 2022, 4:26:29 PM

to edict-...@googlegroups.com, Jeremy Legron

Hi Jeremy,

Jim will probably comment further but I can provide some info. There is a proposed update to the JMdict XML described at:

http://www.edrdg.org/wiki/index.php/JMdict:_Next_Generation

which will provide a sequence number with each xref that will disambiguate the currently ambiguous ones (as you suggest). It also will dispense with the center dot as a kanji-kana separator.

Unfortunately I am the blocking factor on this. Quite a bit of work on it was done in the last 3 months or so. The good news is that about 75-80% of it is complete. The bad news is that my time to work on it will be limited for the next couple of months. However, I hope to have it done by summer.

Hope this helps.

-- Stuart

Jim Breen

Mar 6, 2022, 6:22:42 PM

to edict-...@googlegroups.com

As Stuart pointed out, a proper solution to the problem is in the pipeline.

For the cases such as 文禄・慶長の役 we can fix this in the database. I have
edited the 3 entries you mentioned. Let me know if you see others.
Cases such as 元・もと will need to wait for the revision.

Cheers

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/

Jeremy Legron

Mar 7, 2022, 3:18:28 AM

to EDICT-JMdict

Thank you very much for your response and your work.

The upcoming solution seems like a nice structural improvement; I'm looking forward to it.

The only other cross-reference with a centre-dot that I have been able to identify is ロイス・ディーツ症候群.

Justin Kautz

Dec 27, 2024, 7:09:21 PM

to EDICT-JMdict

Following up on this to check the current status.

I've seen that xrefs are shown in the new format on the web interface, but the database file itself still uses the old format. Right now, when I import the database file and find an xref with a conflict, I have to query the online page to assign the correct sequence number, which makes updating a local copy of the database more complex and much slower.
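For context, the first step of that import can be sketched in Python as below: it walks the XML and collects each xref together with the sequence number of the entry it appears in, so that conflicting targets can be queued for a separate lookup. The fragment is hand-written in the shape of the JMdict XML, with a placeholder ent_seq and gloss. Loading the real file (which has a DTD with entity declarations) may need extra handling that is not shown here.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; the ent_seq value and gloss are placeholders,
# not copied from the real JMdict file.
SAMPLE = """<JMdict>
<entry>
<ent_seq>9000010</ent_seq>
<r_ele><reb>れい</reb></r_ele>
<sense>
<xref>元・もと・1</xref>
<gloss>placeholder gloss</gloss>
</sense>
</entry>
</JMdict>"""


def collect_xrefs(root):
    """Return (source ent_seq, xref text) pairs for every xref in the tree."""
    pairs = []
    for entry in root.iter("entry"):
        seq = int(entry.findtext("ent_seq"))
        for xref in entry.iter("xref"):
            pairs.append((seq, xref.text))
    return pairs


pairs = collect_xrefs(ET.fromstring(SAMPLE))
print(pairs)
```

Each collected pair can then be checked against a local headword index, and only the targets that match more than one entry need the slower online query.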

Thanks.

Jim Breen

Dec 27, 2024, 11:55:12 PM

to edict-...@googlegroups.com

The NG version of the JMdict XML distribution is still some way off. Although the sequence numbers of xref targets are in the maintenance database, there's no simple way to get them into the XML distribution.

Jim
