to EDICT-JMdict
I don't know whether this has already been discussed, but there are some ambiguous situations regarding the xref elements.
The document type declaration states that "the target keb or reb must not contain a centre-dot". However, some targets do contain a centre-dot, for example "ルポルタージュ・ライター", "文禄・慶長の役" and "タックス・ヘイブン". I don't think this is much of a problem for these examples, as they can be handled with a few tweaks, but the wording of the document type declaration is misleading.
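For anyone hitting the same issue, here is a minimal sketch of one such tweak, not an official parser. It assumes the xref text consists of a keb and/or reb plus an optional trailing sense number, joined by centre-dots, and that every keb and reb in the file has already been collected into a set. The names `XrefTarget`, `candidate_parses` and `parse_xref` are hypothetical.

```python
from typing import Iterator, List, NamedTuple, Optional, Set

SEP = "\u30fb"  # KATAKANA MIDDLE DOT, the separator seen in xref text


class XrefTarget(NamedTuple):
    first: str             # keb, or reb when only one headword is given
    second: Optional[str]  # reb when both keb and reb are given
    sense: Optional[int]   # sense number, if present


def candidate_parses(xref: str) -> Iterator[XrefTarget]:
    """Yield every way of reading the xref text as headword(s) + sense."""
    parts = xref.split(SEP)
    sense = None
    # Assumption: a trailing all-digit component is a sense number.
    if len(parts) > 1 and parts[-1].isascii() and parts[-1].isdigit():
        sense = int(parts[-1])
        parts = parts[:-1]
    # Reading 1: the whole text is one headword that itself contains dots.
    yield XrefTarget(SEP.join(parts), None, sense)
    # Reading 2: split into two headwords at each possible dot.
    for i in range(1, len(parts)):
        yield XrefTarget(SEP.join(parts[:i]), SEP.join(parts[i:]), sense)


def parse_xref(xref: str, headwords: Set[str]) -> List[XrefTarget]:
    """Keep only the readings whose headwords actually exist."""
    return [
        t for t in candidate_parses(xref)
        if t.first in headwords
        and (t.second is None or t.second in headwords)
    ]
```

Checking each candidate against the known headwords is what lets a target such as "タックス・ヘイブン" survive as a single headword instead of being split at its dot.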
A more annoying problem is that some xrefs don't have a precise enough target. An example is "<xref>元・もと・1</xref>": there exist several entries with 元 as a keb and もと as a reb.
Is there a way to get around this problem? I guess we could add the targeted ent_seq to the xref element to get rid of these ambiguous situations, but it would take a lot of time considering how large this dictionary is.
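To get a sense of how many xrefs are affected, the ambiguous cases can be listed by indexing every keb/reb pair to the ent_seq values that carry it. The sketch below assumes the usual `entry`, `ent_seq`, `keb` and `reb` elements. The sample entries are stripped down and their sequence numbers are invented for illustration, and `build_index` and `resolve` are hypothetical names. It ignores reading restrictions, so it may over-report.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Invented, stripped-down sample; the ent_seq values are NOT real.
SAMPLE = """<JMdict>
<entry><ent_seq>1000001</ent_seq>
  <k_ele><keb>元</keb></k_ele><r_ele><reb>もと</reb></r_ele></entry>
<entry><ent_seq>1000002</ent_seq>
  <k_ele><keb>元</keb></k_ele><r_ele><reb>もと</reb></r_ele></entry>
<entry><ent_seq>1000003</ent_seq>
  <k_ele><keb>本</keb></k_ele><r_ele><reb>ほん</reb></r_ele></entry>
</JMdict>"""

Index = Dict[Tuple[str, str], Set[int]]


def build_index(root: ET.Element) -> Index:
    """Map each (keb, reb) pair to the ent_seq values of entries having it."""
    index: Index = defaultdict(set)
    for entry in root.iter("entry"):
        seq = int(entry.findtext("ent_seq"))
        kebs = [e.text for e in entry.iter("keb")]
        rebs = [e.text for e in entry.iter("reb")]
        # Simplification: pairs every keb with every reb in the entry.
        for keb in kebs:
            for reb in rebs:
                index[(keb, reb)].add(seq)
    return index


def resolve(index: Index, keb: str, reb: str) -> List[int]:
    """Return all candidate ent_seq values; more than one means ambiguous."""
    return sorted(index.get((keb, reb), ()))


index = build_index(ET.fromstring(SAMPLE))
for (keb, reb), seqs in index.items():
    if len(seqs) > 1:
        print(f"ambiguous target {keb}・{reb}: {sorted(seqs)}")
```

An xref whose pair resolves to exactly one ent_seq can be linked directly; only the pairs with several candidates need a sequence number in the xref itself.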
Stuart McGraw
Mar 6, 2022, 4:26:29 PM
to edict-...@googlegroups.com, Jeremy Legron
Hi Jeremy,
Jim will probably comment further but I can provide some info. There is a proposed update to the JMdict XML described at:
which will provide a sequence number with each xref that will disambiguate the currently ambiguous ones (as you suggest). It will also dispense with the centre-dot as a kanji-kana separator.
Unfortunately, I am the blocking factor on this. Quite a bit of work on it was done in the last 3 months or so. The good news is that about 75-80% of it is complete. The bad news is that my time to work on it will be limited for the next couple of months. However, I hope to have it done by summer.
Hope this helps.
-- Stuart
Jim Breen
Mar 6, 2022, 6:22:42 PM
to edict-...@googlegroups.com
As Stuart pointed out, a proper solution to the problem is in the pipeline.
For the cases such as 文禄・慶長の役 we can fix this in the database. I have
edited the 3 entries you mentioned. Let me know if you see others.
Cases such as 元・もと will need to wait for the revision.
to EDICT-JMdict
Thank you very much for your response and your work.
The upcoming solution seems like a nice structural improvement; I'm looking forward to it.
The only other cross-reference with a centre-dot that I have been able to identify is ロイス・ディーツ症候群.
Justin Kautz
Dec 27, 2024, 7:09:21 PM
to EDICT-JMdict
Following up on this to check the current status.
I've seen that the XREF has been updated to the new format on the web interface; however, the database file itself still uses the old format. As of right now, when I import the database file and find an XREF with a conflict, I have to query the online page to assign the correct sequence value, which makes updating a local version of the database complex and much slower.
Thanks.
Jim Breen
Dec 27, 2024, 11:55:12 PM
to edict-...@googlegroups.com
The NG version of the JMdict XML distribution is still some way off. Although the sequence numbers of xref targets are in the maintenance database, there's no simple way to get them into the XML distribution.