Feedback on new pitch accent proposal requested

Stuart McGraw

Jun 22, 2026, 4:43:21 PM

to edict-...@googlegroups.com

The original XML-NG proposal (https://www.edrdg.org/jmwsgi/web/doc/2026-03-xmlng.html#_pitch_accent_elements) included a way to optionally associate a set of pitch accent (pa) specifications with each reading of an entry. That proposal was incorporated in early versions of the XML-NG implementation. However, after discussion of the need to allow for dependence on an entry's sense and/or part-of-speech, and of the possible inclusion of other information, it was decided to postpone the pa implementation until those details were settled.

I have put together a new pa demo implementation that provides some of those features and would appreciate feedback. It can be trialed at:

https://www.edrdg.org/jmdemo/?svc=jmdemo

The demo accesses a throw-away copy of the jmdictdb database so feel free to make any edits you wish.

The syntax for the new [pa] tags is described in the help file available from the Help link in the navigation bar or directly at:

https://www.edrdg.org/jmdemo/srchform.py?svc=jmdemo#readings
https://www.edrdg.org/jmdemo/srchform.py?svc=jmdemo#syn_pa

The new syntax is the same as the old syntax but now allows an accent to be restricted to specific senses or PoS (part-of-speech) by following the accent value with a sense number, PoS tag name, or both (separated by "/"), in square brackets:

一杯: いっぱい[pa=1[1,2],pa=0[3]]

This says that when 一杯 is used in sense 1 or 2 it is pronounced with the accent on mora 1, but when used in sense 3 it is unaccented.

自由: じゆう[pa=0[n],pa=1[adj-na]]

This says 自由 is unaccented when used as a noun but is accented on the first mora when used as a na-adjective. In the JMdictDB database, PoS tags don't exist outside the context of a sense, so if an entry has [n] PoS tags only on senses 1 and 3, [pa=0[n]] has the same effect as [pa=0[1/n,3/n]] and will be shown in the latter form on the entry display page. The restriction is not updated automatically afterwards: it will NOT create a new restriction if an [n] tag is later added to an existing sense or a new [n] sense is added.
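The expansion of a bare PoS restriction described above can be sketched roughly as follows. This is only an illustration, not the demo's actual code: the function names and the dict mapping sense numbers to PoS tag sets are hypothetical stand-ins for however JMdictDB represents an entry's senses.

```python
def expand_pos_restriction(pos_tag, sense_pos):
    """Turn a bare PoS restriction into explicit (sense, pos) pairs.

    sense_pos is a hypothetical mapping of 1-based sense number to the
    set of PoS tags on that sense.
    """
    return [(num, pos_tag) for num in sorted(sense_pos) if pos_tag in sense_pos[num]]


def format_restrictions(pairs):
    """Render (sense, pos) pairs in the bracketed form used by [pa] tags."""
    return "[" + ",".join(f"{num}/{pos}" for num, pos in pairs) + "]"


# An entry whose senses 1 and 3 carry [n] and whose sense 2 carries [adj-na].
senses = {1: {"n"}, 2: {"adj-na"}, 3: {"n"}}
pairs = expand_pos_restriction("n", senses)
print("pa=0" + format_restrictions(pairs))  # pa=0[1/n,3/n]

# The pairs are a snapshot: tagging sense 2 with [n] later does not change them.
senses[2].add("n")
print("pa=0" + format_restrictions(pairs))  # still pa=0[1/n,3/n]
```

Storing the expanded pairs rather than the bare PoS tag is what makes later edits to the senses leave the restriction unchanged.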

When two different accents can apply, the intended interpretation is that the narrower one takes precedence over the wider one: a sense-restricted accent takes precedence over an unrestricted accent, and a sense/PoS-restricted accent takes precedence over a sense-restricted accent.
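As a rough sketch of that precedence rule (not the demo's implementation), the lookup below ranks an unrestricted accent as 0, a sense-restricted one as 1, and a sense/PoS-restricted one as 2, then keeps the highest-ranked match. The data layout and the function name are assumptions made for illustration, and bare PoS restrictions are assumed to have been expanded to sense/PoS pairs already.

```python
def applicable_accents(accents, sense, pos):
    """Return the accent value(s) that apply to a given sense and PoS.

    accents is a list of (value, restrictions) pairs; restrictions is a
    list of (sense, pos_or_None) tuples, and an empty list means the
    accent is unrestricted. This layout is hypothetical.
    """
    best_rank, best = -1, []
    for value, restrictions in accents:
        if not restrictions:
            rank = 0  # unrestricted: the widest scope
        else:
            ranks = [
                2 if r_pos is not None else 1
                for r_sense, r_pos in restrictions
                if r_sense == sense and (r_pos is None or r_pos == pos)
            ]
            if not ranks:
                continue  # no restriction on this accent matches
            rank = max(ranks)
        if rank > best_rank:
            best_rank, best = rank, [value]
        elif rank == best_rank:
            best.append(value)
    return best


accents = [
    (0, []),               # pa=0
    (1, [(2, None)]),      # pa=1[2]
    (2, [(2, "adj-na")]),  # pa=2[2/adj-na]
]
print(applicable_accents(accents, 1, "n"))       # [0]
print(applicable_accents(accents, 2, "n"))       # [1]
print(applicable_accents(accents, 2, "adj-na"))  # [2]
```

When two accents match at the same rank, the sketch returns both and leaves it to the caller to decide what that means.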

Accents can be qualified with a register, currently only "s" (standard, the default) or "m" (modern):

[pa/s=0,pa/m=1[1/adj-i]]
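For anyone who wants to experiment with the syntax outside the demo, here is a minimal parsing sketch. It is based only on the examples in this note, not on the demo's real grammar, so the regular expression, function names, and output layout are all assumptions. It is also lenient: text that does not look like a pa item is silently skipped rather than reported.

```python
import re

# One pa item: optional register, accent value, optional bracketed restrictions.
PA_ITEM = re.compile(
    r"pa(?:/(?P<reg>[sm]))?=(?P<accent>\d+)(?:\[(?P<restr>[^\[\]]*)\])?"
)


def parse_restriction(text):
    """Parse '1', 'n' or '1/adj-i' into a (sense, pos) tuple."""
    text = text.strip()
    if "/" in text:
        sense, pos = text.split("/", 1)
        return int(sense), pos
    if text.isdigit():
        return int(text), None
    return None, text


def parse_pa_tags(tag):
    """Parse a tag such as '[pa/s=0,pa/m=1[1/adj-i]]' into a list of dicts."""
    body = tag.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]  # drop the outer tag brackets
    items = []
    for m in PA_ITEM.finditer(body):
        restr = m.group("restr")
        items.append({
            "register": m.group("reg") or "s",  # "s" (standard) is the default
            "accent": int(m.group("accent")),
            "restrictions": (
                [parse_restriction(r) for r in restr.split(",")] if restr else []
            ),
        })
    return items


for item in parse_pa_tags("[pa/s=0,pa/m=1[1/adj-i]]"):
    print(item)
```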

Some questions:
1. Is there a need for additional pa register values beyond "standard" and "modern"?
2. If no register is specified, is defaulting to "standard" ok or should there be an "unspecified" default?
3. Is there a need for a reading to have multiple applicable accents (in the absence of, or with common, sense/PoS restrictions)? That is, should [pa=0,pa=1] or [pa=0[2/n],pa=1[2/n]] be allowed?
   a. If so, should there be a provision for discriminating information, e.g. a "note" field, or "more/less common" flags?
   b. If there is no such provision, how should the reader interpret the existence of two different accents? That both accents are equally acceptable?
4. What about multiple readings pronounced identically, eg:
2835589 かりパチ (借りパチ)；かりぱち (借りぱち)；カリパチ (nokanji)
Right now each requires a separate, identical pa tag.
5. Is there a need/desire to include nasalization or devoicing information?
6. Is there any other pronunciation information that should be included?
7. What should the XML look like (DTD def)?

Still being worked on:
1. Need to detect and prohibit conflicts. E.g., if an entry has two senses, both with [n] tags, then this should be rejected:
[pa=0[n],pa=1[2]]
2. Need to normalize pa and restrictions. E.g., if the tag "[pa=1[1,2]]" is given and the entry has only two senses the restrictions should be ignored.
3. Need to integrate stagr info when evaluating sense restrictions?
4. XML generation (pending DTD def.)
5. The bulk updater needs to be updated.
6. Depending on amount of additional info added, reading/accent display may need revision.
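The normalization in item 2 might look something like the sketch below. It handles only the simple case from that item, where sense-only restrictions together name every sense of the entry; how PoS-qualified restrictions and stagr info should be normalized is still open, so they are passed through unchanged. The function name and the tuple layout are hypothetical.

```python
def normalize_restrictions(restrictions, sense_count):
    """Drop sense-only restrictions that together cover every sense.

    restrictions is a list of (sense, pos_or_None) tuples; sense_count is
    the number of senses in the entry.
    """
    if not restrictions:
        return []
    if any(pos is not None for _, pos in restrictions):
        # PoS-qualified restrictions are left untouched in this sketch.
        return list(restrictions)
    covered = {sense for sense, _ in restrictions}
    if covered == set(range(1, sense_count + 1)):
        return []  # every sense is named, so the tag is effectively unrestricted
    # Otherwise just remove duplicates and sort by sense number.
    return [(sense, None) for sense in sorted(covered)]


# [pa=1[1,2]] on an entry with only two senses becomes an unrestricted pa=1.
print(normalize_restrictions([(1, None), (2, None)], sense_count=2))  # []
# On a three-sense entry the restrictions are kept.
print(normalize_restrictions([(2, None), (1, None)], sense_count=3))  # [(1, None), (2, None)]
```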

This note has been posted to both the Edict mailing list and the GitHub issue for the pitch accent discussion at https://github.com/JMdictProject/JMdictIssues/issues/171/. I think the currently favored discussion venue is GitHub, but I'll see any replies to the list too.

-- Stuart
