Proper way to specify different senses of same word (cmt attribute or co tag?)

Joshua Lotz
Nov 27, 2016, 8:30:25 PM
to XDXF - The Extensible (XML) Dictionary Exchange Format
I am creating an English-to-foreign-language dictionary and am wondering what the proper way is to indicate different "senses" of a word in XDXF.

For example, the English word house could mean:

1) a building
2) a legislative assembly

Are these senses intended to be specified in the cmt attribute of the def tag? For example:

    <ar>
        <def>
            <k>house</k>
            <def cmt="a building">
                <deftext>beet</deftext>
            </def>
            <def cmt="a legislative assembly">
                <deftext>mažles</deftext>
            </def>
        </def>
    </ar>

My goal is to create the dictionary in XDXF and then convert it to PDF for printing with PrinceXML, so I would like these texts ("a building", "a legislative assembly") to actually be printed before each definition (like GoldenDict does):

house
n.
1. (a building) beet
2. (a legislative assembly) mažles

But it seems a bit inconvenient to extract these texts if they are "hidden" as attributes within the def tags.
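For what it's worth, reading an attribute with a standard XML parser is about as easy as reading element text. A minimal sketch using Python's `xml.etree.ElementTree`, assuming the structure of the example above (an outer `<def>` holding `<k>` and one nested `<def cmt="...">` per sense); the function name `render_senses` is hypothetical:

```python
import xml.etree.ElementTree as ET

# The example article from the question, unchanged.
ARTICLE = """
<ar>
  <def>
    <k>house</k>
    <def cmt="a building">
      <deftext>beet</deftext>
    </def>
    <def cmt="a legislative assembly">
      <deftext>mažles</deftext>
    </def>
  </def>
</ar>
"""

def render_senses(ar: ET.Element) -> list[str]:
    """Return the headword followed by one numbered line per sense."""
    lines = [ar.findtext(".//k", default="").strip()]
    # Treat every <def> that carries a cmt attribute as one sense;
    # the outer <def> has no cmt and is skipped.
    senses = [d for d in ar.iter("def") if "cmt" in d.attrib]
    for n, sense in enumerate(senses, start=1):
        text = sense.findtext("deftext", default="").strip()
        lines.append(f"{n}. ({sense.get('cmt')}) {text}")
    return lines

print("\n".join(render_senses(ET.fromstring(ARTICLE))))
```

This prints the headword followed by "1. (a building) beet" and "2. (a legislative assembly) mažles", i.e. the printed layout shown above minus the part of speech, which is not in the markup.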

Perhaps I should specify the senses in their own co tags within each def?

            <def>
                <co>a building</co>
                <deftext>beet</deftext>
            </def>

Lenny Soshinskiy
Nov 28, 2016, 1:20:28 PM
The proper way to do it is to put it inside the <def> tag. Why is it hard to extract? What software/suites do you use? Also, <k> has to be outside of any <def> tag, e.g. between <ar> and <def>.
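Applied to the first example in this thread, the <k> fix amounts to moving <k> out of the outer <def> so that it sits directly under <ar>. A sketch of that restructuring with Python's ElementTree, assuming each misplaced <k> is a direct child of a <def> that is itself a direct child of <ar>; `hoist_keys` is a hypothetical helper name and the input is a shortened version of that example:

```python
import xml.etree.ElementTree as ET

def hoist_keys(ar: ET.Element) -> ET.Element:
    """Move every <k> found directly inside a top-level <def> so that it
    becomes a child of <ar>, placed just before that <def>."""
    # Snapshot the top-level defs before changing <ar>'s children.
    for top_def in [c for c in ar if c.tag == "def"]:
        for k in top_def.findall("k"):
            top_def.remove(k)
            # Insert before the def; repeated keys keep their order.
            ar.insert(list(ar).index(top_def), k)
    return ar

ar = ET.fromstring("<ar><def><k>house</k><def cmt='a building'>"
                   "<deftext>beet</deftext></def></def></ar>")
hoist_keys(ar)
print(ET.tostring(ar, encoding="unicode"))
```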

Joshua Lotz
Nov 28, 2016, 2:46:47 PM
OK, do you mean that I should use the cmt attribute inside the <def> tag? I was trying to keep it simple and just style the raw XML with CSS, without doing any transformation, and then convert it to PDF with PrinceXML. But since the actual text I want to print would be inside the attribute, I will need to add XSLT to the workflow to do the transformation.
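If the only goal is to turn the comment into printable element text, the transformation can be small. The sketch below uses Python's ElementTree instead of XSLT (a swapped-in technique, not the workflow described above): it copies each `cmt` attribute into a new `<co>` element placed first inside its `<def>`. Choosing `<co>` as the target, and styling it in the PrinceXML stylesheet afterwards, are assumptions; `cmt_to_co` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def cmt_to_co(root: ET.Element) -> ET.Element:
    """Copy every def's cmt attribute into a new <co> child placed first,
    so the comment becomes ordinary element text that CSS can style."""
    # Snapshot the matches first: modifying the tree while iter() is
    # still walking it is undefined behaviour in ElementTree.
    for d in list(root.iter("def")):
        comment = d.get("cmt")
        if comment is None:
            continue
        co = ET.Element("co")
        co.text = comment
        d.insert(0, co)
    return root

root = ET.fromstring('<ar><k>house</k><def><def cmt="a building">'
                     '<deftext>beet</deftext></def></def></ar>')
print(ET.tostring(cmt_to_co(root), encoding="unicode"))
# prints <ar><k>house</k><def><def cmt="a building"><co>a building</co><deftext>beet</deftext></def></def></ar>
```

The attribute is kept rather than moved, so software that reads `cmt` directly should still see it.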

How should I handle the case where I need two levels of distinction in the comments? For example, let's say the word "house" has two different senses and the first sense can be translated in two different ways (for formal and informal language). So I need one level of comments to distinguish the different meanings of the word and another level to distinguish different levels of formality.

house n.
1. (building) beet [informal], manzil [formal]
2. (legislative assembly) mažles

Is it appropriate to include the first comment as an attribute, <def cmt="building">, and then add a third level of <def> tags and, within those, include tags like <co>informal</co>? Is there any semantic difference between using the cmt attribute and the <co> tag?
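To check whether the proposed three-level markup can produce the layout above, here is a minimal rendering sketch in Python. The markup is a hypothetical reading of the proposal: each third-level `<def>` holds a `<co>` with the register label and a `<deftext>` with the translation. Neither the markup nor the `render` function comes from the XDXF specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-level markup for the proposal above.
ARTICLE = """
<ar>
  <k>house</k>
  <def>
    <def cmt="building">
      <def><co>informal</co><deftext>beet</deftext></def>
      <def><co>formal</co><deftext>manzil</deftext></def>
    </def>
    <def cmt="legislative assembly">
      <deftext>mažles</deftext>
    </def>
  </def>
</ar>
"""

def render(ar: ET.Element) -> list[str]:
    lines = [ar.findtext("k", default="").strip()]
    outer = ar.find("def")
    senses = [] if outer is None else outer.findall("def")
    for n, sense in enumerate(senses, start=1):
        variants = sense.findall("def")
        if variants:
            # Third level: one translation per register, label in brackets.
            text = ", ".join(
                f"{v.findtext('deftext', '').strip()} [{v.findtext('co', '').strip()}]"
                for v in variants
            )
        else:
            text = sense.findtext("deftext", default="").strip()
        lines.append(f"{n}. ({sense.get('cmt')}) {text}")
    return lines

print("\n".join(render(ET.fromstring(ARTICLE))))
```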

Lenny Soshinskiy
Nov 28, 2016, 3:03:18 PM
Adding a third level of <def> is totally normal practice when there is a need for it.
Also, if you are not intending to publish this dictionary and it is only for purposes of conversion, you may use whatever structure/format you like.
As for me, I would do it like this:

    <ar>
        <k>house</k>
        <def>
            <def cmt="a building">
                <deftext>
                    <abbr>inf.</abbr> <dtrn>beet</dtrn>, <abbr>frml.</abbr> <dtrn>manzil</dtrn>
                </deftext>
            </def>
            <def cmt="a legislative assembly">
                <deftext>mažles</deftext>
            </def>
        </def>
    </ar>
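Because this variant keeps both translations in one mixed-content `<deftext>`, flattening it to print text only needs the element's text content with whitespace normalised. A sketch using ElementTree's `itertext()` on the markup above, copied verbatim; the `flatten` and `render` names are hypothetical:

```python
import xml.etree.ElementTree as ET

ARTICLE = """
<ar>
    <k>house</k>
    <def>
        <def cmt="a building">
            <deftext>
                <abbr>inf.</abbr> <dtrn>beet</dtrn>, <abbr>frml.</abbr> <dtrn>manzil</dtrn>
            </deftext>
        </def>
        <def cmt="a legislative assembly">
            <deftext>mažles</deftext>
        </def>
    </def>
</ar>
"""

def flatten(elem: ET.Element) -> str:
    """Concatenate all text inside elem (children and their tails
    included) and collapse runs of whitespace to single spaces."""
    return " ".join("".join(elem.itertext()).split())

def render(ar: ET.Element) -> list[str]:
    lines = [flatten(ar.find("k"))]
    # Second-level defs are the senses; their cmt is the sense comment.
    for n, sense in enumerate(ar.findall("def/def"), start=1):
        lines.append(f"{n}. ({sense.get('cmt')}) {flatten(sense.find('deftext'))}")
    return lines

print("\n".join(render(ET.fromstring(ARTICLE))))
```

Compared with the three-level sketch, the register labels here come out wherever `<abbr>` appears in the running text, so their position is fixed by the markup rather than by the renderer.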

By the way, I am dying to know what language this is and what type of dictionary you are creating :)

Joshua Lotz
Nov 28, 2016, 3:40:53 PM
Great, thanks for the suggestion. My main use for the dictionary is searching in GoldenDict, but I'd also like to be able to print it to PDF easily.

I tried both variants to see how they would look in GoldenDict, and they both look fine. I guess it comes down to whether I want to treat the formal and informal words as separate definitions with their own example sentences (i.e. a third level of hierarchy), or whether I can live with listing the different translations together on the same level and differentiating them with inline abbreviations. The former is clearer, but the latter is more space-efficient.



The dictionary is for Levantine Arabic, the variety spoken in Syria.

Lenny Soshinskiy
Nov 29, 2016, 4:54:40 PM
I would say that a <def> is supposed to contain one cluster of the same meaning. In my opinion, if beet and manzil represent the same thing, then they have to belong to the same <def>.