xmi:id attribute


sonja aits

May 18, 2022, 1:40:08 PM
to inception-users
Hi,
could someone please explain to me how the xmi:id attribute in the CAS XMI file for named entity annotations is generated? (I am using the pre-existing Inception NER layer.)

I can see that in some cases, when two annotators labelled the same entity in the same document, the ID is the same. In other cases, however, the same ID is assigned to different entities with different classes and positions in the same document. Here are two examples from the same annotated document:

SAME NUMBER FOR SAME ENTITY
Line from the XMI file of annotator 1:
    <type3:NamedEntity xmi:id="4375" sofa="1" begin="404" end="416" value="SYMP" identifier="http://www.ukp.informatik.tu-darmstadt.de/inception/1.0#node1fuui58tax31"/>
Line from the XMI file of annotator 2:
    <type3:NamedEntity xmi:id="4375" sofa="1" begin="404" end="416" value="SYMP"/>

SAME NUMBER FOR DIFFERENT ENTITIES
Line from the XMI file of annotator 1:
    <type3:NamedEntity xmi:id="4531" sofa="1" begin="418" end="422" value="NEG-NT"/>

Line from the XMI file of annotator 2 (the same document as annotated by annotator 1):
    <type3:NamedEntity xmi:id="4531" sofa="1" begin="509" end="515" value="SYMP"/>

Is this normal, or some kind of error?

Cheers
Sonja


Piotr Banski

May 18, 2022, 1:57:08 PM
to incepti...@googlegroups.com, sonja aits

Hi Sonja,

It appears to be an XMI-internal mechanism to give a measure of identity to some constructs within a single XMI file. It most probably has no connection whatsoever to how Inception data is represented or exposed to the user. My guess is that it's perfectly normal and that it can't be relied on for data manipulation performed across XMI exports.

HTH,

  Piotr

-- 
Piotr Bański, Ph.D.
Senior Researcher,
Leibniz-Institut für Deutsche Sprache,
R5 6-13
68-161 Mannheim, Germany

Richard Eckart de Castilho

May 19, 2022, 3:07:26 AM
to incepti...@googlegroups.com, sonja aits
Hi all,


> On 18/05/2022 19:40, sonja aits wrote:
>>
>> could someone please explain to me how the xmi:id attribute in the CAS xmi file for named entity annotations is generated (I am using the pre-existing Inception NER layer)?

As Piotr said: the IDs are intended only for internal reference within a particular XMI file.

> On 18. May 2022, at 19:57, Piotr Banski <ban...@ids-mannheim.de> wrote:
>
> It appears to be an XMI-internal mechanism to give a measure of identity to some constructs within a single XMI file. It most probably has no connection whatsoever to how Inception data is represented or exposed to the user. My guess is that it's perfectly normal and that it can't be relied on for data manipulation performed across XMI exports.

Although technically it is a bit more complex, you should imagine that the IDs are (re)generated during export by iterating over the annotation structures that are in the document at that particular time.

Note that users may add/remove annotations while they are working, so if you do two exports at different times, it can happen that an XMI ID that was assigned to annotation X the first time is assigned to annotation Y the second time.
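To make this concrete, here is a toy model. It is not the actual UIMA/INCEpTION serializer: it simply assumes that the exporter walks the annotations in document order and hands out sequential IDs. The function `assign_xmi_ids` and its `start` value are hypothetical, invented for illustration. Deleting one annotation between two exports is enough for ID 2 to point at a different entity, much like ID 4531 in the question.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    begin: int
    end: int
    value: str


def assign_xmi_ids(annotations, start=1):
    """Toy export (hypothetical): number annotations sequentially in
    (begin, end) order. The real serializer is more involved, but IDs
    likewise depend on what is in the document at export time."""
    ordered = sorted(annotations, key=lambda a: (a.begin, a.end))
    return {start + i: ann for i, ann in enumerate(ordered)}


symp = Annotation(404, 416, "SYMP")
neg = Annotation(418, 422, "NEG-NT")
later = Annotation(509, 515, "SYMP")

# First export: all three annotations are present.
first = assign_xmi_ids([symp, neg, later])
# The user deletes the NEG-NT annotation, then the document is exported again.
second = assign_xmi_ids([symp, later])

print(first[2])   # Annotation(begin=418, end=422, value='NEG-NT')
print(second[2])  # Annotation(begin=509, end=515, value='SYMP')
```

The annotation at 404-416 keeps ID 1 only because nothing before it changed, which is why some IDs appear "stable" by coincidence.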

Let's imagine we added a new kind of annotation feature called "Auto-ID" that you could add to a layer (similar to adding a String feature) and that would assign an auto-incremented ID to every annotation. Even then, you would not be able to use it to compare annotations across users.
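A minimal sketch of why such an "Auto-ID" would not help, assuming (hypothetically) that each annotator gets an independent counter that increments whenever they create an annotation. The `annotate` helper is invented for illustration. Two annotators who mark the same entities in a different order end up with different IDs for the same entity.

```python
from itertools import count


def annotate(spans_in_creation_order):
    """Hypothetical "Auto-ID" feature: one independent counter per annotator,
    incremented each time that annotator creates an annotation."""
    counter = count(1)
    return {span: next(counter) for span in spans_in_creation_order}


# Both annotators mark the same two entities, but in a different order.
annotator1 = annotate([(404, 416, "SYMP"), (509, 515, "SYMP")])
annotator2 = annotate([(509, 515, "SYMP"), (404, 416, "SYMP")])

entity = (404, 416, "SYMP")
print(annotator1[entity], annotator2[entity])  # 1 2
```

A single shared counter would fare no better: two annotators creating the same entity would still receive two distinct numbers.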

When we compare annotations across users (e.g. in curation or agreement calculation), we do so qualitatively, i.e. by looking at the layer type, feature values, etc. to decide if two annotations are comparable or equivalent.
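If you want to compare two XMI exports yourself, a sketch along these lines keys each named entity on its content (offsets and label) and ignores `xmi:id` entirely. It uses the trimmed lines from the question and only Python's standard `xml.etree.ElementTree`. The `type3` namespace URI in `make_xmi` is an assumption, and treating only exact span-and-label matches as equivalent is a simplification, not how INCEpTION's curation or agreement code actually decides equivalence.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"


def entity_keys(xmi_text):
    """Collect NamedEntity annotations as (begin, end, value) keys.

    The xmi:id is deliberately ignored: it is only meaningful inside one file.
    """
    root = ET.fromstring(xmi_text)
    keys = set()
    for elem in root.iter():
        # Match on the local tag name so the namespace URI does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "NamedEntity":
            keys.add((int(elem.get("begin")), int(elem.get("end")), elem.get("value")))
    return keys


def make_xmi(*entities):
    """Wrap NamedEntity lines in a minimal XMI document (type3 URI assumed)."""
    body = "\n".join(entities)
    return (
        f'<xmi:XMI xmlns:xmi="{XMI_NS}" '
        'xmlns:type3="http:///de/tudarmstadt/ukp/dkpro/core/api/ner/type.ecore">'
        f"{body}</xmi:XMI>"
    )


ann1 = make_xmi(
    '<type3:NamedEntity xmi:id="4375" sofa="1" begin="404" end="416" value="SYMP"/>',
    '<type3:NamedEntity xmi:id="4531" sofa="1" begin="418" end="422" value="NEG-NT"/>',
)
ann2 = make_xmi(
    '<type3:NamedEntity xmi:id="4375" sofa="1" begin="404" end="416" value="SYMP"/>',
    '<type3:NamedEntity xmi:id="4531" sofa="1" begin="509" end="515" value="SYMP"/>',
)

keys1, keys2 = entity_keys(ann1), entity_keys(ann2)
print(sorted(keys1 & keys2))  # [(404, 416, 'SYMP')]
print(sorted(keys1 - keys2))  # [(418, 422, 'NEG-NT')]
```

Both files use ID 4531, yet the content-based comparison correctly reports those two annotations as different entities.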

Cheers,

-- Richard