multi-token named entity in free word order languages

Agustin Dei

Oct 29, 2025, 11:59:49 AM
to inception-users


Hi, are there any recommendations for annotating named entities in free word order languages? For instance, here the multi-token named entity is interrupted by another token:

"Porcius tamen Latro..."

How can I tag it so that I get the BIO tags B-Per, O, I-Per?

Richard Eckart de Castilho

Oct 30, 2025, 1:49:27 AM
to inception-users
Hi,
So the thing with BIO encoding is that it encodes a contiguous sequence of tokens with a single label.
The idea is that a sequence classifier can learn at which token a label begins (B),
at which tokens it continues (I), and which tokens do not have a label at all (O).

So this is valid BIO encoding:

```
John[PER-B] Smith[PER-I] runs[O].
```

But this is not valid BIO encoding:

```
John[PER-B] bloody[O] Smith[PER-I]
```

An I always has to directly follow a B (or another I) of the same label. It cannot appear on its own.

Thus, BIO encoding is not suitable for the annotation of discontinuous segments.
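
To make the rule concrete, here is a minimal sketch of a validity check in Python. It is not part of INCEpTION; the function names `parse_tag` and `is_valid_bio` are made up for illustration, and it assumes tags are written in the `PER-B` / `PER-I` / `O` style used in the examples above.

```python
from typing import List, Tuple


def parse_tag(tag: str) -> Tuple[str, str]:
    """Split a tag such as 'PER-B' into (label, position); 'O' has no label."""
    if tag == "O":
        return ("", "O")
    label, _, position = tag.rpartition("-")
    return (label, position)


def is_valid_bio(tags: List[str]) -> bool:
    """Return True if every I directly follows a B or I with the same label."""
    prev_label, prev_pos = "", "O"
    for tag in tags:
        label, pos = parse_tag(tag)
        if pos not in ("B", "I", "O"):
            return False
        if pos == "I" and (prev_pos not in ("B", "I") or prev_label != label):
            return False
        prev_label, prev_pos = label, pos
    return True


print(is_valid_bio(["PER-B", "PER-I", "O"]))  # True
print(is_valid_bio(["PER-B", "O", "PER-I"]))  # False
```

The second call fails for the same reason as the "bloody" example: the I comes right after an O, so there is nothing for it to continue.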

INCEpTION currently also does not have direct support for discontinuous segments.
The best way you can currently model discontinuous segments is by:

Defining two layers, e.g.

* Entity (span)
* Entity fragment (span)

Then you would add a feature of type "Link (Entity fragment)" to the Entity layer, named e.g. "fragments".

In that way you could annotate like this:

```
John[Entity] bloody Smith[Entity Fragment] -- (link "Smith" to "John" via "fragments" feature)
```
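
If it helps to picture the resulting structure, here is a rough sketch of that model as plain Python data classes, applied to the "Porcius tamen Latro" example from the question. This is not INCEpTION's API or export format; the names `Span`, `Entity`, `head` and `covered_tokens` are hypothetical and only illustrate how an entity can point to its separated fragments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    begin: int  # index of the first token in the span
    end: int    # index one past the last token in the span


@dataclass
class Entity:
    label: str
    head: Span  # the span annotated on the Entity layer
    # Spans from the Entity fragment layer, linked via the "fragments" feature
    fragments: List[Span] = field(default_factory=list)


def covered_tokens(entity: Entity, tokens: List[str]) -> List[str]:
    """Collect the tokens of the entity and its fragments in text order."""
    spans = sorted([entity.head] + entity.fragments, key=lambda s: s.begin)
    return [tok for s in spans for tok in tokens[s.begin:s.end]]


tokens = ["Porcius", "tamen", "Latro"]
entity = Entity(label="PER", head=Span(0, 1), fragments=[Span(2, 3)])
print(covered_tokens(entity, tokens))  # ['Porcius', 'Latro']
```

The intervening token "tamen" stays outside both spans, which is exactly what a single BIO sequence cannot express.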

You could even configure recommenders for the Entity and Entity fragment layers.
There is currently no built-in recommender that supports link features like "fragments" though.

I hope that helps.

Best regards,

-- Richard
