Inserting (empty or filled) tokens during annotation


Roland Meyer

Sep 20, 2021, 6:19:04 AM
to inception-users
Dear all, 

is it possible to insert tokens into corpus texts during annotation? 

Background: We are looking for a way to handle ellipsis annotation via inserting empty categories (as suggested in the enhanced UD analysis – https://universaldependencies.org/u/overview/enhanced-syntax.html#ellipsis) which mostly behave like usual tokens, e.g. can be linked in coreference chains and the like. Inserting an empty span does not look quite right – it seems to be attached to a token rather than behave as a separate one, or am I mistaken there?

Thanks for suggestions, best,

Roland 



Richard Eckart de Castilho

Sep 22, 2021, 2:46:38 AM
to inception-users
Hi Roland,

> On 20. Sep 2021, at 12:19, 'Roland Meyer' via inception-users <incepti...@googlegroups.com> wrote:
>

> is it possible to insert tokens into corpus texts during annotation?
>
> Background: We are looking for a way to handle ellipsis annotation via inserting empty categories (as suggested in the enhanced UD analysis – https://universaldependencies.org/u/overview/enhanced-syntax.html#ellipsis) which mostly behave like usual tokens, e.g. can be linked in coreference chains and the like. Inserting an empty span does not look quite right – it seems to be attached to a token rather than behave as a separate one, or am I mistaken there?

We have started looking into making tokens and sentences editable objects in INCEpTION, but the feature is far from usable yet.

At the moment, working with zero-width spans is the only option for ellipsis annotation. Zero-width spans are not attached to tokens. The UI only lets you create them at the beginning or end of a token, or within a token; you cannot create an annotation on the whitespace between tokens. That restriction does not mean the span is attached to the token, though.

That said, a zero-width span does not behave like a token either. Currently, a token is an invisible annotation type, and annotations such as POS, Lemma and a few others hook up to the token internally. The UD importer/exporter knows how to work with these internal Token, POS and Lemma types, but if I remember correctly, it currently does not support ellipsis. You can build custom annotation layers that allow zero-width annotations and define relations over them, but you can then only export the data as UIMA CAS XMI or WebAnno TSV. A conversion to/from CoNLL-U would need to happen externally, e.g. using a Python script with the help of the DKPro Cassis library for XMI files.
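
As a rough illustration of what such an external conversion script would have to do, here is a minimal sketch in plain Python. It is not part of INCEpTION or Cassis and does not read XMI at all: it assumes the token offsets and the offsets of the zero-width spans have already been extracted from the export. The function name `interleave_empty_nodes` is hypothetical, and the placement rule (an empty node goes after the last token that begins before the span's offset) is an assumption, not something INCEpTION prescribes. The ID scheme follows the CoNLL-U convention for empty nodes (`5.1` after word 5, `0.1` before the first word).

```python
from typing import Dict, List, Tuple

Token = Tuple[int, int, str]  # (begin, end, form) as character offsets


def interleave_empty_nodes(
    tokens: List[Token], empty_offsets: List[int]
) -> List[Tuple[str, str]]:
    """Return (ID, FORM) pairs for one sentence, with CoNLL-U style empty nodes.

    Assumption: a zero-width span at offset p is placed after the last
    token whose begin is strictly smaller than p.
    """
    ordered = sorted(tokens)

    # Map "index of preceding word" -> number of empty nodes following it.
    after: Dict[int, int] = {}
    for offset in sorted(empty_offsets):
        i = sum(1 for begin, _end, _form in ordered if begin < offset)
        after[i] = after.get(i, 0) + 1

    # Empty nodes before the first word get IDs 0.1, 0.2, ...
    rows = [(f"0.{j}", "_") for j in range(1, after.get(0, 0) + 1)]
    for i, (_begin, _end, form) in enumerate(ordered, start=1):
        rows.append((str(i), form))
        rows.extend((f"{i}.{j}", "_") for j in range(1, after.get(i, 0) + 1))
    return rows


# "Sue likes coffee and Paul tea", with a zero-width span at the end of "Paul"
tokens = [
    (0, 3, "Sue"), (4, 9, "likes"), (10, 16, "coffee"),
    (17, 20, "and"), (21, 25, "Paul"), (26, 29, "tea"),
]
for node_id, form in interleave_empty_nodes(tokens, [25]):
    print(node_id, form, sep="\t")
```

In a full conversion, the remaining CoNLL-U columns would still have to be filled in, and relations defined over the zero-width spans would presumably end up in the DEPS column, since empty nodes only take part in the enhanced graph.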

To create a zero-width span on a layer that supports it, press "shift" and click at the position where you want to insert the span.

Here are some issues related to making tokens/sentences editable objects:

https://github.com/inception-project/inception/projects/53

Cheers,

-- Richard