Import annotations and treat these as coming from an Inception user

Matthew Turner

Jun 3, 2022, 2:42:44 PM
to inception-users
We have what might be a strange use case. Or perhaps I just don't know the natural way to think about our case in the context of Inception. Either way there was not an obvious solution that I saw in the manual.

We have developed a machine learning system that applies annotations to documents. The training materials for this ML system are WebAnno TSV 3.3 annotated documents that have been annotated and curated in Inception as part of a project. We chose WebAnno as a format because we are using a custom layer to hold our annotations.

Because we are working with these WebAnno files as inputs, we have developed code that allows us to write the ML system's annotations in (different) WebAnno files. To be clear, these are distinct from the training files.

What we want to do is load those ML system annotations into Inception, in a new project, and treat these as the annotations from a specific user (let's call this user "robot"). Then we anticipate either (i) having some human users mark up the same documents so that we can compare the concordance measures among the annotators, or (ii) doing some sort of modification/curation where a human "corrects" the ML system's annotations using Inception, and this result is then exported and compared with the ML system's annotations directly. In both cases we want to estimate performance metrics for the ML system, but we might be doing this with built-in Inception metrics (case i) or externally (case ii).

Is there a way to do this? It seems (somewhat) related to the advice in this thread but using different file formats.

Perhaps more importantly, is there a more natural way, in the context of Inception, to carry out this sort of activity? Or is this not a natural fit for this software?

Thanks!
Matt

Richard Eckart de Castilho

Jun 5, 2022, 8:25:31 AM
to inception-users
Hi Matt,

I'll reply below. I may rephrase what you wrote to reflect how I understand your scenario. If the rephrasing sounds off, please correct me.

> On 3. Jun 2022, at 20:42, Matthew Turner <matthew.t...@gmail.com> wrote:
>
> We have developed a machine learning system that applies annotations to documents. The training materials for this ML system are WebAnno TSV 3.3 annotated documents that have been annotated and curated in Inception as part of a project. We chose WebAnno as a format because we are using a custom layer to hold our annotations.

You are using INCEpTION to manually annotate data that is then used to train an ML system. As an export format, you found WebAnno TSV 3.3 convenient. Sounds good so far. I would usually recommend using CAS XMI in conjunction with the DKPro Cassis library (Python) or the UIMA library (Java), but if WebAnno TSV 3.3 is sufficient for your use case, it is perfectly fine.
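As a point of reference for the CAS XMI option: the document text lives in a Sofa element of the XMI file. The sketch below pulls it out using only the Python standard library; in practice DKPro Cassis handles the full type system for you, and the tiny XMI fragment here is a simplified assumption, not a complete CAS export.

```python
import xml.etree.ElementTree as ET

def read_sofa_text(xmi_string: str) -> str:
    """Return the document text (sofaString) from a CAS XMI document.

    Assumes the standard UIMA 'cas' namespace; real exports may contain
    several views/sofas - this sketch just takes the first one found.
    """
    root = ET.fromstring(xmi_string)
    for elem in root.iter():
        # The Sofa element's qualified tag ends in '}Sofa'
        if elem.tag.endswith("}Sofa") and "sofaString" in elem.attrib:
            return elem.attrib["sofaString"]
    raise ValueError("no Sofa element with a sofaString found")

# Tiny illustrative XMI fragment (not a complete CAS export)
example = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore">'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" '
    'mimeType="text" sofaString="Hello INCEpTION."/>'
    '</xmi:XMI>'
)
```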

> Because we are working with these WebAnno files as inputs, we have developed code that allows us to write the ML system's annotations in (different) WebAnno files. To be clear, these are distinct from the training files.

You use your ML system to annotate new texts and then write out the annotations in the WebAnno TSV 3.3 format.

> What we want to do is load those ML system annotations into Inception, in a new project, and treat these as the annotations from a specific user (let's call this user "robot"). Then we anticipate either (i) having some human users mark up the same documents so that we can compare the concordance measures among the annotators, or

For case (i), I understand that your human users should not be able to see the automatically created annotations. However, you would later like to use INCEpTION facilities such as curation/agreement to compare the human-created annotations to the robot annotations.

For this use-case, you would need to import the unannotated texts into INCEpTION. In particular, you should
- strip all annotations from the ML-annotated WebAnno TSV 3.3 files that you have generated
- import these files into the project as documents
- create a "robot" user in the system
- add the user to your project
- enable the remote API
- upload the original non-stripped ML-annotated WebAnno TSV 3.3 files as annotation documents of the "robot" user through the remote API
- it is important that the texts/character offsets in the stripped WebAnno TSV 3.3 and in the non-stripped ML-annotated WebAnno TSV 3.3 are the same!
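As a rough illustration of the stripping step, the sketch below removes all layer columns from a WebAnno TSV 3.3 file by dropping the `#T_SP`/`#T_CH`/`#T_RL` layer declarations and truncating each token row to its first three columns (sentence-token id, character offsets, token text). This is a simplification of the format, not an exhaustive parser; check the result against a file exported from INCEpTION without annotations.

```python
def strip_webanno_tsv(tsv_text: str) -> str:
    """Remove all span/chain/relation layers from a WebAnno TSV 3.3 file,
    keeping only the format header, #Text lines and bare token rows."""
    out = []
    for line in tsv_text.splitlines():
        if line.startswith(("#T_SP=", "#T_CH=", "#T_RL=")):
            continue  # drop layer declarations
        if line.startswith("#") or line == "":
            out.append(line)  # keep #FORMAT, #Text=..., blank separators
            continue
        # Token rows: keep sentence-token id, offsets and token text only
        out.append("\t".join(line.split("\t")[:3]))
    return "\n".join(out) + "\n"

# Hypothetical custom layer 'webanno.custom.MyLayer' for illustration
annotated = (
    "#FORMAT=WebAnno TSV 3.3\n"
    "#T_SP=webanno.custom.MyLayer|value\n"
    "\n"
    "#Text=Hello world\n"
    "1-1\t0-5\tHello\tGREETING\n"
    "1-2\t6-11\tworld\t_\n"
)
```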

Now you can annotate the texts as usual through the web UI, and curators can compare the human-made annotations to the "robot" user's annotations.
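For the upload step, a minimal sketch of addressing the remote API, assuming the AERO endpoint layout `/api/aero/v1/projects/{projectId}/documents/{documentId}/annotations/{userId}` and a `format` request parameter; verify the exact paths and the format identifier against the Remote API section of the admin guide for your INCEpTION version.

```python
from urllib.parse import quote, urlencode

def annotation_upload_url(base_url: str, project_id: int,
                          document_id: int, user_id: str,
                          doc_format: str = "tsv") -> str:
    """Build the URL for uploading an annotation document for one user.

    The path layout and the 'format' parameter name (and the 'tsv'
    format identifier) are assumptions based on INCEpTION's AERO
    remote API; check them against your server's API documentation.
    """
    path = (f"/api/aero/v1/projects/{project_id}"
            f"/documents/{document_id}"
            f"/annotations/{quote(user_id)}")
    return base_url.rstrip("/") + path + "?" + urlencode({"format": doc_format})

# The upload itself would then be a multipart POST of the TSV file,
# e.g. with the third-party 'requests' library:
#   requests.post(url, auth=("admin", "password"),
#                 files={"content": open("doc.tsv", "rb")})
```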

> (ii) doing some sort of modification/curation where a human "corrects" the ML system's annotations using Inception, and this result is then exported and compared with the ML system's annotations directly.

For case (ii), I understand that the humans should see the automatically generated annotations and modify them if they find an error. You accept the risk that the humans might be biased by the annotations they see. You also accept that you have no way of knowing whether a human user has actually seen/acknowledged an ML-generated annotation, unless you somehow require the humans to "sign the annotations off", e.g. by introducing a feature that the humans need to set on every annotation they have accepted without modification.

For this use-case, you would import the ML-annotated files directly as documents into INCEpTION. The annotators will see the ML-generated annotations and will be able to modify or delete them, or to add new annotations in case they think the ML system has missed something.

You can compare/curate the human-corrected annotations using curation/agreement as usual.

There is a second alternative for case (ii) which is based on the project setup you would do for use case (i). You could:
- enable the *experimental* curation sidebar on the annotation page
- have all annotators in the project have the "curator" role
- tell the annotators to configure the curation sidebar such that the curation data goes to their own user and not to the "curation" user
- tell the annotators to only enable the "robot" user
- WARNING: annotators should review these settings every time they annotate a document to make sure they do not accidentally write to the "curation" document!
- WARNING: there is no guarantee that an annotator will not peek into the annotations of a different annotator while the curation sidebar is active!

The annotators will then be able to see the annotations of the "robot" user on the annotation page as "suggestions". They can accept these suggestions and thereby transfer them as annotations to their own user. They can then also modify these annotations. They can also create new annotations if the robot missed something. However, they cannot mark a robot-generated annotation as wrong or invalid unless you add a feature for such a mark to your layer configuration.

Eventually, you can compare the human-reviewed annotations to each other or to the robot's annotations using the usual means.
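If the comparison ends up happening outside of INCEpTION (as in case ii), an exact-match span comparison is often a reasonable first metric. A minimal sketch, treating each annotation as a (begin, end, label) triple; the function and variable names here are illustrative, not part of any INCEpTION API.

```python
def span_scores(gold, predicted):
    """Exact-match precision/recall/F1 over (begin, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # spans that match exactly
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative spans: one agreement, one disagreement each way
human = [(0, 5, "GREETING"), (6, 11, "PLACE")]
robot = [(0, 5, "GREETING"), (12, 15, "PLACE")]
```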

It is probably best to try both approaches and check which works best for you.

Did you find the explanations helpful? Do you think they should go to the users' manual and if so to which section?

Best,

-- Richard

Matthew Turner

Jun 6, 2022, 4:48:20 PM
to inception-users
Hello Richard --

It will likely take us a little while to unpack all of this, but your description shows that you definitely understood what we are looking to do here. We will do some experimenting and get back to you. I am not sure this case is common enough that it should be added to the manual, but an appendix or similar with more details/examples of the API usage might be useful; we only discovered by accident that many functions require the API.

Thanks!
Matt
