difficulty exporting knowledge base labels


sonja aits

Sep 24, 2021, 9:32:25 AM
to inception-users
Hi,
I am testing INCEpTION for named entity annotation using the default Named Entity layer. I have manually added a new tagset and a small knowledge base in OWL format and labelled the entities in my document.
However, when I export the project in CAS XMI (XML 1.1) format, the "identifier" attribute does not show the terms from the knowledge base, whereas the term under "value" is correct. Instead, it looks like this:
    <type3:NamedEntity xmi:id="696" sofa="1" begin="24" end="35" value="SYMP" identifier="http://www.ukp.informatik.tu-darmstadt.de/inception/1.0#node1fgbrpnu2x3"/>

Does anyone have a suggestion as to how I can correct that?
Cheers,
Sonja

Richard Eckart de Castilho

Sep 24, 2021, 9:39:35 AM
to inception-users
Hi,

> On 24. Sep 2021, at 15:32, sonja aits <sonja...@gmail.com> wrote:
>
> <type3:NamedEntity xmi:id="696" sofa="1" begin="24" end="35" value="SYMP" identifier="http://www.ukp.informatik.tu-darmstadt.de/inception/1.0#node1fgbrpnu2x3"/>

The term in the identifier is the URI of the concept in your knowledge base. INCEpTION stores concept features as URIs. This means, for example, that the label property in the knowledge base can simply be changed without having to update all the annotated documents.

However, it also means that if you want to resolve the identifier to the label from the knowledge base, you'd currently have to export both the annotated text and the KB and then process the exported annotated texts and replace the identifier with the label stored under that identifier in the knowledge base.
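
As a rough sketch of that post-processing step (this is not something INCEpTION provides), the lookup could be done with Python's standard library. It assumes a `uri_to_label` dictionary has already been built from the exported KB, for instance with an RDF library; the function name, the sample namespace URIs and the label "Symptom" are placeholders made up for illustration. Python's bundled XML parser targets XML 1.0, so the sketch also assumes the export can be read as XML 1.0.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for an exported document. The namespace URIs are
# placeholders, not copied from a real export.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:type3="http://example.org/ner">
  <type3:NamedEntity xmi:id="696" sofa="1" begin="24" end="35" value="SYMP"
      identifier="http://www.ukp.informatik.tu-darmstadt.de/inception/1.0#node1fgbrpnu2x3"/>
</xmi:XMI>
"""

# Assumed to come from the KB export; "Symptom" is a hypothetical label.
URI_TO_LABEL = {
    "http://www.ukp.informatik.tu-darmstadt.de/inception/1.0#node1fgbrpnu2x3": "Symptom",
}


def resolve_identifiers(xmi_text, uri_to_label):
    """Return (begin, end, value, label) for every annotation with an identifier.

    If a URI is missing from the mapping, the URI itself is kept as the label.
    """
    root = ET.fromstring(xmi_text)
    rows = []
    for elem in root.iter():
        uri = elem.get("identifier")
        begin, end = elem.get("begin"), elem.get("end")
        if uri is None or begin is None or end is None:
            continue  # not an annotation linked to the knowledge base
        label = uri_to_label.get(uri, uri)
        rows.append((int(begin), int(end), elem.get("value"), label))
    return rows


if __name__ == "__main__":
    for row in resolve_identifiers(SAMPLE_XMI, URI_TO_LABEL):
        print(row)
```

The sketch returns the resolved rows rather than rewriting the XMI file, which leaves the original export untouched.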

The alternative to using a concept feature that connects to a knowledge base (with these URIs) is to use a string feature with a tagset (like the value feature of the pre-defined Named Entity layer does).

Does that help?

-- Richard

Richard Eckart de Castilho

Sep 24, 2021, 9:40:52 AM
to inception-users
On 24. Sep 2021, at 15:39, Richard Eckart de Castilho <richard...@gmail.com> wrote:
>
> The term in the identifier is the URI of the concept in your knowledge base. INCEpTION stores concept features using the URIs. That for example allows that the label property in the knowledge base can simply be changed without having to update all the annotated documents.

Mind that in a knowledge base, the label of a concept is not necessarily unique. Consider how many "Thomas Miller"s are out there e.g. in Wikidata. The unique ID is this URI that you have observed.
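
Before replacing URIs with labels, it may be worth checking whether any label is shared by more than one concept. A minimal sketch, assuming the same kind of URI-to-label dictionary built from the KB export (the function name is made up for illustration):

```python
from collections import defaultdict


def ambiguous_labels(uri_to_label):
    """Return {label: [uris]} for every label used by more than one URI."""
    by_label = defaultdict(list)
    for uri, label in uri_to_label.items():
        by_label[label].append(uri)
    return {
        label: sorted(uris)
        for label, uris in by_label.items()
        if len(uris) > 1
    }
```

An empty result means every label maps back to exactly one URI, so the replacement loses no information.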

-- Richard

sonja aits

Sep 24, 2021, 9:53:26 AM
to inception-users
Great, that solves it, as we have a very small knowledge base for our project which has no duplicate terms. So I can write a script that finds the matching URI in the knowledge base TTL file, which is also included in the exported zip, and replaces the identifier with the label.
Thanks for the help!
Sonja