> On 24. Sep 2021, at 14:21, Michaela Polónyová <michaela....@flashnews.com
> we are working on a Czech/English annotation project currently using Brat and hoping to switch to a different tool. We've been experimenting with INCEpTION and are struggling a bit to get it into the state we need. Would it be possible to get in touch with you and get some hints?
> Features we are looking for:
> • annotation of named entities (person, organization, location, etc.) and entity linking to wikidata > how to set it up in combination with a recommender/active learning?
* To learn the class (PER, ORG, etc.), go to the project settings and from there to the recommender tab.
* Press "create"
* Choose "Named entity" as the layer
* Choose "value" as the feature
* Choose "Multi-token Sequence Classifier (OpenNLP NER)" as tool
If you want a recommender for linking as well, repeat the process but this time choosing "identifier" as the feature.
> • could we search for wikidata items via copying url for specific ids (e.g. https://www.wikidata.org/wiki/Q12028233 and preserving the description of the item)
Searching by URL works in principle, but the matching item gets "lost" in the list of other results. Also, you'd have to use `https://www.wikidata.org/entity/Q12028233` instead of `https://www.wikidata.org/wiki/Q12028233`. That does not work as smoothly as one would expect, so I have opened an issue:
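Until that is improved, the URL rewrite described above can be done mechanically before pasting into the search box. Here is a minimal sketch in Python; the helper name `to_entity_url` is made up for illustration and is not part of INCEpTION or any Wikidata library. It only assumes the two URL shapes mentioned above (`/wiki/Q…` and `/entity/Q…`).

```python
import re

# Accepts a Wikidata item URL in either the page form (/wiki/Q...) or the
# entity form (/entity/Q...) and captures the Q-identifier.
_WIKIDATA_ITEM_URL = re.compile(
    r"^https?://www\.wikidata\.org/(?:wiki|entity)/(Q[1-9]\d*)$"
)


def to_entity_url(url: str) -> str:
    """Rewrite a Wikidata item URL into the /entity/ form (hypothetical helper)."""
    match = _WIKIDATA_ITEM_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a Wikidata item URL: {url!r}")
    return f"https://www.wikidata.org/entity/{match.group(1)}"


if __name__ == "__main__":
    print(to_entity_url("https://www.wikidata.org/wiki/Q12028233"))
```

URLs that are already in the entity form pass through unchanged, and anything that does not look like an item URL raises a `ValueError` instead of being guessed at.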
> • can we play with wikidata search relevancy? e.g. when looking up Czech Republic in wikidata in named entity layer, we see the relevant item as 11th in the list
This is not configurable, but if you are open to editing the code:
> • is it possible to label the whole document with categories from a defined set (politics, sports, etc.)
INCEpTION can do it, yes, but you have to enable the feature first:
The reason this is behind a feature flag is that certain functions of INCEpTION do not yet support document-level annotations. For example, there are no recommenders for document-level annotations and no curation functionality. Also, export is limited to UIMA CAS XMI.
User-friendliness of document-level annotations will improve in the upcoming v21.0, but the limitations regarding curation, recommenders and export remain.
> • when annotating English documents, we want to annotate only the stem, not e.g. possessive 's, as in the underlined cases here: UFC's or McGregor-Poirier fight
You can configure your custom layers for "character" granularity. Then you can create sub-token annotations. However, I would probably recommend annotating the whole word, because once you enable sub-token annotations, it is easy to unintentionally miss or include characters. You can still create another annotation on a "Stem" layer on the same word and write the stem string into a feature on that layer.
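To make the difference between a whole-word span and a sub-token span concrete, here is a small Python sketch that computes the character offsets of a token without its possessive marker. It is plain string handling outside of INCEpTION, the function name `stem_span` is made up for illustration, and it only covers the possessive `'s` case, not stemming in general.

```python
from typing import Tuple

# Possessive markers to strip; both the ASCII and the typographic apostrophe.
_POSSESSIVE_SUFFIXES = ("'s", "\u2019s")


def stem_span(text: str, begin: int, end: int) -> Tuple[int, int]:
    """Return (begin, end) offsets of text[begin:end] without a trailing 's.

    Hypothetical helper: the token is left unchanged if it has no possessive
    marker or consists of nothing but the marker.
    """
    token = text[begin:end]
    for suffix in _POSSESSIVE_SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            return begin, end - len(suffix)
    return begin, end


if __name__ == "__main__":
    sentence = "UFC's McGregor-Poirier fight"
    begin, end = stem_span(sentence, 0, 5)  # token span covering "UFC's"
    print(sentence[begin:end])
```

With token-level granularity the annotation would cover offsets 0 to 5 (`UFC's`); with character granularity it can cover 0 to 3 (`UFC`). The alternative suggested above keeps the 0 to 5 span and stores the string `UFC` in a feature on the "Stem" layer.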