Correcting OCR errors and tokenization in INCEpTION

Andrew Janco

Oct 26, 2021, 3:25:52 AM
to inception-users
Is there any way to correct OCR errors in the text during annotation? Similarly, is it possible to correct tokenization errors? This capability would save us a lot of time. Thank you for your help and talk soon!

Best,
 Andy  

Richard Eckart de Castilho

Oct 26, 2021, 5:55:53 AM
to inception-users
Hi Andy,

> On 26. Oct 2021, at 09:25, Andrew Janco <aja...@haverford.edu> wrote:
>
> Is there any way to correct OCR errors in the text during annotation?

Imagine one annotator changed the text: what would that mean for another annotator
independently working on the same text? Since they work independently, the other annotator
would not, and should not, see the changes. Worse, the other annotator might make a different
change. Eventually, both annotators would end up annotating different texts. How, then, could
the annotations of the two be compared, e.g. to calculate inter-rater agreement?

These questions are difficult to handle in a generic way.
Thus, for the time being, INCEpTION assumes that the text is fixed and unchangeable.

Currently, in an INCEpTION-based workflow, you have two options for dealing with the situation:

1) edit the text before importing it into INCEpTION; all annotators then have the same text

2) instead of actually modifying the text, annotators can make annotations that indicate
how they would modify the text. These change suggestions can be compared and curated.
The agreed-upon changes can then be applied to the texts, after which a second round
of annotation starts on the corrected texts; again, all annotators then have the same text
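For option 2, the step of applying the agreed-upon changes happens outside INCEpTION. Below is a minimal Python sketch, assuming the curated change suggestions have already been exported as character offsets plus replacement text (that export format is an assumption, not an INCEpTION API). The `Correction` type and `apply_corrections` function are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """Hypothetical agreed-upon change: replace text[begin:end] with `replacement`."""
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive
    replacement: str


def apply_corrections(text: str, corrections: list[Correction]) -> str:
    """Apply non-overlapping corrections to `text` and return the new text.

    Assumes all offsets refer to the ORIGINAL text (as they would if they
    came from span annotations made on the unchanged document).
    """
    # Process from the end of the text backwards so that replacing a later
    # span never shifts the offsets of spans that come before it.
    ordered = sorted(corrections, key=lambda c: c.begin, reverse=True)

    # Validate all offsets against the original text before changing anything.
    n = len(text)
    for c in ordered:
        if not (0 <= c.begin <= c.end <= n):
            raise ValueError(f"correction out of bounds: {c}")

    # Reject overlapping spans; their combined effect would be ambiguous.
    for later, earlier in zip(ordered, ordered[1:]):
        if earlier.end > later.begin:
            raise ValueError(f"overlapping corrections: {earlier} and {later}")

    for c in ordered:
        text = text[:c.begin] + c.replacement + text[c.end:]
    return text


if __name__ == "__main__":
    ocr_text = "Tbe qnick brown fox"
    fixes = [Correction(0, 3, "The"), Correction(4, 9, "quick")]
    print(apply_corrections(ocr_text, fixes))  # The quick brown fox
```

Applying from the end backwards is what keeps the offsets valid: each replacement only touches text at or after its own start, so all not-yet-applied corrections (which lie earlier in the text) still point at the right characters. The corrected texts would then be re-imported for the second annotation round.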

> Similarly, is it possible to correct tokenization errors? This capability would save us a lot of time. Thank you for your help and talk soon!

We have started looking into making tokens and sentences editable objects in INCEpTION, but the feature is far from usable yet.

Here are some issues related to making tokens/sentences editable objects:

https://github.com/inception-project/inception/projects/53

Cheers,

-- Richard

Andrew Janco

Oct 26, 2021, 10:07:48 AM
to inception-users
Thank you, Richard, this information is very helpful. We'll keep an eye out for the new features and provide a workaround in the meantime. Otherwise, we're all very impressed by INCEpTION and it's working beautifully. Best, Andy