
Questions on agreement score


Calliope Bandera

Apr 3, 2024, 2:11:11 PM
to inception-users
Hello,

I had a few questions regarding the agreement score for span annotations.
Is there any way to know in more detail how Krippendorff's alpha (unitizing) is actually computed?
Is there a way to obtain a unique agreement score for when there are more than 2 people (would doing the average of every pairwise score work)?
Is it possible in any way to relax the agreement calculation to count the overlap instead of exact match as an agreement?

Thank you,
Calliope Bandera

Richard Eckart de Castilho

Apr 3, 2024, 3:26:19 PM
to inception-users
Hi Calliope

> On 3. Apr 2024, at 20:11, Calliope Bandera <calliop...@gmail.com> wrote:
>
> Is there any way to know in more detail how Krippendorff's alpha (unitizing) is actually computed?

In INCEpTION 32.0 you will be able to export a tabular comparison of either a particular pair of annotators or even all annotators. While that does not explain specifically how the agreement is calculated, it gives you insight into the data on which the agreement is calculated.

In INCEpTION 31.x, you can also export a pairwise comparison by clicking on a score in the agreement table, but only if you select a non-unitizing measure.

The Krippendorff's alpha (unitizing) measure basically takes the feature value and the position of the associated annotation and pours that into the algorithm. Unlike the non-unitizing measures, it does not filter out e.g. stacked annotations, because the measure is supposed to be able to deal with them.

https://github.com/inception-project/inception/blob/e7c09321c3677024e9804dd9b0d4e45b97ee4bb2/inception/inception-agreement/src/main/java/de/tudarmstadt/ukp/clarin/webanno/agreement/measures/krippendorffalphaunitizing/KrippendorffAlphaUnitizingAgreementMeasure.java#L48-L99

The data then goes into the DKPro Statistics library:

https://github.com/dkpro/dkpro-statistics/blob/dkpro-statistics-2.2.1/dkpro-statistics-agreement/src/main/java/org/dkpro/statistics/agreement/unitizing/KrippendorffAlphaUnitizingAgreement.java
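
To make the "feature value plus position" input a bit more concrete, here is a minimal, standalone sketch of feeding span data directly into that DKPro Statistics class (assuming the dkpro-statistics 2.2.x API; the offsets, rater indices and category labels below are invented for illustration and this is not INCEpTION's actual wiring):

import org.dkpro.statistics.agreement.unitizing.KrippendorffAlphaUnitizingAgreement;
import org.dkpro.statistics.agreement.unitizing.UnitizingAnnotationStudy;

public class UnitizingAlphaSketch
{
    public static void main(String[] args)
    {
        // Two raters annotating spans on a continuum of 100 characters.
        UnitizingAnnotationStudy study = new UnitizingAnnotationStudy(2, 100);

        // addUnit(offset, length, raterIndex, category):
        // the position of the annotation plus its feature value.
        study.addUnit(0, 10, 0, "PER");  // rater 0: [0,10) labeled PER
        study.addUnit(0, 10, 1, "PER");  // rater 1: same span, same label
        study.addUnit(20, 5, 0, "LOC");  // rater 0 only
        study.addUnit(40, 8, 1, "ORG");  // rater 1 only

        double alphaU = new KrippendorffAlphaUnitizingAgreement(study)
                .calculateAgreement();
        System.out.println("alpha_u = " + alphaU);
    }
}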

From INCEpTION 31 upwards, the agreements are calculated per document and then averaged across the documents. In earlier versions, the Krippendorff agreements were calculated a bit differently: all annotations were mapped into one large virtual document and the agreement was calculated on that. However, that proved to be too inefficient for large numbers of documents, so the per-document-agreement-then-average approach is used now.
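
A rough sketch of that per-document-then-average idea (not INCEpTION's actual code; the list of per-document studies and the skipping of undefined scores are assumptions for illustration):

import java.util.List;
import org.dkpro.statistics.agreement.unitizing.KrippendorffAlphaUnitizingAgreement;
import org.dkpro.statistics.agreement.unitizing.UnitizingAnnotationStudy;

class PerDocumentAverageSketch
{
    // One study per document, each built as in the previous sketch.
    static double averageAlphaAcrossDocuments(List<UnitizingAnnotationStudy> studiesPerDocument)
    {
        double sum = 0;
        int count = 0;
        for (UnitizingAnnotationStudy docStudy : studiesPerDocument) {
            double alpha = new KrippendorffAlphaUnitizingAgreement(docStudy)
                    .calculateAgreement();
            // Skipping documents where alpha is undefined (NaN), e.g. documents
            // without any units, is an assumption made here for illustration.
            if (!Double.isNaN(alpha)) {
                sum += alpha;
                count++;
            }
        }
        return count > 0 ? sum / count : Double.NaN;
    }
}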

Is there something special you would like to know about the computation, potentially some extra information that INCEpTION might be able to show in the UI?

> Is there a way to obtain a unique agreement score for when there are more than 2 people (would doing the average of every pairwise score work)?

All of the agreement measures in DKPro Statistics support calculating agreement between two annotators. There are some that also support more than two, but I don't remember off the top of my head which ones they are. Using a measure that actually supports multiple annotators is likely a better choice than averaging over pairwise agreements.
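
For what it's worth, the coding (non-unitizing) variant of Krippendorff's alpha in DKPro Statistics does accept more than two raters. A small standalone sketch with three raters and made-up labels (again assuming the dkpro-statistics 2.2.x API, run outside of INCEpTION) could look like this:

import org.dkpro.statistics.agreement.coding.CodingAnnotationStudy;
import org.dkpro.statistics.agreement.coding.KrippendorffAlphaAgreement;
import org.dkpro.statistics.agreement.distance.NominalDistanceFunction;

public class MultiRaterAlphaSketch
{
    public static void main(String[] args)
    {
        // Three raters coding the same items; one argument per rater.
        CodingAnnotationStudy study = new CodingAnnotationStudy(3);
        study.addItem("PER", "PER", "PER");
        study.addItem("LOC", "LOC", "ORG");
        study.addItem("ORG", "ORG", "ORG");
        study.addItem("PER", "LOC", "PER");

        KrippendorffAlphaAgreement alpha =
                new KrippendorffAlphaAgreement(study, new NominalDistanceFunction());
        System.out.println("alpha = " + alpha.calculateAgreement());
    }
}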

At the moment, INCEpTION does not offer calculating agreement over more than two annotators, but you can give the respective feature request a thumbs-up and subscribe to it to be notified when this functionality is added:

https://github.com/inception-project/inception/issues/1734

That said, you could e.g. use the upcoming multi-annotator diff export from INCEpTION 32.0 and process that to calculate multi-annotator agreement externally.

> Is it possible in any way to relax the agreement calculation to count the overlap instead of exact match as an agreement?

The non-unitizing measures only consider exact matches because, to my knowledge, the algorithms would not be able to deal with partial overlaps. Are you familiar with agreement measures, and do you have a suggestion on how to extend them towards partial overlap?

I believe it is exactly the idea of the unitizing Krippendorff's alpha measure to overcome this issue and also handle partial overlaps. That is why we can quite directly dump annotation data into that measure, while for the non-unitizing measures we first have to create a representation that says exactly which annotations to compare with which others - which then leads to the exact-match requirement.
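
As a small illustration of that point (same assumptions about the dkpro-statistics 2.2.x API as above; the numbers are invented), the unitizing study accepts partially overlapping units directly, and only the mismatched portions count against the agreement:

import org.dkpro.statistics.agreement.unitizing.KrippendorffAlphaUnitizingAgreement;
import org.dkpro.statistics.agreement.unitizing.UnitizingAnnotationStudy;

public class PartialOverlapSketch
{
    public static void main(String[] args)
    {
        // Two raters whose PER spans only partially overlap on a 50-character continuum.
        UnitizingAnnotationStudy study = new UnitizingAnnotationStudy(2, 50);
        study.addUnit(10, 10, 0, "PER"); // rater 0: [10, 20)
        study.addUnit(14, 10, 1, "PER"); // rater 1: [14, 24), shifted by 4 characters

        double alphaU = new KrippendorffAlphaUnitizingAgreement(study)
                .calculateAgreement();
        // The overlapping part contributes to agreement instead of the pair
        // being treated as a flat disagreement at two different positions.
        System.out.println("alpha_u = " + alphaU);
    }
}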

Cheers,

-- Richard


Torsten Zesch

Apr 4, 2024, 4:03:28 AM
to incepti...@googlegroups.com
In addition to what Richard already said, maybe Gamma would be a suitable metric for your use case?
https://aclanthology.org/J15-3003.pdf

You could access your project with inceptalytics (https://github.com/catalpa-cl/inceptalytics) and compute it there.

-Torsten

Richard Eckart de Castilho

Apr 4, 2024, 4:43:19 AM
to incepti...@googlegroups.com
It would be nice if we had Gamma in DKPro Statistics ;)

-- Richard

Richard Eckart de Castilho

Apr 11, 2024, 2:53:00 AM
to incepti...@googlegroups.com
Hi,

> On 3. Apr 2024, at 20:11, Calliope Bandera <calliop...@gmail.com> wrote:
>
> Is there a way to obtain a unique agreement score for when there are more than 2 people (would doing the average of every pairwise score work)?

INCEpTION 32.0 will add per-document agreement statistics across all annotators (for those measures that support it).

See: https://github.com/inception-project/inception/issues/4704

[Attachment: Screenshot 2024-04-11 at 08.48.00.png]