Issue 736 in nltk: agreement error

nl...@googlecode.com

Jul 21, 2012, 2:54:54 AM
to nltk-...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 736 by daniele....@gmail.com: agreement error
http://code.google.com/p/nltk/issues/detail?id=736

nltk.__version__ '2.0.2'


File: https://github.com/nltk/nltk/blob/master/nltk/metrics/agreement.py

Method:
    def agr(self, cA, cB, i, data=None):
        """Agreement between two coders on a given item

        """
        data = data or self.data
        kA = (x for x in data if x['coder']==cA and x['item']==i).next()
        kB = (x for x in data if x['coder']==cB and x['item']==i).next()
        ret = 1.0 - float(self.distance(kA['labels'], kB['labels']))
        log.debug("Observed agreement between %s and %s on %s: %f",
                  cA, cB, i, ret)
        log.debug("Distance between \"%r\" and \"%r\": %f",
                  kA['labels'], kB['labels'], 1.0 - ret)
        return ret


The function Ao(self, cA, cB) calls this method at line 161:
    ret = float(sum(self.agr(cA, cB, item, item_data)
                    for item, item_data in data)) / float(len(self.I))

If the two coders' lines for an item appear in swapped order in the data
file, the item whose lines are swapped is not processed.
Example:

cat tmp.txt
a 1 stat
b 1 stat
a 2 foo
b 2 stat
b 3 foo
a 3 stat


DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Observed agreement between a and b on ['1']: 1.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Distance between "frozenset(['stat'])" and "frozenset(['stat'])": 0.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Observed agreement between a and b on ['2']: 0.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Distance between "frozenset(['foo'])" and "frozenset(['stat'])": 1.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Observed agreement between a and b: 0.333333
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Expected agreement between a and b: 0.555556
-0.5

In this case there are only two iterations instead of three.
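
For comparison, here is a minimal sketch of an order-independent lookup for tmp.txt. It is not NLTK's code: `parse` and `observed_agreement` are hypothetical helpers, and the binary distance (0 for equal labels, 1 otherwise) is an assumption taken from the distances shown in the log above.

```python
def parse(text):
    """Map (coder, item) -> label from lines of the form 'coder item label'."""
    labels = {}
    for line in text.strip().splitlines():
        coder, item, label = line.split()
        labels[(coder, item)] = label
    return labels


def observed_agreement(labels, coder_a, coder_b):
    """Return (number of items compared, mean agreement over all items)."""
    items = sorted({item for _, item in labels})
    # Look each record up by key, so the order of lines in the file is irrelevant.
    scores = [
        1.0 if labels[(coder_a, i)] == labels[(coder_b, i)] else 0.0
        for i in items
    ]
    return len(scores), sum(scores) / len(items)


# Contents of tmp.txt, with the coders swapped on item 3.
TMP = """
a 1 stat
b 1 stat
a 2 foo
b 2 stat
b 3 foo
a 3 stat
"""

compared, ao = observed_agreement(parse(TMP), "a", "b")
print(compared, round(ao, 6))  # prints: 3 0.333333
```

All three items are compared here. The observed agreement equals the logged 0.333333 only because item 3 is a disagreement and adds 0 to the sum; if the skipped item were an agreement, the logged value would be too low.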


cat tmp2.txt
a 1 stat
b 1 stat
a 2 foo
b 2 stat
a 3 foo
b 3 stat

DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Observed agreement between a and b on ['1']: 1.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Distance between "frozenset(['stat'])" and "frozenset(['stat'])": 0.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Observed agreement between a and b on ['2']: 0.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Distance between "frozenset(['foo'])" and "frozenset(['stat'])": 1.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Observed agreement between a and b on ['3']: 0.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Distance between "frozenset(['foo'])" and "frozenset(['stat'])": 1.000000
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Observed agreement between a and b: 0.333333
DEBUG:/usr/local/lib/python2.7/dist-packages/nltk/metrics/agreement.py:Expected agreement between a and b: 0.333333
0.0

In this case there are three iterations.
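
A possible explanation, offered as an assumption rather than a confirmed diagnosis: if the item_data passed to agr is a single-pass iterator (for example an itertools.groupby group), the search for cA consumes the cB record whenever cB comes first, so the search for cB then finds nothing. The sketch below reproduces only that mechanism with hypothetical helpers (`first_match`, `paired_items`); it does not call NLTK.

```python
from itertools import groupby
from operator import itemgetter

# Rows of tmp.txt; item 3 lists coder b before coder a.
ROWS = [
    {"coder": "a", "item": "1", "label": "stat"},
    {"coder": "b", "item": "1", "label": "stat"},
    {"coder": "a", "item": "2", "label": "foo"},
    {"coder": "b", "item": "2", "label": "stat"},
    {"coder": "b", "item": "3", "label": "foo"},
    {"coder": "a", "item": "3", "label": "stat"},
]


def first_match(records, coder):
    """Return the first record by `coder`, or None.

    If `records` is an iterator, every record scanned here is consumed.
    """
    return next((r for r in records if r["coder"] == coder), None)


def paired_items(rows, materialize):
    """For each item, report whether both coders' records were found."""
    by_item = itemgetter("item")
    found = {}
    # sorted() is stable, so the within-item order of the file is preserved.
    for item, group in groupby(sorted(rows, key=by_item), key=by_item):
        # Assumption: without list(), `group` can be traversed only once.
        records = list(group) if materialize else group
        rec_a = first_match(records, "a")  # may skip past b's record
        rec_b = first_match(records, "b")  # continues from where a's search stopped
        found[item] = rec_a is not None and rec_b is not None
    return found


print(paired_items(ROWS, materialize=False))  # {'1': True, '2': True, '3': False}
print(paired_items(ROWS, materialize=True))   # {'1': True, '2': True, '3': True}
```

Under Python 2, a StopIteration raised by .next() inside the generator expression given to sum() would end the summation silently instead of propagating, which would be consistent with the missing log lines for item 3. If this is indeed the cause, materializing each group with list() before the two lookups would make the result independent of line order.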

