Any suggestions for converting .ann format to CoNLL IOB format?

Min Jun Park

Mar 6, 2015, 9:21:48 AM
to brat-...@googlegroups.com
Hello,

I am completely new to brat's standoff format, which is based on start/end character offsets.

I mostly use the NLTK library in Python, which provides readers for several formats of annotated text.

What I want to do is process the .ann format so that it can be handled in NLTK.

For example, from the brat tutorial:

1 ) Citibank was involved in moving about $100 million for Raul Salinas de Gortari, brother of a former Mexican president, to banks in Switzerland.

T1	Organization 418 426	Citibank
T2	Money 456 468	$100 million
T3	Transfer-money 443 449	moving
E1	Transfer-money:T3 Giver-Arg:T1 Money-Arg:T2 Beneficiary-Arg:T4 Recipient-Arg:T6
T4	Person 473 496	Raul Salinas de Gortari
T5	Person 511 535	former Mexican president
T6	Organization 540 545	banks
T7	GPE 549 560	Switzerland
R2	Origin Arg1:T6 Arg2:T7	
#1	AnnotatorNotes T2	100000000 USD
R1	Family Arg1:T4 Arg2:T5	
A1	Mention T4 Name
A2	Individual T4
A3	Mention T5 Nominal
A4	Individual T5
A5	Confidence E1 High
N1	Reference T5 Wikipedia:64488	Carlos Salinas de Gortari
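For entity tagging, only the text-bound lines (IDs starting with T) carry character offsets; the E, R, A, N and # lines describe events, relations, attributes, normalizations and notes. A minimal sketch of reading the T lines is below. The names `TextBound` and `parse_text_bounds` are made up for illustration and are not part of brat or NLTK, and discontinuous spans are collapsed to their outer bounds, which is a simplification.

```python
from typing import List, NamedTuple


class TextBound(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bounds(ann: str) -> List[TextBound]:
    """Collect text-bound (T) annotations; skip E, R, A, N and # lines."""
    spans = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue
        # Fields are tab-separated: ID, "Type start end", covered text.
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans look like "0 5;16 23".
        # Assumption of this sketch: keep only the outer bounds.
        fragments = [frag.split() for frag in offsets.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        spans.append(TextBound(ann_id, label, start, end, text))
    return sorted(spans, key=lambda s: s.start)
```

The offsets refer to the complete .txt document, which is why the first entity in this excerpt starts at 418 rather than 0.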

I want to get the parsed data in IOB format, like below.

Citibank NNP B-ORG
was VBD O
involved VBN O
in IN O
moving VBG O
about RB O
$ $ B-MONEY
100 CD I-MONEY
million CD I-MONEY
for IN O
Raul NNP B-PER
Salinas NNP I-PER
de FW I-PER
Gortari NNP I-PER
, , O
brother NN O
of IN O
a DT O
former JJ B-PER
Mexican JJ I-PER
president NN I-PER
, , O
to TO O
banks NNS B-ORG
in IN O
Switzerland NNP B-GPE
. . O

Of course, POS tagging has to be run separately on the .txt file. Consequently, the POS tagging process will change the start/end offsets of each entity in the .ann format.
Then how can I locate an entity, say 'Raul Salinas de Gortari', in the POS-tagged text? And supposing I can, how do I put the IOB tags on the relevant entities?

I believe converting to IOB format is needed for the NER task. If I have misunderstood something, please give me some advice. Thank you in advance.

Pidugu Sundeep

Mar 26, 2018, 5:39:59 AM
to brat-users
Any update on how to convert?

Pidugu Sundeep

Apr 15, 2018, 12:44:26 PM
to brat-users
I wrote my own Python script for the conversion; check it out at https://github.com/pidugusundeep/Brat-and-snorkel/blob/master/ann-coll.py (it doesn't include POS tagging, though). I hope you can tweak the code and try it out; otherwise, wait for an update to the current code.
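For anyone who just wants the general idea: the offsets stay usable as long as each token remembers where it came from in the original .txt file. The following is a rough sketch of that approach, not the code from the linked script. The function `to_iob`, the `(start, end, label)` tuple layout, and the simple regex tokenizer are assumptions made for illustration; a different tokenizer works too, provided it reports character spans.

```python
import re
from typing import List, Tuple

# (start, end, label), with offsets as given in the .ann file
Entity = Tuple[int, int, str]


def to_iob(text: str, entities: List[Entity]) -> List[Tuple[str, str]]:
    """Tokenize `text` and tag each token using character offsets."""
    rows = []
    previous = None  # entity covering the previous token, if any
    # Words and single punctuation marks; match.start()/end() give offsets.
    for match in re.finditer(r"\w+|[^\w\s]", text):
        current = None
        for ent in entities:
            start, end, _label = ent
            # A token belongs to an entity if it lies fully inside its span.
            if match.start() >= start and match.end() <= end:
                current = ent
                break
        if current is None:
            tag = "O"
        elif current == previous:
            tag = "I-" + current[2]
        else:
            tag = "B-" + current[2]
        rows.append((match.group(), tag))
        previous = current
    return rows


if __name__ == "__main__":
    sample = "Citibank moved $100 million."
    spans = [(0, 8, "Organization"), (15, 27, "Money")]
    for token, tag in to_iob(sample, spans):
        print(token, tag)
```

Here `text` has to be the whole .txt document, since the .ann offsets are relative to it. POS tags can be added afterwards by running a tagger over the same token list (for example `nltk.pos_tag`, which accepts a list of tokens), so tagging never has to touch the offsets. Mapping brat labels such as Organization to short forms like ORG is left out of this sketch.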

