Adding automatically generated annotations


Mohammad Ghufran

Dec 14, 2016, 10:42:06 AM
to brat-users
Hello, 

I have used brat for a little while to annotate some documents. Right now, I am trying to automatically generate annotations for some documents so that they can be imported into brat and then corrected manually there.

I thought it would be straightforward: just generate the .ann files in the same format that brat itself produces. That does not seem to be the case. When I open such a file in the interface, nothing is annotated and I get a bunch of errors instead.
For example:

Unable to parse the following line(s):
1: T1 EducationInstitute 469 520 Institut Supérieur de l'Aéronautique et de l'Espace
3: T2 EducationInstitute 522 526 ISAE 

I have checked, and the format of the auto-generated file looks exactly the same as that of a manually generated one. Could someone help me with this?

Best Regards,
Ghufran



Goran Topic

Dec 14, 2016, 12:00:55 PM
to brat-...@googlegroups.com
Sorry, would you mind posting your txt and ann file as an attachment?
If you paste them in mail, tabs tend to get lost, and I want to check
exactly where your error comes from.
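
Since separators are easy to lose or mangle, here is a minimal sketch of building a text-bound annotation line with explicit separators. It assumes the standoff layout of ID, a tab, the type and offsets separated by single ordinary spaces, another tab, and then the covered text. The helper name `format_textbound` is hypothetical and not part of brat.

```python
def format_textbound(ann_id: str, ann_type: str, start: int, end: int, text: str) -> str:
    """Build one text-bound annotation line.

    Assumed layout: ID <TAB> TYPE <SPACE> START <SPACE> END <TAB> TEXT
    The separators are written as escape sequences so that an editor or
    mail client cannot silently replace them.
    """
    return f"{ann_id}\t{ann_type} {start} {end}\t{text}"


# Values taken from the second error line quoted in this thread.
line = format_textbound("T2", "EducationInstitute", 522, 526, "ISAE")

# repr() makes the tab characters visible when inspecting the result.
print(repr(line))
```

Printing with `repr()` rather than `print(line)` alone is a deliberate choice: tabs, ordinary spaces and other whitespace look identical on screen but not in the escaped form.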

Mohammad Ghufran

Dec 14, 2016, 2:18:56 PM
to brat-users
Hello, 

I have attached a sample text file along with the annotation file.

Thank you for looking into it!

Ghufran
0460.ann
0460_sample.txt

Mohammad Ghufran

Dec 14, 2016, 2:53:52 PM
to brat-users
I forgot to mention that the annotation file uses Wikipedia URLs as references, which causes a warning; I believe brat expects an integer there rather than a URL.
I need the URL for my project, and since it is only a warning, I have left it as is for the time being. Entering a URL manually while annotating works, so I expect it not to be a problem in the auto-generated file either.

Could you advise on how to debug the code? I looked at src/server/annotation.py, where the error originates, but could not get logging to work. I tried adding log_info("something"), following some existing lines, but the message did not appear in the output of the terminal where the server is launched, nor in server.log (I'm not too experienced with Python).

Also, is it possible to completely bypass the check that the annotation and the actual file contain the same text? For example, if the text file contains "A B C" and the annotation covers offsets 2 3 with the text "b", the server complains because the annotation text does not match the document text.

Goran Topic

Dec 15, 2016, 4:55:00 AM
to brat-...@googlegroups.com
Hi, Mohammad.

You have a non-breaking space (U+00A0) between span start and span
end, instead of the normal space (U+0020) as expected by brat.
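
A stray non-breaking space is invisible in most editors, so a small check can help find it. The following is a sketch only: the function name `suspicious_separators` is hypothetical, and it assumes that only the part of the line before the second tab (ID, type, offsets) needs to be free of unusual whitespace, since the covered text itself may legitimately contain such characters.

```python
def suspicious_separators(line: str) -> list[tuple[int, str]]:
    """Report whitespace other than U+0020 and TAB in the ID/type/offset part.

    Returns a list of (index, code point) pairs, for example (25, "U+00A0").
    """
    fields = line.rstrip("\n").split("\t")
    # Only the first two tab-separated fields are inspected; the third field
    # is the covered text and is left alone.
    header = "\t".join(fields[:2])
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(header)
        if ch.isspace() and ch not in " \t"
    ]


# A line with a non-breaking space between the two offsets.
bad = "T2\tEducationInstitute 522\u00a0526\tISAE"
good = "T2\tEducationInstitute 522 526\tISAE"

print(suspicious_separators(bad))
print(suspicious_separators(good))
```

If the generator produces such characters, replacing them with an ordinary space in the offset part before writing the file should be enough.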
There is no way to turn off the verification of the text of the
annotation. This catches errors such as yours, where the standoff is
incorrect (e.g. your first annotation is off by 36 bytes).
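
Because the check cannot be disabled, it can be useful to run the same comparison before importing the files. This is a minimal sketch with hypothetical helper names (`check_span`, `offset_drift`); it assumes the offsets index characters of the decoded document text, with the end offset exclusive, which may need adjusting if your generator counts something else (such as bytes).

```python
from typing import Optional


def check_span(doc_text: str, start: int, end: int, ann_text: str) -> bool:
    """True if the characters in [start, end) equal the annotation text."""
    return doc_text[start:end] == ann_text


def offset_drift(doc_text: str, start: int, ann_text: str) -> Optional[int]:
    """Signed distance from the claimed start to the nearest real occurrence.

    Returns None if the annotation text does not occur in the document at all.
    """
    positions = []
    pos = doc_text.find(ann_text)
    while pos != -1:
        positions.append(pos)
        pos = doc_text.find(ann_text, pos + 1)
    if not positions:
        return None
    nearest = min(positions, key=lambda p: abs(p - start))
    return nearest - start


# The example from earlier in the thread: the document contains "A B C".
doc = "A B C"
print(check_span(doc, 2, 3, "b"))   # case differs, so the span does not match
print(check_span(doc, 2, 3, "B"))   # exact match
print(offset_drift(doc, 0, "B"))    # how far a wrong start offset is from the text
```

A constant drift across all annotations points to a systematic problem in how the generator counts, whereas a drift that grows through the file suggests that some characters (for instance accented letters or line endings) are being counted differently.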
If you want to set up normalisation correctly, the tutorial is here:
http://brat.nlplab.org/normalization.html

Goran