Include the original text in the .ann file

Yang Yu'an

May 13, 2020, 12:18:35 PM
to brat-users
Hi,
I was wondering if there is a way to include the text we are annotating in the .ann file.
Currently, the .ann file shows all the annotations and the .txt file shows the original text, but I would like to have the text next to the annotations made on it, in one file.

For example, for the first annotation in the CoNLL demonstration file, the .ann file would contain something like:
ORG 543 552 structure "The ‘problem’ for management or the owners of organizations is to get the structure right and operate it efficiently."

Right now, when I try to extract data from the .ann file, I cannot point back to the original text and have to go back and forth between the two files. I'm also really confused by the "line number" in the .ann file, because it is not the actual line number.
Thanks!
Yu'an

Goran Topic

May 13, 2020, 12:49:15 PM
to brat-...@googlegroups.com
The `.ann` file includes the text of the annotated span (as is evident
from your example annotation); brat does not include more than that.
If you wish, you can include more in a notes annotation (basically a
comment), though brat will ignore it (except for notes of the type
`AnnotatorNotes`).
I do not understand what you mean by "cannot point back to the
original text". If you don't want to use the text in the annotation
itself, you can use the standoff offsets and index the `.txt` file
accordingly.
Taking the example of your annotation, if you read the entire `.txt`
file into a variable `text` in Python, then `text[543:552]` should
yield the annotated text.
Also, there is no "line number" in the `.ann` file, so again I am not sure
what you are referring to.
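
A minimal Python sketch of the indexing described above, assuming contiguous text-bound annotations of the form `ID<TAB>TYPE START END<TAB>TEXT` and character offsets into the contents of the `.txt` file. The helper names (`parse_text_bound`, `enclosing_line`) and the sample text are made up for illustration; they are not part of brat, and the sample is not the CoNLL demonstration file. It recovers the annotated span and the line containing it, which is one way to get the surrounding sentence without storing it in the `.ann` file.

```python
def parse_text_bound(ann_line):
    """Parse one line of the assumed form 'ID<TAB>TYPE START END<TAB>TEXT'.

    Returns (id, type, start, end, text), or None for anything else
    (relations, notes, discontinuous spans such as '0 5;16 23').
    """
    parts = ann_line.rstrip("\r\n").split("\t")
    if len(parts) != 3 or not parts[0].startswith("T"):
        return None
    fields = parts[1].split(" ")
    if len(fields) != 3:
        return None  # discontinuous spans are skipped in this sketch
    label, start, end = fields
    if not (start.isdigit() and end.isdigit()):
        return None
    return parts[0], label, int(start), int(end), parts[2]


def enclosing_line(text, start, end):
    """Return the line of `text` that contains the span [start, end)."""
    left = text.rfind("\n", 0, start) + 1  # 0 when no newline precedes the span
    right = text.find("\n", end)
    if right == -1:
        right = len(text)
    return text[left:right]


# Hypothetical stand-ins for the contents of a .txt file and one .ann line.
text = "First line here.\nThe owners must get the structure right.\n"
start = text.index("structure")
end = start + len("structure")
ann_line = f"T1\tORG {start} {end}\tstructure\n"

ann_id, label, s, e, span_text = parse_text_bound(ann_line)
assert text[s:e] == span_text  # the offsets point back into the text
print(ann_id, label, s, e, span_text, repr(enclosing_line(text, s, e)))
```

The leading `T1` is an annotation ID rather than a line number, so it says nothing about where the span sits in the `.txt` file; only the two offsets do. With real files, the `.txt` file would be read as a whole into one string (for example with `encoding="utf-8"`) so that the offsets index characters rather than bytes.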

Goran
> --
>
> ---
> You received this message because you are subscribed to the Google Groups "brat-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to brat-users+...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/brat-users/2a4708e6-53b6-42d3-8a12-30544c490606%40googlegroups.com.

Yang Yu'an

May 13, 2020, 7:13:45 PM
to brat-users
Thank you so much! So there is no principled way to include more than the annotated span? 
Yu'an

Karin Verspoor

May 13, 2020, 10:18:51 PM
to brat-...@googlegroups.com

The model for BRAT is “stand-off” annotation.

What you are looking for is “in-line” annotation.

The BioC format uses XML: https://academic.oup.com/database/article/doi/10.1093/database/bat064/341301

You may wish to see whether the brat2bioc tool is useful to you:
https://bitbucket.org/nicta_biomed/brat2bioc/src/default/

Jimeno Yepes A, Neves M, Verspoor K. (2013) Brat2BioC: conversion tool between brat and BioC. BioCreative IV Workshop, Bethesda, MD.
https://biocreative.bioinformatics.udel.edu/media/store/files/2013/bc4_v1_7.pdf

Karin
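
To illustrate the difference between stand-off and in-line annotation, here is a rough Python sketch that turns stand-off spans into in-line markup by wrapping each span in a tag named after its type. The function name `to_inline` and the tag layout are assumptions made for this example only: this is not the BioC schema and not what brat2bioc produces. It also assumes the spans do not overlap.

```python
from xml.sax.saxutils import escape


def to_inline(text, spans):
    """Wrap each (start, end, label) span of `text` in <label>...</label>.

    Spans are assumed to be non-overlapping character offsets.
    Text outside and inside the tags is XML-escaped.
    """
    pieces = []
    pos = 0
    for start, end, label in sorted(spans):
        pieces.append(escape(text[pos:start]))  # text before the span
        pieces.append(f"<{label}>{escape(text[start:end])}</{label}>")
        pos = end
    pieces.append(escape(text[pos:]))  # remaining text after the last span
    return "".join(pieces)


# Hypothetical sample; offsets are computed rather than typed by hand.
sample = "get the structure right"
begin = sample.index("structure")
print(to_inline(sample, [(begin, begin + len("structure"), "ORG")]))
```

The trade-off is the usual one: the in-line form is readable in a single file, while the stand-off form leaves the original text untouched and copes more easily with overlapping annotations.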

Yang Yu'an

Jun 1, 2020, 11:48:29 PM
to brat-users
Thank you so much!! 