Convert BIO or BIOES to Brat standoff format

Antonio Miranda

Sep 9, 2020, 6:32:32 AM
to bionl...@googlegroups.com, brat-...@googlegroups.com
Hi,

I am looking for tools that convert a token-based representation of text (BIO or BIOES) to Brat standoff format. The tool should also align the generated ANN file with the original text, since reconstructing sentences from tokens can shift the original token positions.

If you know of any such tools, please let me know. A tool that stays efficient when converting many documents would be a plus. Links to GitHub repos or similar would be extremely helpful.

Thanks in advance!

--
Antonio Miranda
Biomedical Engineer at Barcelona Supercomputing Center
Phone: 0034 649227310
Location: Barcelona, Spain

Goran Topic

Sep 9, 2020, 9:45:20 AM
to brat-...@googlegroups.com
I thought I'd quickly write one, so I committed it to the brat repo. Pull the newest master from GitHub, and you will find it under `tools/bioes2standoff.py`. Afterwards I found that there was already something similar in `tools`; mine is probably more permissive and easier to use, though. I hacked this together in about an hour, so if you find any errors or problems, please let me know. You can convert multiple documents like this:

for file in bio/*.txt; do tools/bioes2standoff.py "$file" "${file%.txt}.bio" "${file%.txt}.ann"; done

or, if that is too inefficient, from Python:

from tools.bioes2standoff import convert
for ...:
    convert(text_file, bio_file, ann_file)
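
For readers who want to see what such a conversion involves, here is a minimal sketch of the general idea. It is not the code in `tools/bioes2standoff.py`; the function name `bio_to_standoff` and its arguments are hypothetical, made up for illustration. It assumes the tokens appear in the original text in order, finds each one by searching forward from the end of the previous token (so offsets stay aligned with the original text even when whitespace differs), and merges B/I/E/S tags into entity spans. It is permissive in that an I- or E- tag with no open entity simply starts a new one.

```python
def bio_to_standoff(text, tagged_tokens):
    """Return brat 'T' lines for (token, tag) pairs aligned to `text`.

    Hypothetical helper for illustration only; not part of brat.
    """
    spans = []      # finished (label, start, end) entity spans
    current = None  # entity being built, as [label, start, end]
    cursor = 0      # position in `text` where the next search begins

    for token, tag in tagged_tokens:
        # Align the token with the original text by searching forward.
        start = text.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = start + len(token)
        cursor = end

        prefix, _, label = tag.partition("-")
        if prefix in ("I", "E") and current and current[0] == label:
            current[2] = end  # extend the open entity
        else:
            if current:
                spans.append(tuple(current))  # close the previous entity
            # B/S start a new entity; a stray I/E is tolerated and does too.
            current = [label, start, end] if prefix in ("B", "I", "E", "S") else None
        if prefix in ("E", "S") and current:
            spans.append(tuple(current))  # E and S end the entity at once
            current = None

    if current:
        spans.append(tuple(current))

    # Brat text-bound annotation: ID <TAB> TYPE START END <TAB> TEXT
    return [
        f"T{i}\t{label} {start} {end}\t{text[start:end]}"
        for i, (label, start, end) in enumerate(spans, 1)
    ]


if __name__ == "__main__":
    # Note the double space in the text: offsets come from the text itself.
    text = "John  Smith lives in New York."
    tokens = [("John", "B-PER"), ("Smith", "E-PER"), ("lives", "O"),
              ("in", "O"), ("New", "B-LOC"), ("York", "I-LOC"), (".", "O")]
    print("\n".join(bio_to_standoff(text, tokens)))
```

The forward search is the simplest alignment strategy and can go wrong if the token file omits part of the text, because a short token may then match inside a skipped word. The sketch also ignores entities that span a line break, which brat represents as discontinuous annotations.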
