Conversion from CoNLL-U ver 2 to Standoff

Daria Mikhaylova

May 11, 2020, 6:11:50 AM
to brat-users

Hello everyone,
I am trying to use brat to manually correct automatic annotation (Universal Dependencies) for my student project. I tried to convert a file in CoNLL-U ver. 2 format, produced by UDPipe, to standoff format using ConllXtostandoff.py from the tools folder. In the resulting file the syntactic relations are recognised correctly, but the POS tags are wrong, which I guess causes the overall errors in brat. As I understand it, the POS tags are taken from the 4th column instead of the third.
Is there a way to correct the conversion?
I will appreciate any help, thank you in advance!
Daria


This is an example of the input data (10 columns):

conll.PNG


This is the result as seen in brat:

inbrat.PNG

This is part of the .ann file created by the script:

ann.PNG


Goran Topic

May 11, 2020, 6:41:00 AM
to brat-...@googlegroups.com
Sorry, I'm not really sure whether the format is different, or the conversion tool, or perhaps your needs... but you can adjust the tool to your needs quite easily. It should contain two lines like these:

            ID, form, POS = fields[0], fields[1], fields[4]
            head, rel = fields[6], fields[7]

It should be simple to edit the field positions in those lines to match the data you wish extracted.
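
As a rough illustration of that kind of edit, here is a minimal sketch, assuming the standard ten-column CoNLL-U layout in which the universal POS tag (UPOS) sits at zero-based index 3 and the language-specific tag (XPOS) at index 4. The function name `extract_token_fields` and the sample line are hypothetical and are not taken from the conversion script itself; only the two assignment lines mirror the ones quoted above.

```python
def extract_token_fields(line):
    """Split one CoNLL-U token line into the values the converter reads."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(fields)}")
    # CoNLL-U columns (zero-based):
    # 0 ID, 1 FORM, 2 LEMMA, 3 UPOS, 4 XPOS, 5 FEATS, 6 HEAD, 7 DEPREL, 8 DEPS, 9 MISC
    # Reading index 3 instead of 4 picks UPOS rather than XPOS.
    ID, form, POS = fields[0], fields[1], fields[3]
    head, rel = fields[6], fields[7]
    return ID, form, POS, head, rel


# Hypothetical token line, made up for illustration.
example = "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_"
print(extract_token_fields(example))
```

With index 4 the same line would yield `NNS` as the tag, which may explain unexpected POS labels when the brat configuration expects universal tags.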

Goran

Daria Mikhaylova

May 11, 2020, 9:02:11 AM
to brat-users
Thank you very much. Yes, the part-of-speech tag in this format is in the 4th column, so I changed it to fields[3], and it works.
Do you know whether anntoconll.py on GitHub can still be used for the backward conversion?



Goran Topic

May 11, 2020, 9:05:48 AM
to brat-...@googlegroups.com
Sorry, no idea, I neither wrote it nor ever used it. Try, and if you
run into problems, please send exact details (input data example,
desired output, actual output) and we might figure something out. :)

Goran