Import of GermEval 2014 dataset


Christian Gawron

Sep 16, 2020, 4:09:23 AM
to webanno-user
Dear WebAnno team,

I'm trying to import the GermEval2014 NER dataset into the public WebAnno instance (https://webanno.sfs.uni-tuebingen.de/webanno). My plan is to add other entity annotations related to date, time, and currency to the dataset.

Since the dataset was created with WebAnno, I was hoping that WebAnno supports the modified CoNLL format used by GermEval 2014.

Unfortunately, the import does not work for me: there is no error message, but regardless of which import format I select, the documents are not uploaded.
I am trying to upload the official TSV files for the train, test, and dev splits.

Is there a way to import the dataset without manual preprocessing, or is the GermEval2014 dataset available in a format that WebAnno supports?

Thanks in advance & best wishes
Christian Gawron

Richard Eckart de Castilho

Sep 17, 2020, 3:53:23 AM
to Christian Gawron, webanno-user
Hi,

> On 16. Sep 2020, at 10:09, Christian Gawron <christia...@gmail.com> wrote:
>
> Since the dataset was created with WebAnno, I was hoping that WebAnno supports the modified CoNLL format used by GermEval 2014.

WebAnno uses the DKPro Core Conll2002Reader, which supports the GermEval2014 format. However, the reader must be configured specifically
to be able to read GermEval2014, and WebAnno doesn't do that configuration. That is why it can currently only load the
default CoNLL 2002 format.

If there is interest in importing the GermEval2014, it shouldn't be too difficult to add a second configuration of the reader as a new
format to WebAnno.

Best report it as a feature request on GitHub:

https://github.com/webanno/webanno/issues/new/choose

Cheers,

-- Richard

Christian Gawron

Sep 17, 2020, 4:30:45 AM
to Richard Eckart de Castilho, webanno-user, Julia Warnke
Hi Richard,

> If there is interest in importing the GermEval2014, it shouldn't be too difficult to add a second configuration of the reader as a new
> format to WebAnno.

Actually, I just reformatted the GermEval2014 dataset to CoNLL2002, which takes only a few lines of Python code.
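
For readers who want to do the same, a conversion along these lines might look roughly like the sketch below. It is not the script used in this thread. It assumes the GermEval 2014 TSV layout is tab-separated with a token index, the token, an outer NER tag, and an inner NER tag, that sentence header lines start with "#", and that sentences are separated by blank lines. It also assumes the CoNLL 2002 target is one "token tag" pair per line, separated by a single space. The function name germeval_to_conll2002 and the tag_column parameter are made up for illustration.

```python
def germeval_to_conll2002(lines, tag_column=2):
    """Convert GermEval 2014 TSV lines into CoNLL 2002 style lines.

    Assumed input columns (tab-separated): index, token, outer tag, inner tag.
    tag_column=2 keeps the outer (first) NER layer; 3 would keep the inner one.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            # Sentence header (source and date); CoNLL 2002 has no equivalent.
            continue
        if not line.strip():
            # Blank line = sentence boundary; avoid emitting duplicates.
            if out and out[-1] != "":
                out.append("")
            continue
        fields = line.split("\t")
        token, tag = fields[1], fields[tag_column]
        out.append(f"{token} {tag}")
    # Drop a trailing sentence boundary, if any.
    while out and out[-1] == "":
        out.pop()
    return out


if __name__ == "__main__":
    # Tiny hand-written sample in the assumed layout, not taken from the dataset files.
    sample = [
        "#\thttp://example.org/doc\t[2009-10-17]",
        "1\tSchartau\tB-PER\tO",
        "2\tsagte\tO\tO",
        "",
        "#\thttp://example.org/doc2\t[2009-10-18]",
        "1\tBerlin\tB-LOC\tO",
    ]
    print("\n".join(germeval_to_conll2002(sample)))
```

The tags are copied unchanged, so GermEval-specific labels with suffixes such as "deriv" or "part" would still need to be mapped or dropped if the target tool expects only the plain CoNLL label set.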

I have one question, though:
The official CoNLL2003 NER dataset contains only one NER layer, so I only used the first layer from GermEval2014.

Would the WebAnno import of CoNLL2002 support a second layer, i.e. a third column with NER tags?

Best wishes
Christian 