Baseline system crashing



Christos Christodoulopoulos


Apr 3, 2012, 11:18:59 AM

to wils-ch...@googlegroups.com

Hi,

I'm trying to run the baseline system (http://code.google.com/p/pr-toolkit/downloads/detail?name=pr-dep-parsing.2010.11.tgz) on the challenge data files, and I get NumberFormatException errors.
I ran dep-parsing-2010.11.jar with -trainingType 2 and the transitive corpus parameter set to true (to merge the train, devel and test corpora).
The exception is at
depparsing.data.DepInstanceList.readFromConll(DepInstanceList.java:55)
where the code tries to read the head-word indices of the training data (I assume this is only for evaluation during training, right?).

However, since the files are unlabelled (except for devel), the code fails at that point. (Also, I think the head-word index position is shifted by 1 now that the universal tags have been added, but that's a minor issue.)

Is there a solution to this problem or should I just replace the '_' with dummy indices?
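For the dummy-index workaround, a minimal Python sketch might look like the following. It assumes tab-separated CoNLL-X-style lines where HEAD is the 7th column (0-based index 6); because the universal tags may have shifted the columns by one, the head column is a parameter. The function name `fill_dummy_heads` and the dummy value `0` (attach every token to the root) are hypothetical choices, not part of the pr-toolkit code.

```python
def fill_dummy_heads(lines, head_col=6, dummy="0"):
    """Replace '_' in the head column of CoNLL-style lines with a dummy index.

    Assumption: head_col=6 is the standard CoNLL-X HEAD position; use 7 if
    an extra (e.g. universal tag) column precedes it. Only the head column
    is touched, so an unlabelled DEPREL column stays '_'.
    """
    out = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        if not stripped.strip():
            # Blank line marks a sentence boundary; keep it unchanged.
            out.append(stripped)
            continue
        cols = stripped.split("\t")
        if len(cols) > head_col and cols[head_col] == "_":
            cols[head_col] = dummy
        out.append("\t".join(cols))
    return out
```

Existing numeric heads (as in the labelled devel file) are left alone, so the same pass can be run over all three files before merging them.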

Thanks,
Chris

Trevor Cohn


Apr 3, 2012, 11:40:24 AM

to wils-ch...@googlegroups.com

Hi Chris,

We've got a slightly modified version that copes with the changes in the
file format. I was just about to put it on the wiki, along with the
evaluation scripts. Hold tight, we just need to package it up.