Hi,
I'm trying to run the baseline system (http://code.google.com/p/pr-toolkit/downloads/detail?name=pr-dep-parsing.2010.11.tgz) on the data files of the challenge, and I get NumberFormatException errors.
I ran dep-parsing-2010.11.jar with -trainingType 2 and the transitive corpus parameter set to true (to merge the train, devel and test corpora).
The exception is thrown at
depparsing.data.DepInstanceList.readFromConll(DepInstanceList.java:55)
where the code is trying to read the head-word indices of the training data (I assume this is just for evaluation during training, right?).
However, since the files are unlabelled (except devel), the head field contains '_' instead of a number and parsing fails. (Also, I think the head-word index position is shifted by 1 now that the universal tags have been added, but that's a minor issue.)
Is there a solution to this problem, or should I just replace the '_' with dummy indices?
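For concreteness, the workaround I have in mind would look roughly like the sketch below. It is a standalone Python script, not anything from the toolkit. The names fill_dummy_heads, head_col and dummy are made up for illustration, and the tab-separated layout and the head column position (index 6, as in CoNLL-X) are assumptions that would need adjusting if the universal tag column shifts things.

```python
def fill_dummy_heads(lines, head_col=6, dummy="0"):
    """Replace '_' in the head column with a dummy index.

    Assumptions (not confirmed for the challenge data):
    - columns are tab-separated, one token per line
    - blank lines separate sentences
    - head_col is the 0-based position of the head-word index
    """
    out = []
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip():
            # Blank line: sentence boundary, keep as-is.
            out.append("")
            continue
        cols = stripped.split("\t")
        # Only touch the head column, and only when it is unlabelled.
        if len(cols) > head_col and cols[head_col] == "_":
            cols[head_col] = dummy
        out.append("\t".join(cols))
    return out


if __name__ == "__main__":
    import sys

    # Usage: python fill_heads.py < unlabelled.conll > patched.conll
    for patched in fill_dummy_heads(sys.stdin):
        print(patched)
```

If the head index really has moved by one column, passing head_col=7 should be the only change needed.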
Thanks,
Chris