Unexpected input format when trying to re-train SEMAFOR.


ju...@calabs.ca

Sep 16, 2016, 11:25:17 AM
to semafor-users, stho...@cs.cmu.edu, dipa...@gmail.com
Hello,

I've been trying to re-train SEMAFOR, and I've been running into a few issues.

When I run the script 4_1_createAlphabet.sh, the method DataPointWithFrameElements.decomposeFELine throws a NumberFormatException. The method takes a "frame element line" and, after splitting it on tabs, tries to read three numerical values from it: rank, score, and numSpans.
(see code below)

This fails with a NumberFormatException because the input actually looks like this after splitting:
[4, Economy, economy.n, 7, economy, 1, Political_region, 3:4, Descriptor, 5:6, Economy, 7]

In particular, the second and third values (where the method expects score and numSpans) are not numeric, and no score seems to be present anywhere in this part of the data.
Am I missing something?
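
For anyone hitting the same error, here is a rough Python sketch (not SEMAFOR code) of a check for whether a line has the leading fields described above. The function name `has_rank_and_score` is hypothetical, and the field order (rank, score, numSpans in the first three tab-separated positions) is an assumption based on the description in this thread.

```python
def has_rank_and_score(line: str) -> bool:
    """Return True if the first three tab-separated fields parse as int, float, int.

    Assumed layout (not confirmed against the SEMAFOR source):
    rank <TAB> score <TAB> numSpans <TAB> ...
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return False
    try:
        int(fields[0])    # rank
        float(fields[1])  # score
        int(fields[2])    # numSpans
    except ValueError:
        return False
    return True


# The line quoted above, rebuilt with tab separators.
example_line = "\t".join([
    "4", "Economy", "economy.n", "7", "economy", "1",
    "Political_region", "3:4", "Descriptor", "5:6", "Economy", "7",
])
```

Running this check over a .frame.elements file before training would show whether the file is missing the two leading columns.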

Any help is much appreciated! Thank you in advance!

Best regards,
Julio

giancarl...@gmail.com

Oct 5, 2016, 7:39:43 PM
to semafor-users, stho...@cs.cmu.edu, dipa...@gmail.com, ju...@calabs.ca
The problem is that the .frame.elements files in the train/dev/test splits in naacl12 are not in the correct format. Every line should be prefixed with two numerical values (rank and score), where rank is a nonnegative integer and score is a double-precision value. I got the retraining to work by prepending "0 1.0" to every line, but this is obviously not desirable. I would like to get the correct naacl12 splits so we can train SEMAFOR on the correct training set.
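
The workaround described above can be sketched in Python roughly as follows. This is only an illustration: the function name `prepend_rank_and_score` is hypothetical, and the use of a tab as the separator is an assumption based on the earlier note that the parser splits on tabs. The placeholder values 0 and 1.0 are the ones mentioned above and carry no real ranking information.

```python
def prepend_rank_and_score(lines, rank=0, score=1.0):
    """Prefix each non-empty line with placeholder rank and score columns.

    Assumes tab-separated fields; blank lines are passed through unchanged.
    """
    prefix = f"{rank}\t{score}\t"
    return [prefix + line if line.strip() else line for line in lines]


# Example on an in-memory list; for a real file, read the lines,
# transform them, and write the result to a new file.
patched = prepend_rank_and_score(["4\tEconomy\teconomy.n\t7\n", "\n"])
```

Writing the result to a separate file rather than overwriting the original keeps the unmodified splits available for comparison.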

Nathan Schneider

Oct 31, 2016, 8:32:39 PM
to giancarl...@gmail.com, semafor-users, Sam Thomson, Dipanjan Das, ju...@calabs.ca
Sam has finally tracked down the problem with the NAACL 2012 splits, and updated the link in the README: https://github.com/Noahs-ARK/semafor/blob/master/training/README.md These files should be in the correct format. Thanks to all who brought the problem to our attention!

Best,
Nathan
