Training Semafor - Alphabet Creation Step


kanan...@berkeley.edu

May 10, 2016, 4:46:41 PM
to semafor-users
Hi, I ran into a problem while attempting to retrain SEMAFOR. I am using the version available here: https://github.com/Noahs-ARK/semafor/

When running 3_1_idCreateAlphabet.sh, I get a NumberFormatException (for input string "3:4").

I am using the naacl2012 splits; this string is one of the role/span pairs in cv.train.sentences.frame.elements.
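For readers following along, here is a minimal sketch of why a span field such as "3:4" triggers this exception when it is parsed as a plain integer, and of one way a start:end pair could be parsed instead. The class and method names (SpanParsingSketch, parseSpan, parsesAsPlainInt) are hypothetical and are not taken from the SEMAFOR source; the sketch only assumes that a span is written as one token index or as two indices joined by a colon.

```java
import java.util.Arrays;

// Hypothetical illustration, not SEMAFOR code.
final class SpanParsingSketch {

    // Returns true only if the whole field is a valid base-10 integer.
    static boolean parsesAsPlainInt(String field) {
        try {
            Integer.parseInt(field);
            return true;
        } catch (NumberFormatException e) {
            // Integer.parseInt rejects "3:4" because ':' is not a digit.
            return false;
        }
    }

    // Assumed format: "3" means the single token 3, "3:4" means start 3, end 4.
    static int[] parseSpan(String field) {
        String[] parts = field.split(":");
        if (parts.length < 1 || parts.length > 2) {
            throw new IllegalArgumentException("Unexpected span field: " + field);
        }
        int start = Integer.parseInt(parts[0]);
        int end = (parts.length == 2) ? Integer.parseInt(parts[1]) : start;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        System.out.println(parsesAsPlainInt("3:4"));
        System.out.println(Arrays.toString(parseSpan("3:4")));
        System.out.println(Arrays.toString(parseSpan("3")));
    }
}
```

If the field really is being read at the wrong position, as opposed to being parsed the wrong way, changing the parser would only hide the problem, so it is worth checking which column the code expects the span to be in.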

Is this step assuming the data will be formatted differently? The training/data/README describes the data as it appears in the naacl2012 directory, but the colon seems to be causing problems here.

I tried editing cv.train.sentences.frame.elements to include only the first token rather than a span (so 3 instead of 3:4), just to see whether it would run that way, but this produces a different error: IndexOutOfBoundsException: index (2) must be less than size (1).

Thanks in advance!


ju...@calabs.ca

Sep 14, 2016, 1:31:25 PM
to semafor-users, kanan...@berkeley.edu
Hi,
I'm running into the same problem. Did you find a solution for it?
Thank you in advance!

giancarl...@gmail.com

Oct 4, 2016, 10:50:21 PM
to semafor-users, kanan...@berkeley.edu
It looks like the method "processLine" throws out the first two fields in toks but does not adjust the indices in the later tokens.get(i) calls. You could either remove the ".sublist(2,toks.length)" or adjust the indices in the tokens.get(i) calls.
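A minimal sketch of that kind of index mismatch, written against a made-up four-field line and not against the actual processLine code (the class name, method names, and field layout below are all assumptions for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the off-by-two pattern, not SEMAFOR code.
final class FieldOffsetSketch {

    // Buggy pattern: the first two fields are dropped, but the lookup
    // still uses an index that was only valid for the full list.
    static String thirdFieldBuggy(String line) {
        String[] toks = line.split("\t");
        List<String> tokens = Arrays.asList(toks).subList(2, toks.length);
        return tokens.get(2);
    }

    // Option 1: keep every field, so the original indices stay valid.
    static String thirdFieldKeepAll(String line) {
        List<String> tokens = Arrays.asList(line.split("\t"));
        return tokens.get(2);
    }

    // Option 2: keep the sublist, and shift each index down by two.
    static String thirdFieldShifted(String line) {
        String[] toks = line.split("\t");
        List<String> tokens = Arrays.asList(toks).subList(2, toks.length);
        return tokens.get(2 - 2);
    }

    public static void main(String[] args) {
        String line = "a\tb\tc\td";
        System.out.println(thirdFieldKeepAll(line));
        System.out.println(thirdFieldShifted(line));
        try {
            thirdFieldBuggy(line);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("buggy lookup failed: " + e.getMessage());
        }
    }
}
```

Either option gives a consistent view of the line. On short lines the mismatch fails with an out-of-bounds error, and on longer lines it reads a field two positions too far to the right, which is one way a span like "3:4" could end up where a plain number is expected.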

Hope this helps! 

nandi...@gmail.com

Mar 3, 2017, 7:23:44 PM
to semafor-users, kanan...@berkeley.edu
Hello,
Where did you find cv.train.sentences.frame.elements?