Hi,
I am currently using SEMAFOR for one of my projects (I have found it an interesting and excellent tool for my research). I downloaded the source code and ran it. When I cross-checked the output against the web demo (http://demo.ark.cs.cmu.edu/parse), I found that the JSON results differ.
My input (taken from a news article):
-- Adds detail , context , grafs 3 , 5 to end ; adds highlights , updates byline -LRB- CNN -RRB- -- A provincial council candidate and nine of his supporters were killed by the Taliban in Afghanistan two days after they were kidnapped , said Sakhidad Haidari , deputy police chief of northern Sar-e-Pul province .
I have a couple of questions:
1. The JSON from the web demo contains the tokens, but they are missing from the output of the GitHub build. Is there any way to produce the same output?
2. The Victim fields are quite different. How does the software calculate the start and end index of a label of a frame? I treated the indices in the GitHub build's output as character indices and found the following Victim spans, which differ from the web demo's.
GitHub build:
Victim: Sakhidad Haidari , deputy police chief of northern Sar-e-Pul province
Victim: they
Web demo:
Victim: "A provincial council candidate and nine of his supporters"
Victim: they
As far as I can see, the web demo uses token indices (not character indices) when calculating start and end. However, because the start/end indices in the GitHub build's JSON output file are different, I treated them as character indices (which seemed more logical to me after looking at the output).
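To make the two interpretations concrete, here is a minimal Python sketch of what I mean. It is not SEMAFOR's code: the function names, the end-exclusive convention, and the sample indices (0 and 9) are my own assumptions for illustration, and the sentence is a shortened piece of my input.

```python
def span_by_tokens(sentence, start, end):
    """Read start/end as offsets into the whitespace-separated tokens
    (end exclusive; this convention is an assumption)."""
    tokens = sentence.split()
    return " ".join(tokens[start:end])


def span_by_chars(sentence, start, end):
    """Read start/end as character offsets into the raw sentence
    (end exclusive; this convention is an assumption)."""
    return sentence[start:end]


# Shortened piece of the input sentence; the indices below are hypothetical.
sentence = ("A provincial council candidate and nine of his supporters "
            "were killed by the Taliban")

token_span = span_by_tokens(sentence, 0, 9)
char_span = span_by_chars(sentence, 0, 9)

print(token_span)  # A provincial council candidate and nine of his supporters
print(char_span)   # A provinc
```

The same pair of numbers selects very different text depending on the interpretation, so it would help to know which one the JSON output is meant to use.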
With best regards,
M Solaimani
Research Assistant
University of Texas at Dallas