Hi,
I am a research intern working on retraining SEMAFOR so that it recognizes new frames (frames that are present in the FrameNet 1.7 data but were not used to train SEMAFOR) when it parses sentences. We tried the following approach:
- Added the new frames to the .map files (both framenet.original.map and framenet.frame.element.map)
- Replaced every copy of these .map files in the semafor directory and in the maltmodel directory with the updated versions
- Added the lexical unit .xml files of the new frames from the FrameNet 1.7 data to the lu folder of the FrameNet 1.5 data
- Retrained by following the process available here
The retrained model did not recognize the new frames. To check whether our procedure was correct, we tried the reverse: removing an existing frame that appears only in the exemplar files and not in the full-text annotations. For this we again:
- Removed the frame from the .map files
- Replaced the existing .map files with these new files
- Removed the .xml files of the lexical units of this frame from the lu folder
- Retrained
The new model that we got after retraining was still able to capture this frame.
PS - To switch to the newly trained model, we only change the environment variable that points to malt_model so that it points to the new model.
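In case it helps with diagnosing this, here is a minimal Python sketch of a check that the lu folder really contains (or no longer contains) the lexical unit files of a given frame before retraining. It assumes that each LU file is an XML document whose root element carries a `frame` attribute naming its frame, as in the FrameNet 1.5 lu files. The function name and the command-line usage are hypothetical, and the script is not part of SEMAFOR.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_lu_files_per_frame(lu_dir):
    """Return a Counter mapping frame name -> number of LU .xml files in lu_dir."""
    counts = Counter()
    for path in sorted(Path(lu_dir).glob("*.xml")):
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        # Assumption: the root element has a `frame` attribute with the frame name.
        frame = root.get("frame")
        if frame is not None:
            counts[frame] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python check_lu.py <lu_dir> <frame_name> [<frame_name> ...]
    if len(sys.argv) < 3:
        sys.exit("usage: check_lu.py <lu_dir> <frame_name> [<frame_name> ...]")
    counts = count_lu_files_per_frame(sys.argv[1])
    for frame in sys.argv[2:]:
        # A Counter returns 0 for frames that have no LU files.
        print(f"{frame}: {counts[frame]} LU file(s)")
```

A count of 0 for the removed frame would confirm that its LU files are gone from the lu folder, which would suggest the frame is coming from somewhere else, such as model files that were not replaced.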
We would be grateful if you could help us answer these questions:
- Are we wrong to assume that the instructions provided in the GitHub repo can be used to train SEMAFOR on a newly introduced frame?
- Why is the removed frame still recognized by the SEMAFOR model? Does SEMAFOR in some sense fine-tune the existing model on the new data, or are we not using the newly trained model at all?
- If we are following the wrong process, can you please help us get on the right track?
I would really appreciate any help I can get on this matter. Thank you.
Abhinav Agrawal