Hi!
I have a relation extraction problem, and I am trying to use NLTK's built-in function extract_rels() to extract the relations. However, I want to extract them using my own list of entities instead of the standard named-entity types such as ORGANIZATION, PERSON, and LOCATION.
Here is a sample of my code:
-----------------------------------------------------------------------------------------------------------------------------------------
import re
import sys

import nltk
from nltk.sem.relextract import extract_rels, rtuple

# Matches any non-empty filler text between the two entities.
IN = re.compile(r'.')

for path in sys.argv[1:]:
    with open(path, 'r') as f:
        sample = f.read()
    sentences = nltk.sent_tokenize(sample)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    for i, sent in enumerate(tagged_sentences):
        # ne_chunk labels chunks with NLTK's built-in entity types only.
        sent = nltk.ne_chunk(sent)
        rels = extract_rels('PER', 'ORG', sent, corpus='ace', pattern=IN)
        for rel in rels:
            print('{0:<5}{1}'.format(i, rtuple(rel)))
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I would like to replace 'PER' and 'ORG' with labels from my own list of entities. If that is not possible, can anyone point me to a different resource that could help me achieve this? Thank you for the help!
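To make the goal concrete, here is a rough plain-Python sketch of the behavior I am after. It does not use NLTK at all. The ENTITIES dictionary, the DRUG and SYMPTOM labels, and the function names are made up for illustration, and it only handles single-word entities, so please treat it as a description of the desired output rather than a working solution.

```python
import re

# Hypothetical gazetteer: maps entity surface forms to my own labels.
ENTITIES = {
    "aspirin": "DRUG",
    "ibuprofen": "DRUG",
    "headache": "SYMPTOM",
    "fever": "SYMPTOM",
}


def label_tokens(tokens, entities=ENTITIES):
    """Pair every token with its custom label, or None if it is not listed."""
    return [(tok, entities.get(tok.lower())) for tok in tokens]


def extract_custom_rels(tokens, subj_label, obj_label, pattern, window=10):
    """Return (subject, filler, object) triples for consecutive entity pairs.

    A pair is kept when the labels match, the filler between the two
    entities has at most `window` tokens, and `pattern` matches the filler.
    """
    labeled = label_tokens(tokens)
    ent_positions = [i for i, (_, lab) in enumerate(labeled) if lab is not None]
    rels = []
    for left, right in zip(ent_positions, ent_positions[1:]):
        filler_tokens = tokens[left + 1:right]
        filler = " ".join(filler_tokens)
        if (labeled[left][1] == subj_label
                and labeled[right][1] == obj_label
                and len(filler_tokens) <= window
                and pattern.match(filler)):
            rels.append((tokens[left], filler, tokens[right]))
    return rels


tokens = "Aspirin is often used to relieve a headache".split()
rels = extract_custom_rels(tokens, "DRUG", "SYMPTOM", re.compile(r".*\brelieve\b"))
print(rels)
```

What I would like is the same kind of result from extract_rels(), with my own labels in place of 'PER' and 'ORG'.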
Best,
Shawny Boy