I am working on an application that takes a list of nouns, places each one in a phrase search, sends the search to Wikipedia, and returns the number of hits for that phrase. The nouns must be singular. I have three input options:
1. Pattern parses a text file and extracts the nouns.
2. The nouns are entered by the user.
3. The program reads a list of nouns from a file.
All three options have been tested with the same set of words, including: happiness, sadness, tennis.
In option 1, the parser tags the text file and only nouns tagged NNS are singularized, which correctly leaves all three test words unchanged. This is my code:
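from pattern.en import singularize

if tag == "NNS":
    # Only plural nouns are singularized; the tag is then updated to match.
    noun = singularize(noun, pos='NN', custom={})
    tag = "NN"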
That did not work for the other two input options, where the nouns are untagged, so there the program applies:
noun = singularize(noun)
which works absolutely fine with common nouns such as table, chair and room, and indeed with happiness, but deletes the final 's' of the other two words, so:
sadness = sadnes
tennis = tenni
All three words are in the Pattern en_lexicon file and are correctly tagged NN. So why do these two words 'not work' when the others do?
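For reference, the stopgap I am considering is to protect known singular nouns that end in 's' before calling the singularizer. This is only a minimal sketch and it does not explain the behaviour: `safe_singularize`, `UNINFLECTED` and `naive_singularize` are hypothetical names of my own, the guard list is hand-maintained, and `naive_singularize` is a crude stand-in for Pattern's `singularize` so that the snippet runs without Pattern installed.

```python
# Hypothetical guard list: singular nouns that already end in "s".
UNINFLECTED = frozenset({"happiness", "sadness", "tennis"})


def naive_singularize(noun):
    """Stand-in for Pattern's singularize: strips one trailing 's'."""
    return noun[:-1] if noun.endswith("s") else noun


def safe_singularize(noun, singularize=naive_singularize, keep=UNINFLECTED):
    """Return noun unchanged if it is on the guard list, else singularize it."""
    if noun.lower() in keep:
        return noun
    return singularize(noun)


for word in ("tables", "sadness", "tennis"):
    print(word, "->", safe_singularize(word))
```

With Pattern installed, the real function could presumably be passed in instead, as in safe_singularize(noun, singularize=singularize). I would still prefer to understand why the guard list is needed at all.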
Apologies if I am missing something blindingly obvious, but I have been struggling with this all week! Any words of wisdom would be very gratefully received.
Thank you.
Kevin