Hi David,
You might try using the TF-IDF scores for the document's tokens. I'm not sure there's an easy way to expose those at the moment, though. You'd essentially need to run the VectorSpaceModel with some hacking inside the processSpace method to figure out which terms have the highest weights. We have both implemented.
One question is how you want to determine what is "key." Your earlier post made it sound like the most important aspect is frequency. However, terms like "the" will show up fairly frequently. Does it matter how often the terms show up in other resumes as well (e.g., will rarer terms count more towards being "key")? There's no one answer for how to do this, so it will probably depend on what kind of terms you want to find.
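To make the rarity question concrete, here is a rough sketch of the idea in plain Python. It does not use the VectorSpaceModel or processSpace; the function name top_tfidf_terms, the toy resumes, and the particular weighting (relative term frequency times log(N / document frequency)) are all assumptions for illustration, and other TF-IDF variants would rank terms somewhat differently.

```python
import math
from collections import Counter


def top_tfidf_terms(docs, doc_index, k=5):
    """Return the k highest-weighted (term, score) pairs for one document.

    docs is a list of token lists. The weighting used here is one common
    TF-IDF variant (an assumption): tf = count / doc length,
    idf = log(N / number of docs containing the term).
    """
    tokens = docs[doc_index]
    if not tokens:
        return []

    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    tf = Counter(tokens)
    scores = {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

    # Highest score first; break ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy data standing in for tokenized resumes.
resumes = [
    "the java developer built the search engine".split(),
    "the project manager led the team".split(),
    "the nurse managed the clinic".split(),
]

for term, score in top_tfidf_terms(resumes, 0, k=3):
    print(term, round(score, 3))
```

With this weighting, "the" is the most frequent token in the first resume but gets a score of zero because it occurs in every resume, while terms unique to that resume rise to the top. If raw frequency is really what you care about, you would drop the idf factor and probably filter stop words instead.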
Thanks,
David