First off, I wanted to express my appreciation for this program. I've been creating a synthetic-phonics-based reading curriculum (with a few other specific parameters), which comprises over 160 lessons, each with an associated wordlist that adds to the total set of words the student can decode. Last week I was desperately wishing for a way to take a quote or a poem and automatically compare it against the wordlists, to see at which point a student would have mastered every word in the passage. I knew there had to be something out there, and was overjoyed to discover that AntWordProfiler would let me do this, and that it runs on Linux! (I keep Windows VMs around, but native programs are always best if possible.) Even better, it can tell me which words I forgot to add to any of the lists, which is so helpful. :D
There are only two things I'd like to know how to do, if they're possible:
1) How do I include words such as "can't" and "don't" in the lists? My version seems to split each of them into two separate words ("can" and "t", "don" and "t"), which is not so helpful. The same goes for hyphenated words. I'd like to be able to treat each of them as a single word. I found several questions about them in this list, but all seemed to be about encoding. By default every text file I create is UTF-8, so I haven't seen any weird characters; the program just doesn't treat them as a single word.
2) Is there a way to hide unused wordlists in the results? Currently it returns a result for every one of the 160 wordlists, even though a single sentence couldn't possibly have words in every list, so the output is full of empty headers for the lists that weren't used. Since I'm mainly using this to find the latest wordlist that a word appears in, simply hiding any wordlist that wasn't used at all would make it a ton easier to find the last wordlist used.
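
To make the two requests concrete, here is a rough Python sketch of the behaviour I'm after. It is not AntWordProfiler code and doesn't use any of its settings; the names `tokenize` and `profile`, the lesson names, and the word pattern (letters with internal apostrophes or hyphens count as one word) are all just my assumptions for illustration.

```python
import re

# Assumption: a "word" is a run of letters that may contain internal
# apostrophes (straight or curly) or hyphens, so "can't" and
# "well-known" each stay in one piece.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")


def tokenize(text):
    """Split text into lowercase words, normalising curly apostrophes."""
    return [w.lower().replace("’", "'") for w in WORD_RE.findall(text)]


def profile(text, wordlists):
    """Match the words of `text` against wordlists given in lesson order.

    `wordlists` is a list of (name, set_of_words) pairs.
    Returns (used, unknown): `used` holds only the lists that matched
    at least one word, in lesson order; `unknown` holds words that are
    in no list at all.
    """
    # Map each word to the first lesson that introduces it.
    first_seen = {}
    for name, words in wordlists:
        for word in words:
            first_seen.setdefault(word.lower(), name)

    hits = {}
    unknown = []
    for token in tokenize(text):
        name = first_seen.get(token)
        if name is None:
            if token not in unknown:
                unknown.append(token)
        elif token not in hits.setdefault(name, []):
            hits[name].append(token)

    # Drop every list that was never used, keeping lesson order.
    used = [(name, hits[name]) for name, _ in wordlists if name in hits]
    return used, unknown


lessons = [
    ("lesson_001", {"a", "can", "cat"}),
    ("lesson_002", {"can't", "sit"}),
    ("lesson_003", {"dog"}),
    ("lesson_004", {"well-known"}),
]
used, unknown = profile("A cat can't sit; a well-known cat don't.", lessons)
for name, words in used:
    print(name, words)
print("last list used:", used[-1][0])
print("not in any list:", unknown)
```

With these made-up lists, lesson_003 is left out of the output because none of its words occur in the sentence, the last list used is lesson_004, and "don't" is reported as missing from every list.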