Yeah T -- that's what I used for the original implementation. It only handles bigrams (and trigrams), though, and doesn't work particularly well for small documents.
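For context, the counting side of that is roughly this (a stdlib-only sketch, not the actual implementation -- the real thing would also score the candidates, e.g. with a collocation measure):

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

tokens = "the quick brown fox jumps over the lazy dog".split()
bigram_counts = Counter(ngrams(tokens, 2))   # e.g. ('quick', 'brown') -> 1
trigram_counts = Counter(ngrams(tokens, 3))
```

On a document this small every count is 1, which is exactly the small-document problem: frequency-based scoring has nothing to work with.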
To identify unique 1-grams I need tf-idf, which requires a corpus to base uniqueness on. I'll just use a default (i.e. crappy) corpus for now, since nltk has a few corpora built in. I'm going to write up a proposed architecture for multi-document operations (which would let someone upload their own corpus); I have some ideas on that front already.
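The tf-idf step would look something like this (stdlib-only sketch; the toy background corpus is made up, and in practice the document-frequency counts would come from one of nltk's built-in corpora):

```python
import math
from collections import Counter

def tfidf(doc_tokens, corpus_docs):
    """Score each term in doc_tokens by tf-idf against a background corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus_docs)
    df = Counter()                      # document frequency per term
    for doc in corpus_docs:
        df.update(set(doc))
    # smoothed idf so terms absent from the corpus don't divide by zero
    return {
        term: count * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
scores = tfidf("the quantum cat".split(), corpus)
```

Here "quantum" (absent from the background corpus) outscores "cat", which outscores "the" -- i.e. the rarer the term is in the corpus, the more "unique" it looks in the document, which is the behavior I want.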
To identify good 2- and 3-grams in small bodies of text, I'll need to think a bit more.
Higher-than-3-grams are a low priority, but would be nice to have just because why not.
Best,
- Dan