Hi, does NLTK have a provision to extract character n-grams from text? I would like to extract character n-grams (instead of traditional unigrams and bigrams) as features to aid my text classification task.
True, though we've also provided it:
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.util-module.html#ngrams
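
For character n-grams specifically, here is a minimal sketch in plain Python with no NLTK dependency. The helper name char_ngrams is made up for illustration and is not part of NLTK. The linked ngrams function takes a sequence and a size n, so passing it a string rather than a token list should yield the same windows as tuples of single characters, though the exact return type may vary between NLTK versions.

```python
def char_ngrams(text, n):
    """Return every contiguous run of n characters in text, as strings.

    Hypothetical helper for illustration; not an NLTK function.
    """
    # Slide a window of width n across the string. If the text is
    # shorter than n, the range is empty and so is the result.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Character trigrams of a short string.
print(char_ngrams("text", 3))  # ['tex', 'ext']

# Assumed NLTK equivalent (not run here): a string is a sequence of
# characters, so the n-grams come back as tuples of characters.
#   from nltk.util import ngrams
#   ngrams("text", 3)   # ('t', 'e', 'x'), ('e', 'x', 't')
```

The resulting strings can be counted or used directly as feature names for a classifier.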
And if anyone is wondering whether NLTK supports certain
functionality, you might try consulting the documentation index here:
http://nltk.googlecode.com/svn/trunk/doc/api/identifier-index.html
-Steven