Using n-grams in a Document

Tom Fawcett

Apr 8, 2013, 5:04:16 PM
to pattern-f...@googlegroups.com
Hi. I’m using pattern.vector.Corpus.build to create a Corpus from a set of files. I’d like to use n-grams of words instead of just single words. The function pattern.en.ngrams generates n-grams but I don’t see any way to hook it into Document or Corpus. Any advice?

Thanks,
-Tom

Tom De Smelt

Apr 15, 2013, 6:41:46 PM
to pattern-f...@googlegroups.com
Hi Tom,

Instead of passing a string to Document, you should be able to simply pass the output of ngrams() (a list of tuples of consecutive words) to Document. Have you tried:

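from pattern.vector import Document
from pattern.en import ngrams

# ngrams() returns a list of tuples of consecutive words; pass n=2 for bigrams.
d = Document(ngrams('the black cat sat on the mat'))
# print() with parentheses works on both Python 2 and Python 3.
print(d.vector)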

?
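
To make it clearer what would end up being counted, here is a rough, standalone sketch of word n-grams in plain Python. It does not use Pattern and is not how pattern.en.ngrams is implemented; the helper names word_ngrams and ngram_counts are made up for illustration, and the tokenization is just a lowercase whitespace split.

```python
from collections import Counter

def word_ngrams(text, n=2):
    """Return a list of n-grams (tuples of n consecutive words) from text.

    Hypothetical helper: splits on whitespace only, no punctuation handling.
    """
    words = text.lower().split()
    # Slide a window of size n over the word list; yields [] if text is too short.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_counts(text, n=2):
    """Count how often each n-gram occurs, one tuple per feature."""
    return Counter(word_ngrams(text, n))

bigrams = word_ngrams('the black cat sat on the mat', n=2)
counts = ngram_counts('the black cat sat on the mat', n=2)
print(bigrams)
print(counts)
```

Each tuple acts as a single feature, so a document built from such a list would be counting word pairs instead of single words.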