Hi,
the attached file contains an example.
The "to_unicode_or_bust" function decodes a UTF-8 byte string into a
Unicode string. uString is an example sentence in German.
The tokenizers are imported from nltk. The text is first split into
sentences, and each sentence is then tokenized into words.
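The attachment isn't reproduced here, so what follows is only a sketch of what such a helper usually looks like, written for Python 3, where str is already Unicode and byte strings are turned into str with bytes.decode. The function body and the sample sentence are assumptions, not the code from the attachment; the UTF-8 default mirrors the description above.

```python
def to_unicode_or_bust(obj, encoding="utf-8"):
    """Return obj as a Unicode str if it is a byte string; otherwise unchanged."""
    if isinstance(obj, bytes):
        # errors="strict" is the default, so invalid UTF-8 raises
        # UnicodeDecodeError instead of being silently replaced ("or bust").
        obj = obj.decode(encoding)
    return obj


# Hypothetical German input, as it might arrive from a file opened in binary mode.
raw = "Die Straße ist nass.".encode("utf-8")
uString = to_unicode_or_bust(raw)
print(type(uString).__name__, uString)  # prints: str Die Straße ist nass.
```

Objects that are already str (or not text at all) pass through untouched, so the helper can safely be called on anything.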
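The two-step tokenization can be sketched as follows. To avoid depending on downloaded model data, this sketch swaps in an untrained PunktSentenceTokenizer and the regex-based WordPunctTokenizer; it is an assumption, not necessarily what the attached file does. With the Punkt data installed via nltk.download, nltk.sent_tokenize(text, language="german") and nltk.word_tokenize(text, language="german") are the usual choices and also know common German abbreviations. The helper name tokenize_text is hypothetical.

```python
from nltk.tokenize import PunktSentenceTokenizer, WordPunctTokenizer


def tokenize_text(text):
    """Split text into sentences, then each sentence into word tokens (hypothetical helper)."""
    # An untrained Punkt tokenizer needs no model data: it breaks at
    # sentence-final punctuation but knows no German abbreviations.
    sentence_tokenizer = PunktSentenceTokenizer()
    # WordPunctTokenizer uses the regex \w+|[^\w\s]+, so runs of letters
    # (including umlauts and ß) and runs of punctuation become separate tokens.
    word_tokenizer = WordPunctTokenizer()
    return [word_tokenizer.tokenize(s) for s in sentence_tokenizer.tokenize(text)]


uString = "Die Straße ist nass. Es hat den ganzen Tag geregnet."
for words in tokenize_text(uString):
    print(words)
```

The result is a list of sentences, each of which is a list of token strings.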
The result is then written back to a file.
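Writing the result back can be sketched like this, assuming one tokenized sentence per line with tokens separated by spaces; the output layout, the helper name write_tokens and the file name are hypothetical. Passing encoding="utf-8" to open keeps umlauts and ß intact regardless of the platform's default encoding.

```python
import os
import tempfile


def write_tokens(sentences, path):
    """Write one tokenized sentence per line, tokens joined by single spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens in sentences:
            f.write(" ".join(tokens) + "\n")


sentences = [["Die", "Straße", "ist", "nass", "."], ["Es", "regnet", "."]]
# A throwaway directory so the example does not overwrite anything.
path = os.path.join(tempfile.mkdtemp(), "tokens.txt")
write_tokens(sentences, path)

with open(path, encoding="utf-8") as f:
    print(f.read())
```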
Enjoy,
Mirko
> --
> You received this message because you are subscribed to the Google Groups
> "nltk-users" group.