I want to tokenize the paragraphs in a block of text. I searched the internet for "NLTK tokenize paragraphs" and found a few posts referencing NLTK, but I cannot find a paragraph tokenizer module on the NLTK site. Is there, or was there, such a module? If not, I would appreciate any references on how to tokenize paragraphs.

Cheers, BobS
You received this message because you are subscribed to the Google Groups "nltk-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nltk-users+...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/nltk-users/c64a027c-81e8-413b-b767-101f2c085166n%40googlegroups.com.
Julius, thanks for the suggestions. Splitting on newlines will likely work, and I will give it a try. I am working with transcripts from corporate earnings calls, so the text structure is conventional. I want to extract the paragraphs that contain any keyword from a list (solar, wind, renewable, etc.). I can already do this for sentences, but not yet for paragraphs, which would provide more context.
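For anyone finding this thread later, here is a minimal sketch of the approach described above, assuming paragraphs in the transcript are separated by blank lines. The function names, the sample text, and the whole-word, case-insensitive matching are my own assumptions for illustration, not something from NLTK or from the actual transcripts.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs on blank lines (assumed paragraph separator)."""
    # A "blank" line may still hold stray spaces or tabs, so allow
    # any whitespace between the two newlines.
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


def paragraphs_with_keywords(text, keywords):
    """Return the paragraphs that contain any keyword as a whole word."""
    # re.escape guards against keywords containing regex metacharacters;
    # \b keeps "wind" from matching inside "rewinding".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [p for p in split_paragraphs(text) if pattern.search(p)]


# Hypothetical sample text standing in for an earnings call transcript.
sample = (
    "Operator: Good morning and welcome.\n\n"
    "CEO: Our solar portfolio grew this quarter.\n"
    "Wind projects remain on schedule.\n\n"
    "CFO: Operating margins were flat.\n\n"
    "CEO: We are rewinding the legacy program."
)

keywords = ["solar", "wind", "renewable"]
matches = paragraphs_with_keywords(sample, keywords)
for paragraph in matches:
    print(paragraph)
    print("-" * 20)
```

If the transcripts put each paragraph on a single line with no blank lines in between, splitting with `text.splitlines()` would be the equivalent first step. Note that whole-word matching will miss plurals such as "renewables" unless they are added to the keyword list. NLTK itself also ships `nltk.tokenize.blankline_tokenize`, which splits on blank lines in much the same way as `split_paragraphs` here.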