Sentence classification by verb form, tense, activity state, etc?

Ramon

Feb 26, 2013, 11:25:57 AM

to nltk-...@googlegroups.com

Hello --

I'm new to NLTK and am trying to determine whether it can be (easily) used to do basic sentence classification based on tense, form, etc. Here are some simple examples of what I'm trying to do:

Verb Form

Active Form: I updated my web page.
Passive Form: My web page was updated.

Do vs Be (not sure how this is formally described in grammar):

Be / State-focused: My web page is updated.
Do / Action-focused: My web page was updated.

Tense:

Past Tense: I updated my web page.
Present Tense: I am updating my web page.
Future Tense: I will update my web page.

I'd like to feed in a sentence like "My web page was updated" and have it be tagged (Passive, Action-focused, Past Tense).

It has been years since I looked at anything NLP related so I'm not even sure if this problem fits into the traditional sentence classification space or has its own name. Is this doable with NLTK?

Thank you in advance for your help.

Ramon

Jacob Perkins

Feb 27, 2013, 10:32:10 AM

to nltk-...@googlegroups.com

Hi Ramon,

Based on your examples, I think it would work to use a part-of-speech tagger to tag a sentence, then look for specific kinds of tags. The Brown corpus has a wider variety of tags than the treebank, including separate tags for each form of "to be", so I'd think training a tagger on that would be the best option. For example, the BEM, BER, and BEZ tags (am, are, is) would indicate a present tense Be form, BED and BEDZ (were, was) would indicate a past tense Be form, and no BE* tag would indicate that the sentence is not in the Be form. You can find a list of all the Brown corpus tags here: https://en.wikipedia.org/wiki/Brown_Corpus. And for training a tagger, I recommend using the train_tagger.py script in https://github.com/japerk/nltk-trainer
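To make the "look for specific kinds of tags" idea concrete, here is a minimal sketch of what that rule-based step might look like. It assumes the sentence has already been tagged with Brown-style tags (the tags below are written by hand for illustration, not produced by a tagger), and `classify` is a hypothetical helper, not part of NLTK. It only covers voice and tense; tags alone do not separate "is updated" as a state from "was updated" as an action.

```python
# Brown-tagset labels for forms of "to be".
BE_PRESENT = {"BEM", "BER", "BEZ"}  # am, are, is
BE_PAST = {"BED", "BEDZ"}           # were, was
BE_ANY = BE_PRESENT | BE_PAST | {"BE", "BEG", "BEN"}  # plus be, being, been


def classify(tagged):
    """Return (voice, tense) for a list of (word, Brown tag) pairs.

    Hypothetical helper: a rough heuristic, not a full grammatical analysis.
    """
    voice, seen_be = "active", False
    for _, tag in tagged:
        if tag in BE_ANY:
            seen_be = True
        elif tag == "VBN" and seen_be:
            # A "to be" form followed by a past participle suggests passive.
            voice = "passive"

    tags = {tag for _, tag in tagged}
    if any(tag == "MD" and word.lower() in {"will", "shall"}
           for word, tag in tagged):
        tense = "future"
    elif tags & BE_PAST or "VBD" in tags:
        tense = "past"
    elif tags & BE_PRESENT or tags & {"VB", "VBZ"}:
        tense = "present"
    else:
        tense = "unknown"
    return voice, tense


# Hand-tagged example (assumed tags, as a Brown-trained tagger might emit).
sentence = [("My", "PP$"), ("web", "NN"), ("page", "NN"),
            ("was", "BEDZ"), ("updated", "VBN"), (".", ".")]
print(classify(sentence))  # ('passive', 'past')
```

In practice the (word, tag) pairs would come from a tagger trained on the Brown corpus. Real Brown tags can also carry suffixes or be compound (for example negated or contracted forms), which this sketch does not handle.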

Jacob

---

http://streamhacker.com

http://text-processing.com

http://twitter.com/japerk

Blake Griffith

Sep 7, 2016, 5:11:42 AM

to nltk-users

Hello, I came across this while looking to do something similar.

I want to identify passive voice sentences, like Ramon. But then I want to change those sentences to active voice.

I want to do this with very contemporary data sources, such as social media. Would the Brown corpus work for this even though it is so old?
