Sentence classification by verb form, tense, activity state, etc?

Ramon

Feb 26, 2013, 11:25:57 AM
to nltk-...@googlegroups.com
Hello --

I'm new to NLTK and am trying to determine whether it can be (easily) used to do basic sentence classification based on tense, form, etc. Here are some simple examples of what I'm trying to do:

Verb Form
  • Active Form: I updated my web page.
  • Passive Form: My web page was updated.
Do vs Be (not sure how this is formally described in grammar):
  • Be / State-focused: My web page is updated.
  • Do / Action-focused: My web page was updated.
Tense:
  • Past Tense: I updated my web page.
  • Present Tense: I am updating my web page.
  • Future Tense: I will update my web page.
I'd like to feed in a sentence like "My web page was updated" and have it be tagged (Passive, Action-focused, Past Tense). 

It has been years since I looked at anything NLP-related, so I'm not even sure if this problem fits into the traditional sentence classification space or has its own name. Is this doable with NLTK?

Thank you in advance for your help.

Ramon

Jacob Perkins

Feb 27, 2013, 10:32:10 AM
to nltk-...@googlegroups.com
Hi Ramon,

Based on your examples, I think it would work to use a part-of-speech tagger to tag a sentence, then look for specific kinds of tags. The Brown corpus has a wider variety of tags than the treebank, and separate tags for "to be" verbs, so I'd think training a tagger on it would be the best option. For example, the BEZ, BER, and BEM tags (is, are, am) would indicate the present-tense Be form, the BEDZ and BED tags (was, were) would indicate the past-tense Be form, and no BE* tag would indicate not the Be form. You can find a list of all the Brown corpus tags here: https://en.wikipedia.org/wiki/Brown_Corpus. And for training a tagger, I recommend using the train_tagger.py script in https://github.com/japerk/nltk-trainer
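
As a rough illustration of the "tag the sentence, then look for specific tags" idea, here is a minimal sketch in plain Python. It assumes the sentence has already been tagged with Brown-corpus tags (for example by a tagger trained on the Brown corpus) and arrives as a list of (word, tag) pairs. The function names, the tag groupings, and the rules themselves are my own assumptions for illustration, not something from nltk-trainer or NLTK, and the rules are deliberately crude: "a form of be followed by a past participle" is treated as passive, and the state-focused vs. action-focused distinction is not attempted, because the tags alone do not separate the two.

```python
# Brown-corpus tag groups, hand-picked for this sketch (an assumption,
# not an official grouping).
PRESENT_BE = {"BEM", "BER", "BEZ"}                     # am, are, is
PAST_BE = {"BED", "BEDZ"}                              # were, was
ALL_BE = PRESENT_BE | PAST_BE | {"BE", "BEG", "BEN"}   # plus be, being, been
PAST_VERBS = {"VBD", "HVD", "DOD"}                     # updated, had, did
PRESENT_VERBS = {"VB", "VBZ", "HV", "HVZ", "DO", "DOZ"}
SKIPPABLE = {"RB", "QL", "*"}                          # adverbs, qualifiers, "not"
FUTURE_MODALS = {"will", "shall", "'ll"}


def base_tag(tag):
    """Drop Brown suffixes such as '-TL' or a trailing '*' (negation)."""
    stripped = tag.split("-")[0].rstrip("*")
    return stripped or tag


def is_passive(tags):
    """Heuristic: a form of 'be' followed (after adverbs/'not') by a VBN."""
    for i, tag in enumerate(tags):
        if tag in ALL_BE:
            j = i + 1
            while j < len(tags) and tags[j] in SKIPPABLE:
                j += 1
            if j < len(tags) and tags[j] == "VBN":
                return True
    return False


def get_tense(words, tags):
    """Heuristic: will/shall means future; otherwise the first verb decides."""
    for word, tag in zip(words, tags):
        if tag == "MD" and word in FUTURE_MODALS:
            return "future"
    for tag in tags:
        if tag in PAST_BE or tag in PAST_VERBS:
            return "past"
        if tag in PRESENT_BE or tag in PRESENT_VERBS:
            return "present"
    return "unknown"


def classify(tagged):
    """Classify a sentence given as [(word, brown_tag), ...]."""
    words = [word.lower() for word, _ in tagged]
    tags = [base_tag(tag) for _, tag in tagged]
    return {
        "voice": "passive" if is_passive(tags) else "active",
        "tense": get_tense(words, tags),
        "has_be": any(tag in ALL_BE for tag in tags),
    }


sentence = [("My", "PP$"), ("web", "NN"), ("page", "NN"),
            ("was", "BEDZ"), ("updated", "VBN"), (".", ".")]
print(classify(sentence))
```

The quality of the result depends entirely on the tagger: if "updated" comes back as VBD instead of VBN, the passive check will miss it, so it is worth spot-checking the tagger output on your own sentences first.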

Jacob
---

Blake Griffith

Sep 7, 2016, 5:11:42 AM
to nltk-users
Hello, I came across this while looking to do something similar.

Like Ramon, I want to identify passive voice sentences. But then I want to change those sentences to active voice.

I want to do this with very contemporary data sources, such as social media. Would the Brown corpus work for this even though it is so old?