Is it possible to program a Grammar checker using the NLTK?


max77

Aug 4, 2017, 5:04:20 AM
to nltk-users
Hello community, 

I have really enjoyed the book so far and have a lot of great ideas on how to apply NLTK. One big question I have is: Is it possible (for an individual like myself) to create a program that checks whether a sentence is grammatically correct? I was speaking to a man who wrote an entire C++ problem-solving book, and he said that it would be very, very difficult. Well, here I am, and I want to know just how difficult a task like this would be.

Let me give you all a simple example:

The dog ran.                -This is a correct sentence.

The ran dog.                -Oops this isn't grammatically correct. 

If anyone is casually reading this post, I would love to hear your ideas on how to approach this. I know this has been discussed before, but that was many years ago.


Best Regards,

-Max

Dimitriadis, A. (Alexis)

Aug 4, 2017, 5:56:17 AM
to nltk-...@googlegroups.com
Hi Max,

Since you’ve already had an expert answer in principle, and this is the nltk-users list, here are a couple of nltk remarks to get you started:

- You can use the nltk’s CFG class (in the nltk.grammar module) to write rules for simple grammars, which you can pass to a chart parser (or another type of parser) to get a recognizer for the sentences that match the grammar you defined. You can then take any of the nltk’s many corpora of actual English text (e.g., the Brown corpus) and check how many of its sentences your parser can accept. (Spoiler: The proportion will stay very close to zero no matter how many rules you write. Language is very complex.)

- You can separately download the Stanford parser, a “statistical parser” designed to parse real text, and use it from within the nltk. The Stanford parser doesn’t declare sentences as ungrammatical, but suppose it did? You can get a feel for how accurate it would be by looking at how often it makes mistakes with middling-complex grammatical sentences: I believe you’ll find enough errors that you wouldn’t want to trust it as the judge of what is ungrammatical. 
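
To make the first remark concrete, here is a minimal sketch of a toy grammar and chart parser applied to the two example sentences from the original question. The grammar rules and the helper name `is_accepted` are assumptions made up for illustration; the grammar covers only three words, which is exactly why this approach does not scale to real text.

```python
import nltk

# Toy grammar (an illustrative assumption): it only knows "the", "dog", "ran".
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V
    Det -> 'the'
    N -> 'dog'
    V -> 'ran'
""")
parser = nltk.ChartParser(grammar)

def is_accepted(sentence):
    """Return True if the toy grammar yields at least one parse."""
    # Very crude tokenization: lowercase, drop a final period, split on spaces.
    tokens = sentence.lower().rstrip(".").split()
    try:
        return any(True for _ in parser.parse(tokens))
    except ValueError:
        # Raised when a token is not in the grammar's lexicon.
        return False

print(is_accepted("The dog ran."))  # parses as S -> NP VP
print(is_accepted("The ran dog."))  # no parse: "ran" cannot be a noun here
```

Note that a sentence such as "The cat ran." is also rejected, simply because "cat" is missing from the lexicon. Rejection by a hand-written grammar therefore says more about the grammar's coverage than about the sentence.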

Anyway, seeing is believing. Try your hand with the nltk, and come back to the list if you have any questions about its offerings.

Best,


Alexis

PS Perhaps you weren’t thinking of yourself as the standard of difficulty? In that case your question is not about the nltk, and does not belong on the list. Outside the nltk, look at how poorly the “grammar checker” in Microsoft Word, or in any other software that offers one, performs. Think about why those well-funded companies don’t offer anything better…



Dr. Alexis Dimitriadis | Assistant Professor and Senior Research Fellow | Utrecht Institute of Linguistics OTS | Utrecht University | Trans 10, 3512 JK Utrecht, room 2.33 | +31 30 253 65 68 | a.dimi...@uu.nl | www.hum.uu.nl/medewerkers/a.dimitriadis


max77

Aug 8, 2017, 6:31:44 PM
to nltk-users
Thanks, Alexis, for the terrific feedback as always. I will think about the tools that you mentioned and continue to keep an open mind. Come to think of it, the only company I can think of that has really complex software is Grammarly. -Best

Alex Rudnick

Aug 8, 2017, 8:37:30 PM
to nltk-...@googlegroups.com
Excellent points, Alexis!

Another approach that might be interesting would be training an n-gram
model on words or perhaps on POS tags.

You could imagine a simple grammar checker that does something like this:

- POS tag the input sentence
- run the sequence of POS tags through a language model (where the
model was trained on lots of grammatically correct tagged sentences).

Now if the model thinks it's a very unlikely sequence, you have some
evidence that it's not a grammatical sentence?

Clearly that wouldn't get you all the way there, but it could help. I
think grammar checking is a big enough problem to require lots of
different inputs.
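
A rough sketch of the POS-tag idea above, in plain Python so it runs without downloading any models. Everything here is an assumption for illustration: the four-sentence `tagged_corpus` is made up, the tags are written by hand rather than produced by a tagger such as `nltk.pos_tag`, and the bigram model with add-one smoothing is just one simple choice of language model.

```python
import math
from collections import Counter

# Hand-tagged toy training data (hypothetical). A real system would use
# a large corpus of tagged, grammatical sentences instead.
tagged_corpus = [
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "DET", "NOUN"],
]

def train_bigrams(sequences):
    """Count tag contexts and tag bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for tags in sequences:
        padded = ["<s>"] + tags + ["</s>"]
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams

def avg_log_prob(tags, contexts, bigrams, vocab_size):
    """Average per-transition log-probability under add-one smoothing."""
    padded = ["<s>"] + tags + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total / (len(padded) - 1)

contexts, bigrams = train_bigrams(tagged_corpus)
# Possible next symbols: every tag seen in training plus the end marker.
vocab_size = len({t for seq in tagged_corpus for t in seq} | {"</s>"})

good = avg_log_prob(["DET", "NOUN", "VERB"], contexts, bigrams, vocab_size)  # "The dog ran."
bad = avg_log_prob(["DET", "VERB", "NOUN"], contexts, bigrams, vocab_size)   # "The ran dog."
print(good, bad)  # the second score is lower, i.e. a less likely tag sequence
```

A low score is only weak evidence of a problem. An automatic tagger may also mis-tag the words of an ungrammatical sentence, and any cutoff between "likely" and "unlikely" would have to be tuned on real data.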


--
-- alexr