Maximum Likelihood Estimation bigrams

JAGANADH G

May 18, 2012, 3:09:53 PM

to nltk-users

Hi All

How can we compute the maximum likelihood estimate (MLE) for bigrams? Is there an example?

Is the code below the correct way to get the estimate?

mle[bigram] = big_freq[bigram] / uni_freq[bigram.split(" ")[0]]

--
**********************************
JAGANADH G
http://jaganadhg.in
ILUGCBE
http://ilugcbe.org.in

Alex Rudnick

May 20, 2012, 1:13:43 AM

to nltk-...@googlegroups.com

Hey Jaganadh,

So "maximum likelihood estimate" just means "what is the estimate that
you could use to maximize the likelihood of the observation that we
got", in effect believing your data even though you might be (probably
are) overfitting. It means you're not doing any smoothing of your
distribution.

So what do you want to calculate? If I understand your variable names
right, it looks like you're doing something sensible: the proportion
of the times the bigram appears out of the times the first word in the
bigram appears.
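
For illustration, here is a minimal sketch of that proportion in Python. It is not the original poster's full program: the helper name `bigram_mle`, the toy sentence, and the use of `(w1, w2)` tuples as keys (instead of space-joined strings) are assumptions made for the example. It also counts a word only where it starts a bigram, so the estimates for each first word sum to 1. Note that under Python 2, dividing two integer counts with `/` truncates to 0, which is why the sketch converts the denominator to a float.

```python
from collections import Counter

def bigram_mle(words):
    """Return {(w1, w2): count(w1, w2) / count(w1)} for a list of tokens."""
    bigram_counts = Counter(zip(words, words[1:]))
    # Count each word only where it starts a bigram (every token but the last).
    first_counts = Counter(words[:-1])
    return {
        # float() avoids integer division truncating to 0 on Python 2.
        (w1, w2): count / float(first_counts[w1])
        for (w1, w2), count in bigram_counts.items()
    }

# Toy data, purely illustrative.
words = "the cat sat on the mat".split()
mle = bigram_mle(words)
print(mle[("the", "cat")])
```

In this toy sentence "the" starts two bigrams ("the cat" and "the mat"), so each of them gets half of the probability mass, and any bigram that never occurred gets no entry at all. That last point is the "believing your data" behaviour described above.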

--
-- alexr

JAGANADH G

May 20, 2012, 3:25:36 AM

to nltk-...@googlegroups.com

Hi Alex

If I apply smoothing it will be like this right:

mle[bigram] = (big_freq[bigram] + 1) / (uni_freq[bigram.split(" ")[0]] + len(set(words)))

I came across the MLE algorithm in the book Natural Language Processing with Perl and Prolog, and I was just trying to practice it quickly in Python. Is my implementation correct or not? (I am weak in statistics :-( which is why I asked.)

Best regards

Jagan

Alex Rudnick

May 20, 2012, 10:30:25 PM

to nltk-...@googlegroups.com

On Sun, May 20, 2012 at 12:25 AM, JAGANADH G <jaga...@gmail.com> wrote:
> If I apply smoothing it will be like this right:
> mle[bigram] = (big_freq[bigram] + 1) / (uni_freq[bigram.split(" ")[0]] +
> len(set(words)))

That's one way to do smoothing, yes! (there are others)
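
The formula quoted above is usually called add-one (Laplace) smoothing: add 1 to every bigram count and add the vocabulary size to the denominator. Strictly speaking the result is no longer a maximum likelihood estimate, so a name other than `mle` would be clearer. A minimal sketch, assuming the same toy sentence and tuple keys as an illustration (the helper name `bigram_add_one` is hypothetical, not from the thread):

```python
from collections import Counter

def bigram_add_one(words):
    """Return a function prob(w1, w2) using add-one (Laplace) smoothing."""
    bigram_counts = Counter(zip(words, words[1:]))
    first_counts = Counter(words[:-1])
    vocab_size = len(set(words))  # same role as len(set(words)) in the thread

    def prob(w1, w2):
        # Counter returns 0 for unseen keys, so unseen bigrams get 1 / (count(w1) + V).
        return (bigram_counts[(w1, w2)] + 1.0) / (first_counts[w1] + vocab_size)

    return prob

# Toy data, purely illustrative.
words = "the cat sat on the mat".split()
prob = bigram_add_one(words)
print(prob("the", "cat"))  # seen bigram
print(prob("cat", "mat"))  # unseen bigram, still gets a non-zero probability
```

Returning a function rather than a dictionary is a design choice for the example: smoothed probabilities are defined for every pair of vocabulary words, including pairs that never appeared in the data.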

> I came across the MLE algorithm in the book Natural Language Processing with
> Perl and Prolog, and I was just trying to practice it quickly in Python. Is my
> implementation correct or not? (I am weak in statistics :-( which is why I
> asked.)

It seems like it might be correct! A little more explanation about
what you're trying to do and why might help, though!

Cheers,

--
-- alexr

JAGANADH G

May 21, 2012, 2:14:22 PM

to nltk-...@googlegroups.com

Hi Alex

The purpose is twofold:

1) To practice turning equations and algorithms into working code.

2) To find patterns in short texts such as tweets that can be related to a particular topic/concept.