Weighting techniques in PyMC

418 views
Skip to first unread message

Postman

unread,
Sep 23, 2010, 10:04:07 AM9/23/10
to PyMC
I'm new to the group and am doing some sports modeling using pymc. I
was able to replicate the WinBUGS code in this paper:
http://www.statistica.it/gianluca/BaioBlangiardo.pdf but want to
extend it a little further. What they did was model the Italian Serie
A 1991-1992 soccer (football) league using a simple Poisson model. But
this fits the entire season. What I want to do is apply a weighting
vector to the games so that I weight the most recent games more
heavily than the ones int he past. This is quite common and I have
this in another multinomial logit model (also in Python).

My code is below (let me know if you need the data). Can someone help
me understand how to apply a weighting scheme (vector) to my model?

Thanks in advance,
Postman

----------------------
Python code:
# prior on the home effect
home = Normal('home', mu=0.0, tau=precision, value=0.0)

# priors on the random effects
mu_att = Normal('mu_att', mu=0.0, tau=0.0001, value=0.0)
mu_def = Normal('mu_def', mu=0.0, tau=0.0001, value=0.0)
tau_att = Gamma('tau_att', 0.01, 0.01)
tau_def = Gamma('tau_def', 0.01, 0.01)

att_star = Normal('att_star', mu=mu_att, tau=tau_att,
value=np.zeros(nteams))
def_star = Normal('def_star', mu=mu_def, tau=tau_def,
value=np.zeros(nteams))

# Trick to get sum of attack, defense = 0
attack = Lambda('attack', lambda att_star=att_star: att_star -
np.mean(att_star))
defense = Lambda('defense', lambda def_star=def_star: def_star -
np.mean(def_star))

theta1 = Lambda('theta1', lambda home=home, attack=attack,
defense=defense: np.exp(home + attack[home_team] +
defense[away_team]))
theta2 = Lambda('theta2', lambda attack=attack, defense=defense:
np.exp(attack[away_team] + defense[home_team]))

y1 = Poisson('y1', mu=theta1, value=home_goals, observed=True)
y2 = Poisson('y2', mu=theta2, value=away_goals, observed=True)

Chris Fonnesbeck

unread,
Sep 24, 2010, 10:39:06 AM9/24/10
to PyMC
On Sep 23, 9:04 am, Postman <postman...@gmail.com> wrote:
> I'm new to the group and am doing some sports modeling using pymc. I
> was able to replicate the WinBUGS code in this paper:http://www.statistica.it/gianluca/BaioBlangiardo.pdfbut want to
> extend it a little further. What they did was model the Italian Serie
> A 1991-1992 soccer (football) league using a simple Poisson model. But
> this fits the entire season. What I want to do is apply a weighting
> vector to the games so that I weight the most recent games more
> heavily than the ones int he past. This is quite common and I have
> this in another multinomial logit model (also in Python).
>
> My code is below (let me know if you need the data). Can someone help
> me understand how to apply a weighting scheme (vector) to my model?
>
> Thanks in advance,
> Postman
>

Hi Postman,

Its not clear how a weighting scheme would work here, as I do not see
any sort of function that aggregates or integrates over the entire
season. Are you trying to use this to predict future games? Perhaps an
autoregressive or ARIMA model would be most appropriate, if so.

cf

Postman

unread,
Sep 24, 2010, 11:13:39 AM9/24/10
to PyMC
Chris,

Thanks for the quick response. Why don't I start with a simple example
using linear regression. In the examples there is a model called
"Simple Pricing Model". I have copied it below, and added a simple
weighting function. I want to solve the simple equation y = a + b*x,
but I want to weight it with the weight array I defined below (where
there is almost no weight on the first x,y pair and full weight on the
last (linear slope in between). In R I would simply call the function
lm by:

results = lm(price~age, weight=weight)

This will minimize the Sum(weights * Residuals**2). I could code this
from scratch where I find the weighted mean for age & price, and then
find the residuals, then multiply the weights by the residuals. This
is quite common in regression and we use it all the time to weight
more recent events more heavily than old ones.

So how could I modify the code below to incorporate the weights array
I defined? I thought using a Potential variable would do it, but I
can't find any examples anywhere. The text in the User's Guide helps
to explain the purpose of Potentials, but doesn't show how to use
them.

Thanks again,
Gary Post

______________________

from pymc import *
from numpy import *

# Data
age = array([13, 14, 14,12, 9, 15, 10, 14, 9, 14, 13, 12, 9, 10, 15,
11, 15, 11, 7, 13, 13, 10, 9, 6, 11, 15, 13, 10, 9, 9, 15, 14, 14, 10,
14, 11, 13, 14, 10])
price = array([2950, 2300, 3900, 2800, 5000, 2999, 3950, 2995, 4500,
2800, 1990, 3500, 5100, 3900, 2900, 4950, 2000, 3400, 8999, 4000,
2950, 3250, 3950, 4600, 4500, 1600, 3900, 4200, 6500, 3500, 2999,
2600, 3250, 2500, 2400, 3990, 4600, 450,4700])/1000.

weight = array([i/float(sum(range(len(age)))) for i in
range(len(age))])

# Constant priors for parameters
a = Normal('a', 0, 0.0001)
b = Normal('b', 0, 0.0001)

# Precision of normal distribution of prices
tau = Gamma('tau', alpha=0.1, beta=0.1)

@deterministic
def mu(x=age, a=a, b=b):
# Linear age-price model
return a + b*x

# Sampling distribution of prices
p = Normal('p', mu, tau, value=price, observed=True)


On Sep 24, 9:39 am, Chris Fonnesbeck <fonnesb...@gmail.com> wrote:
> On Sep 23, 9:04 am, Postman <postman...@gmail.com> wrote:
>
> > I'm new to the group and am doing some sports modeling using pymc. I
> > was able to replicate the WinBUGS code in this paper:http://www.statistica.it/gianluca/BaioBlangiardo.pdfbutwant to

Chris Fonnesbeck

unread,
Sep 24, 2010, 12:08:19 PM9/24/10
to py...@googlegroups.com
On Fri, Sep 24, 2010 at 10:13 AM, Postman <postm...@gmail.com> wrote:
> Chris,
>
> Thanks for the quick response. Why don't I start with a simple example
> using linear regression. In the examples there is a model called
> "Simple Pricing Model". I have copied it below, and added a simple
> weighting function. I want to solve the simple equation y = a + b*x,
> but I want to weight it with the weight array I defined below (where
> there is almost no weight on the first x,y pair and full weight on the
> last (linear slope in between). In R I would simply call the function
> lm by:
>
> results = lm(price~age, weight=weight)
>

OK, I see what you are getting at. What you need is to code a weighted
likelihood, as PyMC does not have any currently defined. This would
involve adding, for example, a weight term to the variance of the
normal likelihood. This is probably something PyMC should have, but
until now we have not had a call for weighted regression. Gelman
outlines an approach in this paper:
http://www.stat.columbia.edu/~gelman/research/published/aiepub.pdf

Again, though, if you are really interested in predicting scores based
on past results weighted by recency (as opposed to weights that are
not necessarily decreasing over time), it would be way easier to use
an autoregessive model, as it is easily coded in PyMC.

cf
--
Chris Fonnesbeck
Department of Biostatistics
Vanderbilt University School of Medicine, T-2303 MCN
1161 21st Avenue South, Nashville, TN  37232-2158
(615) 936-0317

Postman

unread,
Sep 24, 2010, 12:39:51 PM9/24/10
to PyMC
Thanks, Chris. As it appears coding a weighted likelihood function is
probably beyond my current skills, could you point me to some examples
using autoregression. I have used autoregression in other models, but
not in PyMC. And there don't seem to be a whole lot of examples out
there.

Cheers,
Gary

On Sep 24, 11:08 am, Chris Fonnesbeck <fonnesb...@gmail.com> wrote:
Reply all
Reply to author
Forward
0 new messages