Multivariate Multilevel Negative Binomial Model


Jason Tilley

Jan 30, 2016, 5:44:15 PM
to stan-...@googlegroups.com
I have made some progress on my multivariate multilevel negative binomial model. I thought I'd share what I have and see if there are any suggestions. I have attached an HTML export of the IPython notebook if you'd rather look at that.


Here is what I have for my model so far. First, make the imports and define a Python function for Stan's negative binomial parameterization:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pystan
from scipy.special import binom

def stan_negbin(theta, alpha, beta):
    # pmf of Stan's neg_binomial(alpha, beta) evaluated at counts theta
    term1 = binom(theta + alpha - 1, alpha - 1)
    term2 = (beta / (beta + 1)) ** alpha
    term3 = (1 / (beta + 1)) ** theta
    return term1 * term2 * term3
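As a quick sanity check, this function can be compared against SciPy's nbinom, which matches Stan's (alpha, beta) parameterization when n = alpha and p = beta / (beta + 1) (the alpha and beta values below are arbitrary test values):

```python
import numpy as np
from scipy.special import binom
from scipy.stats import nbinom

def stan_negbin(theta, alpha, beta):
    # pmf of Stan's neg_binomial(alpha, beta) at counts theta
    term1 = binom(theta + alpha - 1, alpha - 1)
    term2 = (beta / (beta + 1)) ** alpha
    term3 = (1 / (beta + 1)) ** theta
    return term1 * term2 * term3

k = np.arange(50)           # enough support for these parameter values
alpha, beta = 3.5, 2.2      # arbitrary test values
ours = stan_negbin(k, alpha, beta)
ref = nbinom.pmf(k, alpha, beta / (beta + 1.0))
assert np.allclose(ours, ref)
# the mean of this parameterization is alpha / beta
assert np.isclose((k * ours).sum(), alpha / beta, atol=1e-3)
```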

Now let's create some fake data. EDIT: I have fixed a few mistakes and lowered the sample size compared to my earlier code.

# fake data
N = 60  # number per group (must be a multiple of ten)
G = 3   # number of groups

# array of temperatures, one group centered every 4 degrees
base_temp = 25
temp_1 = np.random.normal(base_temp - 4, 0.1, N)  # group 1
temp_2 = np.random.normal(base_temp, 0.1, N)      # group 2
temp_3 = np.random.normal(base_temp + 4, 0.1, N)  # group 3
temp = np.concatenate([temp_1, temp_2, temp_3])

# array of hours past sunrise (10 per day); integer division for Python 3
hours = np.tile(np.arange(0, 10), N * G // 10)

prob_param = temp * 0.1 + hours * 0.5

prey_1 = np.empty(N * G)
prey_2 = np.empty(N * G)

for i in range(N * G):
    # number of prey 1; renormalize the truncated pmf so it sums to 1
    p1 = stan_negbin(np.arange(100), prob_param[i], 2.2)
    prey_1[i] = np.random.choice(100, 1, p=p1 / p1.sum())
    # number of prey 2
    p2 = stan_negbin(np.arange(100), prob_param[i] - 0.1, 2.2)
    prey_2[i] = np.random.choice(100, 1, p=p2 / p2.sum())

group = np.repeat([1, 2, 3], N)

y = np.column_stack([prey_1, prey_2])
x = np.column_stack([temp, hours])

Now let's look at the distribution of prey item 1 for each temperature group.

ax1 = plt.subplot(311)
ax1.hist(prey_1[group==1], np.arange(0,11,1))
ax1.annotate('temp=21', xy=(6,16))
ax1.set_ylim(0,25)

ax2 = plt.subplot(312)
ax2.hist(prey_1[group==2], np.arange(0,11,1))
ax2.annotate('temp=25', xy=(6,16))
ax2.set_ylim(0,25)

ax3 = plt.subplot(313)
ax3.hist(prey_1[group==3], np.arange(0,11,1))
ax3.annotate('temp=29', xy=(6,16))
ax3.set_ylim(0,25)

Good. Feeding success shows a slightly increasing trend with temperature. Now let's look at the distribution of prey item 1 (summed) over time.

num_prey = np.empty(len(np.unique(hours)))
for i in range(0,len(num_prey)):
    num_prey[i] = prey_1[hours==i].sum()
    
plt.bar(np.arange(10), num_prey)

Once again, the data are behaving as intended: as the day progresses, prey items accumulate in the predator's stomach. Now let's prepare the data for the PyStan model. This is done by creating a dictionary of variables to pass into the data block of the Stan code:

model_dat = {'N': len(y),
             'X': np.shape(x)[1],
             'Y': np.shape(y)[1],
             'L': len(np.unique(group)),
             'y': y.astype(int),
             'x': x,
             'random': group}
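One small check I find useful: verifying that the dictionary's shapes line up with what the data block declares (the dummy arrays below just stand in for the real prey counts and covariates):

```python
import numpy as np

N, G = 60, 3
y = np.random.poisson(2, size=(N * G, 2))   # dummy counts standing in for prey_1/prey_2
x = np.random.normal(size=(N * G, 2))       # dummy temp/hours covariates
group = np.repeat([1, 2, 3], N)

model_dat = {'N': len(y),
             'X': x.shape[1],
             'Y': y.shape[1],
             'L': len(np.unique(group)),
             'y': y.astype(int),
             'x': x,
             'random': group}

# shapes and ranges the Stan data block expects
assert model_dat['y'].shape == (model_dat['N'], model_dat['Y'])
assert model_dat['x'].shape == (model_dat['N'], model_dat['X'])
assert model_dat['random'].min() >= 1
assert model_dat['random'].max() == model_dat['L']
```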

Now you can set up the Stan model code. A full explanation of the modeling language is beyond the scope of this thread; for more information see: https://github.com/stan-dev/stan/releases/download/v2.9.0/stan-reference-2.9.0.pdf.

model_code = """
    data {
      int<lower=0> N; // rows of data
      int<lower=1> X; // number of covariates
      int<lower=1> Y; // number of responses
      int<lower=1> L; // number of groups
      // x is an array with N elements, each of which is a vector of X covariates
      vector[X] x[N]; // design matrix (predictors)
      int<lower=0> y[N,Y]; // response (outcomes)
      int<lower=1, upper=L> random[N];
    }
    transformed data {
      vector[Y] zeros;
      vector[L] zeros_u;
      
      zeros <- rep_vector(0, Y);
      zeros_u <- rep_vector(0, L);
    }
    parameters {
      vector[Y] intercept; // Population intercept
      matrix[Y,X] beta; // Population slope parameters
      cholesky_factor_corr[Y] omega;
      cholesky_factor_corr[L] omega_u;
      vector<lower=0>[Y] sigma; // population standard deviation
      matrix[L,Y] u; // random effects
      vector<lower=0>[L] sigma_u; // random effects SD
    }
    transformed parameters {
      // mu has Y rows (responses) and N columns (observations)
      matrix[Y,N] mu;
      
      // should be random effects for each y at each station
      matrix[L,Y] beta_0j; // random intercepts
      
      // negative binomial beta parameter for each group and response
      matrix<lower=0>[L,Y] NB_beta;
      // negative binomial alpha parameter for each group and response
      matrix<lower=0>[L,Y] alpha;
      
      //Varying intercepts definition
      for (i in 1:Y) {
          for (l in 1:L) {
            beta_0j[l,i] <- intercept[i] + u[l,i];
          }
      }
      
      // each response may have different slopes and intercepts
      for (i in 1:Y) { // for each response
        for (n in 1:N) { // for each individual
          mu[i,n] <- row(beta, i) * x[n] + beta_0j[random[n],i];
        }
      }
      
      // per group and response; indices hard-coded for N = 60 per group at the moment
      NB_beta[1,1]<-exp(mean(mu[1,1:60])); // log link
      alpha[1,1] <- NB_beta[1,1] * mean(mu[1,1:60]);
      NB_beta[2,1]<-exp(mean(mu[1,61:120])); // log link
      alpha[2,1] <- NB_beta[2,1] * mean(mu[1,61:120]);
      NB_beta[3,1]<-exp(mean(mu[1,121:180])); // log link
      alpha[3,1] <- NB_beta[3,1] * mean(mu[1,121:180]);
      NB_beta[1,2]<-exp(mean(mu[2,1:60])); // log link
      alpha[1,2] <- NB_beta[1,2] * mean(mu[2,1:60]);
      NB_beta[2,2]<-exp(mean(mu[2,61:120])); // log link
      alpha[2,2] <- NB_beta[2,2] * mean(mu[2,61:120]);
      NB_beta[3,2]<-exp(mean(mu[2,121:180])); // log link
      alpha[3,2] <- NB_beta[3,2] * mean(mu[2,121:180]);
    }
    model {
      matrix[Y,Y] Sigma;
      matrix[L,L] Sigma_u;
      
      to_vector(beta) ~ normal(0, 2);
      
      Sigma <- diag_pre_multiply(sigma, omega);
      Sigma_u <- diag_pre_multiply(sigma_u, omega_u);
      
      // Priors
      for (i in 1:Y)
          u[,i] ~ multi_normal_cholesky(zeros_u, Sigma_u);
          
      intercept ~ multi_normal_cholesky(zeros, Sigma);
      omega ~ lkj_corr_cholesky(4);
      omega_u ~ lkj_corr_cholesky(4);
      sigma ~ normal(0,2);
      sigma_u ~ normal(0,2);
      for (i in 1:Y) { // for each response
        for (l in 1:L) { // for each group
          alpha[l,i] ~ gamma(1,1);
          NB_beta[l,i] ~ gamma(1,1);
        }
      }

      // Data Model
      for (i in 1:Y) // for each response
          y[,i] ~ neg_binomial(alpha[random,i], NB_beta[random,i]);
    }
    """

Now finally we can run the model.

fit = pystan.stan(model_code=model_code, data=model_dat,
                  iter=1000, chains=2,
                  model_name='diet_model',
                  sample_file='/Users/<username>/Desktop/sample')

Now let's look at the fit (I've omitted the mu values):

fit

Inference for Stan model: diet_model_ed20cd8ba6c15435e436ff2e4dad6d32.
2 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=1000.

               mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
intercept[0]    0.4     0.2   2.28  -3.29  -0.66   0.04   0.86   7.15  131.0   1.02
intercept[1]    0.1    0.13   1.83  -4.14  -0.67   0.03   0.89    4.1  205.0    1.0
beta[0,0]      0.02    0.02   0.15  -0.24  -0.07   0.01   0.11   0.34   98.0   1.01
beta[1,0]      0.02    0.01   0.15  -0.26  -0.07   0.02   0.11   0.35   99.0   1.01
beta[0,1]      0.15    0.09   0.85  -1.71  -0.35   0.18   0.71   1.79   89.0   1.01
beta[1,1]      0.18    0.09   0.84  -1.69  -0.32   0.23    0.7   1.81   92.0   1.01
omega[0,0]      1.0     0.0    0.0    1.0    1.0    1.0    1.0    1.0 1000.0    nan
omega[1,0]   3.1e-4    0.02   0.34  -0.61  -0.26 2.6e-3   0.24   0.64  279.0    1.0
omega[0,1]      0.0     0.0    0.0    0.0    0.0    0.0    0.0    0.0 1000.0    nan
omega[1,1]     0.94  4.7e-3   0.08   0.73   0.91   0.97   0.99    1.0  259.0    1.0
omega_u[0,0]    1.0     0.0    0.0    1.0    1.0    1.0    1.0    1.0 1000.0    nan
omega_u[1,0]   0.03    0.02   0.33  -0.62  -0.21   0.04   0.28   0.64  334.0    1.0
omega_u[2,0] 7.0e-3    0.02   0.33  -0.61  -0.25   0.02   0.24   0.64  302.0    1.0
omega_u[0,1]    0.0     0.0    0.0    0.0    0.0    0.0    0.0    0.0 1000.0    nan
omega_u[1,1]   0.94  5.7e-3   0.08   0.72   0.92   0.97   0.99    1.0  213.0    1.0
omega_u[2,1]   0.02    0.02   0.31  -0.54  -0.21 -5.3e-4  0.23   0.66  334.0    1.0
omega_u[0,2]    0.0     0.0    0.0    0.0    0.0    0.0    0.0    0.0 1000.0    nan
omega_u[1,2]    0.0     0.0    0.0    0.0    0.0    0.0    0.0    0.0 1000.0    nan
omega_u[2,2]   0.88  6.8e-3    0.1   0.61   0.84   0.91   0.96    1.0  235.0   1.01
sigma[0]       1.67     0.1    1.3   0.05   0.63   1.37   2.39   5.04  180.0   1.01
sigma[1]       1.56    0.07   1.12    0.1   0.67   1.32   2.27   4.22  237.0    1.0
u[0,0]         0.01    0.12   1.62  -3.57  -0.56 9.9e-3   0.53   3.57  191.0    1.0
u[1,0]        -0.01    0.09   1.28  -2.79  -0.49  -0.01    0.4   3.14  217.0    1.0
u[2,0]         0.05     0.1   1.39  -3.04  -0.45   0.02    0.6   3.14  210.0    1.0
u[0,1]         0.24    0.09   0.89  -1.63  -0.17   0.12   0.61   2.59  108.0   1.02
u[1,1]        -0.06    0.07   0.69  -1.61  -0.35  -0.04   0.22    1.5  108.0   1.01
u[2,1]         0.07     0.1   0.93  -1.97  -0.28   0.04   0.43   2.19   94.0    1.0
sigma_u[0]     1.24    0.09   1.02   0.05   0.48   0.98   1.75   3.91  118.0   1.01
sigma_u[1]     1.09    0.07   0.94   0.04   0.35   0.84    1.5   3.45  186.0    1.0
sigma_u[2]     1.19    0.07   0.99   0.08   0.43   0.94   1.69   3.94  176.0    1.0
beta_0j[0,0]   0.42    0.23   2.76   -4.4   -1.0   0.11   1.38   7.27  150.0   1.01
beta_0j[1,0]   0.39    0.21   2.65  -4.07  -0.95   0.03   1.38    7.1  160.0   1.01
beta_0j[2,0]   0.46    0.22   2.72  -4.11  -1.11   0.16   1.46   7.39  148.0   1.01
beta_0j[0,1]   0.34    0.14   1.88  -3.63  -0.59   0.25   1.28   4.27  171.0   1.01
beta_0j[1,1]   0.04    0.14   1.92  -4.11  -0.88  -0.07    1.0    4.2  187.0    1.0
beta_0j[2,1]   0.17    0.16   2.13   -4.3  -0.89   0.12   1.24   4.89  175.0    1.0
NB_beta[0,0]   4.62    0.03   0.58    3.6   4.21   4.58   4.99   5.88  334.0    1.0
NB_beta[1,0]   3.76    0.02   0.43    3.0   3.46   3.74   4.05   4.69  334.0    1.0
NB_beta[2,0]   4.74    0.03   0.61   3.64   4.33   4.71   5.12   6.01  334.0    1.0
NB_beta[0,1]   5.08    0.04   0.66   3.85   4.62   5.05    5.5   6.44  334.0    1.0
NB_beta[1,1]   4.12    0.03   0.46    3.3   3.78   4.08    4.4   5.11  323.0    1.0
NB_beta[2,1]   5.15    0.04   0.65   4.02   4.67   5.12   5.58   6.51  334.0    1.0
alpha[0,0]      7.1    0.08   1.47    4.6   6.06   6.97   8.03  10.41  332.0    1.0
alpha[1,0]     5.02    0.06   1.01    3.3    4.3   4.94   5.66   7.24  334.0    1.0
alpha[2,0]     7.41    0.09   1.56   4.71   6.34   7.29   8.37  10.79  334.0    1.0
alpha[0,1]      8.3    0.09   1.73   5.18   7.07   8.17   9.37  11.98  334.0    1.0
alpha[1,1]     5.85    0.06   1.11   3.93   5.03   5.73   6.51   8.32  322.0    1.0
alpha[2,1]     8.49    0.09   1.73   5.59    7.2   8.35   9.58  12.19  334.0    1.0
lp__         -807.3    0.35   4.03 -815.5 -810.1 -807.4 -804.5 -799.3  133.0   1.01

Samples were drawn using NUTS(diag_e) at Mon Feb  1 14:44:47 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
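Regarding Rhat: a minimal NumPy sketch of the split-chain idea described above (a simplified version for intuition, not PyStan's exact computation):

```python
import numpy as np

def split_rhat(draws):
    # draws: (iterations, chains). Split each chain in half, then compare
    # between-chain variance to within-chain variance; values near 1
    # suggest the split halves agree (a crude convergence signal).
    n_iter, n_chain = draws.shape
    half = n_iter // 2
    chains = np.concatenate([draws[:half], draws[half:2 * half]], axis=1)
    n, m = chains.shape
    W = chains.var(axis=0, ddof=1).mean()     # within-chain variance
    B = n * chains.mean(axis=0).var(ddof=1)   # between-chain variance
    var_hat = (n - 1.0) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)

np.random.seed(1)
well_mixed = np.random.normal(size=(1000, 2))          # chains that agree
stuck = np.column_stack([np.random.normal(0, 1, 1000),
                         np.random.normal(5, 1, 1000)])  # chains stuck apart
assert abs(split_rhat(well_mixed) - 1.0) < 0.05
assert split_rhat(stuck) > 1.5
```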

I believe this is what I'm after in terms of parameterization. Also, the fact that the negative binomial parameters differ for each prey type and group is promising. The predicted values match the actual data fairly closely: the blue bars represent the actual data, and the red lines indicate the predicted distributions.

alpha_hat = fit.extract('alpha')['alpha'].mean(0)
beta_hat = fit.extract('NB_beta')['NB_beta'].mean(0)

prey = [prey_1, prey_2]
for i in range(model_dat['Y']):      # prey
    for j in range(model_dat['L']):  # group
        NB_beta = beta_hat[j, i]
        NB_alpha = alpha_hat[j, i]
        a = stan_negbin(np.arange(0, 10), NB_alpha, NB_beta) * N
        plt.plot(np.arange(0, 10) + 0.5, a, color='red')
        binned = np.histogram(prey[i][group == j + 1], np.arange(0, 10))
        plt.bar(np.arange(0, 9), binned[0])
        plt.suptitle('prey = ' + str(i + 1) + ' group = ' + str(j + 1))
        plt.show()


As you can see, the predicted distributions match pretty well. I obviously still need to do some sensitivity analysis, produce trace plots, and verify that the model is indeed specified correctly; there are possibly bad priors or bad assumptions. The code also runs very slowly on my old laptop, so there are likely many optimizations that could be made. I feel a model of this type has great utility in many fields, and I would love to have a general example of it. I am obviously very new to this, but I welcome any criticisms, fixes, optimizations, comments, or suggestions. I appreciate all the help I have received from the group over the last few weeks.




Jason Tilley

Feb 1, 2016, 4:52:34 PM
to Stan users mailing list
I believe I might have my model working correctly now; I have updated the original post. I welcome any feedback. It was obviously a hack to get the individual negative binomial parameters for each prey and group. I hope to learn more about array indexing (or maybe vectorization?) in Stan and have it do this for me automatically. I have a lot of work ahead in terms of sensitivity analysis, trace plots, and generating statistics, but I feel this model must be close to the correct specification.

Christopher Peterson

Feb 1, 2016, 5:08:10 PM
to Stan users mailing list
This is interesting.  I've been investigating a number of possible approaches for multivariate ecological count data, and I'll be interested to see how this works out.  Do the individual-level random effects (the u matrix) allow for negative correlations among species?  I've read that this approach has a few difficulties with that.  Thanks for taking the time to put this up.

I've been working on a similarly motivated model (Poisson for now, but with an eventual extension to NB) using copulas, though it is a bit difficult to get them to work in Stan.

Jason Tilley

Feb 1, 2016, 5:49:01 PM
to Stan users mailing list
Thanks. Hopefully it's right! I too hope to use this with ecological data (diets in my case). I think it would be valuable for evaluating factors driving community composition as well. Do you know of any references to multivariate mixed NB models used in ecology? I would love to see them, and perhaps give a short discussion of them in my manuscript; my limited research could not find any in fisheries ecology. I'm not sure how to answer your question about negative correlations. I understand that the presence of certain species might be negatively correlated, but in terms of this model I don't know. Maybe if you included the abundance of a certain species as a covariate instead of a response? You probably understand much more about modeling than I do; I have only moved past simple ANOVA in the past couple of years. But what I have learned is just how flexible Stan is. I imagine you could build more into it to do whatever you want; it seems if you can dream it, you can build it. Now I'm hoping this will work with my real-world data. I have a manuscript sitting somewhere between rejected and accepted with major revisions. One of the main points of contention was that my chi-squared analysis did not control for the spatial dependence in my unbalanced data. It was a fair point, and one I have taken seriously; quite a valuable mistake to make, in retrospect. I'm very happy to have discovered Stan.

Jason 

Maxwell Joseph

Feb 2, 2016, 8:02:15 PM
to Stan users mailing list
FWIW, here's an example of a zero-inflated multivariate Poisson model that uses Stan for counts/distributions of free-living and symbiotic species: https://peerj.com/preprints/1502/