Bayesian Network model with mixed variable types using brms

ViktorR

Oct 2, 2017, 9:51:20 AM10/2/17
to brms-users
Hi,

First of all, thank you so much, Paul-Christian, for developing brms. As a newbie to Bayesian statistics, Stan, and MLM, I find the package very helpful for learning. Having originally started with JAGS, I'd like to move over to brms/Stan for my Bayesian network model with mixed variable types (continuous/discrete).
From reading the brms vignettes and the Stan manual, I think it should be possible to parameterize such a network in brms, but I'm having a hard time conceptualizing it. Maybe someone here has tried something similar and can point me in the right direction.

In a Bayesian network, the network structure and the directed edges define the (in)dependencies between the random variables. In my example the nodes are connected through regression models; Figure 1 shows the structure of my network (see attachment). My dependent variable rloss represents a relative loss proportion ranging from 0 (no loss) to 1 (completely destroyed). The data are roughly beta-distributed with a high probability density close to 0 (see Figure 2), and about 10% of the cases in my dataset are exactly zero.

Following Andrew Gelman's suggestion in this post in the Stan user group, one option is to first model a binary variable 'dam' that indicates whether rloss is 0 or some value different from 0, using logistic regression. Another option is a mixture model. For that I redesign rloss: I add 1€ to each absolute loss value before normalizing it by the total value of each case to get the proportion rloss. This removes the cases that are exactly zero, so I can log-transform rloss. After the log transformation I get two roughly normal components, one for the cases that originally had rloss > 0 and one for the cases that were exactly 0 (see Figure 3).

I've implemented the latter as a brms mixture model, which seems to work fine with two predictors (wst and con) for the component means and one predictor for the mixing proportion. However, the model does not converge when I add the variable 'd' as a predictor for both the component means and the mixing proportion.
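
For reference, a minimal sketch of what the first option (a logistic regression for the binary 'dam' indicator) could look like in brms; the choice of predictors is purely illustrative:

library(brms)

# sketch only: probability that a case suffers any loss at all (dam = 1)
fit_dam <- brm(dam ~ wst + con, data = data,
               family = bernoulli(), chains = 2, iter = 3000)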

Questions:

1) Given the structure of my response variable, which of the two options (logistic regression or mixture model) would make more sense to implement using brms?

2) Is there a good way to express the "hierarchical" or stacked structure of the network in brms by combining multiple regression models? E.g., in the current example: model wst through a regression on rfi, and then use wst to model rloss, etc.

3) When making predictions for rloss, I might not have data or evidence for all predictors (nodes in the network). Is there a way in brms to make predictions with incomplete observations by using the underlying distribution function for the missing data point?


library(brms)

#data
> head(data)
   rfi  wst  d     log_d con dam        rloss  log_rloss
1 45.1 -231  2 0.6931472   0   1 2.726096e-03  -5.904885
2 58.6    5 48 3.8712010   1   1 1.963865e-03  -6.232841
3 41.3 -166  6 1.7917595   1   1 1.696065e-02  -4.076859
4 58.6 -158  7 1.9459101   0   0 1.990050e-06 -13.127351
5 45.1  170 24 3.1780538   1   1 9.782640e-03  -4.627146
6 60.8 -243  2 0.6931472   0   1 3.108501e-03  -5.773615

> summary(data)
      rfi              wst               d              log_d       con    
 Min.   : 29.40   Min.   :-247.0   Min.   :  1.00   Min.   :0.000   0:322  
 1st Qu.: 54.05   1st Qu.:  10.0   1st Qu.:  5.00   1st Qu.:1.609   1:109  
 Median :105.30   Median :  51.0   Median : 12.00   Median :2.485          
 Mean   :102.17   Mean   :  30.5   Mean   : 27.66   Mean   :2.507          
 3rd Qu.:139.70   3rd Qu.:  96.0   3rd Qu.: 24.00   3rd Qu.:3.178          
 Max.   :221.80   Max.   : 294.0   Max.   :840.00   Max.   :6.733          
      dam             rloss             log_rloss      
 Min.   :0.0000   Min.   :0.0000009   Min.   :-13.952  
 1st Qu.:1.0000   1st Qu.:0.0013709   1st Qu.: -6.592  
 Median :1.0000   Median :0.0071715   Median : -4.938  
 Mean   :0.9118   Mean   :0.0179951   Mean   : -5.696  
 3rd Qu.:1.0000   3rd Qu.:0.0205136   3rd Qu.: -3.887  
 Max.   :1.0000   Max.   :0.3183683   Max.   : -1.145 

# mixture model: two-component Gaussian mixture on log_rloss
mix_rloss <- mixture(gaussian, gaussian)

# priors locating the two component intercepts near the two observed modes
prior_rloss <- c(
  prior(normal(-14, 10), Intercept, dpar = mu1),
  prior(normal(-5, 10), Intercept, dpar = mu2))

# converges: wst and con predict the component means, wst the mixing proportion
fit_rloss <- brm(bf(log_rloss ~ wst + con, theta2 ~ wst),
            data = data, family = mix_rloss, prior = prior_rloss,
            inits = 0, chains = 2, iter = 3000)

# does not converge once log_d is added to the means and the mixing proportion
fit_rloss1 <- brm(bf(log_rloss ~ wst + con + log_d, theta2 ~ wst + log_d),
            data = data, family = mix_rloss, prior = prior_rloss,
            inits = 0, chains = 2, iter = 3000)

pp_check(fit_rloss) #see Figure_4 in the attachment
> summary(fit_rloss)
 Family: mixture(gaussian, gaussian) 
Formula: log_rloss ~ wst + con 
         theta2 ~ wst
   Data: raw (Number of observations: 431) 
Samples: 2 chains, each with iter = 3000; warmup = 1500; thin = 1; 
         total post-warmup samples = 3000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Population-Level Effects: 
                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
mu1_Intercept      -12.94      0.08   -13.10   -12.77       2675 1.00
mu2_Intercept       -5.24      0.10    -5.43    -5.03       3000 1.00
theta2_Intercept     2.33      0.17     2.01     2.69       2659 1.00
mu1_wst              0.00      0.00    -0.00     0.00       3000 1.00
mu1_con1            -0.18      0.22    -0.62     0.27       2172 1.00
mu2_wst              0.00      0.00     0.00     0.00       3000 1.00
mu2_con1             0.59      0.19     0.20     0.96       2260 1.00
theta2_wst           0.00      0.00     0.00     0.01       3000 1.00

Family Specific Parameters: 
       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma1     0.47      0.06     0.37     0.60       2470 1.00
sigma2     1.63      0.06     1.52     1.74       2703 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

> summary(fit_rloss1)
 Family: mixture(gaussian, gaussian) 
Formula: log_rloss ~ wst + con + log_d 
         theta2 ~ wst + log_d
   Data: raw (Number of observations: 431) 
Samples: 2 chains, each with iter = 3000; warmup = 1500; thin = 1; 
         total post-warmup samples = 3000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Population-Level Effects: 
                   Estimate  Est.Error     l-95% CI    u-95% CI Eff.Sample Rhat
mu1_Intercept    2691973.17 4472441.77   -897613.26 13400546.72          3 2.56
mu2_Intercept         -6.48       0.53        -7.45       -5.69          1 2.41
theta2_Intercept     175.37     204.51         1.37      598.19          1 2.10
mu1_wst          -131184.73  163061.46   -476645.10        0.00          1 2.63
mu1_con1         4091472.60 8892720.32 -14036559.30 22885133.19          5 1.38
mu1_log_d         109411.50  287674.89   -365882.46   853932.49         11 1.11
mu2_wst                0.00       0.00         0.00        0.01          1 1.89
mu2_con1               0.68       0.32         0.15        1.36          2 1.42
mu2_log_d              0.34       0.09         0.17        0.55         90 1.03
theta2_wst            -0.02       0.30        -0.77        0.63        277 1.00
theta2_log_d           2.82      30.40       -64.02       74.25        209 1.02

Family Specific Parameters: 
       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma1     5.89     10.40     0.38    32.91          2 1.22
sigma2     2.13      0.54     1.50     2.82          1 8.63

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning messages:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results! 
We recommend running more iterations and/or setting stronger priors. 
2: There were 1336 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.



Thanks,
Viktor
Attachments: Figure_1.png, Figure_2.png, Figure_3.png, Figure_4.png

Paul Buerkner

Oct 2, 2017, 10:58:24 AM10/2/17
to ViktorR, brms-users
Hi Viktor,

1) Have you thought of using the zero_inflated_beta family for your data? (A rough sketch of such a fit is included at the end of this message.)

2) brms is not yet able to fit multivariate models except for multivariate normal ones, but this will be implemented in the future.

3) This is not yet possible in brms.

Given the kind of model you want to fit, I am not sure if brms should be your package of choice. Maybe you can start with the Stan code brms provides and build your own Stan model based on that.
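
Regarding 1), a rough sketch of what a zero_inflated_beta fit could look like, using the original rloss proportion (i.e. before the 1€ offset and log transform, so the exact zeros stay in the data); the predictors and settings are purely illustrative and assume a brms version that allows modelling the zi part:

# sketch only: zero-inflated beta on the raw proportion rloss
# the zi part models the probability of an exact zero, the beta part 0 < rloss < 1
fit_zib <- brm(bf(rloss ~ wst + con,
                  zi ~ wst + con),
               data = data, family = zero_inflated_beta(),
               chains = 2, iter = 3000)
summary(fit_zib)
pp_check(fit_zib)

The generated Stan code mentioned above can be extracted with make_stancode() (or stancode() on a fitted model) as a starting point for a hand-written Stan model.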


ViktorR

Oct 2, 2017, 11:40:56 AM10/2/17
to brms-users
Thanks for your quick response, Paul! I will play around with the zero_inflated_beta family and move over to Stan for the multivariate extension of the model. This would be a really cool feature for a future version of brms. There's an R package called HydeNet that does something similar using JAGS.
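
In the meantime, a crude workaround for questions 2) and 3) is to chain separate brm fits by hand: fit a regression for each intermediate node and push posterior predictive draws forward, so the uncertainty about an unobserved node (e.g. wst) propagates into the rloss predictions. A rough sketch with purely illustrative formulas, reusing fit_rloss from above:

# sketch only: chain two fits manually to mimic part of the network
fit_wst <- brm(wst ~ rfi, data = data, chains = 2)

# new case where wst is not observed: draw it from its fitted distribution ...
new_case  <- data.frame(rfi = 50)
wst_draws <- predict(fit_wst, newdata = new_case, summary = FALSE)

# ... and feed those draws into the downstream rloss model, one draw at a time
rloss_draws <- sapply(wst_draws[1:200, 1], function(w) {
  predict(fit_rloss,
          newdata = data.frame(wst = w, con = factor("0", levels = c("0", "1"))),
          summary = FALSE, nsamples = 1)
})
hist(rloss_draws)  # approximate predictive distribution of log_rloss for the new case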

Paul Buerkner

Oct 2, 2017, 3:58:07 PM10/2/17
to ViktorR, brms-users
Full multivariate models will be the central feature of brms 2.0. Hopefully I can get it done by the end of this year.
