ERGM Model degeneracy

Esther Kukielka

Jul 14, 2016, 2:20:45 PM
to Davis R Users' Group

Hello all,


I am working with a dataset from individual interviews that other researchers conducted with Georgian farmers (Georgia, the country). Interviews were done at the farmer level and included, among other things, information about pig shipments between villages. Unfortunately, the origin and destination of shipments were only recorded at the village level; thus, I also aggregated the farmers' characteristics to the village level* in order to have a network with villages as nodes and pig shipments as edges. Also, my network is not a complete one :'( . Some of the variables had missing data, which I imputed (so no NAs anymore).

[*] I transformed most of them into binary variables by using the median as a cutoff.


Now I am trying to analyze the data with ergm, from the statnet package in R, with the aim of predicting the probability of a shipment (tie) between two villages (while accounting for village characteristics and network structural ones).

When I use only node-level (nodefactor(), nodecov()) and dyadic-level (nodematch(), nodemix()) predictors, everything goes OK. However, when I try to use relational/network (edgecov()) or structural (gwesp()) predictors, I get warnings (and nonsense results) or simply an error message stating that my model did not converge.


For example, when trying to use edgecov(), I get the following warnings, which basically tell me that my model is degenerate:


mdist <- ergm(netg.imp1 ~ edges + edgecov(netg.imp1, attr = "dist"))
Evaluating log-likelihood at the estimate.
Warning messages:
1: In ergm.mple(Clist, Clist.miss, m, MPLEtype = MPLEtype, init = init,  :
  glm.fit: algorithm did not convergeglm.fit: fitted probabilities numerically 0 or 1 occurred
2: In ergm.mple(Clist, Clist.miss, m, MPLEtype = MPLEtype, init = init,  :
  glm.fit: algorithm did not convergeglm.fit: fitted probabilities numerically 0 or 1 occurred
3: glm.fit: fitted probabilities numerically 0 or 1 occurred
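
One possible reading of these warnings, not confirmed anywhere in this thread: when edgecov() is given a network plus attr, dyads without an edge in that network get a covariate value of zero. If "dist" is stored as an edge attribute on the response network itself, the covariate is nonzero only where a tie exists, which can produce exactly this kind of separation in the logistic fit. A minimal sketch of the alternative, passing a full n-by-n distance matrix instead. The coordinates, the simulated ties, and the names xy, dist_mat, and net are all hypothetical stand-ins, not the real data:

```r
library(network)
library(ergm)

set.seed(33)
n <- 20

# Hypothetical village coordinates (stand-in for real locations).
xy <- cbind(runif(n), runif(n))

# Distance defined for EVERY pair of villages, tied or not.
dist_mat <- as.matrix(dist(xy))

# Toy directed network in which ties are likelier at short distances.
p <- plogis(1 - 6 * dist_mat)
adj <- matrix(rbinom(n * n, 1, p), n, n)
diag(adj) <- 0
net <- network(adj, directed = TRUE)

# edgecov() receives the matrix itself, so non-ties carry real distances.
fit <- ergm(net ~ edges + edgecov(dist_mat))
summary(fit)
```

Because this model contains only dyad-independent terms, it is fitted as a logistic regression without MCMC, so any remaining glm.fit warnings would point to the covariate rather than to the sampler.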

 

When trying to use gwesp(), I get the following:

mgwesp <- ergm(netg.imp1 ~ edges + gwesp(0.05, fixed=T), verbose=T, control=control.ergm(seed=33, MCMC.samplesize=10000))
Evaluating network in model
Initializing Metropolis-Hastings proposal(s): ergm:MH_TNT
Initializing model.
Using initial method 'MPLE'.
Fitting initial model.
MPLE covariate matrix has 13 rows.
Fitting ERGM.
Starting maximum likelihood estimation via MCMLE:
Density guard set to 10000 from an initial count of 187  edges.
Iteration 1 of at most 20 with parameter:
           edges gwesp.fixed.0.05
       -5.422733         2.074192
Sampler accepted  91.095% of 10240000 proposed steps.
Sample size = 10000 by 10000
Back from unconstrained MCMC. Average statistics:
           edges gwesp.fixed.0.05
       -24.79410        -12.08144
Average estimating equation values:
           edges gwesp.fixed.0.05
       -24.79410        -12.08144
is.inCH: iter= 1, inside hull.
iter= 1, est=1.000000, low=1.000000, high=1.000000, test=1.
Calling MCMLE Optimization...
Using Newton-Raphson Step with step length  1  ...
Using lognormal metric (see control.ergm function).
Using log-normal approx (no optim)
The log-likelihood improved by 7.077
Step length converged once. Increasing MCMC sample size.

 

This is followed by more or less the same output for 20 iterations, and then:


MCMLE estimation did not converge after 20 iterations. The estimated coefficients may not be accurate. Estimation may be resumed by passing the coefficients as initial values; see 'init' under ?control.ergm for details.
 
Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
 
This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

 

 

The mcmc.diagnostics(mgwesp) output does not look great, and the goodness of fit (gof) is horrific.

 

Does anyone know how I can solve these two problems (both of which I blame on the model degeneracy issue)?

I have played around with MCMC.samplesize, MCMC.interval and the MCMLE.density.guard, but have not succeeded in making it work, so far...

 

This is my first post here, so apologies if I'm breaking any posting rule or if my explanation is unclear. I wanted to give a reproducible example, but since the data are not a simple one-line vector, I was not sure if I should paste the whole dput(netg.imp1) output here (which is LONG).
Thanks for any suggestions!
Esther

Esther Kukielka

Jul 14, 2016, 2:24:03 PM
to Davis R Users' Group
The packages/libraries:
#install.packages('network')
#install.packages('ergm')
#install.packages('sna')
#install.packages('coda')
#install.packages('statnet')
#install.packages("intergraph")
library(network)
library(ergm)
library(sna)
library(coda)
library(statnet)
library(intergraph)

Esther Kukielka

Jul 14, 2016, 2:42:20 PM
to Davis R Users' Group
I am now running a more complete model and am not running into this problem anymore...

m5b <- ergm(netg.imp1 ~ edges + nodefactor("region") + nodecov("betN") +
              nodecov("outdegN") + nodecov("indegN") + nodemix("ASFyn", base = 1) +
              edgecov(netg.imp1, attr = "dist"),
            control = control.ergm(seed = 33))
summary(m5b)
m5b$mle.lik

I will keep working on it...
e

Matt Espe

Jul 14, 2016, 7:23:45 PM
to Davis R Users' Group
Hi,

Without digging deeply into what this function is doing, I can tell you that the acceptance rate from the Metropolis-Hastings step (about 91% in your log) is way too high. This suggests the algorithm does not cover much ground during the MCMC step and is not likely to converge quickly to the target distribution. There are a couple of things you can do to help with this.

1) start the algorithm somewhat close to reasonable parameter values so it has to move less to converge to the target distribution.

2) increase the step-size so the algorithm covers more ground at each step.

3) increase the total number of iterations allowed beyond 20.

or, 4) there might be some pathology in your model (weird stuff in the target distribution) that is causing the algorithm to become stuck.

There are some options to control 1, 2, and 3 in the control.ergm() function. 4 can be more difficult to pick up.
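
A rough sketch of how options 1 to 3 might be passed through control.ergm(), under several assumptions: the florentine marriage network that ships with ergm stands in for netg.imp1, the gwesp decay of 0.5 and all numeric settings are arbitrary illustrative values, and MCMC.interval (the number of proposals between sampled networks) is used as the nearest analogue of a "step-size", which is a guess at what was meant rather than an exact match.

```r
library(ergm)

data(florentine)  # provides flomarriage, a small stand-in network

# First pass: allow more MCMLE iterations than the 20 seen in the log,
# and space the sampled networks further apart (options 3 and 2).
fit0 <- ergm(
  flomarriage ~ edges + gwesp(0.5, fixed = TRUE),
  control = control.ergm(
    seed = 33,
    MCMLE.maxit = 40,
    MCMC.interval = 2048
  )
)

# Second pass: restart from the first-pass estimates via 'init' (option 1).
fit1 <- ergm(
  flomarriage ~ edges + gwesp(0.5, fixed = TRUE),
  control = control.ergm(
    seed = 33,
    init = coef(fit0),
    MCMLE.maxit = 40
  )
)

summary(fit1)
mcmc.diagnostics(fit1)  # trace plots and sample statistics for the final run
```

None of these settings repairs a model that is itself degenerate (option 4); if the diagnostics still look poor after restarting, the model terms are the more likely culprit.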

More broadly, these black-box MCMC methods can really bite you. I am not sure if you are familiar with these types of algorithms. If not, there is a good primer in Gelman et al.'s Bayesian Data Analysis (Chapter 13ish?). It is geared towards a full Bayesian analysis, but the techniques are pretty much the same. It might be a good idea to check it out so you are aware of how these functions work and, more generally, how they can break.

Matt

Esther Kukielka

Jul 15, 2016, 1:58:26 PM
to davi...@googlegroups.com
Thanks Matt!
I'll have a look at that book, I'm sure I need some more background on that side!
Best,
E
