Gumbel Copula estimation

Shanchi Zhang

unread,

Apr 6, 2017, 11:45:25 AM4/6/17

to Stan users mailing list

Hello, everyone,

I am a user for Rstan, Right now I encounter a problem regarding the Copula Bayesian estimation using Stan.

I try to estimate the right tail dependence between monthly losses to buildings (x) and losses to tenancies(y), the dataset is attached as data.csv (N is the total number of data, which is 132). And I select Gumbel copula to model the right tail dependence.

The Gumbel formula is as belows:

So I try to estimate lamda L in the above table.

For the Gumbel and Clayton copula density, I searched on the internet，

Ｃｌａｙｔｏｎ　copula　ｄｅｎｓｉｔｙ：

Gumbel copula density:

here both u and v are uniform distributions at [0,1], and delta is the parameter to be estimated (in my model I denote as "a").

Then I saw Ben's post at https://groups.google.com/forum/#!msg/stan-dev/E953tjdtg44/SRCUSHPB92YJ talking about the Clayton copula, and this gave me some intuition for writing the stan model of Gumbel.

In my own model, I think the only difference from Ben's post model is the gumbel density function (I believe is correct). The problem is, the final result for lamda L is extremely large, like 0.98, which is not reasonable.

Then I try to compare my Gｕmbel copula model and Clayton copula model, for the estimated Kendall’s tau. Because this is the common parameter that these two models share (no right tails dependence for Clayton and no left tail dependence for Gumbel).

As a reference, the kendalls’s tau directly calculated from dataset is 0.286.

Ｍｙ　ｇｕｍｂｅｌ　ｍｏｄｅｌ (a is the parameter in copula, and tau is the kendall’s tau. I assume lognormal marginal distribution and non-informative prior for tau)：

modelcode="
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
  }
parameters{ 
  real mu1;
  real mu2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
  real<lower=-1,upper=1> tau; 
 
  }
transformed parameters{
 real<lower=1> a;
 a=1 / (1 - tau);
}
model{
 //priors
  x~lognormal(mu1,sigma1);
  y~lognormal(mu2,sigma2);

  //likelihood
  if(tau!=1){
  real p[2]; 
  p[1]= lognormal_cdf(x,mu1,sigma1);
  p[2]= lognormal_cdf(y,mu2,sigma2);
 
 
target+=-((-log(p[1]))^a+(-log(p[2]))^a)^(1/a)-log(p[1])-log(p[2])+(-2+2/a)*log((-log(p[1]))^a+(-log(p[2]))^a)+(a-1)*log(log(p[1])*log(p[2]))+log1p((a-1)*((-log(p[1]))^a+(-log(p[2]))^a)^(-1/a));
 
  } 
}

"

Then I run it:

fit=stan(model_code=modelcode,data=dat,iter=12000,warmup=2000,thin=10,chains=3)
print(fit)

result after 3000 post-warmup draws:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu1 3.55 0.00 0.04 3.48 3.53 3.55 3.58 3.63 2642 1
mu2 3.21 0.00 0.06 3.10 3.17 3.21 3.25 3.33 2487 1
sigma1 0.40 0.00 0.02 0.35 0.38 0.39 0.41 0.45 2724 1
sigma2 0.62 0.00 0.04 0.55 0.59 0.61 0.64 0.69 2758 1
tau 0.99 0.00 0.01 0.97 0.98 0.99 0.99 0.99 2537 1
a 80.01 0.53 26.36 31.63 58.95 80.43 101.40 124.81 2483 1
lp__ 173.33 0.03 1.67 169.19 172.40 173.67 174.57 175.60 2642 1

As you can see, the kendall's tau is 0.99, and the parameter a (or "delta" in the above formula) is too high, not reasonable!

But if using Clayton model(I also attached the model below, almost same with the Gumbel model above, except the log density) is 0.39, which seems reasonable.

The clayton copula model:

modelcode="
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
  }
parameters{ 
  real mu1;
  real mu2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
  real<lower=-1,upper=1> tau; 
  }
transformed parameters{
 real<lower=1> a;
 a=2 * tau / (1 - tau);
}
model{
  // assume non-informative priors for tau
  x~lognormal(mu1,sigma1);
  y~lognormal(mu2,sigma2);
  //likelihood
  if(tau!=0){
  real p[2]; 
  p[1]= lognormal_cdf(x,mu1,sigma1);
  p[2]= lognormal_cdf(y,mu2,sigma2);
  target += log1p(a) - (a + 1) * log(p[1] * p[2]) - (2 * a + 1) / a * log(p[1] ^ -a + p[2] ^ -a - 1);
  } 
}
"

So it seems the tau getting from my Gumbel model is obviously not correct, I don’t know the reason and have been confused by days. Someone may say the Gumbel copula density is not correct? But I have checked it multiple times and couldn't see anything wrong.

Could you guys help me have a look?

Thanks a lot,

Jason Zhang

data.csv

Bob Carpenter

unread,

Apr 7, 2017, 1:54:32 PM4/7/17

to stan-...@googlegroups.com

I don't know anything about these models, and it's rather hard
to debug a wall of someone else's formulas. I'd suggest
writing your Gumbel density as a function then providing
some unit tests to make sure it's returning the right values.

You can call user-defined functions directly in RStan, or you'll
have to hack up tests in generated quantities.

- Bob

> On Apr 6, 2017, at 11:45 AM, Shanchi Zhang <zsc...@gmail.com> wrote:
>
> Hello, everyone,
>
>
>
> I am a user for Rstan, Right now I encounter a problem regarding the Copula Bayesian estimation using Stan.
>
> I try to estimate the right tail dependence between monthly losses to buildings (x) and losses to tenancies(y), the dataset is attached as data.csv (N is the total number of data, which is 132). And I select Gumbel copula to model the right tail dependence.
>
> The Gumbel formula is as belows:
>
>
>
>

> So I try to estimate lamda L in the above table.
>
> For the Gumbel and Clayton copula density, I searched on the internet，
>
> Ｃｌａｙｔｏｎ　copula　ｄｅｎｓｉｔｙ：
>
>
>

> Gumbel copula density:

> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <data.csv>

Ben Goodrich

unread,

Apr 7, 2017, 3:48:06 PM4/7/17

to Stan users mailing list

On Thursday, April 6, 2017 at 11:45:25 AM UTC-4, Shanchi Zhang wrote:

I am a user for Rstan, Right now I encounter a problem regarding the Copula Bayesian estimation using Stan.

You need to define a function in the Stan language that implements the logarithm of the Gumbel copula (with appropriate spacing so that it can be read by a human), expose it to R, test that it is valid, and then call it in the model block of your Stan program. Then it works fine. See the attachments.

Ben

gumbel.stan

gumbel.R

data.csv

andre.p...@googlemail.com

unread,

Apr 8, 2017, 9:38:40 PM4/8/17

to Stan users mailing list

Ben,

excellent. I translated also the normal-copula, the frank and have the student-t copula on my list.

We should put all this really helpful things to a github.. If I do so and keep everything

under BSD or whatever license, do you allow to store your code examples with reference of

the owener(s) as well?

I'm not family with law.

Andre

Ben Goodrich

unread,

Apr 8, 2017, 10:21:02 PM4/8/17

to Stan users mailing list

You could make a pull request at

https://github.com/stan-dev/example-models

It is probably best to make a bivariate_copula/ directory under misc/.

Ben

Ben Goodrich

unread,

Apr 9, 2017, 11:07:29 AM4/9/17

to Stan users mailing list

On Saturday, April 8, 2017 at 9:38:40 PM UTC-4, andre.p...@googlemail.com wrote:

excellent. I translated also the normal-copula, the frank and have the student-t copula on my list.
We should put all this really helpful things to a github.. If I do so and keep everything
under BSD or whatever license, do you allow to store your code examples with reference of
the owener(s) as well?

Here is a numerically better version.

  /* Gumbel copula log density
   *
   * @param u Real number on (0,1], not checked but function will return NaN
   * @param v Real number on (0,1], not checked but function will return NaN
   * @param theta Real number >= 1, will throw otherwise
   * @param log density
   */
  real gumbel_copula(real u, real v, real theta) {
    real neg_log_u = -log(u); real log_neg_log_u = log(neg_log_u);
    real neg_log_v = -log(v); real log_neg_log_v = log(neg_log_v);
    real log_temp = log_sum_exp(theta * log_neg_log_u,
                                theta * log_neg_log_v);
    real theta_m1 = theta - 1;
    if (theta < 1) reject("theta must be >= 1");
    if (is_inf(theta)) {
      if (u == v) return 0;
      else return negative_infinity();
    }
    return theta_m1 * log_neg_log_u + theta_m1 * log_neg_log_v
    + neg_log_u + neg_log_v - exp(log_temp / theta)
    + log_sum_exp( 2 * theta_m1  / -theta * log_temp, log(theta_m1) + 
                  (1 - 2 * theta) / theta * log_temp);
  }

Ben

andre.p...@googlemail.com

unread,

Apr 9, 2017, 12:09:14 PM4/9/17

to stan-...@googlegroups.com

  real normal_copula(real u, real v, real rho) {
    return (0.5*rho*(-2. *u*v + 1. *square(u)*rho + 1. *square(v)*rho)) /
      (-1. + square(rho)) - 0.5*log(1 - square(rho));
  }

  real clayton_copula(real u, real v, real theta) {
    if(theta == 0.0) 
      reject("clayton_copula: theta must != 0");
     return log1p(theta) 
            - (theta + 1) * (log(u) + log(v))
            - (1 + 2 * theta) / theta 
              * log(pow(u, -theta) + pow(v, -theta) - 1);
  }
  
  real frank_copula(real u, real v, real theta) {
    return log(theta) + log1m(exp(-theta))-theta*(u+v)
    - 2* log(1 - exp(-theta) - (1-exp(-theta*u))*(1-exp(-theta*v)));
  }

checks in R with (example)

library(copula)
myCop <- archmCopula(family = "frank", dim = 2, param =.2)
u<-c(0.7088403,0.05479929)
dCopula(u,myCop)
[1] 0.9622948

reference to student-t copula: http://comisef.wikidot.com/article:enhancing-eml-estimation-of-multivariate-t-copulas-w

from András Bárdossy in "Copula-based uncertainty modelling:

Application to multisensor precipitation

estimates"

Andre

Shanchi Zhang

unread,

Apr 10, 2017, 10:42:15 AM4/10/17

to Stan users mailing list

Thanks for your answer, Ben.

Now I found that my previous wrong result was due to the lack of loop (vector[N] summands in your example), not my copula density. But I will follow your suggestions to define the right density functions first, anyway.

I really appreciate your help. You answered perfectly.

在 2017年4月7日星期五 UTC-4下午3:48:06，Ben Goodrich写道：

Shanchi Zhang

unread,

Apr 10, 2017, 10:43:59 AM4/10/17

to Stan users mailing list

This is a good idea, especially for those who want to model various copulas in stan.

在 2017年4月9日星期日 UTC-4下午12:09:14，andre.p...@googlemail.com写道：

Reply all

Reply to author

Forward