Error running a simple linear model - bug or data problem?

171 views
Skip to first unread message

Lindsay

unread,
Feb 22, 2013, 12:29:38 PM2/22/13
to stan-...@googlegroups.com
Dear Stan community,

I have been having some problems getting a simple linear model to run in Stan. I thought I was doing something dumb in my coding and after trying some different things posted a question to Stack Overflow (http://stackoverflow.com/questions/15024573/sampling-error-basic-regression-model-in-stan). However, to try and experiment a bit, I subsequently ran my model on a subset of my data, and it ran fine! So that makes me think there is something weird going on in my data, but I can't for the life of me see it. I am running Stan 1.1.0 on R 2.15.2 (OS X Mountain Lion). The error I am getting is

TRANSLATING MODEL 'lm_stan' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'lm_stan' NOW.
SAMPLING FOR MODEL 'lm_stan' NOW (CHAIN 1).
Error : Error in function stan::prob::normal_log(d): Random variable is nan, but must not be nan!
In addition: Warning message:
In storage.mode(x) <- "integer" : NAs introduced by coercion
error occurred during calling the sampler; sampling not done
>    

The data is time-series, cross-section, so I know my model is inappropriate---but I am trying to figure out how to use Stan as I am going along, so I am starting simple and working my way up. The Stan code and data is included below. As far as I can see there is nothing irregular about my data. Any help or insight greatly appreciated.

Lindsay


stan_data <- list("y"=y,
                "year"=year,
                "N_obs" = N_obs)


lm <- "data {
                    int<lower=1> N_obs;
                    real year[N_obs];
                real y[N_obs];
                 }
            parameters {
                    real alpha;
                real beta;
                   real<lower=0> sigma;
             }
             
             transformed parameters{

             }
                      
             model {
                 vector[N_obs] mu_hat;
           
                 alpha ~ normal(0, 100);
                 beta ~ normal(0, 100);
                 sigma ~ uniform(0, 100);
   
                 for(i in 1:N_obs){
                     mu_hat[i] <- alpha + beta * year[i];
                          y[i] ~ normal(mu_hat[i], sigma);
                 }     
              }"


write(lm, file="lm_stan")


lm.fit0 <- stan(file="lm_stan",
                data=stan_data,
                chains=1,
                iter=5000)
               

The data is as follows:

stan_data $y 42089728 9339536 9781184 138361088 30910448 30411792 629997056 21062368 1167006 7631744 6925444 5893008 35743680 -55904 116299776 966712 178152 19397504 101188992 1536242176 44078264 1243806 105937664 43202352 -4213172 40201728 84412544 16671128 0 19432968 44403296 89021120 33442736 5850532 68061664 0 86286272 636771072 65779408 6416524 25559184 0 0 11437649 128506560 26867136 1646992 -16684608 43974528 6812660 0 0 -906249 17730360 6571846 -14056304 -2317026 29722656 43035904 70388248 -202987 24308224 0 19598944 25241600 31093140 172198080 68365824 -15307088 345229424 0 91912288 6387084 6936104 362958976 10828080 34233728 465616896 185831488 4554222 14789792 19448168 27692960 88308096 75171552 -246307584 11228152 8361832 2265296 172424512 1182046720 22629408 1165429 348064512 77001792 11092408 84706848 -19970752 -2386432 66124424 19266104 72069984 14311872 -1680048 509040 188740112 318636288 170175680 -244937216 16264160 6017916 327072 159117760 0 8156479 320665728 36684736 17502416 29556064 47395008 12937934 168051632 0 892982 10329560 1355983 -4529648 -43117 -10704432 226641152 23704368 -3433973 -73329408 0 3594688 51327088 59915116 293390016 382384192 -12102624 -336263424 0 -24685504 -899952 10155976 218019584 48748112 30058752 1842414592 44083792 5092000 24174848 10985128 33436544 159885024 36513376 140204416 12631560 8951732 25929808 353803264 3143784448 60253136 702773 506841344 38420128 11721112 92972608 60845840 30016168 37990192 -6470864 78287520 21554528 29755168 3766984 35639136 26794784 583849280 267967488 37916960 11501600 22704880 133042624 513627 3389580 289430272 21665616 85471472 39646656 116267616 -13407846 15678080 27691000 682450 9635360 580544 16791136 793524 38486832 -79701376 -63242544 2160139 202091584 300 60001872 120758144 50716744 13548672 623414144 21202400 0 0 17696512 -5566584 -3197064 201575680 34187360 50923296 1267788800 28845072 1021406 20589376 5255816 19726800 43046336 84012320 93750016 1549232 4102708 20721248 36500736 5098330112 -20425392 781041 247644672 28292416 21682296 52508672 38884352 57993648 953560 1437008 81498304 86611584 23846608 5454052 37785760 99136512 58742016 1308937472 37354624 14447532 19370288 81054432 108383989 5834392 196654592 -37886048 199787840 -38083360 -19815904 1496112 7065456 30429000 -190947 3102040 5150997 6569152 711859 42429536 148236256 70894720 -888473 62231296 15503290 -17289808 106739712 -46661248 -185851136 602047616 15609200 940000 0

$year
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4

$N_obs
284




Jiqiang Guo

unread,
Feb 22, 2013, 1:32:03 PM2/22/13
to stan-...@googlegroups.com
One problem in the stan code is that sigma is declared to be on the positive real line but the prior is a uniform.  So the declaration of sigma needs to be changed to 

real<lower=0,upper=100> sigma; 

And then you can get rid of the line that says sigma ~ uniform(0, 100) since it is redundant.

That said, the above seems not to be the reason.  The real reason might be that after transformation on the data, some entries of y become NA.   RStan tries to convert data into integers if no accuracy would be lost, but it failed to consider the case that the integers are so big that they cannot be represented in R, which only support 32 bit integers now.  

For the time being, a workaround is to transform data y into double (add some digits after the decimal place)  before calling stan().  For example, if I do the following, all will go well. 

y[N_obs] <- y[N_obs] + 0.00001 # the influence should be negligible

And this is the model code I have modified:

data {
  int<lower=1> N_obs;
  real year[N_obs];
  real y[N_obs];
}

parameters {
  real alpha;
  real beta;
  real<lower=0,upper=100> sigma;
}


model {
  vector[N_obs] mu_hat;
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ uniform(0, 100);
  for(i in 1:N_obs){
    mu_hat[i] <- alpha + beta * year[i];
  }
  y ~ normal(mu_hat, sigma);
}






--
You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Lindsay Stirton

unread,
Feb 22, 2013, 2:46:20 PM2/22/13
to stan-...@googlegroups.com
Thank you Guo!

y[N_obs] <- y[N_obs] + 0.00001 # the influence should be negligible

This doesn't work for me. 

However, based on your explanation I figured that if I divide my y vector by 1,000,000 (it is budgetary data, so makes sense to consider in millions) it would work--and it does!

Best wishes, 

Lindsay 

Jiqiang Guo

unread,
Feb 22, 2013, 2:57:30 PM2/22/13
to stan-...@googlegroups.com
Great.  My workaround for this problem should work, but I should have pointed out that is a line of R code (not Stan code) for data y before call stan() in R. 

--
Jiqiang 
Reply all
Reply to author
Forward
0 new messages