Matrix Multiplication / Neural Nets in Rstan

2,566 views
Skip to first unread message

Sam Weiss

unread,
May 12, 2015, 5:56:56 PM5/12/15
to stan-...@googlegroups.com
I'm trying to made a neural network in Rstan (just to try it) and having trouble doing some matrix /vector multiplication. I'm getting the error:

TRANSLATING MODEL 'model' FROM Stan CODE TO C++ CODE NOW.
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'model' with error message:
EXPECTATION FAILURE - DIAGNOSTIC(S) FROM PARSER:

no matches for function name="multiply"
    arg 0 type=vector
    arg 1 type=vector
available function signatures for multiply:
0.  multiply(real, real) : real
1.  multiply(vector, real) : vector
2.  multiply(row vector, real) : row vector
3.  multiply(matrix, real) : matrix
4.  multiply(row vector, vector) : real
5.  multiply(vector, row vector) : matrix
6.  multiply(matrix, vector) : vector
7.  multiply(row vector, matrix) : row vector
8.  multiply(matrix, matrix) : matrix
9.  multiply(real, vector) : vector
10.  multiply(real, row vector) : row vector
11.  multiply(real, matrix) : matrix
expression is ill formed


LOCATION OF PARSING ERROR (line = 26, position = 90):

PARSED:

model{
matrix[N,num_nodes] fitted_middle[num_hidden];



for(n in 1:N)
  for(j in 1:num_nodes) fitted_middle[1,n, j] ~ bernoulli_logit(beta_initial[j]*x_data[n]

EXPECTED: <eps> BUT FOUND: 

);


How can I set up the multiplication so it is in matrix form?

I'm mainly trying to get this to compile to see if it works but, more generally, could rstan be used to estimate parameters for neural networks instead of gradient descent? 

Thanks in advance,
Sam
nnet_rstan.R

Ben Goodrich

unread,
May 12, 2015, 6:23:39 PM5/12/15
to stan-...@googlegroups.com, samc...@gmail.com
Two vectors are not conformable for multiplication. You could transpose the first one, or use the dot_product() function (which is more tolerant of input that is not strictly correct).

Ben

Sam Weiss

unread,
May 14, 2015, 10:46:56 AM5/14/15
to stan-...@googlegroups.com, samc...@gmail.com
Ben

Thanks! Got it to work. Here's github if anyone is interested:

Bob Carpenter

unread,
May 14, 2015, 11:39:57 AM5/14/15
to stan-...@googlegroups.com
Thanks for sharing. We've been meaning to do something like
this ourselves for ages so would like to include a variant of
what you're doing in our doc or example models.

Were you reproducing a textbook example here (I saw the Iris
data)?

Did you fit it with optimization or MCMC?

How well did it work? We'd like to include an example with
some of the mods suggested below.

Where'd iris come from? This is the first line of the R:

iris1=iris[1:100,]


I'd suggest replacing

fitted_middle[q,n, j] <- 1/ (1+exp(-beta_middle[q, j] * fitted_middle[q-1,n]'));

with:


fitted_middle[q,n,j] <- inv_logit(beta_middle[q,j] * fitted_middle[q-1,n]');

dot_product's even a bit faster. It would be even faster to do a matrix
product for beta_middle and fitted_middle and assign to a temporary and then
assign to fitted_middel using inverse logit.


If you define beta_final and fitted_middle appropriately in terms
of shape, you should be able to replace this:

y_data[n] ~ bernoulli_logit(beta_final' * fitted_middle[num_hidden,n]');

with

y_data ~ bernoulli_logit(beta_final * fitted_middle);

without having to transform anything (though transforms aren't that
expensive because there are no derivatives).

Also, if you don't want to save all of fitted_middle, you could
declare and define it in the model block as a local variable.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

FHRB Toledo

unread,
May 14, 2015, 11:57:17 AM5/14/15
to stan-...@googlegroups.com
Dear All,

Thanks indeed to Sam for sharing it.

Regarding the Bob questions:

The iris data is from Fisher (http://en.wikipedia.org/wiki/Iris_flower_data_set). It was also used as example on the nnet R package and is a native's R data.

Regards,
FH

Sam Weiss

unread,
May 14, 2015, 11:58:31 AM5/14/15
to stan-...@googlegroups.com
Hey Bob,

The Iris dataset is in my R sessions. I just type in "iris" and the data set is there : http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html. I used this dataset since it seems to be a standard among testing new applications (its always in ML vignettes). The iris dataset includes 3 classes of 50 observations each, so I just took first 100 observations which corresponds to first two classes. 

It was fit with MCMC.

As for performance, it does predict data out of sample well but I haven't compared it with neuralnet packages found in R.

I didn't take this example from a textbook but just wanted to get a better understanding of neural networks and rstan so I decided to try it.. 

Thanks for suggestions. I'll include them.

Of course, feel free to use this as a starting point for your documentation. 

Sam


You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/3QBBpo11Lus/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Bob Carpenter

unread,
May 14, 2015, 12:02:56 PM5/14/15
to stan-...@googlegroups.com
Wow --- a top-level global named iris for the iris data.
R never ceases to amaze me!

- Bob

Herra Huu

unread,
May 14, 2015, 3:08:39 PM5/14/15
to stan-...@googlegroups.com
Nice. I have been wanting to test neural networks with Stan for some time as well. So this seemed like a good opportunity to finally test it out. Made some improvements (hopefully) to the provided code. I guess the most important changes were:

- Bob's ideas (at least most of them) + some additional linear algebra
- added priors for betas (probably the most important part!)
- fixed some bugs (for example, first middle layers wasn't used at all in the original code, I think)

I'm not 100% sure that there isn't any bugs any more, but at least the predictions seems to be pretty decent. And it's quite fast +  n_eff&Rhat values are good.

Edit: updated file nn.stan, fixed one small bug.
testing_nn.R
nn.stan

Aki Vehtari

unread,
May 15, 2015, 8:41:49 AM5/15/15
to stan-...@googlegroups.com
In nnet literature it's quite common to center sigmoid transfer functions to 0.
It might be useful to scale also inv_logits as
 inv_logit(.)*2-1
to make the prior mean of the hidden unit to be 0, which will make the zero mean priors used for the weights to work better. 

Aki

Herra Huu

unread,
May 15, 2015, 10:26:01 AM5/15/15
to stan-...@googlegroups.com
Thanks, that makes a lot of sense. Updated model code as an attachment. It did improve results quite a bit, at least for the iris dataset (which is not that complex).  I think I will do some simulations next.
nn.stan

Sam Weiss

unread,
May 15, 2015, 10:39:09 AM5/15/15
to stan-...@googlegroups.com
I'm getting an error when I try and run the code:

"Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘sampling’ for signature ‘"missing"’ "

Do you know what that could be?

Thanks,
Sam

--
You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/3QBBpo11Lus/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Herra Huu

unread,
May 15, 2015, 10:49:14 AM5/15/15
to stan-...@googlegroups.com
Oh sorry, I didn't remember to post the updated R file (I changed some of the parameter/data names in the model, so that's probably the problem). Does this (attached file) work?
testing_nn.R

Sam Weiss

unread,
May 15, 2015, 10:49:28 AM5/15/15
to stan-...@googlegroups.com
Loaded .rstan file incorrectly. Works fine.

Bob Carpenter

unread,
May 15, 2015, 3:27:56 PM5/15/15
to stan-...@googlegroups.com
You can use tanh(u) instead of inv_logit(u) * 2 - 1.

At least that's what I've usually seen used.

It scales by a factor of 2, in that:

2 * inv_logit(u) -1 = tanh(u / 2)

- Bob
Reply all
Reply to author
Forward
0 new messages