RStan vs. Stan (+ bonus on backward compatibility)

Bob Carpenter

Apr 15, 2015, 12:00:00 AM

to stan...@googlegroups.com

I couldn't get a word in edgewise at that meeting to clarify
what I meant.

What I was trying to say is that I don't think of things that
are only in RStan as being part of Stan. This is purely semantic.
I'm not trying to discourage you guys from putting things into
RStan.

What I would like to encourage is that we take functionality and
put them into the C++ API for Stan --- that's what I think of as
Stan itself. The advantage is that then we can expose them beyond
the subset of Stan users who go through RStan. It will also save
us from having to reimplement them multiple times.

I would also like to encourage the RStan developers to keep
backward compatibility in mind. It's really a pain in the ass
for a user when interfaces change and the result is that users
will continue to use old versions of the system with interfaces they
know rather than switching.

I was speaking to a heavy user of JAGS who didn't want to try
Stan because he'd built up so much infrastructure around JAGS.
Changing interfaces pulls the rug out from under that kind of
infrastructure building and is really frustrating for users.

Now having said that, I do realize we have to occasionally break
backwards compatibility in the interest of making everything
better. I just think we should do this absolutely no more than
once every couple years. We should not under any circumstances
think of this as something that happens every couple of months as
we trickle out new features that break old things.

- Bob

Daniel Lee

Apr 15, 2015, 12:34:15 AM

to stan...@googlegroups.com

+1 to all of this.

--
You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

Apr 15, 2015, 2:42:13 AM

to stan...@googlegroups.com

On Wednesday, April 15, 2015 at 12:00:00 AM UTC-4, Bob Carpenter wrote:

> What I would like to encourage is that we take functionality and
> put them into the C++ API for Stan --- that's what I think of as
> Stan itself. The advantage is that then we can expose them beyond
> the subset of Stan users who go through RStan. It will also save
> us from having to reimplement them multiple times.

I basically agree with this, although we currently have approximately zero known users of the C++ API and from the user's perspective Stan "is" the interface that they are using. But if something is relatively easy to do in the interfaces or in one interface and hard or time-consuming to put into the C++ API, then I don't think it should get delayed. The partial inits that Mav is doing are an example of that. Conversely, I don't like the idea that we sort of moved in the direction of today where "the interface recognizes missing data, rewrites the .stan program, and then parses / compiles it". The C++ API should have a way to represent missingness like every other statistical software.

> I would also like to encourage the RStan developers to keep
> backward compatibility in mind. It's really a pain in the ass
> for a user when interfaces change and the result is that users
> will continue to use old versions of the system with interfaces they
> know rather than switching.

The only times I remember us breaking backward compatibility were changing stan() to take a control argument for the config options instead of passing them all through the ... argument, and now eliminating the set_cppo() function (which people do call in .R scripts, but shouldn't anyway). But we'll keep trying to keep the user-facing stuff working the way it has in the past.

Ben

Bob Carpenter

Apr 15, 2015, 3:21:01 AM

to stan...@googlegroups.com

> On Apr 15, 2015, at 4:42 PM, Ben Goodrich <goodri...@gmail.com> wrote:
>
> On Wednesday, April 15, 2015 at 12:00:00 AM UTC-4, Bob Carpenter wrote:
> What I would like to encourage is that we take functionality and
> put them into the C++ API for Stan --- that's what I think of as
> Stan itself. The advantage is that then we can expose them beyond
> the subset of Stan users who go through RStan. It will also save
> us from having to reimplement them multiple times.
>
> I basically agree with this, although we currently have approximately zero known users of the C++ API and from the user's perspective Stan "is" the interface that they are using.

Absolutely. So I guess I think of Stan as more what's in
all of the interfaces. So maybe I took the min when I meant
the max :-)

> But if something is relatively easy to do in the interfaces or in one interface and hard or time-consuming to put into the C++ API, then I don't think it should get delayed.

OK --- I'd just like to do it in a way that when we do write it
in C++, it won't be a pain to retrofit.

> The partial inits that Mav is doing are an example of that. Conversely, I don't like the idea that we sort of moved in the direction of today where "the interface recognizes missing data, rewrites the .stan program, and then parses / compiles it". The C++ API should have a way to represent missingness like every other statistical software.

I completely agree (though obviously not every other stats
software deals with missingness --- not even every R function
can handle it).

> I would also like to encourage the RStan developers to keep
> backward compatibility in mind. It's really a pain in the ass
> for a user when interfaces change and the result is that users
> will continue to use old versions of the system with interfaces they
> know rather than switching.
>
> The only times I remember us breaking backward compatibility were changing stan() to take a control argument for the config options instead of passing them all through the ... argument, and now eliminating the set_cppo() function (which people do call in .R scripts, but shouldn't anyway). But we'll keep trying to keep the user-facing stuff working the way it has in the past.

It was Michael who I believe was proposing breaking backward
compatibility everywhere. The fact that it's going to be hard to
maintain backward compatibility with the current CmdStan structure
is why I don't like it.

- Bob

Michael Betancourt

Apr 15, 2015, 4:05:59 AM

to stan...@googlegroups.com

>
> I basically agree with this, although we currently have approximately zero known users of the C++ API

False. Just because they don’t clog the users list doesn’t mean that
they’re not out there using the code. For some reason they just don’t
like to publicize their projects unless you’re meeting them in person.

> and from the user's perspective Stan "is" the interface that they are using.

But is that interface R or Python? PyStan has a huge number of users
judging by the PyPI stats, and random functions developed for RStan
will not be available to those users.

Stan’s too big to be “agile” anymore. We’re just going to have to deal
with that or hire a bunch more developers to do all of the backend
refactoring and testing we spend much of our time on.

> The C++ API should have a way to represent missingness like every other statistical software.

Maybe the C++ API should do crappy inference like every other
statistical software, too? If there were a simple way to incorporate
unknown missingness into our statically compiled models we would
have done it by now. Even specific missingness patterns (missing
covariates only, for example) would be nontrivial to implement.

Ben Goodrich

Apr 15, 2015, 9:59:42 AM

to stan...@googlegroups.com

On Wednesday, April 15, 2015 at 4:05:59 AM UTC-4, Michael Betancourt wrote:

>>
>> I basically agree with this, although we currently have approximately zero known users of the C++ API
>
> False. Just because they don’t clog the users list doesn’t mean that
> they’re not out there using the code. For some reason they just don’t
> like to publicize their projects unless you’re meeting them in person.

That is not inconsistent with there being approximately zero known users of the C++ API relative to the huge number of known users of one of the interfaces. I'm sure there would be more API users if our API were better or more stable, but they would still be dwarfed by the number of users who never see the C++ at all.

>> and from the user's perspective Stan "is" the interface that they are using.
>
> But is that interface R or Python? PyStan has a huge number of users
> judging by the PyPI stats, and random functions developed for RStan
> will not be available to those users.

or MatlabStan or JuliaStan or StataStan ... I'm sure Stan "is" different things to different people depending on which interface they use. And I think there is consensus on having all the interfaces implement the low-hanging fruit, while at the same time doing what is natural within each high-level language. But there are plenty of things that weren't low-hanging fruit, like ShinyStan or exposing user-defined Stan functions to the interface, that I don't know how to do outside of R, and I don't know of anyone else who knows how to do them outside of R either.

>> The C++ API should have a way to represent missingness like every other statistical software.
>
> Maybe the C++ API should do crappy inference like every other
> statistical software, too?

I don't think there is anything crappy about the way other statistical software _represents_ missingness.

> sum(is.na(poll$Income))
[1] 53

I have 53 missing values on income in this poll and the first few are

> head(which(is.na(poll$Income)))
[1]  33  42  88 207 371 576

That is totally reasonable behavior. Even Stata's 27 ways to represent missingness I can live with because I don't need the 27 largest floating point numbers anyway.
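
To make the Stata-style idea concrete, here is a minimal C++ sketch that reserves the 27 largest finite doubles as missing codes, walking down from DBL_MAX with std::nextafter. Code 0 plays the role of Stata's `.` and codes 1 through 26 play `.a` through `.z`. This is a hypothetical encoding in the spirit of Stata's approach, not a description of how Stata stores missing values internally, and none of these names exist in Stan.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical Stata-style missing codes: the 27 largest finite doubles.
// Code 0 stands in for Stata's ".", codes 1..26 for ".a".."z".
constexpr int kNumMissingCodes = 27;

// Returns the double reserved for missing code k (0 <= k < 27).
// Code 0 is the largest finite double; each higher code is one step smaller.
double missing_value(int k) {
  assert(k >= 0 && k < kNumMissingCodes);
  double x = std::numeric_limits<double>::max();
  for (int i = 0; i < k; ++i) {
    x = std::nextafter(x, 0.0);  // next representable double toward zero
  }
  return x;
}

// A finite value at or above the smallest reserved code counts as missing.
// Infinities and NaNs are deliberately NOT treated as missing here.
bool is_missing(double x) {
  return std::isfinite(x) && x >= missing_value(kNumMissingCodes - 1);
}

// Recovers which missing code x is; returns -1 for an ordinary value.
int missing_code(double x) {
  for (int k = 0; k < kNumMissingCodes; ++k) {
    if (x == missing_value(k)) return k;
  }
  return -1;
}
```

The cost of this design is that those 27 values can no longer appear as real data, which, as noted above, is rarely a loss.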

But Stata (and most software that I know of) differs from R in the default way it handles variables with missingness. If you ask for the mean of such a variable in R, it returns NA by default, but most functions have an option (na.rm, which defaults to FALSE) that lets you do the calculation excluding the missing values, which is the default behavior in Stata and SPSS and whatnot. Excluding missing values by default is a design choice, but not a good one, because it encourages people to ignore missingness rather than model it. And Stan achieves the same bad behavior by not even having a convenient way to represent missingness.
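
The difference in defaults can be sketched in C++ with a hypothetical mean() that mirrors R's mean(x, na.rm = FALSE), assuming NaN marks a missing value (R itself keeps its NA distinct from an ordinary NaN). With na_rm left at false, one missing value makes the whole result missing, as in R; with na_rm = true, missing values are silently dropped, which is the Stata/SPSS default.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical mean with an R-style na_rm flag; NaN marks a missing value.
// na_rm = false (R's default): any missing value makes the result missing.
// na_rm = true (Stata/SPSS-like default): missing values are skipped.
double mean(const std::vector<double>& x, bool na_rm = false) {
  double sum = 0.0;
  std::size_t n = 0;
  for (double xi : x) {
    if (std::isnan(xi)) {
      if (!na_rm) return std::numeric_limits<double>::quiet_NaN();
      continue;  // drop the missing value and keep going
    }
    sum += xi;
    ++n;
  }
  // No observed values: report missing rather than divide by zero.
  if (n == 0) return std::numeric_limits<double>::quiet_NaN();
  return sum / static_cast<double>(n);
}
```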

> If there were a simple way to incorporate
> unknown missingness into our statically compiled models we would
> have done it by now. Even specific missingness patterns (missing
> covariates only, for example) would be nontrivial to implement.

Depending on how you define compiled, BUGS did it 20 years ago and a Java implementation of BART does it today. It would be simple to do in Stan, both for continuous data and for PMFs with a finite number of categories, if there were a way to represent missingness.
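
The finite-category case rests on marginalization: when a categorical value is missing, its likelihood contribution is a sum over all categories. Below is a hypothetical C++ sketch for a simple model, z[n] ~ categorical(theta) and x[n] | z[n] ~ normal(mu[z[n]], sigma), assuming -1 as the missing code for z. None of these functions are Stan's own; normal_lpdf and log_sum_exp are re-implemented here so the block stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical missing code for a categorical value.
constexpr int kMissingCategory = -1;

// Log density of normal(mu, sigma) evaluated at x.
double normal_lpdf(double x, double mu, double sigma) {
  const double pi = std::acos(-1.0);
  const double z = (x - mu) / sigma;
  return -0.5 * z * z - std::log(sigma) - 0.5 * std::log(2.0 * pi);
}

// Numerically stable log(sum(exp(v))).
double log_sum_exp(const std::vector<double>& v) {
  assert(!v.empty());
  const double m = *std::max_element(v.begin(), v.end());
  if (std::isinf(m)) return m;  // all -inf, or some +inf
  double s = 0.0;
  for (double vi : v) s += std::exp(vi - m);
  return m + std::log(s);
}

// Joint log likelihood of x and 0-based categories z. An observed z[n]
// contributes log theta[z] + normal_lpdf; a missing z[n] is marginalized
// out with log_sum_exp over all categories.
double log_likelihood(const std::vector<double>& x, const std::vector<int>& z,
                      const std::vector<double>& theta,
                      const std::vector<double>& mu, double sigma) {
  assert(x.size() == z.size() && theta.size() == mu.size());
  double lp = 0.0;
  std::vector<double> terms(theta.size());
  for (std::size_t n = 0; n < x.size(); ++n) {
    if (z[n] == kMissingCategory) {
      for (std::size_t k = 0; k < theta.size(); ++k) {
        terms[k] = std::log(theta[k]) + normal_lpdf(x[n], mu[k], sigma);
      }
      lp += log_sum_exp(terms);
    } else {
      const std::size_t k = static_cast<std::size_t>(z[n]);
      lp += std::log(theta[k]) + normal_lpdf(x[n], mu[k], sigma);
    }
  }
  return lp;
}
```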

We have quiet and signaling NaNs available in C++ but currently no way to do I/O with them. We can do I/O with plus or minus infinity, even though there are few reasons to do so, and I think those values would be more usefully interpreted as missing. Rcpp uses the R way of representing missingness, but that doesn't help with the other interfaces. Maybe the simplest thing for Stan is to go the Stata route and reserve the largest numbers to represent missingness.
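
Here is a minimal C++ sketch of what NaN-based missingness I/O could look like, assuming a whitespace-separated text format in which the token NA marks a missing value. The functions read_values and write_values are hypothetical, not part of Stan's I/O code. One caveat of this design: a NaN produced by arithmetic becomes indistinguishable from a missing value, which is one reason R keeps NA distinct from NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: whitespace-separated numbers where the token "NA"
// marks a missing value, stored as a quiet NaN.
std::vector<double> read_values(const std::string& text) {
  std::istringstream in(text);
  std::vector<double> values;
  std::string token;
  while (in >> token) {
    if (token == "NA") {
      values.push_back(std::numeric_limits<double>::quiet_NaN());
    } else {
      values.push_back(std::stod(token));  // also accepts "inf" and "-inf"
    }
  }
  return values;
}

// Hypothetical writer: the inverse mapping, printing any NaN as "NA".
std::string write_values(const std::vector<double>& values) {
  std::ostringstream out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out << ' ';
    if (std::isnan(values[i])) {
      out << "NA";
    } else {
      out << values[i];
    }
  }
  return out.str();
}
```

std::stod throws std::invalid_argument on any other non-numeric token, so a real reader would want friendlier error reporting.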

But there isn't going to be any development of Stan features to model missingness until there is something that a user can conveniently "if, else" on.
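
Once there is a value to branch on, the if/else itself is trivial. As a hypothetical C++ sketch (assuming NaN marks missing), this splits a data vector into the observed values plus 1-based index arrays for the observed and missing entries, the kind of data a Stan program can already use today by declaring the missing entries as parameters. The names split_missing, y_obs, ii_obs, and ii_mis are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result of splitting a NaN-coded data vector.
struct MissingSplit {
  std::vector<double> y_obs;  // observed values, in original order
  std::vector<int> ii_obs;    // 1-based positions of observed values
  std::vector<int> ii_mis;    // 1-based positions of missing values
};

MissingSplit split_missing(const std::vector<double>& y) {
  MissingSplit s;
  for (std::size_t n = 0; n < y.size(); ++n) {
    const int pos = static_cast<int>(n) + 1;  // Stan indexes from 1
    if (std::isnan(y[n])) {
      s.ii_mis.push_back(pos);
    } else {
      s.y_obs.push_back(y[n]);
      s.ii_obs.push_back(pos);
    }
  }
  // Every entry lands in exactly one of the two index arrays.
  assert(s.ii_obs.size() + s.ii_mis.size() == y.size());
  return s;
}
```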

Ben

Allen B. Riddell

Apr 15, 2015, 10:36:28 AM

to stan...@googlegroups.com

On 04/15, Bob Carpenter wrote:
>
> > On Apr 15, 2015, at 4:42 PM, Ben Goodrich <goodri...@gmail.com> wrote:
> >
> > On Wednesday, April 15, 2015 at 12:00:00 AM UTC-4, Bob Carpenter wrote:
> > What I would like to encourage is that we take functionality and
> > put them into the C++ API for Stan --- that's what I think of as
> > Stan itself. The advantage is that then we can expose them beyond
> > the subset of Stan users who go through RStan. It will also save
> > us from having to reimplement them multiple times.
> >
> > I basically agree with this, although we currently have approximately zero known users of the C++ API and from the user's perspective Stan "is" the interface that they are using.
>
> Absolutely. So I guess I think of Stan as more what's in
> all of the interfaces. So maybe I took the min when I meant
> the max :-)
>

Shouldn't it be the 95% HPD on features across interfaces?

On a previous point, I think the C++ API will be more useful once it's
better documented and one can use it without being fluent in Eigen.

-a

Bob Carpenter

Apr 15, 2015, 6:58:01 PM

to stan...@googlegroups.com

> On Apr 16, 2015, at 12:36 AM, Allen B. Riddell <a...@ariddell.org> wrote:
>
> ...

> On a previous point, I think the C++ API will be more useful once it's
> better documented and one can use it without being fluent in Eigen.

Eigen's the easiest part of the C++ API! I do understand
it's providing you some problems w.r.t. Python, though.

- Bob