prioritizing to-do list


Bob Carpenter

Nov 15, 2012, 3:31:58 PM
to stan...@googlegroups.com
I extracted some top-level to-do items for Stan in no particular
order. Could we discuss priorities at the next meeting?

* discrete sampling
* forward sampling (RNG in gen quant, all RNGs for distros)
* cumulative distributions & truncated prob functions
* vectorization of multivariate prob functions
* vectorized derivatives for prob functions
* special function vectorization
* matrix derivative partial evals
* Python, MATLAB, Stata, ??? interfaces
* intro applied Bayes book with RStan
* C++ manual
* C++ code cleanup (API doc, privates, consts, doc, split files,
long long, reserved word checking)
* simplify error handling by removing policies
* matrix slicing and dicing
* matrix expressions
* ragged arrays
* subroutines in modeling language
* user-specified language extensions
* conditional expressions, statements (for, while, if-else, comparisons)
* covariance matrix w. caching (inverse, determinant, Cholesky)
* I/O from CSV files
* fancier web site graphics
* multi-threading
* convergence diagnostics (basic I/O of leapfrog/gradients + evals)
* command for multiple chains + report like print(fit) in RStan
* initialization block in model
* ensemble samplers: DREAM, differential evolution, stretch/walk
* adapting fixed covariance matrix
* higher-order auto-diff and RM-HMC
* implicit functions with derivatives (for diff eqs?)
* compiler for R's linear model notation
* syntactic improvements in language (e.g., multiple declares,
declare and assign)
* user-callable transforms with Jacobian lp__ adjustment
* unconstrained parameterizations of prob functions
* new types: lower triangular (+/- strict), diagonal matrix,
symmetric matrix, Cholesky factor of pos-def matrix
* improve special function and prob function behavior at limits
* more example models, complete/improve BUGS models

- Bob

Matt Hoffman

Nov 15, 2012, 8:40:20 PM
to stan-dev
Last night, when someone at the ML meetup asked me "what's
being prioritized over GPU support?" I wish I'd had this list in front
of me. Also, I got some requests from coworkers today for Matlab
support.

Andrew Gelman

Nov 15, 2012, 8:41:35 PM
to stan...@googlegroups.com
Matlab should pay us for that!

Ben Goodrich

Nov 15, 2012, 9:14:49 PM
to stan...@googlegroups.com
On Thursday, November 15, 2012 3:32:01 PM UTC-5, Bob Carpenter wrote:
> I extracted some top-level to-do items for Stan in no particular
> order. Could we discuss priorities at the next meeting?

In terms of local priority,

> * covariance matrix w. caching (inverse, determinant, Cholesky)
> * new types: lower triangular (+/- strict), diagonal matrix,
>   symmetric matrix, Cholesky factor of pos-def matrix

I would say that the Cholesky factor type should go in ASAP and before the covariance matrix class stuff. But I'm not sure the other types would add that much value once we have the planned covariance matrix classes. There will be a DiagonalCovarianceMatrix class, and we already have diag_matrix(vector). Rarely is there a need for a symmetric matrix that is not also a covariance matrix or a need for a triangular matrix with negative diagonal elements (and thus is not a Cholesky factor). But maybe the value added would be worth the effort if Eigen supported them directly. Conversely, if we add a Cholesky factor type, then we should also add wishart_cholesky() and inv_wishart_cholesky() densities.
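
As a concrete illustration of why a cached Cholesky factor pays off for the planned covariance matrix classes: once a lower-triangular L with positive diagonal and L L^T = Sigma is known, log|Sigma| = 2 * sum_i log(L_ii) costs only O(n). The C++ below is a minimal sketch under stated assumptions: matrices are plain std::vector rows rather than Eigen types, and CachedCovariance is a hypothetical name, not Stan's planned class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major square matrix, used only for this illustration.
using Matrix = std::vector<std::vector<double>>;

// Row-by-row Cholesky decomposition: returns lower-triangular L with a
// positive diagonal such that L * L^T == sigma. Assumes sigma is symmetric
// positive definite and asserts on a non-positive pivot.
Matrix cholesky(const Matrix& sigma) {
  const std::size_t n = sigma.size();
  Matrix L(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      double sum = sigma[i][j];
      for (std::size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
      if (i == j) {
        assert(sum > 0.0 && "matrix is not positive definite");
        L[i][i] = std::sqrt(sum);
      } else {
        L[i][j] = sum / L[j][j];
      }
    }
  }
  return L;
}

// Hypothetical cached covariance: factor once, then reuse L.
struct CachedCovariance {
  Matrix L;  // Cholesky factor computed once at construction
  explicit CachedCovariance(const Matrix& sigma) : L(cholesky(sigma)) {}

  // log det(Sigma) = 2 * sum_i log(L_ii): O(n) once L is cached.
  double log_determinant() const {
    double s = 0.0;
    for (std::size_t i = 0; i < L.size(); ++i) s += std::log(L[i][i]);
    return 2.0 * s;
  }
};
```

A positive diagonal is exactly what separates a Cholesky factor from an arbitrary lower-triangular matrix, which is why the factorization asserts on a non-positive pivot.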

Ben

Bob Carpenter

Nov 16, 2012, 8:11:36 PM
to stan...@googlegroups.com


On 11/15/12 9:14 PM, Ben Goodrich wrote:
> On Thursday, November 15, 2012 3:32:01 PM UTC-5, Bob Carpenter wrote:
>
> I extracted some top-level to-do items for Stan in no particular
> order. Could we discuss priorities at the next meeting?
>
>
> In terms of local priority,
>
> * covariance matrix w. caching (inverse, determinant, Cholesky)
> * new types: lower triangular (+/- strict), diagonal matrix,
> symmetric matrix, Cholesky factor of pos-def matrix
>
>
> I would say that the Cholesky factor type should go in ASAP and before the covariance matrix class stuff.

That's actually not that hard a thing to do. I could
probably knock that off in a few days including all the
doc and testing. In fact, it's probably even easier than
that because most of the code and transform doc is already
there inside of cov_matrix!
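
The transform for such a type might look like the following sketch, assuming the common approach of filling a lower-triangular matrix row by row from K(K+1)/2 unconstrained reals and exponentiating the diagonal so it stays positive. It is a standalone illustration, not code taken from Stan's cov_matrix implementation, and cholesky_factor_constrain is a hypothetical name. Because only the diagonal entries are transformed (by exp), the log absolute Jacobian determinant is simply the sum of the unconstrained diagonal values, which is the amount that would be added to lp__.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: maps K*(K+1)/2 unconstrained reals to a K x K
// lower-triangular matrix with strictly positive diagonal. Entries are
// filled row by row; off-diagonal entries are copied as-is and diagonal
// entries go through exp(). log_jacobian is incremented by the sum of the
// unconstrained diagonal values, since d exp(u)/du = exp(u).
Matrix cholesky_factor_constrain(const std::vector<double>& x, std::size_t K,
                                 double& log_jacobian) {
  assert(x.size() == K * (K + 1) / 2);
  Matrix L(K, std::vector<double>(K, 0.0));
  std::size_t pos = 0;
  for (std::size_t i = 0; i < K; ++i) {
    for (std::size_t j = 0; j < i; ++j) L[i][j] = x[pos++];  // free below diagonal
    L[i][i] = std::exp(x[pos]);                               // positive diagonal
    log_jacobian += x[pos++];
  }
  return L;
}
```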

> But I'm not
> sure the other types would add that much value once we have the planned covariance matrix classes. There will be a
> DiagonalCovarianceMatrix class, and we already have diag_matrix(vector). Rarely is there a need for a symmetric matrix
> that is not also a covariance matrix or a need for a triangular matrix with negative diagonal elements (and thus is not
> a Cholesky factor).

I've already been surprised about what people have
asked for in terms of data structures. I think general
ragged data structures can cover the triangular case.
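
For instance, a lower-triangular matrix fits a ragged array in which row i stores only its i + 1 entries on or below the diagonal. A minimal C++ sketch of that layout, where RaggedLowerTriangular is a hypothetical class used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: an n x n lower-triangular matrix stored as a
// ragged array. Row i holds only its i + 1 entries on or below the diagonal;
// entries above the diagonal are implicitly zero and never stored.
class RaggedLowerTriangular {
 public:
  explicit RaggedLowerTriangular(std::size_t n) : rows_(n) {
    for (std::size_t i = 0; i < n; ++i) rows_[i].assign(i + 1, 0.0);
  }

  std::size_t size() const { return rows_.size(); }

  // Read access: above-diagonal entries read as zero.
  double get(std::size_t i, std::size_t j) const {
    assert(i < rows_.size() && j < rows_.size());
    return j <= i ? rows_[i][j] : 0.0;
  }

  // Write access is only allowed on or below the diagonal.
  void set(std::size_t i, std::size_t j, double value) {
    assert(i < rows_.size() && j <= i);
    rows_[i][j] = value;
  }

  // Number of stored values: n(n+1)/2 rather than n^2.
  std::size_t stored() const {
    std::size_t total = 0;
    for (const auto& row : rows_) total += row.size();
    return total;
  }

 private:
  std::vector<std::vector<double>> rows_;
};
```

For n = 3 this stores 6 values instead of 9, and reads above the diagonal return zero without any storage.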

Nobody's asked for symmetric matrices -- I'm just
a natural collector and like to have complete sets :-)

diag_matrix(vector) is very inefficient the way we
do it -- we're still bypassing the expression templates
in Eigen that make it efficient. We could add an
efficient

matrix diag_matrix_multiply(vector v, matrix m);

that computes diag_matrix(v) * m directly without blowing
it out. First level would be to use Eigen, second level
would be to write specialized auto-diffs.
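
In Eigen, the first level would presumably be a one-liner along the lines of v.asDiagonal() * m, which multiplies through Eigen's diagonal-matrix wrapper instead of a dense matrix. The arithmetic itself is simple enough to sketch without Eigen: row i of diag(v) * m is row i of m scaled by v[i]. The C++ below assumes plain std::vector matrices and is not Stan's actual function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Sketch of diag_matrix_multiply: computes diag(v) * m without building the
// n x n diagonal matrix. Row i of the result is row i of m scaled by v[i],
// so the cost is O(n * k) for an n x k matrix m, versus O(n^2 * k) for a
// dense multiply after expanding v into a full matrix.
Matrix diag_matrix_multiply(const std::vector<double>& v, const Matrix& m) {
  assert(v.size() == m.size());
  Matrix result = m;
  for (std::size_t i = 0; i < result.size(); ++i)
    for (double& x : result[i]) x *= v[i];
  return result;
}
```

For v = (2, 3) and m = [[1, 2], [3, 4]], the result is [[2, 4], [9, 12]].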

I have almost no feeling at all which of these ops will
be useful. They're very easy to write, though.

> But maybe the value added would be worth the effort if Eigen supported them directly. Conversely, if
> we add a Cholesky factor type, then we should also add wishart_cholesky() and inv_wishart_cholesky() densities.

Agreed.

- Bob

Bob Carpenter

Nov 16, 2012, 8:40:54 PM
to stan...@googlegroups.com
> On Nov 15, 2012, at 8:40 PM, Matt Hoffman wrote:
>
>> Last night, when someone at the ML meetup asked me "what's
>> being prioritized over GPU support?"

:-)

>>I wish I'd had this list in front
>> of me. Also, I got some requests from coworkers today for Matlab
>> support.

I just gave a talk to the IGERT students this
afternoon, and apparently they're mostly EE and
CS types who only use Matlab or Python. They hadn't
even heard of R!!! Amazing how high these intellectual
silos are.

On 11/15/12 8:41 PM, Andrew Gelman wrote:
> Matlab should pay us for that!

Writing MATLAB interfaces is on our radar -- it was in the
NSF grant proposal, and we're missing lots of users without
it. I keep hoping we can get a volunteer to do it.

We could either go the whole hog and do an RStan-like
integration inside MATLAB's process, or we could
do something more command-line-like that'd just
shuttle data back and forth to other process(es)
running Stan.

- Bob

Ben Goodrich

Nov 16, 2012, 8:51:10 PM
to stan...@googlegroups.com
On Thursday, November 15, 2012 3:32:01 PM UTC-5, Bob Carpenter wrote:
> I extracted some top-level to-do items for Stan in no particular
> order. Could we discuss priorities at the next meeting?

In addition, I came up with a plan to make an RStan package that would work with an already-installed Stan. This would require Stan to have a make install target that would generate the .pc files to be used by pkg-config

http://en.wikipedia.org/wiki/Pkg-config

and put the .pc files into the locations where pkg-config can find them. But I don't know how much of a priority that is.

Ben

Bob Carpenter

Nov 17, 2012, 1:37:14 PM
to stan...@googlegroups.com
It can't hurt. The only issue is where to doc it so
users can find it.

Could you put the new make target in rstan/makefile?
We should keep all the R-related material under rstan.

- Bob

Ben Goodrich

Nov 17, 2012, 4:24:22 PM
to stan...@googlegroups.com

No, the make install target would be for Stan, but it doesn't have anything inherently to do with R. These days, many libraries (with the notable exception of Boost) come with .pc files. For example, the top level of the Eigen source tree has a template for one:

goodrich@CYBERPOWERPC:/tmp/include-what-you-use$ cat /opt/eigen/eigen3.pc.in
Name: Eigen3
Description: A C++ template library for linear algebra: vectors, matrices, and related algorithms
Requires:
Version: ${EIGEN_VERSION_NUMBER}
Libs:
Cflags: -I${INCLUDE_INSTALL_DIR}

When someone "installs" eigen, this .pc.in file gets processed so that ${EIGEN_VERSION_NUMBER} becomes 3.1.x and ${INCLUDE_INSTALL_DIR} becomes whatever is appropriate for that platform (/usr/include/eigen3/ for me) and the .pc file gets put into some canonical location where pkg-config can find it (separate from the rest of the library). Thus, this mechanism is useful for any external program that wants to use a library.
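
The processing step amounts to variable substitution. Here is a hypothetical C++ sketch of it (substitute_pc_template is an illustrative name; a real build system does this with its own tooling): each ${NAME} in the template is replaced with a value supplied by the installer, and unknown names are left in place so the mistake stays visible.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of processing a .pc.in template: every ${NAME} is
// replaced with vars[NAME]. Unknown names (and an unterminated "${") are
// copied through unchanged.
std::string substitute_pc_template(const std::string& tmpl,
                                   const std::map<std::string, std::string>& vars) {
  std::string out;
  std::string::size_type pos = 0;
  while (pos < tmpl.size()) {
    const auto start = tmpl.find("${", pos);
    if (start == std::string::npos) { out += tmpl.substr(pos); break; }
    const auto end = tmpl.find('}', start + 2);
    if (end == std::string::npos) { out += tmpl.substr(pos); break; }
    out += tmpl.substr(pos, start - pos);  // literal text before ${
    const std::string name = tmpl.substr(start + 2, end - start - 2);
    const auto it = vars.find(name);
    out += (it != vars.end()) ? it->second : tmpl.substr(start, end - start + 1);
    pos = end + 1;
  }
  return out;
}
```

Mapping INCLUDE_INSTALL_DIR to /usr/include/eigen3/ turns the line Cflags: -I${INCLUDE_INSTALL_DIR} into Cflags: -I/usr/include/eigen3/.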

What I am thinking for the case of Stan is that make install would

-- call make bin/stanc{.exe} which also makes libstan
-- autogenerate / process the corresponding libstan.pc file(s)
-- put the libstan.pc file into the place in the filesystem where pkg-config looks for that platform
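
For the last step, pkg-config looks in the directories listed in the PKG_CONFIG_PATH environment variable before its built-in default directories, which vary by platform (for example /usr/lib/pkgconfig on many Linux systems). A hypothetical C++ sketch of that search order for a libstan.pc file, returning candidate paths rather than touching the filesystem (pc_candidates is an illustrative name):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: the ordered list of places a pkg-config-style tool
// would look for "<package>.pc". Directories from PKG_CONFIG_PATH come
// first, then the built-in defaults (platform-specific, so passed in here).
// sep is ':' on Unix-like systems; empty path segments are skipped.
std::vector<std::string> pc_candidates(const std::string& package,
                                       const std::string& pkg_config_path,
                                       const std::vector<std::string>& defaults,
                                       char sep = ':') {
  std::vector<std::string> dirs;
  std::string::size_type start = 0;
  while (start <= pkg_config_path.size()) {
    const auto end = pkg_config_path.find(sep, start);
    const std::string dir = pkg_config_path.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
    if (!dir.empty()) dirs.push_back(dir);
    if (end == std::string::npos) break;
    start = end + 1;
  }
  dirs.insert(dirs.end(), defaults.begin(), defaults.end());

  std::vector<std::string> candidates;
  for (const auto& d : dirs) candidates.push_back(d + "/" + package + ".pc");
  return candidates;
}
```

Installing libstan.pc into one of the default directories would mean users never need to set PKG_CONFIG_PATH at all.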

It isn't even absolutely necessary to move any components of Stan into a different place in the filesystem, although I guess we could have an option for that. We can then bundle the pkg-config-lite program (GPL2+) with RStan. The libstan.pc file would have all the information about directories and flags needed to use Stan, so that something like

g++ -o user_model.o user_model.cpp $(pkg-config --libs --cflags libstan)

would compile. At that point, the RStan package would only need to be a few MB, small enough to go on CRAN, since it wouldn't need copies of Stan / Boost / Eigen. But to install RStan, the computer would need the same version of Stan installed.

Ben

Bob Carpenter

Nov 17, 2012, 4:28:24 PM
to stan...@googlegroups.com
OK -- that makes sense. I'm so glad
you're doing this -- config drives me
bananas.

- Bob
