Fwd: packaging Stan for Debian

Ben Goodrich

Aug 3, 2013, 4:17:44 PM
to stan...@googlegroups.com
forgot to CC the list

On Sat, Aug 3, 2013 at 3:56 PM, Ross Boylan wrote:
> Maybe I'm overthinking it.  Maybe you just meant that "stan" is not the name
> of the executable.  My intent was "the executable for the stan system".

Right, I just meant that the binary is called stanc.

goodrich@CYBERPOWERPC:/opt/stan$ bin/stanc --help

stanc version 1.3.0

USAGE:  stanc [options] <model_file>

and there is no binary called stan. However, as Bob points out, there is also a print binary (which would need to be renamed in the Debian package) that prints a summary of the posterior distribution from the .csv files. Those are the only two executables.

Anyway, packaging Stan wouldn't be very hard. Making r-noncran-rstan is the harder part, but probably more useful. It is true, as Bob mentioned on another post, that the R package doesn't use all of Boost, but neither does Stan. It is just that Stan embeds all of Boost because no one has gotten around to pruning the parts we don't need with bcp. The easiest thing to do would be to depend on libboost1.54-all, but Debian frowns on that sort of thing.

And then reconfiguring the R package to work with a system-wide Stan is a bit more work. Nothing too hard, it is just that Stan has followed a very Windows-centric philosophy of distributing itself and that is often at odds with the Debian / UNIX way.

Ben


Bob Carpenter

Aug 3, 2013, 4:36:20 PM
to stan...@googlegroups.com


On 8/3/13 4:20 PM, Ben Goodrich wrote:

> If Ross / Dan / whomever are up for it, packaging for Debian would be really beneficial. It would mean Stan gets a lot
> of testing on non-Intel platforms, facilitate the python integration, etc.

That would be great.

How does it facilitate Python integration? (I'm pretty much
completely ignorant about Python and Linux packaging.)

> It just means doing a lot of things the
> Debian way instead of the wrong way.

Could you be more specific? I know we've talked before about
lib and compiler dependencies.

- Bob

Ben Goodrich

Aug 3, 2013, 5:05:16 PM
to stan...@googlegroups.com, stan-...@googlegroups.com
On Saturday, August 3, 2013 4:36:20 PM UTC-4, Bob Carpenter wrote:
> On 8/3/13 4:20 PM, Ben Goodrich wrote:
>
>> If Ross / Dan / whomever are up for it, packaging for Debian would be really beneficial. It would mean Stan gets a lot
>> of testing on non-Intel platforms, facilitate the python integration, etc.
>
> That would be great.

Also, it would re-open the possibility of getting Stan-related packages onto CRAN.

> How does it facilitate Python integration?  (I'm pretty much
> completely ignorant about Python and Linux packaging.)

The Pystan repo is like RStan in that it embeds its dependencies. If there were Stan-related Debian packages, then it would be pretty easy to make a tiny python-pystan Debian package that also used the system libraries. It doesn't make anything easier on Mac or Windows.

>> It just means doing a lot of things the
>> Debian way instead of the wrong way.
>
> Could you be more specific?  I know we've talked before about
> lib and compiler dependencies.

Debian is as much a religion as it is a Linux distribution. But I would say that three of the packaging principles are:

-- Don't embed dependencies on other software
-- Make version requirements as lenient as possible
-- Separate the library from the headers from the binaries from the test-suite, etc.

(R)Stan is pretty much the opposite of that, but it wouldn't be that hard to have a Debian git branch that facilitated doing things the way Debian requires them.

Ben

Bob Carpenter

Aug 5, 2013, 11:20:13 AM
to stan...@googlegroups.com
Can we take this discussion over to stan-dev?

More below.

On 8/5/13 9:46 AM, Ben Goodrich wrote:

> The cost of having
> hundreds of MB of unused headers was deemed less than the cost of running a one-line bcp command.

I don't have a problem with cutting down Boost or Eigen in our
official releases.

I made a similar comment about cutting down the Eigen contributed
code and I seem to recall the consensus was that I should leave it
intact.

I recall the following concerns about trimming Boost and Eigen:

1. Users might try to use it as their Boost or Eigen distribution
and be disappointed that it's incomplete.

2. We might try to use it as our Boost or Eigen distribution during
development and not have everything we need when we want to use a new
feature.

3. Something might go wrong in the include tool, especially w.r.t.
templates.

4. Adding another tool adds complexity to our build process.

I'm not at all worried about (1).

As to (2), what should we do with our Git repository?

As to (3), Jiqiang seems to have everything working for RStan.

For (4), I think it may be worth it to cut down our distribution size.
Also as we (or maybe just I) get more comfortable with all the other
tools, the decision to add a new one gets cheaper. And new developers
may be insulated from these issues if the solution to (2) is simple
enough.

One thing that's nice about the way we have things set up now is that
the release just mirrors the GitHub structure and the libs mirror their
distributions, so it's very conceptually simple. I value conceptual
simplicity very highly in software, but there are tradeoffs.

I'm also assuming their licenses are OK with less-than-complete
redistributions.

- Bob

Ben Goodrich

Aug 5, 2013, 2:25:15 PM
to stan...@googlegroups.com
On Monday, August 5, 2013 11:20:13 AM UTC-4, Bob Carpenter wrote:
> I made a similar comment about cutting down the Eigen contributed
> code and I seem to recall the consensus was that I should leave it
> intact.

Eigen is more challenging because there is no tool like bcp to sort out the dependencies. Also, Eigen is smaller so there is less to gain. But it is still probably a good idea in principle.

> I recall the following concerns about trimming Boost and Eigen:
>
> 1.  Users might try to use it as their Boost or Eigen distribution
> and be disappointed that it's incomplete.

Such people probably download Boost and Eigen from upstream or easily could.

> 2.  We might try to use it as our Boost or Eigen distribution during
> development and not have everything we need when we want to use a new
> feature.

Then we just have to add some headers back. Both Boost and Eigen have git repos.

> 3.  Something might go wrong in the include tool, especially w.r.t.
> templates.

Just need to run the tests before merging the pull request that deletes stuff.

> 4.  Adding another tool adds complexity to our build process.

I think we need to cut from the repo, and at that point the build process is the same.

> One thing that's nice about the way we have things set up now is that
> the release just mirrors the GitHub structure and the libs mirror their
> distributions, so it's very conceptually simple.  I value conceptual
> simplicity very highly in software, but there are tradeoffs.

The directory would have the same structure, just fewer subdirectories and possibly fewer files within the subdirectories.

> I'm also assuming their licenses are OK with less-than-complete
> redistributions.

Yes.

Just do

goodrich@CYBERPOWERPC:/opt/stan$ mkdir /tmp/boost_1.54.0
goodrich@CYBERPOWERPC:/opt/stan$ export STAN_HOME=/opt/stan
goodrich@CYBERPOWERPC:/opt/stan$ find ${STAN_HOME}/src -name \*\.\[ch]pp -exec bcp --scan --boost=${STAN_HOME}/lib/boost_1.54.0 '{}' /tmp/boost_1.54.0/ \; &> /tmp/boost_1.54.0/bcp.log
goodrich@CYBERPOWERPC:/opt/stan$ make CC=clang++ BOOST=/tmp/boost_1.54.0 -j9 test-headers && echo "boost working"

If that works, it should be safe to then do

git checkout develop
git checkout -b feature/less_boost
git rm -rf lib/boost_1.54.0
mv /tmp/boost_1.54.0 lib
rm lib/boost_1.54.0/bcp.log
git add lib
git commit -m "trim boost"
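For reference, the two steps above could be folded into one script so that the `git rm` can only run after `test-headers` has passed. This is a minimal sketch, not part of the Stan build: the function name `trim_boost` is hypothetical, and it assumes bcp is on the PATH, that it is called with the top of a Stan checkout, and that the bundled Boost lives in lib/boost_1.54.0 as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: trim lib/boost_1.54.0 down to the headers Stan's
# sources reach, and only swap it into the repo if test-headers passes.
# Runs in a subshell so `cd` and `set` do not leak into the caller.
trim_boost() (
  set -euo pipefail
  cd "${1:-.}"                      # top of a Stan checkout (assumption)
  tmp=$(mktemp -d)
  trimmed="$tmp/boost_1.54.0"       # same basename, so `mv` lands it as lib/boost_1.54.0
  mkdir "$trimmed"

  # Step 1: copy every Boost header reachable from Stan's .hpp/.cpp files.
  # With `-exec ... \;`, find does not report bcp's exit status, so a failed
  # scan shows up only in the log and in the make step below.
  find src -name '*.[ch]pp' \
    -exec bcp --scan --boost=lib/boost_1.54.0 '{}' "$trimmed/" \; \
    > "$tmp/bcp.log" 2>&1           # log kept outside the trimmed tree

  # Step 2: compile the header tests against the trimmed copy;
  # set -e aborts the function here if anything is missing.
  make CC=clang++ BOOST="$trimmed" -j9 test-headers

  # Step 3: replace the bundled Boost on a feature branch.
  git checkout develop
  git checkout -b feature/less_boost
  git rm -rf --quiet lib/boost_1.54.0
  mv "$trimmed" lib/
  git add lib
  git commit -m "trim boost"
)
```

Usage would be something like `trim_boost /opt/stan`. Writing bcp.log next to the trimmed tree rather than inside it removes the separate `rm` step.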

Ben

Bob Carpenter

Aug 5, 2013, 3:16:25 PM
to stan...@googlegroups.com
Is everyone OK with removing the parts of Boost and some of
the contributed mods in Eigen we're not using?

If so, we can make 2.0 a lot smaller to download for users.

More discussion inline below.

On 8/5/13 2:25 PM, Ben Goodrich wrote:
> On Monday, August 5, 2013 11:20:13 AM UTC-4, Bob Carpenter wrote:
>
> I made a similar comment about cutting down the Eigen contributed
> code and I seem to recall the consensus was that I should leave it
> intact.
>
>
> Eigen is more challenging because there is no tool like bcp to sort out the dependencies.

I hadn't realized that bcp was part of Boost and was
Boost-specific.

> Also, Eigen is smaller so
> there is less to gain. But it is still probably a good idea in principle.

I just wanted to only include the contributed libs that we
used (just fft, I think).
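One quick way to confirm which contributed Eigen modules are actually used is to grep the sources for them. A sketch only: the helper name `direct_includes` is hypothetical, and it assumes the code lives under src/ and uses angle-bracket includes. It sees direct includes only, not what those headers pull in transitively (which is the part bcp handles for Boost).

```bash
# Hypothetical helper: list the distinct headers starting with PREFIX that
# .hpp/.cpp files under DIR include directly via #include <...>.
direct_includes() {
  local prefix=$1 dir=${2:-src}
  grep -rhoE --include='*.hpp' --include='*.cpp' \
    "#include[[:space:]]*<${prefix}[^>]*>" "$dir" \
    | sed -E 's/.*<([^>]*)>.*/\1/' \
    | sort -u
}
```

For example, `direct_includes unsupported/Eigen/` lists the contributed Eigen modules, and `direct_includes boost/` gives the top-level Boost headers that a bcp run starts from.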

> I recall the following concerns about trimming Boost and Eigen:
>
> 1. Users might try to use it as their Boost or Eigen distribution
> and be disappointed that it's incomplete.
>
>
> Such people probably download Boost and Eigen from upstream or easily could.

Agreed. This seems like a non-issue to me, too.

> 2. We might try to use it as our Boost or Eigen distribution during
> development and not have everything we need when we want to use a new
> feature.
>
>
> Then we just have to add some headers back. Both Boost and Eigen have git repos.

The problem I envision is that I want to use function foo() from Boost, but
what I'm really going to need to do is go back to all of Boost and run
bcp again because foo() is likely to depend on a gazillion other things the way
Boost is put together.
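That case could be handled by pointing bcp back at a full upstream Boost tree and asking only for the new header: given a header name as its module list, bcp copies that header plus everything it depends on into the output directory. A minimal sketch, assuming a full Boost 1.54.0 is unpacked somewhere outside the repository; the wrapper name `add_boost_header` and the paths in the usage line are hypothetical.

```bash
# Hypothetical wrapper: copy one or more extra Boost headers, plus their
# dependencies, from a full upstream Boost tree into the trimmed copy.
add_boost_header() {
  local full_boost=$1 trimmed=$2
  shift 2
  # bcp's argument order: options, then the module list, then the output path.
  bcp --boost="$full_boost" "$@" "$trimmed"
}
```

For example, `add_boost_header ~/src/boost_1_54_0 lib/boost_1.54.0 boost/math/special_functions/gamma.hpp`, followed by the same `test-headers` run as before trimming.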

We should also be better about using the lowest-level includes possible.
We're definitely over-including our own Stan code in a lot of places
by grabbing all of matrix.hpp, for example, instead of just the functions
we need.
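Finding the places that pull in a whole umbrella header is a recursive grep. Sketch only: `who_includes` is a hypothetical helper, and the usage line assumes the umbrella header is included as `<stan/math/matrix.hpp>` with a single space after `#include`; quoted includes would need a second pattern.

```bash
# Hypothetical helper: print, sorted, every .hpp/.cpp file under DIR that
# includes HEADER directly with an angle-bracket #include.
who_includes() {
  local header=$1 dir=${2:-src}
  grep -rlF --include='*.hpp' --include='*.cpp' "#include <$header>" "$dir" | sort
}
```

For example, `who_includes stan/math/matrix.hpp src | wc -l` counts how many files would need their includes narrowed.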

> 3. Something might go wrong in the include tool, especially w.r.t.
> templates.
>
>
> Just need to run the tests before merging the pull request that deletes stuff.

In that case, the tests need to be complete and call all of
the possible run-time template instantiations. Or is that somehow
guaranteed by bcp when it looks at all our headers?

We also have some .cpp code that should be included.

> 4. Adding another tool adds complexity to our build process.
>
>
> I think we need to cut from the repo, and at that point the build process is the same.

Right --- there's no user overhead.

But there is overhead for developers to update Boost versions and to
add new functions from Boost, both of which look totally manageable.

> One thing that's nice about the way we have things set up now is that
> the release just mirrors the GitHub structure and the libs mirror their
> distributions, so it's very conceptually simple. I value conceptual
> simplicity very highly in software, but there are tradeoffs.
>
>
> The directory would have the same structure, just fewer subdirectories and possibly fewer files within the subdirectories.
>
> I'm also assuming their licenses are OK with less-than-complete
> redistributions.
>
>
> Yes.
>
> Just do
>
> goodrich@CYBERPOWERPC:/opt/stan$ mkdir /tmp/boost_1.54.0
> goodrich@CYBERPOWERPC:/opt/stan$ export STAN_HOME=/opt/stan
> goodrich@CYBERPOWERPC:/opt/stan$ find ${STAN_HOME}/src -name \*\.\[ch]pp -exec bcp --scan --boost=${STAN_HOME}/lib/boost_1.54.0 '{}' /tmp/boost_1.54.0/ \; &> /tmp/boost_1.54.0/bcp.log
> goodrich@CYBERPOWERPC:/opt/stan$ make CC=clang++ BOOST=/tmp/boost_1.54.0 -j9 test-headers && echo "boost working"
>
> If that works, it should be safe to then do
>
> git checkout develop
> git checkout -b feature/less_boost
> git rm -rf lib/boost_1.54.0
> mv /tmp/boost_1.54.0 lib
> rm lib/boost_1.54.0/bcp.log
> git add lib
> git commit -m "trim boost"

Thanks.

- Bob