editing model_code strings in emacs with stan-mode


Tamas Papp

Aug 24, 2014, 8:35:30 AM
to stan-...@googlegroups.com
Hi,

When building up models gradually with RStan, I found it is easier to
manage my code if, instead of having a .stan file for each model, I
provide it as a string, as in

model <- stan_model(model_code =
"
data {
...
}
...
")

However, this implies that I have to edit Stan code as text, and miss
the benefits of Jeffrey Arnold's excellent stan-mode.

I thought of the following functionality that would help me out: with
the cursor on a string, I call a function (eg stan-mode-edit-string)
that reads the string, opens it in another, temporary buffer using
stan-mode, lets me edit it, and then, when done, replaces the original
with the new version, similar to org-mode's C-c ' for babel source
blocks.
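
Roughly, what I have in mind (completely untested, and every name here
is made up):

(defun stan-mode-edit-string ()
  "Edit the string at point in a temporary stan-mode buffer."
  (interactive)
  (let ((state (syntax-ppss)))
    (unless (nth 3 state)
      (user-error "Point is not inside a string"))
    (let* ((beg (nth 8 state))            ; start of the string literal
           (end (save-excursion
                  (goto-char beg)
                  (forward-sexp)          ; skip over the whole literal
                  (point)))
           (text (buffer-substring-no-properties (1+ beg) (1- end)))
           (source (current-buffer)))
      (with-current-buffer (get-buffer-create "*stan-edit-string*")
        (erase-buffer)
        (insert text)
        (stan-mode)
        ;; remember where to write the result back (ignores quote escaping)
        (setq-local stan-edit--target (list source beg end))
        (local-set-key (kbd "C-c '") #'stan-mode-finish-string)
        (pop-to-buffer (current-buffer))))))

(defun stan-mode-finish-string ()
  "Replace the original string contents with the edited version."
  (interactive)
  (pcase-let ((`(,source ,beg ,end) stan-edit--target)
              (text (buffer-string)))
    (with-current-buffer source
      (save-excursion
        (delete-region (1+ beg) (1- end))
        (goto-char (1+ beg))
        (insert text)))
    (quit-window)))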

Before starting to hack this together (it turns out to be a bit complicated
with my meagre knowledge of Emacs Lisp), I thought I would ask if anyone
has any EL code for this or similar functionality.

Best,

Tamas

Lionel Henry

Aug 24, 2014, 10:43:05 AM
to stan-...@googlegroups.com
Hello,

I think this should not be very hard to implement with polymode.

https://github.com/vitoshka/polymode

Jeffrey Arnold

Aug 24, 2014, 12:49:40 PM
to stan-...@googlegroups.com
I don't have any code that does this. Echoing the previous reply, you could check out polymode (https://github.com/vitoshka/polymode). However, this isn't the typical use case for polymode: you are editing a string in one language that happens to contain code in another language, rather than a file with chunks in different languages. To define a polymode for R and Stan, you would probably need to define an opening and closing pattern, something like "// stan-start" and "// stan-end", that you would put in the string before and after the Stan model in order to identify the area to edit.
So your character vectors with Stan models would look something like this:

stancode <- "
// stan-start
data {
...
}
model {
...
}
// stan-end
"
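
For what it's worth, with the macros in polymode's documentation the
definition might look roughly like this (untested, and the mode names
are placeholders; I haven't run any of it):

(require 'polymode)

(define-hostmode poly-r-hostmode
  :mode 'ess-r-mode)                    ; the host buffer is ordinary R

(define-innermode poly-stan-innermode
  :mode 'stan-mode                      ; text between the markers gets stan-mode
  :head-matcher "^// stan-start$"
  :tail-matcher "^// stan-end$"
  :head-mode 'host
  :tail-mode 'host)

(define-polymode poly-r+stan-mode
  :hostmode 'poly-r-hostmode
  :innermodes '(poly-stan-innermode))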

However, I think writing the model inside a string is a bad idea, unless it is absolutely necessary that everything is in one file or you are programmatically generating the model code. 




Bob Carpenter

Aug 24, 2014, 1:19:59 PM
to stan-...@googlegroups.com
Could you say more about why you found it easier to provide models
as strings in R? We're trying to understand how our users interact
with R and lots of people seem to like working this way. We had
assumed it was because it was in our getting-started instructions,
but it sounds like you've tried it both ways and prefer the string
approach.

I personally find it easier to keep the models separate so
that I can use them across our interfaces and easily test them straight
from C++.

- Bob

Tamas Papp

Aug 25, 2014, 3:57:47 AM
to stan-...@googlegroups.com
Hi Bob,

There are two reasons: one is general, the other is specific to Stan.

The general reason is that I find it easier to keep things in one file
for reproducible research. I interweave text, explanations, graphs, and
of course code. This is not what the final version of the paper will
look like, of course, but it is great for keeping track of what I am
doing and also for communicating with coauthors. Org-mode is great for
this, and there are various tutorials, eg http://irreal.org/blog/?p=2686

The Stan-specific reason is that I try to follow the advice of the BDA
and Gelman & Hill books and build up models gradually, starting from a
simple one, and checking it along the way. Interweaving R code (data
preparation, understanding results, model checking) with Stan code (the
actual model) allows me to read the file like a (mostly) linear
narrative of what I am doing: I estimated this model, it worked OK along
some dimensions, but these graphs show that it needs improvements along
others, so I implement those in R (if I need to transform the data etc) and
Stan (if I need to change the model), then repeat until I am
satisfied. All versions of the models and the results end up in the
file, so I can always look back at how I got there (I am also using git
for tracking changes, but that's orthogonal to this).

That said, I have been using the single-file approach for only a short
while, so I am still experimenting with it. I don't think it would be
possible to do this without Emacs or an equivalent editor, though. I use
outline minor mode to fold parts of the buffer, org-mode to combine
text, code, graphs, and tables, and Stan code within R strings for the models.

Best,

Tamas

Tamas Papp

Aug 25, 2014, 4:45:13 AM
to stan-...@googlegroups.com
Hi,

Thanks for the suggestion! I tried this and ran into an issue with the
terminating quote character: https://github.com/vitoshka/polymode/issues/37

If you are familiar with polymode, can you recommend a solution?

Best,

Tamas

Tamas Papp

Aug 25, 2014, 4:46:49 AM
to stan-...@googlegroups.com
Hi Jeffrey,

Can you please elaborate on why you consider this a bad idea? I find
that it works really well for my purposes, but maybe I am missing
something important that will be an issue later.

Best,

Tamas

Luc Coffeng

Aug 25, 2014, 9:58:21 AM
to stan-...@googlegroups.com
Hi Bob,

Like Tamas, I like keeping everything in one file because it shows the narrative. Also, I'm kind of paranoid about files getting separated, so I like to have everything in one file. I can see why you as a developer want to be able to rerun code on different platforms. However, I think for most users this is not really something they will do, as most stick to one platform (or as I do, run their R code in Windows or Mac).

Back to the narrative argument though. I think for me, the "narrative"-thinking is kind of a habit. I haven't had any formal coding classes, so I've never been taught to organize code in separate files. Now, I can see the benefits of doing so (e.g. I recently became aware of version control software etc). However, I've found version control to be a pain to integrate into my daily routine.

I'm not sure whether it's within the scope of the Stan Development Team to provide a general guide on coding practices (although I'd love to see such a guide), unless they believe that Stan has specific features that require special attention from a user perspective. You might like to launch a poll to see what most people do, and why.

Best,

Luc

Jeffrey Arnold

Aug 25, 2014, 10:07:09 AM
to stan-...@googlegroups.com
I agree with most of Bob's points. Putting the model in a string just makes most things more difficult, and I think it creates less rather than more readable code.

For editor and syntax support, it is more difficult to handle multiple languages. Org-mode and knitr do not handle arbitrary code in strings; they have a special structure that places different units of code in chunks, with metadata about how to handle each chunk. When you place code in a string, you also give up syntax highlighting for that code; you could write something that highlights the string differently than a normal string, but then you have logically inconsistent highlighting.

If you are using a version control system, it is easier to find commits where you changed the model. It is easier to tell that foo.stan was changed than to work out that a change at line 3535 of foo.org corresponds to a change in the Stan model. Very good comment discipline could solve this, but that's taking something for which an automated solution is available and making it manual again.

Reproducible research doesn't require that everything be in a single file; it is about having all the code necessary to reproduce an analysis available. Already, data isn't included inline in the root document, but is kept in separate files.

The way I go about it with knitr is to have the Stan code in a separate file, and then have a chunk that just prints out the Stan model verbatim. Code that uses that file occurs elsewhere.
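
For example, in a .Rnw document, something like this (a sketch; the file name is made up):

<<show-model, echo=FALSE, comment=NA>>=
cat(readLines("mymodel.stan"), sep = "\n")
@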



Jeffrey Arnold

Aug 25, 2014, 11:07:53 AM
to stan-...@googlegroups.com
I've never used polymode before. I think what is happening is that after the Stan section, when Emacs highlights the R it is unaware of the original opening ". I played around with it for a bit but I wasn't able to get anything working.


Bob Carpenter

Aug 25, 2014, 1:19:21 PM
to stan-...@googlegroups.com

On Aug 25, 2014, at 9:58 AM, Luc Coffeng <lucco...@gmail.com> wrote:

> Hi Bob,
>
> Like Tamas, I like keeping everything in one file because it shows the narrative. Also, I'm kind of paranoid about files getting separated, so I like to have everything in one file. I can see why you as a developer want to be able to rerun code on different platforms. However, I think for most users this is not really something they will do, as most stick to one platform (or as I do, run their R code in Windows or Mac).

But even as a user, I work with Michael and Andrew on models and Michael
only uses CmdStan and Andrew only RStan.

It may not be an issue if our users all clump with collaborators on the
same platform.

> Back to the narrative argument though. I think for me, the "narrative"-thinking is kind of a habit.

Interesting. This is why I find stats papers so hard to read. They
want to tell a story rather than give definitions.

I don't think you need a narrative of how you got from where you started
to where you're at (except perhaps as a guide for the next person so they
don't make the same mistakes).

What I like is a single script that can run everything from the shell.
There's your narrative. And when I say "everything", I mean everything
from the raw data munging to the final output graph generation.

One problem doing this with a tool that runs the code as it builds the
doc is that the code may take a long time to run. It may need to be shipped
out to clusters to run experiments. So trying to shoehorn it into my LaTeX
pipeline just isn't going to work.

I do keep hearing good things about IPython notebooks. One of them being
that you can just open one and start typing and it records everything you
did to get from point A to point B. Myself, I'd rather keep refactoring until
I have a reproducible path from A to B that isn't just a record of everything
I typed into a shell or interpreter.

> I haven't had any formal coding classes, so I've never been taught to organize code in separate files. Now, I can see the benefits of doing so (e.g. I recently became aware of version control software etc). However, I've found version control to be a pain to integrate into my daily routine.

I don't think anyone learns this kind of thing in classes. You
learn software engineering more from working with others on bigger
projects. I think it's kind of like applied stats that way. Homework
is a very controlled environment compared to applied stats work (which
can span years and involve lots of feedback) and the same goes for
software engineering. The hardest thing to do in my own experience is
learn how to read others' code --- you often have to reverse engineer their
thinking and then come to appreciate doing just about everything the standard
way so there's less guesswork.

If you're finding version control to be a pain, that's just because
it's one of those "bicycle skills" to quote John Cook:

http://www.johndcook.com/blog/2012/08/01/bicycle-skills/

Once you get used to it, there's really no alternative. If you use a remote repo,
you also get backups and can easily collaborate with others. It also helps organize
your work so you don't have all the "last-weeks-version", "first-version",
"current", "no-really-current" directories that the lack of version control
tends to engender.

Using something like SVN (or Git used like SVN) is pretty easy. I found the jump up
to the Git branching and development model painful, but again now feel that I can't
live without it on a project our size.

The "right" answer is very much related to what your goals are and how
many people you're working with.

> I'm not sure whether it's still within the scope of the Stan Development Team to provide a general guide on coding practices (although I'd love to see such a guide), unless they believe that Stan has specific features that require special attention, from a user-perspective. Maybe you might like to launch a poll to see what most people do, and why.

We'll do whatever we can to be helpful. I don't think Stan has any
features that differentiate it from BUGS or JAGS in terms of workflow.

- Bob

Bob Carpenter

Aug 25, 2014, 1:22:00 PM
to stan-...@googlegroups.com
+1 to file-level commit granularity, and to reproducibility only needing
all the code, not requiring a single file. Very rarely can I put
the data in a single file, because I usually get it from some external
source in some form that requires munging to a usable format.

- Bob

Avraham Adler

Aug 25, 2014, 3:25:44 PM
to stan-...@googlegroups.com
I have a couple of papers in peer review, and I use the multiple-file method and a git repository. As long as I can call the chunks I need, it works fine. I actually keep all my Stan files as separate files. When showing the code, I ended up using LaTeX's "\lstinputlisting" instead of knitr itself; I forget why :) Either way, I have completely reproducible output and, as mentioned above, my commits are simple to follow since the .Rnw is separate from the .R (although I ended up with R code in the .Rnw too), which is separate from the .stan file. If they were all in one massive document, I would find it much more difficult to debug. FWIW, I'm using RStudio under Win7 (although sometimes I attack the .stan directly in Notepad++).
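
From memory, the usage is something like this (treat it as a sketch; the file name is made up):

% in the preamble
\usepackage{listings}

% where the model should appear
\lstinputlisting[caption={Stan model}]{mymodel.stan}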

Avi

Ross Boylan

Aug 25, 2014, 3:51:57 PM
to stan-...@googlegroups.com
On Mon, Aug 25, 2014 at 01:19:18PM -0400, Bob Carpenter wrote:
>
...
> What I like is a single script that can run everything from the shell.
> There's your narrative. And when I say "everything", I mean everything
> from the raw data munging to the final output graph generation.

For my direct use of Stan I had separate files for each analysis, but
in the post-Stan analysis I had a bunch of very similar analyses, and
the only sense in which they are all together is that they are in the
version control system. Most of the code that did the work is in the
final file, but some of the superstructure I tweaked for each
analysis. I did so partly because I found keeping around
multiple versions of a function with fairly trivial differences
distasteful, and partly because some things I tweaked, such as the
host list for nodes, reflected ephemeral features of the environment.

In retrospect it probably would have been better to come closer to the
single, reproducible script (interpreting script as a set of files),
but you were wondering how people were doing things, and that's what I
did.

I used the ESS package for emacs to preserve transcripts of my R
sessions for both (r)stan and later analysis. I found it convenient
to have the stan model as a string in the R file, in part because that
made it easier to be sure the data I passed in to stan matched the
form the stan model was expecting.

Ross Boylan

Bob Carpenter

Aug 25, 2014, 4:22:32 PM
to stan-...@googlegroups.com
It's these little tweaks here and there that are most worrisome to me
as I do applied modeling. They don't seem worth abstracting into proper
solutions, yet cutting-and-pasting big scripts to change little bits of
them is very error prone from a consistency point of view. So what I do
is try to pull out common functions that can be saved and try to minimize
the differences. But it's always a struggle and I usually feel myself
losing a grip on all the pieces after a half dozen or so variants.

I really don't know what the right solution to this is, one that appropriately
values programmer time. It's easy to be prescriptive without
considering programmer time and say everything should be production-ready
industrial-strength software, but that makes no sense from a practical point of
view in terms of getting more work done at an acceptably high quality level.

I have even less of an idea about what to do for issues like specifying
compute nodes and other ephemeral (love that word) properties that are
very user specific. You could go to abstract specs of compute power, but
that's huge overkill for what an individual needs, though without it, you
leave scripts in a state where they can't be directly run by others without
modifying the internals.

- Bob

Ross Boylan

Aug 25, 2014, 4:56:18 PM
to stan-...@googlegroups.com
On Mon, Aug 25, 2014 at 04:22:28PM -0400, Bob Carpenter wrote:
> It's these little tweaks here and there that are most worrisome to me
> as I do applied modeling. They don't seem worth abstracting into proper
> solutions, yet cutting-and-pasting big scripts to change little bits of
> them is very error prone from a consistency point of view. So what I do
> is try to pull out common functions that can be saved and try to minimize
> the differences. But it's always a struggle and I usually feel myself
> losing a grip on all the pieces after a half dozen or so variants.
>
> I really don't know what the right solution to this is that appropriately
> measures the value of programmer time. It's easy to be prescriptive without
> considering programmer time and say everything should be production-ready
> industrial-strength software, but that makes no sense from a practical point of
> view in terms of getting more work done at an acceptably high quality level.

That puts the problem very well. Some strategies I used:

* preserving the transcripts so they show what state things were in,
including pesky often-tweaked functions. Limitation: it's pretty easy
for the tweaked code not to appear in the transcript since it was sourced
from elsewhere.

One odd limitation I ran into is that there seems to be something
about running distributed R under emacs/ESS that makes emacs lock up
unrecoverably after the computation completes. This makes all the
unsaved transcripts disappear. I never tracked down the cause, which
I suspect was something idiosyncratic to my system setup.

* always commit, with a good comment, immediately after what may be a
final production run. Limitation: wading through commit logs is tedious,
particularly since there is only one time the run turns out to be
final, but often many times that were thought to be final :)

I also limited the tweaking to stuff related to the distributed
computations, which is probably not going to be that reproducible
anyway. More on that below.

Using branches and/or tags in the version control system would
probably have some value as well; I didn't use them. One issue with
this is that it's not always apparent what will end up going in the
final paper.

>
> I have even less of an idea about what to do for issues like specifying
> compute nodes and other ephemeral (love that word) properties that are
> very user specific. You could go to abstract specs of compute power, but
> that's a huge overkill for what an individual needs, though without it, you
> leave scripts in a state that they can't be directly run by others without
> modifying the internals.

I think the reasonableness of reproducibility varies with the
activity. Usually the goal is just to get the analysis working for
the researcher. Even at that level there are issues preserving the
exact software configuration. Most relevantly, I was tracking r/stan
development and don't have complete records of which commits various
analyses were based on. But it could be the results also depend on
the exact version of some library with no obvious connection to the
work, or perhaps to the compiler version used.

Then there's the "reproducible for someone else" level. My scripts for
Linux aren't necessarily going to be too helpful for someone on
Windows. And finally there's distributed computations; it's very
unlikely someone else will have the same distributed infrastructure as
I do. Particularly at that level, I think the only thing reasonable is
to regard what the researcher has done as a bunch of hints, not
something someone will be able to pull down and run without
modification and thought.

There are also often issues with making the underlying data
available. Even data from publicly available data sources is often
cleaned and modified in various ways, not always in a very
reproducible way.

Ross

Jeffrey Arnold

Aug 26, 2014, 1:08:39 AM
to stan-...@googlegroups.com
> I think the reasonableness of reproducibility varies with the
> activity. Usually the goal is just to get the analysis working for
> the researcher. Even at that level there are issues preserving the
> exact software configuration. Most relevantly, I was tracking r/stan
> development and don't have complete records of which commits various
> analyses were based on. But it could be the results also depend on
> the exact version of some library with no obvious connection to the
> work, or perhaps to the compiler version used.

That most projects don't include local versions of the packages used in the analysis is something that hopefully will start to change. Have you seen http://rstudio.github.io/packrat/ ? It creates a project-specific library, so that the environment in which R analyses are run is portable. It's like an R version of virtualenv (Python). And since RStudio develops it, it already has built-in RStudio support.
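
The basic usage is roughly this (a sketch; the path is made up):

install.packages("packrat")
packrat::init("~/projects/my-analysis")  # create a project-private library
# ... install and use packages as usual ...
packrat::snapshot()                      # record exact versions in packrat.lock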

Tamas Papp

Aug 26, 2014, 4:41:16 AM
to stan-...@googlegroups.com
Hi,

This thread started as a "how to do X" question, then Bob asked "why do
you do X", and now we are having an "X vs Y / best practice"
conversation, which is interesting but I don't yet have a strong opinion
on X myself: I am at the stage of experimenting with a single-file
approach to see how it works out.

My main gripe with separate R scripts and Stan files was that I had a
hard time (mentally) keeping track of which belongs to which. For
some projects I did 10-20 models when exploring the data, and even
though Emacs has facilities for keeping views together (the workgroups
package), that was a bit too much overhead, and I ended up with 2x the number
of files.

Various languages/communities have different styles when it comes to
organizing code into files, eg Matlab users frequently put each function
into its own file, R users put quite a few functions into a single file
but keep documentation separate, while entire Common Lisp libraries
have been written in a single file. At this point I don't know whether I
will continue to write Stan models as strings in R code, but I would
like to experiment with it for a while so it would be great if the
possibility remained open.

Jeffrey's point about editor support is a valid one, but orthogonal to
the issue: if it would really be more convenient to keep R and Stan code
together (I don't know), then IMO the editor should strive to support
it, instead of the programmer having to choose a solution that is not
preferred.

Thanks for all the help, I will try to follow up with polymode if I get
a working solution.

Best,

Tamas

Ross Boylan

Aug 26, 2014, 1:50:19 PM
to stan-...@googlegroups.com
[changed subject as we seem to have expanded considerably on the original topic]

On Mon, Aug 25, 2014 at 10:08:38PM -0700, Jeffrey Arnold wrote:
> > [Ross wrote]
I hadn't heard of it; it looks interesting, although it sounds as if
each project would have its own copy of each library it needed, which
would be tough to keep updated. Real R packages (as opposed
to "code I used for my analysis") have dependency info already,
although that is insufficient to identify the exact versions used for
a particular analysis.

packrat is an R-only solution; once you start including other tools,
programs and libraries, things get messier. At least on Debian one
could get a list of all Debian packages installed, including their
version, and the versions of all relevant non-Debian-packaged
software. In the vanilla case just specifying the overall release,
e.g., Debian wheezy, is pretty good, although there are ongoing
security updates within a release. I think it's common to have various
bits and pieces that aren't from the release (I do), and lots of
people run on Debian "testing", which is much more of a moving target.

I think there are also tools that identify what files are accessed while a
program runs.

Another off-the-wall approach would be to preserve the run-time
environment in a virtual machine.

I've been discussing this as if the goal of reproduction is to get
bit-identical results in an environment in which any change, including
minor point releases of software, could make a difference. Three
comments on that, in order of increasing optimism:

1) Even basic arithmetic may not be bit identical on different
platforms or with different compilers. At the limit one would need
the exact stepping of the processors used. The reality of computing
is that almost anything could make a difference.

2) Usually things are not that sensitive, and if we go by analogy to
experimental science in the lab or real world not everything about the
experiment is recorded, because that would be impossible. While this
provides some perspective and rhetorical defense for looseness with
software and statistics, it would only be a great argument for
relaxing if lab and real world experiments were in fact reproducible.
It seems quite common that they aren't.

3) One could take the attitude that research should provide enough
info to perform the same "experiment" and that, at the point at which
an important difference emerges in the "reproduction" one should get
down to the particulars of what causes that. Attempting to record
every possibly relevant particular in advance is unrealistic.

Ross Boylan

Bob Carpenter

Aug 26, 2014, 4:45:16 PM
to stan-...@googlegroups.com
Excellent points about floating point. Floating point
irreproducibility is a huge unknown unknown for most users
(and a huge known unknown for most of us, and a known known
for a handful of real experts).

I'd be happy if people got closer to stage (3) --- keep the
"experiment" the same. If the answers are that different, I
agree it should be a red flag.

That's all I've ever aimed for. Though I do keep versions
of software used around, I'm not keeping versions of compilers
and what-not used for projects. Even different optimization
levels can behave differently in the same compiler.

- Bob

Michael Betancourt

Aug 26, 2014, 4:52:16 PM
to stan-...@googlegroups.com
To be fair, if results are sensitive to floating point errors propagating
then the ultimate application of computer arithmetic comes into
question. While we don’t want to abuse floating point (saving
intermediate states with 3 digits of precision) we also shouldn’t
be responsible for guarding against it.

Bob Carpenter

Aug 26, 2014, 6:07:51 PM
to stan-...@googlegroups.com

On Aug 26, 2014, at 4:26 AM, Tamas Papp <tkp...@gmail.com> wrote:

> Hi,
>
> This thread started as a "how to do X" question, then Bob asked "why do
> you do X", and now we are having an "X vs Y / best practice"
> conversation, which is interesting but I don't yet have a strong opinion
> on X myself: I am at the stage of experimenting with a single-file
> approach to see how it works out.
>
> My main gripe with separate R scripts and Stan files was that I had a
> hard time (mentally) keeping track of which belongs to which. For
> some projects I did 10-20 models when exploring the data, and even
> though Emacs has facilities for keeping views together (the workgroups
> package), that was a bit too much overhead, and I ended up with 2x the number
> of files.

Yup --- that's a big problem. I tend to copy whole directories, and
keep things organized that way. But it's not ideal. Eventually I start
to try to break shared functionality down into reusable components, but
that's a separate file/path-management headache.

The other thing I tend to do is not create 20 models in parallel, but
explore them sequentially, checking into version control early and often,
so that I can recover old versions without having them cluttering up my
workspace.

> Various languages/communities have different styles when it comes to
> organizing code into files, eg Matlab users frequently put each function
> into its own file, R users put quite a few functions into a single file
> but keep documentation separate, while entire Common Lisp libraries
> have been written in a single file. At this point I don't know whether I
> will continue to write Stan models as strings in R code, but I would
> like to experiment with it for a while so it would be great if the
> possibility remained open.
>
> Jeffrey's point about editor support is a valid one, but orthogonal to
> the issue: if it would really be more convenient to keep R and Stan code
> together (I don't know), then IMO the editor should strive to support
> it, instead of the programmer having to choose a solution that is not
> preferred.

Agreed. We don't want to be dictatorial on practices. Reasonable people
can differ on preferred mode of operation.

Our constraint is only time. And we tend to prioritize things we
do and think are good practice. (Not to mention things that are
documentable, testable, and maintainable, which are orthogonal concerns.)

- Bob

Ross Boylan

Aug 26, 2014, 6:17:53 PM
to stan-...@googlegroups.com
On Tue, Aug 26, 2014 at 09:52:10PM +0100, Michael Betancourt wrote:
> To be fair, if results are sensitive to floating point errors propagating
> then the ultimate application of computer arithmetic comes into
> question. While we don't want to abuse floating point (saving
> intermediate states with 3 digits of precision) we also shouldn't
> be responsible for guarding against it.
>
In some ways, excessive reproducibility is undesirable. If the
results really did depend on a particular software version or compiler
flag, it's more useful to discover that than to miss it because you or
someone else has exactly redone what was done before. If they depend
on a particular software package one might be suspicious too,
e.g. stan vs bugs, or stata vs SAS vs R.

If the results differ substantively, though, one probably wants to
find out why, and you can only do that if you can reproduce the
process that led to the earlier answer. So I guess my proposed attitude
in option 3--wait until something is amiss and then debug it--really
only defers the problem until a discrepancy is uncovered. At that
point, you kind of do need a time machine.

Ross

P.S. Just in case it needs to be said, it is good for software to
produce the correct answer rather than to reproduce an incorrect answer
produced by another piece of software or an earlier version of the same
software. But should there be differences, an explanation is required.

Emmanuel Charpentier

Aug 29, 2014, 4:42:48 PM
to stan-...@googlegroups.com
A couple of points:

"Code within report" (i.e. Stan code strings in R in our case) has one major point in its favor: it is the laziest (i.e. best, in my lazy and arrogant opinion) way to keep code and documentation in sync. Anything that can encourage good practices at zero or very low cost must be considered a big plus (at least as a default prior :).

The same remark can also be made about data. Putting (not too big) datasets in the reports seriously helps to ensure consistency. I do put my data in the .Rnw source of the analysis of small datasets (frequent in biomedical domains). Having one file for everything allows for an easy answer to nosy reviewers' questions...

Knitr has a number of very helpful properties that allow one to use this "one file" approach with lengthy computations: its "cache" feature gives it a "make-like" behaviour that ensures consistency of the results. With a very small effort, it can also ensure consistency with external files (e.g. data too big or too complex to fit reasonably within one .Rnw file).

It can also allow easy reuse of text and chunks in various works on the same data. This way, you can keep parts common to, say, a technical report, a draft paper and a slide presentation in the same file, and reuse them at will. Again, consistency guaranteed.

The same result could be achieved with "clever" use of make and related tools. At a larger price...

Revision control systems: yes, yes, yes, this is a must-have. But, again, for most applied statisticians, this is only a tool; if it requires too much effort, it will not be used. Early experiences with RCS and CVS made me avoid them like the plague (huge initial investment in infrastructure understanding). A later try with Mercurial left me with a milder version of the same distrust. The no-nonsense, start-from-the-bottom approach of git converted me in about a month. With a not-too-steep learning curve and *ZERO* infrastructure investment, git is probably THE way to get started. (I just regret having needed almost 8 years to realize that...).

Of course, these tools "cross-fertilize": git allows better control of what does or does not go into which files, allows for easier refactoring of a project that gains (or loses) importance, etc... The data-code-comment consistency allowed by knitr alleviates the need for branching due to minor errors/thinkos, etc... And all these tools allow for a lazier approach to work, which is of paramount importance :-) .

In this context, the availability of "Stan code in R strings" allows, at least for small projects, easier consistency. Therefore, it should be encouraged. If the price to pay for emacs to support it is the addition of a syntax for these "subchunks", so be it...

HTH,

--
Emmanuel Charpentier

Bill Harris

Aug 21, 2015, 10:24:49 PM
to Stan users mailing list
I'm late to this thread, but I've gone back and forth on the one-file method. I'm an org-mode user, and I've just started putting the Stan code in example blocks, not in strings, and that seems the best approach I've seen so far for me. If I name the block, then I can pull it into the R script in another block, and I can export the example block (code) as part of any export. I agree that editing an R string was awkward.

With a bit of time, I could add Stan to org-babel-languages or whatever it's called and then make automatic use of stan-mode.  That would make a nice addition to org-mode, too.

Lacking that, I should be able to change to stan-mode when I'm working on the file.

Bill

PS: BTW, has anyone thought of processing the manual into a texinfo file, too?  Sometimes it would be nice to check the help with C-h i, and I don't recall any images except that of Stan Ulam.

Bob Carpenter

Aug 21, 2015, 10:28:28 PM
to stan-...@googlegroups.com
There are a few more images than that. No, the manual's not in texinfo
format, and I don't even have time to figure it out. The reason we like
LaTeX is that it's full featured for math, has macros/commands, and we
all know it. If someone can find a way to translate it, that'd be great.
I wasn't really following all the issues that came up trying to do it before.
Ben probably remembers.

- Bob

Jeffrey Arnold

Aug 22, 2015, 12:30:16 AM
to Stan users mailing list
I've tended to still use the separate Stan file method, but a one-file method makes sense in org-mode or knitr, where you can define separate blocks for different languages. I did something similar by adding a stan engine to knitr: the Stan code is put in a separate block, and then when the document compiles, the Stan code is compiled and saved to an object that can be used by subsequent code.
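
In an R Markdown document that looks roughly like this (a sketch; the model is a toy, and output.var names the object the compiled model is stored in):

```{stan output.var="coin_model"}
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  y ~ bernoulli(theta);
}
```

```{r}
fit <- rstan::sampling(coin_model, data = list(N = 10, y = rep(0:1, 5)))
```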


Bob Carpenter

Aug 22, 2015, 1:06:27 AM
to stan-...@googlegroups.com
Cool --- could you share that? My knitr-fu is nearly non-existent and I
find it a huge pain to do anything non-trivial in it.

- Bob

Tamas Papp

Aug 22, 2015, 3:31:23 AM
to stan-...@googlegroups.com
Can you please post a minimal example of an org-mode file, eg on gist?

I am doing something similar, but tangle the stan code into a file and
use that.

Best,

Tamas

Bill Harris

Aug 22, 2015, 12:29:29 PM
to Stan users mailing list
Bob,

The only way I would think it makes sense to produce texinfo is if something like `pandoc manual.tex -s -o manual.texinfo` produced useful results.  I don't think I've used pandoc more than as a simple test case, so I don't know what it would do.

Does the LaTeX exist anywhere as a source file, or is it assembled from a bunch of pieces by some make file?  If it's not huge, and if it's not complicated to grab or assemble, I could try that sometime.  I'm not a git user yet; for historical reasons, I still use bazaar.

I very much appreciate the use of LaTeX, although I usually create LaTeX these days by writing in org-mode and then letting it convert.

Bill

Bill Harris

Aug 22, 2015, 1:28:49 PM
to Stan users mailing list
Tamas,

Does this help?  Any suggestions for improvement are welcome, especially if they allow one to edit the Stan code like one does other source code.

Bill
stanorgtemplate.org
stanorgtemplate.pdf

Kyle Meyer

Aug 22, 2015, 1:47:49 PM
to Bill Harris, Stan users mailing list
Hi Bill,

Bill Harris <bill_...@facilitatedsystems.com> writes:

> Any suggestions for improvement are welcome, especially if they allow
> one to edit the Stan code like one does other source code.

I wrote a minimal ob-stan.el a while back but didn't end up putting it
online anywhere. I'll clean it up a bit and send it. If you and others
find it useful, I'd be happy to contact the Org list about putting it in
the contrib directory.

--
Kyle

Bob Carpenter

Aug 22, 2015, 2:21:53 PM
to stan-...@googlegroups.com
I seem to recall people trying pandoc, but finding it's not very
robust.

Stan's manual is built using latexmk, but that's just to run the bib
and index to convergence --- you can also run pdflatex and bibtex
manually if you'd prefer. The source is all on the stan-dev/stan
GitHub repo (from which you can download a zip if you don't want to
learn Git) in directory src/docs/stan-reference, with the bibtex
file in src/docs/bibtex.

- Bob

Kyle Meyer

Aug 22, 2015, 2:33:36 PM
to Bill Harris, Stan users mailing list
Kyle Meyer <ky...@kyleam.com> writes:

> I'll clean it up a bit and send it.

I've attached ob-stan.el and an example Org file.

--
Kyle

ob-stan.el
ob-stan-example.org

Ben Goodrich

Aug 22, 2015, 2:45:49 PM
to Stan users mailing list
On Sunday, August 24, 2014 at 8:35:30 AM UTC-4, Tamas Papp wrote:
When building up models gradually with RStan, I found it is easier to
manage my code if, instead of having a .stan file for each model, I
provide it as a string, as in

model <- stan_model(model_code =
"
data {
...
}
...
")

If you use Emacs, then you might not care about my 2 cents.
But for anyone who is using RStudio, the daily builds already have function name completion for Stan functions. [A screenshot of the completion popup accompanied the original post.]

That is reason enough for me to use a separate file with a .stan extension when building a Stan program.
And the code completion for Stan syntax will probably be better in the future (more like that for R syntax).
Also, you get line numbers to go back to when there are parser errors.

Ben

Bill Harris

Aug 22, 2015, 2:53:36 PM
to Stan users mailing list
In case anyone wonders at the complexity of the headers to the file and the code blocks, I created two Easy Templates <r and <G to insert the headers for an R code block or an R code block that produces ggplot2 output.  The org-mode export dispatcher can insert header templates; I included the default and latex template, deleting the extra date line.  That's about 6 or 8 keystrokes for the start and two for each code block.

Bill 

Bill Harris

Aug 22, 2015, 3:04:12 PM
to Stan users mailing list, bill_...@facilitatedsystems.com
Kyle,

Cool.  Thanks; I'll have to try it out (after I look up where and how to install ob-stan.el!).

From skimming the code and example, it sounds like you use it with rstan by letting it write a .stan file, but then you use a :var like I did, making it seem like the Stan code is passed internally through org and not externally through a file.  I was thinking about writing my example block to a file, if it's ever needed, but yours may do that already.  This could be good.

I sense that one of Andrew's valid concerns is that it's nice to have a generic Stan model that can be used in any of the interfaces.  Both your better approach and my simple approach would seem to let you do that in org, too: just write the code and then pull it into an R or a Python or a shell code block through a :var or from a file.

Bill

Kyle Meyer

Aug 22, 2015, 4:43:28 PM
to Bill Harris, Stan users mailing list
Bill Harris <bill_...@facilitatedsystems.com> writes:

> Cool. Thanks; I'll have to try it out (after I look up where and how to
> install ob-stan.el!).

Put it in a directory in your load-path and then

(require 'ob-stan)
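
For example, in your init file (the path is a placeholder):

(add-to-list 'load-path "~/elisp/ob-stan/")
(require 'ob-stan)

;; and, if you want Babel to know about stan blocks explicitly:
(org-babel-do-load-languages
 'org-babel-load-languages
 '((R . t)
   (stan . t)))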

> From skimming the code and example, it sounds like you use it with rstan by
> letting it write a .stan file, but then you use a :var like I did, making
> it seem like the Stan code is passed internally through org and not
> externally through a file.

Yes, it is written to a file. If the :file argument ends with ".stan"
or if org-babel-stan-cmdstan-directory is not set, the block content is
written to the file. :var in the R block specifies to use the file name
that is the result of the Stan code block.

If org-babel-stan-cmdstan-directory is set and the specified file does
not end in ".stan", then the contents are written to another file (file
+ ".stan") and the model is compiled to the specified file that can be
used in a shell block.
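
Schematically (untested; file names are made up, and this assumes CmdStan lives at the given path): with

(setq org-babel-stan-cmdstan-directory "~/src/cmdstan/")

in your init, a block like

#+BEGIN_SRC stan :file binomial
data {
  int<lower=0> N;
  int<lower=0> y;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  y ~ binomial(N, theta);
}
#+END_SRC

writes binomial.stan and compiles it to the executable binomial.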

> I sense that one of Andrew's valid concerns is that it's nice to have a
> generic Stan model that can be used in any of the interfaces. Both your
> better approach and my simple approach would seem to let you do that in
> org, too: just write the code and then pull it into an R or a Python or a
> shell code block through a :var or from a file.

Yep, the result of the block is just a file name for the Stan file that
it created, so it can be pulled through any interface with Babel support
(and the Stan file can be used outside of Org). In cases where you'd
rather not have the Stan block in the Org file (e.g., if the Stan code
is really long), you could just use the name of an external Stan file
directly in the interface block and (optionally, for reference or easy
access) have a link to the Stan file.

--
Kyle

Bill Harris

Aug 22, 2015, 6:50:21 PM
to Stan users mailing list
Thanks, Kyle.  That works great.

Bill

On Sunday, August 24, 2014 at 5:35:30 AM UTC-7, Tamas Papp wrote:
Hi,

When building up models gradually with RStan, I found it is easier to
manage my code if, instead of having a .stan file for each model, I
provide it as a string, as in

model <- stan_model(model_code =
"
data {
...
}
...
")

Kyle Meyer

Sep 3, 2015, 2:45:47 AM
to Bill Harris, Stan users mailing list
Bill Harris <bill_...@facilitatedsystems.com> writes:

> Cool. Thanks; I'll have to try it out (after I look up where and how to
> install ob-stan.el!).

Just to follow up: ob-stan.el is now in the developmental branch of Org
mode and should be included in the next feature release.

http://orgmode.org/cgit.cgi/org-mode.git/commit/?id=7ab1874a93a89f5b9c509a2f15bbe4381b25c73f

http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-stan.html

--
Kyle

Tamas Papp

Sep 3, 2015, 9:41:05 AM
to stan-...@googlegroups.com
Thank you so much! I have been using it for a day now, and this has
simplified my workflow greatly.

Earlier I wrote a simple Emacs script to generate buffer-local unique
ids like x195a. This allows me to deal with many independent models in
an R file quickly without mixing up stuff or making up long names (a
sketch of the generator follows the snippet below). Now a typical
snippet for a model looks like this:

#+BEGIN_SRC R :exports both :cache yes :results silent
local({
  ... # simulate here
  save(file="x863b.RData", ...variables to save...)
})
#+END_SRC

#+NAME: model_x863b
#+BEGIN_SRC stan :file "x863b.stan" :exports code
... stan code here ...
#+END_SRC

#+BEGIN_SRC R :results none :exports none :var model=model_x863b
local({
  load("x863b.RData")
  fit_x863b <- stan(model, refresh=-1, data=list(... data I use ...),
                    init="random", verbose=FALSE)
  save(file="x863b_fit.RData", fit_x863b)
})
#+END_SRC
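
For completeness, the id generator is essentially the following (a reconstruction, not the exact code I use):

(defun my-insert-unique-id ()
  "Insert a short id like x195a that does not occur elsewhere in the buffer."
  (interactive)
  (let (id)
    (while (or (null id)
               (save-excursion
                 (goto-char (point-min))
                 (search-forward id nil t)))
      (setq id (format "x%04x" (random 65536))))
    (insert id)))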

Best,

Tamas

Bill Harris

Sep 3, 2015, 7:07:05 PM
to Stan users mailing list, bill_...@facilitatedsystems.com
Thank you very much, Kyle. I'm using the version you posted here and am looking forward to seeing it in MELPA.

Bill