First, I wanted to introduce myself, as I'm new to the group. I'm a PhD student at the University of Maryland working in pharmaceuticals (highly aligned with Sebastian's work), and since my thesis and interests revolve around visualization, application development, and deployment, I'm going to take a crack at those areas.
Since this group will likely turn into alpha/beta testers, it would be extremely helpful if people could chime in on the following:
a) current/desired dev environment (e.g., RStudio + ShinyStan vs. R REPL + Emacs vs. IPython notebook + PyStan, etc.)
b) project resource consumption, especially the upper range of RAM consumption and the range of model run times
c) use case: transient (a temporary environment destroyed after logging off) vs. a day-to-day driver (maintained results/projects/etc.)
d) if a day-to-day driver, what are the ideal sync scenarios (Dropbox vs. GitHub vs. direct upload/download, etc.)?
e) yes/no on ever wanting to self-host, and if so, the hosting platform (an internal cluster vs. something like AWS)
Thanks, and I look forward to interacting more with everyone!
Devin
On Wednesday, July 29, 2015 at 6:59:47 PM UTC-4, Bob Carpenter wrote:
> > On Jul 29, 2015, at 5:12 PM, Devin Pastoor <devin....@gmail.com> wrote:
> >
> > Hi all,
> >
> > First, I wanted to introduce myself as I'm new to the group. I'm a PhD student at the University of Maryland working in pharmaceuticals (highly aligned with Sebastian's work) and since my thesis/interests revolve around both visualization/application development and deployment I'm going to take a crack at those areas.
> >
> > Since this group will likely turn into alpha/beta testers a couple things that would be extremely helpful to me are if people could chime in on the following:
> >
> > a) current/desired dev environment (eg rstudio + shinystan vs r-repl+emacs vs ipython-notebook+pystan etc)
>
> Dev for what? I do C++ almost exclusively in emacs and the shell
> with a combination of make and Python scripting.
>
>
Dev for modeling/projects in Stan. The question is aimed at what should be available in the default environment(s). My thought is that people would have access to a few different pre-configured environments (plus shell access for customization). As you mentioned below, this would initially be for simple models/teaching purposes.
> > b) project resource consumption - especially upper range of RAM consumption and range of model run times
>
> Upper range for whom? We have people running models that take tens of
> gigabytes and run for a week. But someone like Andrew moves to approximate
> methods if something's going to take a week to run with full Bayes.
>
> Building from source can take up to 8 GB due to all the templating
> in the parser (stanc).
>
I'm looking for what a "reasonable" cloud-Stan resource allocation would be. Shipping Stan pre-compiled inside a Docker image should remove the need to allocate that kind of memory at run time; however, I want to get an idea of the range of RAM people run into during day-to-day modeling. I appreciate that this will be highly variable, but I'd like to get a feel for the distribution if possible.
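To make the idea concrete, here is a rough sketch of what I mean by "pre-compiled inside a Docker image" (the image name, base image, and install path are all my assumptions, not a tested setup): the expensive compile happens once at image-build time, so a user session never needs build-scale RAM, and `--rm` gives the transient-session behavior from question (c).

```shell
# Hypothetical sketch only: build CmdStan once, at image build time.
cat > Dockerfile <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y build-essential git
RUN git clone --recursive https://github.com/stan-dev/cmdstan.git /opt/cmdstan
RUN make -C /opt/cmdstan build
EOF
docker build -t cloud-stan .

# A transient session: --rm destroys the container on exit, and -m
# caps how much RAM the session may consume (4g is a placeholder).
docker run --rm -it -m 4g cloud-stan bash
```

The per-session memory cap is exactly the kind of number I'd like the RAM survey above to inform.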
> > c) use case - transient (temp environment to be destroyed after logging off) vs day-to-day driver (maintain results/projects/etc).
>
> I don't understand the question. I don't log into any environments
> (not counting my notebook, GitHub, and our web host for static web
> pages, which just switched to GitHub).
>
Transient would be a "one-off" notebook environment. So, to Andrew's request in another thread, it might mean forking a model from GitHub, running it, and being fine with the environment being destroyed after downloading the results. Another use case could be testing, where a model is run, the output collected, and then the environment destroyed and its resources re-allocated.
I don't know whether that is much of a use case, or whether everyone would always want to be able to keep projects around long term.
> > d) If day-to-day use, what are ideal sync scenarios (dropbox vs github vs direct upload/download, etc)
>
> Synch for what? We use GitHub for everything Stan
> development related. Those who are shy often work in their
> own sandboxes, but most of us work on the stan-dev GitHub
> account in various repos.
Sync for modeling/results. E.g., after running a model and getting output, say in an IPython notebook, how would you (want to) back up that notebook/project?
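As one concrete option among the sync scenarios I listed (Dropbox, GitHub, direct upload/download), a GitHub-style flow might look like the following; the project and file names are placeholders, and this assumes the project directory is already a git repository:

```shell
# Hypothetical backup flow for a day-to-day project directory.
cd my-stan-project
# Stage the notebook, the model, and the generated output.
git add analysis.ipynb model.stan output/
git commit -m "Back up fitted model and results"
# Push to a remote (e.g., GitHub) so the work survives the environment.
git push origin master
```

The open question is whether people would rather have this kind of explicit version-controlled sync or a transparent Dropbox-style mirror of the whole workspace.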
> > e) yes/no to desire to ever self-host, and if so hosting platform (internal cluster vs something like AWS).
>
> I don't know what "self-hosting" is, but we're not doing any
> kind of hosting other than on GitHub.
>
> We'd like to be able to deploy two things --- a Jupyter-like
> environment --- Allen's already nailed this, I think, for
> Python, R, and Julia. We can't afford to host it for high
> volume, but Allen found a spare machine at Dartmouth we can
> use for demos. We'd also like to deploy a GUI that's not
> based on R, Python, or Julia --- just a simple demo-type web
> app. It's not fully designed, as you pointed out to Andrew
> in previous mail.
>
> - Bob
The question comes from both a testing and a documentation perspective: whether people would want to host it themselves (e.g., on the spare machine at Dartmouth), deploy onto AWS or a personal server to have their own sandbox, or only be interested in paying for compute resources while having everything managed for them. It comes down to the flexibility of the implementation. It sounds like, for now, we should keep it flexible and not tie it to a specific environment/setup.
Regarding the GUI, I absolutely agree that it should be language agnostic; however, I think we can do some 'tricks' to provide hooks into the various implementations, making it easy for people to, say, do data manipulation/creation in their language of choice but manage the run/output in the GUI.
Let me know if I can clarify further!
Devin