In general there are four "writers":
Info writer: key=value for per-run stuff, configuration, and other single values (e.g., adapted step size). It should always output the complete configuration and should be usable for reproducing a run; anything less is a bug.
Progress writer: key-value with log level for progress messages, respects log4j log levels
Output writer: csv-style output file, estimates and related diagnostic quantities
Diagnostic writer: csv-style diagnostic file, internal versions of parameters and related diagnostic quantities, potentially respects log4j log levels
If anyone has comments at this stage let me know.
Krzysztof
You and Daniel wrote the current inventory! :) K
I agree that it's not a big deal but after I spent a few minutes
with it I think the config(key:value)/messages(key:value)/output(csv)/diagnostics(csv) division
is pretty straightforward...
> What I do think we want is a way to round trip all of
> the following in a modular reader/writer fashion:
>
> 1) single parameter values
>
> 2) sequences of parameter values, perhaps named
> (e.g., MCMC init value vs. draws; VB init vs. draws vs.
> best fit)
Could you explain this a little more? I'm not clear on the
naming. I was thinking that the interfaces could get from messages
or config where init/warmup turns into samples.
>
> 3) vectors and matrices (if we don't want to make this
> just an instance of parameter values)
Honestly in the config/messages I was thinking of just writing
n_dim, n_rows, n_columns, n_..., ..., value1, value2, .... and
providing a function for the interfaces to use to read it back....
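
To make the flattened layout concrete, here is a minimal sketch of the write/read pair. It assumes the header is n_dim followed by one size per dimension, then the values; the function names and the storage order of the values are hypothetical and not settled anywhere in this thread.

```python
from math import prod


def encode_array(dims, values):
    """Flatten an array into [n_dim, dim_1, ..., dim_n, value_1, ...]."""
    if prod(dims) != len(values):
        raise ValueError("dims do not match the number of values")
    return [len(dims), *dims, *values]


def decode_array(fields):
    """Invert encode_array: return (dims, values) from the flat fields."""
    n_dim = int(fields[0])
    dims = [int(d) for d in fields[1:1 + n_dim]]
    values = [float(v) for v in fields[1 + n_dim:]]
    if prod(dims) != len(values):
        raise ValueError("header dims do not match the number of values")
    return dims, values


# A 2 x 3 matrix. The order of the values (e.g. column-major) is an
# assumption that the spec would have to pin down.
flat = encode_array([2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
line = ",".join(str(x) for x in flat)
dims, values = decode_array(line.split(","))
```

The consistency check on both sides is cheap and catches truncated lines when an interface reads the value back.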
> 4) key-value pairs for config information (I see that you're
> structuring keys rather than allowing values to have
> structure --- just pointing that out, I can see arguments
> either way)
Yeah, I went for structuring keys since it keeps the C++
end simpler.
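
As an illustration of structuring keys rather than values, here is a rough sketch of how an interface could flatten a nested configuration into key=value lines and rebuild it. The "." separator and all key names are assumptions made up for the example, not the actual format.

```python
def flatten(config, prefix=""):
    """Turn a nested dict into a flat {dotted.key: value} dict."""
    flat = {}
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def unflatten(flat):
    """Rebuild the nested dict from the dotted keys."""
    nested = {}
    for full_key, value in flat.items():
        *parents, leaf = full_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical configuration; these key names are for illustration only.
config = {"method": {"name": "sample", "num_warmup": 1000},
          "adapt": {"step_size": 0.25}}
flat = flatten(config)
lines = [f"{key}={value}" for key, value in flat.items()]
```

With this approach the writer only ever emits a flat key and a scalar value, and any nesting is recovered on the interface side.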
>
> 5) specific relevant types, like time
>
> It would be possible to put all this together into
> some kind of structure if we can stream it all out in
> a reasonable way.
>
> I think that's what's being targeted in the "BIG DESIGN DECISION"
> section, but I'm not sure.
I think Daniel wrote "BIG DESIGN DECISION"... I actually can't tell for sure
what it meant...
>
> There are some nitpicky little details here and there. The timing
> estimate, for example, isn't really part of the config for HMC.
> Nor are the adapted step size or mass matrix, though you'd need to
> save those if you wanted to restart.
Yeah, 'config' is a misnomer and you are right that it's really
aggregating info needed for reproducibility.
> So what we need is a step size
> (or integration time) and mass matrix; you already have the former
> but we need to generalize to the latter in Stan 3.
> We don't need to build all of these on top of a base writer.
> So I'd be more inclined to have progress writer (or other loggers)
I was thinking that step 1 was just getting things structured and
formats agreed on and then the insides could shift around so
for a first round I was going to ignore the levels and use the
current writer framework... is that a bad idea?
> have methods .warn(), .error(), .info(), etc. than have
> it use key-values with keys defined. For one thing, keys are
> very slow. We also need an .if_warn(), ..., and so forth, so we
> can check what level the logger's at so we can avoid constructing
> messages that will get swallowed.
>
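
A small sketch of the level-method idea, written in Python only to show the shape of the interface (the real logger would live on the C++ side). The class name, the level set, and the list-based sink are all hypothetical.

```python
import enum


class Level(enum.IntEnum):
    INFO = 1
    WARN = 2
    ERROR = 3


class ProgressLogger:
    """Logger with one method per level plus cheap level checks."""

    def __init__(self, level, sink):
        self.level = level  # minimum level that gets written
        self.sink = sink    # anything with append(), e.g. a list

    def if_info(self):
        return self.level <= Level.INFO

    def if_warn(self):
        return self.level <= Level.WARN

    def _log(self, level, message):
        if self.level <= level:
            self.sink.append(f"{level.name}: {message}")

    def info(self, message):
        self._log(Level.INFO, message)

    def warn(self, message):
        self._log(Level.WARN, message)

    def error(self, message):
        self._log(Level.ERROR, message)


built = 0


def expensive_message():
    """Stand-in for a message that is costly to construct."""
    global built
    built += 1
    return "full mass matrix dump"


messages = []
logger = ProgressLogger(Level.WARN, messages)
if logger.if_info():  # False here, so the message is never constructed
    logger.info(expensive_message())
logger.warn("step size is very small")
```

The if_info() guard is what avoids building messages that would be swallowed; without it, expensive_message() would run even though the logger drops the result.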
> I like the "sample writer" abstraction --- that's what I was talking
> about above. Then we'd replace the current CSV reader thing with
> something that modularly actually used a proper CSV reader (that is, one
> that only reads CSV files and ignores comment lines, not one that
> requires specific comment content).
Yep, on the same page here.
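
For reference, a minimal sketch of the kind of reader described above: it parses plain CSV and skips comment lines without interpreting their content. The "#" comment marker, the function name, and the sample columns are assumptions for the example.

```python
import csv
import io


def read_csv(stream, comment="#"):
    """Return (header, rows), skipping blank lines and comment lines."""
    data_lines = (line for line in stream
                  if line.strip() and not line.lstrip().startswith(comment))
    reader = csv.reader(data_lines)
    header = next(reader)
    rows = [[float(x) for x in row] for row in reader]
    return header, rows


# Hypothetical output file; column names and comments are made up.
text = """# comment content is not interpreted by the reader
lp__,mu,sigma
# another comment in the middle of the draws
-7.5,0.1,1.2
-6.9,0.3,1.1
"""
header, rows = read_csv(io.StringIO(text))
```

Anything the interfaces need beyond the numbers (config, adaptation info) would then come from the info writer rather than from parsing comments.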
Krzysztof
Is there any reason to do this so that R/Python/CmdStan interfaces can't read each
other's output?
>
> >> What I do think we want is a way to round trip all of
> >> the following in a modular reader/writer fashion:
> >>
> >> 1) single parameter values
> >>
> >> 2) sequences of parameter values, perhaps named
> >> (e.g., MCMC init value vs. draws; VB init vs. draws vs.
> >> best fit)
> >
> > Could you explain this a little more? I'm not clear on the
> > naming. I was thinking that the interfaces could get from messages
> > or config where init/warmup turns into samples.
>
> I meant that init and draws are the same thing --- just
> mappings of parameters to values. But there are multiple
> draws, so we probably don't want to repeat the naming
Agreed, we just need to put the info somewhere that lets you
calculate which rows are inits, which are warmup, and which are samples/estimates.
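
One possible way to do that calculation, as a sketch: if the info output carries the three counts, an interface can derive the row ranges itself. The key names are hypothetical, and this assumes rows are written in the order inits, warmup, samples.

```python
def row_ranges(num_inits, num_warmup, num_samples):
    """Map each phase to a half-open range of row indices."""
    warmup_start = num_inits
    sample_start = warmup_start + num_warmup
    end = sample_start + num_samples
    return {
        "init": range(0, warmup_start),
        "warmup": range(warmup_start, sample_start),
        "sample": range(sample_start, end),
    }


# Hypothetical counts as they might be read back from the info writer.
info = {"num_inits": 1, "num_warmup": 3, "num_samples": 4}
ranges = row_ranges(**info)
```
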
I think we already had one go-around on this with Michael that came down
to a good interface:
1) https://github.com/stan-dev/stan/wiki/Logging-Spec
2) https://groups.google.com/forum/#!searchin/stan-dev/Betancourt%7Csort:date/stan-dev/YJvyzVTK_YQ/fzNhwbg4BgAJ
I'm guessing what you want is a spec that outlines the signatures for the calls
to the loggers? I can do that on the wiki and go from there.
Krzysztof
>
> - bob