The optimization command tests Daniel wrote have some known correct output for a simple model, so I guess I could do that, but with HMC it's more involved. Is there any stated (or unstated) policy on how we want to do this? I'll have more time for writing tests next week. I have some current ones for the HMC commands, but it's only the scaffolding for the tests---I didn't push because I wasn't sure what output to test against.
Krzysztof
Could also add some tests of simulating from basic distributions (normal and uniform?) and comparing with theoretical quantiles to a few decimal places of accuracy. That would catch things like the problem we had with 2.10 and shouldn't involve too much computation.
K
No, it just needs to be done better in the future.
I think everyone's on the same page.
- Bob
> On Aug 27, 2016, at 11:07 PM, Krzysztof Sakrejda <krzysztof...@gmail.com> wrote:
>
> Daniel's test of the optim service command depended on some exact output so that's what I went with. Does this need to get addressed at a meeting?
>
> --
> You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+unsubscribe@googlegroups.com.
I don't think every character we output should be considered
part of the API. That's what I object to here. If we
test exact values, we get breakage under different compilers.
If we test spacing, etc., then minor typo fixes have massive
repercussions on testing (this has bitten me more than once
in the past).
I think what we want to do is make the output testing more
unit-like, as I've tried to do with the parser. That is,
don't test the entire stream of warning messages from the parser;
just make sure that a unit of that output (like a single warning
message) shows up. That can be tested for exact text, because it is
more modular, so a change only affects its own local tests.
- Bob
> On Aug 28, 2016, at 3:42 PM, Daniel Lee <bea...@alum.mit.edu> wrote:
>
> +1. No, it doesn't need to be and shouldn't be done the way I did it, but... the reason I did it that way was because there aren't tests all the way down, so the only way I knew how to prevent behavior changes in an automated test framework was to capture everything and make sure everything looks the same after the change. (It allowed me to move faster than anything else I knew how to do.)
>
> Ideally, we would have tests at the lower levels. At this level, we want to make sure the model gets called, the writers get called, etc. but not necessarily with any particular output. We should be able to mock them and then check that the mocks get called.
>
> With that said, we should have another set of tests for regression testing. If this is truly the API, then we should have some sort of regression testing on the API in addition to unit tests.
>
>
> That wasn't very constructive, so I'll try to add something that we can do:
> - add unit tests. Just check that the model gets called, the writers get called, inputs are respected, etc. The tests should live here: src/test/unit/services/**
> - add regression tests. We should check that behavior is the same, so same output, etc. So the snapshots. Add a new folder for these: src/test/regression/services/**. One thing to consider is that we should have an easy way to serialize the output for testing. That'll make updating these easier. I still think we should be aware of when the behavior changes, but dealing with it shouldn't be too hard.
>
> Thoughts? I think we're all on the same page here. Michael's tests should go somewhere, but they're also bigger than unit tests and not quite regression tests.
>
>
> Daniel
>
>
> On Sat, Aug 27, 2016 at 5:18 PM, Bob Carpenter <ca...@alias-i.com> wrote:
> No, it just needs to be done better in the future.
> I think everyone's on the same page.
>
> - Bob
>
> > On Aug 27, 2016, at 11:07 PM, Krzysztof Sakrejda <krzysztof...@gmail.com> wrote:
> >
> > Daniel's test of the optim service command depended on some exact output so that's what I went with. Does this need to get addressed at a meeting?
> >
>
>
>
On Sunday, August 28, 2016 at 9:42:05 AM UTC-4, Daniel Lee wrote:
> That wasn't very constructive, so I'll try to add something that we can do:
> - add unit tests. Just check that the model gets called, the writers get called, inputs are respected, etc. The tests should live here: src/test/unit/services/**
> - add regression tests. We should check that behavior is the same, so same output, etc. So the snapshots. Add a new folder for these: src/test/regression/services/**. One thing to consider is that we should have an easy way to serialize the output for testing. That'll make updating these easier. I still think we should be aware of when the behavior changes, but dealing with it shouldn't be too hard.
Thanks Daniel, the overview and more specific instructions are very helpful. I'll modify what I have for unit tests along these lines, and I'll see if I can work out where we do and don't have tests at the lower levels. It would really help to have this kind of stuff in a wiki.
> Thoughts? I think we're all on the same page here. Michael's tests should go somewhere, but they're also bigger than unit tests and not quite regression tests.
I agree with the scheme you've set out. Michael's tests could just go in src/test/statistical_validity/* (if we're OK with a verbose name), and I think they require a directory for output as well. I took a look at the stat_valid_test branch, and it doesn't look like it would be too much work to update it to this branch. That seems like it should be its own pull request, though. If this branch can go in with unit/regression tests, it'll be a win all around.
> On Aug 28, 2016, at 4:20 PM, Daniel Lee <bea...@alum.mit.edu> wrote:
>
> On Sun, Aug 28, 2016 at 9:51 AM, Bob Carpenter <ca...@alias-i.com> wrote:
> I don't think every character we output should be considered
> part of the API. That's what I object to here. If we
> test exact values, we screw up with different compilers.
> If we test spacing, etc., then minor typo changes have massive
> repercussions on testing (this has bitten me more than once
> in the past).
>
> +1
>
> The one problem we have (and this isn't a hypothetical problem) is that some of the interfaces are parsing through the output character by character / line by line and using that. We found out that StataStan does this, and so does RStan for some of its parsing of input.
Can't we just let the upstream interfaces fend for themselves?
What output's being parsed character by character by the interfaces?
I thought RStan worked by callback now and didn't take the text output
from the CmdStan routines.
I'm less concerned about StataStan and other command wrappers. Rather
than grabbing text output, they should be taking the CSV output
and using a CSV parser, and for other output, just dumping it through
as-is. At least I think that's what they should be doing, but I haven't
thought it through deeply. We really need to make the output of
CmdStan less monolithic, such as putting the config output in standard
formats like CSV or JSON (so interfaces can get the config, the
adaptation, and the results separately).
OK, maybe we do need to discuss this at the meeting. All of these
dependencies on unpublished and undocumented parts of the API are
worrying. They basically lock us into the current broken way of doing things.