Introduction of LBFGS breaks backwards compatibility of *Stan interface arguments for BFGS optimization


Daniel Lee

5 Jul 2014, 11:02:23
to stan...@googlegroups.com
#### Information ####

Just FYI. I didn't notice it on its way in, but here were the old arguments for BFGS, defaults in parentheses:
- double init_alpha (0.001)
- double tol_obj (1e-8)
- double tol_grad (1e-8)
- double tol_param (1e-8)

The new arguments are below. Note that tol_obj, tol_grad, and tol_param are gone, replaced by five tolerance arguments. If anyone has been setting these flags in RStan or CmdStan for BFGS, the same call will not work with the current develop (v2.4.0) branch. Arguments:
- double init_alpha (0.001)
- double tol_abs_x (1e-8)
- double tol_abs_f (1e-12)
- double tol_abs_grad (1e-8)
- double tol_rel_f (1e4)
- double tol_rel_grad (1e3)

CmdStan has already been patched to take in these arguments. I'm patching RStan now.



#### Possible actions ####

I'd like some feedback on this. As of right now, I'm seeing a few paths we can take.

1. Change arguments, announce changes with next release.
    This is the least amount of work. We would just warn people that the call to BFGS is not backwards compatible between <= 2.3.0 and 2.4.0.

2. Change arguments, announce changes with next release, write deprecation warnings with old parameters.
    This would require us to reactivate the old arguments, print a deprecation message, and exit cleanly. All of that can be done, but given the way it's set up, it will have to be done in every interface separately.

3. Change arguments, announce changes with next release, write deprecation warnings with old parameters, automatically map the old parameters to the new ones.
   This would be useful, but I don't think it's a good idea in the long run.

4. Leave old arguments, add additional arguments.
    I'm going to trust that Marcus renamed the old arguments on purpose, so I don't think this is really a good option.

There may be other options that make sense that I'm not seeing right now.

I'm leaning towards 1 out of convenience, but could be convinced to do anything else.
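
For concreteness, here is a minimal sketch of what options 2 and 3 could look like in an interface layer. It is illustrative only: the function `translate_bfgs_args` and the `remap` flag are hypothetical, the pairing of old names to new names is an assumption based on what each tolerance appears to measure, and the defaults are copied from the list above.

```python
import warnings

# Assumed pairing of old and new names (not confirmed by the implementation).
OLD_TO_NEW = {
    "tol_obj": "tol_abs_f",
    "tol_grad": "tol_abs_grad",
    "tol_param": "tol_abs_x",
}

# Defaults as listed above for the new argument set.
NEW_DEFAULTS = {
    "init_alpha": 0.001,
    "tol_abs_x": 1e-8,
    "tol_abs_f": 1e-12,
    "tol_abs_grad": 1e-8,
    "tol_rel_f": 1e4,
    "tol_rel_grad": 1e3,
}


def translate_bfgs_args(user_args, remap=True):
    """Return a complete argument dict keyed by the new names.

    remap=False sketches option 2: warn about an old name, then refuse it.
    remap=True sketches option 3: warn, then carry the value to the new name.
    """
    resolved = dict(NEW_DEFAULTS)
    for name, value in user_args.items():
        if name in OLD_TO_NEW:
            new_name = OLD_TO_NEW[name]
            warnings.warn(
                f"'{name}' is deprecated; use '{new_name}' instead",
                DeprecationWarning,
                stacklevel=2,
            )
            if not remap:
                raise ValueError(f"'{name}' is no longer accepted; use '{new_name}'")
            if new_name in user_args:
                raise ValueError(f"both '{name}' and '{new_name}' were given")
            name = new_name
        elif name not in NEW_DEFAULTS:
            raise ValueError(f"unknown argument '{name}'")
        resolved[name] = float(value)
    return resolved
```

Calling `translate_bfgs_args({"tol_obj": 1e-10})` would emit a deprecation warning and return the defaults with `tol_abs_f` set to 1e-10. The cost noted under option 2 still applies: each interface would need its own copy of a shim like this.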




Daniel

Bob Carpenter

5 Jul 2014, 13:07:02
to stan...@googlegroups.com
I'd rather not break backward compatibility unless there's
a very strong argument for it.

Andrew's designing a Stan 3 interface that'll break backward
compatibility --- could we wait until then to rename parameters?
Because I'm guessing Andrew's going to have an opinion about
what to call these, and I'm guessing he's not going to like
the idea of replacing "param" with "x" because it doesn't match
his notation in BDA.

From reading these arguments, I'm guessing this is the arrangement

OLD          NEW
init_alpha   init_alpha
tol_obj      tol_abs_f
tol_grad     tol_abs_grad
tol_param    tol_abs_x
(none)       tol_rel_f
(none)       tol_rel_grad
(none)       tol_rel_x

My proposal is to leave these old args exactly as they are
and add three new ones with qualifications:

OLD          NEW
init_alpha   init_alpha
tol_obj      tol_obj
tol_grad     tol_grad
tol_param    tol_param
(none)       tol_rel_obj
(none)       tol_rel_grad
(none)       tol_rel_param

Would that make everyone happy until we move to a backward-compatibility
breaking Stan 3?

Having unmarked defaults ("tol_obj" vs. "tol_abs_obj") is a tradeoff
on arg size versus being able to understand what the command's doing
without knowing the default measurement. But for "tolerance", I think
the usual assumption is that it's absolute and that relative is
the "marked" case.
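
To make the absolute versus relative distinction concrete, here is a small sketch of the two kinds of test for the objective. The helper names `abs_converged` and `rel_converged` are hypothetical, and the exact form of the relative test (scaling by the objective's magnitude and measuring the tolerance in units of machine epsilon) is an assumption; it is one reading that would make sense of a default as large as 1e4.

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon for doubles, about 2.2e-16


def abs_converged(f_prev, f_curr, tol_obj=1e-8):
    """Absolute test: the raw change in the objective is below tol_obj."""
    return abs(f_curr - f_prev) < tol_obj


def rel_converged(f_prev, f_curr, tol_rel_obj=1e4):
    """Relative test: the change, divided by the objective's magnitude,
    is below tol_rel_obj machine epsilons (assumed form, see the note above)."""
    scale = max(abs(f_prev), abs(f_curr), 1.0)
    return abs(f_curr - f_prev) / scale < tol_rel_obj * EPS


# A large objective that moves by 1e-7, and a small one that moves by 1e-9.
large = (1e6, 1e6 + 1e-7)
small = (1e-3, 1e-3 + 1e-9)
print(abs_converged(*large), rel_converged(*large))
print(abs_converged(*small), rel_converged(*small))
```

Under these assumptions the two tests disagree in both directions: the large objective fails the absolute test but passes the relative one, and the small objective does the opposite. That is the reason for wanting both kinds of argument, whatever they end up being called.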

- Bob

Daniel Lee

5 Jul 2014, 23:09:25
to stan...@googlegroups.com
Looks like I missed some of the details in how these things were named. It looks like it's still backwards compatible.

I was confused with the renaming of arguments within BFGS / LBFGS itself, but the old matches with the new.

So, in short, it is backwards compatible, but the code for BFGS and LBFGS calls the arguments something else.

Bob Carpenter

6 Jul 2014, 14:31:33
to stan...@googlegroups.com

I think it'd be a nice bonus if they had the same config parameters,
with the same names.

That would require changing names in L-BFGS to be backward compatible
with BFGS and adding relative convergence to BFGS.




That doesn't have to go in this release, but we should work out
whether we want them to be compatible so we don't have to change
anything going forward.

- Bob

Bob Carpenter

6 Jul 2014, 14:36:04
to stan...@googlegroups.com

Will we ever use BFGS now that we have L-BFGS?

If we won't use BFGS going forward, then naming should be
chosen to make sense for L-BFGS without regard to how it
is now in BFGS.

I still don't think Andrew's going to like "x" for the
parameters and it'd be nice to have the names synch as much
as possible across interfaces.

If we will use BFGS going forward, it would be nice if

1. they could have the same names for arguments they have
in common, and

2. we could add relative convergence measures to BFGS.

Only 1 needs to be decided now --- we could put off doing 2.

Marcus Brubaker

6 Jul 2014, 15:07:34
to stan...@googlegroups.com
So BFGS and L-BFGS use exactly the same (user-facing) parameters, with one exception: L-BFGS has an additional parameter which specifies the number of update vectors to keep (basically, how much memory to use).  I.e., BFGS and L-BFGS (from the user perspective) are both 100% backwards compatible and, to the extent that they have similar parameters, completely consistent.  See the manual for more info :)

The only inconsistency, which I think is what confused Daniel, is that a few of them have slightly different names in the C++ implementation structure than they do on the user facing side.  This was to make the C++ code clearer, but as I said, it didn't change the user-facing parameters.  I was very careful to ensure that the user interface didn't break, so it should be 100% backwards compatible.

I think it's worth leaving BFGS if only because it's such a minimal code-base and it will be a bit faster for small to medium sized problems.  It shares 99% of its code with L-BFGS (including parameters) so there is very little added maintenance overhead to keep it.

In other words, regarding Bob's points, 1&2 are already done :)

Cheers,
Marcus

Bob Carpenter

6 Jul 2014, 15:20:28
to stan...@googlegroups.com
Nice! Sorry for just fanning the noise.

- Bob

Marcus Brubaker

6 Jul 2014, 15:30:12
to stan...@googlegroups.com
No problem.  When we break the user-interface next (Stan 3.0?) it would be good to rename the user-facing parameters to be consistent with the C++ parameters, as they're more accurate and descriptive, but it's not really high priority in my mind.

Also, as a side note, the way the code is written right now should make it very easy to implement *any* line-search based optimization technique.  E.g., once we get higher-order autodiff done, it should be very easy to do a second-order optimization method, like a Newton or Truncated Newton method, that shares the same code base and parameters.  I think even a non-linear conjugate gradient method should fit into the framework if someone was motivated, but it's not clear that there is value in doing it.
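
As a rough illustration of the idea (not the actual Stan code), a line-search framework can keep the loop, the step-size search, and the convergence test in one place and let each method supply only its search direction. Everything here is a hypothetical sketch: `minimize`, `steepest_descent`, and `newton_for_example` are made-up names, the line search is plain backtracking with an Armijo condition, `init_alpha` is used simply as the first trial step, and the sketch minimizes rather than maximizing a log density.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def minimize(f, grad_f, x0, direction_fn, init_alpha=1.0, tol_grad=1e-8,
             max_iter=1000):
    """Generic line-search loop; direction_fn(x, g) picks the search direction."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        if math.sqrt(dot(g, g)) < tol_grad:  # absolute gradient tolerance
            break
        d = direction_fn(x, g)
        slope = dot(g, d)
        if slope >= 0:  # not a descent direction: fall back to -g
            d = [-gi for gi in g]
            slope = dot(g, d)
        alpha, fx = init_alpha, f(x)
        # Backtracking: halve alpha until the Armijo decrease condition holds.
        while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
            if alpha < 1e-16:  # step underflowed; give up and return x
                return x
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x


def steepest_descent(x, g):
    """First-order method: move straight down the gradient."""
    return [-gi for gi in g]


# Example objective: a quadratic with its minimum at (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


def newton_for_example(x, g):
    """Second-order method for f above, whose Hessian is diag(2, 20)."""
    return [-g[0] / 2.0, -g[1] / 20.0]


print(minimize(f, grad, [0.0, 0.0], steepest_descent))
print(minimize(f, grad, [0.0, 0.0], newton_for_example))
```

Both calls share the same loop and the same `init_alpha` and `tol_grad` parameters. A quasi-Newton method would differ only in that its direction function carries state (the Hessian approximation or the stored update vectors) between iterations.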

Cheers,
Marcus
