------------------------------------
In maximum-likelihood parameter estimation, one typically minimizes the negative log-likelihood of the data. In some important cases (for example, estimating the parameters of a Gaussian mixture), the objective has the following form:
L = -log(sum(exp(x))) = -log(exp(x[1]) + exp(x[2]) + ... + exp(x[N])), where x[1], x[2], ..., x[N] << 0. Evaluating this expression directly is very likely to cause underflow for large negative x[i]. The common numerical trick for dealing with this is to shift the exponents so that the largest exponent becomes zero (and hence the largest exponential equals 1). This leads to the 'logsumexp' function:
logsumexp(x) = log(sum(exp(x - max(x)))) + max(x).
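For illustration, here is a minimal plain-NumPy sketch of this trick (outside CasADi, just to show the numerical effect):

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x))): shift by max(x) so the
    largest exponent is zero and at least one exp() term equals 1,
    so the sum cannot underflow to 0."""
    m = np.max(x)
    return np.log(np.sum(np.exp(x - m))) + m

x = np.array([-1000.0, -1001.0, -1002.0])
# Naive evaluation underflows: exp(-1000) == 0.0 in double precision,
# so log(0.0) yields -inf.  The shifted version stays finite.
print(np.log(np.sum(np.exp(x))))  # -inf (with a runtime warning)
print(logsumexp(x))               # approx. -999.592
```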
There are two problems with the max() function:
1. It does not exist in CasADi
2. It is not smooth, and therefore not suitable for use in an objective function.
It is possible to avoid using max() by introducing slack variables: one slack variable and a corresponding constraint for every logsumexp() in the objective function. If the objective is a sum of many logsumexps (i.e., the log of a product of many sums of exponentials), which is sometimes the case, the number of slack variables and constraints grows dramatically and kills performance.
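For reference, one common slack reformulation (a sketch; the post does not spell out which one is meant) replaces each logsumexp(x) term by a slack variable t with the smooth, underflow-safe constraint exp(x[1]-t) + ... + exp(x[N]-t) = 1, which is satisfied exactly at t = logsumexp(x). A quick numerical check of that identity:

```python
import numpy as np

def logsumexp(x):
    m = np.max(x)
    return np.log(np.sum(np.exp(x - m))) + m

# The slack constraint g(x, t) = sum(exp(x - t)) - 1 is smooth in both
# x and t (no max() involved) and vanishes exactly at t = logsumexp(x).
x = np.array([-1000.0, -1001.0, -1002.0])
t = logsumexp(x)
print(np.sum(np.exp(x - t)) - 1.0)  # approx. 0
```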
PROPOSED SOLUTION
---------------------------------
I propose to introduce logsumexp() as a new special function in CasADi. It is a smooth function, but the implementation should employ the numerical trick above to avoid underflow. The derivative of logsumexp() can be nicely expressed via logsumexp() itself:
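(The gradient in question is the softmax: d/dx[i] logsumexp(x) = exp(x[i] - logsumexp(x)).) A quick finite-difference sanity check of this identity in plain NumPy, independent of CasADi:

```python
import numpy as np

def logsumexp(x):
    m = np.max(x)
    return np.log(np.sum(np.exp(x - m))) + m

x = np.array([-5.0, -3.0, -4.0])
grad = np.exp(x - logsumexp(x))  # softmax, expressed via logsumexp itself

# Compare against central finite differences.
eps = 1e-6
fd = np.zeros_like(x)
for i in range(len(x)):
    xp, xm = x.copy(), x.copy()
    xp[i] += eps
    xm[i] -= eps
    fd[i] = (logsumexp(xp) - logsumexp(xm)) / (2 * eps)

print(np.max(np.abs(grad - fd)))  # tiny; the two gradients agree
```

Note that the components of the gradient sum to 1, as expected of a softmax.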