On Thursday, June 20, 2013 1:01:58 PM UTC-7, David Samborski wrote:
Thanks Brendan and Dave for the feedback. There are a lot of good points in there, so I've been mulling them over. I see two issues at the moment: package structure and general types (like Function, or how differentials fit with other packages). Here are my thoughts on each.
Package Structure
As Brendan pointed out, we should impose certain requirements on our package structure.
1) Flexibility
2) Good at one thing idiom
3) Orthogonality
4) Scalability
I would also add one more.
5) Readability (for example, force.Pound instead of unit.Pound): the goal here is precision and clarity from the user's perspective.
Agreed. The trick is in balancing these requirements (especially balancing 1 and 5 can be difficult), which is why I like having this group.
For the differentiation package, I don't see a need to split it up further.
One option would be
gonum/differentiation/finitedifference (though maybe just fd)
so that later we can also have
gonum/differentiation/complexstep
I'm not sure what the right answer is, just pointing out a possible issue.
Though maybe the names ThreePoint(...) and FivePoint(...) aren't the greatest. I would suggest a shorter package name if possible; I was thinking "deriv" or "derive". I think having all the differentiation algorithms start with derive.(AlgoName) is probably a good idea and less confusing.
Maybe just "diff"? "Deriv" can be confused with a proof, and "diff" can be confused with subtraction, but I don't think we'd ever need a package for subtraction. Maybe there are different names in other fields, but I would vote for Forward/Backward/Central difference, which are the names Wikipedia uses.
For the root finding package, I'd suggest keeping it separate from optimize, though as you say, optimize will depend on it quite heavily. The main reason I say this is because I wouldn't have looked in optimize for Newton's method.
Agreed. This also matches the Matlab convention with Fzero being one tool and Fminunc being another.
That's due to the fact that I haven't worked with optimization methods much, but that's bound to happen for other users too, so I think it's a good argument to make. As long as the orthogonality is well satisfied by how these algorithms fit together, it shouldn't be a problem. That segues well into the next issue.
Yeah, my point was more that the conventions between the two packages should match. For example, on the next point, our equivalents of Fmin and Fzero should use the same Functioner interfaces, rather than one taking a type and the other taking a function handle.
General Types
I think we should have a "general" or "utilities" package that defines some basic interfaces and types. I think Function is important, but the more I think about it, the more I think it should be an interface. Here's how I see the interfaces working.
I think something like this is the right answer as well. In the current optimization package, I have Eval() and Deriv() rather than Function and Deriv.
It is important that we be as picky as possible about minimizing function evaluations. In my experience, for problems where optimization is the bottleneck, it's the cost of the function itself that dominates rather than all the rest of the optimization code. For the problems I work with, reducing the number of function evaluations is more important than, say, simplicity and legibility of code. It's easiest to write code that just uses finite differences to compute the derivative, but I cannot use that tool for the kinds of problems I run.
That was a bit of a tangent, but I bring it up because I'm not sure what to do about Eval() and Deriv(). On the one hand, it's good to separate them, because some codes don't need to compute both the function value and the derivative, and it's good to save the computation when possible. On the other hand, some codes share significant computational overlap between computing the value and the derivative, so calling both at once would be more efficient than calling each separately. There are tricks that can be done to avoid this, and I think we should think about how those tricks would work so that there's a common assumption throughout gonum. Either that, or we could have
type Evaler interface {
	Eval(x float64) float64
}

type Deriver interface {
	Deriv(x float64) float64
}

type EvalAndDeriver interface {
	EvalAndDeriv(x float64) (float64, float64)
}
It's not clear to me at the moment what the best way forward is.
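One of the tricks mentioned above could be a type assertion inside the solver: use the combined method when a type provides it, and fall back to separate calls otherwise. This is just a sketch of that idea (Quad and evalAndDeriv are made-up names for illustration):

```go
package main

import "fmt"

type Evaler interface {
	Eval(x float64) float64
}

type Deriver interface {
	Deriv(x float64) float64
}

type EvalAndDeriver interface {
	EvalAndDeriv(x float64) (float64, float64)
}

// Quad is a toy function type, f(x) = x^2, that implements all three.
type Quad struct{}

func (Quad) Eval(x float64) float64  { return x * x }
func (Quad) Deriv(x float64) float64 { return 2 * x }
func (q Quad) EvalAndDeriv(x float64) (float64, float64) {
	// In a real problem, shared work between value and
	// derivative would be computed once here.
	return q.Eval(x), q.Deriv(x)
}

// evalAndDeriv prefers the combined method when available,
// so types that share work between Eval and Deriv benefit
// without the solver knowing the details.
func evalAndDeriv(f Evaler, x float64) (float64, float64) {
	if fd, ok := f.(EvalAndDeriver); ok {
		return fd.EvalAndDeriv(x)
	}
	if d, ok := f.(Deriver); ok {
		return f.Eval(x), d.Deriv(x)
	}
	panic("function does not provide a derivative")
}

func main() {
	v, d := evalAndDeriv(Quad{}, 3)
	fmt.Println(v, d)
}
```

With this pattern a solver only ever asks for an Evaler, and the caller opts into the efficiency gain simply by implementing EvalAndDeriv.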
This makes it easy to add types: once a type satisfies the interface, you can use any algorithm that requires that interface. It's trivial if you already have your function and derivative defined. There is a bit of a circular-dependency issue: util declares the interfaces, differentiation depends on the interfaces, and util wants to declare types using differentiation. But I think that's solvable by moving the interfaces into a subdirectory.
As for how algorithms will store their parameters before computation, I think we should define structs that contain parameters like step size. One of the things that sets Go apart is its ability to do concurrent computation. We should aim to have our solvers be self-contained so that they can be passed to different goroutines and called there. My example above doesn't do this, but I intend to add it to my code in the previous post.
Yep. It's also nice because it allows default arguments
CentralDifference(Evaler, Settings)
and a Settings could be obtained from
diff.DefaultSettings() *Settings
That way specific arguments can be adjusted by the caller as needed.
For optimize and matrix, I probably don't have much to add. For optimize it's because I don't have much experience, and for matrix I think others generally have it covered (though I agree a basic package should be added to gonum).
If there was anything I didn't cover, it was an oversight on my part. I think all the points were valid; it was just a lot to cover. I probably won't work on this much over the weekend, but I might check in later today. The weekend is a wash for me, but I'll get back into it next week.
That's the problem. So many things to do, so little time to do it.