You may have noticed a flurry of activity in the Breeze GitHub repository over the last few days. I fixed or closed something like 35 tickets in the last two days. (Mostly fixes, I think.)
I'm finally making a Breeze 1.0 release in the next couple of weeks, marking the end of a comically long pre-1.0 stage. It doesn't have everything I originally intended for Breeze 1, but it's past time to just go ahead and release it. You can see what else I'm targeting here: https://github.com/scalanlp/breeze/milestone/5
I don't promise to get all of those in the release, but I'm gonna try on most of them.
The reason I'm cutting a release now is that I think the basic architectural design has reached its limits, especially in terms of API fluency and performance. This is also one of the reasons I've been kind of negligent in maintaining Breeze for the last year or two: I wasn't sure how to make the changes I want to make.
There are lots of things I like about Breeze. I like that the user-side syntax is usually pretty nice and that it's still performant most of the time, especially in terms of avoiding boxing. I like that the UFunc design allows for automatic extension to Breeze Tensor types and for adding new or optimized variants of operations.
There are, however, a lot of problems with the design that make it hard to grow it further, especially in the modern "tensors and autodiff" era that we're finding ourselves in.
Modern numerical libraries seem to have five properties that Breeze lacks:
1) arbitrary arity tensors, or at least up to some fairly high arity
2) "write once, run everywhere" in terms of compute device (GPU, CPU, etc)
3) some mechanism for reification of compute graphs
4) optimization of those graphs to fuse or eliminate expressions, reuse memory, etc.
5) automatic differentiation (usually implemented using (3))
Breeze has none of these properties, and I don't think it's easy to add them in the current design. And while Breeze shouldn't necessarily be a full-fledged deep learning framework, I think most of these are necessary for being "relevant" in the current landscape. I'd be willing to punt on autodiff, but I'd want there to be at least hooks to build it on top of whatever expression reification Breeze 2 would have.
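To make properties (3) through (5) a little more concrete, here is a minimal sketch of what "reification" buys you. Everything in it (`Expr`, `ExprOps`, and so on) is a hypothetical name invented for illustration; none of it is Breeze API, and it works on scalars rather than tensors to stay short. Note that the derivative here is plain symbolic differentiation, not a real autodiff implementation.

```scala
// Hypothetical sketch: these types do not exist in Breeze.
// A reified expression is just data describing a computation.
sealed trait Expr
final case class Const(value: Double) extends Expr
final case class Var(name: String) extends Expr
final case class Add(lhs: Expr, rhs: Expr) extends Expr
final case class Mul(lhs: Expr, rhs: Expr) extends Expr

object ExprOps {
  // Property (3): once the graph is data, it can be interpreted later.
  def eval(e: Expr, env: Map[String, Double]): Double = e match {
    case Const(v)  => v
    case Var(n)    => env(n)
    case Add(a, b) => eval(a, env) + eval(b, env)
    case Mul(a, b) => eval(a, env) * eval(b, env)
  }

  // Property (4): a tiny rewrite pass that eliminates trivial nodes
  // and folds constants. A real optimizer would also fuse loops, etc.
  def simplify(e: Expr): Expr = e match {
    case Add(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Const(0.0), x)      => x
        case (x, Const(0.0))      => x
        case (Const(x), Const(y)) => Const(x + y)
        case (x, y)               => Add(x, y)
      }
    case Mul(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Const(0.0), _) | (_, Const(0.0)) => Const(0.0)
        case (Const(1.0), x)                   => x
        case (x, Const(1.0))                   => x
        case (Const(x), Const(y))              => Const(x * y)
        case (x, y)                            => Mul(x, y)
      }
    case other => other
  }

  // Property (5), in its simplest form: symbolic differentiation
  // built on top of the same reified tree (sum and product rules).
  def diff(e: Expr, wrt: String): Expr = e match {
    case Const(_)  => Const(0.0)
    case Var(n)    => Const(if (n == wrt) 1.0 else 0.0)
    case Add(a, b) => Add(diff(a, wrt), diff(b, wrt))
    case Mul(a, b) => Add(Mul(diff(a, wrt), b), Mul(a, diff(b, wrt)))
  }
}
```

For example, with `f = Add(Mul(Var("x"), Var("x")), Mul(Const(3.0), Var("x")))`, calling `ExprOps.simplify(ExprOps.diff(f, "x"))` gives a smaller tree that evaluates to 2x + 3. The point is only that the same data structure serves evaluation, optimization, and differentiation.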
In addition, I find the following problematic:
1) No good story for "named configuration parameters" in UFunc land
2) It's kind of a pain to thread through UFunc implementations if you want your code to be generic over kind of Vector or Tensor, or especially kind of numeric type (e.g. Float/Double).
3) Error messages are sometimes cryptic (not abnormal for implicit-heavy Scala libraries, but we should do better.)
4) Hard to know what combinations of UFunc/arguments are supported.
5) I'm sure other people have lots of other pain points, and I would like to hear them.
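Pain point (2) is easier to see in code. The sketch below does not use Breeze itself: `Impl2`, `OpAdd`, `OpMulScalar`, `ArrayInstances`, and `GenericCode` are stand-ins written for this example, loosely mimicking the shape of per-operation implicit implementations. The thing to notice is the signature of the generic function.

```scala
// Stand-ins that mimic the per-operation implicit pattern.
// These are NOT Breeze's real definitions.
trait Impl2[Op, A, B, R] { def apply(a: A, b: B): R }
sealed trait OpAdd
sealed trait OpMulScalar

object ArrayInstances {
  // One implicit instance per (operation, argument types) combination.
  implicit val addArrays: Impl2[OpAdd, Array[Double], Array[Double], Array[Double]] =
    new Impl2[OpAdd, Array[Double], Array[Double], Array[Double]] {
      def apply(a: Array[Double], b: Array[Double]): Array[Double] =
        a.zip(b).map { case (x, y) => x + y }
    }

  implicit val scaleArray: Impl2[OpMulScalar, Array[Double], Double, Array[Double]] =
    new Impl2[OpMulScalar, Array[Double], Double, Array[Double]] {
      def apply(a: Array[Double], s: Double): Array[Double] = a.map(_ * s)
    }
}

object GenericCode {
  // Generic over the vector type V and the scalar type S.
  // Every operation used in the body needs its own implicit parameter,
  // so the signature grows with the body.
  def scaleAndAdd[V, S](a: S, x: V, y: V)(implicit
      mul: Impl2[OpMulScalar, V, S, V],
      add: Impl2[OpAdd, V, V, V]
  ): V = add(mul(x, a), y)
}
```

With `import ArrayInstances._` in scope, `GenericCode.scaleAndAdd(2.0, Array(1.0, 2.0), Array(10.0, 20.0))` computes a * x + y elementwise. Two operations already need two implicit parameters; a realistic numeric routine needs many more, and a missing one tends to surface as an unhelpful "could not find implicit value" error, which ties into points (3) and (4).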
Separately, it would be good to build on top of the typelevel stack (most obviously Spire) if it's possible without sacrificing too much performance.
I don't really know the answers here yet. Various thoughts I have are:
* Maybe have a single concrete DenseTensor[Shape, Type, Backend], and maybe analogous types for Sparse types? Might be hard to keep indexing efficient in this case, though we can maybe work around it with macros.
* Operations should continue to appear to be strict by default, but they should actually be intercepted by an increasingly fancy macro that destructures entire expression trees and optimizes them.
* Said increasingly fancy macro could maybe search for some kind of implicit typeclass like Expression[ReifiedExpressionType] to allow easy-ish extension without touching the macro too often.
* Alternatively, the macro could maybe inline generate calls to XLA or novel OpenCL kernels or something.
* That same reification should be accessible by the client API, for supporting autodiff or generating TensorFlow XLA, etc.
* Better bundling of operators so that single context bounds (or maybe implicits) can be used to write generic functions. In particular, BLAS[Backend] should be a thing.
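As a rough illustration of the last bullet, here is one way a bundled `BLAS[Backend]` typeclass could look. This is a sketch under heavy assumptions: `Backend`, `Cpu`, `DenseVec`, and the `BLAS` trait are all hypothetical, and `DenseVec[B]` is a drastically simplified stand-in for the `DenseTensor[Shape, Type, Backend]` idea (fixed to rank 1 and `Double`, with the backend as a phantom type parameter).

```scala
// Hypothetical sketch: none of these types exist in Breeze.
sealed trait Backend
sealed trait Cpu extends Backend // phantom tag; a Gpu tag would sit alongside it

// Backend-tagged vector. B is never instantiated; it only selects the BLAS instance.
final case class DenseVec[B <: Backend](data: Array[Double])

// One typeclass bundles the operations a backend must provide.
trait BLAS[B <: Backend] {
  def axpy(a: Double, x: DenseVec[B], y: DenseVec[B]): DenseVec[B] // a * x + y
  def dot(x: DenseVec[B], y: DenseVec[B]): Double
}

object BLAS {
  // Naive pure-Scala CPU instance, for illustration only.
  implicit val cpu: BLAS[Cpu] = new BLAS[Cpu] {
    def axpy(a: Double, x: DenseVec[Cpu], y: DenseVec[Cpu]): DenseVec[Cpu] =
      DenseVec[Cpu](x.data.zip(y.data).map { case (xi, yi) => a * xi + yi })
    def dot(x: DenseVec[Cpu], y: DenseVec[Cpu]): Double =
      x.data.zip(y.data).map { case (xi, yi) => xi * yi }.sum
  }
}

object Generic {
  // A single context bound replaces one implicit parameter per operation.
  // Removes from v its component along u: v - (v.u / u.u) * u
  def projectOut[B <: Backend: BLAS](v: DenseVec[B], u: DenseVec[B]): DenseVec[B] = {
    val blas = implicitly[BLAS[B]]
    val coeff = blas.dot(v, u) / blas.dot(u, u)
    blas.axpy(-coeff, u, v)
  }
}
```

Calling `Generic.projectOut(DenseVec[Cpu](Array(3.0, 4.0)), DenseVec[Cpu](Array(1.0, 0.0)))` picks up the CPU instance from the companion object. The trade-off is that a bundle forces every backend to implement every operation in it, whereas per-operation implicits let a backend support only what it can; how fine-grained the bundles should be is an open design question.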
Maybe this is all too ambitious to actually work. I'm pretty sure I couldn't do it by myself in a reasonable amount of time.
Anyway, I'd love to hear your thoughts about what you'd like to see in the future.
Obviously we can keep adding new functions and tweaking existing functionality, but I don't think we can get the kinds of step-changes in quality that I want.
There's actually an old PR for basically symbolic differentiation, but I think it isn't a particularly friendly API and it doesn't generate particularly optimized expressions.