--
You received this message because you are subscribed to the Google Groups "clojure-cortex" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-corte...@googlegroups.com.
To post to this group, send email to clojure...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clojure-cortex/6bca2423-74cd-4a15-b50b-cf1456c4b2a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Christian,

This is a great response and there's a lot of good info to mine here, so I'm sure I'll be re-reading it. I have two main follow-on questions, but first some context. I'm not associated with ThinkTopic or Cortex any more, but I am very interested in solving this problem. I think you're correct that there's not a lot of design-first work in Cortex and that the requirements it's trying to address are not clearly enumerated at present. That aside, I'd rather not dive into the minutiae there, but instead peel apart one of the assumptions underlying your post.
It seems like you're presenting an either/or framework design which assumes the approach will be derivative of other frameworks, and on that either/or I lean strongly in the TensorFlow rather than the Torch/PyTorch direction. That is, I'm of a similar mind to Kovas and his post here on TensorFlow's design tradeoffs: https://hackernoon.com/ml-framework-design-reliability-composition-flexibility-9314f72d2c73 -- I know that researchers are not big fans of TF's tradeoffs, but researchers in general are not trying to solve the same problems that software engineers are when they wire machine learning into production systems, stick it in embedded devices, etc., and Clojure is to me a language heavily tilted that way as well - e.g., shipping working software > algebraic data types.
So question (1) is, re: "if we do this by default, we lose all the flexibility needed for experienced optimization people to build their own optimizers." -- is there some better middle ground? Is it possible we could be better at shipping models than PyTorch and better at allowing custom optimization, etc. than TensorFlow, with Cortex or whatever other deep learning solution turns out to be the best for Clojure?
And question (2) is: what exactly is the appeal for you in doing this in Clojure? Just the language design? Some lispy aspect of generating code? I'm about at the point personally where I don't think a small group of clever and conscientious hackers in Clojure is going to keep up with the momentum in the Python ecosystem and the 100x+ development effort there (or the expanded ecosystem which takes in the various C/C++, Lua, and maybe Julia work as well). If Cortex or whatever is just supposed to be like TensorFlow, or like Torch, or even like TensorFlow bundled with Keras, I'd much rather put that effort personally into (and make the case for) getting a viable Clojure on CPython, like we have for JavaScript today -- something which gives front-end Clojure devs reach to React, Electron, etc. and would give ML devs access to everything.
The ideal case for Clojure (at least the present JVM instance of that language) would be that you had a clear answer to (2) that you could hoist into a better solution for (1) than just following TF or PyTorch/Torch. The case you make for how to approach the problem, i.e. discarding the Clojure abstractions that get in the way of perf, does not really sound like a compelling case for going to all the trouble of building the framework in Clojure in the first place. If it's just going to boil down to writing Clojure like it's C/C++, well, I can get things done in TensorFlow, PyTorch -- hell, throw Caffe2 and MXNet in the mix -- and solve interesting problems rather than dig down the yet-another-framework (YAF) rabbit hole.
Christian, Ben -

This is a good discussion so far, so let's keep it going:

1. First, Christian, Neanderthal is written basically as a datatype-specific core.matrix backend. It does not allow me to, for instance, create a generic buffer of data and put a banded matrix on it at one point and a dense matrix on it at another point. It is also written with some significant Java bindings, similar to how the existing core.matrix backends are implemented. With the tensors I can create a floating-point buffer and associate a description with it in an ad-hoc manner, which is a far more Clojure-esque way of doing computation than creating a strided matrix Java class and then implementing a set of interfaces. Furthermore, with the Cortex tensors it is easy to schedule computation in various execution environments. So Neanderthal gives me, as the engine writer, far fewer options to do clever things. Please note, in how the Cortex tensors are built, the distinction between the heavy data buffers, which have no definition at all (aside from device, datatype, and size), and a description mechanism that tells the system how to interpret those buffers. Here is the complete implementation of transpose in Cortex:
This reworks the strides and the shape and produces a new tensor which shares the heavy buffer with the original but has a different description. Note that I can take the same buffer and assoc different, unrelated descriptions to it, and that is just a Clojure assoc operation; it isn't creating a subclass of the engine or of the implementation of this or that. Note also that it is a completely datatype-independent operation, meaning I can operate on floats, or bytes, or shorts and this doesn't change.

Here is the definition of transpose in Neanderthal: it delegates to a backend-specific implementation, which means you have to go engine by engine to find out what is actually happening, because the types of the underlying system actually have to change; it is a far less Clojure-esque way of doing things. This is less power in the user's hands, not more.
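The snippets referenced here didn't survive in the archived message, but the buffer-plus-description idea can be sketched roughly like this (a hypothetical illustration with made-up names, not the actual Cortex code):

```clojure
;; Sketch only, NOT the real Cortex implementation.
;; A "tensor" is just a plain map: an opaque data buffer plus a
;; description (shape and strides) saying how to interpret it.
(defn make-tensor [buffer shape strides]
  {:buffer buffer :shape shape :strides strides})

;; Transpose never touches the buffer; it only permutes the description.
(defn transpose [tensor axis-order]
  (assoc tensor
         :shape   (mapv (:shape tensor) axis-order)
         :strides (mapv (:strides tensor) axis-order)))

;; The same buffer can carry unrelated descriptions via plain assoc:
(let [buf (float-array 12)
      t   (make-tensor buf [3 4] [4 1])]
  (transpose t [1 0])            ;; shape [4 3], strides [1 4], same buf
  (assoc t :shape [2 6] :strides [6 1]))  ;; reinterpret, same buf
```

Nothing here depends on the element type of the buffer, which is the datatype-independence point above: the description mechanism is just data, manipulated with ordinary map operations.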
As far as dynamic kernel loading goes, both systems load kernels dynamically, as you have to for CUDA and OpenCL. I don't understand where you are going here; I stated that the Cortex tensors are in fact designed to be compatible with OpenCL. Whether I personally have the time to implement this or not is another question.
Basically, we have designed the tensors for Cortex to enable everything that we need, and nothing else in the Clojure ecosystem has those features. They implement everything required for what our NN engine supports in far *far* fewer lines of code (and interfaces and layers) than any other implementation, and besides that they can operate on bytes, shorts, integers, etc. So let's at this point pause the conversation on using Neanderthal vs. Cortex tensors, as I do not believe it is actually that important a discussion at this point, nor do I see it resolving.
The benefit of investing in NNVM is that, fundamentally, you have offloaded a large amount of very difficult work. You also get a simpler way to run the inference models of other systems in Clojure, and it opens the door to running models trained in Clojure in contexts that potentially do not support the JVM in the first place. NNVM is still lower level than TensorFlow, so there is that aspect also. It is basically a low-level common ground where we can efficiently reuse work done in other ecosystems. Having written many importers and imported many models into Clojure for Cortex, I find this appealing, as I do the possibility of running complex models in embedded environments from a language other than Clojure.
The entire discussion above, I feel, is actually independent of your points A, B, C:

A - Choose a good set of primitives. These seem to me to be implementation-independent; nothing to do with Neanderthal, NNVM, TF, or Cortex, save for the fact that I would probably carefully consider the TF or Caffe2 primitives as a reference set.

B - Build a transformation system that transforms a graph of primitives from A into a graph of primitives A', potentially encoding runtime information (which branch of a control-flow statement was taken, or something like that) discovered during A.

C - A high-level framework that allows people to be ignorant of A and B for most use cases.
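To make point B concrete, a graph-to-graph transformation pass might look something like the following sketch. The representation and names here are hypothetical, not Cortex or NNVM code; the point is only that when the graph of primitives is plain data, a pass from A to A' is just a function from graph to graph:

```clojure
;; Hypothetical sketch of point B. The graph is a map of node-id -> node,
;; where a node names a primitive op and its argument node-ids.
(def graph
  {:x    {:op :input}
   :w    {:op :constant :value [[1.0 2.0] [3.0 4.0]]}
   :one  {:op :constant :value 1.0}
   :mm   {:op :matmul :args [:x :w]}
   :gain {:op :mul :args [:mm :one]}})

(defn one-const? [g id]
  (= (get g id) {:op :constant :value 1.0}))

(defn simplify-mul-by-one
  "A trivial pass: rewrite (mul x 1.0) nodes into aliases of x,
  producing the transformed graph A'."
  [g]
  (reduce-kv
    (fn [g' id node]
      (if (and (= (:op node) :mul)
               (some (partial one-const? g) (:args node)))
        (assoc g' id {:op   :alias
                      :args [(first (remove (partial one-const? g)
                                            (:args node)))]})
        (assoc g' id node)))
    {} g))

;; (simplify-mul-by-one graph) rewrites :gain to {:op :alias :args [:mm]}.
```

Real passes (constant folding, gradient removal, layout selection) are of course more involved, but they compose the same way: each is graph in, graph out, which is also where runtime information discovered during A can be threaded back in.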
I think what you are arguing is that you would like to be able to add to the set of primitives in an ad-hoc fashion. You are concerned that relying on the primitives exposed by NNVM will artificially limit your ability to add to A and B in an ad-hoc manner, and I can certainly sympathize with this.
I am not as concerned with this as I am with the more practical abilities to train/infer efficiently and reuse existing models, along with exporting new models for other systems.
Is this an accurate description of where the conversation lies, excluding the arguments around which exact implementation to use to write a backend for the primitives in A?
Hi Christian,
Today I can train a model in TensorFlow from Python, export the graph, optimize it through multiple steps (freezing to constants, removing gradients, in-place substitutions, emitting XLA code), etc., and export it in a format I can load in a microservice, monolith app, phone, w/e for inference. It supports all the expression primitives I need to implement, well, any paper I've had an interest in since it came out, and which I would've run (with success) in Theano before. That has admittedly not included anything with heavily customized optimizers, but it does include all kinds of wacky loss functions and data-flow models through the network. So it's a bit silly to theorize that it shouldn't have properties it clearly has, like being cross-platform or deployable. Moreover, I can do tons of stuff for real (FCNNs, non-trivial RNN/CNN hybrids, loss functions arbitrarily assigned to layers, etc.) which is mostly not realized in the Clojure ecosystem (a few working things in Cortex, hard-won by Chris N. and others at TT), nor are there decent documented examples in DL4J, etc. in the Java ecosystem. Many of these things are already commodity-level drop-in training with TensorFlow, e.g.: https://github.com/tensorflow/models/tree/master/research/object_detection
The Clojure data science/ML ecosystem has tolerated a lot of untested discussion about how we can just hit the same fast perf targets, or Neanderthal will wrap this and core.matrix will wrap that and it will solve all our problems. We're several years in without much traction in that direction. These, for example, seem to be the CUDNN bindings in Neanderthal you're talking about: https://github.com/uncomplicate/clojurednn -- I totally agree that Clojure is built for stability, that its deps, etc. are unlikely to break in comparison with Python's, that containerization is putting a band-aid on the problem, and so on, but again you're hand-waving re: native lib deps. If it's CUDNN you want, this is generally one of the most painful points of setup, whether you're on the JVM or in Python, and there are tons of little platform- and architecture-specific bells and whistles you want to go along with the native libs if you want to push it for all you can. Which is, again, as Chris N. mentions, something you have to take seriously if you want to train in hours instead of days or weeks.
I totally agree that all the TF stuff is just a local optimum, a snapshot of today! And I'm not trying to come across as overly critical here; I love Clojure and would prefer to work in it over Python, even when I'm "just training" a model. But no one will get to that point if they don't take the engineering problems in this space seriously. Assuming, for example, that the engineers who built TensorFlow at Google are just clueless simpletons who have never felt the joy of functional programming and built a silly system that doesn't solve real problems is a recipe for disaster. It may be overcomplicated, it may be a local optimum, but there is a hard problem behind it and a reason they landed where they did. If you can't enumerate the pieces (without hand-waving or glossing over some of the hardest problems, like pushing performance to the max and not assuming you can let it slack 10% here, 5% there, and 22% over there, and still train a net), and just shoot from the hip when proposing alternatives, you're unlikely to do any better yourself.
Re: interactivity in Python, it's amusing because I agree with you! But no one in the Python ecosystem would! The ad-hoc, command-line-ish-ness of Jupyter notebooks and IPython at the terminal seemed OK to me before I started using a language with a real REPL, wired into my editor or IDE or w/e for eval/re-eval during dev. I don't think the average Python developer would see much difference between that and using %%autoreload, and they would probably yell back that you can't have real interactivity in Clojure because the lack of objects means you can't type pd.DataFrame(x). ... and then have it complete all the possible methods for you :) It's true that the perf comes from native BLAS libraries, etc., which is why the Python ecosystem leverages them heavily and more seamlessly than in the JVM experience (because, really, no one's ever been bitten by native ByteBuffers or off-heap memory before). And besides, Spark, Hadoop, etc. give you Python APIs, sklearn will release the GIL, and distributed deep learning and parameter averaging have not proved that interesting in general, etc. etc.
Again: I am not on team Python! It's not ahead because it's objectively good -- it is a mediocre language that I find more and more painful to use the better I get as a generalist dev and Clojurist. But programming-language adoption and ecosystem health have a lot more to do with social dynamics, marketing, and least-cost paths taken by herds of humans than with the direction taken by a few rational optimizers. JavaScript is a case in point: lots of better languages have failed to replace it, it itself is morphing into something different, and people have come up with ways to transpile away its pain or inject themselves into its ecosystem.
Re: Tim Baldridge's previous efforts, yes, Python's own mechanics are not great if you're targeting it as a general-purpose language, but he was primarily interested in possible performance gains from a tracing JIT, provided by PyPy, and this was always highlighted in the readme, rationale, etc. Being able to take the Clojure-in-Clojure stuff from Cljs as a starting point for persistent data structures, etc. was also not an option during those early efforts. Getting access to CPython for numerics, ML, etc. is a problem with a different shape, and to me not that different from the specific scope of Cljs. That said, I'm just justifying my interest here; I'm less concerned with winning anyone over until I have something more to show or talk about.
Also, all of that said, if your goal is to make things work in Clojure, I think it would be worth getting some consensus between where you are with autodiff and the work (more battle-tested with real perf problems than most stuff in the Clojure ecosystem) that Chris N. has put into the tensor/compute stuff in Cortex.
I've been following this conversation with great interest! I'd pay money to watch you guys chat about it on Slack ;) Maybe it's worth scheduling a time to have a chat on there?
On 23 October 2017 at 04:04, <ml...@topiq.es> wrote:
Great! I have just seen this post. I will reply here as soon as I have some time.
On Saturday, October 21, 2017 at 12:41:27 AM UTC+2, Chris Nuernberger wrote:

Hi Christian,

I took some time today to create a couple of design documents that outline some of the key concepts. The compute document is first; the tensor document is second and builds on the concepts in the compute document. I was hoping you could find some time to peruse them, because they outline the larger picture regarding the Cortex compute abstraction and how the tensor abstraction specifically relates to it.
--
Alistair