Hello Huahai!
I appreciate the kind words. :) I'm not a Spark user, but my understanding is that it's extremely fast, offers batch and streaming via mirrored APIs, and presents a functional interface for expressing computation. Onyx diverges in its aggressive use of data structures: it expresses computations as maps and vectors, rather than expressing them as functions over collections. It also builds batching operations on top of streaming operations; I believe Spark does it the other way around. Also, Spark is much faster than Onyx. That's a result of their team being more talented than me. :P
I'm trying to accomplish two things with Onyx. Primarily, I'm trying to rip apart all of the different things that contemporary distributed computation frameworks do, and put them back together in a composable manner. That's really what the heart of my talk was about, and I'm going to be blogging about it a lot in the next few weeks. The critical piece to pull apart is the structure of the computation itself - representing it as a simple data structure! I don't know of many frameworks that do this.
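To make the "computation as data" idea concrete, here's a rough sketch of what a workflow and catalog look like as plain maps and vectors. The task names and keys below are illustrative, not a definitive rendering of Onyx's actual schema:

```clojure
;; A computation described purely as data.
;; The workflow is a vector of edges between named tasks.
(def workflow
  [[:read-segments :process-segment]
   [:process-segment :write-segments]])

;; The catalog is a vector of maps, one per task.
;; Keys shown here are a sketch; check the real docs for the full schema.
(def catalog
  [{:onyx/name :read-segments
    :onyx/type :input
    :onyx/batch-size 1000}
   {:onyx/name :process-segment
    :onyx/fn :my.app/process-segment ; a plain, fully-qualified function
    :onyx/type :function
    :onyx/batch-size 1000}
   {:onyx/name :write-segments
    :onyx/type :output
    :onyx/batch-size 1000}])
```

Because the whole description is just data, any program can build it, inspect it, or transform it before submitting it - no special DSL or macro layer required.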
The second thing I'm trying to do is extend Onyx's reach to the browser. When you describe your computation as data, you can construct it in JavaScript - something a lot of customer solutions need at the moment. Couple that with using regular functions to describe your computations, and you get the magical ability to use something like Cljx to cross-compile and do computational sampling in the browser. I think Clojure has the raw power to be a serious competitor in large-scale data processing, given these goals. If not Onyx, something else for sure.
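As a sketch of the cross-compilation point: since task functions are ordinary functions over segments (maps), the same definition can in principle be compiled for both the JVM and the browser. The function below is hypothetical, just to show the shape:

```clojure
;; A hypothetical task function: takes a segment (a map),
;; returns a new segment. Nothing here is JVM-specific, so
;; Cljx/ClojureScript can target the browser with the same source.
(defn inc-age
  [segment]
  (update-in segment [:age] inc))

;; (inc-age {:name "Mike" :age 24}) => {:name "Mike" :age 25}
```

That's what makes in-browser sampling plausible: the browser runs the same task functions over a small sample of segments that the cluster runs at scale.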
So that's what I'm trying to do. Time will tell if any of this is a good idea. :P