I haven't done anything with Onyx for many releases. I was wondering whether there is a prescribed methodology for doing database-style joins (inner, outer) on data in Onyx, or anything on the horizon that could be used to perform joins (e.g. Datalog). I believe I read in a post a while ago that joins were not possible in Onyx; given some of the recent updates, I am wondering whether that has changed.
For instance, I was thinking the new windowing, trigger, and aggregation functionality could be combined to get at least a limited form of join. Another crude solution is to run queries/joins inside the tasks, but in my view running queries in a processing step, whether in Onyx, Storm, or another framework, is generally a bad idea.
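To make the windowing idea concrete, here is a rough sketch in plain Python of the kind of join I have in mind. It does not use any Onyx API: the `WindowedJoin` class, its method names, and the watermark-style trigger are all made up for illustration, and it assumes fixed (tumbling) windows keyed on event time.

```python
from collections import defaultdict


def window_start(event_time, window_size):
    """Start of the fixed window that contains event_time."""
    return event_time - (event_time % window_size)


class WindowedJoin:
    """Hypothetical inner join of two keyed streams over fixed windows."""

    def __init__(self, window_size):
        self.window_size = window_size
        # Aggregation state: (window start, join key) -> values seen per side.
        self.state = defaultdict(lambda: {"left": [], "right": []})

    def add(self, side, key, event_time, value):
        """Buffer one segment; side is "left" or "right"."""
        start = window_start(event_time, self.window_size)
        self.state[(start, key)][side].append(value)

    def fire(self, watermark):
        """Trigger: emit joined rows for windows that ended at or before
        the watermark, then evict those windows from the state."""
        closed = sorted(
            (k for k in self.state if k[0] + self.window_size <= watermark),
            key=lambda k: k[0],
        )
        results = []
        for start, key in closed:
            bucket = self.state.pop((start, key))
            # Inner join: a key with values on only one side emits nothing.
            for left_value in bucket["left"]:
                for right_value in bucket["right"]:
                    results.append((start, key, left_value, right_value))
        return results


join = WindowedJoin(window_size=10)
join.add("left", "u1", 1, "order-a")
join.add("right", "u1", 4, "payment-x")
join.add("left", "u2", 5, "order-b")      # no matching right side
join.add("right", "u1", 12, "payment-y")  # falls into the next window
joined = join.fire(watermark=10)
```

The obvious limitation is that two segments only join if they land in the same window, so this gives windowed join semantics rather than a full database-style join.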
For now I am performing joins outside Onyx, but I want to keep things contained in Onyx as much as possible. The main reason is that the minute I have to use another framework, not only do I have more moving parts, but I also lose much of the dynamic, data-driven approach that is the reason I am using Onyx in the first place. Moreover, this forces me to break up workflows at the points where I need to join data, which introduces all kinds of extra overhead.
Anyway, other streaming frameworks do have full or limited support for joins, as I am sure you are aware. For example, in Spark you can do this a few ways (pair RDDs, DataFrames, etc.).
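For reference, this is roughly the behaviour I mean by inner and outer joins on key/value pairs, written as plain Python rather than Spark code. The helper names `inner_join` and `left_outer_join` are my own; the output shape (key, then a tuple of left and right values, with `None` for a missing right side) is meant to mirror what PySpark's pair RDD `join` and `leftOuterJoin` return.

```python
from collections import defaultdict


def _index_by_key(pairs):
    """Group the values of (key, value) pairs by key."""
    index = defaultdict(list)
    for key, value in pairs:
        index[key].append(value)
    return index


def inner_join(left, right):
    """Keep only keys present on both sides."""
    index = _index_by_key(right)
    return [(k, (v, w)) for k, v in left for w in index.get(k, [])]


def left_outer_join(left, right):
    """Keep every left row; fill the right side with None when unmatched."""
    index = _index_by_key(right)
    out = []
    for k, v in left:
        matches = index.get(k)
        if matches:
            out.extend((k, (v, w)) for w in matches)
        else:
            out.append((k, (v, None)))
    return out


users = [("u1", "alice"), ("u2", "bob")]
orders = [("u1", "order-a"), ("u1", "order-b")]
inner = inner_join(users, orders)
outer = left_outer_join(users, orders)
```

Here `inner` contains only the two rows for `u1`, while `outer` also keeps `u2` with `None` in place of an order.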
Please let me know if you have any recommendations about joins in the meantime, or if there is anything planned.
Thanks once again for building Onyx. I just want to say I am very pleased with the last release; you really delivered exactly what I needed. Keep up the good work.