Hi everybody,
I have been doing some random machine learning doodling and have always been haunted by this problem.
When I just want to find out whether my methodology works, I program in Clojure against a small dataset that fits in a single host's memory (or even just doodle in R or Python; I know, blasphemy).
When I want to put the results into use, I usually have to rewrite the whole process in Cascalog just to apply it to data in Hadoop. I know I could start out writing Hadoop functions directly, but that's overkill, since I have to run simulation studies to verify performance, correctness, etc. before even going into production development. Besides, having Hadoop sit in memory between my program and the machine just slows the computation down.
I've daydreamed a lot about some smart way to prototype so that, with the flip of a switch, my functions become easily applicable to Hadoop data. Is there any way to do that?
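To make it concrete, here is a minimal sketch (plain Clojure, no Cascalog dependency) of the kind of thing I mean: keep the core logic in pure functions, prototype on an in-memory seq, and later reuse the same function in a query. The Cascalog usage in the trailing comments is hypothetical (`numbers` is a made-up generator), just to show the idea.

```clojure
;; Core logic as a pure function: knows nothing about Hadoop or Cascalog.
(defn normalize
  "Scale x into [0, 1] given the observed min and max."
  [lo hi x]
  (/ (- x lo) (double (- hi lo))))

;; Prototype locally on an in-memory dataset.
(def sample [2.0 4.0 6.0 10.0])

(def normalized
  (let [lo (apply min sample)
        hi (apply max sample)]
    (mapv (partial normalize lo hi) sample)))
;; normalized => [0.0 0.25 0.5 1.0]

;; Later, the same var could (hypothetically) be dropped into a
;; Cascalog query, since Cascalog accepts plain Clojure fns as ops:
;; (?<- (stdout) [?norm]
;;      (numbers ?x)
;;      (normalize 2.0 10.0 ?x :> ?norm))
```

The point is that only the driver changes between prototyping and production; the function itself never learns where the data lives.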
I actually asked around, and someone I admire a lot suggested writing a Hadoop-emulator-like wrapper for my input data. Is there an easier way? I would appreciate any input. Thanks.
Hesen