I'm working on a wrapper for Apache Spark in Julia. Essentially, the workflow looks like this:
1. The Julia driver creates an instance of the `JuliaRDD` class in the JVM and passes a serialized Julia function to it.
2. Spark core copies the `JuliaRDD` to each machine in the cluster and runs its `.compute()` method.
3. `JuliaRDD.compute()` starts a new Julia process and invokes the function `launch_worker`.
4. The launched worker reads and deserializes the original function and applies it to a local chunk of data.
So the workers are not managed by any kind of Julia `ClusterManager`, and in general they know nothing about the definitions in the main driver program. The only two pieces of information they have are the serialized function and the data to process.
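To make the setup concrete, here is a minimal single-process sketch of the worker loop I have in mind. The name `run_worker` and the stream layout (one serialized function, then serialized records until EOF) are my own simplification; in the real setup `io` would be a socket to the JVM process. This assumes Julia 1.x, where `serialize`/`deserialize` live in the `Serialization` stdlib:

```julia
using Serialization

# Hypothetical worker loop: read one serialized function from `io`,
# then apply it to each serialized record until the stream ends.
function run_worker(io::IO)
    func = deserialize(io)                     # function shipped by the driver
    results = Any[]
    while !eof(io)
        push!(results, func(deserialize(io)))  # one record per iteration
    end
    return results
end

# Single-process round trip to show the mechanics:
buf = IOBuffer()
serialize(buf, x -> x * 10)   # an anonymous function
serialize(buf, 1)
serialize(buf, 2)
seekstart(buf)
res = run_worker(buf)         # returns [10, 20]
```

This works in one process, but it doesn't answer whether the deserializing side needs the driver's definitions when it is a completely separate process.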
My question is: does Julia's serialization produce completely self-contained code that can be run on workers? In other words, is it possible to send a serialized function over the network to another host / Julia process and apply it there without any additional information from the first process?
I ran some tests on a single machine. When I defined a function without `@everywhere`, the worker failed with the message "function myfunc not defined on process 1". With `@everywhere` my code worked, but will it work across multiple hosts with essentially independent Julia processes?
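For reference, this is roughly the single-machine experiment I ran, written against Julia 1.x `Distributed` (the function names `square` and `cube` are just placeholders). What I observe is that anonymous functions are shipped to the worker with their code, while named functions seem to be sent by name only:

```julia
using Distributed
addprocs(1)   # spawn one local worker process

# A named function defined only on the driver (process 1):
square(x) = x^2
# remotecall_fetch(square, 2, 3)   # fails: `square` is not defined on the worker

# An anonymous function is serialized together with its code, so this works:
@assert remotecall_fetch(x -> x^2, 2, 3) == 9

# With @everywhere the named function is defined on every process:
@everywhere cube(x) = x^3
@assert remotecall_fetch(cube, 2, 3) == 27
```

But here both processes still run on the same machine and share the same package environment, which is why I'm unsure the result transfers to fully independent hosts.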