Hi Dylan,
Thanks for trying this out. I've broken this message into three points. Here's the summary:
1. GryoRegistrator needs to copy over several registrations from GryoSerializer's constructor that I thought were safe to drop. Clearly, some of these are still required. This is a bug -- when it is fixed, you should not have to register CompactBuffer[] in your IoRegistry.
2. If you have custom types (besides CompactBuffer[]), consider spark.serializer=IoRegistryAwareKryoSerializer
3. I can't tell what's going on with the compile errors, and I'd like to see the source you used before commenting on them.
Here's the longer version:
1. GryoSerializer registered a slew of Scala runtime and Spark types. I deliberately dropped all of these in GryoRegistrator, thinking that KryoSerializer already registered every Scala runtime or Spark type that TP jobs would need, so that GryoRegistrator could carry over just the TinkerPop registrations and function equivalently. I was wrong. For instance, GryoSerializer registers both CompactBuffer and CompactBuffer[]. CompactBuffer is registered in KryoSerializer, but CompactBuffer[] (the array type) is not. I think this means that TinkerPop is passing around CompactBuffer[] in the course of running jobs (not sure where/why), whereas Spark does not necessarily need CompactBuffer[] serialization for its own internals.
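To illustrate why registering CompactBuffer does not cover CompactBuffer[]: in Java, a type and its array type are distinct Class objects, so a registry keyed by Class treats them as unrelated entries. Here is a minimal sketch of that idea. ToyRegistry is a hypothetical stand-in I made up for illustration (it is not Kryo's actual code), and StringBuilder stands in for CompactBuffer:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for a class-keyed serializer registry; NOT Kryo's real implementation.
final class ToyRegistry {
    private final Map<Class<?>, Integer> ids = new LinkedHashMap<>();

    // Assign the next free id to the given class, mimicking explicit registration.
    void register(Class<?> type) {
        ids.putIfAbsent(type, ids.size());
    }

    // Lookup is by exact Class object, so Foo and Foo[] are unrelated keys.
    boolean isRegistered(Class<?> type) {
        return ids.containsKey(type);
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        registry.register(StringBuilder.class); // stands in for CompactBuffer
        System.out.println(registry.isRegistered(StringBuilder.class));
        System.out.println(registry.isRegistered(StringBuilder[].class));
        registry.register(StringBuilder[].class); // the array type needs its own entry
        System.out.println(registry.isRegistered(StringBuilder[].class));
    }
}
```

Running main prints true, false, true: the array type only shows up as registered once it has been registered in its own right.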
Not registered anywhere:
CompactBuffer[]: NOT REGISTERED (should be registered in GryoRegistrator)
BoxedUnit: (probably) NOT REGISTERED (should be registered in GryoRegistrator)
Registered somewhere:
Tuple2, Tuple3: registered in KS (via KS calling chill's AllScalaRegistrar which calls chill's ScalaTupleSerialization which does the actual registration)
Tuple2[], Tuple3[]: registered in KS
CompactBuffer: registered in KS
CompressedMapStatus: registered in KS
BlockManagerId: registered in KS
HighlyCompressedMapStatus: registered in KS
HttpBroadcast: registered in KS
PythonBroadcast: registered in KS
scala.reflect.ClassTag$$anon$1: registered in GryoRegistrator (only known to come up in testing, though)
scala.reflect.ManifestFactory$$anon$1: registered in GryoRegistrator (only known to come up in testing, though)
WrappedArray.ofRef: registered in GryoRegistrator
I'm going to make a PR that adds CompactBuffer[] and probably BoxedUnit to GryoRegistrator. I may try to see whether BoxedUnit just eluded my source reading by starting up a test Spark environment and finding a way to do something like <spark's KryoSerializer>.newKryo().getClassResolver().getRegistration(BoxedUnit.class).
2. IoRegistryAwareKryoSerializer is a subclass of KryoSerializer that looks for a gremlin.io.registry in the SparkConf and applies it if found. This could not be done solely through spark.kryo.registrator, because the spark.kryo.registrator does not have access to the job configuration, whereas the spark.serializer does. In the presence of custom types, the config must be available at or before the point when serialization first occurs, since gremlin.io.registry is read from it. This is probably the spark.serializer that you'll want to use under the new serialization setup if you have an IoRegistry.
3. Maybe I just overlooked it, but I didn't see a link to your source. GROOVY-6617 is an attention grabber, especially since TP uses Groovy 2.4.5, but I don't know whether that's actually causing the pasted error. In other words, I don't want to speculate about what's causing those compile errors without seeing exactly what went into the compiler.
thanks,
Dan