Hi,
I'm trying to get the Scala protos to work in a Spark environment. The protos were not generated by Scala, but I have access to the raw SequenceFiles for them.
I was interested in using this lib for the nice "protoToDataFrame" functionality that makes complex messages much easier to deal with. However, I'm getting a slew of java.lang.AbstractMethodError errors whenever I use the Scala objects (the Java objects work fine, but of course don't have the nice implicit DataFrame converters).
version info:
spark: 2.3 (tried 2.2 as well)
scalapb: 0.7.0 (have tried 0.7.1 and 0.6.7 as well)
scala: 2.11.12 (have tried 2.12.5 as well)
protobuf lib is shaded in the assembly jar of generated objects (otherwise nothing works)
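For reference, the shading is done with sbt-assembly rules roughly like the following (a sketch; the exact setting names depend on the sbt-assembly version, but the `shadeproto` prefix is what shows up in the errors below):

```scala
// build.sbt (sketch): relocate com.google.protobuf under a "shadeproto" prefix
// so the generated classes don't clash with the protobuf version Spark ships.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "shadeproto.@1").inAll
)
```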
For example:

```scala
import com.thing.message.{Message => ScalaMessage}
import com.thing.{Message => JavaMessage}

val byteBlob: Array[Byte] = xxxx

// works just fine
val jdecode = JavaMessage.parseFrom(byteBlob)

// also seems to work
val sdecode = ScalaMessage.fromJavaProto(jdecode)

// fails
ScalaMessage.parseFrom(byteBlob)
```
```
Caused by: java.lang.AbstractMethodError: com.thing.message.Message$.parseFrom(Lcom/google/protobuf/CodedInputStream;)Lscalapb/GeneratedMessage;
at scalapb.GeneratedMessageCompanion$class.parseFrom(GeneratedMessageCompanion.scala:204)
at com.asapp.schemas.generic.message.Message$.parseFrom(Message.scala:101)
at $anonfun$1.apply(<console>:38)
at $anonfun$1.apply(<console>:38)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1358)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1358)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```
```scala
val byteRDD = ... // some RDD of bytes

// fails w/ the same message
val objRDD = byteRDD.map(r => ScalaMessage.parseFrom(r._1)).take(1)

// does ok
val objRDD = byteRDD.map(r => ScalaMessage.fromJavaProto(JavaMessage.parseFrom(r._1))).take(1)

// fails
spark.sqlContext.protoToDataFrame(objRDD)
```
```
java.lang.AbstractMethodError: com.thing.message.Message$.javaDescriptor()Lcom/google/protobuf/Descriptors$Descriptor;
at scalapb.spark.ProtoSQL$.schemaFor(ProtoSQL.scala:26)
at scalapb.spark.ProtoSQL$.protoToDataFrame(ProtoSQL.scala:15)
at scalapb.spark.ProtoSQL$.protoToDataFrame(ProtoSQL.scala:20)
at scalapb.spark.package$ProtoSQLContext$.protoToDataFrame$extension(package.scala:11)
...
```
I also tried a more manual approach to build the StructType Spark schema, but ran into the shading issue:

```scala
import org.apache.spark.sql.types._
import collection.JavaConverters._
import scalapb.spark.ProtoSQL

StructType(JavaMessage.getDescriptor.getFields.asScala.map(ProtoSQL.structFieldFor))
```
```
<console>:43: error: type mismatch;
found : com.google.protobuf.Descriptors.FieldDescriptor => org.apache.spark.sql.types.StructField
required: shadeproto.Descriptors.FieldDescriptor => ?
StructType(JavaMessage.getDescriptor.getFields.asScala.map(ProtoSQL.structFieldFor))
```
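In case it helps with diagnosis: my understanding is that AbstractMethodError usually means the generated classes were compiled against a different scalapb-runtime than the one actually on the classpath. Here's a small (hypothetical) classloader check I can run in the same shell to see which jar each class resolves from; the class names passed in are just the ones from the stack traces:

```scala
// Diagnostic sketch: report which jar (if any) a class is loaded from,
// to spot duplicate or conflicting jars on the Spark classpath.
def whereIs(className: String): String =
  try {
    val src = Class.forName(className).getProtectionDomain.getCodeSource
    if (src == null) s"$className loads from the bootstrap classpath"
    else s"$className loads from ${src.getLocation}"
  } catch {
    case _: ClassNotFoundException => s"$className not found on classpath"
  }

println(whereIs("scalapb.GeneratedMessageCompanion"))
println(whereIs("com.google.protobuf.CodedInputStream"))
```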
Any help or thoughts on what I may be missing would be much appreciated.
Many thanks,
bo