Can anyone help me with this? I have not had any luck with the Bagel framework so far.
The error output is posted below for reference.
13/11/22 22:36:19 INFO scheduler.DAGScheduler: Stage 15 (combineByKey at Bagel.scala:78) finished in 0.190 s
13/11/22 22:36:19 INFO scheduler.DAGScheduler: looking for newly runnable stages
13/11/22 22:36:19 INFO scheduler.DAGScheduler: running: Set()
13/11/22 22:36:19 INFO scheduler.DAGScheduler: waiting: Set(Stage 14)
13/11/22 22:36:19 INFO scheduler.DAGScheduler: failed: Set()
13/11/22 22:36:19 INFO scheduler.DAGScheduler: Missing parents for Stage 14: List()
13/11/22 22:36:19 INFO scheduler.DAGScheduler: Submitting Stage 14 (FlatMappedValuesRDD[117] at flatMapValues at Bagel.scala:220), which is now runnable
13/11/22 22:36:19 INFO scheduler.DAGScheduler: Failed to run foreach at Bagel.scala:237
Exception in thread "main" org.apache.spark.SparkException: Job failed:
java.io.NotSerializableException: <masked class name>
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$16.apply(DAGScheduler.scala:670)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$16.apply(DAGScheduler.scala:668)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:668)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:376)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
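
For context, a java.io.NotSerializableException like the one in the trace above is what plain Java serialization throws when it is asked to write an object whose class does not implement java.io.Serializable. The sketch below reproduces that outside Spark, assuming the job uses the default Java serializer. PlainVertex and SerializableVertex are hypothetical stand-ins for the masked class and are not taken from the job in question, so this may not match the actual cause.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {

    // Hypothetical stand-in for the masked class: it lacks the Serializable marker.
    static class PlainVertex {
        final String id;
        final double rank;

        PlainVertex(String id, double rank) {
            this.id = id;
            this.rank = rank;
        }
    }

    // Same fields, but marked Serializable so Java serialization accepts it.
    static class SerializableVertex implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final double rank;

        SerializableVertex(String id, double rank) {
            this.id = id;
            this.rank = rank;
        }
    }

    // Tries to serialize the value into an in-memory buffer.
    // Returns null on success, or the exception message (normally the name
    // of the offending class) when a NotSerializableException is thrown.
    static String firstNonSerializable(Object value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Expected to report the PlainVertex class as not serializable.
        System.out.println(firstNonSerializable(new PlainVertex("a", 1.0)));
        // Expected to report no problem.
        System.out.println(firstNonSerializable(new SerializableVertex("a", 1.0)));
    }
}
```

Running a check of this kind against the vertex, message, and combiner classes passed to Bagel, along with anything they reference, can help narrow down which class the exception refers to.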