BigBench with Spark SQL: ClassCastException of Kryo


Hailong Sun

Apr 14, 2016, 11:40:05 PM
to Big Data Benchmark for BigBench
Hi,

I am running BigBench with Spark 1.6, using Hive 1.2.1 as the metastore. Eight queries fail with the same problem: q1, q2, q10, q18, q19, q27, q29, and q30. It appears to be a class cast failure from "org.apache.hive.com.esotericsoftware.kryo.Kryo" to "com.esotericsoftware.kryo.Kryo"; the former comes from Hive, the latter from Spark. Any idea how to solve this? Below is an output snippet from running q1. Thanks!

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[item_sk_1#385,item_sk_2#386], functions=[(count(1),mode=Final,isDistinct=false)], output=[item_sk_1#385,item_sk_2#386,cnt#336L])
+- TungstenExchange hashpartitioning(item_sk_1#385,item_sk_2#386,200), None
   +- TungstenAggregate(key=[item_sk_1#385,item_sk_2#386], functions=[(count(1),mode=Partial,isDistinct=false)], output=[item_sk_1#385,item_sk_2#386,count#396L])
      +- !Generate HiveGenericUDTF#io.bigdatabenchmark.v1.queries.udf.PairwiseUDTF(sort_array(itemArray#335,true),false), false, false, [item_sk_1#385,item_sk_2#386]
         +- SortBasedAggregate(key=[ss_ticket_number#346L], functions=[(hiveudaffunction(HiveFunctionWrapper(org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet,org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet@4cc9d672),ss_item_sk#339L,false,0,0),mode=Complete,isDistinct=false)], output=[itemArray#335])
            +- ConvertToSafe
               +- Sort [ss_ticket_number#346L ASC], false, 0
                  +- TungstenExchange hashpartitioning(ss_ticket_number#346L,200), None
                     +- Project [ss_ticket_number#346L,ss_item_sk#339L]
                        +- BroadcastHashJoin [ss_item_sk#339L], [i_item_sk#360L], BuildRight
                           :- Project [ss_ticket_number#346L,ss_item_sk#339L]
                           :  +- Filter ss_store_sk#344L IN (10,20,33,40,50)
                           :     +- HiveTableScan [ss_ticket_number#346L,ss_item_sk#339L,ss_store_sk#344L], MetastoreRelation bigbench, store_sales, Some(s)
                           +- Project [i_item_sk#360L]
                              +- Filter i_category_id#371 IN (1,2,3)
                                 +- HiveTableScan [i_item_sk#360L,i_category_id#371], MetastoreRelation bigbench, item, Some(i)

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.Filter.doExecute(basicOperators.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.TakeOrderedAndProject.collectData(basicOperators.scala:213)
at org.apache.spark.sql.execution.TakeOrderedAndProject.doExecute(basicOperators.scala:223)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:201)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:749)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:199)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:224)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:130)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange hashpartitioning(item_sk_1#385,item_sk_2#386,200), None
+- TungstenAggregate(key=[item_sk_1#385,item_sk_2#386], functions=[(count(1),mode=Partial,isDistinct=false)], output=[item_sk_1#385,item_sk_2#386,count#396L])
   +- !Generate HiveGenericUDTF#io.bigdatabenchmark.v1.queries.udf.PairwiseUDTF(sort_array(itemArray#335,true),false), false, false, [item_sk_1#385,item_sk_2#386]
      +- SortBasedAggregate(key=[ss_ticket_number#346L], functions=[(hiveudaffunction(HiveFunctionWrapper(org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet,org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet@4cc9d672),ss_item_sk#339L,false,0,0),mode=Complete,isDistinct=false)], output=[itemArray#335])
         +- ConvertToSafe
            +- Sort [ss_ticket_number#346L ASC], false, 0
               +- TungstenExchange hashpartitioning(ss_ticket_number#346L,200), None
                  +- Project [ss_ticket_number#346L,ss_item_sk#339L]
                     +- BroadcastHashJoin [ss_item_sk#339L], [i_item_sk#360L], BuildRight
                        :- Project [ss_ticket_number#346L,ss_item_sk#339L]
                        :  +- Filter ss_store_sk#344L IN (10,20,33,40,50)
                        :     +- HiveTableScan [ss_ticket_number#346L,ss_item_sk#339L,ss_store_sk#344L], MetastoreRelation bigbench, store_sales, Some(s)
                        +- Project [i_item_sk#360L]
                           +- Filter i_category_id#371 IN (1,2,3)
                              +- HiveTableScan [i_item_sk#360L,i_category_id#371], MetastoreRelation bigbench, item, Some(i)

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 51 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[item_sk_1#385,item_sk_2#386], functions=[(count(1),mode=Partial,isDistinct=false)], output=[item_sk_1#385,item_sk_2#386,count#396L])
+- !Generate HiveGenericUDTF#io.bigdatabenchmark.v1.queries.udf.PairwiseUDTF(sort_array(itemArray#335,true),false), false, false, [item_sk_1#385,item_sk_2#386]
   +- SortBasedAggregate(key=[ss_ticket_number#346L], functions=[(hiveudaffunction(HiveFunctionWrapper(org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet,org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet@4cc9d672),ss_item_sk#339L,false,0,0),mode=Complete,isDistinct=false)], output=[itemArray#335])
      +- ConvertToSafe
         +- Sort [ss_ticket_number#346L ASC], false, 0
            +- TungstenExchange hashpartitioning(ss_ticket_number#346L,200), None
               +- Project [ss_ticket_number#346L,ss_item_sk#339L]
                  +- BroadcastHashJoin [ss_item_sk#339L], [i_item_sk#360L], BuildRight
                     :- Project [ss_ticket_number#346L,ss_item_sk#339L]
                     :  +- Filter ss_store_sk#344L IN (10,20,33,40,50)
                     :     +- HiveTableScan [ss_ticket_number#346L,ss_item_sk#339L,ss_store_sk#344L], MetastoreRelation bigbench, store_sales, Some(s)
                     +- Project [i_item_sk#360L]
                        +- Filter i_category_id#371 IN (1,2,3)
                           +- HiveTableScan [i_item_sk#360L,i_category_id#371], MetastoreRelation bigbench, item, Some(i)

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:164)
at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 59 more
Caused by: org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 68 more
Caused by: java.lang.ClassCastException: org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to com.esotericsoftware.kryo.Kryo
at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 80 more

 

Yan Tang

Apr 15, 2016, 4:52:40 AM
to Big Data Benchmark for BigBench
Hi Hailong,
It may be caused by the shading of Kryo. How did you build your Spark? You could try this patch: https://github.com/apache/spark/pull/12215
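
As a quick check (just a sketch, not something I have verified against your build), you can ask spark-shell which jar each Kryo class is actually loaded from:

// Run in spark-shell on the driver. Prints the jar that provides each Kryo class,
// so you can see whether the assembly ships an unshaded com.esotericsoftware.kryo.Kryo
// next to Hive's shaded org.apache.hive copy.
Seq("com.esotericsoftware.kryo.Kryo",
    "org.apache.hive.com.esotericsoftware.kryo.Kryo").foreach { name =>
  val location =
    try {
      val src = Class.forName(name).getProtectionDomain.getCodeSource
      if (src != null) src.getLocation.toString else "(no code source)"
    } catch { case _: ClassNotFoundException => "not found on the driver classpath" }
  println(s"$name -> $location")
}

The output shows which jar each copy comes from, which can help confirm whether the shading in your build matches what Spark expects.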
Thanks.


Best Regards,
Yan

On Friday, April 15, 2016 at 11:40:05 AM UTC+8, Hailong Sun wrote:

Hailong Sun

Apr 22, 2016, 6:53:43 PM
to Big Data Benchmark for BigBench
Hi Yan,

Thanks for your response. I got held up by something else, but now I am back.

I compiled Spark with "-Phive -Phive-thriftserver". The "com/esotericsoftware/kryo/Kryo.class" in the "*assembly*" jar was not shaded, while the "Kryo.class" in hive-exec-1.2.1.jar was shaded.

This should be the reason why I ran into the ClassCastException.
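
Just to illustrate the failure (a minimal sketch, assuming both hive-exec and the unshaded Kryo jar are on the classpath), the two classes are unrelated types as far as the JVM is concerned, even though they differ only by the package prefix, so the cast can never succeed:

// Hypothetical reproduction, not taken from Spark's sources: an instance of the
// shaded Hive copy of Kryo cannot be cast to the unshaded class that Spark was
// compiled against.
val shadedKryo: AnyRef = new org.apache.hive.com.esotericsoftware.kryo.Kryo()
shadedKryo.asInstanceOf[com.esotericsoftware.kryo.Kryo] // java.lang.ClassCastException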

I do not understand how to apply the patch to fix this problem. Could you give more detailed instructions?

Thanks,
Hailong

On Friday, April 15, 2016 at 4:52:40 AM UTC-4, Yan Tang wrote: