Using Kryo serialization in the spark-shell?

John Salvatier

Aug 27, 2013, 4:53:15 PM
to spark...@googlegroups.com
Hello, 

I'd like to do some timings comparing Kryo serialization with the default Java serialization, and I've been doing my timings in the shell so far.

Is there any way to use Kryo serialization in the shell? 
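
(Editor's note: one simple way to do this kind of timing in the shell is sketched below. It assumes the pre-0.8 package layout used later in this thread, an arbitrary placeholder dataset, and an ad-hoc time helper; caching the RDD in serialized form is what actually exercises the serializer.)

// Rough shell timing sketch; data and sizes are arbitrary placeholders.
import spark.storage.StorageLevel

val data = sc.parallelize(1 to 1000000).map(i => (i, i.toString))
data.persist(StorageLevel.MEMORY_ONLY_SER)   // cache in serialized form

// Ad-hoc helper that prints how long a block of code takes.
def time[A](body: => A): A = {
  val start = System.nanoTime
  val result = body
  println("took " + (System.nanoTime - start) / 1e6 + " ms")
  result
}

time { data.count() }   // first pass: serializes the RDD into the cache
time { data.count() }   // second pass: deserializes it back out

Running this once with the default serializer and once with Kryo enabled (as described in the reply below) gives two sets of timings to compare.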

Matei Zaharia

Aug 28, 2013, 8:17:04 PM
to spark...@googlegroups.com
Yup, just add it to the SPARK_JAVA_OPTS environment variable before you launch the shell, like this:

SPARK_JAVA_OPTS="-Dspark.serializer=spark.KryoSerializer" ./spark-shell

Or like this:

export SPARK_JAVA_OPTS="-Dspark.serializer=spark.KryoSerializer"
./spark-shell
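
(Editor's note: most of Kryo's benefit comes from registering your own classes. A minimal registrator sketch follows; it assumes the same pre-0.8 package layout as the command above, and MyClass is a hypothetical placeholder for one of your own types.)

import com.esotericsoftware.kryo.Kryo
import spark.KryoRegistrator

// A hypothetical application class you want serialized compactly.
case class MyClass(x: Int, s: String)

// Registers the class with Kryo; Spark picks this registrator up via the
// spark.kryo.registrator property.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass])
  }
}

Then launch the shell with both properties set, for example:

SPARK_JAVA_OPTS="-Dspark.serializer=spark.KryoSerializer -Dspark.kryo.registrator=MyRegistrator" ./spark-shell

with the compiled registrator class already on the shell's classpath.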

Matei

John Salvatier

Sep 11, 2013, 2:38:44 PM
to spark...@googlegroups.com
Thanks Matei! Very helpful.

Debasish Das

Dec 18, 2013, 12:10:19 PM
to spark...@googlegroups.com
Hi Matei, John,

My default Spark jobs run fine, but I am not noticing a significant speedup compared to Scalding, at least for wordcount. One reason might be the default serialization in Spark.

Scalding is most likely using Kryo.

The following job runs fine:

SPARK_MEM=2g ./run-example org.apache.spark.examples.HdfsWordCount master inputPath outputPath

Now I tried to use the Kryo serializer:

SPARK_JAVA_OPTS="-Dspark.serializer.spark.KryoSerializer" SPARK_MEM=2g ./run-example org.apache.spark.examples.HdfsWordCount master inputPath outputPath

and the job fails.

What's the recommended serialization for the large workloads you have tested? Avro or Kryo?

Thanks.
Deb

Matei Zaharia

Dec 19, 2013, 1:37:31 PM
to spark...@googlegroups.com
You need to use -Dspark.serializer=org.apache.spark.serializer.KryoSerializer to pass that flag in (note the "=" between the property name and its value; your command used a "." there). You can see whether it was passed correctly by looking at the "Environment" tab of the Spark application UI (http://<your-machine>:4040).
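
(Editor's note: a quick way to double-check from inside the shell itself, assuming the property really was passed to the driver JVM as a -D system property via SPARK_JAVA_OPTS as in the commands above:)

// In the spark-shell REPL: print the system property the driver JVM received.
// It should show the Kryo serializer class if the flag made it through.
println(System.getProperty("spark.serializer"))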

Matei