Using Kryo serialization in the spark-shell?

John Salvatier

Aug 27, 2013, 4:53:15 PM

to spark...@googlegroups.com

Hello,

I'd like to run some timings comparing Kryo serialization with the default Java serialization, and I've been doing my timings in the shell so far.

Is there any way to use Kryo serialization in the shell?

Matei Zaharia

Aug 28, 2013, 8:17:04 PM

to spark...@googlegroups.com

Yup, just add it to the SPARK_JAVA_OPTS environment variable before you launch the shell, like this:

SPARK_JAVA_OPTS="-Dspark.serializer=spark.KryoSerializer" ./spark-shell

Or like this:

export SPARK_JAVA_OPTS="-Dspark.serializer=spark.KryoSerializer"

./spark-shell
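
The two forms differ only in scope: the inline form sets the variable for that one launch, while export keeps it set for every later command in the same terminal session. A minimal sketch of the difference, assuming a POSIX shell and using a child `sh` process as a stand-in for ./spark-shell (the CHILD variable is just an illustrative helper, not anything Spark provides):

```shell
#!/bin/sh
# Stand-in for ./spark-shell: a child process that prints the value it inherited.
CHILD='printf "%s\n" "${SPARK_JAVA_OPTS:-unset}"'

# Start from a clean state (note: this clears any value already set in this shell).
unset SPARK_JAVA_OPTS

# Inline form: the variable is visible to this one child process only.
inline=$(SPARK_JAVA_OPTS="-Dspark.serializer=spark.KryoSerializer" sh -c "$CHILD")
after_inline=$(sh -c "$CHILD")

# Exported form: every later child process inherits the variable.
export SPARK_JAVA_OPTS="-Dspark.serializer=spark.KryoSerializer"
after_export=$(sh -c "$CHILD")

echo "inline launch saw:      $inline"
echo "next launch saw:        $after_inline"
echo "launch after export saw: $after_export"
```

For a one-off timing run the inline form is probably the safer choice, since it cannot leak into a later run that is meant to use the default serializer.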

Matei

--
You received this message because you are subscribed to the Google Groups "Spark Users" group.

John Salvatier

Sep 11, 2013, 2:38:44 PM

to spark...@googlegroups.com

Thanks Matei! Very helpful

Debasish Das

Dec 18, 2013, 12:10:19 PM

to spark...@googlegroups.com

Hi Matei, John,

My default Spark jobs run fine, but I am not noticing a significant speedup compared to Scalding, at least for word count. One reason might be the default serialization in Spark.

Scalding is most likely using Kryo.

The following job runs fine:

SPARK_MEM=2g ./run-example org.apache.spark.examples.HdfsWordCount master inputPath outputPath

Then I tried to use the Kryo serializer:

SPARK_JAVA_OPTS="-Dspark.serializer.spark.KryoSerializer" SPARK_MEM=2g ./run-example org.apache.spark.examples.HdfsWordCount master inputPath outputPath

and the job fails.

What's the recommended serialization for the large workloads you have tested: Avro or Kryo?

Thanks.

Deb

Matei Zaharia

Dec 19, 2013, 1:37:31 PM

to spark...@googlegroups.com

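You need to use -Dspark.serializer=org.apache.spark.serializer.KryoSerializer to pass that flag in. Note the "=" between the property name and the class name, and that the class is in the org.apache.spark.serializer package. You can see whether it was passed correctly by looking at the “environment” tab of the Spark application UI (http://<your-machine>:4040).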
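
A malformed flag like the one in the failing command above (a "." where the "=" should be) can also be caught before launching. The following is only a sketch, assuming a POSIX shell; check_java_opts is a hypothetical helper written for this example, not something that ships with Spark. It checks only the -Dkey=value shape, not whether the class name exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): report every -D token that has no "=".
# Returns 0 if all -D tokens look like -Dkey=value, 1 otherwise.
check_java_opts() {
    status=0
    # $1 is left unquoted on purpose so it is split on whitespace into tokens.
    for opt in $1; do
        case "$opt" in
            -D*=*) ;;                                       # well formed: -Dkey=value
            -D*)   echo "missing '=': $opt"; status=1 ;;    # -D property without a value
            *)     ;;                                       # other JVM flags are ignored
        esac
    done
    return $status
}

# The flag from the failing command is reported as malformed.
check_java_opts "-Dspark.serializer.spark.KryoSerializer" || echo "fix SPARK_JAVA_OPTS before launching"

# The key=value form used earlier in the thread passes silently.
check_java_opts "-Dspark.serializer=spark.KryoSerializer" && echo "flags look well formed"
```

The "environment" tab remains the authoritative check, since it shows what the running application actually received.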

Matei
