[scalding] Exception while serializing jobConf


Hagai Attias

Jul 6, 2015, 4:44:52 AM
to cascadi...@googlegroups.com
I need to read parameters from the jobConf in my job.

I have the following code:

class FPNeedle(args: Args) extends AvroJob(args) {

  val jobConfigTest = implicitly[Mode] match {
    case Hdfs(_, configuration) => configuration
    case _ => throw new RuntimeException("Not running on Hadoop! (maybe cascading local mode?)")
  }

  ...
  val threshold = jobConfigTest.getInt("threshold")
  ...
}

While running it I get:
java.io.NotSerializableException: org.apache.hadoop.mapred.JobConf
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1165)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1535)
...
com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
Serialization trace:
classes (sun.misc.Launcher$AppClassLoader)
classLoader (org.apache.hadoop.mapred.JobConf)
JobConfigTest (com.akamai.csi.jobs.needles.scalding.fpneedle.job.FPNeedle)
$outer (com.akamai.csi.jobs.needles.scalding.fpneedle.job.FPNeedle$$anonfun$60)
fn$1 (com.twitter.scalding.typed.KeyedListLike$$anonfun$filter$1)
fn$3 (com.twitter.scalding.typed.IteratorMappedReduce$$anonfun$7)

...
Caused by: java.util.ConcurrentModificationException
	at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
	at java.util.AbstractList$Itr.next(AbstractList.java:343)
	at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:74)
	at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
	... 89 more


What does the ConcurrentModificationException mean? I'm not writing to the JobConf anywhere, only reading from it.

Thanks.

Oscar Boykin

Jul 6, 2015, 10:09:54 AM
to cascadi...@googlegroups.com
Serialization problems like this can be a pain. There are several techniques to deal with them.

If you only need a val at submission time, mark it @transient. If you only need a val on the mappers, mark it lazy. (Obviously, in some cases this won't work, but in many cases it will).

Here, if you change to:

@transient val jobConfigTest = 

it should work.

If that fails, try not capturing jobConfigTest at all:

val threshold: Int = implicitly[Mode] match {
  case Hdfs(_, configuration) =>      
    configuration.getInt("threshold")
  case m =>
    sys.error("Not running on Hadoop! (maybe cascading local mode?): " + m.toString)
}

Lastly, reading the value with args("threshold").toInt and passing it on the command line (for example, --threshold 10) would sidestep this issue.
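
The capture problem behind the first two suggestions can be reproduced without Hadoop or Scalding. The following is only a minimal sketch under stated assumptions: FakeConf is a hypothetical stand-in for JobConf, the job classes and SerializationDemo are illustrative names, and plain Java serialization is used in place of Scalding's Kryo-based serialization. The stand-in's getInt takes a default value, mirroring the two-argument form of Hadoop's Configuration.getInt(name, defaultValue).

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for Hadoop's JobConf: deliberately NOT Serializable.
class FakeConf(settings: Map[String, String]) {
  def getInt(name: String, default: Int): Int =
    settings.get(name).map(_.toInt).getOrElse(default)
}

// Problem: the config is a regular field and the closure reads it through
// `this`, so serializing the closure drags the whole job (and FakeConf) along.
class CapturingJob(settings: Map[String, String]) extends Serializable {
  val conf: FakeConf = new FakeConf(settings)
  val keep: Int => Boolean = x => x > conf.getInt("threshold", 0)
}

// Technique 1: keep the config but mark it @transient, and copy the value
// the closure needs into a plain Int field.
class TransientJob(settings: Map[String, String]) extends Serializable {
  @transient val conf: FakeConf = new FakeConf(settings)
  val threshold: Int = conf.getInt("threshold", 0)
  val keep: Int => Boolean = x => x > threshold
}

// Technique 2: never store the config; only the extracted Int is a field.
class ExtractingJob(settings: Map[String, String]) extends Serializable {
  val threshold: Int = new FakeConf(settings).getInt("threshold", 0)
  val keep: Int => Boolean = x => x > threshold
}

object SerializationDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val settings = Map("threshold" -> "10")
    println(serializes(new CapturingJob(settings).keep))  // closure reaches FakeConf
    println(serializes(new TransientJob(settings).keep))  // FakeConf field is skipped
    println(serializes(new ExtractingJob(settings).keep)) // no FakeConf field at all
  }
}
```

Under these assumptions, the first closure should fail to serialize because it reaches FakeConf through the enclosing job, while the other two should succeed. Note that a @transient field is not restored after deserialization, which is why the closure in TransientJob reads the copied Int rather than the config itself.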

--
Oscar Boykin :: @posco :: http://twitter.com/posco

Hagai Attias

Jul 6, 2015, 11:07:38 AM
to cascadi...@googlegroups.com
Thanks, Oscar. 

For now we've managed to bypass this issue by using the config() method in Job.

This method returns a Map[AnyRef,AnyRef] which seems to contain what we need from the JobConf. 

Is the JobConf (which is of type Configuration) only a wrapper around this map?

Ian O'Connell

Jul 6, 2015, 11:11:55 AM
to cascadi...@googlegroups.com
No, Config is an immutable form of a Configuration. Hadoop knows nothing of Config; it's a purely Scalding concept. Configuration is the Hadoop thing. So not everything may be in Config, mostly because Configuration is not really a map but is also full of strange logic.
