LinkageError while running a Pipeline consisting of a Python transform in Native mode


Gopi Krishna Kapagunta

Jun 15, 2020, 12:21:49 PM
to CDAP User
Hi All,

I am trying to create a pipeline that reads records from a database and processes them via a Python transform (native mode). I have installed Anaconda and configured its binary location in the stage.
I am getting the following error and am unable to trace the root cause:

2020-06-15 15:48:13,765 - ERROR [Executor task launch worker for task 0:o.a.s.e.Executor@91] - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/spark/repl/ExecutorClassLoader) previously initiated loading for a different type with name "io/cdap/plugin/common/script/ScriptContext"
	at java.lang.Class.forName0(Native Method) ~[na:1.8.0_252]
	at java.lang.Class.forName(Class.java:264) ~[na:1.8.0_252]
	at com.sun.proxy.$Proxy133.<clinit>(Unknown Source) ~[na:na]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_252]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_252]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_252]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_252]
	at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739) ~[na:1.8.0_252]
	at py4j.Gateway.createProxy(Gateway.java:368) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at py4j.CallbackClient.getPythonServerEntryPoint(CallbackClient.java:418) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at py4j.GatewayServer.getPythonServerEntryPoint(GatewayServer.java:803) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at io.cdap.plugin.python.transform.Py4jPythonExecutor.initialize(Py4jPythonExecutor.java:194) ~[1592236085781-0/:na]
	at io.cdap.plugin.python.transform.PythonEvaluator.initialize(PythonEvaluator.java:160) ~[1592236085781-0/:na]
	at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$initialize$3(WrappedTransform.java:72) ~[cdap-etl-core-6.2.0.jar:na]
	at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[cdap-etl-core-6.2.0.jar:na]
	at io.cdap.cdap.etl.common.plugin.WrappedTransform.initialize(WrappedTransform.java:71) ~[cdap-etl-core-6.2.0.jar:na]
	at io.cdap.cdap.etl.spark.function.TransformFunction.call(TransformFunction.java:43) ~[hydrator-spark-core2_2.11-6.2.0.jar:na]
	at io.cdap.cdap.etl.spark.Compat$FlatMapAdapter.call(Compat.java:126) ~[hydrator-spark-core2_2.11-6.2.0.jar:na]
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1016) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:947) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:711) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1016) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:947) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:711) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.scheduler.Task.run(Task.scala:100) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_252]

Terence Yim

Jun 15, 2020, 12:30:40 PM
to CDAP User
Hi,

Do you have more than one Python transform node, or just one?

Terence

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/7ebf27a6-18f1-49bd-969a-3b39268cf5c9o%40googlegroups.com.


--
Terence Yim | Staff Software Engineer | tere...@google.com | 

Gopi Krishna Kapagunta

Jun 15, 2020, 12:31:47 PM
to CDAP User
Hi Terence,
I have a total of 3 Python transforms in the pipeline. Is that an issue?

Gopi Krishna Kapagunta

Jun 17, 2020, 12:05:05 PM
to CDAP User
Hi Terence,

Is there any guidance you can give us on resolving this bug? We would appreciate any pointers.

Terence Yim

Jun 19, 2020, 4:24:09 AM
to CDAP User
Hi,

I suspect it is due to the multiple Python nodes. Is it possible for you to attach the pipeline JSON so we can reproduce the problem? Also, is it possible for you to combine the multiple Python transform nodes into one?
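Combining the nodes amounts to folding each node's script into a single transform function. A rough sketch, with placeholder field names standing in for the actual per-node logic, and a stub emitter included only so the snippet runs outside CDAP:

```python
class StubEmitter:
    """Minimal stand-in for CDAP's emitter, for local testing only."""
    def __init__(self):
        self.emitted = []

    def emit(self, record):
        self.emitted.append(record)

def transform(record, emitter, context):
    # Placeholder steps: each line stands in for one former node's script.
    record['step_one'] = 'done'  # logic from the first Python node
    record['step_two'] = 'done'  # logic from the second Python node
    emitter.emit(record)

emitter = StubEmitter()
transform({'id': 1}, emitter, None)
print(emitter.emitted[0])
```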

Terence


Gopi Krishna Kapagunta

Jun 19, 2020, 4:59:18 AM
to CDAP User
Hi Terence,

I tried removing the other Python transforms and keeping only one; it still shows the error. Attaching the pipeline JSON, the input file, and the logs (confidential information removed). In this pipeline, if I remove the Python transform and relay the source output directly to the sink, the pipeline succeeds.
I am running this in a Docker container with the image "caskdata/cdap-sandbox:6.2.0", with Anaconda installed and configured on the container.
Please let me know if you need any other information.
TestNativePython_v2-cdap-data-pipeline.json
entity.csv
TestNativePython_v2.log

Gopi Krishna Kapagunta

Jun 19, 2020, 5:01:35 AM
to CDAP User
Hi Terence,

I also observed that if I switch from a "Database" source to a "File" source, the error goes away. Perhaps it has something to do with the combination of a Database source and a Python transform?

Gopi Krishna Kapagunta

Jun 19, 2020, 2:03:08 PM
to CDAP User
Hi Terence,

I have two code snippets for the Python transform. One works and one doesn't, and the only difference is that the failing one creates a Python local variable.

1. Erroring code
def transform(record, emitter, context):
    file_list = "new key"
    record['file_list'] = file_list
    emitter.emit(record)

2. Working code
def transform(record, emitter, context):
    record['form_type'] = 'P'
    emitter.emit(record)
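If the local-variable binding really is the trigger, a possible workaround (untested against CDAP itself) is to inline the value, matching the shape of the snippet that works. The stub emitter below is only there so the snippet can be run standalone:

```python
class StubEmitter:
    """Minimal stand-in for CDAP's emitter, for local testing only."""
    def __init__(self):
        self.emitted = []

    def emit(self, record):
        self.emitted.append(record)

def transform(record, emitter, context):
    # Assign the literal directly instead of binding it to a local first.
    record['file_list'] = "new key"
    emitter.emit(record)

emitter = StubEmitter()
transform({'id': 1}, emitter, None)
print(emitter.emitted[0])
```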

Could this mean I need to install a different version of CDAP, or are there any other steps I should take?

Bao Ngo

Oct 26, 2020, 10:36:22 AM
to CDAP User
Hi,
Is there any update on this? I get the same error without changing any code in the Python node: just reading from a MySQL DB, sending to Python, then sending to Trash.

Darsh Shukla

Jan 25, 2021, 8:52:04 AM
to CDAP User
Hi,

Is there any update on this? I am getting the same error, using only one node for the Python transform (v2.2.1). The pipeline is GCS >> Wrangler >> Python Transform >> File.

Regards

Mohammed Eseifan

unread,
Jan 26, 2021, 5:30:55 PM1/26/21
to CDAP User
Hi Darsh,

Can you confirm which version of CDAP you are using? Are you using the sandbox zip or another setup?

Thanks,
Mo

Darsh Shukla

Jan 30, 2021, 12:29:08 AM
to CDAP User
Hi Mohammed,

I am using CDAP version - cdap-sandbox-6.2.3, sandbox zip.

Thanks,
Darsh

Shannon Duncan

Mar 3, 2021, 2:04:12 PM
to CDAP User
Curious about this as well. We are running into the same issue, and there is only one Python transform in the pipeline.

Waseem Qureshi

Mar 17, 2021, 9:39:47 PM
to CDAP User
Experiencing the same issue, running on version 6.3.0, with just one Python transform in the pipeline.