LinkageError while running a Pipeline consisting of a Python transform in Native mode


Gopi Krishna Kapagunta

Jun 15, 2020, 12:21:49 PM
to CDAP User
Hi All,

I am trying to create a pipeline that reads records from a database and processes them via a Python transform (native mode). I have installed Anaconda and configured its binary location in the stage.
I am getting the following error and am unable to trace the root cause:

2020-06-15 15:48:13,765 - ERROR [Executor task launch worker for task 0:o.a.s.e.Executor@91] - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/spark/repl/ExecutorClassLoader) previously initiated loading for a different type with name "io/cdap/plugin/common/script/ScriptContext"
	at java.lang.Class.forName0(Native Method) ~[na:1.8.0_252]
	at java.lang.Class.forName(Class.java:264) ~[na:1.8.0_252]
	at com.sun.proxy.$Proxy133.<clinit>(Unknown Source) ~[na:na]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_252]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_252]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_252]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_252]
	at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739) ~[na:1.8.0_252]
	at py4j.Gateway.createProxy(Gateway.java:368) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at py4j.CallbackClient.getPythonServerEntryPoint(CallbackClient.java:418) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at py4j.GatewayServer.getPythonServerEntryPoint(GatewayServer.java:803) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at io.cdap.plugin.python.transform.Py4jPythonExecutor.initialize(Py4jPythonExecutor.java:194) ~[1592236085781-0/:na]
	at io.cdap.plugin.python.transform.PythonEvaluator.initialize(PythonEvaluator.java:160) ~[1592236085781-0/:na]
	at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$initialize$3(WrappedTransform.java:72) ~[cdap-etl-core-6.2.0.jar:na]
	at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[cdap-etl-core-6.2.0.jar:na]
	at io.cdap.cdap.etl.common.plugin.WrappedTransform.initialize(WrappedTransform.java:71) ~[cdap-etl-core-6.2.0.jar:na]
	at io.cdap.cdap.etl.spark.function.TransformFunction.call(TransformFunction.java:43) ~[hydrator-spark-core2_2.11-6.2.0.jar:na]
	at io.cdap.cdap.etl.spark.Compat$FlatMapAdapter.call(Compat.java:126) ~[hydrator-spark-core2_2.11-6.2.0.jar:na]
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1016) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:947) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:711) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1016) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:947) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1007) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:711) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.scheduler.Task.run(Task.scala:100) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_252]

Terence Yim

Jun 15, 2020, 12:30:40 PM
to CDAP User
Hi,

Do you have more than one Python transform node, or just one?

Terence

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/7ebf27a6-18f1-49bd-969a-3b39268cf5c9o%40googlegroups.com.


--
Terence Yim | Staff Software Engineer | tere...@google.com | 

Gopi Krishna Kapagunta

Jun 15, 2020, 12:31:47 PM
to CDAP User
Hi Terence,
I have a total of 3 Python transforms in the pipeline. Is that an issue?

Gopi Krishna Kapagunta

Jun 17, 2020, 12:05:05 PM
to CDAP User
Hi Terence,

Is there any guidance you can give us on resolving this bug? We would appreciate any pointers.

Terence Yim

Jun 19, 2020, 4:24:09 AM
to CDAP User
Hi,

I suspect it is due to the multiple Python nodes. Is it possible for you to attach the pipeline JSON so we can reproduce the problem? Also, is it possible for you to combine the multiple Python transform nodes into one?
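Combining the nodes amounts to folding each node's script into a single transform function. A rough sketch, with placeholder field names standing in for the actual per-node logic, and a stub emitter included only so the snippet runs outside CDAP:

```python
class StubEmitter:
    """Minimal stand-in for CDAP's emitter, for local testing only."""
    def __init__(self):
        self.emitted = []

    def emit(self, record):
        self.emitted.append(record)

def transform(record, emitter, context):
    # Placeholder steps: each line stands in for one former node's script.
    record['step_one'] = 'done'  # logic from the first Python node
    record['step_two'] = 'done'  # logic from the second Python node
    emitter.emit(record)

emitter = StubEmitter()
transform({'id': 1}, emitter, None)
print(emitter.emitted[0])
```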

Terence


Gopi Krishna Kapagunta

Jun 19, 2020, 4:59:18 AM
to CDAP User
Hi Terence,

I tried removing the other Python transforms and keeping only one; it still shows the error. Attaching the pipeline JSON, the input file, and the logs (confidential information removed). In this pipeline, if I remove the Python transform and relay the source output directly to the sink, the pipeline succeeds.
I am running this in a Docker container with the image "caskdata/cdap-sandbox:6.2.0", with Anaconda installed and configured on the container.
Please let me know if you need any other information.
TestNativePython_v2-cdap-data-pipeline.json
entity.csv
TestNativePython_v2.log

Gopi Krishna Kapagunta

Jun 19, 2020, 5:01:35 AM
to CDAP User
Hi Terence,

I also observed that if I switch from a "Database" source to a "File" source, the error goes away. Perhaps it has something to do with the combination of a Database source and a Python transform?

Gopi Krishna Kapagunta

Jun 19, 2020, 2:03:08 PM
to CDAP User
Hi Terence,

I have two code snippets for the Python transform. One works and one doesn't, and the only difference is that the failing one creates a Python local variable.

1. Erroring code
def transform(record, emitter, context):
    file_list = "new key"
    record['file_list'] = file_list
    emitter.emit(record)

2. Working code
def transform(record, emitter, context):
    record['form_type'] = 'P'
    emitter.emit(record)
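If the local-variable binding really is the trigger, a possible workaround (untested against CDAP itself) is to inline the value, matching the shape of the snippet that works. The stub emitter below is only there so the snippet can be run standalone:

```python
class StubEmitter:
    """Minimal stand-in for CDAP's emitter, for local testing only."""
    def __init__(self):
        self.emitted = []

    def emit(self, record):
        self.emitted.append(record)

def transform(record, emitter, context):
    # Assign the literal directly instead of binding it to a local first.
    record['file_list'] = "new key"
    emitter.emit(record)

emitter = StubEmitter()
transform({'id': 1}, emitter, None)
print(emitter.emitted[0])
```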

Could this mean I need to install a different version of CDAP, or are there any other steps I should take?

Bao Ngo

Oct 26, 2020, 10:36:22 AM
to CDAP User
Hi,
Is there any update on this? I get the same error without changing any code in the Python node: just reading from a MySQL DB, sending to Python, then sending to Trash.

Darsh Shukla

Jan 25, 2021, 8:52:04 AM
to CDAP User
Hi,

Is there any update on this? I am getting the same error, using only one node for the Python transform (v2.2.1). The pipeline is GCS >> Wrangler >> Python Transform >> File.

Regards

Mohammed Eseifan

unread,
Jan 26, 2021, 5:30:55 PM1/26/21
to CDAP User
Hi Darsh,

Can you confirm which version of CDAP you are using? Are you using the sandbox zip or another setup?

Thanks,
Mo

Darsh Shukla

Jan 30, 2021, 12:29:08 AM
to CDAP User
Hi Mohammed,

I am using CDAP version - cdap-sandbox-6.2.3, sandbox zip.

Thanks,
Darsh

Shannon Duncan

Mar 3, 2021, 2:04:12 PM
to CDAP User
Curious about this as well. We are running into the same issue, and there is only one Python transform in the pipeline.

Waseem Qureshi

Mar 17, 2021, 9:39:47 PM
to CDAP User
Experiencing the same issue, running on version 6.3.0, with just one Python transform in the pipeline.