Py4JJavaError:

Vendaim Part2

unread,

Dec 17, 2020, 2:09:11 AM12/17/20

to User Group for BigDL and Analytics Zoo

Hi all,

I'm very new to Bigdl, I install bigdl=0.11 without pip, spark 2.4.0, and using python=3.5.

I run all the https://github.com/intel-analytics/BigDL-Tutorials and find it fine with me...

then I try to train my own Dataset, but when I run:

# Boot training process
trained_model = optimizer.optimize()
print("Optimization Done.") it gives me the following error

--------------------------------------------------------------------------- Py4JJavaError Traceback (most recent call last) <ipython-input-6-ff3780fa1893> in <module> 1 # Boot training process ----> 2 trained_model = optimizer.optimize() 3 print("Optimization Done.") /tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/optim/optimizer.py in optimize(self) 788 Do an optimization. 789 """ --> 790 jmodel = callJavaFunc(self.value.optimize) 791 from bigdl.nn.layer import Layer 792 return Layer.of(jmodel) /tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py in callJavaFunc(func, *args) 643 """ Call Java Function """ 644 gateway = _get_gateway() --> 645 result = func(*args) 646 return _java2py(gateway, result) 647 ~/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args) 1255 answer = self.gateway_client.send_command(command) 1256 return_value = get_return_value( -> 1257 answer, self.gateway_client, self.target_id, self.name) 1258 1259 for temp_arg in temp_args: ~/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 61 def deco(*a, **kw): 62 try: ---> 63 return f(*a, **kw) 64 except py4j.protocol.Py4JJavaError as e: 65 s = e.java_exception.toString() ~/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 326 raise Py4JJavaError( 327 "An error occurred while calling {0}{1}{2}.\n". --> 328 format(target_id, ".", name), value) 329 else: 330 raise Py4JError( Py4JJavaError: An error occurred while calling o120.optimize. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 566, in _get_port with open(path) as f: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/gateway_port' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 372, in main process() File "/home/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 367, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 390, in dump_stream vs = list(itertools.islice(iterator, batch)) File "/home/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 99, in wrapper return f(*args, **kwargs) File "<ipython-input-2-f4a922899213>", line 24, in <lambda> File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/transform/vision/image.py", line 329, in __init__ super(Resize, self).__init__(bigdl_type, resize_h, resize_w, resize_mode, use_scale_factor) File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/transform/vision/image.py", line 34, in __init__ bigdl_type, JavaValue.jvm_class_constructor(self), *args) File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 589, in callBigDlFunc gateway = _get_gateway() File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 579, in _get_gateway gateway_port = _get_port() File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 573, in _get_port " executor side." % e.filename) RuntimeError: Could not open the file /tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/gateway_port, which contains the listening port of local Java Gateway, please make sure the init_executor_gateway() function is called before any call of java function on the executor side. at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126) at org.apache.spark.rdd.RDD.count(RDD.scala:1168) at com.intel.analytics.bigdl.dataset.DistributedDataSet$$anon$5.cache(DataSet.scala:195) at com.intel.analytics.bigdl.optim.AbstractOptimizer.prepareInput(AbstractOptimizer.scala:281) at com.intel.analytics.bigdl.optim.DistriOptimizer.prepareInput(DistriOptimizer.scala:809) at com.intel.analytics.bigdl.optim.DistriOptimizer.optimize(DistriOptimizer.scala:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 566, in _get_port with open(path) as f: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/gateway_port' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 372, in main process() File "/home/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 367, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 390, in dump_stream vs = list(itertools.islice(iterator, batch)) File "/home/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 99, in wrapper return f(*args, **kwargs) File "<ipython-input-2-f4a922899213>", line 24, in <lambda> File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/transform/vision/image.py", line 329, in __init__ super(Resize, self).__init__(bigdl_type, resize_h, resize_w, resize_mode, use_scale_factor) File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/transform/vision/image.py", line 34, in __init__ bigdl_type, JavaValue.jvm_class_constructor(self), *args) File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 589, in callBigDlFunc gateway = _get_gateway() File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 579, in _get_gateway gateway_port = _get_port() File "/tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/bigdl-0.11.0-python-api.zip/bigdl/util/common.py", line 573, in _get_port " executor side." % e.filename) RuntimeError: Could not open the file /tmp/spark-2d66adba-0114-48fc-891e-8e93238f9295/userFiles-ca2ed079-2821-4244-8789-fc385426b662/gateway_port, which contains the listening port of local Java Gateway, please make sure the init_executor_gateway() function is called before any call of java function on the executor side. at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more

Please! help me out,if required I'll share other details.

Thanks!

Vendaim

Vendaim Part2

unread,

Dec 17, 2020, 1:09:34 PM12/17/20

to User Group for BigDL and Analytics Zoo

import matplotlib
matplotlib.use('Agg')

%pylab inline

import datetime as dt
from os import listdir
from os.path import join, basename

from bigdl.nn.layer import *
from bigdl.nn.criterion import *
from bigdl.optim.optimizer import *
from bigdl.util.common import *
from bigdl.dataset import mnist
#from utils import get_mnist
from matplotlib.pyplot import imshow
import matplotlib.pyplot as plt
from pyspark import SparkContext
sc=SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[4]").set("spark.driver.memory","20g"))
from scipy import misc
init_engine()

#build file structure and associated labels
def read_local_path(folder, has_label=True):
    # read directory, create map of pics and labels
    dirs = listdir(folder)
    # create image path and label list
    image_paths = []
    if has_label:
        dirs.sort()
        for d in dirs:
            for f in listdir(join(folder, d)):
                image_paths.append((join(join(folder, d), f), dirs.index(d) + 1))
    return image_paths

#building RDD from Traingdata set
def read_local_with_name(sc, folder, normalize=255.0, has_label=True):
    # read directory, create image paths list
    image_paths = read_local_path(folder, has_label)

    image_paths_rdd = sc.parallelize(image_paths)
    print(image_paths_rdd)
    features_label_name_rdd = image_paths_rdd.map(lambda path_label: (misc.imread(path_label[0]), np.array(path_label[1]), basename(path_label[0]))) \
        .map(lambda img_label_name:
             (Resize(128, 128)(img_label_name[0]), img_label_name[1], img_label_name[2])) \
        .map(lambda features_label_name:
             (((features_label_name[0] & 0xff) / normalize).astype("float32"), features_label_name[1], features_label_name[2]))
    return features_label_name_rdd

#build and test RDD dataset
imFolder = "/home/spark/dataset/ct_scans_Train"
localPath = read_local_path(imFolder)

print(len(localPath)) #print total number of elements
#test of random chosen path and associated label.
rndmItm = np.random.choice(range(len(localPath)))
print("PATH..... {} \nLABEL.... {}".format(localPath[rndmItm][0],localPath[rndmItm][1]))

Train_RDD = read_local_with_name(sc, imFolder, normalize=255.0, has_label=True)

#build file structure and associated labels
def read_local_path_T(folder_T, has_label=True):
    # read directory, create map of pics and labels
    dirs_T = listdir(folder_T)
    # create image path and label list
    image_paths_T = []
    if has_label:
        dirs_T.sort()
        for d_T in dirs_T:
            for f_T in listdir(join(folder_T, d_T)):
                image_paths_T.append((join(join(folder_T, d_T), f_T), dirs_T.index(d_T) + 1))
    return image_paths_T

#building RDD from Testing Dataset
def read_local_with_name_T(sc, folder_T, normalize=255.0, has_label=True):
    # read directory, create image paths list
    image_paths_T = read_local_path_T(folder_T, has_label)

    image_paths_rdd_T = sc.parallelize(image_paths_T)
    print(image_paths_rdd_T)
    features_label_name_rdd_T = image_paths_rdd_T.map(lambda path_label_T: (misc.imread(path_label_T[0]), np.array(path_label_T[1]), basename(path_label_T[0]))) \
        .map(lambda img_label_name_T:
             (Resize(128, 128)(img_label_name_T[0]), img_label_name_T[1], img_label_name_T[2])) \
        .map(lambda features_label_name_T:
             (((features_label_name_T[0] & 0xff) / normalize).astype("float32"), features_label_name_T[1], features_label_name_T[2]))
    return features_label_name_rdd_T

#build and test RDD dataset
imFolder_T = "/home/spark/dataset/ct_scans_Test/"
localPath_T = read_local_path_T(imFolder_T)

print(len(localPath_T)) #print total number of elements
#test of random chosen path and associated label.
rndmItm_T = np.random.choice(range(len(localPath_T)))
print("PATH..... {} \nLABEL.... {}".format(localPath_T[rndmItm_T][0],localPath_T[rndmItm_T][1]))

Test_RDD = read_local_with_name_T(sc, imFolder_T, normalize=255.0, has_label=True)

# Create a model
def build_model(class_num):
    model = Sequential()
    model.add(Reshape([1, 28, 28]))
    model.add(SpatialConvolution(1, 32, 5, 2, 1).set_name('conv1'))
    model.add(ReLU())
    model.add(SpatialMaxPooling(2, 2, 2, 2).set_name('pool1'))

    model.add(SpatialConvolution(32, 64, 5, 1, 2).set_name('conv2'))
    model.add(ReLU())
    model.add(SpatialMaxPooling(2, 2, 2, 2).set_name('pool2'))

model.add(SpatialConvolution(64, 128, 5, 1, 2).set_name('conv3'))

model.add(ReLU())
model.add(SpatialMaxPooling(2, 2, 2, 2).set_name('pool3'))

model.add(SpatialConvolution(128, 256, 5, 1, 2).set_name('conv4'))

    model.add(ReLU())
    model.add(SpatialMaxPooling(2, 2, 2, 2).set_name('pool4'))

    model.add(Reshape([7 * 7 * 256]))
    model.add(Dropout(0.5))
    model.add(Linear(7 * 7 * 256, 1000).set_name('fc1'))
    model.add(ReLU())
    model.add(Linear(1000, class_num).set_name('score'))
    model.add(Sigmoid())
    return model
model_Train = build_model(2)

optimizer = Optimizer(
    model= model_Train,
    training_rdd=Train_RDD,
    criterion=BCECriterion(),
    optim_method=Adam(learningrate=0.4,learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-8, bigdl_type="float"),
    end_trigger=MaxEpoch(7),
    batch_size=2048)

# Set the validation logic
optimizer.set_validation(
    batch_size=2048,
    val_rdd=Test_RDD,
    trigger=EveryEpoch(),
    val_method=[Top1Accuracy()]
)

app_name='-cNNModel-'+dt.datetime.now().strftime("%Y%m%d-%H%M%S")
train_summary = TrainSummary(log_dir='/tmp/bigdl_summaries',
                                     app_name=app_name)
train_summary.set_summary_trigger("Parameters", SeveralIteration(50))
val_summary = ValidationSummary(log_dir='/tmp/bigdl_summaries',
                                        app_name=app_name)
optimizer.set_train_summary(train_summary)
optimizer.set_val_summary(val_summary)
print("saving logs to ",app_name)

# Boot training process
trained_model = optimizer.optimize()
print("Optimization Done.")

This is the code I used.

Give me some suggestions on what am I doing wrong, how can I solve this problem, and also by using an Analytical zoo can I train a CNN model on my own dataset?

bests,

Vendaim

glor...@gmail.com

unread,

Dec 17, 2020, 9:28:01 PM12/17/20

to User Group for BigDL and Analytics Zoo

Hi Vendaim，

Thanks for trying bigdl and analytics-zoo. We will try to reproduce your code and come back to you.

Regards,

Dongjie

alett...@gmail.com

unread,

Dec 18, 2020, 4:35:18 AM12/18/20

to User Group for BigDL and Analytics Zoo

Hi Vendaim，

I am trying to reproduce your question. Could you share your image folder structure? Additionally, can you check if spark home version and the compiled "*-python-api.zip" are the same spark version?

Best regards,

Le

Vendaim Part2

unread,

Dec 21, 2020, 3:22:26 AM12/21/20

to User Group for BigDL and Analytics Zoo

Hi There,

Sorry for the late reply, Yes I'll share it, and I also checked the spark version in both is =2.4.0. I'll share that too.

Thanks,

Vendaim

Vendaim Part2

unread,

Dec 21, 2020, 3:40:44 AM12/21/20

to User Group for BigDL and Analytics Zoo

--
You received this message because you are subscribed to a topic in the Google Groups "User Group for BigDL and Analytics Zoo" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/bigdl-user-group/Rtrl3A6IhpI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to bigdl-user-gro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bigdl-user-group/7f8a12b7-c446-4bed-b27f-a46eee10d9d2n%40googlegroups.com.

dataset.png

pyspark.png

TestImages.png

TrainImages.png

alett...@gmail.com

unread,

Dec 21, 2020, 10:26:47 AM12/21/20

to User Group for BigDL and Analytics Zoo

Hi Vendaim,

Thanks for your reply. When trying the code, it occurs error on my env "PIL.UnidentifiedImageError: cannot identify image file '/home/joy/workspace/try/data/ct_scans_Train/0/n04370456_5753.JPEG', PIL.UnidentifiedImageError: cannot identify image file '/home/joy/workspace/try/data/ct_scans_Train/1/n04370456_11513.JPEG'. which scipy and pillow you are using?

Thanks,

Le

Abbas kheljee

unread,

Jan 2, 2021, 12:31:59 AM1/2/21

to User Group for BigDL and Analytics Zoo

Hi Le,

I use the following code to read the Training Images, if I'm doing something wrong tell me.