PermissionError: [Errno 13] Permission denied: '/tmp/raylet_start.lock'


AP

Jun 2, 2021, 9:49:56 AM
to User Group for BigDL and Analytics Zoo
Hi Team,
I am getting the following error while using the zoo.orca.learn.tf2 Estimator:
PermissionError: [Errno 13] Permission denied: '/tmp/raylet_start.lock'
What are the likely causes?


more logs:
    est = Estimator.from_keras(model_creator=model_creators)
  File "/dfs/5/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000053/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.10.0-python-api.zip/zoo/orca/learn/tf2/estimator.py", line 63, in from_keras
  File "/dfs/5/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000053/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.10.0-python-api.zip/zoo/orca/learn/tf2/estimator.py", line 107, in __init__
  File "/dfs/5/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000053/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.10.0-python-api.zip/zoo/ray/raycontext.py", line 390, in get
  File "/dfs/5/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000053/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.10.0-python-api.zip/zoo/ray/raycontext.py", line 479, in init
  File "/dfs/5/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000053/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.10.0-python-api.zip/zoo/ray/raycontext.py", line 505, in _start_cluster
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4462.8166904/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 816, in collect
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4462.8166904/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4462.8166904/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4462.8166904/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Could not recover from a failed barrier ResultStage. Most recent failure reason: Stage failed because barrier task ResultTask(3, 8) finished unsuccessfully.
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4462.8166904/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 372, in main
    process()
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4462.8166904/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 367, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4462.8166904/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 390, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/dfs/5/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000053/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.10.0-python-api.zip/zoo/ray/raycontext.py", line 242, in _start_ray_services
  File "/dfs/7/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000055/venv3.tar.gz/lib/python3.7/site-packages/filelock.py", line 323, in __enter__
    self.acquire()
  File "/dfs/7/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000055/venv3.tar.gz/lib/python3.7/site-packages/filelock.py", line 271, in acquire
    self._acquire()
  File "/dfs/7/yarn/nm/usercache//appcache/application_1621089916366_46514/container_e41_1621089916366_46514_01_000055/venv3.tar.gz/lib/python3.7/site-packages/filelock.py", line 384, in _acquire
    fd = os.open(self._lock_file, open_mode)
PermissionError: [Errno 13] Permission denied: '/tmp/raylet_start.lock'

huangka...@gmail.com

Jun 2, 2021, 11:00:31 PM
to User Group for BigDL and Analytics Zoo
Hi,

When launching Ray processes on the same node, we use a temporary lock file to avoid possible conflicts.
According to the error, it seems the process does not have permission to create or open the lock file under /tmp. Could you check the permissions on /tmp on your machine?
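
One way to run that check from Python is sketched below. This is only an illustration, not part of Analytics Zoo or Ray: the helper name diagnose_lock is hypothetical, and it uses only standard-library calls. It reports whether the directory is writable and, if the lock file already exists, who owns it and what its mode is. A possible cause worth ruling out (an assumption, not something the logs above confirm) is a leftover lock file created by a different user on a shared node, which other users typically cannot open for writing.

```python
import os
import stat


def diagnose_lock(lock_path):
    """Collect read-only facts about a lock file and its directory.

    Hypothetical helper for illustration; it does not create or modify
    the lock file. Unix-only, because it relies on os.getuid().
    """
    directory = os.path.dirname(lock_path) or "."
    report = {
        # UID of the process running this check.
        "current_uid": os.getuid(),
        # Creating a new file needs write and search permission on the directory.
        "dir_writable": os.access(directory, os.W_OK | os.X_OK),
        "lock_exists": os.path.exists(lock_path),
        "lock_owner_uid": None,
        "lock_mode": None,
        "lock_writable": None,
    }
    if report["lock_exists"]:
        st = os.stat(lock_path)
        report["lock_owner_uid"] = st.st_uid
        # Human-readable mode string, e.g. "-rw-r--r--".
        report["lock_mode"] = stat.filemode(st.st_mode)
        # False here usually means the file belongs to another user
        # and is not group- or world-writable.
        report["lock_writable"] = os.access(lock_path, os.W_OK)
    return report
```

Calling print(diagnose_lock("/tmp/raylet_start.lock")) on each worker node, as the same user that the YARN containers run as, should show whether the directory or an existing lock file is the problem. If lock_owner_uid differs from current_uid and lock_writable is False, the file was most likely left behind by another user.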

Thanks,
Kai
