DataProc cluster stuck in Error state

Derek Chan

Aug 15, 2017, 4:54:51 AM
to Google Cloud Dataproc Discussions
Hi,

One of our DataProc clusters is stuck in an Error state. The error message is:

  • Operation timed out after 2100 tries: there are still pending operations

There are no jobs running on the cluster, but I am not able to delete it. It keeps bringing up new worker nodes whenever I delete the VMs. Is there anything I can do to actually delete the cluster?



Best regards,
Derek C

Patrick Clay

Aug 15, 2017, 2:22:38 PM
to Google Cloud Dataproc Discussions
Sorry to hear that,

Are the VMs that keep coming back secondary (preemptible) workers? If so, you can delete them by deleting their Google Compute Engine Managed Instance Group with gcloud compute instance-groups managed delete dataproc-CLUSTER_NAME-sw, which should make them go away for good.
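
For reference, the command above could be wrapped in a small shell helper along these lines. This is only a sketch: the function names and the DRY_RUN variable are made up for illustration, and it assumes the instance group is zonal and that you know its zone. By default it only prints the command, so you can check the group name before deleting anything.

```shell
# Sketch only: mig_name, delete_secondary_workers and DRY_RUN are
# hypothetical helpers for this thread, not part of gcloud.

# Build the managed instance group name for a cluster's secondary workers.
mig_name() {
  printf 'dataproc-%s-sw' "$1"
}

# $1 = cluster name, $2 = Compute Engine zone (assumed to be a zonal group).
# With DRY_RUN=1 (the default) the command is printed instead of executed.
delete_secondary_workers() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "gcloud compute instance-groups managed delete $(mig_name "$1") --zone $2"
  else
    gcloud compute instance-groups managed delete "$(mig_name "$1")" --zone "$2"
  fi
}
```

Calling delete_secondary_workers CLUSTER_NAME ZONE prints the command; setting DRY_RUN=0 first runs it for real, and gcloud will still ask for confirmation before deleting the group.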

As for resolving the operation timeout, cases like this are best handled by support, but barring that you can try sending your Project ID, Operation ID, and Cluster UUID to dataproc...@google.com (which only Google employees have access to). Support may be able to address it.
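
If it helps, the three identifiers can usually be collected from the command line. The following is a rough sketch rather than an official procedure: collect_support_info and the GCLOUD override variable are hypothetical names, and it assumes a gcloud release in which the dataproc commands take a --region flag (use the region the cluster was created in).

```shell
# Sketch only: collect_support_info and GCLOUD are hypothetical names.
# $1 = cluster name, $2 = Dataproc region (assumed to be known).
collect_support_info() {
  gc="${GCLOUD:-gcloud}"   # allows substituting another binary when testing
  # Project ID currently configured for gcloud.
  echo "Project ID:   $("$gc" config get-value project)"
  # UUID of the cluster, read from the cluster resource.
  echo "Cluster UUID: $("$gc" dataproc clusters describe "$1" --region "$2" --format='value(clusterUuid)')"
  # Operations recorded against the cluster; the failed one carries the Operation ID.
  echo "Operations for $1:"
  "$gc" dataproc operations list --region "$2" --cluster "$1"
}
```

Running collect_support_info CLUSTER_NAME REGION prints the values to paste into the email; pick the Operation ID of the operation that timed out from the listing.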

Hope that helps,
-Patrick