Spark Job Management over the YARN REST API


ced...@cogniteev.com

Oct 25, 2018, 11:57:49 AM
to Google Cloud Dataproc Discussions
Hello, 

In my workflow, I need to remotely monitor and, when necessary, stop Spark jobs running on my Dataproc cluster.
I found some information about the YARN REST API for checking and changing a job's status, but I run into an issue when I try to kill a job with the REST API:

Example:
curl -v -X PUT -d '{"state": "KILLED"}' 'http://my-yarn-master:8088/ws/v1/cluster/apps/application_XXXX/state'

I get the following error:

401 - Unauthorized
{
  "RemoteException": {
    "exception": "AuthorizationException",
    "message": "Unable to obtain user name, user not authenticated",
    "javaClassName": "org.apache.hadoop.security.authorize.AuthorizationException"
  }
}
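For reference, the monitoring half of my workflow does work: reading an application's state is a GET and doesn't hit this authorization check. A sketch with placeholder host and application ID (the `echo` just prints the composed command so it can be inspected; drop it to actually send the request):

```shell
# Hypothetical values -- substitute your own master host and application ID.
YARN_MASTER="my-yarn-master"
APP_ID="application_XXXX"

# Read-only calls such as GET .../state don't require an auth filter.
# 'echo' prints the composed command; remove it to send the request.
echo curl -s "http://${YARN_MASTER}:8088/ws/v1/cluster/apps/${APP_ID}/state"
```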


I found in some documentation that I need to enable HTTP authentication (Kerberos?) on my cluster with some config:
"Please note that in order to kill an app, you must have an authentication filter setup for the HTTP interface. The functionality requires that a username is set in the HttpServletRequest. If no filter is setup, the response will be an "UNAUTHORIZED" response."

Is that true? Can't I just specify the user requesting this action in a property?

Thanks a lot,
Regards


Karthik Palaniappan

Oct 25, 2018, 1:48:00 PM
to Google Cloud Dataproc Discussions
Interestingly, I get that error on Dataproc 1.3 (Hadoop 2.9), but not on Dataproc 1.2 (Hadoop 2.8). I wonder if there was a breaking change going from 2.8 -> 2.9.

Other notes:

1) I get the same error when using the web UI to kill the application on Hadoop 2.9.
2) I tried adding "?user=dr.who" to give the request a user, but that didn't help.

The docs are really unclear on how to configure a filter.
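For what it's worth, plain `user` isn't a parameter either common filter reads: hadoop-auth's AuthenticationFilter in "simple" mode takes the caller identity from `user.name`, while StaticUserWebFilter assigns a fixed user (dr.who by default) with no parameter at all. A sketch of the former, with placeholder host and application ID (the `echo` prints the command; remove it to send the request):

```shell
YARN_MASTER="my-yarn-master"   # placeholder
APP_ID="application_XXXX"      # placeholder

# Under simple/pseudo auth the identity comes from 'user.name', not 'user'.
# 'echo' prints the composed command; remove it to send the request.
echo curl -X PUT -H "Content-Type: application/json" \
  -d '{"state": "KILLED"}' \
  "http://${YARN_MASTER}:8088/ws/v1/cluster/apps/${APP_ID}/state?user.name=dr.who"
```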

Karthik Palaniappan

Oct 25, 2018, 2:09:48 PM
to Google Cloud Dataproc Discussions
Looking into the code, the filters are controlled by hadoop.http.filter.initializers: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/webapp/ApiServerWebApp.java#L105

In 1.2, we did not set that property, so it defaulted to org.apache.hadoop.http.lib.StaticUserWebFilter, which is necessary for the REST API to work. In 1.3, we set hadoop.http.filter.initializers=org.apache.hadoop.security.HttpCrossOriginFilterInitializer, which is necessary to run the YARN Application Timeline Server.
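In core-site.xml terms, keeping both behaviors just means listing both initializers, comma-separated, in the one property:

```xml
<!-- core-site.xml: both filter initializers, comma-separated in one value -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.HttpCrossOriginFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter</value>
</property>
```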

So this is a regression -- we should have included both filters in 1.3. For now, you can create a cluster, and specify --properties="core:hadoop.http.filter.initializers=org.apache.hadoop.security.HttpCrossOriginFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter".

I'll make the change in the Dataproc image build, and it will roll out in the next couple weeks. You can watch the release notes here: https://cloud.google.com/dataproc/docs/release-notes. Also feel free to file a Hadoop JIRA to complain about the lack of clear documentation around these filters :)

Karthik Palaniappan

Oct 25, 2018, 2:18:35 PM
to Google Cloud Dataproc Discussions
For posterity, this would also be an issue if you ran the Tez init action on older Dataproc images, which similarly sets hadoop.http.filter.initializers=org.apache.hadoop.security.HttpCrossOriginFilterInitializer. I have filed an issue to fix that as well: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/issues/371.

ced...@cogniteev.com

Oct 26, 2018, 4:49:39 AM
to Google Cloud Dataproc Discussions
Thanks for your quick response!

It works now after adding the new filter to the cluster-creation properties; I just had to escape the dict flag values like this:

gcloud beta dataproc clusters create test-autoscaling blabla... --properties "^;^\
dataproc:alpha.autoscaling.enabled=true;\
...
core:hadoop.http.filter.initializers=org.apache.hadoop.security.HttpCrossOriginFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter"

Regards.