Genie 3.3.9 : ERROR : mkdir: cannot create directHandling exit signal (code: 1) : File exists

IB

Dec 17, 2018, 1:40:57 AM
to genie
Hi,

   I am getting the error below when submitting jobs:

mkdir: cannot create directHandling exit signal (code: 1)
Handling exit signal (code: 1)
: File exists

Steps:
1. docker run -t --rm -p 8080:8080 netflixoss/genie-app
2. Create applications, commands, and clusters. The code is taken from the init_demo.
3. Submit a job:
   python ../examples/run_hadoop_job.py test
   The code is the same as in the demo Docker image.

The only difference is that the files are not on S3 but inside the Docker filesystem.
I copied them from my local machine into the container using docker cp .... <container_id>:/apps/genie_app
All the application, command, and cluster files were copied with docker cp.
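For example, roughly like this (the container id and paths are placeholders for illustration, not my exact layout):

# Copy the application, command and cluster files into the running container
# (container id and destination paths are placeholders)
docker cp ./genie_app <container_id>:/apps/genie_app
docker cp ./hadoop <container_id>:/apps/genie_app/hadoop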

Any help would be appreciated.

Tom Gianos

Dec 17, 2018, 1:17:58 PM
to genie
Hi,

Can you provide more context? Where is this error happening? Could you share the server logs and client logs?

Thanks,
Tom

IB

Dec 17, 2018, 10:39:46 PM
to genie
Hi Tom,

    The relevant lines from genie.log are below:
mkdir: /Users/att.indranilbHandling exit signal (code: 1)
Handling exit signal (code: 1)
c/hadoop: File exists

The client logs at DEBUG level:

DEBUG:com.netflix.genie.jobs.adapter.genie_3:payload to genie 3:
DEBUG:com.netflix.genie.jobs.adapter.genie_3:{
    "clusterCriterias": [
        {
            "tags": [
                "sched:test",
                "type:yarn"
            ]
        },
        {
            "tags": [
                "type:genie"
            ]
        }
    ],
    "commandArgs": "--class org.apache.spark.examples.SparkPi ${SPARK_HOME}/lib/spark-examples*.jar 10",
    "commandCriteria": [
        "type:spark-submit"
    ],
    "disableLogArchival": false,
    "id": "7f49554c-0274-11e9-976b-000c6c077b63",
    "name": "Genie Demo Spark Submit Job",
    "user": "att.indranilb",
    "version": "3.0.0"
}
DEBUG:com.netflix.pygenie.utils:"POST http://localhost:8080/api/v3/jobs"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "POST /api/v3/jobs HTTP/1.1" 202 0
DEBUG:com.netflix.pygenie.utils:"GET http://localhost:8080/api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63 HTTP/1.1" 200 None
DEBUG:com.netflix.pygenie.utils:"GET http://localhost:8080/api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status HTTP/1.1" 200 None
Job 7f49554c-0274-11e9-976b-000c6c077b63 is INIT
http://localhost:8080/jobs?id=7f49554c-0274-11e9-976b-000c6c077b63&rowId=7f49554c-0274-11e9-976b-000c6c077b63
DEBUG:com.netflix.pygenie.utils:"GET http://localhost:8080/api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status HTTP/1.1" 200 None
.DEBUG:com.netflix.pygenie.utils:"GET http://localhost:8080/api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status HTTP/1.1" 200 None
DEBUG:com.netflix.pygenie.utils:"GET http://localhost:8080/api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status HTTP/1.1" 200 None

DEBUG:com.netflix.pygenie.utils:"GET http://localhost:8080/api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63/status HTTP/1.1" 200 None
DEBUG:com.netflix.pygenie.utils:"GET http://localhost:8080/api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63"
DEBUG:com.netflix.pygenie.utils:headers: {'user-agent': '1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/nflx-genie-client/3.6.6'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/v3/jobs/7f49554c-0274-11e9-976b-000c6c077b63 HTTP/1.1" 200 None
Job 7f49554c-0274-11e9-976b-000c6c077b63 finished with status FAILED
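For reference, the same submission can be reproduced directly against the REST API with curl, using the payload from the DEBUG log above (a sketch, minus the client-generated id):

# Submit the job request from the DEBUG log to the v3 jobs endpoint;
# the server replies 202 Accepted with a Location header for the new job.
curl -s -X POST http://localhost:8080/api/v3/jobs \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Genie Demo Spark Submit Job",
    "user": "att.indranilb",
    "version": "3.0.0",
    "clusterCriterias": [
      {"tags": ["sched:test", "type:yarn"]},
      {"tags": ["type:genie"]}
    ],
    "commandCriteria": ["type:spark-submit"],
    "commandArgs": "--class org.apache.spark.examples.SparkPi ${SPARK_HOME}/lib/spark-examples*.jar 10",
    "disableLogArchival": false
  }'

# Poll the job status the same way the Python client does:
curl -s http://localhost:8080/api/v3/jobs/<job-id>/status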

As mentioned, I used the same code as in the demo Docker cluster, only changing the Spark version to 2.4.0 and Hadoop to 3.0.2.
I have attached the code as Archive.zip. My steps:
1. Start the genie-app.
2. Run setup.py under src/.
3. Run the Spark submit or Hadoop job under src/examples.

Attachment: Archive.zip

IB

Dec 17, 2018, 10:50:43 PM
to genie
Hi Tom,

  Just to add to this:
I found a property "com.netflix.genie.server.job.manager.yarn.command.mkdir". How do I use it? That is, how can I set it to `mkdir -p`?

Thanks and regards,
-- IB

IB

Dec 18, 2018, 5:05:39 AM
to genie
Hi Tom,

    I got it working; the mkdir issue is now resolved. There was an issue in my setup script.
Thanks for your help.

I may have some other questions, which I will post in a separate thread.

Regards,
-- IB

IB

Dec 18, 2018, 5:43:09 AM
to genie
Hi Tom,

   I have two command files, yarn302.yml and hadoop302.yml.

yarn302.yml:

id: yarn302
name: yarn
user: att.indranilb
description: Yarn Command
status: ACTIVE
setupFile:
configs: []
executable: ${HADOOP_HOME}/sbin/start-yarn.sh
version: 3.0.2
tags:
  - type:yarn
  - ver:3.0.2
checkDelay: 5000

hadoop302.yml:

id: hadoop302
name: hadoop
user: att.indranilb
description: Hadoop Command
status: ACTIVE
setupFile:
configs: []
executable: ${HADOOP_HOME}/sbin/start-dfs.sh
version: 3.0.2
tags:
  - type:hadoop
  - ver:3.0.2
checkDelay: 5000
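For completeness, I register these through the REST API, roughly like this (a sketch; the JSON just mirrors yarn302.yml above):

# Register the yarn302 command with the Genie server
curl -s -X POST http://localhost:8080/api/v3/commands \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "yarn302",
    "name": "yarn",
    "user": "att.indranilb",
    "version": "3.0.2",
    "description": "Yarn Command",
    "status": "ACTIVE",
    "executable": "${HADOOP_HOME}/sbin/start-yarn.sh",
    "tags": ["type:yarn", "ver:3.0.2"],
    "checkDelay": 5000
  }'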

When I submit any Spark job using the demo run_spark_submit_job.py, the job waits for YARN on port 8032 and for the NameNode/DataNodes on port 9000.
I have to start both manually on my local machine before the job succeeds. Is that expected? I thought that when I submit a job, Genie would take care of starting the Hadoop cluster.
Please clarify.
Also, can you point me to any document or example with steps for connecting to EMR?

Thanks and regards,
-- IB


Tom Gianos

Dec 18, 2018, 5:13:16 PM
to genie
Genie is not responsible for launching clusters. Just as in the demo, where the Hadoop cluster is launched as a separate container and its information is then registered with Genie, your environment should do the same. For EMR, you would launch your EMR cluster and, once it is up, register its configuration with Genie in a separate process. At Netflix, our clusters are generally launched, the metadata files (*-site.xml) are placed on S3, and the cluster is then registered with Genie so it can be used.
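As a rough sketch (the bucket, id, and tags below are made up for illustration): after the EMR cluster is up, copy its *-site.xml files to S3 and register the cluster with Genie pointing at those configs:

# Register an already-running EMR cluster with Genie
# (bucket, id and tags are hypothetical; the Genie server needs AWS
# credentials configured to fetch these configs at job setup time)
curl -s -X POST http://localhost:8080/api/v3/clusters \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "emr-test-1",
    "name": "emr-test",
    "user": "att.indranilb",
    "version": "3.0.2",
    "status": "UP",
    "tags": ["sched:test", "type:yarn"],
    "configs": [
      "s3://my-bucket/genie/emr-test-1/core-site.xml",
      "s3://my-bucket/genie/emr-test-1/yarn-site.xml",
      "s3://my-bucket/genie/emr-test-1/hdfs-site.xml"
    ]
  }'

After that, jobs whose cluster criteria match those tags will be routed to the EMR cluster.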

If you haven't already, you should probably read the documentation, particularly the Concepts section.