Genie Demo Help


Bill Blazek

Jul 25, 2017, 4:34:56 PM
to genie

I'm a total newbie at Genie and Docker, and I'm
trying to follow the demo at
Section 3.
In the command window, I've run
docker exec -it genie_demo_client_3.0.3 /bin/bash
and have the bash prompt and can ls the files.
I've initialized genie with ./init_demo.py and
can see the data in genie at http://localhost:8080.

I get to #10, Run some jobs.
When I try to run
./run_hadoop_job.py
it says it's in INIT, then sits there for a very long time (15 minutes or more),
slowly printing dots.
Genie says the job is RUNNING.
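For reference, the dots the demo client prints come from a status-polling loop. A minimal Python sketch of that pattern follows; the get_status callable here is hypothetical, standing in for however the demo script actually queries Genie's job API:

```python
import time

def wait_for_job(get_status, poll_interval=5.0, timeout=900.0):
    """Poll a job until it reaches a terminal state, printing a dot per poll.

    get_status is a hypothetical callable standing in for however the demo
    script queries Genie's job API; it should return one of 'INIT',
    'RUNNING', 'SUCCEEDED', 'FAILED', or 'KILLED'.
    """
    terminal = {'SUCCEEDED', 'FAILED', 'KILLED'}
    waited = 0.0
    status = get_status()
    while status not in terminal:
        if waited >= timeout:
            raise TimeoutError('job still %s after %.0f s' % (status, waited))
        print('.', end='', flush=True)  # the dots seen in the demo output
        time.sleep(poll_interval)
        waited += poll_interval
        status = get_status()
    return status
```

A loop like this will sit forever printing dots if the job never leaves RUNNING, which matches the behavior described above.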


2017-07-25 20:02:16,508 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at genie-hadoop-prod/172.18.0.4:8032
2017-07-25 20:02:18,560 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:19,562 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:20,567 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:21,575 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:22,576 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:23,578 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:24,581 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:25,584 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:26,585 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-25 20:02:27,587 INFO [main] ipc.Client (Client.java:handleConnectionFailure(866)) - Retrying connect to server: genie-hadoop-prod/172.18.0.4:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
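The RetryUpToMaximumCountWithFixedSleep policy named in those log lines means the Hadoop IPC client keeps retrying the connection a fixed number of times, sleeping 1 second (sleepTime=1000 ms) between attempts, before giving up on that connection attempt. A rough Python analogue of that fixed-sleep retry pattern (a sketch, not Hadoop's actual implementation):

```python
import time

def retry_fixed_sleep(attempt, max_retries=10, sleep_seconds=1.0):
    """Rough analogue of Hadoop's RetryUpToMaximumCountWithFixedSleep.

    Calls attempt(); on failure, retries up to max_retries more times,
    sleeping a fixed interval between tries, then re-raises the last error.
    """
    for tried in range(max_retries + 1):
        try:
            return attempt()
        except Exception:
            if tried == max_retries:
                raise  # retries exhausted, surface the failure
            time.sleep(sleep_seconds)
```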

Why can't it connect to the server:
genie-hadoop-prod/172.18.0.4:8032?
What else do I need to do?
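One quick way to narrow this down is to check whether the ResourceManager port answers a plain TCP connect from inside the client container. A generic sketch (the hostname is taken from the log lines above; run it inside the compose network, since the 172.18.0.x addresses are not reachable from the host):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a plain TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this inside the demo client container, where the compose network
# resolves the Hadoop container's hostname (name assumed from the logs):
# print(port_open('genie-hadoop-prod', 8032))
```

If this returns False, the Hadoop container is down or never finished starting, which points at a container problem rather than a Genie one.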

If I eventually kill the job on the command line with Ctrl-C, it shows up in Genie as FAILED.
If I run it again, both jobs show up in Genie: the FAILED job and
the new RUNNING job, which again retries continuously.

Thanks in advance.

Bill Blazek

Jul 25, 2017, 4:44:00 PM
to genie
P.S.

The command I'm running is
./run_hadoop_job.py sla or
./run_hadoop_job.py test.
Without the sla or test argument, it just returns an error immediately.

Bill Blazek

Jul 25, 2017, 5:03:51 PM
to genie
P.P.S.
The Genie Demo Spark Shell Job and Genie Demo Yarn Job,
./run_spark_shell_job.py and
./run_yarn_job.py
seem to run correctly and return
Usage: ./bin/spark-shell [options] Options: ...
and
Total number of applications ...
respectively.

The Genie Demo Spark Submit Job also errors out with connection refused errors.

Tom Gianos

Jul 25, 2017, 5:35:57 PM
to genie
Hi Bill,

I suspect the problem is that your Hadoop containers are crashing before Genie can reach them with your jobs; that's why you're getting the connection errors. Below is an environment I just ran on my local machine, and it worked fine. You probably need to give your Docker daemon more memory. After you run "docker-compose up -d", make sure that "docker ps -a" shows all five containers running (i.e., status Up), as mine does below.

Let us know if that helps.

Tom

Docker -> about: Version 17.06.0-ce-mac19 (18663)
Docker -> preferences -> advanced: 4 CPU, 6 GB memory

Docker images:
netflixoss/genie-demo-client                3.0.3               dfc0cd1e85a8        4 months ago        263MB
netflixoss/genie-demo-apache                3.0.3               da20809627a3        4 months ago        1.45GB
netflixoss/genie-app                        3.0.3               11793e6cd700        4 months ago        480MB
sequenceiq/hadoop-docker                    2.7.1               42efa33d1fa3        19 months ago       1.76GB

docker ps -a :
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS                                                                                                                                                                                           NAMES
c181e0a4c59a        netflixoss/genie-demo-client:3.0.3   "/bin/bash"              33 seconds ago      Up 31 seconds                                                                                                                                                                                                       genie_demo_client_3.0.3
b49365f142e6        netflixoss/genie-app:3.0.3           "java -Djava.secur..."   38 seconds ago      Up 33 seconds       0.0.0.0:8080->8080/tcp                                                                                                                                                                          genie_demo_app_3.0.3
7af271c44895        sequenceiq/hadoop-docker:2.7.1       "/bin/bash -c '/us..."   40 seconds ago      Up 37 seconds       2122/tcp, 8030-8033/tcp, 8040/tcp, 0.0.0.0:8088->8088/tcp, 8042/tcp, 49707/tcp, 50010/tcp, 0.0.0.0:19888->19888/tcp, 0.0.0.0:50070->50070/tcp, 50020/tcp, 50090/tcp, 0.0.0.0:50075->50075/tcp   genie_demo_hadoop_prod_3.0.3
f5834e42db15        sequenceiq/hadoop-docker:2.7.1       "/bin/bash -c '/us..."   40 seconds ago      Up 37 seconds       2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50090/tcp, 0.0.0.0:8089->8088/tcp, 0.0.0.0:19889->19888/tcp, 0.0.0.0:50071->50070/tcp, 0.0.0.0:50076->50075/tcp   genie_demo_hadoop_test_3.0.3
f38d3e4edb15        netflixoss/genie-demo-apache:3.0.3   "httpd-foreground"       40 seconds ago      Up 38 seconds       80/tcp                                                                                                                                                                                          genie_demo_apache_3.0.3

Spark Submit:
docker exec -it genie_demo_client_3.0.3 /bin/bash
bash-4.3# ./init_demo.py
WARNING:__main__:Created Hadoop 2.7.1 application with id = [hadoop271]
WARNING:__main__:Created Spark 1.6.3 application with id = [spark163]
WARNING:__main__:Created Spark 2.0.1 application with id = [spark201]
WARNING:__main__:Created Hadoop command with id = [hadoop271]
WARNING:__main__:Created HDFS command with id = [hdfs271]
WARNING:__main__:Created Yarn command with id = [yarn271]
WARNING:__main__:Created Spark 1.6.3 Shell command with id = [sparkshell163]
WARNING:__main__:Created Spark 1.6.3 Submit command with id = [sparksubmit163]
WARNING:__main__:Created Spark 2.0.1 Shell command with id = [sparkshell201]
WARNING:__main__:Created Spark 2.0.1 Submit command with id = [sparksubmit201]
WARNING:__main__:Set applications for Hadoop command to = [hadoop271]
WARNING:__main__:Set applications for HDFS command to = [[hadoop271]]
WARNING:__main__:Set applications for Yarn command to = [[hadoop271]]
WARNING:__main__:Set applications for Spark 1.6.3 Shell command to = [[u'hadoop271', u'spark163']]
WARNING:__main__:Set applications for Spark 1.6.3 Submit command to = [[u'hadoop271', u'spark163']]
WARNING:__main__:Set applications for Spark 2.0.1 Shell command to = [[u'hadoop271', u'spark201']]
WARNING:__main__:Set applications for Spark 2.0.1 Submit command to = [[u'hadoop271', u'spark201']]
WARNING:__main__:Created prod cluster with id = [96e1308a-7dcd-4bf7-bcb2-f83c07afd7c4]
WARNING:__main__:Created test cluster with id = [4c5c59f3-cfce-49c6-836a-fc168454c95f]
WARNING:__main__:Added all commands to the prod cluster with id = [96e1308a-7dcd-4bf7-bcb2-f83c07afd7c4]
WARNING:__main__:Added all commands to the test cluster with id = [4c5c59f3-cfce-49c6-836a-fc168454c95f]
bash-4.3# ./run_spark_submit_job.py sla
Job d945fea3-7180-11e7-8cc3-0242ac120006 is INIT
..
Job d945fea3-7180-11e7-8cc3-0242ac120006 finished with status SUCCEEDED


Tom Gianos

Jul 25, 2017, 5:37:02 PM
to genie
PS: The yarn and spark shell commands probably work because they were able to run completely locally on the Genie node without reaching out to the clusters. Yarn may have tried to reach out, but since it couldn't communicate it just returned an empty list of files or processes. I forget exactly what I set that up to do.