Cloudera Nodes + Job Tracker

Brian Orwig

Jun 26, 2017, 12:30:00 PM
to gobblin-users
I am currently running Gobblin in MR mode, but I see no sign that the work is being distributed to the other MR nodes in the Hadoop cluster, and nothing shows up in the job tracker. I am using CDH 5.8.1.

I have the following values set in the mapred-site.xml (note I replaced the real host with $host).

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>$host:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>$host:19888</value>
</property>

I also specifically added the following option to the gobblin job: 

--jt $host:10020

With both of these I see nothing in the job tracker. 

Also, when I look at the CDH console, or even run top on the nodes, I can't see any MR jobs running out there. It looks like everything is running on the one server where Gobblin is running.

Here is the script that I use to start the job: 

#!/bin/sh

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/app/gobblin/0.9.0/hadoop
export HADOOP_BIN_DIR=/opt/cloudera/parcels/CDH/bin
export HADOOP_CLIENT_OPTS="-Xmx16G"
export HADOOP_HEAPSIZE=16000
export CUSTOMER=
export DATA_TYPE=res
export LOG_TYPE=
export DATA_DIR=/data/connectivity/gobblin/
export TOPIC_WHITELIST=res_raw
for jarFile in `ls /app/gobblin/0.9.0/derby_lib/*`
do
  export DERBY_JARS=$jarFile,${DERBY_JARS}
done

/app/gobblin/0.9.0/bin/gobblin-mapreduce.sh --jt $host:10020 --fs hdfs://nameservice1 --workdir hdfs://nameservice1/etl/connectivity/gobblin --jars $DERBY_JARS --conf /app/gobblin/0.9.0/job_conf/res_ec2.pull

Is there something else that I need to have set somewhere? 

Sahil Takiar

Jun 26, 2017, 4:35:41 PM
to Brian Orwig, gobblin-users
The value for HADOOP_HOME doesn't look correct. You want it to point to the Hadoop installation on your Cloudera cluster.

--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-users+unsubscribe@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/a1da9392-4adc-4a7c-942c-6d247f88f46e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Sahil Takiar
Software Engineer at Cloudera
takiar...@gmail.com | (510) 673-0309

Brian Orwig

Jun 26, 2017, 7:25:07 PM
to gobblin-users, brian...@derbysoft.net
Our Gobblin instance is not running on an MR node in the cluster, but the HADOOP_HOME=/app/gobblin/0.9.0/hadoop folder contains all of the Hadoop configuration files, which were copied from one of the MR nodes.

/app/gobblin/0.9.0/hadoop/conf/
core-site.xml  hadoop-env.sh  hdfs-site.xml  hive-env.sh  hive-site.xml  log4j.properties  mapred-site.xml  redaction-rules.json  ssl-client.xml  topology.map  topology.py  yarn-site.xml

We assumed that Gobblin would just use these files to submit the MR jobs to the cluster. If this is not a correct assumption and Gobblin (and Azkaban, which runs the jobs) needs to run on an MR node, then please let me know.
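
One way to test that assumption is to print the settings in the copied conf directory that decide where a job gets submitted. The sketch below is only a rough check, not anything Gobblin ships: the helper names get_hadoop_prop and show_submit_target are made up for illustration, and the awk parsing assumes each <name> and <value> element sits on a single line, as in the mapred-site.xml snippet earlier in the thread. In Hadoop 2, mapreduce.framework.name defaults to local when it is unset, which runs the job inside the submitting JVM. Note also that the --jt value used above, $host:10020, is the same address configured as mapreduce.jobhistory.address; assuming --jt is meant to carry the ResourceManager address on a YARN cluster, it may be worth comparing it against yarn.resourcemanager.address (on an HA setup that property may only exist with a per-ResourceManager suffix, which this sketch does not look for).

```shell
#!/bin/sh
# Sketch: print selected properties from a Hadoop client conf directory.
# get_hadoop_prop and show_submit_target are hypothetical helper names.

# Print the <value> of property $2 from the *-site.xml file $1.
# Assumes each <name>...</name> and <value>...</value> fits on one line
# and that <name> comes before <value> inside a <property> element.
get_hadoop_prop() {
  awk -v want="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$1"
}

# Print the settings that decide where a job is submitted, for conf dir $1.
# A missing file or property is reported as "unset".
show_submit_target() {
  fw=$(get_hadoop_prop "$1/mapred-site.xml" mapreduce.framework.name)
  rm=$(get_hadoop_prop "$1/yarn-site.xml" yarn.resourcemanager.address)
  echo "mapreduce.framework.name=${fw:-unset}"
  echo "yarn.resourcemanager.address=${rm:-unset}"
}
```

After sourcing the functions, running show_submit_target /app/gobblin/0.9.0/hadoop/conf on the Gobblin host would show whether the client config there names yarn as the framework and which ResourceManager address it carries.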

Thanks
-Brian