Log4J Map Reduce Mode

Brian Orwig

Jun 27, 2016, 3:59:32 PM
to gobblin-users
I am trying to get Log4J working in MapReduce mode. I have added the -D log4j options to the gobblin-mapreduce.sh script (see below), but it is not picking these changes up.


# Launch the job to run on Hadoop
$HADOOP_BIN_DIR/hadoop jar \
        $FWDIR_LIB/gobblin-runtime.jar \
        gobblin.runtime.mapreduce.CliMRJobLauncher \
        -D log4j.configuration=file:///app/gobblin/1.0/conf/log4j.xml \
        -D mapred.child.java.opts=-Dlog4j.configuration=file:///app/gobblin/1.0/conf/log4j.xml \

I tried without spaces as well:

        -Dlog4j.configuration=file:///app/gobblin/1.0/conf/log4j.xml \
        -Dmapred.child.java.opts=-Dlog4j.configuration=file:///app/gobblin/1.0/conf/log4j.xml \

I have set the logger to log to both a local directory and a directory within HDFS, and neither worked. I have also copied the log4j.xml file into HDFS so that it is visible at that path as well.

Is there something else that I need to be doing? 

Thanks


Issac Buenrostro

Jun 27, 2016, 4:09:41 PM
to Brian Orwig, gobblin-users
Hi Brian,
The configuration above should work for the Gobblin driver, but not for the actual mappers (i.e. you will see the expected logs in the local log file). Is this the case? If not, you can set the -Dlog4j.debug option, and it will print some more information.
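
One way to pass that flag to the driver JVM is through HADOOP_CLIENT_OPTS, which the hadoop launcher script adds to client commands such as "hadoop jar" (a minimal sketch; run it in the shell before launching the job):

# Sketch: log4j will then report which configuration file it loads and from where.
export HADOOP_CLIENT_OPTS="-Dlog4j.debug $HADOOP_CLIENT_OPTS"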

For MR it will be a bit more complicated, because the path "file:///" will be looked up in the local file system of the node running the mapper, not in HDFS. You have a few options here: 1) change the cluster-wide log4j.properties (generally not possible if it is not your cluster), 2) modify the Gobblin code to programmatically set up its custom logging, or 3) newer versions of MapReduce apparently offer an option for this (see https://issues.apache.org/jira/browse/MAPREDUCE-6052).

What kind of logging are you trying to achieve?
Issac 

Brian Orwig

Jun 27, 2016, 4:27:58 PM
to gobblin-users, brian...@derbysoft.net
Thanks for the quick response!

I will add the -Dlog4j.debug option and see if I can get more information. I am not getting any logging in local files for the driver either. 

For the other options: it is our cluster, so we can change the cluster-wide log4j.properties file. Where is this generally located?
I will also look at the newer versions of MR that you linked. 

Basically, I have written some custom converters for Gobblin; in those we log errors and stats data while converting records, and I need the logging output of those calls saved somewhere (local or HDFS).

-Brian

Sahil Takiar

Jul 2, 2016, 12:06:05 PM
to gobblin-users, brian...@derbysoft.net
Brian, try running the following in your shell "export HADOOP_CLIENT_OPTS=-Dlog4j.configuration=file:///app/gobblin/1.0/conf/log4j.xml $HADOOP_CLIENT_OPTS" and then run your job. That should at least take care of configuring log4j for the driver.
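
For reference, a sketch of that export with the value quoted (the quoting matters, since the value contains a space; the path is the one from earlier in the thread):

# Make the driver JVM load the custom log4j config, then launch the job as usual.
export HADOOP_CLIENT_OPTS="-Dlog4j.configuration=file:///app/gobblin/1.0/conf/log4j.xml $HADOOP_CLIENT_OPTS"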

It seems the property for setting the log4j properties file for map and reduce tasks is called "mapreduce.job.log4j-properties-file" (more documentation is here: https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml). This feature only seems to be available in Hadoop 2.6.0+.
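
A rough sketch of what using that property could look like, assuming a Hadoop 2.6.0+ cluster and assuming the launcher forwards Hadoop's generic -D options into the job configuration (if it does not, the property can instead be set in mapred-site.xml); the log4j-tasks.properties path is just a placeholder:

# Sketch only: points the map/reduce task JVMs at a log4j .properties file.
$HADOOP_BIN_DIR/hadoop jar \
        $FWDIR_LIB/gobblin-runtime.jar \
        gobblin.runtime.mapreduce.CliMRJobLauncher \
        -D mapreduce.job.log4j-properties-file=/app/gobblin/1.0/conf/log4j-tasks.properties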

--Sahil

Sahil Takiar

Jul 2, 2016, 12:36:32 PM
to gobblin-users, brian...@derbysoft.net
Actually, apologies. That probably won't work, because the "bin/gobblin-mapreduce.sh" script already sets HADOOP_CLIENT_OPTS to "-Dlog4j.configuration=file:conf/log4j-mapreduce.xml". So you could (1) replace conf/log4j-mapreduce.xml with your own configuration file, or (2) modify the script to remove that log4j file and add in your own.
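
For example, option (1) could be a sketch as simple as this, run from the Gobblin install directory (the backup filename is just a suggestion):

# Keep a copy of the packaged file, then drop in the custom log4j config.
cp conf/log4j-mapreduce.xml conf/log4j-mapreduce.xml.orig
cp /app/gobblin/1.0/conf/log4j.xml conf/log4j-mapreduce.xml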

--Sahil

lbe...@gmail.com

Jul 17, 2016, 4:31:01 PM
to gobblin-users, brian...@derbysoft.net
Hi Brian,

If you run the job with gobblin-mapreduce.sh, you can only set the logging directory through the command line (--logdir); you can't plug in your own driver-side logging conf, since it's taken from $GOBBLIN_INSTALL/conf. But you can modify that one, as Sahil mentioned.
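
For example (a sketch: --logdir is the flag mentioned above, while the job-configuration flag and the paths are placeholders that may differ between Gobblin versions):

bin/gobblin-mapreduce.sh \
        --conf /app/gobblin/1.0/job-conf/my-job.pull \
        --logdir /app/gobblin/1.0/logs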

Thanks,
Lorand

Brian Orwig

Aug 3, 2016, 4:39:54 PM
to gobblin-users, brian...@derbysoft.net
Thanks for the help!!!
Adding the "export HADOOP_CLIENT_OPTS=-Dlog4j.configuration=file:///app/gobblin/1.0/conf/log4j.xml $HADOOP_CLIENT_OPTS" option got me what I needed to get logging working for the driver.