Camus says the job completed, but no data is written to HDFS


Deep Shah

Jul 7, 2015, 7:15:07 PM
to camu...@googlegroups.com
Hello All,

I am using Camus to pull simple JSON logs from a Kafka cluster and dump them into HDFS via the Camus MapReduce job.

Part 1: Local Kafka cluster (localhost:9092) && local Camus (localhost) --> topic: test
===============================================================
Everything is working: I can fetch logs from the Kafka cluster, JsonStringMessageDecoder parses the JSON, and the value of the '@timestamp' key is used as the Camus timestamp. The 'message' key in the JSON carries the payload that gets stored in HDFS.

Kafka Sample Log
-------------------
{"message":"cosis`20150421000308.975`3xQTVv...","@version":"1","type":"core","host":"piab1-app1-3-piab.ops.sfdc.net","path":"/home/sfdc/logs/sfdc/na6.sfdc.na6-app3-1-sjl.app.20150421.194.gmt.log","timestamp_produce":"2015-05-27T13:35:49.371Z","process_id":"logstash1","log_record_type":"cosis","@timestamp":"2015-05-01T00:03:08.975Z","fingerprint":"A42BF62105AC0AC7240E09C7D4E4E87E"}

Camus HDFS

Example file: Yearly/2015/05/01/03/xyz.log
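As a rough sketch of the decode-and-partition behavior (not the actual Camus code, which is Java; this just mirrors the logic in Python, and the helper name is made up for illustration):

```python
import json
from datetime import datetime

def partition_path(raw_message: str) -> str:
    """Mimic the decoder flow: read '@timestamp' from a JSON log line
    and build an hourly Yearly/YYYY/MM/DD/HH partition path for HDFS."""
    record = json.loads(raw_message)
    # Camus uses this field's value as the record's event time.
    ts = datetime.strptime(record["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return ts.strftime("Yearly/%Y/%m/%d/%H")

# Trimmed version of the sample log above.
sample = '{"message":"cosis`...","@timestamp":"2015-05-01T00:03:08.975Z"}'
print(partition_path(sample))  # Yearly/2015/05/01/00
```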


Part 2: Remote Kafka cluster (172.16.1.96:6667) && local Camus --> topic: test
===================================================================
Here the job runs with no errors, but no data is dumped into HDFS:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/home/deep.shah/tmp
2015-07-07 14:53:13,705 INFO  [main] kafka.CamusJob - Dir Destination set to: /camus/topics
2015-07-07 14:53:13,777 INFO  [main] kafka.CamusJob - Previous execution: hdfs://test-local-EMPTY/camus/exec/history/2015-07-07-20-10-09
2015-07-07 14:53:13,797 INFO  [main] kafka.CamusJob - New execution temp location: /camus/exec/2015-07-07-21-53-13
2015-07-07 14:53:14,612 INFO  [main] kafka.CamusJob - Fetching metadata from broker 172.16.1.96:6667 with client id my_client_name for 0 topic(s) []
2015-07-07 14:53:14,706 INFO  [main] zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
2015-07-07 14:53:14,707 INFO  [main] compress.CodecPool - Got brand-new compressor [.deflate]
2015-07-07 14:53:14,773 INFO  [main] kafka.CamusJob - previous offset file:hdfs://test-local-EMPTY/camus/exec/history/2015-07-07-20-10-09/offsets-previous
2015-07-07 14:53:14,823 INFO  [main] compress.CodecPool - Got brand-new decompressor [.deflate]
2015-07-07 14:53:14,911 INFO  [main] Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2015-07-07 14:53:15,094 INFO  [main] mapreduce.JobSubmitter - number of splits:0
2015-07-07 14:53:15,126 INFO  [main] Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2015-07-07 14:53:15,126 INFO  [main] Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-07-07 14:53:15,244 INFO  [main] mapreduce.JobSubmitter - Submitting tokens for job: job_1436297498215_0004
2015-07-07 14:53:15,361 INFO  [main] impl.YarnClientImpl - Submitted application application_1436297498215_0004
2015-07-07 14:53:15,384 INFO  [main] mapreduce.Job - The url to track the job: http://deep-shah-wsl.internal.salesforce.com:8088/proxy/application_1436297498215_0004/
2015-07-07 14:53:15,385 INFO  [main] mapreduce.Job - Running job: job_1436297498215_0004
2015-07-07 14:53:20,471 INFO  [main] mapreduce.Job - Job job_1436297498215_0004 running in uber mode : false
2015-07-07 14:53:20,472 INFO  [main] mapreduce.Job -  map 0% reduce 0%
2015-07-07 14:53:21,499 INFO  [main] mapreduce.Job - Job job_1436297498215_0004 completed successfully
2015-07-07 14:53:21,561 INFO  [main] mapreduce.Job - Counters: 2
Job Counters 
Total time spent by all maps in occupied slots (ms)=0
Total time spent by all reduces in occupied slots (ms)=0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Group: Job Counters 
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all maps in occupied slots (ms): 0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all reduces in occupied slots (ms): 0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Group: Job Counters 
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all maps in occupied slots (ms): 0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all reduces in occupied slots (ms): 0
2015-07-07 14:53:21,567 INFO  [main] kafka.CamusJob - Moving execution to history : /camus/exec/history/2015-07-07-21-53-13
2015-07-07 14:53:21,583 INFO  [main] kafka.CamusJob - Job finished
2015-07-07 14:53:21,616 INFO  [main] mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-07-07 14:53:21,695 INFO  [main] reporter.BaseReporter - ***********Timing Report*************
Job time (seconds):
       pre setup    0.0 (0%)
      get splits    0.0 (0%)
      hadoop job    6.0 (75%)
          commit    0.0 (0%)
Total: 0 minutes 8 seconds

Hadoop job task times (seconds):
             min 9223372036854776.0
            mean    NaN
             max    0.0
            skew    NaN/0.0 = NaN

Task wait time (seconds):
             min 9223372036854776.0
            mean    NaN
             max    0.0

Hadoop task breakdown:
           kafka
          decode
      map output
           other

  Total MB read: 0
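Since the log above shows `number of splits:0` (Camus created no map tasks, i.e. it found nothing to pull), one quick sanity check is whether the remote broker port is even reachable from the Camus host. A minimal sketch (host and port taken from the log above; this tests plain TCP reachability only, not a full Kafka handshake):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Broker address from the CamusJob log above.
    print(port_reachable("172.16.1.96", 6667))
```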


Please help me debug or find a solution to Part 2.



Rodolfo Mijangos

Jul 17, 2015, 5:53:47 PM
to camu...@googlegroups.com
I have the same issue...

Ashish Dutt

Jul 22, 2015, 11:57:17 PM
to Camus - Kafka ETL for Hadoop, dssh...@gmail.com
Deep,
Can you share how you were able to execute Part 1? I would like to know what changes you made to the conf.properties file. My apologies, I cannot answer Part 2 because apparently I'm not able to get past Part 1 itself.
This is what I am doing for part 1

Step 1: create a new topic test using command 
                       /usr/bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Step 2: Next, I list topics using command 
                      /usr/bin/kafka-topics --list --zookeeper localhost:2181 # I see the topic test listed
Step 3: Now, I insert a message into the topic test using the command 
                      /usr/bin/kafka-console-producer --broker-list localhost:9092 --topic test  # able to insert a message
Step 4: Finally, I execute the command 
                      /usr/bin/kafka-console-consumer --zookeeper localhost:2181 --topic test --from-beginning  # but nothing happens; the cursor just blinks on a new line and I have to press Ctrl+Z to kill the operation

Step 5: I then try the command kafka-topics --list --zookeeper localhost:2181 --topic test # expecting, per the guide I followed, to read data from the Kafka topic and write it to standard output, but nothing happens; the cursor just keeps blinking on the next line (I followed this link here )

Executing step 5 leads me to conclude that the message is not being written to the topic. Why is it not being written?

Question 1: How do I check whether the message was inserted successfully? What command does that?
Question 2: Why can't I see the topic test in Hue? (I'm looking in "File browser - /user/kafka/" and I see the camus and exec folders.)

What am I doing wrong here? Please suggest.

Thank you,
Ashish

Sean Braithwaite

Aug 5, 2015, 11:13:13 AM
to Camus - Kafka ETL for Hadoop, dssh...@gmail.com
Also having the same issue.

15/08/05 16:42:53 INFO kafka.CamusJob: fake-producer    uri:tcp://xxx.soundcloud.com:9092      leader:1175089688       partition:0     earliest_offset:0       offset:0        latest_offset:771662    avg_msg_size:1024       estimated_size:790181888
15/08/05 16:42:53 WARN common.EmailClient: Email list is empty. Will not send email. Message: The current offset is too close to the earliest offset, Camus might be falling behind: fake-producer      uri:tcp://xxx.soundcloud.com:9092       leader:1175089688       partition:0     earliest_offset:0       offset:0        latest_offset:771662    avg_msg_size:1024       estimated_size:790181888

Running the console consumer on the fake-producer topic yields the data I would expect, so the data is in Kafka; for some reason Camus is just not reading it.

Any help would be appreciated.

Deep Shah

Aug 5, 2015, 1:14:38 PM
to Camus - Kafka ETL for Hadoop
Hi, I am finally able to run my Camus job. That error is mainly caused by an issue with either proper topic allocation or the offsets. I am attaching my camus.properties file; please have a look.

I would also like to mention: please comment out the line "kafka.move.to.last.offset.list=". It makes Camus fetch from the latest Kafka offset, and hence no data will be fetched.
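A minimal sketch of that change in camus.properties (not the full attached file; surrounding properties omitted):

```properties
# Comment this out: when present, Camus jumps the listed topics to the
# latest offset, so no earlier messages are ever pulled.
# kafka.move.to.last.offset.list=
```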
camus.properties

Sean Braithwaite

Aug 7, 2015, 8:00:31 AM
to Camus - Kafka ETL for Hadoop, dssh...@gmail.com
I don't want to hijack this thread, but I will mention that my issue was related to the Kafka cluster not being accessible from the Hadoop cluster. The connection errors appeared in the mapper output, not in the console that submitted the job.