Camus says the job completed, but no data is written to HDFS


Deep Shah

Jul 7, 2015, 7:15:07 PM
to camu...@googlegroups.com
Hello All,

I am using Camus to pull simple JSON logs from a Kafka cluster and dump them into HDFS via the Camus MapReduce job.

Part 1: Local Kafka cluster (localhost:9092) && local Camus (localhost) --> topic: test
===============================================================
Everything is working: I can fetch logs from the Kafka cluster, JsonStringMessageDecoder parses the JSON, and the value of the '@timestamp' key is used as the Camus timestamp. The 'message' key in the JSON carries the payload that gets stored in HDFS.

Kafka Sample Log
-------------------
{"message":"cosis`20150421000308.975`3xQTVv...","@version":"1","type":"core","host":"piab1-app1-3-piab.ops.sfdc.net","path":"/home/sfdc/logs/sfdc/na6.sfdc.na6-app3-1-sjl.app.20150421.194.gmt.log","timestamp_produce":"2015-05-27T13:35:49.371Z","process_id":"logstash1","log_record_type":"cosis","@timestamp":"2015-05-01T00:03:08.975Z","fingerprint":"A42BF62105AC0AC7240E09C7D4E4E87E"}

Camus HDFS

Example file: Yearly/2015/05/01/03/xyz.log
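As a rough sketch of the decode-and-partition behavior (not the actual Camus code, which is Java; this just mirrors the logic in Python, and the helper name is made up for illustration):

```python
import json
from datetime import datetime

def partition_path(raw_message: str) -> str:
    """Mimic the decoder flow: read '@timestamp' from a JSON log line
    and build an hourly Yearly/YYYY/MM/DD/HH partition path for HDFS."""
    record = json.loads(raw_message)
    # Camus uses this field's value as the record's event time.
    ts = datetime.strptime(record["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return ts.strftime("Yearly/%Y/%m/%d/%H")

# Trimmed version of the sample log above.
sample = '{"message":"cosis`...","@timestamp":"2015-05-01T00:03:08.975Z"}'
print(partition_path(sample))  # Yearly/2015/05/01/00
```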


Part 2: Remote Kafka cluster (172.16.1.96:6667) && local Camus --> topic: test
===================================================================
Here the job runs with no errors, but no data is dumped into HDFS:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/home/deep.shah/tmp
2015-07-07 14:53:13,705 INFO  [main] kafka.CamusJob - Dir Destination set to: /camus/topics
2015-07-07 14:53:13,777 INFO  [main] kafka.CamusJob - Previous execution: hdfs://test-local-EMPTY/camus/exec/history/2015-07-07-20-10-09
2015-07-07 14:53:13,797 INFO  [main] kafka.CamusJob - New execution temp location: /camus/exec/2015-07-07-21-53-13
2015-07-07 14:53:14,612 INFO  [main] kafka.CamusJob - Fetching metadata from broker 172.16.1.96:6667 with client id my_client_name for 0 topic(s) []
2015-07-07 14:53:14,706 INFO  [main] zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
2015-07-07 14:53:14,707 INFO  [main] compress.CodecPool - Got brand-new compressor [.deflate]
2015-07-07 14:53:14,773 INFO  [main] kafka.CamusJob - previous offset file:hdfs://test-local-EMPTY/camus/exec/history/2015-07-07-20-10-09/offsets-previous
2015-07-07 14:53:14,823 INFO  [main] compress.CodecPool - Got brand-new decompressor [.deflate]
2015-07-07 14:53:14,911 INFO  [main] Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2015-07-07 14:53:15,094 INFO  [main] mapreduce.JobSubmitter - number of splits:0
2015-07-07 14:53:15,126 INFO  [main] Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2015-07-07 14:53:15,126 INFO  [main] Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-07-07 14:53:15,244 INFO  [main] mapreduce.JobSubmitter - Submitting tokens for job: job_1436297498215_0004
2015-07-07 14:53:15,361 INFO  [main] impl.YarnClientImpl - Submitted application application_1436297498215_0004
2015-07-07 14:53:15,384 INFO  [main] mapreduce.Job - The url to track the job: http://deep-shah-wsl.internal.salesforce.com:8088/proxy/application_1436297498215_0004/
2015-07-07 14:53:15,385 INFO  [main] mapreduce.Job - Running job: job_1436297498215_0004
2015-07-07 14:53:20,471 INFO  [main] mapreduce.Job - Job job_1436297498215_0004 running in uber mode : false
2015-07-07 14:53:20,472 INFO  [main] mapreduce.Job -  map 0% reduce 0%
2015-07-07 14:53:21,499 INFO  [main] mapreduce.Job - Job job_1436297498215_0004 completed successfully
2015-07-07 14:53:21,561 INFO  [main] mapreduce.Job - Counters: 2
Job Counters 
Total time spent by all maps in occupied slots (ms)=0
Total time spent by all reduces in occupied slots (ms)=0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Group: Job Counters 
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all maps in occupied slots (ms): 0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all reduces in occupied slots (ms): 0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Group: Job Counters 
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all maps in occupied slots (ms): 0
2015-07-07 14:53:21,565 INFO  [main] kafka.CamusJob - Total time spent by all reduces in occupied slots (ms): 0
2015-07-07 14:53:21,567 INFO  [main] kafka.CamusJob - Moving execution to history : /camus/exec/history/2015-07-07-21-53-13
2015-07-07 14:53:21,583 INFO  [main] kafka.CamusJob - Job finished
2015-07-07 14:53:21,616 INFO  [main] mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-07-07 14:53:21,695 INFO  [main] reporter.BaseReporter - ***********Timing Report*************
Job time (seconds):
       pre setup    0.0 (0%)
      get splits    0.0 (0%)
      hadoop job    6.0 (75%)
          commit    0.0 (0%)
Total: 0 minutes 8 seconds

Hadoop job task times (seconds):
             min 9223372036854776.0
            mean    NaN
             max    0.0
            skew    NaN/0.0 = NaN

Task wait time (seconds):
             min 9223372036854776.0
            mean    NaN
             max    0.0

Hadoop task breakdown:
           kafka
          decode
      map output
           other

  Total MB read: 0
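Since the log above shows `number of splits:0` (Camus created no map tasks, i.e. it found nothing to pull), one quick sanity check is whether the remote broker port is even reachable from the Camus host. A minimal sketch (host and port taken from the log above; this tests plain TCP reachability only, not a full Kafka handshake):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Broker address from the CamusJob log above.
    print(port_reachable("172.16.1.96", 6667))
```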


Please help me debug or find a solution to Part 2.



Rodolfo Mijangos

Jul 17, 2015, 5:53:47 PM
to camu...@googlegroups.com
I have the same issue...

Ashish Dutt

Jul 22, 2015, 11:57:17 PM
to Camus - Kafka ETL for Hadoop, dssh...@gmail.com
Deep,
Can you share how you were able to execute Part 1? I would like to know what changes you made to the conf.properties file. My apologies, I cannot answer Part 2 because apparently I'm not able to get past Part 1 itself.
This is what I am doing for part 1

Step 1: create a new topic test using command 
                       /usr/bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Step 2: Next, I list topics using command 
                      /usr/bin/kafka-topics --list --zookeeper localhost:2181 # I see the topic test listed
Step 3: Now, I insert a message into the topic test using the command 
                      /usr/bin/kafka-console-producer --broker-list localhost:9092 --topic test  # able to insert a message
Step 4: Finally, I execute the command 
                      /usr/bin/kafka-console-consumer --zookeeper localhost:2181 --topic test --from-beginning  # but nothing happens; the cursor just blinks on a new line and I have to press Ctrl+Z to kill the operation

Step 5: I then try the command kafka-topics --list --zookeeper localhost:2181 --topic test # expecting, per the guide I followed, to read data from the Kafka topic and write it to standard output, but nothing happens; the cursor just keeps blinking on the next line (I followed this link here )

Executing step 5 leads me to conclude that the message is not being written to the topic. Why is it not being written?

Question 1: How do I check whether the message was inserted successfully? What command does that?
Question 2: Why can't I see the topic test in Hue? (I'm looking in "File browser - /user/kafka/" and I see the camus and exec folders.)

What am I doing wrong here? Please suggest.

Thank you,
Ashish

Sean Braithwaite

Aug 5, 2015, 11:13:13 AM
to Camus - Kafka ETL for Hadoop, dssh...@gmail.com
Also having the same issue.

15/08/05 16:42:53 INFO kafka.CamusJob: fake-producer    uri:tcp://xxx.soundcloud.com:9092      leader:1175089688       partition:0     earliest_offset:0       offset:0        latest_offset:771662    avg_msg_size:1024       estimated_size:790181888
15/08/05 16:42:53 WARN common.EmailClient: Email list is empty. Will not send email. Message: The current offset is too close to the earliest offset, Camus might be falling behind: fake-producer      uri:tcp://xxx.soundcloud.com:9092       leader:1175089688       partition:0     earliest_offset:0       offset:0        latest_offset:771662    avg_msg_size:1024       estimated_size:790181888

Running the console consumer on the fake-producer topic yields the data I would expect, so the data is in Kafka; for some reason Camus is just not reading it.

Any help would be appreciated.

Deep Shah

Aug 5, 2015, 1:14:38 PM
to Camus - Kafka ETL for Hadoop
Hi, I am finally able to run my Camus job. That error is mainly caused by an issue with either proper topic allocation or the offsets. I am attaching my camus.properties file; please have a look.

I would also like to mention: please comment out the line "kafka.move.to.last.offset.list=". It makes Camus fetch from the latest Kafka offset, and hence no data will be fetched.
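A minimal sketch of that change in camus.properties (not the full attached file; surrounding properties omitted):

```properties
# Comment this out: when present, Camus jumps the listed topics to the
# latest offset, so no earlier messages are ever pulled.
# kafka.move.to.last.offset.list=
```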
camus.properties

Sean Braithwaite

Aug 7, 2015, 8:00:31 AM
to Camus - Kafka ETL for Hadoop, dssh...@gmail.com
I don't want to hijack this thread, but I will mention that my issue was related to the Kafka cluster not being accessible from the Hadoop cluster. The connection errors appeared in the mapper output, not in the console that submitted the job.