Wrong FS: hdfs:///


jeremie.m...@gmail.com

Aug 6, 2014, 11:59:18 AM
to camu...@googlegroups.com
Hi,

I'm testing Camus for Hadoop 2 on HortonWorks sandbox.

Extracting Kafka messages to the local file system works fine, but I get an error when I try to write to HDFS:

[LocalJobRunner] - job_local178888545_0001
java.lang.Exception: java.lang.IllegalArgumentException: Wrong FS: hdfs://163.113.177.244:8020/kafka/message_zer/hourly/2014/08/06/08, expected: file:///
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
...
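For context on where this message comes from: Hadoop's `FileSystem.checkPath` rejects any path whose URI scheme does not match the filesystem's own scheme. Under `LocalJobRunner`, the task commits through `RawLocalFileSystem` (scheme `file`), so an `hdfs://` destination path is rejected. A minimal, simplified sketch of that check (illustrative only, not the actual Hadoop source):

```java
import java.net.URI;

// Simplified sketch of the scheme check behind
// "Wrong FS: hdfs://..., expected: file:///".
// RawLocalFileSystem only accepts paths whose URI scheme is "file"
// (or no scheme at all), so an hdfs:// path handed to the local
// filesystem fails with IllegalArgumentException.
public class WrongFsCheck {

    // true if a filesystem with the given scheme would accept this path
    static boolean accepts(String fsScheme, String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null || scheme.equals(fsScheme);
    }

    public static void main(String[] args) {
        String p = "hdfs://163.113.177.244:8020/kafka/message_zer/hourly/2014/08/06/08";
        if (!accepts("file", p)) {
            // Mirrors the message Hadoop raises in FileSystem.checkPath()
            System.out.println("Wrong FS: " + p + ", expected: file:///");
        }
    }
}
```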

Here is my camus.properties :

etl.destination.path=hdfs://163.113.177.244:8020/kafka
etl.execution.base.path=/home/kafka/logs/exec
etl.execution.history.path=/home/kafka/logs/hist
zookeeper.hosts=163.113.177.244:2181
zookeeper.broker.topics=/brokers/topics
zookeeper.broker.nodes=/brokers/ids
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.StringRecordWriterProvider
kafka.brokers=163.113.177.244:9092
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder

Something is wrong, but I don't know what...

Thanks,

Jérémie

Ken Goodhope

Aug 7, 2014, 10:49:38 AM
to jeremie.m...@gmail.com, camu...@googlegroups.com
Are you seeing this error in the client, or in the map task?



jeremie.m...@gmail.com

Aug 12, 2014, 10:45:30 AM
to camu...@googlegroups.com
Hi,

It's in the client log, just after the map task:

...
[EtlRecordReader] -topic:messages_zer partition:0 beginOffset:3 estimatedLastOffset:3
[KafkaReader] - bufferSize=1048576
[KafkaReader] - timeout=30000
[KafkaReader] - Connected to leader tcp://163.113.177.244:9092 beginning reading at offset 3 latest offset=3
[EtlRecordReader] - Records read : 0

[LocalJobRunner] - message_zer:0:0 begin read at 2014-08-12T07:31:59.861-07:00; message_zer:0:1 begin read at 2014-08-12T07:32:04.707-07:00 > map
[Task] - Task:attempt_local1817453151_0001_m_000000_0 is done. And is in the process of committing
[LocalJobRunner] - message_zer:0:0 begin read at 2014-08-12T07:31:59.861-07:00; message_zer:0:1 begin read at 2014-08-12T07:32:04.707-07:00 > map
[Task] - Task attempt_local1817453151_0001_m_000000_0 is allowed to commit now
[LocalJobRunner] - map task executor complete.
[LocalJobRunner] - job_local1817453151_0001
java.lang.Exception: java.lang.IllegalArgumentException: Wrong FS: hdfs://163.113.177.244:8020/kafka/message_zer/hourly/2014/08/12/07, expected: file:///
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://163.113.177.244:8020/kafka/message_zer/hourly/2014/08/12/07, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:79)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:506)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
at com.linkedin.camus.etl.kafka.mapred.EtlMultiOutputCommitter.commitTask(EtlMultiOutputCommitter.java:87)
at org.apache.hadoop.mapred.Task.commit(Task.java:1157)
at org.apache.hadoop.mapred.Task.done(Task.java:1019)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[Job] - Job job_local1817453151_0001 failed with state FAILED due to: NA
[Job] - Counters: 22
File System Counters
FILE: Number of bytes read=167741
FILE: Number of bytes written=6199965
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=21689
Map output records=21689
Input split bytes=598
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=57
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=125304832
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=5794808
total
data-read=5749690
decode-time(ms)=2094
event-count=21689
request-time(ms)=740462
[CamusJob] - Group: File System Counters
[CamusJob] - FILE: Number of bytes read: 167741
[CamusJob] - FILE: Number of bytes written: 6199965
[CamusJob] - FILE: Number of read operations: 0
[CamusJob] - FILE: Number of large read operations: 0
[CamusJob] - FILE: Number of write operations: 0
[CamusJob] - Group: Map-Reduce Framework
[CamusJob] - Map input records: 21689
[CamusJob] - Map output records: 21689
[CamusJob] - Input split bytes: 598
[CamusJob] - Spilled Records: 0
[CamusJob] - Failed Shuffles: 0
[CamusJob] - Merged Map outputs: 0
[CamusJob] - GC time elapsed (ms): 57
[CamusJob] - CPU time spent (ms): 0
[CamusJob] - Physical memory (bytes) snapshot: 0
[CamusJob] - Virtual memory (bytes) snapshot: 0
[CamusJob] - Total committed heap usage (bytes): 125304832
[CamusJob] - Group: File Input Format Counters
[CamusJob] - Bytes Read: 0
[CamusJob] - Group: File Output Format Counters
[CamusJob] - Bytes Written: 5794808
[CamusJob] - Group: total
[CamusJob] - data-read: 5749690
[CamusJob] - decode-time(ms): 2094
[CamusJob] - event-count: 21689
[CamusJob] - request-time(ms): 740462
[CamusJob] - Group: File System Counters
[CamusJob] - FILE: Number of bytes read: 167741
[CamusJob] - FILE: Number of bytes written: 6199965
[CamusJob] - FILE: Number of read operations: 0
[CamusJob] - FILE: Number of large read operations: 0
[CamusJob] - FILE: Number of write operations: 0
[CamusJob] - Group: Map-Reduce Framework
[CamusJob] - Map input records: 21689
[CamusJob] - Map output records: 21689
[CamusJob] - Input split bytes: 598
[CamusJob] - Spilled Records: 0
[CamusJob] - Failed Shuffles: 0
[CamusJob] - Merged Map outputs: 0
[CamusJob] - GC time elapsed (ms): 57
[CamusJob] - CPU time spent (ms): 0
[CamusJob] - Physical memory (bytes) snapshot: 0
[CamusJob] - Virtual memory (bytes) snapshot: 0
[CamusJob] - Total committed heap usage (bytes): 125304832
[CamusJob] - Job finished
[JvmMetrics] - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
[CamusJob] - ***********Timing Report*************
Job time (seconds):
pre setup 1.0 (7%)
get splits 1.0 (7%)
hadoop job 10.0 (71%)
commit 0.0 (0%)
Total: 0 minutes 14 seconds

Hadoop job task times (seconds):
min 9223372036854776.0
mean NaN
max 0.0
skew NaN/0.0 = NaN

Task wait time (seconds):
min 9223372036854776.0
mean NaN
max 0.0

Hadoop task breakdown (percentages unreadable in the original log):
kafka
decode
map output
other

Total MB read: 5

[JvmMetrics] - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:355)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:646)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:610)
[kafka@sandbox camus-hadoop2]$


Jérémie


On Thursday, August 7, 2014 at 4:49:38 PM UTC+2, Ken Goodhope wrote:
> Are you seeing this error in the client, or in the map task?

Andrew Ehrlich

Aug 18, 2014, 1:47:53 PM
to jeremie.m...@gmail.com, camu...@googlegroups.com
Try this: etl.destination.path, etl.execution.base.path, and etl.execution.history.path should be plain file paths, and fs.default.name should point at your HDFS (or S3) filesystem.

Like:

etl.destination.path=/kafka/
etl.execution.base.path=/home/kafka/logs/exec
etl.execution.history.path=/home/kafka/logs/hist
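Putting that together, the relevant part of camus.properties would look roughly like the fragment below (host and port taken from the original post; fs.default.name is the classic Hadoop default-filesystem property, superseded by fs.defaultFS in Hadoop 2, so which key applies depends on the Hadoop configuration on your classpath):

```properties
# The filesystem comes from the default-FS property;
# the etl.* paths are then plain paths on that filesystem.
fs.default.name=hdfs://163.113.177.244:8020
etl.destination.path=/kafka/
etl.execution.base.path=/home/kafka/logs/exec
etl.execution.history.path=/home/kafka/logs/hist
```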

jeremie.m...@gmail.com

Aug 19, 2014, 9:35:09 AM
to camu...@googlegroups.com, jeremie.m...@gmail.com
Thank you!

It works fine!

Regards,

Jérémie