HDFS as deep storage: Broken Pipe (IOException) at ingestion time


Deepak Jain

Apr 22, 2014, 12:25:43 PM
to druid-de...@googlegroups.com
Hadoop version: 1.2.1
Single-node cluster (pseudo-distributed mode)
Druid version: 0.6.98
Input is in HDFS.
One overlord is running; no other nodes are running.


Questions
1. Is Hadoop 1.2.1 a problem?
2. I included

"taskSpec" : {
    "hadoopCoordinates": "org.apache.hadoop:hadoop-common:1.2.1"
}

in the index_hadoop JSON at submission time, but I still see the same exception. The logs show "hadoopDependencyCoordinates" : [ "org.apache.hadoop:hadoop-client:2.3.0" ]. Does that mean Hadoop 1.2.1 was not used?
3. How do I fix the exception?

Exception
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "hadoop-server-249608/10.65.220.125"; destination host is: "hadoop-server-249608.slc01.dev.ebayc3.com":8020; 
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1359)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy162.getFileInfo(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy162.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1746)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1112)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1108)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1108)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399)
	at io.druid.indexer.JobHelper.setupClasspath(JobHelper.java:78)
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:168)
	... 14 more
Caused by: java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
	at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at java.io.DataOutputStream.flush(DataOutputStream.java:123)
	at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1009)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	... 4 more
2014-04-22 16:18:17,274 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikipedia_2014-04-22T16:18:03.290Z",
  "status" : "FAILED",
  "duration" : 6256
}

I see the same exception with both my sample dataset and the Wikipedia example.
Any help is appreciated.
Regards,
Deepak

Attachments: wikipedia_index_hadoop_task.json, ErrorLogs.txt

Deepak Jain

Apr 22, 2014, 12:59:38 PM
to druid-de...@googlegroups.com
I modified the index_hadoop JSON to correctly include hadoopCoordinates, and I no longer see the above exception. The first M/R job completes, but immediately afterwards the task fails with the exception in the log below.
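The fix, roughly: "hadoopCoordinates" now sits at the top level of the task JSON rather than nested under "taskSpec". A sketch only — the "config" section is elided, and the field placement is as I understand the 0.6.x index_hadoop schema:

{
  "type" : "index_hadoop",
  "hadoopCoordinates" : "org.apache.hadoop:hadoop-common:1.2.1",
  "config" : { ... }
}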

2014-04-22 16:56:50,560 WARN [task-runner-0] org.apache.hadoop.mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-04-22 16:56:50,567 INFO [task-runner-0] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-04-22 16:56:50,585 WARN [task-runner-0] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-04-22 16:56:50,586 WARN [task-runner-0] org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2014-04-22 16:56:51,246 INFO [task-runner-0] io.druid.indexer.IndexGeneratorJob - Job wikipedia-index-generator-Optional.of([2013-08-31T00:00:00.000Z/2013-09-01T00:00:00.000Z]) submitted, status available at http://hadoop-server-249608.slc01.dev.ebayc3.com:50030/jobdetails.jsp?jobid=job_201404221645_0004
2014-04-22 16:56:51,246 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient - Running job: job_201404221645_0004
2014-04-22 16:56:52,250 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -  map 0% reduce 0%
2014-04-22 16:57:01,273 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -  map 100% reduce 0%
2014-04-22 16:57:10,301 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -  map 100% reduce 33%
2014-04-22 16:57:14,313 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -  map 100% reduce 100%
2014-04-22 16:57:16,321 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient - Job complete: job_201404221645_0004
2014-04-22 16:57:16,343 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient - Counters: 29
2014-04-22 16:57:16,343 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -   Job Counters 
2014-04-22 16:57:16,343 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Launched reduce tasks=1
2014-04-22 16:57:16,343 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     SLOTS_MILLIS_MAPS=10723
2014-04-22 16:57:16,343 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Total time spent by all reduces waiting after reserving slots (ms)=0
2014-04-22 16:57:16,344 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Total time spent by all maps waiting after reserving slots (ms)=0
2014-04-22 16:57:16,344 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Launched map tasks=1
2014-04-22 16:57:16,344 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Data-local map tasks=1
2014-04-22 16:57:16,344 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     SLOTS_MILLIS_REDUCES=12322
2014-04-22 16:57:16,345 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -   File Output Format Counters 
2014-04-22 16:57:16,345 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Bytes Written=0
2014-04-22 16:57:16,345 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -   FileSystemCounters
2014-04-22 16:57:16,345 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     FILE_BYTES_READ=1872
2014-04-22 16:57:16,345 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     HDFS_BYTES_READ=1817
2014-04-22 16:57:16,346 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     FILE_BYTES_WRITTEN=256253
2014-04-22 16:57:16,346 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     HDFS_BYTES_WRITTEN=2867
2014-04-22 16:57:16,346 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -   File Input Format Counters 
2014-04-22 16:57:16,346 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Bytes Read=1675
2014-04-22 16:57:16,346 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -   Map-Reduce Framework
2014-04-22 16:57:16,347 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Map output materialized bytes=1872
2014-04-22 16:57:16,347 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Map input records=5
2014-04-22 16:57:16,347 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Reduce shuffle bytes=1872
2014-04-22 16:57:16,347 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Spilled Records=10
2014-04-22 16:57:16,347 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Map output bytes=1846
2014-04-22 16:57:16,348 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Total committed heap usage (bytes)=327155712
2014-04-22 16:57:16,348 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     CPU time spent (ms)=6850
2014-04-22 16:57:16,348 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Combine input records=0
2014-04-22 16:57:16,348 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     SPLIT_RAW_BYTES=142
2014-04-22 16:57:16,349 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Reduce input records=5
2014-04-22 16:57:16,349 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Reduce input groups=1
2014-04-22 16:57:16,349 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Combine output records=0
2014-04-22 16:57:16,349 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Physical memory (bytes) snapshot=425791488
2014-04-22 16:57:16,349 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Reduce output records=0
2014-04-22 16:57:16,349 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Virtual memory (bytes) snapshot=2027036672
2014-04-22 16:57:16,349 INFO [task-runner-0] org.apache.hadoop.mapred.JobClient -     Map output records=5
2014-04-22 16:57:16,364 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikipedia_2014-04-22T16:56:38.034Z, type=index_hadoop, dataSource=wikipedia}]
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:220)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:224)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:203)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.Counter, but interface was expected
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:177)
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:134)
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:80)
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:259)
	... 11 more
2014-04-22 16:57:16,370 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikipedia_2014-04-22T16:56:38.034Z",
  "status" : "FAILED",
  "duration" : 31643
}


Any suggestions?

Deepak Jain

Apr 22, 2014, 1:00:43 PM
to druid-de...@googlegroups.com
Do I need to build Druid from source against Hadoop 1.2.1 (to match the version on the cluster) and run all nodes with that build?
Regards,
Deepak

Deepak Jain

Apr 22, 2014, 1:03:28 PM
to druid-de...@googlegroups.com
If that is true, please let me know which pom.xml files need the version change, the steps to build, and how to build the final Druid tarball.
Regards,
Deepak

Fangjin Yang

Apr 22, 2014, 8:02:55 PM
to druid-de...@googlegroups.com
Hi Deepak, see inline.


On Tuesday, April 22, 2014 9:25:43 AM UTC-7, Deepak Jain wrote:
Hadoop version: 1.2.1
Single-node cluster (pseudo-distributed mode)
Druid version: 0.6.98
Input is in HDFS.
One overlord is running; no other nodes are running.


Questions
1. Is Hadoop 1.2.1 a problem?

It should not be, but we've had interesting cases where different versions of Hadoop didn't work so well together. One thing you can always try is to recompile Druid and replace all Hadoop dependencies with your particular version. That has generally worked for others.
 
2. I included

"taskSpec" : {
    "hadoopCoordinates": "org.apache.hadoop:hadoop-common:1.2.1"
}

in the index_hadoop JSON at submission time, but I still see the same exception. The logs show "hadoopDependencyCoordinates" : [ "org.apache.hadoop:hadoop-client:2.3.0" ]. Does that mean Hadoop 1.2.1 was not used?

Hmmm, specifying "hadoopCoordinates" should tell Druid to pull a particular version of Hadoop, yet the logs don't show that happening. Can you share your full task JSON?

Fangjin Yang

Apr 22, 2014, 8:03:57 PM
to druid-de...@googlegroups.com
These errors look like dependency conflicts: an IncompatibleClassChangeError like this is characteristic of mixing Hadoop 1.x and 2.x artifacts, since org.apache.hadoop.mapreduce.Counter is a class in Hadoop 1.x but an interface in Hadoop 2.x. Some say the hardest problem in big data is getting different versions of Hadoop to work :P

Can you try recompiling Druid with your specific version of Hadoop?

Deepak Jain

Apr 22, 2014, 11:19:13 PM
to druid-de...@googlegroups.com
Hello,
I cloned the Druid project and, on the master branch, found these files with an org.apache.hadoop dependency (see below).



On Wednesday, April 23, 2014 5:33:57 AM UTC+5:30, Fangjin Yang wrote:
These errors look like dependency conflicts. Some say the hardest problem in big data is getting different versions of Hadoop to work :P

Can you try recompiling Druid with your specific version of Hadoop?

dvasthimal@hadoopdruidindexer-255663:~/druid/source/druid$ find . -name pom.xml | xargs grep "org.apache.hadoop"
./indexing-hadoop/pom.xml:            <groupId>org.apache.hadoop</groupId>
./indexing-service/pom.xml:            <groupId>org.apache.hadoop</groupId>
./hdfs-storage/pom.xml:            <groupId>org.apache.hadoop</groupId>
./pom.xml:                <groupId>org.apache.hadoop</groupId>
dvasthimal@hadoopdruidindexer-255663:~/druid/source/druid$ 

I changed the version from 2.3.0 to 1.2.1, ran mvn clean install, and the build succeeded.
How can I build the tar file that I can install on all nodes?
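The version change itself is small; in the parent pom.xml it looks roughly like this (a sketch: the module poms inherit the managed version, and hadoop-client as the artifactId is an assumption based on the default coordinates shown in the logs above):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.2.1</version>  <!-- was 2.3.0 -->
</dependency>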

Fangjin Yang

Apr 24, 2014, 10:30:35 PM
to druid-de...@googlegroups.com
After running mvn clean install, look under the services/target folder; you should see both a tarball and a self-contained jar that you should be able to use.
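Something along these lines (the artifact names vary by version; these are illustrative for 0.6.98):

cd druid
mvn clean install
ls services/target/
# e.g. druid-services-0.6.98-bin.tar.gz          (the distribution tarball)
#      druid-services-0.6.98-selfcontained.jar   (the self-contained jar)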