Presto reading RCFile

zl...@netflix.com

Feb 25, 2014, 10:18:44 PM
to presto...@googlegroups.com

Hi,

Does Presto work with RCFile? I ran a simple read against an RCFile table and got the following error:

Query 20140226_031332_00048_mdtir failed: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-232-27-21.ec2.internal/10.232.27.21"; destination host is: "ip-10-155-133-10.ec2.internal":9000;
java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-232-27-21.ec2.internal/10.232.27.21"; destination host is: "ip-10-155-133-10.ec2.internal":9000;
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at com.facebook.presto.hive.HiveSplitSourceProvider$HiveSplitSource.getNextBatch(HiveSplitSourceProvider.java:479)
at com.facebook.presto.execution.SqlStageExecution.scheduleSourcePartitionedNodes(SqlStageExecution.java:634)
at com.facebook.presto.execution.SqlStageExecution.startTasks(SqlStageExecution.java:554)
at com.facebook.presto.execution.SqlStageExecution.access$200(SqlStageExecution.java:93)
at com.facebook.presto.execution.SqlStageExecution$4.run(SqlStageExecution.java:526)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-232-27-21.ec2.internal/10.232.27.21"; destination host is: "ip-10-155-133-10.ec2.internal":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy157.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy157.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:747)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.<init>(DistributedFileSystem.java:726)
at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:717)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1780)
at com.facebook.presto.hadoop.HadoopFileSystem.listLocatedStatus(HadoopFileSystem.java:30)
at com.facebook.presto.hive.util.AsyncRecursiveWalker.doWalk(AsyncRecursiveWalker.java:70)
at com.facebook.presto.hive.util.AsyncRecursiveWalker.access$000(AsyncRecursiveWalker.java:31)
at com.facebook.presto.hive.util.AsyncRecursiveWalker$1.run(AsyncRecursiveWalker.java:58)
at com.facebook.presto.hive.util.SuspendingExecutor$1.run(SuspendingExecutor.java:67)
at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82)
at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41)
at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53)
at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82)
at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41)
at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53)
... 3 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

My RCFile is created using hive, here is its detailed info:

# Detailed Table Information
Database:               benchmark
Owner:                  dataeng
CreateTime:             Wed Feb 26 03:05:20 UTC 2014
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://10.155.133.10:9000/ttl_title_country_r_rc_hdfs
Table Type:             EXTERNAL_TABLE
Table Parameters:
  EXTERNAL              TRUE
  numFiles              0
  numPartitions         0
  numRows               0
  rawDataSize           0
  totalSize             0
  transient_lastDdlTime 1393384147

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
InputFormat:            org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.RCFileOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
  serialization.format  1
Time taken: 0.178 seconds, Fetched: 65 row(s)

Any hints on this RCFile failure?

Thanks,
Zhenxiao

Andy Kramolisch

Feb 25, 2014, 10:33:49 PM
to presto...@googlegroups.com
This is an HDFS read exception (you can tell because it originates in org.apache.hadoop.hdfs.*), not an RCFile read issue. I've seen similar issues twice:

1. Before we properly configured Presto with our federated HDFS info.
2. Running/debugging locally without 'dfs.client.use.datanode.hostname' set in hdfs-site.xml.

All in all, this looks like a DNS resolution issue.
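If the problem is name resolution, one quick check is to run something like the following on the host that reported the error: pull the destination host and port out of the Hadoop message, resolve the name, and try a plain TCP connect to the NameNode RPC port. This is a hypothetical helper (HdfsEndpointCheck is not part of Presto or Hadoop) that uses only the JDK. A successful connect rules out the basic DNS and network layer, but not other HDFS client misconfiguration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not part of Presto or Hadoop.
final class HdfsEndpointCheck {

    /** Host and port of the remote end named in the error message. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Matches the `destination host is: "<host>":<port>` fragment visible in the error above.
    private static final Pattern DESTINATION =
            Pattern.compile("destination host is: \"([^\"]+)\":(\\d+)");

    /** Returns the destination endpoint, or null if the message has no such fragment. */
    static Endpoint parseDestination(String message) {
        Matcher m = DESTINATION.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new Endpoint(m.group(1), Integer.parseInt(m.group(2)));
    }

    /** Resolves the host, then attempts a TCP connect; prints what it finds. */
    static boolean check(Endpoint e, int timeoutMillis) {
        InetAddress address;
        try {
            address = InetAddress.getByName(e.host);
        } catch (UnknownHostException ex) {
            System.out.println("DNS: cannot resolve " + e.host);
            return false;
        }
        System.out.println("DNS: " + e.host + " -> " + address.getHostAddress());
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, e.port), timeoutMillis);
            System.out.println("TCP: connected to port " + e.port);
            return true;
        } catch (IOException ex) {
            System.out.println("TCP: cannot connect to port " + e.port + ": " + ex.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the full exception message as the first argument, or fall back to the one from this thread.
        String message = args.length > 0 ? args[0]
                : "destination host is: \"ip-10-155-133-10.ec2.internal\":9000;";
        Endpoint e = parseDestination(message);
        if (e == null) {
            System.out.println("No destination host found in message");
            return;
        }
        check(e, 5000);
    }
}
```

Comparing the resolved address with the IP in the table's Location (hdfs://10.155.133.10:9000) shows whether the hostname and the metastore URI point at the same NameNode.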

Zhenxiao Luo

Feb 26, 2014, 8:22:35 PM
to presto...@googlegroups.com
Thanks Andy.

Yes, it is an HDFS config problem, not related to Presto at all.

Are you also running Presto on EMR, or in your own datacenter?

Thanks,
Zhenxiao

Andy Kramolisch

Feb 26, 2014, 8:23:58 PM
to presto...@googlegroups.com
We run Presto on EC2, but not via EMR. We have a cluster of machines running Mesos, on top of which we run Presto.

Zhenxiao Luo

Feb 28, 2014, 1:56:32 PM
to presto...@googlegroups.com
Are your Presto workers running on the same nodes as MapReduce? And is
Mesos managing both MapReduce and Presto?

Thanks,
Zhenxiao