I've been evaluating the support for Iceberg tables in Hive/MR3 2.0.
Everything went fine until recently. When I copy a large, but not
unreasonably large, amount of data from a normal partitioned Hive
table stored as ORC to another table with the same file format and
partitioning but stored as Iceberg, I consistently get errors. I've
traced the errors all the way down to HDFS.
Here's a sample query that I'm using:
insert into table monthly_question_i partition (hash)
select * from monthly_question where hash >= '7c0' and hash < '800'
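For context, the target table was created roughly like this. The DDL
below is only a sketch: the column names are placeholders (the real
schema matches monthly_question), but the partitioning and formats are
as shown.
create table monthly_question_i (
  question_id bigint,
  body string
)
partitioned by (hash string)
stored by iceberg
stored as orc;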
Here's a sample of the errors I see in the HDFS datanode logs:
2025-05-13 13:51:25,693 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1924010820-127.0.1.1-1708370541174:blk_1101550683_27817671
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:216)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:221)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:144)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:553)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:1011)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:920)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:299)
at java.base/java.lang.Thread.run(Thread.java:829)
2025-05-13 13:51:25,703 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550683_27817671, type=LAST_IN_PIPELINE: Thread is interrupted.
2025-05-13 13:51:25,703 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550683_27817671, type=LAST_IN_PIPELINE terminating
2025-05-13 13:51:25,703 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1924010820-127.0.1.1-1708370541174:blk_1101550683_27817671 received exception java.io.IOException: Premature EOF from inputStream
2025-05-13 13:51:25,704 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: kbhadoop05:9866:DataXceiver error processing WRITE_BLOCK operation src: /10.1.6.240:57240 dst: /10.1.6.236:9866
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:216)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:221)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:144)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:553)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:1011)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:920)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:299)
at java.base/java.lang.Thread.run(Thread.java:829)
2025-05-13 13:51:27,076 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in PacketResponder.run():
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:462)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:141)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:158)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:116)
at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1681)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1612)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1520)
at java.base/java.lang.Thread.run(Thread.java:829)
2025-05-13 13:51:27,076 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550684_27817672, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.1.6.241:9866, 10.1.6.234:9866]
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:462)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:141)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:158)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:116)
at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1681)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1612)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1520)
at java.base/java.lang.Thread.run(Thread.java:829)
2025-05-13 13:51:27,076 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550684_27817672, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.1.6.241:9866, 10.1.6.234:9866] terminating
2025-05-13 13:51:27,080 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1924010820-127.0.1.1-1708370541174:blk_1101550684_27817672
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:216)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:221)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:179)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:553)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:1011)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:920)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:299)
at java.base/java.lang.Thread.run(Thread.java:829)
2025-05-13 13:51:27,081 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1924010820-127.0.1.1-1708370541174:blk_1101550684_27817672 received exception java.io.IOException: Premature EOF from inputStream
2025-05-13 13:51:27,081 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: kbhadoop05:9866:DataXceiver error processing WRITE_BLOCK operation src: /10.1.6.157:21577 dst: /10.1.6.236:9866
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:216)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:221)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:179)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:553)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:1011)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:920)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:299)
at java.base/java.lang.Thread.run(Thread.java:829)
2025-05-13 13:51:27,091 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1924010820-127.0.1.1-1708370541174:blk_1101550690_27817678 src: /10.1.6.237:51964 dest: /10.1.6.236:9866
2025-05-13 13:51:27,096 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.6.237:51964, dest: /10.1.6.236:9866, volume: /data1/hdfs/datanode, bytes: 3385, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-2260809_21, offset: 0, srvID: f54ca2f0-e761-473e-a091-0d80fd996c15, blockid: BP-1924010820-127.0.1.1-1708370541174:blk_1101550690_27817678, duration(ns): 1975377
2025-05-13 13:51:27,096 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550690_27817678, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[10.1.6.235:9866] terminating
2025-05-13 13:51:28,452 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1101550657_27817645 replica FinalizedReplica, blk_1101550657_27817645, FINALIZED
getNumBytes() = 268435456
getBytesOnDisk() = 268435456
getVisibleLength()= 268435456
getVolume() = /data1/hdfs/datanode
getBlockURI() = file:/data1/hdfs/datanode/current/BP-1924010820-127.0.1.1-1708370541174/current/finalized/subdir8/subdir20/blk_1101550657 on volume /data1/hdfs/datanode for deletion
2025-05-13 13:51:28,471 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1924010820-127.0.1.1-1708370541174 blk_1101550657_27817645 URI file:/data1/hdfs/datanode/current/BP-1924010820-127.0.1.1-1708370541174/current/finalized/subdir8/subdir20/blk_1101550657
2025-05-13 13:51:30,059 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1924010820-127.0.1.1-1708370541174:blk_1101550692_27817680 src: /10.1.6.157:32300 dest: /10.1.6.236:9866
2025-05-13 13:51:30,066 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.6.157:32300, dest: /10.1.6.236:9866, volume: /data1/hdfs/datanode, bytes: 3381, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-2260809_21, offset: 0, srvID: f54ca2f0-e761-473e-a091-0d80fd996c15, blockid: BP-1924010820-127.0.1.1-1708370541174:blk_1101550692_27817680, duration(ns): 2818884
2025-05-13 13:51:30,067 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550692_27817680, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.1.6.241:9866, 10.1.6.235:9866] terminating
2025-05-13 13:51:31,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1924010820-127.0.1.1-1708370541174:blk_1101550688_27817676
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:141)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:214)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:221)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:144)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:553)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:1011)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:920)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:299)
at java.base/java.lang.Thread.run(Thread.java:829)
2025-05-13 13:51:31,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550688_27817676, type=LAST_IN_PIPELINE: Thread is interrupted.
2025-05-13 13:51:31,078 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1924010820-127.0.1.1-1708370541174:blk_1101550688_27817676, type=LAST_IN_PIPELINE terminating
2025-05-13 13:51:31,078 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1924010820-127.0.1.1-1708370541174:blk_1101550688_27817676 received exception java.io.IOException: Connection reset by peer
2025-05-13 13:51:31,078 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: kbhadoop05:9866:DataXceiver error processing WRITE_BLOCK operation src: /10.1.6.240:53282 dst: /10.1.6.236:9866
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:141)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:214)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:221)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:144)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:553)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:1011)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:920)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:299)
at java.base/java.lang.Thread.run(Thread.java:829)
I think the connection resets/refusals are the core problem, but I'm
by no means sure. I found the following Stack Overflow question that
looks similar:
https://stackoverflow.com/questions/44058613/connection-reset-by-peer-while-running-apache-spark-job
As suggested there, I increased the values of
dfs.namenode.handler.count and ipc.server.listen.queue.size in the
HDFS configuration, and net.ipv4.tcp_max_syn_backlog and
net.core.somaxconn in the Linux kernel configuration. That helped a
little in one case, but that was it.
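To be concrete, the changes were along these lines; the values below
are only illustrative, not the exact numbers on our cluster:
  hdfs-site.xml / core-site.xml:
    dfs.namenode.handler.count   = 100    (default 10)
    ipc.server.listen.queue.size = 1024   (default 128)
  sysctl:
    net.ipv4.tcp_max_syn_backlog = 4096
    net.core.somaxconn           = 4096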
I've also tried other Hive settings, such as the following, which we
frequently use when writing to many partitions. None of them seems to
help.
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
hive.exec.max.dynamic.partitions.pernode = 10000
hive.optimize.sort.dynamic.partition.threshold = -1
Has anyone else experienced a problem like this, or does anyone know
of other Hive, Iceberg, or HDFS settings to try? The problem seems to
be confined to partitioned writes: if I create an Iceberg table in the
same format but without partitioning, I can copy the entire table
without errors.
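For comparison, the unpartitioned case that works is essentially the
same sketch as above without the partitioned by clause (table name and
columns here are again placeholders):
create table monthly_question_np (
  question_id bigint,
  body string,
  hash string
)
stored by iceberg
stored as orc;
insert into table monthly_question_np
select * from monthly_question;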
David
--
David Engel
da...@istwok.net