Failed to free block due to concurrent read


Shuai Zhang

Mar 31, 2016, 12:55:12 AM
to Alluxio Users
Hi there,

I noticed abnormal space usage in the Alluxio master web UI after several MapReduce jobs. After deleting all files via the Alluxio CLI, there is still 33% usage. (See the attached "extra-usage.png".)

I picked one of the most heavily used workers to check its log. (See the attached "workers.png".) There are many warnings like the one below (see "worker_log.zip" for the full worker log):
2016-03-30 21:39:19,073 WARN  logger.type (BlockMasterSync.java:run) - Failed master free block cmd for: 2885681156 due to concurrent read.
java.io.IOException: Failed to delete R:/alluxioworker/2885681156
at alluxio.util.io.FileUtils.delete(FileUtils.java:136)
at alluxio.worker.block.TieredBlockStore.removeBlockInternal(TieredBlockStore.java:856)
at alluxio.worker.block.TieredBlockStore.removeBlock(TieredBlockStore.java:315)
at alluxio.worker.block.TieredBlockStore.removeBlock(TieredBlockStore.java:309)
at alluxio.worker.block.BlockWorker.removeBlock(BlockWorker.java:557)
at alluxio.worker.block.BlockMasterSync$BlockRemover.run(BlockMasterSync.java:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Only a few tasks of the MapReduce job failed. The exception is:
Error: java.io.IOException: alluxio.exception.BlockInfoException: Cannot complete a file without all the blocks committed
        at alluxio.client.file.FileOutStream.close(FileOutStream.java:173)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
        at shuaiz.IOMapperBase.map(IOMapperBase.java:138)
        at shuaiz.IOMapperBase.map(IOMapperBase.java:39)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: alluxio.exception.BlockInfoException: Cannot complete a file without all the blocks committed
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at alluxio.exception.AlluxioException.from(AlluxioException.java:73)
        at alluxio.AbstractClient.retryRPC(AbstractClient.java:324)
        at alluxio.client.file.FileSystemMasterClient.completeFile(FileSystemMasterClient.java:130)
        at alluxio.client.file.FileOutStream.close(FileOutStream.java:171)
        ... 12 more
See ALLUXIO-1853 for more information.

Why did Alluxio fail to remove these blocks? How can I solve this problem without re-formatting the Alluxio worker?


Regards,
Shuai Zhang
extra-usage.png
workers.png
worker_log.zip

and...@alluxio.com

Apr 4, 2016, 8:41:14 PM
to Alluxio Users
Hi Shuai,

From the worker log, it looks like you're getting "Cannot complete a file without all the blocks committed" because the worker is running out of disk space:

2016-03-30 21:25:47,809 ERROR logger.type (DataServerHandler.java:handleBlockWriteRequest) - Error writing remote block : There is not enough space on the disk
java.io.IOException: There is not enough space on the disk
        at sun.nio.ch.FileDispatcherImpl.truncate0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.truncate(FileDispatcherImpl.java:93)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:887)
        at alluxio.worker.block.io.LocalFileBlockWriter.write(LocalFileBlockWriter.java:76)

The worker also has many messages along the lines of:

java.io.IOException: Failed to delete R:/alluxioworker/3808428035
        at alluxio.util.io.FileUtils.delete(FileUtils.java:136)
        at alluxio.worker.block.TieredBlockStore.removeBlockInternal(TieredBlockStore.java:856)
        at alluxio.worker.block.TieredBlockStore.removeBlock(TieredBlockStore.java:315)
        at alluxio.worker.block.TieredBlockStore.removeBlock(TieredBlockStore.java:309)
        at alluxio.worker.block.BlockWorker.removeBlock(BlockWorker.java:557)

This is the worker trying to delete the temporary block files it created on its ramdisk. It's not clear why it's unable to delete them.
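
One possibility worth ruling out (just a guess, not something I can confirm from the log): on Windows a file cannot be deleted while another handle to it is still open, unlike on Linux where the unlink succeeds and the data goes away once the last reader closes. That would match the "due to concurrent read" wording in the warning. A minimal sketch, unrelated to Alluxio internals, that shows the behavior:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class DeleteWhileOpen {
  public static void main(String[] args) throws Exception {
    File block = File.createTempFile("block", ".tmp");
    try (FileOutputStream out = new FileOutputStream(block)) {
      out.write(new byte[] {1, 2, 3});
    }
    try (FileInputStream in = new FileInputStream(block)) {
      // A "concurrent reader" still holds a handle: on Windows File.delete()
      // returns false here, while on Linux the unlink would succeed.
      System.out.println("delete while open:  " + block.delete());
    }
    // Once the reader is closed, the delete goes through on both platforms.
    System.out.println("delete after close: " + block.delete());
  }
}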

Can you explain a little more about your setup, like what version of Alluxio you are using and what changes you made from the default configuration? Also, can you look in R:/alluxioworker and see if it is empty after you delete all the files in Alluxio?
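
If it's not empty, a quick tally of what's left over would show whether those leftovers account for the 33% usage you're seeing. A throwaway sketch (R:/alluxioworker is the path taken from your worker log; adjust it if your tiered storage is configured differently):

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ListLeftoverBlocks {
  public static void main(String[] args) throws IOException {
    Path workerDir = Paths.get("R:/alluxioworker");
    long count = 0;
    long bytes = 0;
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(workerDir)) {
      for (Path p : stream) {
        long size = Files.size(p);
        System.out.printf("%s\t%d bytes%n", p.getFileName(), size);
        count++;
        bytes += size;
      }
    }
    System.out.printf("%d leftover files, %d bytes total%n", count, bytes);
  }
}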

Best,

Andrew

Shuai Zhang

Apr 6, 2016, 9:10:40 AM
to Alluxio Users
You are right, Andrew. The worker is running out of disk space for the reason mentioned in another thread: "Is it possible to store data to next tier level via Hadoop?"

The Alluxio version is v1.0.1; the configuration file is attached in the thread mentioned above.

There are still quite a few files in R:\alluxioworker after I cleared all files via "AlluxioShell rm -R".

On Tuesday, April 5, 2016 at 8:41:14 AM UTC+8, and...@alluxio.com wrote:

and...@alluxio.com

Apr 7, 2016, 5:13:12 PM
to Alluxio Users
If the files exist and the delete is still failing, there's probably a permissions issue. Do the permissions/owners of those files make sense? Have you run Alluxio as different users?
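
Something along these lines (again just a throwaway sketch, not Alluxio code) would print the owner of each leftover block file so you can compare it against the account the worker runs as, and whether that account can even write to them:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckBlockOwners {
  public static void main(String[] args) throws IOException {
    Path workerDir = Paths.get("R:/alluxioworker");
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(workerDir)) {
      for (Path p : stream) {
        // Files.getOwner works on NTFS; it may throw UnsupportedOperationException
        // if the ramdisk filesystem does not track owners at all.
        System.out.printf("%s\towner=%s\twritable=%s%n",
            p.getFileName(), Files.getOwner(p), Files.isWritable(p));
      }
    }
  }
}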

Bin Fan

Apr 8, 2016, 2:16:30 AM
to Alluxio Users
Alluxio 1.0.1 fixed a sticky-bit bug when deleting block files from Alluxio-managed space, but that is mostly relevant to Linux / Mac OS.
I don't know whether a similar problem can still happen on Windows.

Shuai Zhang

Apr 12, 2016, 3:14:56 AM
to Alluxio Users
There shouldn't be a permissions issue. I disabled the code related to changing file or folder permissions, and I can delete the files with the same account that launches the Alluxio worker.

On Friday, April 8, 2016 at 5:13:12 AM UTC+8, and...@alluxio.com wrote:

Shuai Zhang

Apr 12, 2016, 3:16:05 AM
to Alluxio Users
I believe this issue was related to the failure to commit blocks when running out of space.

On Friday, April 8, 2016 at 2:16:30 PM UTC+8, Bin Fan wrote: