Why is data in memory still not persisted to HDFS when the write type is set to "ASYNC_THROUGH"?


Kaiming Wan

Oct 11, 2016, 7:09:55 AM10/11/16
to Alluxio Users, fanb...@gmail.com
I have configured the write type as "ASYNC_THROUGH" and used the command "copyFromLocal" to upload a 10GB file to Alluxio. After waiting several hours, no data has been persisted to HDFS. What causes this issue?
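For context, Alluxio's write types differ in *when* data reaches the under storage. A minimal sketch of the semantics (an illustration only, not Alluxio code):

```python
# Simplified model of Alluxio write type semantics (illustration only).
# ASYNC_THROUGH caches the write in Alluxio memory and persists to the
# under store (e.g. HDFS) in the background, so persistence can lag.
WRITE_TYPES = {
    "MUST_CACHE":    {"cached": True,  "persist": "manual"},
    "CACHE_THROUGH": {"cached": True,  "persist": "synchronous"},
    "THROUGH":       {"cached": False, "persist": "synchronous"},
    "ASYNC_THROUGH": {"cached": True,  "persist": "asynchronous"},
}

def persisted_when_write_returns(write_type: str) -> bool:
    """True if data is already in the under store when the client write completes."""
    return WRITE_TYPES[write_type]["persist"] == "synchronous"

print(persisted_when_write_returns("ASYNC_THROUGH"))  # False: persistence happens later
```

So with ASYNC_THROUGH some delay before the file appears in HDFS is expected; hours with no data at all suggests the background persist never got scheduled.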

Kaiming Wan

Oct 12, 2016, 4:23:58 AM10/12/16
to Alluxio Users, fanb...@gmail.com
The configuration takes effect when uploading a smaller file, such as 512MB. But even when I upload a 1GB file, the configuration doesn't work.


Kaiming Wan

Oct 12, 2016, 5:15:57 AM10/12/16
to Alluxio Users, fanb...@gmail.com
And I find that the output data (a few bytes) of a MapReduce job is also not put into HDFS, even though "ASYNC_THROUGH" is set.



Yupeng Fu

Oct 12, 2016, 10:34:21 AM10/12/16
to Kaiming Wan, Alluxio Users, Bin Fan
Hi Kaiming,

Can you paste the logs from master and workers?

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kaiming Wan

Oct 12, 2016, 10:07:49 PM10/12/16
to Alluxio Users, fanb...@gmail.com
Hi Yupeng Fu,

    The master.log:

2016-10-13 10:00:42,071 INFO  logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [10.8.12.16:19998, 10.8.12.17:19998]
2016-10-13 10:00:42,072 INFO  logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: 10.8.12.16:19998
2016-10-13 10:00:43,911 INFO  logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [10.8.12.16:19998, 10.8.12.17:19998]
2016-10-13 10:00:43,912 INFO  logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: 10.8.12.16:19998
2016-10-13 10:00:43,998 INFO  logger.type (LeaderInquireClient.java:getMasterAddress) - Master addresses: [10.8.12.16:19998, 10.8.12.17:19998]
2016-10-13 10:00:43,998 INFO  logger.type (LeaderInquireClient.java:getMasterAddress) - The leader master: 10.8.12.16:19998
2016-10-13 10:02:06,408 ERROR logger.type (DefaultAsyncPersistHandler.java:getWorkerStoringFile) - Not all the blocks of file /linecount/1G.txt stored on the same worker
2016-10-13 10:02:06,408 ERROR logger.type (DefaultAsyncPersistHandler.java:scheduleAsyncPersistence) - No worker found to schedule async persistence for file /linecount/1G.txt

The worker.log:
2016-10-13 10:04:40,707 INFO  logger.type (BlockMasterSync.java:run) - Block 1952767279104 removed at session -4
2016-10-13 10:05:10,710 INFO  logger.type (BlockMasterSync.java:run) - Block 1953086046208 removed at session -4
2016-10-13 10:05:10,710 INFO  logger.type (BlockMasterSync.java:run) - Block 1953169932288 removed at session -4
2016-10-13 10:05:20,712 INFO  logger.type (BlockMasterSync.java:run) - Block 1953186709504 removed at session -4
2016-10-13 10:05:20,712 INFO  logger.type (BlockMasterSync.java:run) - Block 1953270595584 removed at session -4
2016-10-13 10:05:50,715 INFO  logger.type (BlockMasterSync.java:run) - Block 1953505476608 removed at session -4
2016-10-13 10:06:10,716 INFO  logger.type (BlockMasterSync.java:run) - Block 1953790689280 removed at session -4






By the way, I used the command "alluxio fs -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.RoundRobinPolicy copyFromLocal 1G.txt /linecount/1G.txt" to load the file into alluxio.



Kaiming Wan

Oct 12, 2016, 10:38:36 PM10/12/16
to Alluxio Users, fanb...@gmail.com
I googled the issue and found a related post in which you are also involved.



My issue has a lot in common with it:

1. The "Persistence State" of the files becomes "IN_PROGRESS" and never changes back to "PERSISTED".
2. The master.log gives the same error: "No worker found to schedule async persistence for file".
3. When running a Spark job (counting the lines in the 1GB file), I also get an ERROR, and I can see the stage is blocked and will never finish:

ERROR type: java.net.SocketTimeoutException: Read timed out
alluxio.org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out



The difference is:
I can't find any errors or exceptions after uploading the 1GB file.


Other info:

I use Spark Streaming to run analysis jobs and write data into Alluxio in real time. However, the data produced by Spark Streaming is also not persisted to HDFS, and the state of those files is all "NOT_PERSISTED". The state never changes to "PERSISTED" or "IN_PROGRESS". I have to persist them manually.



By the way, would you mind answering the following extra questions?

1. Why is the session id shown in worker.log a negative number (-4)?
2. What does the INFO log saying Alluxio is periodically removing blocks mean?





Kaiming Wan

Oct 12, 2016, 10:40:25 PM10/12/16
to Alluxio Users, fanb...@gmail.com
Correcting my mistake:

The difference is:
I can't find any errors or exceptions in worker.log after uploading the 1GB file.


Kaiming Wan

Oct 13, 2016, 3:21:52 AM10/13/16
to Alluxio Users, fanb...@gmail.com
After configuring alluxio.user.network.netty.timeout.ms to a larger value and redoing the Spark line count job, I find the read timeout error is gone from the master log. However, there is a new error in worker.log, shown below, and the job is stuck at a stage like:

[Stage 1:=============================>                          (42 + 38)  /80]

2016-10-13 15:08:44,825 ERROR logger.type (BlockDataServerHandler.java:handleBlockWriteRequest) - Error writing remote block : Temp blockId 201326592 is not available, because it already exists
alluxio.exception.BlockAlreadyExistsException: Temp blockId 201326592 is not available, because it already exists
 at alluxio.worker.block.TieredBlockStore.checkTempBlockIdAvailable(TieredBlockStore.java:390)
 at alluxio.worker.block.TieredBlockStore.createBlockMetaInternal(TieredBlockStore.java:521)
 at alluxio.worker.block.TieredBlockStore.createBlockMeta(TieredBlockStore.java:185)
 at alluxio.worker.block.DefaultBlockWorker.createBlockRemote(DefaultBlockWorker.java:299)
 at alluxio.worker.netty.BlockDataServerHandler.handleBlockWriteRequest(BlockDataServerHandler.java:147)
 at alluxio.worker.netty.DataServerHandler.channelRead0(DataServerHandler.java:73)
 at alluxio.worker.netty.DataServerHandler.channelRead0(DataServerHandler.java:42)
 at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
 at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
 at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
 at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
 at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
 at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:831)
 at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:322)
 at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at java.lang.Thread.run(Thread.java:745)

By the way, at first I successfully ran the Spark line count job on the 10GB file. When I ran the same job with Alluxio, the issue appeared.

I use the following commands in spark-shell:

val s1 = sc.textFile("alluxio://10.8.12.16:19998/linecount/10G.txt",80)
s1.count()

val s2 = sc.textFile("hdfs://ns/alluxio/linecount/10G.txt",80)
s2.count()


Kaiming Wan

Oct 14, 2016, 3:01:44 AM10/14/16
to Alluxio Users, wan...@gmail.com, fanb...@gmail.com
My configuration info:


alluxio-env.sh:

ALLUXIO_MASTER_HOSTNAME=${ALLUXIO_MASTER_HOSTNAME:-"10.8.12.16"}
ALLUXIO_WORKER_MEMORY_SIZE=${ALLUXIO_WORKER_MEMORY_SIZE:-"100GB"}
ALLUXIO_RAM_FOLDER=${ALLUXIO_RAM_FOLDER:-"/home/appadmin/ramdisk"}
ALLUXIO_UNDERFS_ADDRESS=${ALLUXIO_UNDERFS_ADDRESS:-"hdfs://ns/alluxio/data"}

export ALLUXIO_MASTER_HOSTNAME=10.8.12.16

alluxio-site.properties:

alluxio.underfs.hdfs.configuration=/home/appadmin/hadoop-2.7.2/etc/hadoop/core-site.xml
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=10.8.12.16:2181,10.8.12.17:2181,10.8.12.18:2181
alluxio.master.journal.folder=hdfs://ns/alluxio/journal
alluxio.security.authentication.socket.timeout.ms=3000000
alluxio.worker.block.heartbeat.timeout.ms=300000
alluxio.keyvalue.enabled=true
alluxio.network.thrift.frame.size.bytes.max=64MB
alluxio.user.network.netty.timeout.ms=30000
alluxio.worker.session.timeout.ms=300000
alluxio.user.file.writetype.default=ASYNC_THROUGH
alluxio.user.network.netty.timeout.ms=300000
alluxio.user.block.size.bytes.default=64MB
alluxio.keyvalue.partition.size.bytes.max=64MB
alluxio.user.block.remote.read.buffer.size.bytes=64MB







Kaiming Wan

Oct 14, 2016, 3:45:12 AM10/14/16
to Alluxio Users, wan...@gmail.com, fanb...@gmail.com
I did several tests and found that "ASYNC_THROUGH" works well when I use the command "alluxio fs load 10G.txt /linecount/10G.txt".

When I use the command "alluxio fs -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.RoundRobinPolicy copyFromLocal 512MB.txt /linecount/512MB.txt" to upload a file whose size is bigger than the block size, it fails to write to HDFS (not even a byte is written).

And the worker.log keeps showing logs:
2016-10-14 15:40:10,236 INFO  logger.type (FileUtils.java:createStorageDirPath) - Folder /home/appadmin/ramdisk/alluxioworker/.tmp_blocks/870 was created!
2016-10-14 15:40:10,393 INFO  logger.type (BlockMasterSync.java:run) - Block 93667196928 removed at session -4


Many tmp blocks are created, and eventually the netty timeout is reached.


I configured HDFS in HA mode. Does that have any effect on Alluxio?
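As a side note on the RoundRobinPolicy flag in the command above: a round-robin policy deliberately spreads a file's blocks across workers, whereas the default local-first behavior keeps them on one worker. A toy sketch of the two placements (an illustration of the idea, not Alluxio's actual policy classes):

```python
# Toy illustration of block placement (not Alluxio code): a round-robin
# policy spreads blocks across workers; a local-first policy keeps every
# block on one worker, which is what async persistence needs.

def round_robin_placement(num_blocks, workers):
    """Each block goes to the next worker in turn."""
    return [workers[i % len(workers)] for i in range(num_blocks)]

def local_first_placement(num_blocks, local_worker):
    """All blocks go to the local worker (capacity permitting)."""
    return [local_worker] * num_blocks

workers = ["worker1", "worker2", "worker3"]
# A 512MB file with 64MB blocks has 8 blocks:
print(round_robin_placement(8, workers))    # blocks land on all three workers
print(local_first_placement(8, "worker1"))  # all 8 blocks land on worker1
```

This would explain why `alluxio fs load` (which keeps blocks local) persists fine while `copyFromLocal` with RoundRobinPolicy does not, once the file is larger than one block.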





Yupeng Fu

Oct 14, 2016, 1:10:49 PM10/14/16
to Kaiming Wan, Alluxio Users, Bin Fan
Hi Kaiming,

That's expected. As an experimental feature, ASYNC_THROUGH currently requires that all the blocks of a file reside on the same worker. You should find a message about this in the master log.

Hope this helps,
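The constraint described here matches the errors in the earlier master.log. A minimal sketch of the check (modeled loosely on the idea behind DefaultAsyncPersistHandler.getWorkerStoringFile, heavily simplified):

```python
# Simplified sketch of the async-persist scheduling check: a worker can
# only be scheduled to persist a file if it holds ALL of the file's blocks.

def find_worker_storing_file(block_locations):
    """block_locations: one set per block, listing the workers holding that block.
    Returns some worker that holds every block, or None if no such worker exists
    (in which case async persistence cannot be scheduled)."""
    if not block_locations:
        return None
    candidates = set.intersection(*block_locations)
    return next(iter(candidates), None)

# All blocks on worker1 -> persist can be scheduled:
print(find_worker_storing_file([{"worker1"}, {"worker1"}]))  # worker1
# Blocks spread round-robin -> no single worker has everything:
print(find_worker_storing_file([{"worker1"}, {"worker2"}]))  # None
```

When the function returns None, the master would log errors like "Not all the blocks of file ... stored on the same worker" and "No worker found to schedule async persistence for file ...", which is exactly what appeared in the master.log above.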

Kaiming Wan

Oct 16, 2016, 10:23:29 PM10/16/16
to Alluxio Users, wan...@gmail.com, fanb...@gmail.com
Hi Yupeng,

    Thanks for your reply. You are right; I found the ERROR in master.log:

2016-10-14 15:37:25,052 ERROR logger.type (DefaultAsyncPersistHandler.java:getWorkerStoringFile) - Not all the blocks of file /linecount/512MB.txt stored on the same worker





Yupeng Fu

Nov 18, 2016, 1:04:14 PM11/18/16
to Kaiming Wan, Alluxio Users, Bin Fan
Glad you found the issue.