ASYNC_THROUGH write type with S3 underfs

150 views
Skip to first unread message

Xiaohui Liu

unread,
May 25, 2016, 10:28:14 PM5/25/16
to Alluxio Users
Hi, gurus

While using S3 as my underfs, I have set alluxio.user.file.writetype.default=ASYNC_THROUGH in alluxio-site.properties. 

Then I mount a s3 dir to alluxio and upload a few files to the mount point. However, I did not see the files being persisted to S3. Is there any other settings that I should pay attention to?

Regards
Xiaohui

Jiří Šimša

unread,
May 26, 2016, 10:37:05 AM5/26/16
to Xiaohui Liu, Alluxio Users
Hi Xiaohui,

Were the any errors in the log files? If not, how big were the files you were uploading and for how long did you wait before checking S3? The asynchronous persistence happens in the background and if the uploaded files are big relative to your S3 bandwidth, the background upload can take some time to complete.

Best,

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Jiří Šimša

Adam Storm

unread,
Aug 25, 2016, 3:11:27 PM8/25/16
to Alluxio Users, her...@gmail.com
I'm having the same problem, and can't figure out what's going on.  I'm trying to use ASYNC_THROUGH with an S3 underfs and the written file shows up in the UI with persistence state "IN_PROGRESS" indefinitely (even if I send no additional write requests to Alluxio - i.e. Alluxio is idle).  I've tried with various file sizes (1 MB - 1 GB, and in-between) with no luck.  What's strange is that if I issue an 'alluxio fs persist' command, the file gets persisted.

I've poked around in the logs a bit, but can't find anything obvious there.

Is there some explicit configuration required to get ASYNC_THROUGH working?

I'm running Alluxio 1.2.0.

Yupeng Fu

unread,
Aug 25, 2016, 3:19:22 PM8/25/16
to Adam Storm, Alluxio Users, her...@gmail.com
Hi Adam,

Can you paste the logs from master and workers? Also, make sure you set the s3 credentials on all the master and workers.

Thanks,
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-users+unsubscribe@googlegroups.com.

Jiří Šimša

unread,
Aug 25, 2016, 3:22:44 PM8/25/16
to Adam Storm, Alluxio Users, Xiaohui Liu
Hi Adam,

Is you workload using HDFS API or native Alluxio API? In particular, how do you configure the write type (e.g. alluxio-site.properties file vs. passing CreateFileOptions to CreateFile)?

Best,

On Thu, Aug 25, 2016 at 12:11 PM, Adam Storm <ajs...@gmail.com> wrote:
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-users+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Adam Storm

unread,
Aug 25, 2016, 6:22:54 PM8/25/16
to Alluxio Users, ajs...@gmail.com, her...@gmail.com
Thanks for the quick responds Yupeng. 

What makes this trickier to diagnose is that I'm running this through the liballuxio C++ API I built in conjunction with Ryan Huang.  I did some more debugging though, and found that things work correctly when I change Alluxio's default write policy to ASYNC_THROUGH and copyFromLocal using the alluxio fs command. 

In liballuxio, we specify the s3 credentials by first calling Configuration.set() and then calling ClientContext.init().  This works for CACHE_THROUGH to s3, when running locally with one master and worker.  For ASYNC_THROUGH do I have to configure things differently?

Adam Storm

unread,
Aug 25, 2016, 8:20:10 PM8/25/16
to Alluxio Users, ajs...@gmail.com, her...@gmail.com
Just re-ran, and this showed up in the worker log:

2016-08-26 00:04:43,105 INFO  logger.type (FileUtils.java:createStorageDirPath) - Folder /mnt/ramdisk/alluxioworker was created!
2016-08-26 00:04:43,807 ERROR logger.type (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2016-08-26 00:04:44,045 INFO  server.Server (Server.java:doStart) - jetty-7.x.y-SNAPSHOT
2016-08-26 00:04:44,078 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.s.ServletContextHandler{/metrics/json,null}
2016-08-26 00:04:44,185 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.w.WebAppContext{/,file:/usr/local/alluxio-1.2.0/core/server/src/main/webapp/},/usr/local/alluxio-1.2.0/core/server/src/main/webapp
2016-08-26 00:04:47,981 INFO  server.AbstractConnector (AbstractConnector.java:doStart) - Started SelectChann...@0.0.0.0:30000
2016-08-26 00:04:47,981 INFO  logger.type (UIWebServer.java:startWebServer) - Alluxio Worker Web service started @ 0.0.0.0/0.0.0.0:30000
2016-08-26 00:04:47,982 INFO  logger.type (AbstractClient.java:connect) - Alluxio client (version 1.2.0) is trying to connect with BlockMasterWorker master @ localhost/127.0.0.1:19998
2016-08-26 00:04:47,988 INFO  logger.type (AbstractClient.java:connect) - Client registered with BlockMasterWorker master @ localhost/127.0.0.1:19998
2016-08-26 00:04:48,846 INFO  logger.type (AlluxioWorker.java:start) - Started Alluxio worker with id 1
2016-08-26 00:04:48,846 INFO  logger.type (AlluxioWorker.java:start) - Alluxio worker version 1.2.0 started @ 5d4e4d7765a7/172.18.0.2:29998
2016-08-26 00:04:49,846 INFO  logger.type (AbstractClient.java:connect) - Alluxio client (version 1.2.0) is trying to connect with FileSystemMasterWorker master @ localhost/127.0.0.1:19998
2016-08-26 00:04:49,847 INFO  logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ localhost/127.0.0.1:19998
2016-08-26 00:04:49,848 INFO  logger.type (AbstractClient.java:connect) - Alluxio client (version 1.2.0) is trying to connect with FileSystemMasterWorker master @ localhost/127.0.0.1:19998
2016-08-26 00:04:49,848 INFO  logger.type (AbstractClient.java:connect) - Client registered with FileSystemMasterWorker master @ localhost/127.0.0.1:19998
2016-08-26 00:05:33,388 INFO  logger.type (FileUtils.java:createStorageDirPath) - Folder /mnt/ramdisk/alluxioworker/.tmp_blocks/53 was created!
2016-08-26 00:05:34,944 WARN  logger.type (SleepingTimer.java:tick) - Worker FileSystemMaster Sync last execution took 1093 ms. Longer than the interval 1000
2016-08-26 00:05:34,977 ERROR logger.type (FileDataManager.java:needPersistence) - Failed to check if file 50331647 exists in under storage system
java.io.IOException: alluxio.exception.FileDoesNotExistException: Inode id 50,331,647 does not exist
        at alluxio.worker.block.BlockWorker.getFileInfo(BlockWorker.java:635)
        at alluxio.worker.file.FileDataManager.fileExistsInUfs(FileDataManager.java:158)
        at alluxio.worker.file.FileDataManager.needPersistence(FileDataManager.java:115)
        at alluxio.worker.file.FileWorkerMasterSyncExecutor$FilePersister.run(FileWorkerMasterSyncExecutor.java:140)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: alluxio.exception.FileDoesNotExistException: Inode id 50,331,647 does not exist
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at alluxio.exception.AlluxioException.fromThrift(AlluxioException.java:99)
        at alluxio.AbstractClient.retryRPC(AbstractClient.java:326)
        at alluxio.worker.file.FileSystemMasterClient.getFileInfo(FileSystemMasterClient.java:83)
        at alluxio.worker.block.BlockWorker.getFileInfo(BlockWorker.java:633)
        ... 6 more
2016-08-26 00:05:34,977 INFO  logger.type (FileUtils.java:createStorageDirPath) - Folder /mnt/ramdisk/alluxioworker/.tmp_blocks/766 was created!
2016-08-26 00:05:34,978 INFO  logger.type (FileWorkerMasterSyncExecutor.java:run) - persist file 50331647 of blocks [33554432]
2016-08-26 00:05:34,980 ERROR logger.type (FileWorkerMasterSyncExecutor.java:run) - Failed to persist file 50331647
java.io.IOException: alluxio.exception.FileDoesNotExistException: Inode id 50,331,647 does not exist
        at alluxio.worker.block.BlockWorker.getFileInfo(BlockWorker.java:635)
        at alluxio.worker.file.FileDataManager.prepareUfsFilePath(FileDataManager.java:297)
        at alluxio.worker.file.FileDataManager.persistFile(FileDataManager.java:226)
        at alluxio.worker.file.FileWorkerMasterSyncExecutor$FilePersister.run(FileWorkerMasterSyncExecutor.java:149)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: alluxio.exception.FileDoesNotExistException: Inode id 50,331,647 does not exist
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at alluxio.exception.AlluxioException.fromThrift(AlluxioException.java:99)
        at alluxio.AbstractClient.retryRPC(AbstractClient.java:326)
        at alluxio.worker.file.FileSystemMasterClient.getFileInfo(FileSystemMasterClient.java:83)
        at alluxio.worker.block.BlockWorker.getFileInfo(BlockWorker.java:633)
        ... 6 more
2016-08-26 00:05:36,729 INFO  logger.type (FileWorkerMasterSyncExecutor.java:run) - persist file 67108863 of blocks [50331648]
2016-08-26 00:05:36,731 INFO  logger.type (FileDataManager.java:prepareUfsFilePath) - persist file 67108863 at s3a://test-bluspark/alluxio/ajstorm/alluxiotest/output.dat
2016-08-26 00:05:39,945 INFO  logger.type (FileWorkerMasterSyncExecutor.java:heartbeat) - files [67108863] persisted

Adam Storm

unread,
Aug 25, 2016, 8:50:40 PM8/25/16
to Alluxio Users
There's also this.  Is the heartbeat thread responsible for async persistence?  If so, does the heartbeat thread shutting down imply that async persistence will stop until Alluxio is restarted?

2016-08-26 00:21:16,050 ERROR logger.type (HeartbeatThread.java:run) - Uncaught exception in heartbeat executor, Heartbeat Thread shutting down
java.lang.RuntimeException: alluxio.exception.FileDoesNotExistException: Inode id 100,663,295 does not exist
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at alluxio.AbstractClient.retryRPC(AbstractClient.java:297)
        at alluxio.worker.file.FileSystemMasterClient.heartbeat(FileSystemMasterClient.java:116)
        at alluxio.worker.file.FileWorkerMasterSyncExecutor.heartbeat(FileWorkerMasterSyncExecutor.java:83)
        at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:67)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: alluxio.exception.FileDoesNotExistException: Inode id 100,663,295 does not exist

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at alluxio.exception.AlluxioException.fromThrift(AlluxioException.java:99)

Gene Pang

unread,
Aug 26, 2016, 9:01:19 AM8/26/16
to Alluxio Users
Hi Adam,

What does the master logs say during this time? It looks like it cannot find the file that the worker is telling the master about. Do you have an idea why the worker is referring to a file that the master doesn't know about?

Thanks,
Gene

Adam Storm

unread,
Aug 26, 2016, 11:56:57 AM8/26/16
to Alluxio Users
Hey Gene,

The master doesn't have anything of note.  Here are the last few log entries:

2016-08-26 00:04:42,010 INFO  server.Server (Server.java:doStart) - jetty-7.x.y-SNAPSHOT
2016-08-26 00:04:42,033 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.s.ServletContextHandler{/metrics/json,null}
2016-08-26 00:04:42,131 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.w.WebAppContext{/,file:/usr/local/alluxio-1.2.0/core/server/src/main/webapp/},/usr/local/alluxio-1.2.0/core/server/src/main/webapp
2016-08-26 00:04:48,781 INFO  server.AbstractConnector (AbstractConnector.java:doStart) - Started SelectChann...@0.0.0.0:19999
2016-08-26 00:04:48,781 INFO  logger.type (UIWebServer.java:startWebServer) - Alluxio Master Web service started @ 0.0.0.0/0.0.0.0:19999
2016-08-26 00:04:48,781 INFO  logger.type (AlluxioMaster.java:startServing) - Alluxio master version 1.2.0 started @ localhost/127.0.0.1:19998
2016-08-26 00:04:48,827 INFO  logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=5d4e4d7765a7, rpcPort=29998, dataPort=29999, webPort=30000} id: 1
2016-08-26 00:04:48,841 INFO  logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=1, workerAddress=WorkerNetAddress{host=5d4e4d7765a7, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=11217666048, usedBytes=0, lastUpdatedTimeMs=1472169888841, blocks=[]}
2016-08-26 00:05:33,299 INFO  logger.type (JournalWriter.java:openCurrentLog) - Opened current log file: /usr/local/alluxio-1.2.0/journal/FileSystemMaster/log.out


I can repro this fairly easily now.  I run my test program which copies a file to Alluxio and the first time, async persist works (see logs in previous post).  If I then remove the directory (and file) from Alluxio and run again (trying to copy another file), the heartbeat thread dies with that exception. 

If you have anything specific you'd like me to try, let me know.  Otherwise, I'll keep plugging away on my end.

Thanks,
Adam

Yupeng Fu

unread,
Aug 26, 2016, 1:16:12 PM8/26/16
to Adam Storm, Alluxio Users
Hi Adam,

Can you share the detailed steps to reproduce the issue? I tried to follow what you did: use copyFromLocal with async persist to write a file in a folder, and then deleted the folder, and rerun the command. And everything worked fine.

Thanks,

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-users+unsubscribe@googlegroups.com.

Adam Storm

unread,
Aug 26, 2016, 3:28:48 PM8/26/16
to Alluxio Users, ajs...@gmail.com
Hi Yupeng,

I'm able to reproduce it fairly reliably with the following steps:

dd if=/dev/zero of=outputSmall.dat  bs=1  count=1K - Create 1K file
alluxio-stop.sh all
alluxio-start.sh all
alluxio fs copyFromLocal outputSmall.dat /; alluxio fs rm -R /outputSmall.dat

I've tried various file sizes and 1K seems to be the sweet spot but this may depend on the persist speed to S3.  If you can't get it to work with 1K, you might want to try a few different sizes.  Also, sometimes I need to run the last command (copy and rm) a few times before the heartbeat thread dies.

Please let me know if you need any other info from my end.

Adam
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-user...@googlegroups.com.

Yupeng Fu

unread,
Aug 29, 2016, 1:04:10 PM8/29/16
to Adam Storm, Alluxio Users
Hi Adam,

Did you run fs rm to delete the files right after the copy command?

If so, the persist fails because the file is deleted before the async persist finishes. Right now, Alluxio locks the blocks and doesn't release it until the persist finishes, to prevent block eviction. However, it doesn't prevent you from deleting it explicitly.

What I suggest is to poll the status of the file, and delete it after the file is fully persisted.
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-users+unsubscribe@googlegroups.com.

Adam Storm

unread,
Aug 30, 2016, 9:00:55 PM8/30/16
to Alluxio Users, ajs...@gmail.com
Hi Yupeng,

I'm not too concerned about the persist failing, what I'm concerned about the way things are failing.  If you look at the logs I provided, you'll see this:


Uncaught exception in heartbeat executor, Heartbeat Thread shutting down
java.lang.RuntimeException: alluxio.exception.FileDoesNotExistException: Inode id 100,663,295 does not exist

Is it not possible to catch the FileDoesNotExist exception, log a message, and continue running? 

Also, in my tests I'm seeing that once the heartbeat thread shuts down, async persistence is no longer functional until I reboot Alluxio.  Is this expected (i.e. does the heartbeat thread handle all async persist requests?)?  If so, I see this as more than a small annoyance to be worked around, but rather, something which needs to be addressed before ASYNC_THROUGH can be used in a meaningful way. 

Thanks,
Adam

Yupeng Fu

unread,
Aug 31, 2016, 10:56:46 AM8/31/16
to Adam Storm, Alluxio Users
Thanks for reporting this issue. 
I agree with you that the uncaught exception is unexpected should be addressed. I've created the ticket to track it, and will have you posted once it's addressed.

Cheers,
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-users+unsubscribe@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages