Run Alluxio with multiple users


Saverio Veltri

Mar 24, 2016, 5:54:54 AM
to Alluxio Users
Hi everyone,
    we are testing Alluxio (branch 1.0) in our environment; specifically, we are trying to run Alluxio from the CLI and from Flink with different users.

We found out, thanks to this thread https://groups.google.com/forum/#!searchin/alluxio-users/worker$20user/alluxio-users/MEamF2hlStQ/vHpA0zW7BgAJ , that the sticky bit is applied to every folder inside the tiered storage (even the root).
That prevents other users from moving or deleting folders, so only the user that started Alluxio (or root) is able to perform operations on the workers.
We have tried avoiding the sticky bit by commenting out the method FileUtils#setLocalDirStickyBit.
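For reference, below is a minimal sketch of what we assume such a helper does, shelling out to chmod because java.nio exposes no PosixFilePermission for the sticky bit; this is only illustrative, not the actual Alluxio code:

import java.io.IOException;

public class StickyBitSketch {
    // Set the sticky bit on a local directory by invoking chmod, the usual
    // workaround since Java's PosixFilePermission set has no sticky bit flag.
    public static void setLocalDirStickyBit(String dir) {
        try {
            Process p = new ProcessBuilder("chmod", "o+t", dir).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IOException("chmod o+t failed for " + dir);
            }
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException("Failed to set sticky bit on " + dir, e);
        }
    }

    public static void main(String[] args) {
        // Example: the root of the tiered storage in our setup.
        setLocalDirStickyBit("/mnt/ramdisk/alluxioworker");
    }
}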

Is it safe? Do you see any concurrency or consistency issues in doing that?
If not, it would be better if this behavior could be made configurable somehow.

Thanks,

Saverio

Calvin Jia

Mar 24, 2016, 4:57:59 PM
to Alluxio Users
Hi Saverio,

The main purpose of the sticky bit is to prevent files from being deleted/moved by non-Alluxio processes. Tiered storage should only be modified by the Alluxio worker. Which cases are you seeing that require other users to move or delete folders in tiered storage?

Thanks,
Calvin

Calvin Jia

Mar 24, 2016, 5:53:58 PM
to Alluxio Users
Hi Saverio,

The (incorrect) behavior I'm seeing is that files written through short-circuit writes will not be accessible by the worker, and possibly files which are written remotely cannot be read through short-circuit reads. I've opened a PR to address these issues here: https://github.com/Alluxio/alluxio/pull/2917

Do these behaviors seem consistent with what you see? (It seems like you are describing the client trying to delete/modify files in tiered storage, which should not be the case.)
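If it helps to confirm, here is a quick, purely illustrative (not Alluxio code) way to see who owns the files under the worker storage directory; the default path below is just the ramdisk location from this thread:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class BlockOwnerCheck {
    public static void main(String[] args) throws IOException {
        // Default to the ramdisk path mentioned in this thread; pass another dir as args[0].
        Path storageDir = Paths.get(args.length > 0 ? args[0] : "/mnt/ramdisk/alluxioworker");
        try (Stream<Path> paths = Files.walk(storageDir)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> {
                     try {
                         System.out.println(Files.getOwner(p) + "  " + p);
                     } catch (IOException e) {
                         System.err.println("Cannot stat " + p + ": " + e.getMessage());
                     }
                 });
        }
    }
}

If block files show up owned by a client user (e.g. flink) instead of the worker user, that matches the bug described above.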

Hope this helps,
Calvin

Saverio Veltri

Mar 25, 2016, 11:59:26 AM
to Calvin Jia, Alluxio Users
Hi Calvin,
    thanks for addressing my comment and for the PR.
Our scenario is a little different.
We are packaging a distribution containing Alluxio as file system, Flink as computation framework and some other components.
We are preparing some service users (e.g. alluxio, flink etc.) and some applicative users (e.g. customer1).

With the sticky bit applied to the root of the tiered storage, in our case /mnt/ramdisk/alluxioworker, we are not able to perform operations with either the CLI or Flink as any user other than the one that started Alluxio.

We got this error


[flink@ip-172-31-28-164 ~]$ /opt/alluxio_radicalbit/bin/alluxio fs copyFromLocal test.txt /
Failed to cache: ThriftIOException(message:Unable to delete /mnt/ramdisk/alluxioworker/7472892904723459189/16777216)

with this stack trace:

2016-03-25 16:47:11,663 ERROR logger.type (AlluxioShell.java:run) - Error running copyFromLocal test.txt /
java.io.IOException: Failed to cache: ThriftIOException(message:Unable to delete /mnt/ramdisk/alluxioworker/7472892904723459189/16777216)
        at alluxio.client.file.FileOutStream.handleCacheWriteException(FileOutStream.java:288)
        at alluxio.client.file.FileOutStream.close(FileOutStream.java:164)
        at com.google.common.io.Closer.close(Closer.java:206)
        at alluxio.shell.command.CopyFromLocalCommand.copyPath(CopyFromLocalCommand.java:170)
        at alluxio.shell.command.CopyFromLocalCommand.copyFromLocal(CopyFromLocalCommand.java:135)
        at alluxio.shell.command.CopyFromLocalCommand.run(CopyFromLocalCommand.java:78)
        at alluxio.shell.AlluxioShell.run(AlluxioShell.java:182)
        at alluxio.shell.AlluxioShell.main(AlluxioShell.java:66)
Caused by: java.io.IOException: ThriftIOException(message:Unable to delete /mnt/ramdisk/alluxioworker/7472892904723459189/16777216)
        at alluxio.AbstractClient.retryRPC(AbstractClient.java:326)
        at alluxio.client.block.BlockWorkerClient.cacheBlock(BlockWorkerClient.java:139)
        at alluxio.client.block.LocalBlockOutStream.close(LocalBlockOutStream.java:96)
        at alluxio.client.file.FileOutStream.close(FileOutStream.java:159)
        ... 6 more
Caused by: ThriftIOException(message:Unable to delete /mnt/ramdisk/alluxioworker/7472892904723459189/16777216)
        at alluxio.thrift.BlockWorkerClientService$cacheBlock_result$cacheBlock_resultStandardScheme.read(BlockWorkerClientService.java:3997)
        at alluxio.thrift.BlockWorkerClientService$cacheBlock_result$cacheBlock_resultStandardScheme.read(BlockWorkerClientService.java:3974)
        at alluxio.thrift.BlockWorkerClientService$cacheBlock_result.read(BlockWorkerClientService.java:3916)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
        at alluxio.thrift.BlockWorkerClientService$Client.recv_cacheBlock(BlockWorkerClientService.java:255)
        at alluxio.thrift.BlockWorkerClientService$Client.cacheBlock(BlockWorkerClientService.java:241)
        at alluxio.client.block.BlockWorkerClient$3.call(BlockWorkerClient.java:142)
        at alluxio.client.block.BlockWorkerClient$3.call(BlockWorkerClient.java:139)
        at alluxio.AbstractClient.retryRPC(AbstractClient.java:322)
        ... 9 more


Without the sticky bit we are able to use the CLI and Flink properly.

Below is an example of how files in tiered storage are owned

[flink@ip-172-31-17-124 ~]$ ll /mnt/ramdisk/alluxioworker/
-rwxrwxrwx. 1 alluxio rbp-group 5 Mar 25 09:00 285212672
-rwxrwxrwx. 1 flink   rbp-group 5 Mar 25 09:01 301989888
 
after 2 "copyfromlocal" operations from 2 different users.

Hope this helps to understand our case.

Thanks,
Saverio






--
Saverio Veltri
Software Craftsman

Bin Fan

Mar 25, 2016, 12:51:56 PM
to Alluxio Users, jia.c...@gmail.com
Hi Saverio

I think Calvin's PR (https://github.com/Alluxio/alluxio/pull/2917, which has been merged) should address this sticky bit problem.
In your case, files created by different users (flink, alluxio) had the sticky bit applied but were owned by different users.
Thus files created by flink may not be deletable by the Alluxio worker.

Calvin will cut 1.0.1 very soon; it would be great if you could give it a try and report whether the issue still happens.

Bin

Calvin Jia

Mar 25, 2016, 1:36:36 PM
to Alluxio Users, jia.c...@gmail.com
Hi Saverio,

Thanks for the detailed explanation. As Bin mentioned, I think your scenario is in line with the issue I fixed.

The idea of the sticky bit is to give a bit more safety to the tiered storage files, ensuring only the Alluxio worker user can delete them. One bug which was happening with short-circuit writes was what you saw: files in tiered storage were being owned by other users. With the change, all the files should belong to the Alluxio worker.

The exception you see occurs because the client sends a request to the worker asking it to delete a block on the client's behalf. The worker responds with the exception because it cannot delete a file owned by the client due to the sticky bit. After the patch, the worker will be able to delete it, since it owns the file.
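To make the flow concrete, here is a rough sketch of the step that fails (not the actual worker code; the source path is taken from your stack trace and the destination is made up for illustration). The worker tries to move the client-written temp block into its final location; when that file is owned by the client user and the parent directory has the sticky bit set, the move/unlink fails for the worker user, which surfaces as the ThriftIOException above.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CommitBlockSketch {
    // Illustrative commit step: move the client-written temp block into its final
    // location. The rename removes the old directory entry, which the sticky bit
    // forbids unless the caller owns the file (or the directory) or is root.
    static void commitBlock(Path tmpBlock, Path committedBlock) throws IOException {
        Files.createDirectories(committedBlock.getParent());
        Files.move(tmpBlock, committedBlock, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        commitBlock(
            Paths.get("/mnt/ramdisk/alluxioworker/7472892904723459189/16777216"),
            Paths.get("/mnt/ramdisk/alluxioworker/16777216"));
    }
}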

Hope this helps,
Calvin

Saverio Veltri

Mar 29, 2016, 10:58:12 AM
to Alluxio Users, jia.c...@gmail.com
Hi Calvin,
    We ran further tests in our scenario and, as you and Bin mentioned, the issue has been resolved by your PR.

Now all the files and folders in tiered storage are owned by the Alluxio worker.

Thanks a lot for your support,
Saverio

Calvin Jia

Mar 29, 2016, 2:01:21 PM
to Alluxio Users, jia.c...@gmail.com
Hi Saverio,

Thanks for verifying and reporting this issue. The fix should be available in the latest 1.0.1 release.

Cheers,
Calvin