Problem Mounting S3 Bucket Folder to Alluxio


Keren Tseytlin

Nov 4, 2016, 3:44:12 PM
to Alluxio Users
Hi all,

I am trying to mount an S3 directory to Alluxio and am following the direction as listed in the documentation. When I try to connect the bucket at a particular folder I get the below error:

$ ./bin/alluxio fs mount alluxio://localhost:19998/mnt/s3 s3a://kafkaConnectTest/testfolder
ThriftIOException(message:Ufs path /testfolder does not exist)

However, if I use the AWS CLI, then I am able to read from the bucket (and I know it is populated):
$ aws s3 ls s3://kafkaConnectTest/testfolder --recursive
2016-11-02 16:22:55          0 testfolder/
2016-11-02 16:46:03         36 testfolder/test1.txt

My conf/alluxio-site.properties file is populated with the following configurations:
aws.accessKeyId=AWSACCESSKEY
aws.secretKey=AWSSECRETACCESSKEY
alluxio.underfs.s3.proxy.host=PROXYLOCATION
alluxio.underfs.s3.proxy.port=PROXYPORT
alluxio.underfs.s3.proxy.https.only=true

The below snippet is from logs/master.log

2016-11-04 15:31:42,144 ERROR logger.type (S3AUnderFileSystem.java:mkdirsInternal) - Failed to create directory: testfolder
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: XXXXXXXXXXXXXX), S3 Extended Request ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXX
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1305)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:852)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:630)
        at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:405)
        at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:367)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:318)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3787)
        at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1399)
        at alluxio.underfs.s3a.S3AUnderFileSystem.mkdirsInternal(S3AUnderFileSystem.java:786)
        at alluxio.underfs.s3a.S3AUnderFileSystem.getFolderMetadata(S3AUnderFileSystem.java:633)
        at alluxio.underfs.s3a.S3AUnderFileSystem.getObjectDetails(S3AUnderFileSystem.java:654)
        at alluxio.underfs.s3a.S3AUnderFileSystem.exists(S3AUnderFileSystem.java:278)
        at alluxio.master.file.FileSystemMaster.mountInternal(FileSystemMaster.java:2122)
        at alluxio.master.file.FileSystemMaster.mountAndJournal(FileSystemMaster.java:2055)
        at alluxio.master.file.FileSystemMaster.mount(FileSystemMaster.java:2022)
        at alluxio.master.file.FileSystemMasterClientServiceHandler$12.call(FileSystemMasterClientServiceHandler.java:233)
        at alluxio.master.file.FileSystemMasterClientServiceHandler$12.call(FileSystemMasterClientServiceHandler.java:230)
        at alluxio.RpcUtils.call(RpcUtils.java:62)
        at alluxio.master.file.FileSystemMasterClientServiceHandler.mount(FileSystemMasterClientServiceHandler.java:230)
        at alluxio.thrift.FileSystemMasterClientService$Processor$mount.getResult(FileSystemMasterClientService.java:1611)
        at alluxio.thrift.FileSystemMasterClientService$Processor$mount.getResult(FileSystemMasterClientService.java:1595)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2016-11-04 15:31:42,208 WARN  logger.type (RpcUtils.java:call) - I/O error when running rpc
java.io.IOException: Ufs path /testfolder does not exist
        at alluxio.master.file.FileSystemMaster.mountInternal(FileSystemMaster.java:2123)
        at alluxio.master.file.FileSystemMaster.mountAndJournal(FileSystemMaster.java:2055)
        at alluxio.master.file.FileSystemMaster.mount(FileSystemMaster.java:2022)
        at alluxio.master.file.FileSystemMasterClientServiceHandler$12.call(FileSystemMasterClientServiceHandler.java:233)
        at alluxio.master.file.FileSystemMasterClientServiceHandler$12.call(FileSystemMasterClientServiceHandler.java:230)
        at alluxio.RpcUtils.call(RpcUtils.java:62)
        at alluxio.master.file.FileSystemMasterClientServiceHandler.mount(FileSystemMasterClientServiceHandler.java:230)
        at alluxio.thrift.FileSystemMasterClientService$Processor$mount.getResult(FileSystemMasterClientService.java:1611)
        at alluxio.thrift.FileSystemMasterClientService$Processor$mount.getResult(FileSystemMasterClientService.java:1595)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Is there a configuration I'm missing? How can I make sure that the mount is successful? Any help would be much appreciated! :)

Best,
Keren

Calvin Jia

Nov 7, 2016, 12:12:48 PM
to Alluxio Users
Hi Keren,

It seems like your S3 credentials are not correctly specified, hence the access denied error. Could you double-check that your AWS keys are set correctly?

Hope this helps,
Calvin

Keren Tseytlin

Nov 7, 2016, 12:23:55 PM
to Alluxio Users
Hey Calvin,

I've triple checked that my credentials are correct. I've also run `aws configure` just for good measure.

The error log also shows that Alluxio is trying to create the folder "testfolder", but the AWS CLI output above shows that the folder already exists and is populated.

If I mount the entire S3 bucket instead, the mount succeeds, but Alluxio then shows the bucket as completely empty (even though the AWS CLI "ls" shows it is populated). See below:

$ ./bin/alluxio fs mount alluxio://localhost:19998/mnt/s3 s3a://kafkaConnectTest
Mounted s3a://kafkaConnectTest at alluxio://localhost:19998/mnt/s3
$ ./bin/alluxio fs ls /mnt/s3
# Nothing gets printed
$ aws s3 ls s3://kafkaConnectTest
                           PRE connect-test-prefix/
                           PRE destination/
                           PRE testfolder/

Also, I don't believe I would be able to "ls" using the AWS CLI if my keys were not set correctly.

Best,
Keren

Calvin Jia

Nov 7, 2016, 6:11:26 PM
to Alluxio Users
Hi Keren,

Thanks for checking. Do the credentials you are using have write access to the bucket? When Alluxio is connected to S3, it leaves an empty file to represent each directory so that it can detect empty directories. This behavior is not ideal (you need both read and write access for a fresh bucket where the empty files have not yet been created) and will be modified in the near future.

As for your point about detecting data, Alluxio loads metadata on demand rather than at mount time. One way to see the data is to run `bin/alluxio fs ls -f /mnt/s3`; the -f option forces Alluxio to discover the files in the directory.

Hope this helps,
Calvin

Keren Tseytlin

Nov 8, 2016, 11:35:00 AM
to Alluxio Users
Hey Calvin,

I confirm that I can upload files to the directory:
$ aws s3 cp testing.txt s3://kafkaConnectTest/testfolder/testing.txt --sse AES256
upload: ./testing.txt to s3://kafkaConnectTest/testfolder/testing.txt

I mounted the directory again, but it still comes up empty:
$ bin/alluxio fs ls -f /mnt/s3
# Still returns empty here

The interesting part here is that my company requires all objects sent to S3 to be server-side encrypted (SSE, KMS, etc.); otherwise the upload is rejected. See below that I cannot write to S3 without specifying SSE:
$ aws s3 cp testing.txt s3://kafkaConnectTest/testfolder/testing.txt
upload failed: ./testing.txt to s3://kafkaConnectTest/testfolder/testing.txt An error occurred (AccessDenied) when calling the PutObject operation: Access Denied

As a quick test, I relaxed our S3 bucket rules at work so that files can be uploaded without being encrypted. Once I did that, the mount succeeded and I was able to see my S3 files in Alluxio:
$ ./bin/alluxio fs mount -readonly alluxio://localhost:19998/mnt/s3 s3a://kafkaConnectTest
Mounted s3a://kafkaConnectTest at alluxio://localhost:19998/mnt/s3
$ bin/alluxio fs ls -f /mnt/s3
-rw-------     XXXXX        XXXX        36.00B    11-08-2016 11:27:21:573  Not In Memory  /mnt/s3/test1.txt
-rw-------     XXXXX        XXXX        2334.00B  11-08-2016 11:27:21:713  Not In Memory  /mnt/s3/bigdogapi_elb_cname.sh
-rw-------     XXXXX        XXXX        703.00KB  11-08-2016 11:27:21:872  Not In Memory  /mnt/s3/cloudtrail_ex.json
-rw-------    XXXXX        XXXX        27.00B    11-08-2016 11:27:22:027  Not In Memory  /mnt/s3/testing.txt

Based on the above, it seems the mount can only succeed if the bucket policy does not require objects to be encrypted as a prerequisite for writing. Can you describe in more detail how the mount works? From what you wrote, it sounds like Alluxio drops a file into the S3 bucket to verify the mount; is that interpretation correct? If so, and that upload cannot be server-side encrypted, that might be why the mount is failing.

Best,
Keren

Calvin Jia

Nov 8, 2016, 6:47:51 PM
to Alluxio Users
Hey Keren,

Thanks for digging into this. If you need to use server-side encryption, you can enable it by adding alluxio.underfs.s3a.server.side.encryption.enabled=true to your alluxio-site.properties file (see the helpful documentation).
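Combined with the settings from the start of this thread, the resulting conf/alluxio-site.properties would look something like this (placeholder values; the proxy settings only if your environment needs them):

```properties
aws.accessKeyId=AWSACCESSKEY
aws.secretKey=AWSSECRETACCESSKEY
alluxio.underfs.s3.proxy.host=PROXYLOCATION
alluxio.underfs.s3.proxy.port=PROXYPORT
alluxio.underfs.s3.proxy.https.only=true
# Request SSE for objects Alluxio writes, including the directory marker files
alluxio.underfs.s3a.server.side.encryption.enabled=true
```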

As for the mount implementation, we add a dummy file for each directory so that we can later detect which paths are directories without relying on a file being present in the directory (for example, if /dir/file exists we can infer that /dir exists, but if only /dir exists, nothing would be present in S3 to tell us that).
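To illustrate the inference point (this is just a sketch of the idea, not Alluxio's actual code):

```python
def infer_dirs(keys):
    """Infer directory paths implied by a flat list of S3 object keys.

    Illustration only. Without marker objects, only directories that
    contain at least one object can be inferred; an empty directory
    leaves no trace, which is why Alluxio writes a dummy file per dir.
    """
    dirs = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the object name itself
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return dirs

# 'testfolder/' is inferable because it contains test1.txt; an empty
# sibling directory would not appear at all without a marker object.
print(sorted(infer_dirs(["testfolder/test1.txt", "destination/a/b.txt"])))
```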

Hope this helps,
Calvin

Keren Tseytlin

Nov 9, 2016, 9:45:35 AM
to Alluxio Users
Thanks for the info and thanks for your help!

Best,
Keren