Alluxio + Presto in Kubernetes

Marvin

Mar 9, 2020, 10:07:03 PM
to Alluxio Users
Hi,

We are currently running Alluxio and Presto in Kubernetes, using the Presto distribution from Starburst Data. For the Alluxio setup, we followed the instructions from here. All of our data is in S3. Without Alluxio caching, Presto queries directly against the S3 buckets run in 4 to 20 seconds. After integrating Alluxio, however, runtimes degraded to around 3 to 4 minutes.

The Presto cluster runs on a separate set of machines (r3.2xlarge, r4.2xlarge), and so does the Alluxio cluster (i3.2xlarge). The EC2 instances are all in the same availability zone. We also do not see any errors in the logs. Can anyone advise on a better approach to using Alluxio in Kubernetes?

Bin Fan

Mar 10, 2020, 1:06:42 AM
to Marvin, Alluxio Users
hi Marvin,

The performance degradation you mentioned is unexpected.
Typically Alluxio provides better, or at least similar, performance to S3 once the data is cached.

To give you better suggestions, we need more information, such as the Alluxio version, configuration, data size, etc.
Happy to take a deeper dive and discuss this with you.
You are also welcome to join alluxio.io/slack for more efficient communication.

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/alluxio-users/20de3cbd-b2fc-4c40-9068-285e7f0b0879%40googlegroups.com.


Marvin

Mar 19, 2020, 8:57:24 PM
to Alluxio Users
As I said above, we are seeing a huge performance degradation when using Alluxio and Presto in Kubernetes. Both applications run on their own sets of machines, all in the same region. We are currently testing all the provided images as described here.

Both Presto and Alluxio run on Kubernetes and use Calico networking. Since the Presto and Alluxio containers run on separate hosts, we won't benefit from either the short-circuit path or the domain socket implementation; correct me if I am wrong on this. Given that, we expected network-speed performance, but that is not the case.

Below are the parameters we explicitly set. Can someone help, or maybe point us toward what could be wrong? We have already checked the network throughput across nodes and containers in the Kubernetes cluster and we are getting 4-6 Gbps, so we are ruling out misconfiguration of the Kubernetes cluster. Also, Presto connected directly to S3 is really fast.

Does Alluxio fit our use case? We basically want to separate storage and compute so we can scale them independently.

-Dalluxio.master.hostname=alluxio-master-0
-Dalluxio.worker.data.server.domain.socket.address=/opt/domain
-Dalluxio.user.short.circuit.enabled=false
-Dalluxio.worker.data.server.domain.socket.as.uuid=true
-Dalluxio.worker.network.block.reader.threads.max=20480
-Dalluxio.worker.network.async.cache.manager.threads.max=24
-Dalluxio.worker.memory.size=32GB
-Dalluxio.master.journal.type=UFS
-Dalluxio.master.journal.folder=/journal
-Dalluxio.security.stale.channel.purge.interval=365d
-Daws.accessKeyId=XXXXXXXXXXX
-Daws.secretKey=XXXXXXXXXXX
-Dalluxio.master.mount.table.root.ufs=s3://XXXXXXXXX/
-Dalluxio.user.block.size.bytes.default=128MB
-Dalluxio.user.file.metadata.sync.interval=0
-Dalluxio.security.authorization.permission.enabled=true
-Dalluxio.master.security.impersonation.hive.users=*
-Dalluxio.master.security.impersonation.presto.users=*
-Dalluxio.master.security.impersonation.yarn.users=*
-Dalluxio.master.security.impersonation.root.users=*
-Dalluxio.master.security.impersonation.root.groups=*
-Dalluxio.master.security.impersonation.client.users=*
-Dalluxio.master.security.impersonation.client.groups=*
-Dalluxio.worker.tieredstore.level0.alias=SSD
-Dalluxio.worker.tieredstore.levels=1
-Dalluxio.worker.tieredstore.level0.dirs.mediumtype=SSD
-Dalluxio.worker.tieredstore.level0.dirs.path=/alluxio-fs
-Dalluxio.worker.tieredstore.level0.dirs.quota=15G
-Dalluxio.worker.tieredstore.level0.watermark.high.ratio=0.95
-Dalluxio.worker.tieredstore.level0.watermark.low.ratio=0.7
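With a config like the above, one way to rule out uneven or missing caching is the standard Alluxio CLI; a minimal sketch (the `/path/to/table` path is a placeholder, not from this thread):

```shell
# Check whether data is cached and spread evenly across workers
bin/alluxio fsadmin report capacity

# See how much of a given path is already in Alluxio (% cached per file)
bin/alluxio fs ls /path/to/table

# Pre-load a hot path across workers instead of relying on on-demand reads
bin/alluxio fs distributedLoad /path/to/table
```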

Jiacheng Liu

Mar 24, 2020, 10:40:15 AM
to Marvin, Alluxio Users
Hi Marvin,

Thanks, that's more information for us. I agree with Bin: using Alluxio (even as a remote cluster) to speed up S3 access is a meaningful use case to me.

The points below could be hurting your performance; I would start with these:
1. What is your worker tiered storage volume type? Basically, what kind of volumes are mounted at /alluxio-fs, and could their speed be a bottleneck?
2. What is your worker storage usage? Frequent eviction can also hurt performance. Eviction is triggered when worker usage goes above 0.95 * quota, and stops once usage drops below 0.7 * quota.

What is the formation of your Alluxio masters/workers? It seems you are using a single master with a UFS journal, probably with more than one worker? Could you please share exactly how the cluster is deployed? That doc page covers all the possible ways to deploy Alluxio.

Thank you. I'm happy to discuss this in more depth. You can find me on Slack under the name Jiacheng Liu.

Best,
Jiacheng


Marvin

Mar 24, 2020, 10:43:43 PM
to Alluxio Users
Hi Jiacheng,

Thanks for the response. We have looked into these items:

1. What is your worker tiered storage volume type? Basically, what kind of volumes are mounted at /alluxio-fs, and could their speed be a bottleneck?
--> We tested several storage types: GP2, IO1 (3,000 IOPS), and NVMe attached to i-type instances. All have the same performance. We also tested reading and writing manually (with dd) to the mounted /alluxio-fs and we get great performance.

2. What is your worker storage usage? Frequent eviction can also hurt performance. Eviction is triggered when worker usage goes above 0.95 * quota, and stops once usage drops below 0.7 * quota.
--> We have also tried changing these values. Our quota is large enough that eviction is not happening at all.

We have also tested the network. We set up iperf3 (server/client) to get a sense of the throughput between the Alluxio nodes and the Presto nodes, and we are getting around 4 Gbps. This obviously depends on the instance type, but that throughput is more than enough. We tested this to make sure the network is not the bottleneck.
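For reference, the kind of point-to-point check described above looks like the following; the hostname is an example, not an actual pod name from this cluster:

```shell
# On one Alluxio worker node: start an iperf3 server (listens on port 5201)
iperf3 -s

# On a Presto node: measure throughput to that worker for 10 seconds
iperf3 -c alluxio-worker-0 -t 10
```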

In the Kubernetes deployment document, we followed section 2.4, deployment using kubectl. We have tried both singleMaster-localJournal and multiMaster-EmbeddedJournal. We did not change much of what is already defined in the YAML files, except for the obvious things like AWS keys, the S3 under storage, etc.

We also changed config-map.yaml to reflect the parameters I provided earlier, and added the mount-point definition for /alluxio-fs. The rest is intact and unchanged.

We still set the short-circuit parameter to "true" alongside the domain socket, though I think it won't make much difference since Presto and the Alluxio workers are not co-located. At the very least, we expected to get network-level speed, if not memory speed, but we are actually getting a performance degradation.

What I am curious about is whether our use case is something Alluxio can be used for. If yes, I may be missing something in my setup, and that is the part I need your help with. Thanks!

Bin Fan

Mar 25, 2020, 2:06:40 PM
to Ankur Wahi, Jiacheng Liu, Marvin, Alluxio Users
Thanks Ankur,


A few tips on performance debugging:

- Are you running the latest 2.2.0? If not, I strongly suggest you upgrade to this version.

- Check whether your data has been loaded into Alluxio uniformly.
You can check this by running "bin/alluxio fsadmin report capacity".
If it has not, next time you can run "bin/alluxio fs distributedLoad /path/file".

- On Alluxio 2.2.0, you can check the cache hit ratio more accurately and see whether queries hit the Alluxio cache.

You are welcome to discuss this more efficiently in our Slack channel.
- Bin

On Tue, Mar 24, 2020 at 8:57 AM Ankur Wahi <wah...@gmail.com> wrote:
Hi,

Thanks for all the suggestions.
We are proposing the following architecture.

We have around a 700 TB data lake in S3; our top 10 most-queried tables take around 50 TB.
So we plan to move that 50 TB of data onto HDDs in our existing Presto cluster.
For these top 10 tables, the respective ETL jobs will read and write directly to the Alluxio filesystem, and the data will be backed up to S3 via async copy.

We are also going to set alluxio.user.file.metadata.sync.interval, so that if any data is written directly to S3 it becomes visible in Alluxio.
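As a config fragment, that setting might look like the line below; the 1min interval is an arbitrary example value, not a recommendation from this thread:

```shell
# Any positive interval enables periodic metadata sync with the UFS;
# the value shown here is an example only.
-Dalluxio.user.file.metadata.sync.interval=1min
```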

We have started a small PoC testing Presto performance on S3 vs. Alluxio:
1) Moved around 8 TB of a table's data from S3 to the Alluxio file system
2) Built a table on the Alluxio files
3) Ran queries on the Alluxio table and compared with the S3 table on the same 8 TB of data

Result: S3 performance beats Alluxio at the same cluster size.
The PoC runs on a 4-node r4.8xlarge cluster, with each core node having a 3 TB HDD.

Question:
1) What other factors could improve Alluxio table performance at the same cluster size?

Thanks
Ankur
