We have been running Presto 0.111 on AWS EMR clusters without issue, using CROSS JOIN UNNEST on a field whose values are arrays of JSON objects. However, starting with EMR 4.x, Amazon pre-loads the clusters with a later version of Presto and no longer lets us install any other version. Moving to the later version has exposed a fundamental problem for us that was not present in 0.111.
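To make the shape of the operation concrete, here is a rough Python sketch of what the cross join unnest does to our data. The field names (id, events) and the sample rows are made up for illustration, and storing the array as a JSON string is an assumption; this is not our actual query or schema.

```python
import json

# Hypothetical input rows: an id plus a JSON-encoded array of objects.
rows = [
    {"id": 1, "events": '[{"type": "click"}, {"type": "view"}]'},
    {"id": 2, "events": '[{"type": "view"}]'},
    {"id": 3, "events": "[]"},
]


def cross_join_unnest(rows, array_field):
    """Yield one output row per (input row, array element) pair.

    Roughly mirrors a query of the form
    SELECT t.id, e FROM t CROSS JOIN UNNEST(t.events) AS u (e),
    where t and events are placeholder names.
    """
    for row in rows:
        # A row with an empty array contributes no output rows.
        for element in json.loads(row[array_field]):
            yield {"id": row["id"], "element": element}


result = list(cross_join_unnest(rows, "events"))
for out_row in result:
    print(out_row)
```

The number of output rows is the sum of the array lengths across all input rows, so with long arrays the output is much larger than the input, which is presumably where the memory pressure comes from.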
When we run the same query as before, a CROSS JOIN UNNEST over an array of JSON objects, it fails with an internal error caused by an OutOfMemoryError. On further investigation, we found that some of the configuration properties we used with 0.111 have since been deprecated, for example task.max-memory in favor of query.max-memory. In our testing, Presto 0.112 is the last release that works for our use case, which led us to compare the changes between 0.112 and 0.113. The major change appears to be that clusterMemoryManager is enabled: it is explicitly set to true in MemoryManagerConfig in 0.113. As a result, the MemoryPool enables blocking, which we believe could be the cause of our issues.
If the blocking is not the cause of our problem, is there any configuration change we could make in later versions of Presto to alleviate the OutOfMemoryError we hit when performing CROSS JOIN UNNEST operations?
Thank you very much,
Tom