Release notes for all Amazon EMR releases are available below. For comprehensive release information for each release, see Amazon EMR 6.x release versions, Amazon EMR 5.x release versions, and Amazon EMR 4.x release versions.
The 6.14.0 release improves the way that Amazon EMR interacts with open-source applications such as Apache Hadoop YARN ResourceManager and HDFS NameNode. This improvement reduces the risk of operational delays with cluster scaling, and mitigates startup failures that occur due to connectivity issues with the open-source applications.
When you launch a cluster with the latest patch release of Amazon EMR 5.36 or higher, 6.6 or higher, or 7.0 or higher, Amazon EMR uses the latest Amazon Linux 2023 or Amazon Linux 2 release for the default Amazon EMR AMI. For more information, see Using the default Amazon Linux AMI for Amazon EMR.
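For reference, the following is a minimal sketch of how you might pin a cluster to a specific release label with boto3, so that you know which default AMI family applies. The subnet ID, IAM role names, and instance settings are placeholders, not values from this guide.

```python
import boto3

# Minimal sketch: launch a cluster pinned to a specific release label.
# The subnet ID and IAM role names below are placeholders.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="example-cluster",
    ReleaseLabel="emr-6.14.0",  # a latest-patch release of 5.36+, 6.6+, or 7.0+
    Instances={
        "InstanceGroups": [
            {
                "Name": "Primary",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            }
        ],
        "Ec2SubnetId": "subnet-0123456789abcdef0",  # placeholder
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```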
The 6.13.0 release improves the Amazon EMR log management daemon to ensure that all logs are uploaded at a regular cadence to Amazon S3 when a cluster termination command is issued. This facilitates faster cluster terminations.
Amazon EMR releases 6.12.0 and higher support LDAP integration with Apache Livy, Apache Hive through HiveServer2 (HS2), Trino, Presto, and Hue. You can also install Apache Spark and Apache Hadoop on an EMR cluster that uses 6.12.0 or higher and configure them to use LDAP. For more information, see Use Active Directory or LDAP servers for authentication with Amazon EMR.
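As a rough illustration, LDAP authentication is attached to a cluster through an EMR security configuration. The boto3 call below is a real API, but the keys inside the LDAP block are illustrative placeholders; see Use Active Directory or LDAP servers for authentication with Amazon EMR for the exact schema.

```python
import json

import boto3

# Sketch: create a security configuration that enables LDAP authentication.
# The structure of the LDAP block below is illustrative only -- consult the
# LDAP authentication documentation for the exact keys and required values.
emr = boto3.client("emr", region_name="us-east-1")

security_config = {
    "AuthenticationConfiguration": {  # illustrative structure
        "LDAPConfiguration": {        # illustrative structure
            "EnableLDAPAuthentication": True,
            "LDAPServerURL": "ldaps://ldap.example.com",  # placeholder server
        }
    }
}

emr.create_security_configuration(
    Name="ldap-auth-example",
    SecurityConfiguration=json.dumps(security_config),
)
```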
The 6.12.0 release adds a new retry mechanism to the cluster scaling workflow for EMR clusters that run Presto or Trino. This improvement reduces the risk that cluster resizing will indefinitely stall due to a single failed resize operation. It also improves cluster utilization, because your cluster scales up and down faster.
The 6.12.0 release fixes an issue where cluster scale-down operations might stall when a core node that is undergoing graceful decommissioning turns unhealthy for any reason before it fully decommissions.
The 6.12.0 release improves cluster scale-down logic so that your cluster doesn't attempt a scale-down of core nodes below the HDFS replication factor setting for the cluster. This aligns with your data redundancy requirements, and reduces the chance that a scaling operation might stall.
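To make that floor explicit, you can set the replication factor yourself at launch with the hdfs-site configuration classification, as in the sketch below. The value "2" is an example, not a recommendation.

```python
# Sketch: a configuration classification that pins the HDFS replication
# factor, which the scale-down floor described above respects. Pass this
# list in the Configurations parameter of run_job_flow (boto3) or as JSON
# in the console. The value "2" is only an example.
HDFS_REPLICATION_CONFIG = [
    {
        "Classification": "hdfs-site",
        "Properties": {"dfs.replication": "2"},
    }
]
```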
The 6.12.0 release enhances the performance and efficiency of the health monitoring service for Amazon EMR by increasing the speed at which it logs state changes for instances. This improvement reduces the chance of degraded performance for cluster nodes that are running multiple custom client tools or third-party applications.
The 6.12.0 release improves the performance of the on-cluster log management daemon for Amazon EMR. As a result, there is less chance for degraded performance with EMR clusters that run steps with high concurrency.
With Amazon EMR release 6.12.0, the log management daemon has been upgraded to identify all logs that are in active use with open file handles on the local instance storage, and the associated processes. This upgrade ensures that Amazon EMR properly deletes the files and reclaims storage space after the logs are archived to Amazon S3.
The 6.12.0 release includes a log-management daemon enhancement that deletes empty, unused steps directories in the local cluster file system. An excessively large number of empty directories can degrade the performance of Amazon EMR daemons and result in disk over-utilization.
Due to lock contention, a node can deadlock if it's added or removed at the same time that it attempts to decommission. As a result, the Hadoop YARN ResourceManager becomes unresponsive, which affects all incoming and currently running containers.
This release fixes an issue where clusters that run Spark workloads on Amazon EMR might silently receive incorrect results from the contains, startsWith, endsWith, and like expressions. The issue occurs when you use these expressions on partitioned fields that have metadata in the Amazon EMR Hive3 Metastore Server (HMS).
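For context, the following PySpark sketch shows the query shape that was affected: string predicates on a partitioned column whose metadata lives in HMS. The table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Illustrative only: string predicates on a partitioned column. The table
# name "sales_partitioned" and column "region" are hypothetical.
spark = SparkSession.builder.appName("partition-predicate-example").getOrCreate()

df = spark.table("sales_partitioned")  # a Hive table partitioned by region
result = (
    df.filter(df.region.startswith("us-"))  # startsWith
      .filter(df.region.contains("east"))   # contains
      .filter(df.region.like("us-%"))       # like
)
result.show()
```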
With Amazon EMR 6.11.0, the DynamoDB connector has been upgraded to version 5.0.0. Version 5.0.0 uses AWS SDK for Java 2.x. Previous releases used AWS SDK for Java 1.x. As a result of this upgrade, we strongly advise you to test your code before you use the DynamoDB connector with Amazon EMR 6.11.
When the DynamoDB connector for Amazon EMR 6.11.0 calls the DynamoDB service, it uses the Region value that you provide for the dynamodb.endpoint property. We recommend that you also configure dynamodb.region when you use dynamodb.endpoint, and that both properties target the same AWS Region. If you use dynamodb.endpoint and you don't configure dynamodb.region, the DynamoDB connector for Amazon EMR 6.11.0 will return an invalid Region exception and attempt to reconcile your AWS Region information from the Amazon EC2 instance metadata service (IMDS). If the connector can't retrieve the Region from IMDS, it defaults to US East (N. Virginia) (us-east-1). The following error is an example of the invalid Region exception that you might get if you don't properly configure the dynamodb.region property: error software.amazon.awssdk.services.dynamodb.model.DynamoDbException: Credential should be scoped to a valid region. For more information on the classes that are affected by the AWS SDK for Java upgrade to 2.x, see the Upgrade AWS SDK for Java from 1.x to 2.x (#175) commit in the GitHub repo for the Amazon EMR - DynamoDB connector.
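As a hedged sketch, the following PySpark snippet keeps the two connector properties aligned on the same Region before a job that uses the connector. The Region values are examples; substitute your own.

```python
from pyspark.sql import SparkSession

# Sketch: set dynamodb.endpoint and dynamodb.region so both target the same
# AWS Region. Accessing hadoopConfiguration() through _jsc is a common but
# internal PySpark idiom; the Region values are examples.
spark = SparkSession.builder.appName("ddb-region-example").getOrCreate()
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

hadoop_conf.set("dynamodb.endpoint", "dynamodb.us-west-2.amazonaws.com")
hadoop_conf.set("dynamodb.region", "us-west-2")  # must match the endpoint
```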
This release fixes an issue where column data becomes NULL when you use Delta Lake to store Delta table data in Amazon S3 after a column rename operation. For more information about this experimental feature in Delta Lake, see Column rename operation in the Delta Lake User Guide.
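For reference, the rename pattern looks like the following Spark SQL sketch. Delta Lake requires column mapping mode "name" (and the matching reader/writer protocol versions) before it accepts a rename; the S3 path and column names are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch of the column rename pattern the fix above covers. The S3 path and
# column names are placeholders. Column mapping mode "name" must be enabled
# before Delta Lake accepts RENAME COLUMN.
spark = (
    SparkSession.builder.appName("delta-rename-example")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

spark.sql("""
    ALTER TABLE delta.`s3://amzn-s3-demo-bucket/delta-table`
    SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")
spark.sql("""
    ALTER TABLE delta.`s3://amzn-s3-demo-bucket/delta-table`
    RENAME COLUMN old_name TO new_name
""")
```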
The 6.11.0 release fixes an issue that might occur when you create an edge node by replicating one of the primary nodes from a cluster with multiple primary nodes. The replicated edge node could cause delays with scale-down operations, or result in high memory-utilization on the primary nodes. For more information on how to create an edge node to communicate with your EMR cluster, see Edge Node Creator in the aws-samples repo on GitHub.
The 6.11.0 release fixes an issue with EMR clusters where an update to the YARN configuration file that contains the exclusion list of nodes for the cluster is interrupted due to disk over-utilization. The incomplete update hinders future cluster scale-down operations. This release ensures that your cluster remains healthy, and that scaling operations work as expected.
Hadoop 3.3.3 introduced a change in YARN (YARN-9608) that keeps nodes where containers ran in a decommissioning state until the application completes. This change ensures that local data such as shuffle data doesn't get lost, and you don't need to re-run the job. This approach might also lead to underutilization of resources on clusters with or without managed scaling enabled.
With Amazon EMR releases 6.11.0 and higher, as well as 6.8.1, 6.9.1, and 6.10.1, the value of yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-applications is set to false in yarn-site.xml to resolve this issue.
While the fix addresses the issues that were introduced by YARN-9608, it might cause Hive jobs to fail due to shuffle data loss on clusters that have managed scaling enabled. We've mitigated that risk in this release by also setting yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-shuffle-data for Hive workloads. This config is only available with Amazon EMR releases 6.11.0 and higher.
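If you need to inspect or override the default on an affected release, the property is exposed through the yarn-site configuration classification, as in this sketch. The value shown is the default that this fix applies.

```python
# Sketch: a yarn-site classification that makes the default from this fix
# explicit. The property name comes from the release note above; pass the
# list in the Configurations parameter at cluster launch.
YARN_DECOMMISSIONING_CONFIG = [
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-applications": "false",
        },
    }
]
```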
This release no longer receives automatic AMI updates because it has been succeeded by one or more patch releases. The patch release is denoted by the number after the second decimal point (6.8.1). To see if you're using the latest patch release, check the available releases in the Release Guide, check the Amazon EMR release dropdown when you create a cluster in the console, or use the ListReleaseLabels API or list-release-labels CLI action. To get updates about new releases, subscribe to the RSS feed on the What's new? page.
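For example, you can list release labels programmatically with boto3; the prefix filter below is an example.

```python
import boto3

# Sketch: check the latest patch releases with the ListReleaseLabels API.
# The prefix "emr-6.8" is an example filter.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.list_release_labels(Filters={"Prefix": "emr-6.8"})
print(response["ReleaseLabels"])
```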