Greetings, users of Hadoop on Google Cloud Platform!
We’re pleased to announce the latest version of bdutil, which adds support for Hortonworks HDP 2.2, improves the default Hadoop 2 configuration, and improves Spark deployments.
Download bdutil-1.1.0.tar.gz or bdutil-1.1.0.zip now to try it out, or visit the developer documentation where the download links now point to the latest version.
Abridged highlights for bdutil updates:
bdutil now includes an extension for installing HDP via Apache Ambari, which can be used by adding the following flag to your existing bdutil invocations: "-e platforms/hdp/ambari_env.sh" (see the example below).
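For example, a new HDP cluster can be deployed as follows (a minimal sketch; combine with whatever flags your existing bdutil invocations already use):

   # Deploy a cluster with Hortonworks HDP installed and managed via Apache Ambari.
   ./bdutil -e platforms/hdp/ambari_env.sh deploy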
Abridged highlights for gcs-connector updates:
Zero-length file creation markers, which are used for fast-failing when two concurrent writers attempt to create the same file, are now disabled by default. A configuration option has been added to re-enable them, as shown below. See the gcs-connector CHANGES.txt for details.
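If your application depends on failing fast when two concurrent writers race to create the same file, marker files can be re-enabled in core-site.xml (a minimal sketch; the property name is taken from the gcs-connector CHANGES.txt below):

   <!-- Re-enable zero-length marker files so concurrent creates fail early. -->
   <property>
     <name>fs.gs.create.marker.files.enable</name>
     <value>true</value>
   </property>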
Abridged highlights for bigquery-connector updates:
Various bug fixes in both the input and output formats.
Please see the detailed release notes below for more information about the new bdutil and connector features.
You may download each of the connectors directly via the following links, or use the latest bdutil to install them on a new cluster.
gcs-connector: gcs-connector-1.3.2-hadoop1.jar and gcs-connector-1.3.2-hadoop2.jar
bigquery-connector: bigquery-connector-0.5.1-hadoop1.jar and bigquery-connector-0.5.1-hadoop2.jar
As always, please send any questions or comments to gcp-hadoo...@google.com, or post a question on stackoverflow.com with the tag ‘google-hadoop’ for additional assistance.
All the best,
Your Google Team
bdutil-1.1.0: CHANGES.txt
1.1.0 - 2015-01-22
1. Added plugin for deploying Ambari/HDP with:
./bdutil -e platforms/hdp/ambari_env.sh deploy
2. Set dfs.replication to 2 under conf/hadoop*/hdfs-template.xml; this suits
   PD deployments better than r=3, but if deploying with HDFS residing on
   non-PD storage, the value should be reverted to 3 (see the hdfs-site.xml
   sketch after this list).
3. Enabled the Spark EventLog for Spark deployments, logging to
   gs://${CONFIGBUCKET}/spark-eventlog-base/${MASTER_HOSTNAME} (see the
   spark-defaults.conf sketch after this list).
4. Migrated off of miscellaneous deprecated fields in favor of using
   spark-defaults.conf for Spark 1.0+; this cleans up warnings on
   spark-submit.
5. Moved SPARK_LOG_DIR from default of ${SPARK_HOME}/logs into
/hadoop/spark/logs so that they reside on the large PD if it exists.
6. Upgraded default Spark version to 1.2.0.
7. Added bdutil_env option INSTALL_JDK_DEVEL to optionally install the full
   JDK with compiler/tools instead of just the minimal JRE; set to 'true' in
   single_node_env.sh and ambari_env.sh (see the env-file sketch after this
   list).
8. Added a Python script to allocate memory more intelligently in Hadoop 2.
9. Upgraded Hadoop 2 version to 2.5.2.
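For item 2, deployments with HDFS on non-PD storage can revert to the stock replication factor by overriding the standard Hadoop property, for example in hdfs-site.xml (a minimal sketch):

   <!-- Restore the default HDFS replication factor of 3 for non-PD storage. -->
   <property>
     <name>dfs.replication</name>
     <value>3</value>
   </property>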
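For items 3 and 4, the deployed configuration corresponds to spark-defaults.conf entries along these lines (a sketch using the standard Spark 1.x property names; bdutil expands ${CONFIGBUCKET} and ${MASTER_HOSTNAME} at deploy time):

   # Enable the Spark EventLog, writing to the deployment's GCS bucket.
   spark.eventLog.enabled  true
   spark.eventLog.dir      gs://${CONFIGBUCKET}/spark-eventlog-base/${MASTER_HOSTNAME}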
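For item 7, one way to flip the option is a small custom env file passed with -e (a sketch; "jdk_env.sh" is a hypothetical file name, and this assumes bdutil accepts a user-supplied env file via -e just as it does for the bundled extensions):

   # jdk_env.sh: install the full JDK (compiler/tools) instead of the minimal JRE.
   INSTALL_JDK_DEVEL=true

   ./bdutil -e jdk_env.sh deploy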
gcs-connector-1.3.2: CHANGES.txt
1.3.2 - 2015-01-22
1. In the file-creation path, marker file creation is now configurable. By
   default, marker files will not be created; this default is most suitable
   for MapReduce applications. Setting fs.gs.create.marker.files.enable to
   true in core-site.xml will re-enable marker files. Marker files should be
   considered for applications that depend on failing early when two
   concurrent writers attempt to create the same file. Note that file
   overwrite semantics are preserved with or without marker files, but
   failures will occur sooner with marker files present.
bigquery-connector-0.5.1: CHANGES.txt
0.5.1 - 2015-01-22
1. Added enforcement of a maximum number of export shards (currently 500)
   when calculating splits for BigQueryInputFormat.
2. Fixed a bug where BigQueryOutputCommitter.needsTaskCommit() incorrectly
   depended on a Bigquery.Tables.list() call; table listing is only
   eventually consistent, so occasionally a task would erroneously fail to
   commit data.
3. Removed an extraneous table deletion in BigQueryOutputCommitter.abortTask();
   cleanup occurs during job cleanup anyway, and this would incorrectly
   (but harmlessly) try to delete a nonexistent table for map tasks.