Announcing the release of Delta Connectors 0.5.0

Scott Sandre

Jul 29, 2022, 2:56:51 PM
to delta...@googlegroups.com

Hello. We are excited to announce the release of Delta Connectors 0.5.0, which introduces the new Flink/Delta Source Connector for Apache Flink™ 1.13.


The key features in this release are:

  • Apache Flink Source for Delta Lake - Read from Delta tables directly using Apache Flink™ with the new Flink/Delta Source Connector, which utilizes the Delta Standalone library. This release introduces `DeltaSource`, a source for reading elements from Delta tables using Flink’s `DataStream` API. `DeltaSource` supports both batch (bounded) and streaming (continuous) modes. See the documentation for more details, or check out the examples.

  • Added support for GCS and S3 multi-cluster writes for Delta Standalone and downstream connectors - Delta Standalone now uses the same `delta-storage` dependency as Delta Lake on Spark, so Delta Standalone can write to and read from any cloud store that Delta Lake on Spark can. This includes S3 (single-cluster and multi-cluster), Azure, GCS, and HDFS. Any connector that uses Delta Standalone, such as the Flink/Delta connector, can therefore write to and read from these stores as well. See the documentation for more details.

  • Automatic path-based LogStore resolution for Delta Standalone and downstream connectors - Delta Standalone now automatically determines which LogStore to use for your Delta table based on the scheme of the file paths written or read. Users of this version will no longer need to explicitly configure Delta Standalone to use a specific LogStore for a specific file scheme. This also applies to connectors using Delta Standalone, such as the Flink/Delta connector. See the documentation for more details.
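To make the idea of path-based resolution concrete, here is a minimal, standard-library-only sketch of picking a store implementation from the scheme of a table path. It is not Delta Standalone's actual code: the class `LogStoreSchemeSketch`, the `Store` enum, and the scheme table are all hypothetical names chosen for illustration, and the real library may map schemes differently.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; not Delta Standalone's implementation.
final class LogStoreSchemeSketch {

    // Hypothetical labels standing in for concrete LogStore implementations.
    enum Store { S3, AZURE, GCS, HDFS }

    // Assumed scheme-to-store table, for illustration only.
    private static final Map<String, Store> BY_SCHEME = Map.of(
        "s3", Store.S3,
        "s3a", Store.S3,
        "abfs", Store.AZURE,
        "abfss", Store.AZURE,
        "gs", Store.GCS,
        "hdfs", Store.HDFS
    );

    // Returns the store for the path's scheme, falling back to HDFS
    // when the path has no scheme or the scheme is not in the table.
    static Store resolve(String tablePath) {
        String scheme = URI.create(tablePath).getScheme();
        if (scheme == null) {
            return Store.HDFS;
        }
        return BY_SCHEME.getOrDefault(scheme.toLowerCase(Locale.ROOT), Store.HDFS);
    }

    public static void main(String[] args) {
        System.out.println(resolve("s3a://bucket/events"));
        System.out.println(resolve("gs://bucket/events"));
        System.out.println(resolve("/tmp/events"));
    }
}
```

The point of the sketch is that the caller passes only a path; no per-scheme configuration is needed up front, which is the behavior the release note describes.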


Miscellaneous updates:

  • Metrics for the Flink/Delta Sink - The Apache Flink Sink for Delta Lake now exposes metrics for the number of records processed, number of records written, and number of bytes written to file. See the documentation for more details.

  • Bug fixes for the Flink/Delta Sink 

    • Due to a dependency conflict with some Apache Flink packages, users of 0.4.x may have experienced an `IllegalAccessError` exception when producing the fat-jar to deploy to their Flink cluster. This issue is fixed in 0.5.0 by this PR.

    • Due to a dependency conflict between the version of `parquet-hadoop` used by Apache Flink and Delta Standalone, writing Delta checkpoint files to tables may have failed with `org.apache.parquet.io.InvalidRecordException`. This issue is fixed in 0.5.0 by this PR.

  • Performance improvements for Delta Standalone `DeltaLog::getChanges` API - Delta Standalone 0.3.0 included a new `DeltaLog::getChanges` API which exposed incremental metadata changes to the `_delta_log` without computing the entire snapshot. This release improves the performance of this API by exposing the per-version changes as an iterator instead of loading all changes into memory.
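The difference between loading all changes into memory and exposing them through an iterator can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the Delta Standalone implementation: `LazyChangesSketch`, `changes`, and the `loadVersion` callback are hypothetical names, and each version's changes are modeled as a plain `List<String>`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

// Illustrative sketch only; not Delta Standalone's implementation.
final class LazyChangesSketch {

    // Returns an iterator over per-version changes in [start, endInclusive].
    // Nothing is loaded until next() is called, and only one version is
    // loaded per call, so memory use does not grow with the version range.
    static Iterator<List<String>> changes(
            long start, long endInclusive, LongFunction<List<String>> loadVersion) {
        return new Iterator<>() {
            private long nextVersion = start;

            @Override
            public boolean hasNext() {
                return nextVersion <= endInclusive;
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Hypothetical loader: in a real log this would read one commit file.
                return loadVersion.apply(nextVersion++);
            }
        };
    }

    public static void main(String[] args) {
        Iterator<List<String>> it =
            changes(0, 2, v -> List.of("actions for version " + v));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

A caller that stops early, for example after finding the first version containing a metadata change, never pays for reading the remaining versions.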


Credits

Allison Portis, Gerhard Brueckl, Grzegorz Kołakowski, Jiawei Bao, Pablo Flores, Paweł Kubit, Scott Sandre, Shixiong Zhu, Krzysztof Chmielewski


Cheers

--
Scott Sandre
Software Engineer
Delta Ecosystem Team