Announcing Delta Lake 0.3.0

Tathagata Das

Aug 1, 2019, 9:44:58 PM
to Delta Lake Users and Developers, user
Hello everyone, 

We are excited to announce the availability of Delta Lake 0.3.0, which introduces new programmatic APIs for manipulating and managing data in Delta Lake tables.

Here are the main features: 


  • Scala/Java APIs for DML commands - You can now modify data in Delta Lake tables using programmatic APIs for Delete, Update, and Merge. These APIs mirror the syntax and semantics of their corresponding SQL commands and are great for many workloads, e.g., Slowly Changing Dimension (SCD) operations, merging change data for replication, and upserts from streaming queries. See the documentation for more details; a short sketch follows this list.


  • Scala/Java APIs for querying commit history - You can now query a table's commit history to see what operations modified it. This enables you to audit data changes, run time travel queries against specific versions, and debug or recover data after accidental deletions. See the documentation for more details; a sketch follows this list.


  • Scala/Java APIs for vacuuming old files - Delta Lake uses MVCC to enable snapshot isolation and time travel. However, keeping all versions of a table forever can be prohibitively expensive. Stale snapshots (as well as other uncommitted files from aborted transactions) can be garbage-collected by vacuuming the table. See the documentation for more details; a sketch follows this list.
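
A few hedged Scala sketches of these APIs follow. The names used (the table path /tmp/events, the columns eventId, data, date, and eventType, the source DataFrame updatesDF, and a SparkSession in scope as spark) are illustrative assumptions, not part of the release. First, the DML APIs:

import io.delta.tables._

// Attach to an existing Delta table by path (path is an assumption)
val deltaTable = DeltaTable.forPath(spark, "/tmp/events")

// Delete rows matching a SQL predicate
deltaTable.delete("date < '2019-01-01'")

// Update matching rows in place, using SQL expression strings
deltaTable.updateExpr(
  "eventType = 'clck'",
  Map("eventType" -> "'click'"))

// Upsert: update matched rows, insert unmatched ones
deltaTable
  .as("events")
  .merge(updatesDF.as("updates"), "events.eventId = updates.eventId")
  .whenMatched
  .updateExpr(Map("data" -> "updates.data"))
  .whenNotMatched
  .insertExpr(Map("eventId" -> "updates.eventId", "data" -> "updates.data"))
  .execute()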
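
The commit history API on the same hypothetical table; version, timestamp, and operation are among the columns the returned DataFrame exposes:

// Full commit history as a DataFrame
val fullHistoryDF = deltaTable.history()

// Only the most recent commit
val lastOperationDF = deltaTable.history(1)

fullHistoryDF.select("version", "timestamp", "operation").show()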
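
And vacuuming, again as a sketch; the 240-hour retention window below is an illustrative choice:

// Remove files no longer referenced by the table and older than the
// default retention threshold (7 days)
deltaTable.vacuum()

// Or pass an explicit retention period in hours
deltaTable.vacuum(240)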


To try out Delta Lake 0.3.0, please follow the Delta Lake Quickstart: https://docs.delta.io/0.3.0/quick-start.html

To view the release notes: https://github.com/delta-io/delta/releases/tag/v0.3.0

We would like to thank all the community members for contributing to this release.

TD

Gourav Sengupta

Aug 2, 2019, 12:42:39 PM
to Tathagata Das, Delta Lake Users and Developers, user
Yahoooo!!!!!

celebrations, wine, cocktails, parties, dances tonight :) 

Regards,
Gourav


Kinjarapu, Pratap Ramana

Aug 5, 2019, 11:42:04 AM
to Tathagata Das, Delta Lake Users and Developers, user

This is indeed one of the most useful features. Thank you, everyone, for making this available.

 

Thanks,

Pratap


Hari Kodali

Jan 24, 2020, 10:32:49 PM
to Tathagata Das, Delta Lake Users and Developers, user
I have the code below for a merge update. When I query the DataFrame after the merge, I still see duplicate records based on the "id" column. What is wrong here?
val data: Map[String, String] = resultdf.columns
  .map(mcol => s"leads.${mcol}" -> s"updates.${mcol}").toMap

deltaTable
  .as("leads")
  .merge(
    resultdf.as("updates"),
    "leads.id = updates.id"
  )
  .whenMatched("leads.id = updates.id")
  .updateExpr(data)
  //.updateAll()
  .whenNotMatched()
  .insertAll()
  .execute()
Can anyone help? Neither updateExpr nor updateAll works; instead, I see duplicate records when I run the code below.
val leadsDF = spark.read.format("delta")
  .parquet("/Users/HariKodali/tip/stage0/marketo/delta/leads")
println(s"total count: ${leadsDF.count()}")
leadsDF.groupBy("id").count()
  .filter($"count" > 1)
  .show()
Thanks

Hari Kodali

BIGDATA Solutions Architect

