Delta-sharing support for deletion vectors

Marcel Mossmann

Mar 20, 2024, 1:46:26 PM
to Delta Lake Users and Developers
Hello Everyone,
I have a few questions about delta-sharing and deletion vectors.
I'm currently running the Delta Sharing reference server v0.7 (delta-io/delta-sharing). When I try to read a table with deletion vectors created by Databricks > 14.x, I get an error about the Delta reader protocol version:

Delta protocol version (3,7) is too new for this version of Delta
Standalone Reader/Writer (1,2). Please upgrade to a newer release.

I read through the GitHub repositories and, from what I understand, these features are now supported through Delta Kernel (read-only for now).
From what I can tell from the delta-io/delta-sharing GitHub repository, Delta Kernel is not integrated yet, but it is on the roadmap.

My questions are:
Are deletion vectors really not supported in delta-sharing as of today, or am I mistaken?
If so, is there any forecast for when they will be supported?

Thank you!

Artem Zhukov

Mar 21, 2024, 2:51:24 AM
to Marcel Mossmann, Delta Lake Users and Developers

--
zhukovgreen,

Data Engineer @Paylocity
https://github.com/zhukovgreen

On Mar 20, 2024, at 18:22, Marcel Mossmann <marcelm...@gmail.com> wrote:

> Hello Everyone,

Hi Marcel,

> I have a few questions about delta-sharing and deletion vectors.
> Currently I'm using delta-sharing reference server implementation v0.7 (delta-io/delta-sharing) and receiving an error regarding Delta protocol version for reader when trying to read a table with deletion vectors created by Databricks > 14.x:
>
> Delta protocol version (3,7) is too new for this version of Delta
> Standalone Reader/Writer (1,2). Please upgrade to a newer release.

Can you try to upgrade the reader/writer versions [1]?

> I read a bit through the github and, from what I understand, these features are now supported through Delta Kernel (read only for now).
> From what I got from the delta-io/delta-sharing github repository, Delta Kernel is still not integrated yet, but it is on the roadmap.
>
> My questions are:
> Are deletion vectors really not supported as of today on delta-sharing or am I mistaken?
> If so, is there any forecast as to when this will be supported?

We tested this feature with Spark 3.5 and the following setup:
```
pyspark --packages 'io.delta:delta-sharing-spark_2.12:3.1.0' --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```
All works fine!
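
For anyone reading along, here is a minimal sketch of what the read itself might look like inside that PySpark session. The profile path and the share, schema, and table names are hypothetical placeholders, and the helper names are made up for illustration. The `responseFormat` option reflects my understanding of how the Delta Sharing Spark connector requests Delta-format responses (which reading deletion vectors relies on); treat it as an assumption and check it against the connector documentation for your version.

```python
def shared_table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address the connector loads."""
    return f"{profile_path}#{share}.{schema}.{table}"


def read_shared_table(spark, table_url):
    """Load a shared table as a DataFrame.

    `spark` is an existing SparkSession started with the packages and
    confs shown in the command above.
    """
    return (
        spark.read.format("deltaSharing")
        # Assumption: asks the server for Delta-format responses so that
        # tables with deletion vectors can be read.
        .option("responseFormat", "delta")
        .load(table_url)
    )


# Hypothetical usage inside the pyspark shell (placeholder names):
# url = shared_table_url("/path/to/profile.share", "my_share", "my_schema", "my_table")
# read_shared_table(spark, url).show()
```

Note that this only covers the client side. Whether it works still depends on the sharing server being able to serve the table.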

Thank you!

[1] https://docs.delta.io/latest/versioning.html#upgrading-protocol-versions

Marcel Mossmann

Mar 21, 2024, 7:21:25 PM
to Delta Lake Users and Developers
Hi zhukovgreen, thanks for the response!

From what I understand, the link you posted explains how to upgrade the protocol version on a table.
Unfortunately, our problem is the opposite: the table has a minReaderVersion of 3 (deletion vectors), while our "reader" only supports reader version 1.
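
To make the mismatch concrete, here is a rough sketch of the kind of compatibility check a reader performs. It is illustrative only and is not the actual Delta Standalone code: the `Protocol` class and `can_read` function are hypothetical names, and the rule used (the reader version must be high enough, and every reader feature listed on the table must be supported) is a simplified reading of the Delta protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Protocol requirements recorded on a table (simplified)."""
    min_reader_version: int
    min_writer_version: int
    reader_features: frozenset = frozenset()


def can_read(table, supported_reader_version, supported_features=frozenset()):
    """Return True if a client with the given capabilities may read the table."""
    # The client must implement at least the table's minimum reader version.
    if table.min_reader_version > supported_reader_version:
        return False
    # Every reader feature the table lists must also be understood by the client.
    return table.reader_features <= supported_features


# The table from the error message: protocol (3,7) with deletion vectors enabled.
table = Protocol(3, 7, frozenset({"deletionVectors"}))

# A server that only supports reader version 1 is rejected.
print(can_read(table, supported_reader_version=1))
# A reader that supports version 3 and the deletionVectors feature is accepted.
print(can_read(table, 3, frozenset({"deletionVectors"})))
```

Under this reading, nothing can be changed on the table side short of dropping the feature. The reader has to gain support for it.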

For more detail, the "reader" in this case is a Delta Sharing server that we manage ourselves, built on the Delta Sharing Reference Server (distributed by Delta in delta-io/delta-sharing).
Upgrading the "reader" therefore means upgrading the server itself. From what I read in the GitHub repo (delta-io/delta-sharing), the Delta Sharing Reference Server still doesn't support deletion vectors, so upgrading it now would be pointless.
I want to confirm whether that's correct. Since integrating Delta Kernel into the Delta Sharing Reference Server is on the Delta roadmap, I'd also like to know whether there is any forecast for it.

Again, thanks a lot for your response!

Lin Zhou

Mar 22, 2024, 2:45:05 PM
to Delta Lake Users and Developers
Hi Marcel,
Yes, integration between delta-kernel and the delta-sharing reference server is on the roadmap and is tentatively planned for Q2. Would that be a good timeline for you?

Artem Zhukov

Mar 26, 2024, 9:32:50 AM
to Marcel Mossmann, Delta Lake Users and Developers
That makes more sense now, thanks for explaining. We are just a Databricks customer, so the server side is hidden from us. Good luck with solving the problem.

--
zhukovgreen,

Data Engineer @Paylocity
https://github.com/zhukovgreen

Marcel Mossmann

Apr 4, 2024, 3:33:11 PM
to Delta Lake Users and Developers
Hi Lin,

Sorry about the delay in my reply.
Thanks a lot for the answer. We will review our timelines with this in mind.
