How to connect to Delta Lake without Spark?


rockssk

Feb 19, 2020, 5:37:11 PM
to Delta Lake Users and Developers
Team,

Can I connect to the Delta Lake store and access the tables/files natively via the Delta Lake API, without involving a Spark cluster?

We want to leverage an Azure Databricks cluster to perform data engineering tasks, store the resulting Delta files/tables in Azure Data Lake Store, and let downstream systems consume those Delta files/tables directly. Can the Delta Lake API help here?

Any pointers or guidance are much appreciated.


Regards
Suresh

Michael Armbrust

Feb 19, 2020, 5:47:56 PM
to rockssk, Delta Lake Users and Developers
We want to leverage an Azure Databricks cluster to perform data engineering tasks, store the resulting Delta files/tables in Azure Data Lake Store, and let downstream systems consume those Delta files/tables directly. Can the Delta Lake API help here?

We do use Spark under the covers, but Delta Lake is included natively on Azure Databricks clusters. You can learn more in the Azure docs.

Which other systems do you want to be able to read these files with? The protocol is also open source, so any system should be able to integrate. We are happy to work with other developers to help guide these integrations.
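To make the "open protocol" point concrete, here is a minimal sketch (not an official API) of how any system can discover a Delta table's current data files: replay the JSON commits under `_delta_log/` in version order, tracking `add` and `remove` actions. The table and file names below are synthetic for illustration; real tables also carry `protocol`, `metaData`, and checkpoint entries that a production reader must handle.

```python
# Illustrative sketch only: replay the Delta transaction log
# (_delta_log/*.json commits, per the open protocol) to find the set of
# active data files, without Spark. Real tables also include protocol,
# metaData, and checkpoint actions, which this toy reader ignores.
import json
import os
import tempfile

def active_files(table_path):
    """Replay _delta_log/*.json commits in version order and return
    the Parquet files that are currently part of the table."""
    log_dir = os.path.join(table_path, "_delta_log")
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue  # skip checkpoint (.parquet) and checksum files
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return sorted(files)

# Build a tiny synthetic table: commit 0 adds two files,
# commit 1 compacts them into one.
table = tempfile.mkdtemp()
log = os.path.join(table, "_delta_log")
os.makedirs(log)
with open(os.path.join(log, "00000000000000000000.json"), "w") as f:
    f.write(json.dumps({"add": {"path": "part-0000.parquet"}}) + "\n")
    f.write(json.dumps({"add": {"path": "part-0001.parquet"}}) + "\n")
with open(os.path.join(log, "00000000000000000001.json"), "w") as f:
    f.write(json.dumps({"remove": {"path": "part-0000.parquet"}}) + "\n")
    f.write(json.dumps({"remove": {"path": "part-0001.parquet"}}) + "\n")
    f.write(json.dumps({"add": {"path": "part-0002.parquet"}}) + "\n")

print(active_files(table))  # only the compacted file remains
```

A downstream consumer (e.g. an Azure Function) would then read just the listed Parquet files, getting a consistent snapshot even while writers append new commits.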

rockssk

Feb 19, 2020, 5:57:08 PM
to Delta Lake Users and Developers
Thanks, Michael, for your prompt response. A sample case would be shutting down the Databricks cluster once the Delta Lake files are written, and then, at any time of day, using Azure Functions or Azure Data Factory to consume the Delta files natively, without involving a Spark/Databricks cluster. If you have a sample integration example, that would be a great starting point.

rockssk

Feb 20, 2020, 9:56:45 AM
to Delta Lake Users and Developers
In addition, since Delta Lake is open source, can we leverage the benefits of this storage format by interacting directly with the Delta APIs, without a Spark cluster at all?

shiche...@gmail.com

Nov 2, 2020, 9:15:41 PM
to Delta Lake Users and Developers
As of now, is there still no direct API to manipulate the underlying files without Spark?

Shixiong(Ryan) Zhu

Nov 2, 2020, 9:23:32 PM
to shiche...@gmail.com, Delta Lake Users and Developers
We created a Delta Standalone Reader project and merged it recently; see https://github.com/delta-io/connectors/pull/51. It provides basic APIs to access Delta tables without Spark. Please take a look if you are interested, and feel free to file issues at https://github.com/delta-io/connectors if you have any suggestions for these APIs.

Best Regards,

Ryan


--
You received this message because you are subscribed to the Google Groups "Delta Lake Users and Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to delta-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/delta-users/a1f4dcf6-efb4-49df-b03b-cc829553ae55n%40googlegroups.com.

Joao Pedro Afonso Cerqueira

Nov 3, 2020, 4:25:38 AM
to shiche...@gmail.com, Delta Lake Users and Developers
You should try other formats if you're interested in manipulating Parquet with version control in blob storage.

There is Apache Iceberg (https://iceberg.apache.org/), which you can connect to from frameworks other than Spark, and which supports schema evolution, time travel, partition layout evolution, and hidden partitioning.

It is maintained by the Netflix and Cloudera folks in an open-source manner, without locking you into a specific framework like Spark.

 --
Best Regards,

   João P. A. Cerqueira
       +44(0)7572550311 | jo...@fuelbigdata.com   | jpacerqueira.c...@gmail.com | https://fuelbigdata.com 



Chris Hoshino-Fish

Nov 3, 2020, 3:12:47 PM
to Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
Hi Joao,

Just a reminder, this email list is for discussing Delta Lake functionality, not other projects. Iceberg is certainly similar to Delta and has some of the same features, but Iceberg is behind Delta in terms of feature availability and maturity, for instance UPDATE and MERGE support with standard APIs.

I'm a bit confused about the "locking you in to Spark" comment - Spark is the best open source ETL software available right now, so I would personally say that Delta has made a good choice there. Delta's goal is to be the best storage format for data in public clouds, so it will certainly evolve over time, hence the standalone Rust client - https://github.com/delta-io/delta.rs - which will evolve to include writes. We'd also be happy to work with members of other data processing frameworks, like Apache Flink, to build Delta writers. Frameworks like Hive or Presto don't necessarily make as much sense - why use those for ETL when you could use Spark?

Please keep comments in this email list related to Delta only. Thanks!

-Chris



--

Chris Hoshino-Fish

Sr. Solutions Architect

Databricks Inc.

fi...@databricks.com

(415) 610-8520

Mushtaq Ahmed

Nov 3, 2020, 8:43:24 PM
to Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
On Wed, Nov 4, 2020 at 1:42 AM Chris Hoshino-Fish <fi...@databricks.com> wrote:
hence the standalone Rust client - https://github.com/delta-io/delta.rs which will evolve to include writes.
 
It is good to know that the Rust client will evolve to include writes.
Is there a plan to add write support to the JVM standalone connector? The FAQ section says:

Can I write to a Delta table using this connector?

No. The connector doesn't support writing to a Delta table.

It would be great if write support could be added to the JVM connector itself.
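For context on why write support is the harder half: at the protocol level, a write commits version N+1 by creating `_delta_log/<N+1>.json` atomically (put-if-absent), so concurrent writers can detect conflicts and retry. The sketch below is illustrative only, not any connector's API; it uses the local-filesystem `O_CREAT | O_EXCL` trick, whereas object stores need a LogStore with equivalent mutual-exclusion guarantees, which is exactly what a standalone writer has to provide.

```python
# Illustrative sketch only: what a Delta "write" amounts to at the
# protocol level. A writer commits version N+1 by creating
# _delta_log/<20-digit N+1>.json with put-if-absent semantics; if the
# file already exists, another writer won and this one must re-read the
# log and retry. O_EXCL gives that guarantee on a local filesystem only.
import json
import os
import tempfile

def try_commit(table_path, version, actions):
    """Attempt to commit `actions` as `version`. Returns True on success,
    False if that version was already committed by a concurrent writer."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    commit_file = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_CREAT | O_EXCL fails if the commit file already exists,
        # which is what makes the commit atomic here.
        fd = os.open(commit_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True

table = tempfile.mkdtemp()
ok = try_commit(table, 0, [{"add": {"path": "part-0000.parquet"}}])
conflict = try_commit(table, 0, [{"add": {"path": "part-0001.parquet"}}])
print(ok, conflict)  # the first commit succeeds; the second loses the race
```

The losing writer would typically re-read the log, check that its changes don't conflict with the winning commit, and retry at the next version number.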
 

Denny Lee

Nov 3, 2020, 8:51:02 PM
to Mushtaq Ahmed, Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
The Rust API is maintained by the community at large and was started by the folks at Scribd. Chime in on the repo - provide feedback and create issues so we can work together to get more features in, eh?!


lec ssmi

Nov 3, 2020, 9:13:51 PM
to Denny Lee, Mushtaq Ahmed, Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, Delta Lake Users and Developers
Hope that Delta can have an independent read/write API, so that users can read and write data through ordinary Java programs. Users could also provide their own custom implementations for some connectors.

Denny Lee <denn...@databricks.com> wrote on Wed, Nov 4, 2020, 9:51 AM:

QP Hou

Nov 4, 2020, 1:48:42 AM
to Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
As Chris already mentioned, Delta Lake's design [0] doesn't have any Spark-specific lock-in. That's why we are able to build a Rust client that can interact with Delta tables natively, without the JVM [1]. The native client can easily be wrapped by other languages if needed, including Java. For example, we have already created PoC Python and Ruby clients based on the Rust core [2][3].

Building the first full read/write implementation of Delta Lake on Spark also makes a lot of sense, because Spark is the best data lake ETL framework out there. But that doesn't mean the Delta format can only be accessed or managed by Spark. The community is growing, and I expect to see more alternative implementations created as adoption increases.

[0]: https://github.com/delta-io/delta/blob/master/PROTOCOL.md
[1]: https://github.com/delta-io/delta.rs
[2]: https://github.com/delta-io/delta.rs/tree/main/python
[3]: https://github.com/delta-io/delta.rs/tree/main/ruby


Thanks,
QP Hou

