How to connect to Delta Lake without Spark?


rockssk

Feb 19, 2020, 5:37:11 PM
to Delta Lake Users and Developers
Team,

Can I connect to the Delta Lake store and access the tables/files natively via the Delta Lake API, without involving a Spark cluster?

We want to leverage an Azure Databricks cluster to perform data engineering tasks, store the resulting Delta files/tables in Azure Data Lake Store, and let downstream systems consume those Delta files/tables directly. Can the Delta Lake API help here?

Any pointers or guidance are much appreciated.


Regards
Suresh

Michael Armbrust

Feb 19, 2020, 5:47:56 PM
to rockssk, Delta Lake Users and Developers
We want to leverage an Azure Databricks cluster to perform data engineering tasks, store the resulting Delta files/tables in Azure Data Lake Store, and let downstream systems consume those Delta files/tables directly. Can the Delta Lake API help here?

We do use Spark under the covers, but Delta Lake is included natively on Azure Databricks clusters. You can learn more in the Azure docs.

Which other systems do you want to be able to read these files with? The protocol is also open source, so any system should be able to integrate. We are happy to work with other developers to help guide these integrations.
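To make the "open protocol" point concrete, here is a minimal sketch (not an official API) of how any system can discover a Delta table's current data files: replay the JSON commits under `_delta_log/` in version order, tracking `add` and `remove` actions. The table and file names below are synthetic for illustration; real tables also carry `protocol`, `metaData`, and checkpoint entries that a production reader must handle.

```python
# Illustrative sketch only: replay the Delta transaction log
# (_delta_log/*.json commits, per the open protocol) to find the set of
# active data files, without Spark. Real tables also include protocol,
# metaData, and checkpoint actions, which this toy reader ignores.
import json
import os
import tempfile

def active_files(table_path):
    """Replay _delta_log/*.json commits in version order and return
    the Parquet files that are currently part of the table."""
    log_dir = os.path.join(table_path, "_delta_log")
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue  # skip checkpoint (.parquet) and checksum files
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return sorted(files)

# Build a tiny synthetic table: commit 0 adds two files,
# commit 1 compacts them into one.
table = tempfile.mkdtemp()
log = os.path.join(table, "_delta_log")
os.makedirs(log)
with open(os.path.join(log, "00000000000000000000.json"), "w") as f:
    f.write(json.dumps({"add": {"path": "part-0000.parquet"}}) + "\n")
    f.write(json.dumps({"add": {"path": "part-0001.parquet"}}) + "\n")
with open(os.path.join(log, "00000000000000000001.json"), "w") as f:
    f.write(json.dumps({"remove": {"path": "part-0000.parquet"}}) + "\n")
    f.write(json.dumps({"remove": {"path": "part-0001.parquet"}}) + "\n")
    f.write(json.dumps({"add": {"path": "part-0002.parquet"}}) + "\n")

print(active_files(table))  # only the compacted file remains
```

A downstream consumer (e.g. an Azure Function) would then read just the listed Parquet files, getting a consistent snapshot even while writers append new commits.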

rockssk

Feb 19, 2020, 5:57:08 PM
to Delta Lake Users and Developers
Thanks, Michael, for your prompt response. A sample case would be shutting down the Databricks cluster once the Delta Lake files are written, and then, at any time of day, using Azure Functions or Azure Data Factory to consume the Delta files natively, without involving a Spark/Databricks cluster. If you have a sample integration example, that would be a great starting point.

rockssk

Feb 20, 2020, 9:56:45 AM
to Delta Lake Users and Developers
In addition, since Delta Lake is open source, can we leverage the benefits of this storage format by interacting directly with the Delta APIs, without a Spark cluster at all?

shiche...@gmail.com

Nov 2, 2020, 9:15:41 PM
to Delta Lake Users and Developers
As of now, is there still no direct API to manipulate the underlying files without Spark?

Shixiong(Ryan) Zhu

Nov 2, 2020, 9:23:32 PM
to shiche...@gmail.com, Delta Lake Users and Developers
We created a Delta Standalone Reader project and merged it recently; see https://github.com/delta-io/connectors/pull/51. It provides basic APIs to access Delta tables without Spark. Please take a look if you are interested, and feel free to file issues at https://github.com/delta-io/connectors if you have any suggestions for these APIs.

Best Regards,

Ryan


--
You received this message because you are subscribed to the Google Groups "Delta Lake Users and Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to delta-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/delta-users/a1f4dcf6-efb4-49df-b03b-cc829553ae55n%40googlegroups.com.

Joao Pedro Afonso Cerqueira

Nov 3, 2020, 4:25:38 AM
to shiche...@gmail.com, Delta Lake Users and Developers
You should try other formats if you're interested in manipulating Parquet with version control in blob storage.

There is Apache Iceberg (https://iceberg.apache.org/), which you can connect to from frameworks other than Spark, and which supports schema evolution, time travel, partition layout evolution, and hidden partitioning.

It is maintained by the Netflix and Cloudera folks in an open-source manner, without locking you into a specific framework like Spark.

 --
Best Regards,

   João P. A. Cerqueira
       +44(0)7572550311 | jo...@fuelbigdata.com   | jpacerqueira.c...@gmail.com | https://fuelbigdata.com 



Chris Hoshino-Fish

Nov 3, 2020, 3:12:47 PM
to Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
Hi Joao,

Just a reminder, this email list is for discussing Delta Lake functionality, not other projects. Iceberg is certainly similar to Delta and has some of the same features, but Iceberg is behind Delta in terms of feature availability and maturity, for instance UPDATE and MERGE support with standard APIs.

I'm a bit confused about the "locking you in to Spark" comment - Spark is the best open source ETL software available right now, so I would personally say that Delta has made a good choice there. Delta's goal is to be the best storage format for data in public clouds, so it will certainly evolve over time, hence the standalone Rust client - https://github.com/delta-io/delta.rs - which will evolve to include writes. We'd also be happy to work with members of other data processing frameworks, like Apache Flink, to build Delta writers. Frameworks like Hive or Presto don't necessarily make as much sense - why use those for ETL when you could use Spark?

Please keep comments in this email list related to Delta only. Thanks!

-Chris



--

Chris Hoshino-Fish

Sr. Solutions Architect

Databricks Inc.

fi...@databricks.com

(415) 610-8520

Mushtaq Ahmed

Nov 3, 2020, 8:43:24 PM
to Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
On Wed, Nov 4, 2020 at 1:42 AM Chris Hoshino-Fish <fi...@databricks.com> wrote:
hence the standalone Rust client - https://github.com/delta-io/delta.rs which will evolve to include writes.
 
It is good to know that the Rust client will evolve to include writes.
Is there a plan to add write support to the JVM standalone connector? The FAQ section says:

Can I write to a Delta table using this connector?

No. The connector doesn't support writing to a Delta table.

It would be great if write support could be added to the JVM connector itself.
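For context on why write support is the harder half: at the protocol level, a write commits version N+1 by creating `_delta_log/<N+1>.json` atomically (put-if-absent), so concurrent writers can detect conflicts and retry. The sketch below is illustrative only, not any connector's API; it uses the local-filesystem `O_CREAT | O_EXCL` trick, whereas object stores need a LogStore with equivalent mutual-exclusion guarantees, which is exactly what a standalone writer has to provide.

```python
# Illustrative sketch only: what a Delta "write" amounts to at the
# protocol level. A writer commits version N+1 by creating
# _delta_log/<20-digit N+1>.json with put-if-absent semantics; if the
# file already exists, another writer won and this one must re-read the
# log and retry. O_EXCL gives that guarantee on a local filesystem only.
import json
import os
import tempfile

def try_commit(table_path, version, actions):
    """Attempt to commit `actions` as `version`. Returns True on success,
    False if that version was already committed by a concurrent writer."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    commit_file = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_CREAT | O_EXCL fails if the commit file already exists,
        # which is what makes the commit atomic here.
        fd = os.open(commit_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True

table = tempfile.mkdtemp()
ok = try_commit(table, 0, [{"add": {"path": "part-0000.parquet"}}])
conflict = try_commit(table, 0, [{"add": {"path": "part-0001.parquet"}}])
print(ok, conflict)  # the first commit succeeds; the second loses the race
```

The losing writer would typically re-read the log, check that its changes don't conflict with the winning commit, and retry at the next version number.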
 

Denny Lee

Nov 3, 2020, 8:51:02 PM
to Mushtaq Ahmed, Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
The Rust API is maintained by the community at large and was started by the folks at Scribd. Chime in on the repo - provide feedback and create issues so we can work together to get more features in, eh?!


lec ssmi

Nov 3, 2020, 9:13:51 PM
to Denny Lee, Mushtaq Ahmed, Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, Delta Lake Users and Developers
Hope that Delta can have an independent read/write API, so that users can read and write data through ordinary Java programs. Users could also provide their own custom implementations for some connectors.

Denny Lee <denn...@databricks.com> wrote on Wed, Nov 4, 2020, 9:51 AM:

QP Hou

Nov 4, 2020, 1:48:42 AM
to Chris Hoshino-Fish, Joao Pedro Afonso Cerqueira, shiche...@gmail.com, Delta Lake Users and Developers
As Chris already mentioned, Delta Lake's design [0] doesn't have any Spark-specific lock-in. That's why we are able to build a Rust client that can interact with Delta tables natively, without the JVM [1]. The native client can easily be wrapped by other languages if needed, including Java. For example, we have already created PoC Python and Ruby clients based on the Rust core [2][3].

Building the first full read/write implementation of Delta Lake on Spark also makes a lot of sense, because Spark is the best data lake ETL framework out there. But that doesn't mean the Delta format can only be accessed or managed by Spark. The community is growing, and I expect to see more alternative implementations created as adoption increases.

[0]: https://github.com/delta-io/delta/blob/master/PROTOCOL.md
[1]: https://github.com/delta-io/delta.rs
[2]: https://github.com/delta-io/delta.rs/tree/main/python
[3]: https://github.com/delta-io/delta.rs/tree/main/ruby


Thanks,
QP Hou

