Delta Standalone not reading delta tables in ADLS gen2

Soumitra Prasad

Sep 6, 2022, 11:39:24 AM
to Delta Lake Users and Developers
Hi All,

Requesting some help here.

I am trying to connect to ADLS Gen2 using Delta Standalone (Java). I am using a service principal to connect, but I am not able to read any data or the schema.

I have attached screenshots of the code, configuration, output, and dependency added. Kindly help!

Thanks!
Screenshot 2022-09-06 210248.jpg
Screenshot 2022-09-06 205255.jpg
Screenshot 2022-09-06 210814.jpg
Screenshot 2022-09-06 205253.jpg
Screenshot 2022-09-06 210439.jpg
Screenshot 2022-09-06 205254.jpg

Shixiong(Ryan) Zhu

Sep 6, 2022, 1:05:28 PM
to Soumitra Prasad, Delta Lake Users and Developers
Could you double-check the table path? It looks like the table path contains no `_delta_log` directory.

Best Regards,

Ryan



Soumitra Prasad

Sep 6, 2022, 1:14:02 PM
to Delta Lake Users and Developers
Hi Ryan,

Thanks for the reply.

The `_delta_log` directory is there; see the attached screenshot. I am confused about why it is not able to connect, even though the service principal attached to the Azure ADLS Gen2 account has the Blob Contributor role.

Thanks,

Soumit

Screenshot 2022-09-06 224321.jpg

Shixiong(Ryan) Zhu

Sep 6, 2022, 1:21:25 PM
to Soumitra Prasad, Delta Lake Users and Developers
Have you tried other roles, such as Storage Blob Data Owner? I'm not familiar with this part. Are you able to talk to Azure support? This doesn't look like a Delta Lake issue. To debug whether this is a Delta Lake issue or an Azure Hadoop connector issue, could you try to access the location using the Hadoop FileSystem APIs directly, such as:

Configuration conf = ...;  // your Hadoop configuration with the ADLS credentials
Path path = new Path(...); // the table path
System.out.println(path.getFileSystem(conf).listStatus(path).length);

Best Regards,

Ryan


Soumitra Prasad

Sep 6, 2022, 3:28:32 PM
to Delta Lake Users and Developers
Hi Ryan,

It is solved now. Thanks for the support. The issue was that the storage account URI pointing to the Delta table directory in the Gen2 account was wrong earlier. To work with the Hadoop FileSystem API it must be in the format below. This thing was missing on the Databricks Samples; you can see more details in the Apache Hadoop docs: Hadoop Azure Support: ABFS — Azure Data Lake Storage Gen2

"abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-delta-table>"

I have attached the working code here. Thank you again!
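For anyone landing on this thread later, here is a minimal sketch of what the working setup looks like, assuming the delta-standalone, hadoop-client, and hadoop-azure jars are on the classpath. All account, container, tenant, and credential values are placeholders; the OAuth property names are the standard ABFS service-principal settings from the Hadoop Azure documentation, not copied from the attached App.java:

```java
import org.apache.hadoop.conf.Configuration;

import io.delta.standalone.DeltaLog;
import io.delta.standalone.Snapshot;

public class DeltaAdlsRead {
    public static void main(String[] args) {
        // Standard ABFS OAuth (service principal) settings; all values are placeholders.
        Configuration conf = new Configuration();
        conf.set("fs.azure.account.auth.type", "OAuth");
        conf.set("fs.azure.account.oauth.provider.type",
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        conf.set("fs.azure.account.oauth2.client.id", "<application-id>");
        conf.set("fs.azure.account.oauth2.client.secret", "<client-secret>");
        conf.set("fs.azure.account.oauth2.client.endpoint",
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token");

        // The URI format that made it work: container first, then the account's
        // *.dfs.core.windows.net endpoint, then the path to the table.
        String tablePath =
                "abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-delta-table>";

        // Read the latest snapshot's version and schema via Delta Standalone.
        DeltaLog log = DeltaLog.forTable(conf, tablePath);
        Snapshot snapshot = log.snapshot();
        System.out.println("Version: " + snapshot.getVersion());
        System.out.println(snapshot.getMetadata().getSchema().getTreeString());
    }
}
```

This is a connectivity sketch, not a drop-in program: it only runs against a real ADLS Gen2 account with a service principal that has data-plane access.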

Best Regards,
Soumit

Screenshot 2022-09-07 005408.jpg
App.java

Shixiong(Ryan) Zhu

Sep 6, 2022, 3:35:59 PM
to Soumitra Prasad, Delta Lake Users and Developers
Cool!

> This thing was missing on the Databricks Samples

Do you have a link to this? Just trying to see if we can improve that.

Best Regards,

Ryan


Soumitra Prasad

Sep 6, 2022, 3:39:35 PM
to Delta Lake Users and Developers

Shixiong(Ryan) Zhu

Sep 6, 2022, 3:47:33 PM
to Soumitra Prasad, Delta Lake Users and Developers
Thanks! We will try to improve it, probably by adding a few path examples for the different cloud storage providers.

Best Regards,

Ryan


Blanca Rojo Martín

Jun 2, 2023, 3:58:12 PM
to Delta Lake Users and Developers
Hi everyone!

I am finding this thread super useful. I am basically following the same process: I have written a Spring Boot app and import the same three jars in my pom. hadoop-client uses reflection to find the right FileSystem for the abfs scheme, but that FileSystem lives in the hadoop-azure jar. I have tried loads of combinations and consistently get the same error: "No FileSystem for scheme abfs".
This happens even when I use Soumitra's code above and uncomment line 56 onwards (the Hadoop FileSystem API connectivity test).

Did anyone else have this issue?

Best wishes,

Blanca.

Blanca Rojo Martín

Jun 13, 2023, 12:38:36 PM
to Delta Lake Users and Developers
I found a solution to my problem - posting here just in case anyone gets stuck in the same place:

You need to add the following to your Hadoop config:

conf.set("fs.abfss.impl", "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem");
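For reference, the plain abfs scheme has its own mapping as well. A sketch covering both schemes, with the class names as they appear in the hadoop-azure jar (explicit mappings like these can help when a shaded/uber jar drops the service-loader metadata that normally registers the FileSystem implementations):

```java
// Map each ABFS URI scheme to its FileSystem class from hadoop-azure.
conf.set("fs.abfs.impl", "org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem");
conf.set("fs.abfss.impl", "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem");
```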

Happy coding!

Blanca.
