AnalysisException: Table or view not found for existing Delta table


Rajan Patki

unread,
May 12, 2021, 6:02:14 AM5/12/21
to Delta Lake Users and Developers
I am new to Delta Lake. I was trying a simple example:
  1. Create a DataFrame from a CSV using the regular Spark API
  2. Save it as a Delta table
  3. Read it again

It works fine. I can see the files are created in the default spark-warehouse folder.

But the next time I just want to read the saved table. So I comment out the code for the first two steps, re-run the program, and get:

AnalysisException: Table or view not found

val transHistory = spark.read
  .option("header", "true")
  .option("inferSchema", true)
  .csv(InputPath + "trainHistory.csv")

transHistory.write.format("delta").mode(SaveMode.Overwrite).saveAsTable("transactionshistory")

val transHistoryTable = spark.read.format("delta").table("transactionshistory")
transHistoryTable.show(10)

Am I doing something wrong? I am using Delta Lake 0.8.0, Spark 3.0, and Scala 2.12.13.

Denny Lee

unread,
May 12, 2021, 12:09:48 PM5/12/21
to Delta Lake Users and Developers
I'm wondering if there is an issue with how the metastore is configured. For the sake of argument, could you try:

txHistory.write.format("delta").save("/filepath/txHistory")
val txHistoryTbl = spark.read.format("delta").load("/filepath/txHistory")
txHistoryTbl.show(10) 

Rajan Patki

unread,
May 13, 2021, 2:07:03 AM5/13/21
to Delta Lake Users and Developers
Thanks, Denny, for your quick response.

I have already tried that. Reading with the file path works fine, but it fails when I try to load it as a table.

I tested a few more things yesterday. Using the Catalog API I listed the databases and tables. I see the default database, and when I create the table and read it (the code above) I see the table transactionshistory as a managed, permanent table. However, the next time, when I only try to read it (with the first two statements commented out), the default database no longer has the table. So this is the actual problem, but I don't know why it happens. I have set the config exactly as given in the quick start (a small sketch of the catalog check I mean is included after the config below):
val spark = SparkSession.builder()
.master("local")
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
.appName("Delta lake example")
.getOrCreate()
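
A minimal sketch of that catalog check, using the standard Spark Catalog API (not part of the original message; the tableExists call is just one convenient way to probe for the table):

// List what the current session's catalog can see, and check for the table directly.
spark.catalog.listDatabases().show()
spark.catalog.listTables("default").show()
println(spark.catalog.tableExists("default", "transactionshistory"))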

Jacek Laskowski

unread,
May 13, 2021, 7:06:05 AM5/13/21
to Rajan Patki, Delta Lake Users and Developers
Hi Rajan,

Care to share the whole application and how you run it? Is this a standalone app or spark-shell?

Please post the whole stack trace.

I see nothing obvious that I could leverage to help you.


Rajan Patki

unread,
May 14, 2021, 1:55:27 AM5/14/21
to Delta Lake Users and Developers

Hi Jacek,


Following is the complete code

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object TableReadWrite {
  val InputPath = "D:\\Work\\RND\\Spark\\examples\\RetailData\\"

  def main(args: Array[String]): Unit = {
      
    val spark = SparkSession.builder()
      .master("local")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .appName("Delta lake example")
      .getOrCreate()

    val transHistory = spark.read
      .option("header", "true")
      .option("inferSchema", true)
      .csv(InputPath + "trainHistory.csv")

    //transHistory.write.format("delta").saveAsTable("transactionshistory")
    val transTable = spark.table("transactionshistory")
    transTable.show(10)
  }
}

Following is the stack trace

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: transactionshistory;;
'UnresolvedRelation [transactionshistory]

    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:106)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:92)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:177)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:92)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:89)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:130)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:156)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:153)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:133)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:133)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
    at org.apache.spark.sql.SparkSession.table(SparkSession.scala:584)
    at org.apache.spark.sql.SparkSession.table(SparkSession.scala:580)
    at TableReadWrite$.main(TableReadWrite.scala:22)
    at TableReadWrite.main(TableReadWrite.scala)

Notice the commented-out statement:

//transHistory.write.format("delta").saveAsTable("transactionshistory")

If I uncomment this and remove the transactionshistory folder from the default Spark warehouse path, then it works fine.

Meaning that when I create the table, it is present in the catalog, but when I only read the existing table, the catalog does not have it. I have confirmed this with the Catalog APIs.

This is a very basic example, and I am sure I am making some silly mistake or there is a gap in my understanding, but I am not able to spot it.

Thanks for the help.

Regards,
Rajan

Jacek Laskowski

unread,
May 14, 2021, 4:25:06 AM5/14/21
to Rajan Patki, Delta Lake Users and Developers
Hi,

How do you run the app? spark-submit?

Adam Dec

unread,
May 14, 2021, 4:35:24 AM5/14/21
to Rajan Patki, Delta Lake Users and Developers
Hi, aren't you just missing the DB name?

Like: spark.table("DB_NAME.transactionshistory")
Otherwise it will look in the default Hive database.

/Adam

Mich Talebzadeh

unread,
May 14, 2021, 4:47:32 AM5/14/21
to Adam Dec, Rajan Patki, Delta Lake Users and Developers
I am not really that familiar with Delta, but it sounds like something basic, like a missing DB name or an incorrect jar file.

You can check whether the table was created OK with:

val DB = "test"
val tableName = "randomData"
val fullyQualifiedTableName =  DB + "."+ tableName
var rows = 0

if (spark.sql(s"SHOW TABLES IN ${DB} like '${tableName}'").count() == 1) {
  println(s"\n Table ${fullyQualifiedTableName} created OK")
} else {
  println(s"\n Table ${fullyQualifiedTableName} does not exist, quitting")
  sys.exit(1)
}

HTH

rohit haseja

unread,
May 14, 2021, 4:50:48 AM5/14/21
to Adam Dec, Rajan Patki, Delta Lake Users and Developers
Hey Rajan,

If you want to read from a table you should just use spark.table("table_name"); there is no need to mention the delta format.

You can refer to this link; it might help you with this. I don't see anything else in this statement that could impact it.

Thanks,
Rohit Haseja.

rohit haseja

unread,
May 14, 2021, 4:52:31 AM5/14/21
to Adam Dec, Rajan Patki, Delta Lake Users and Developers
Sorry for the delay, but here's the link for that article from Databricks.


Thanks,
Rohit Haseja.

Rajan Patki

unread,
May 14, 2021, 6:08:12 AM5/14/21
to Delta Lake Users and Developers
I am running it directly in IntelliJ.

Rajan Patki

unread,
May 14, 2021, 6:43:06 AM5/14/21
to Delta Lake Users and Developers

Hi All,
Thanks for the responses.

@adamdec85 & @hasejarohit: I am creating the table in the default database, so I had skipped the database name even while creating it, and I see that the database is created in the default Spark warehouse location, i.e. \spark_warehouse. I even tried giving an explicit database name, i.e. default.transactionhistory, but I still got the same error while reading the existing table.

@Jacek: I am running it directly from IntelliJ.

@Rohit Haseja: Yes, I am following the same link, and I am not using the delta format while reading the table.

I tried to check whether the database and the table are present in the catalog with the following APIs:

spark.catalog.listDatabases().show()
spark.catalog.listTables().show()

When the table is not there and I am creating a new one, I see that the database name is default and it has a table named transactionshistory, which is a managed table and not temporary. However, when I run the program only to read the existing table, the database name I get is still default, but the table is not there.

So the question is why Spark does not load the table.

Regards,
Rajan

Michael Zhang

unread,
May 14, 2021, 5:13:15 PM5/14/21
to Rajan Patki, Delta Lake Users and Developers
Hi Rajan,
   This is expected behaviour.
   All database and table metadata is stored in the Hive metastore.
   By default, Delta (through Spark) uses a built-in metastore inside the JVM. If you create a table in one job run and then re-run the job referring to the table created in the last run, you will get the table-not-found error, because the new run starts with a fresh built-in metastore.
   This is what you are experiencing right now.
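
   A minimal sketch of how one might confirm this from code (not part of the original reply; the config keys are standard Spark settings, and the "in-memory" default shown is an assumption about a stock local setup):

// "in-memory" means table metadata lives only in this JVM and will not survive a re-run.
val catalogImpl = spark.sparkContext.getConf.get("spark.sql.catalogImplementation", "in-memory")
println(s"catalogImplementation = $catalogImpl")
println(s"warehouse dir = ${spark.conf.get("spark.sql.warehouse.dir")}")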

   You can configure Delta to use an external Hive metastore, which is backed by a database (MySQL, Postgres, etc.).

   However, I cannot find any documentation about configuring an external Hive metastore for Delta.
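
   A minimal sketch of what such a configuration might look like, assuming a Thrift Hive metastore service is already running (the URI below is hypothetical, and this is not a tested, Delta-specific recipe):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Delta with an external Hive metastore")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("hive.metastore.uris", "thrift://localhost:9083") // hypothetical metastore URI
  .enableHiveSupport()
  .getOrCreate()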

   If you use Delta in the Databricks runtime, an external metastore is used and tables can be referenced globally.

   Please note: tables created in Delta derive from the Parquet files, which include the schema. The source of truth is the Parquet files.

   Given that, you can alternatively use save to write the DataFrame to Parquet files in Delta format, as Denny Lee suggested:

txHistory.write.format("delta").save("/filepath/txHistory")




--
Thanks
Heng Zhang

Rajan Patki

unread,
May 18, 2021, 1:58:17 AM5/18/21
to Delta Lake Users and Developers
Thanks for the reply.  

Yes, your suggestion works. I can create an external table on the existing data, so now I can move forward with my understanding.
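
For reference, a sketch of that "external table on existing data" approach, assuming the data was previously written with save() to a path like the one in Denny's example (the path is a placeholder):

// Register existing Delta data in the metastore as a table, then read it by name.
spark.sql(
  """CREATE TABLE IF NOT EXISTS transactionshistory
    |USING DELTA
    |LOCATION '/filepath/txHistory'""".stripMargin)

spark.table("transactionshistory").show(10)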

However, df.write.format("delta").saveAsTable(...) should create a managed table, and one should be able to read it with spark.table(...); both of these examples are mentioned in the Delta documentation: https://docs.delta.io/latest/delta-batch.html#create-a-table&language-scala

But this does not work: saveAsTable(...) does create the Delta table files (the _delta_log folder and data files), but I cannot read the table with spark.table() in another session.

Regards,
Rajan

Michael Zhang

unread,
May 18, 2021, 4:02:18 PM5/18/21
to Rajan Patki, Delta Lake Users and Developers
The documentation you referenced does not discuss the Hive metastore.
When you use saveAsTable(...), the table metadata still goes to the metastore. As long as you use the in-memory metastore, you won't be able to read the table across JVMs.

Heng



--
Thanks
Heng Zhang

Rajan Patki

unread,
May 19, 2021, 1:01:01 AM5/19/21
to Delta Lake Users and Developers
Yes. I understand. Thank you for the help.

Regards,
Rajan

Rajan Patki

unread,
Aug 14, 2021, 11:07:17 AM8/14/21
to Delta Lake Users and Developers
enableHiveSupport does the trick: a Delta table created in one session can now be accessed from another session.

Just wanted to mention it here because I could not find this in the documentation; it may help if someone else stumbles on the same issue.

Michael Zhang

unread,
Aug 14, 2021, 12:20:26 PM8/14/21
to Rajan Patki, Delta Lake Users and Developers
Hi Rajan,

do you mean like this?

def getSparkSession(): SparkSession = {
  SparkSession.builder()
    .appName("Example Spark Submit")
    .enableHiveSupport()
    .config("immuta.spark.acl.assume.not.privileged", "true")
    .config("spark.hadoop.immuta.databricks.config.update.service.enabled", "false")
    .getOrCreate()
}



--
Thanks
Heng Zhang

Rajan Patki

unread,
Aug 18, 2021, 5:22:13 AM8/18/21
to Delta Lake Users and Developers
Yes. I used the Delta configuration together with enableHiveSupport:
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .appName("Delta lake example")
  .enableHiveSupport()
  .getOrCreate()
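
With that configuration, a table created in an earlier run via saveAsTable("transactionshistory") can be resolved by name in a later run, for example (sketch):

// In a fresh session/run with the same enableHiveSupport() configuration
// (Spark keeps the metadata in a local Derby-backed metastore_db directory by default):
val transTable = spark.table("transactionshistory")
transTable.show(10)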