--
You received this message because you are subscribed to the Google Groups "Delta Lake Users and Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to delta-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/delta-users/CAE4dWq9g90NkUr_SLs2J6kFPbOpxx4wy6MEgb%3DQ5pBxkUcK%2B-A%40mail.gmail.com.
Hi

We are using Delta features. The only problem we have faced so far is that Hive cannot read Delta output by itself (even if the Hive metastore is shared). However, if we create a Hive external table pointing to the folder (and run VACUUM), it can read the data. Other than that, the feature looks good and well thought out. We are doing volume testing now.

Best
Ayan
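A minimal sketch of that external-table workaround, assuming a Delta table already written to a WASB folder (the table name, columns, and path below are hypothetical placeholders):

```sql
-- Hypothetical names and path. After VACUUM, only the files of the
-- current Delta version remain in the folder, so Hive can read them
-- as plain Parquet without understanding the _delta_log.
CREATE EXTERNAL TABLE my_delta_snapshot (
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION 'wasb://container@account.blob.core.windows.net/path/to/delta-table';
```

Without the VACUUM step, Hive would also pick up Parquet files belonging to older Delta versions, since it knows nothing about the transaction log.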
--
Best Regards,
Ayan Guha
Is there a plan to have a business catalog component for the Data Lake? If not, how would someone make a proposal to create an open source project related to that? I would be interested in building out an open source data catalog that would use the Hive metastore as a baseline for technical metadata.
Hi

We used spark.sql to create a table using Delta. We also have a Hive metastore attached to the Spark session, so a table gets created in the Hive metastore. We then tried to query the table from Hive and faced the following issues:
- The SerDe is SequenceFile when it should have been Parquet.
- The schema fields are not passed through.
Essentially, the Hive DDL looks like:

CREATE TABLE `TABLE NAME`(
  `col` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'path'='WASB PATH')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'WASB PATH'
TBLPROPERTIES (
  'spark.sql.create.version'='2.4.0',
  'spark.sql.sources.provider'='DELTA',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
  'transient_lastDdlTime'='1556544657')
Is this expected? And will the use case be supported in future releases?
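For context, a sketch of how such a table might be created through Spark SQL (table name, columns, and path are placeholders, not the actual ones used):

```sql
-- Spark SQL. This registers the table in the shared Hive metastore,
-- but the metastore entry carries Delta-specific metadata that plain
-- Hive cannot interpret, hence the SequenceFile SerDe and empty schema.
CREATE TABLE my_table (
  id BIGINT,
  name STRING
)
USING DELTA
LOCATION 'wasb://container@account.blob.core.windows.net/path/to/my_table';
```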
We are now experimenting
Best
Ayan
Thanks for the confirmation. We are using the workaround of creating a separate Hive external table STORED AS PARQUET pointing at the exact location of the Delta table. Our use case is batch-driven, and we run VACUUM with 0 retention after every batch completes. Do you see any potential problems with this workaround, other than that the table can return incorrect results while a batch is running?
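A sketch of the per-batch VACUUM step described above (the table name is a placeholder; note that Delta refuses a retention period below the default 7 days unless the safety check is disabled):

```sql
-- Required to allow a retention interval shorter than the default
-- 7 days; otherwise VACUUM RETAIN 0 HOURS fails the retention check.
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

-- Remove every file not referenced by the current table version, so
-- the Hive external table on the same folder sees only current data.
VACUUM my_table RETAIN 0 HOURS;
```

The trade-off is that 0 retention destroys the ability to time travel to, or safely read, any earlier version, which is why the table can be inconsistent while a batch is mid-flight.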