Regarding bulk insert support in Iceberg, I saw your question on the Iceberg Slack too.
I don't remember Iceberg having bulk insert support (similar to Hudi's); let's continue tracking that in the Iceberg Slack.
That being said, in Nessie
we can create a temp_branch from the original_branch, do multiple normal insert operations in temp_branch,
and merge them back into the original_branch as a single atomic commit (just a metadata operation).
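For concreteness, a rough sketch of that flow with the Nessie Spark SQL extensions enabled (temp_branch is just a placeholder name, and the exact statement syntax may differ between Nessie versions):

s.sqlContext().sql("CREATE BRANCH IF NOT EXISTS temp_branch IN nessie FROM main");
s.sqlContext().sql("USE REFERENCE temp_branch IN nessie");
// ... run the normal insert statements against the nessie catalog here ...
s.sqlContext().sql("MERGE BRANCH temp_branch INTO main IN nessie");
s.sqlContext().sql("DROP BRANCH temp_branch IN nessie");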
Oh, super! But will this avoid snapshots in the temp_branch? This is critical.
Presently we write Parquet files and operate directly on those. Another approach we are considering is using the Nessie API (by way of Table > Transaction > Append > Commit); however, this is going slowly since there aren't many decent samples or much documentation around :) ... But that's okay, part of S/W development.
Quick question here too: would you have any idea whether Transaction.newAppend() >> DataFile >> Commit will use the same file, or create a copy of the data in that file so we can delete the temp file?
NessieCatalog d = new NessieCatalog();
d.setConf(s.sparkContext().hadoopConfiguration());
d.initialize("nessie", ImmutableMap.of(
        "ref", "main",
        "uri", "http://localhost:19120/api/v1",
        "warehouse", "hdfs://voyager:9000/DI/warehouse"));

String tblName = "testtable8";
s.sqlContext().sql("create table nessie." + tblName + " (col0 string, col1 string, col2 string, col3 string, col4 string) " +
        "using iceberg location 'hdfs://voyager:9000/DI/warehouse/mylocation/" + tblName + "'");

TableIdentifier tt = TableIdentifier.parse(tblName);
Table t = d.loadTable(tt);

Transaction transaction = t.newTransaction();
AppendFiles files = transaction.newAppend();
// Tried this approach #1 instead of the transaction:
// AppendFiles files = t.newAppend();

String fileLocation = "hdfs://voyager:9000/DI/warehouse/" + tblName;
int numRecords = writeDummyData("temp_" + tblName);

DataFile file = DataFiles.builder(t.spec())
        .withRecordCount(numRecords)
        .withPath(fileLocation)
        .withFormat(FileFormat.PARQUET)
        .withFileSizeInBytes(Files.localInput(fileLocation).getLength())
        .build();

// Approach #1, did not work: select count(*) did not show numRecords
// files.appendFile(file).commit();

files.appendFile(file);
transaction.commitTransaction(); // Raises java.lang.IllegalStateException: "Cannot commit transaction: last operation has not committed"

s.sqlContext().sql("select count(*) from nessie." + tblName).show();
// Approach #1: plain append without a transaction, i.e.
// table.newAppend().appendFile(file).commit();
AppendFiles files = t.newAppend();

String fileLocation = "hdfs://voyager:9000/DI/warehouse/" + tblName;
int numRecords = writeDummyData("temp_" + tblName);

DataFile file = DataFiles.builder(t.spec())
        .withRecordCount(numRecords)
        .withPath(fileLocation)
        .withFormat(FileFormat.PARQUET)
        .withFileSizeInBytes(Files.localInput(fileLocation).getLength())
        .build();

// Approach #1, did not work: select count(*) did not show numRecords
files.appendFile(file).commit();

s.sqlContext().sql("select count(*) from nessie." + tblName).show();