Hello Team,
I am looking for help with using getRemainingFilter() to filter data at the row level.
With file skipping, I am able to filter out most of the files. However, according to the documentation, getRemainingFilter() returns the part of the filter that is not guaranteed to be satisfied by the data the scan returns, so it still has to be applied to the rows that are read.
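The remaining-filter contract can be modelled in plain Java, without any Delta Kernel classes. This is a simplified model, not the Delta Kernel API: the class `SelectionVectorModel`, the method `applyRemainingFilter`, and the use of plain `int` rows are hypothetical. It assumes the reader returned rows that fail the filter (because it only used the filter as a hint) and shows that a row survives only if the existing selection vector keeps it and the remaining filter is true for it:

```java
import java.util.Optional;
import java.util.function.IntPredicate;

/**
 * Hypothetical plain-Java model of the selection-vector contract (no Delta Kernel
 * classes): a row is kept only if the existing selection vector keeps it AND the
 * remaining filter evaluates to true for that row.
 */
public class SelectionVectorModel {

    /** Returns a new selection vector; the input arrays are not modified. */
    public static boolean[] applyRemainingFilter(int[] values,
                                                 Optional<boolean[]> existingSelection,
                                                 IntPredicate remainingFilter) {
        boolean[] result = new boolean[values.length];
        for (int i = 0; i < values.length; i++) {
            // With no existing selection vector, every row starts out selected.
            boolean alreadySelected = !existingSelection.isPresent() || existingSelection.get()[i];
            // A row that is already deselected stays deselected.
            result[i] = alreadySelected && remainingFilter.test(values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows returned by a reader that treated "value > 5" only as a hint:
        // the value 3 does not match but was not pruned.
        int[] values = {3, 7, 9, 12};
        // Hypothetical existing selection vector: the row holding 9 was already removed.
        Optional<boolean[]> existing = Optional.of(new boolean[] {true, true, false, true});
        boolean[] selected = applyRemainingFilter(values, existing, v -> v > 5);
        for (int i = 0; i < values.length; i++) {
            if (selected[i]) {
                System.out.println(values[i]);
            }
        }
    }
}
```

Running `main` prints 7 and 12: the value 3 was returned even though it fails `value > 5`, and 9 was already deselected before the filter was applied.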
Please find my code below and suggest changes.
ScanBuilder scanBuilder = dsnapshot.getScanBuilder(engine).withFilter(engine, predicate);
Scan deltaScan = scanBuilder.build();
// Part of the filter that Kernel does not guarantee to be satisfied by the returned data.
Optional<Predicate> remainingFilter = deltaScan.getRemainingFilter();
Row scanStateRow = deltaScan.getScanState(engine);
// The physical read schema is the same for every scan file, so compute it once.
StructType physicalReadSchema = ScanStateRow.getPhysicalDataReadSchema(engine, scanStateRow);
System.out.println("Schema: " + physicalReadSchema);
List<Object[]> deltaObjectList = new ArrayList<>();
// getScanFiles returns only the files that survived file skipping.
try (CloseableIterator<FilteredColumnarBatch> scanFileBatches = deltaScan.getScanFiles(engine)) {
    while (scanFileBatches.hasNext()) {
        FilteredColumnarBatch scanFileColumnarBatch = scanFileBatches.next();
        try (CloseableIterator<Row> scanFileRows = scanFileColumnarBatch.getRows()) {
            while (scanFileRows.hasNext()) {
                Row scanFileRow = scanFileRows.next();
                FileStatus fileStatus = InternalScanFileUtils.getAddFileStatus(scanFileRow);
                // The predicate is only a hint: the Parquet reader may use it to prune data,
                // but it does not guarantee that every returned row satisfies it.
                CloseableIterator<ColumnarBatch> physicalDataIter =
                    engine.getParquetHandler().readParquetFiles(
                        singletonCloseableIterator(fileStatus),
                        physicalReadSchema,
                        remainingFilter);
                // further logic to print the filtered data.
            }
        }
    }
}
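A minimal sketch of applying the remaining filter at the row level, assuming a Delta Kernel release with the `Engine`-based API used above and `PredicateEvaluator` support in the engine's `ExpressionHandler`. The class `RemainingFilterSketch`, the method `readAndFilter`, and the `rowConsumer` callback are hypothetical. The sketch converts the physical batches of one scan file with `Scan.transformPhysicalData`, evaluates the remaining filter on the logical data with a `PredicateEvaluator`, and hands on only the rows whose selection-vector entry is true:

```java
import io.delta.kernel.Scan;
import io.delta.kernel.data.ColumnVector;
import io.delta.kernel.data.ColumnarBatch;
import io.delta.kernel.data.FilteredColumnarBatch;
import io.delta.kernel.data.Row;
import io.delta.kernel.engine.Engine;
import io.delta.kernel.expressions.Predicate;
import io.delta.kernel.expressions.PredicateEvaluator;
import io.delta.kernel.utils.CloseableIterator;

import java.io.IOException;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical helper; only the Delta Kernel calls are real API. */
public final class RemainingFilterSketch {

    /**
     * Converts the physical data of one scan file to logical data, applies the
     * scan's remaining filter row by row, and passes each surviving row to rowConsumer.
     */
    public static void readAndFilter(
            Engine engine,
            Row scanStateRow,
            Row scanFileRow,
            CloseableIterator<ColumnarBatch> physicalDataIter,
            Optional<Predicate> remainingFilter,
            Consumer<Row> rowConsumer) throws IOException {
        try (CloseableIterator<FilteredColumnarBatch> logicalData =
                Scan.transformPhysicalData(engine, scanStateRow, scanFileRow, physicalDataIter)) {
            while (logicalData.hasNext()) {
                FilteredColumnarBatch batch = logicalData.next();
                FilteredColumnarBatch filtered = batch;
                if (remainingFilter.isPresent()) {
                    ColumnarBatch data = batch.getData();
                    // The remaining filter refers to logical column names, so it is
                    // evaluated on the logical data returned by transformPhysicalData.
                    PredicateEvaluator evaluator = engine.getExpressionHandler()
                        .getPredicateEvaluator(data.getSchema(), remainingFilter.get());
                    // eval() combines the batch's existing selection vector (if any)
                    // with the predicate result into a new selection vector.
                    ColumnVector selection = evaluator.eval(data, batch.getSelectionVector());
                    filtered = new FilteredColumnarBatch(data, Optional.of(selection));
                }
                // getRows() iterates only over the rows marked as selected.
                try (CloseableIterator<Row> rows = filtered.getRows()) {
                    while (rows.hasNext()) {
                        rowConsumer.accept(rows.next());
                    }
                }
            }
        }
    }
}
```

It would be called once per scan file, right after `readParquetFiles(...)`, with the `Optional<Predicate>` returned by `getRemainingFilter()`. Creating one `PredicateEvaluator` per batch keeps the sketch short; caching it per schema would avoid repeating that setup.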
Your help is much appreciated.