Regarding withFilter option in Delta Kernel API


Nikhil Choudhari

May 8, 2025, 1:02:59 PM
to Delta Lake Users and Developers
Hello Team,

I am looking for help with using getRemainingFilter to filter data at the row level.

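With the help of file skipping I am able to filter out many of the files. However, as per the documentation, getRemainingFilter returns the part of the filter that is not guaranteed to be satisfied by the data Kernel returns, so it still has to be applied to the rows that were read.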
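To show what I mean by the two phases, here is a minimal sketch in plain Java. It does not use the Kernel API at all; DataFile, skipFiles, and filterRows are hypothetical names made up for illustration, and the predicate is assumed to be a simple "value > threshold" on a single int column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class RemainingFilterSketch {

    /** Hypothetical stand-in for a data file with min/max statistics on one int column. */
    static final class DataFile {
        final String name;
        final int min;
        final int max;
        final int[] values;

        DataFile(String name, int min, int max, int[] values) {
            this.name = name;
            this.min = min;
            this.max = max;
            this.values = values;
        }
    }

    /** Phase 1, file skipping: drop files whose max statistic rules out any value > threshold. */
    static List<DataFile> skipFiles(List<DataFile> files, int threshold) {
        List<DataFile> kept = new ArrayList<>();
        for (DataFile f : files) {
            if (f.max > threshold) {
                kept.add(f);
            }
        }
        return kept;
    }

    /** Phase 2, remaining filter: test every row of the surviving files. */
    static List<Integer> filterRows(List<DataFile> files, IntPredicate remainingFilter) {
        List<Integer> out = new ArrayList<>();
        for (DataFile f : files) {
            for (int v : f.values) {
                if (remainingFilter.test(v)) {
                    out.add(v);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
                new DataFile("a.parquet", 1, 5, new int[] {1, 3, 5}),
                new DataFile("b.parquet", 4, 12, new int[] {4, 9, 12}),
                new DataFile("c.parquet", 20, 30, new int[] {20, 30}));
        int threshold = 8;

        List<DataFile> kept = skipFiles(files, threshold);
        System.out.println(kept.size() + " files kept");           // prints "2 files kept"
        System.out.println(filterRows(kept, v -> v > threshold));  // prints "[9, 12, 20, 30]"
    }
}
```

In this sketch, file b.parquet survives file skipping but still contains the row 4, which only the row-level pass removes. That row-level pass is what I am trying to achieve with getRemainingFilter.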

Please find my code below and suggest changes.

scanBuilder = dsnapshot.getScanBuilder(engine).withFilter(engine, predicate);
Scan deltaScan = scanBuilder.build();
CloseableIterator<FilteredColumnarBatch> filteredBatch = deltaScan.getScanFiles(engine);
Optional<Predicate> new_predicate = deltaScan.getRemainingFilter();
Row scanStateRow = deltaScan.getScanState(engine);

try {
    List<Object[]> deltaObjectList = new ArrayList<>();
    while (filteredBatch.hasNext()) {
        FilteredColumnarBatch scanFileColumnarBatch = filteredBatch.next();
        CloseableIterator<Row> scanFileRows = scanFileColumnarBatch.getRows();
        StructType physicalReadSchema =
                ScanStateRow.getPhysicalDataReadSchema(engine, scanStateRow);
        System.out.println("Schema: " + physicalReadSchema.toString());
        while (scanFileRows.hasNext()) {
            Row scanFileRow = scanFileRows.next();
            FileStatus fileStatus = InternalScanFileUtils.getAddFileStatus(scanFileRow);
            CloseableIterator<ColumnarBatch> physicalDataIter =
                    engine.getParquetHandler().readParquetFiles(
                            singletonCloseableIterator(fileStatus),
                            physicalReadSchema, new_predicate
                    );
            // further logic to print the filtered data.

Your help is much appreciated.