Guidance on deltalake table properties in deltalake-rs


Nic Schrading

Jan 8, 2026, 7:26:15 PM
to Delta Lake Users and Developers
Hello,

I was unable to join the slack channel (looks like it might have run out of free user slots?) so I'm posting here instead.

I'm looking for guidance around the use of table properties like
AutoOptimizeAutoCompact. Am I right that setting this on a deltalake table in rust won't actually do anything because deltalake-rs is not running any higher level "connector" logic like spark would be?

If this is true, is the best alternative when running purely in rust to have a separate process periodically running optimize/vacuum on the table?
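
For context, this is roughly how I've been setting the property at table-creation time. Treat it as a sketch only: the builder method and enum names here are from my reading of the delta-rs docs and may differ across crate versions.

// Sketch: create a table with delta.autoOptimize.autoCompact set.
// As far as I can tell this only lands in the table's configuration map;
// nothing in delta-rs acts on it afterwards when writing from Rust.
use deltalake::kernel::{DataType, PrimitiveType};
use deltalake::DeltaOps;

async fn create_table() -> Result<(), deltalake::DeltaTableError> {
    let _table = DeltaOps::try_from_uri("s3://my-bucket/my-table") // placeholder URI
        .await?
        .create()
        .with_column("ts", DataType::Primitive(PrimitiveType::Timestamp), false, None)
        // Stored as table metadata only; no compaction is triggered by it.
        .with_configuration_property(deltalake::TableProperty::AutoOptimizeAutoCompact, Some("true"))
        .await?;
    Ok(())
}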

Secondly, I'm wondering if any experts here have guidance or documentation to share on how best to deal with wide tables (thousands of columns). I understand that performance improvements have been made with metadata parsing as of the last release in arrow. But even with arrow 57, I'm hitting fairly significant CPU spikes when running standard min/max bin queries across time ranges on wide tables. I have followed the guidance to remove stats for the columns (except the timestamp column). I suspect the issue is that my underlying parquet files need to be optimized to larger sizes to reduce the total number of files we need to parse, but I'm wondering if others have further guidance on what to look for or what to tune.
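
To be concrete, the stats trimming I mentioned is done through table configuration, roughly like the below. The property names are the standard Delta ones; I'm still confirming exactly how the Rust writer honors them, so this is illustrative only ("ts" is my timestamp column).

// Illustrative only: configuration limiting which columns get min/max stats.
use std::collections::HashMap;

fn stats_config() -> HashMap<String, Option<String>> {
    HashMap::from([
        // Collect file-level stats only for the listed column(s)...
        ("delta.dataSkippingStatsColumns".to_string(), Some("ts".to_string())),
        // ...or cap how many leading columns are indexed at all.
        ("delta.dataSkippingNumIndexedCols".to_string(), Some("1".to_string())),
    ])
}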

Thank you,
Nic Schrading

R Tyler Croy

9:54 AM
to Nic Schrading, Delta Lake Users and Developers
(replies inline)


On Thursday, January 8th, 2026 at 4:26 PM, Nic Schrading <n...@revel.io> wrote:

> Hello,
> I was unable to join the slack channel (looks like it might have run out of free user slots?) so I'm posting here instead.
>

> I'm looking for guidance around the use of table properties like
> AutoOptimizeAutoCompact. Am I right that setting this on a deltalake table in rust won't actually do anything because deltalake-rs is not running any higher level "connector" logic like spark would be?
>

> If this is true, is the best alternative when running purely in rust to have a separate process periodically running optimize/vacuum on the table?


Your understanding of the auto-optimize table property with regard to delta-rs is correct. What I do, and what a number of other folks using delta-rs do, is run a parallel process exactly as you describe to handle that table management. In many of my production environments the resources required to run optimize in a timely manner are better suited to a Spark cluster; in those situations we have a Rust-based process doing the writing, with periodic optimize jobs running separately, typically against yesterday's partition.
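
As a rough sketch of what that side process can look like (the URI, target size, and partition value are placeholders, and you should check the builder names against the delta-rs docs for the version you're on):

// Standalone maintenance job: compact small files, then vacuum old ones.
use deltalake::{DeltaOps, PartitionFilter};

#[tokio::main]
async fn main() -> Result<(), deltalake::DeltaTableError> {
    let table = deltalake::open_table("s3://my-bucket/my-table").await?; // placeholder

    // OPTIMIZE, scoped to yesterday's partition (assumes a "date" partition
    // column), rewriting small files up to a target size of ~256 MiB.
    let filter = PartitionFilter::try_from(("date", "=", "2026-01-08"))?;
    let (table, metrics) = DeltaOps(table)
        .optimize()
        .with_filters(&[filter])
        .with_target_size(256 * 1024 * 1024)
        .await?;
    println!("optimize: {metrics:?}");

    // VACUUM with the default retention window to drop unreferenced files.
    let (_table, metrics) = DeltaOps(table).vacuum().await?;
    println!("vacuum: {metrics:?}");

    Ok(())
}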


> Secondly, I'm wondering if any experts here have guidance or documentation to share on how best to deal with wide tables (thousands of columns). I understand that performance improvements have been made with metadata parsing as of the last release in arrow. But even with arrow 57, I'm hitting fairly significant CPU spikes when running standard min/max bin queries across time ranges on wide tables. I have followed the guidance to remove stats for the columns (except the timestamp column). I suspect the issue is that my underlying parquet files need to be optimized to larger sizes to reduce the total number of files we need to parse, but I'm wondering if others have further guidance on what to look for or what to tune.

There were some performance deficiencies observed with delta-kernel-rs on tables with billions of transactions and wider schemas (hundreds of columns). Thousands of columns is quite wide! If you're familiar with `perf` for profiling on Linux, I certainly wouldn't mind seeing the `perf.data` from a simple Rust process (with debug symbols!) which opens the table, e.g.

let table = deltalake::open_table(table_url).await?;


Run with: `perf record -e cycles -m 8M --call-graph=dwarf --sample-cpu --switch-events --aio -z -- ./target/debug/my-repro-case`
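
For completeness, the whole repro binary doesn't need to be any more than something like this (hypothetical crate/binary name matching the perf invocation above; build it as a debug build so the DWARF call graphs resolve):

// Minimal repro: open the table so the profile captures log replay and
// metadata parsing, nothing else. Table URL is taken from the command line.
#[tokio::main]
async fn main() -> Result<(), deltalake::DeltaTableError> {
    let table_url = std::env::args()
        .nth(1)
        .expect("usage: my-repro-case <table-url>");
    let table = deltalake::open_table(table_url).await?;
    println!("opened table at version {:?}", table.version());
    Ok(())
}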


Without seeing the tables themselves, that's the best way I can think of to help understand the adverse performance.


If you're not able to provide a performance profile, I'm also happy to sign an NDA under my corporate alt (rty...@buoyantdata.com) to see exactly what's causing the performance issue here.


Cheers