64kb Game

0 views

Skip to first unread message

Osman Briseno

unread,

Aug 3, 2024, 10:57:10 AM8/3/24

to hildetecpo

I was given the SQL Servers few weeks ago. First thing I realize: the customer has not formatted the disks with an allocation unit size set to 64KB, the disks are formatted with the default 4KB unit size. I explain to the customer that it is best practice to format the disk in 64KB because SQL performs disk IO in extents and an extent is 8 pages, each of which are 8KB, for a total of 64KB, hence formatting the SQL disks in 64KB will significantly improve performances.

My expectation is that I should have more IOPS and less latency when I read/write a 64KB block on a disk formatted with 64KB block size. My assumption is based on the idea that one operation is faster than 16 operations (16X4KB) for the same amount of data (64KB) with SCSI disks. However, my tests shows the same results, meaning this is no performance improvement from a benchmark standpoint. What am I missing ? Is my test wrong? Misunderstanding of SQL ? Misunderstanding of storage ? How can I prove that disks formatted in 64KB improves SQL performance ?

Similarly, a 64kb au by itself won't prevent SQL Server from performing a single 8kb page read from a data file if required. A 4kb au won't necessarily prevent a 64kb, 256kb or 512kb physical read from a data file.

The au is a provisioning/tracking unit for the filesystem. It is the smallest intitial allocation that can be made to a file. Going to have thousands of XML files smaller than 8kb in a filesystem? Using a 4kb au in that filesystem will allow file sizes of 4kb to 60kb(as well as any other multiple of 4kb as a size). While the 64kb au wouldn't allow any file sizes below 64kb or incremental file growth smaller than 64kb.

In high performance database layout recommendations, I typically recommend one busy file (transaction log or data file) per NTFS filesystem/Windows volume. This prevents files from interleaving with each other in the filesystem as they grow. Interleaved files results in filesystem fragmentation (mainly in the sense of loss of contiguity but also sometimes in the sense of mixed-up-ness or wasted space). It also requires more filesystem metadata overall to track.

Interleaved files can result in performance degradation if the breaks in file contiguity prevent large reads which would otherwise be possible. More metadata in a 4k vs 64k au filesystem can have a performance impact - especially as individual file size or occupied filesystem space grows. More metadata for the same contents means more disk reads/writes to retrieve or maintain the metadata. It also means more server RAM to hold the metadata, and more memory accesses to traverse the maps from file offset to location within filesystem.

I think a very well-designed test could show the difference between 64k and 4k au even on a 64bit OS. But it would have to include file interleaving and large enough file sizes for the differences in metadata size and file contiguity to present themselves in the results.

I still recommend 64k au on new systems. But on existing systems with significant data already present I often advise to wait until a storage tech refresh or other system migration to *change* from 4K to 64k.

However, if we put SQL server aside for one second, would you agree or not the reading/writing 64KB block size from a disk formated with 64 au should provide better performance than reading/writing 64KB block size from a disk formated with 4KB au ?

For example, with a single 64kb file (filesystem cache disabled) all alone on a filesystem I expect no difference in the maximum read rate against that file for reads or writes of 8k or 64k regardless of 4k/64k au.

You will notice a significant drop in through put with 4K block size on random reads. Attached is the graph from a diskspd test that i did , this is how i found that the drives were formatted with 4K allocation unit. The one in purple is on a server with drives formatted in 64K and the other one in 4k. These tests were executed against two different LUNS on different physical servers at the same time.

I also thought it could be the reason (VMFS 1MB), but there is no indication on internet that says 64KB au is useless on virtual disks. Microsoft also recommends to format SQL disks with 64KB on VMs on Azure... (maybe the storage is optimized for 64KB...)

Once clusters are allocated to a file for growth, the work of the allocation unit is done. When a physical read is issued against a file, at the filesystem layer the file-based address is resolved for lower levels of the Windows stack based on the cluster allocations.

A single file on a new volume should have completely contiguous data. Unless the NTFS metadata was fragmented and bloated by a lot of prior activity like massive amounts of small file creation and deletion, the metadata should also be very compact.

If that file is a transaction log, sequential writes of 512bytes to 60k will occur regardless of 4k/64k cluster size. If its a database file, reads of 8k to 512k (the default Windows maximum physical IO size and maximum physical IO size for VMware pvscsi vhba) will also occur regardless of 4k/64k cluster size. On a VMware vm, vmfs filesystem block size will have minimal effect outside the first write to thin volumes. The 1 mb vmfs block size doesn't force small physical reads to become larger.

On real systems there are often departures from the ideal. Multiple files within a filesystem increase the odds of fragmentation of both data and metadata. Sometimes that's not even intentional. If a SQL Server data file grows while DBCC checkdb is running, the sparse file for the database snapshot can result in fragmentation of the base database file. Sparse files themselves are very susceptible to fragmentation, as is discussed in the "four stages part 2" post above.

In the past, NetApp indicated observed performance improvements on systems with 4k allocation units vs 64 k allocation units. I don't doubt their results, but I am skeptical of their conclusions. I bet closer examination of the test systems would reveal something other than cluster size was responsible for observed performance difference. Maybe file fragmentation within the filesystem. Maybe filesystem metadata fragmentation or bloat. Maybe 4k vs 512byte disk sector size...

4k allocation units allow large numbers of small files to be stored in an NTFS filesystem without a lot of empty space in the allocated clusters. Although there have been claims of observed performance gains for SQL Server with 4k au, I've always suspected there's another cause in those cases.

64k allocation units allow much larger maximum Windows volume sizes than 4k allocation units. For systems like SQL Server where small numbers of large files are expected within an NTFS filesystem, 64k au seems a natural choice. Its much less likely to experience very badly fragmented metadata and hopefully (if more than 1 file in the filesystem) will experience greater contiguity. But predicting a performance gain on any given system by converting from 4k to 64k au AND pinpointing the reason for improved performance is a very tricky business. I'm not overly skeptical when I see evidence of gain on some systems. But I'm also not surprised to see legitimate tests on other systems that show no measurable difference.

You are correct, MS still recommends 64k block size for SQL drives no matter if they are VMDK or not. The link you pasted states that the author doesn't think you will see much, if any difference in performance between 4k and 64k block sized VMDKs due to the fact that the VMFS volume is 1 MB block size.

When you attach each new drive to a virtual machine, you should add it to a new SCSI controller, 0:0 for the OS disk, 1:0 for the second disk, 2:0 for the third disk, 3:0 for the fourth disk and then I like to go back to 1:1 for the fifth disk or pick the SCSI controller with the smallest drive on it and so on. Then for each new SCSI controller added, make it VMware Paravirtual instead of the default LSI Logic SAS. This will get you better throughput and performance in the virtual machines. We stick to no more than 8 drives total for a virtual machine. I have no idea if this will give you different results for your IO tests between the VMDKs but something to consider. Also this method for adding new drives holds true for Hyper-V VMs as well. I can speak from experience that 5 VHDX drives on a single virtual SCSI controller will bring a virtual machine to its knees when it starts accessing files on those drives.