25.07.2025 15:58, Artyom Ivanov:
> Hi everyone,
> I'm currently investigating why we write zeros to the end of the file (`PIO_init_data()`), and I don't quite understand why we're
> doing this.
If you want to investigate, check the related issues:
Improve performance of database file growth after CORE-1228 was fixed [CORE1469] #1886
Use fast file grow on those Linux systems which supports it [CORE4443] #4763
> The file extension is done via `fallocate()`, and this call ensures that the new space will be initialized with zeros
> (source1 <https://man7.org/linux/man-pages/man2/fallocate.2.html>,
> source2 <https://www.linuxquestions.org/questions/linux-newbie-8/fallocate-does-it-fill-the-space-with-zeros-4175578213/>).
> In other words, the operating system guarantees that when reading this space through the file system, we will see zeros
> (primarily for security reasons and because of how SSDs work), even though physically, anything could be in that location
> on the disk (if you try to read it bypassing the file system, for example, through `dd`, then most likely there will not
> be zeros there).
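For illustration, here is a minimal sketch of the quoted guarantee (the file name and sizes are made up; this is not the actual `PIO_init_data()` code): extend a file with `fallocate()` and check that the new range reads back as zeros through the file system.

```c
/* Minimal sketch: extend a file with fallocate() and verify that the
   newly allocated range reads back as zeros through the file system. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("grow_test.dat", O_CREAT | O_RDWR, 0644); /* throwaway test file */
    if (fd < 0) { perror("open"); return 1; }

    /* mode 0: allocate real disk space and extend the file size to 1 MiB;
       the kernel guarantees the new range reads as zeros. */
    if (fallocate(fd, 0, 0, 1024 * 1024) != 0) { perror("fallocate"); return 1; }

    char buf[4096];
    ssize_t n = pread(fd, buf, sizeof(buf), 512 * 1024);
    if (n < 0) { perror("pread"); return 1; }

    int all_zero = 1;
    for (ssize_t i = 0; i < n; i++)
        if (buf[i] != 0) { all_zero = 0; break; }

    printf("new range reads as %s\n", all_zero ? "zeros" : "garbage");
    close(fd);
    return 0;
}
```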
It is interesting to know how exactly the OS (or the FS?) implements this guarantee.
For example, Windows maintains a "valid data" marker for every file (file stream),
and any attempt to read past this marker returns zeros. So far, so good. But any write
past the "valid data" marker forces the OS to fill the gap between the marker and the
write position with zeros. This is why we prefer to do it ourselves - in a predictable
way and using relatively big IO blocks for efficiency. At the same time, the size of the
"init" block is much less than the size of the file extension.
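A rough sketch of that "zero-fill in relatively big IO blocks" approach; the helper name and the 256 KiB block size are assumptions for illustration, not what `PIO_init_data()` actually uses:

```c
/* Sketch: zero-fill a newly extended region in big IO blocks, so the
   gap-filling happens on our terms instead of being forced by the OS
   on the first write past the "valid data" marker. */
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <unistd.h>

#define INIT_BLOCK (256 * 1024) /* assumed block size, for illustration */

static int zero_init_range(int fd, off_t offset, off_t length)
{
    /* static storage is zero-initialized, so this is a ready block of zeros */
    static const char zeros[INIT_BLOCK];

    while (length > 0)
    {
        size_t chunk = length < (off_t) INIT_BLOCK ? (size_t) length : INIT_BLOCK;
        ssize_t written = pwrite(fd, zeros, chunk, offset);
        if (written < 0)
            return -1;  /* caller inspects errno */
        offset += written;
        length -= written;
    }
    return 0;
}
```

Writing one large block repeatedly keeps the number of system calls small and the IO pattern sequential and predictable.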
> If you really need to zero out the disk at the moment of allocating new
> space, there is the `FALLOC_FL_ZERO_RANGE` flag, but not all file systems support it, and we don't really need it;
> we just want to make sure there is no garbage when reading.
Perhaps there were no advanced options such as FALLOC_FL_ZERO_RANGE when CORE4443 was
implemented; I don't remember such details.
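For reference, a sketch of how FALLOC_FL_ZERO_RANGE could be tried with a fallback to a plain allocating `fallocate()` on file systems that do not support it (the helper is illustrative, not the engine's actual logic):

```c
/* Sketch: prefer FALLOC_FL_ZERO_RANGE, fall back to plain allocation. */
#define _GNU_SOURCE /* exposes FALLOC_FL_* in <fcntl.h> on modern glibc */
#include <errno.h>
#include <fcntl.h>

static int extend_zeroed(int fd, off_t offset, off_t len)
{
    /* ZERO_RANGE zeroes the range (usually by marking extents unwritten)
       and, without FALLOC_FL_KEEP_SIZE, also extends the file past EOF. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
        return 0;

    if (errno != EOPNOTSUPP && errno != EINVAL)
        return -1;  /* a real IO error, not a missing feature */

    /* Plain allocation: no explicit zeroing, but reads through the file
       system still return zeros, which is all we actually need. */
    return fallocate(fd, 0, offset, len);
}
```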
> There was also an idea that explicit zeroing could be done to avoid creating a sparse file, but a sparse file occurs when
> there are holes (unallocated regions that read as zeros) in the middle of the file, which is not our case.
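As a side note, whether a file actually became sparse can be checked by comparing its allocated blocks with its logical size; a small illustrative sketch:

```c
/* Sketch: detect holes by comparing allocated space with logical size. */
#include <stdio.h>
#include <sys/stat.h>

static void report_sparseness(const char* path)
{
    struct stat st;
    if (stat(path, &st) != 0) { perror("stat"); return; }

    /* st_blocks counts 512-byte units regardless of the FS block size. */
    long long allocated = (long long) st.st_blocks * 512;
    printf("%s: size=%lld allocated=%lld -> %s\n",
           path, (long long) st.st_size, allocated,
           allocated < (long long) st.st_size ? "sparse (has holes)"
                                              : "fully allocated");
}
```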
> Therefore, everything is moving towards getting rid of explicit zeroing. This gives the restore a good boost, since the
> zeroing writes double the number of bytes we write. It is worth mentioning, though, that because FW is disabled during
> restore, the restore actually writes less than twice as much data to the disk (here I am talking about the number of bytes
> written to disk, not about `pwrite` system calls: zeroing writes a large number of pages in a single system call, so simply
> counting the difference in `pwrite` calls does not give the whole picture).
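One way to count real bytes written instead of `pwrite` calls is to read Linux's `/proc/self/io` counters after a run; the field names below are the kernel's, the helper itself is just an illustrative sketch:

```c
/* Sketch: print the syscall count vs. real bytes written to storage. */
#include <stdio.h>
#include <string.h>

static void dump_io_counters(void)
{
    FILE* f = fopen("/proc/self/io", "r");
    if (!f) { perror("fopen"); return; }

    char line[128];
    while (fgets(line, sizeof(line), f))
    {
        /* syscw = number of write syscalls; write_bytes = bytes this
           process caused to be sent to the storage layer. */
        if (strncmp(line, "syscw", 5) == 0 ||
            strncmp(line, "write_bytes", 11) == 0)
            fputs(line, stdout);
    }
    fclose(f);
}
```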
Do you have real numbers, or is it just an estimate? I remember that the performance penalty
caused by filling the file with zeros was about 15-20%. Of course, that was on Windows/HDD at that time.
Regards,
Vlad