The Windows SDK docs state that when using unbuffered output the length
of the buffer to be written must be a multiple of the sector size of the device
where the file is being written. Furthermore the buffer must be written starting
at an offset in the file that is also a multiple of the sector size. This
implies that the size of a file created this way must also be a multiple of the
sector size.
I find it hard to accept that a file of arbitrary size cannot be created using
unbuffered IO! Is that indeed the case, or am I missing something?
Jean Cyr
/* Time ain't money when all you got is time. */
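For concreteness, the padding that the alignment rule forces can be sketched as below. `RoundUpToSectorSize` is an illustrative helper, not an SDK function, and 512 stands in for whatever the device actually reports (e.g. via GetDiskFreeSpace) as its bytes-per-sector.

```c
/* Illustrative helper (not an SDK call): the length an unbuffered
   WriteFile would have to use to hold `bytes` of valid data, rounded
   up to the device sector size. */
static long long RoundUpToSectorSize(long long bytes, long long sectorSize)
{
    return (bytes + sectorSize - 1) / sectorSize * sectorSize;
}
/* RoundUpToSectorSize(513, 512) yields 1024: a 513-byte payload forces
   a 1024-byte unbuffered write. */
```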
"Jean Cyr" <jc...@online.nospam> wrote in message
news:%23vwVpJZ...@TK2MSFTNGP09.phx.gbl...
--
The personal opinion of
Gary G. Little
"Gary G. Little" <gglittl...@sbcglobal.net> wrote in message
news:gBtTc.811$ua5...@newssvr22.news.prodigy.com...
"David J. Craig" <SeniorDri...@shogunyoshimuni.com.net> wrote in
message news:uY8OMzZ...@TK2MSFTNGP12.phx.gbl...
>The minimum you can read/write in non-buffered mode is a sector. The cluster
>concept is only used to convert a logical offset to a disk LBA.
I am not concerned with the physical space occupied on the media. As you
say, that will always be a multiple of the cluster size (not sector size). I
also realize that the smallest unit of data that can be written to an HDD is
one sector. But that's not the point.
The actual file size, as reported by Windows, has byte resolution. Let me
explain in more practical terms. Let's say that I wanted to copy file A to file B
using unbuffered IO, and that file A is 513 bytes long. File A obviously
occupies 2 HD sectors, even though Windows reports it as 513 bytes long.
Using unbuffered writes to create file B, I would be obligated to write 1024
bytes in order to meet the sector multiples rule (assuming a sector size of
512). Upon closing this file (file B) Windows would report it as 1024 bytes
long, not 513!
This is the problem I am trying to solve. Short of closing and reopening the
file using buffered IO, how can I set the file size to the actual number of
valid data bytes in the file, instead of the space it occupies on the media?
This seems like a simple requirement, easily achieved with buffered IO but
apparently impossible with unbuffered IO!
>I've had to resort to such a solution, but I wonder why SetEndOfFile doesn't
>work for non-sector-aligned values. You can read a file whose length is not a
>multiple of the sector size in non-buffered mode (assuming your buffers are
>always a multiple of the sector size), but why then not SetEndOfFile? It
>doesn't involve a write to the file, anyway (other than possibly zero-filling
>newly allocated clusters).
That's the first thing I tried. But SetEndOfFile only allows you to mark the
current position as the end of the file. You would have to move the current
file pointer first to the desired position using SetFilePointer or
SetFilePointerEx. But those functions have the same limitations as ReadFile
and WriteFile with respect to positioning when using unbuffered IO.
From the SDK SetFilePointerEx documentation:
"If the hFile handle was opened with the FILE_FLAG_NO_BUFFERING flag
set, an application can move the file pointer only to sector-aligned positions."
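The workaround being circled here — reopen the file without FILE_FLAG_NO_BUFFERING and truncate through the buffered handle — can be sketched roughly as below. This is only a sketch with minimal error handling; `TruncateToValidLength` and `IsSectorAligned` are illustrative names, not SDK functions. The small helper just restates the restriction quoted above.

```c
/* Restates the quoted restriction: a NO_BUFFERING handle can only seek
   to offsets that are a whole number of sectors. */
static int IsSectorAligned(long long offset, long long sectorSize)
{
    return offset % sectorSize == 0;
}

#ifdef _WIN32
#include <windows.h>

/* Sketch (illustrative, minimal error handling): reopen the padded file
   WITHOUT FILE_FLAG_NO_BUFFERING, seek to the count of valid bytes
   (byte resolution is legal on a buffered handle), and mark that
   position as end-of-file. */
static BOOL TruncateToValidLength(const char *path, LONGLONG validBytes)
{
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    LARGE_INTEGER pos;
    pos.QuadPart = validBytes;           /* e.g. 513, not the padded 1024 */
    BOOL ok = SetFilePointerEx(h, pos, NULL, FILE_BEGIN) && SetEndOfFile(h);
    CloseHandle(h);
    return ok;
}
#endif
```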
[...]
> The actual file size as reported by Windows, has byte resolution. Let me
> explain in more practical terms. Let's say that I wanted to copy file A to
> file B using unbuffered IO, and that file A is 513 bytes long. File A
> obviously occupies 2 HD sectors, even though Windows reports it as 513
> bytes long.
No. NTFS can store small files very efficiently, sharing sectors with other
small files and system data.
S
>> The actual file size as reported by Windows, has byte resolution. Let me
>> explain in more practical terms. Let's say that I wanted to copy file A to
>> file B using unbuffered IO, and that file A is 513 bytes long. File A
>> obviously occupies 2 HD sectors, even though Windows reports it as 513
>> bytes long.
>
>No. NTFS can store small files very efficiently, sharing sectors with other
>small files and system data.
Ok, so tell me how I would create such a file, say 10 bytes long, using
unbuffered IO.
I don't think you can. NTFS may be smart enough to "compact" files when they
shrink below some threshold, but I do not know for sure.
If I were writing an intelligent copy utility, I would use the unbuffered
mode only to write the whole clusters of a file, and write its tail in the
normal mode. And I would probably use the unbuffered mode only for files
beyond some threshold, 256K and up.
S
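The split suggested above — unbuffered writes for the aligned bulk of the file, one normal buffered write for the tail — comes down to simple arithmetic. `SplitForUnbufferedCopy` and `CopyPlan` are illustrative names, and the granularity (sector or cluster size) is left as a parameter.

```c
/* Illustrative: split a file length into the aligned body that can be
   written with FILE_FLAG_NO_BUFFERING and the leftover tail that must
   go through a normal buffered handle. */
typedef struct {
    long long alignedBody; /* bytes to write unbuffered */
    long long tail;        /* remaining bytes, written buffered */
} CopyPlan;

static CopyPlan SplitForUnbufferedCopy(long long fileSize, long long granularity)
{
    CopyPlan p;
    p.alignedBody = fileSize / granularity * granularity;
    p.tail = fileSize % granularity;
    return p;
}
/* For the 513-byte example with 512-byte sectors: body = 512 written
   unbuffered, tail = 1 byte written buffered, so the file is reported
   as 513 bytes with no truncation step needed. */
```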
>> >No. NTFS can store small files very efficiently, sharing sectors with
>> >other small files and system data.
>>
>> Ok, so tell me how I would create such a file, say 10 bytes long, using
>> unbuffered IO.
>
>I don't think you can. NTFS may be smart enough to "compact" files when they
>shrink below some threshold, but I do not know for sure.
That is essentially the answer I'm trying to get confirmed by MS.
>If I were writing an intelligent copy utility, I would use the unbuffered
>mode only to write the whole clusters of a file, and write its tail in the
>normal mode. And I would probably use the unbuffered mode only for files
>beyond some threshold, 256K and up.
My experiments show that the overhead of closing and then re-opening the file
cancels out any performance gains that could be realized using unbuffered IO,
at least for the distribution of file sizes we typically encounter. Is there a
way to switch from unbuffered to buffered mode without incurring the overhead
of re-opening the file?
If you work with a lot of small files in sequence, you may open many at
once, read/write them, close. This saves you the disk seeks from the
directory area to the data area.
"Jean Cyr" <jc...@online.nospam> wrote in message
news:e97$HctgEH...@TK2MSFTNGP11.phx.gbl...
>You may want to use unbuffered mode for handling big files, because it saves
>you from file cache bloat (very nasty flaw in Windows memory management).
>For such files, overhead of closing/reopening a file is not significant.
Thanks. That is probably the approach I will end up using.
>If you work with a lot of small files in sequence, you may open many at
>once, read/write them, close. This saves you the disk seeks from the
>directory area to the data area.
Unfortunately our algorithm does not lend itself easily to this approach. A
good idea though.