
Creating arbitrary length file using unbuffered IO???


Jean Cyr
Aug 13, 2004, 8:05:28 PM
Not sure if this group is the appropriate place for this posting, but I could not
find a more appropriate one.

The Windows SDK docs state that when using unbuffered output, the length
of the buffer to be written must be a multiple of the sector size of the device
where the file is being written. Furthermore, the buffer must be written starting
at an offset in the file that is also a multiple of the sector size. This
implies that the size of a file created this way must also be a multiple of the
sector size.

I find it hard to accept that an arbitrary size file cannot be created using
unbuffered IO! Is that indeed the case, or am I missing something?


Jean Cyr

/* Time ain't money when all you got is time. */

David J. Craig
Aug 13, 2004, 9:19:43 PM
Rather simple question. Create and write the file data using non-buffered
writes, taking care to follow the sector-size rules. Then re-open the file
using normal IO and set the file size where you want it. This will cause
some cache activity, but if you don't issue a read, it may be minimal.
Don't forget that the directory entries and FAT/system data will probably be
cached even with the non-buffered create.

"Jean Cyr" <jc...@online.nospam> wrote in message
news:%23vwVpJZ...@TK2MSFTNGP09.phx.gbl...

Jean Cyr
Aug 13, 2004, 11:33:37 PM
Thanks, but I am looking for a more direct approach. The need to close and then
re-open the file seems wasteful and inelegant.

Gary G. Little
Aug 14, 2004, 3:19:40 PM
Of course you can have an arbitrary file size. However, the media to which
you write any given data is constrained to store your data in a
physical representation. If it is an HDD, then you must store your
arbitrarily sized file in a sequence of fixed-size sectors on that HDD. You
may create a file containing "HELLO WORLD" that only occupies 11 bytes,
and if you look at the file size using DIR or Explorer it will say 11 bytes.
However, when you write that to disk it will OCCUPY a minimum of 512 bytes,
the smallest amount of data you can access on typical HDD media. Windows
and NTFS typically group HDD sectors into clusters, so that 11-byte
file will typically occupy sector size times sectors per cluster. On a FAT
volume on a 1 GB HDD, those 11 bytes could easily tie up 64K of media space.
Any time you access that file, the minimum you can access is that cluster
size, so in the example given, the disk controller will transfer 64K to and
from memory.

--
The personal opinion of
Gary G. Little

"Jean Cyr" <jc...@online.nospam> wrote in message
news:%23vwVpJZ...@TK2MSFTNGP09.phx.gbl...

Alexander Grigoriev
Aug 14, 2004, 6:06:02 PM
The minimum you can read/write in non-buffered mode is a sector. The cluster
concept is only used to convert a logical offset to a disk LBA.

"Gary G. Little" <gglittl...@sbcglobal.net> wrote in message
news:gBtTc.811$ua5...@newssvr22.news.prodigy.com...

Alexander Grigoriev
Aug 14, 2004, 6:09:20 PM
I've had to resort to such a solution, but I wonder why SetEndOfFile doesn't
work for non-sector-aligned values. You can read a file whose length is not a
multiple of the sector size in non-buffered mode (assuming your buffers are
always a multiple of the sector size), but why not SetEndOfFile, then? It
doesn't involve a write to the file anyway (other than possibly zero-filling
newly allocated clusters).

"David J. Craig" <SeniorDri...@shogunyoshimuni.com.net> wrote in
message news:uY8OMzZ...@TK2MSFTNGP12.phx.gbl...

Jean Cyr
Aug 14, 2004, 6:50:32 PM
"Alexander Grigoriev" <al...@earthlink.net> wrote:

>The minimum you can read/write in non-buf mode is a sector. Cluster concept
>is only used to convert logical offset to disk LBA.

I am not concerned with the physical space occupied on the media. As you
say, that will always be a multiple of the cluster size (not sector size). I
also realize that the smallest unit of data that can be written to an HDD is
one sector. But that's not the point.

The actual file size, as reported by Windows, has byte resolution. Let me
explain in more practical terms. Let's say that I wanted to copy file A to file B
using unbuffered IO, and that file A is 513 bytes long. File A obviously
occupies 2 HD sectors, even though Windows reports it as 513 bytes long.
Using unbuffered writes to create file B, I would be obligated to write 1024
bytes in order to meet the sector-multiple rule (assuming a sector size of
512). Upon closing this file (file B), Windows would report it as 1024 bytes
long, not 513!

This is the problem I am trying to solve. Short of closing and reopening the
file using buffered IO, how can I set the file size to the actual number of
valid data bytes in the file, instead of the space it occupies on the media?
This seems like a simple requirement, easily achieved using buffered IO, but
seemingly impossible using unbuffered IO!

Jean Cyr
Aug 14, 2004, 7:00:01 PM
"Alexander Grigoriev" <al...@earthlink.net> wrote:

>I've had to resort to such solution, but I wonder why SetEndOfFile doesn't
>work for non-sector aligned values. You can read a file which is not
>multiple of sector size in non-buffered mode (assuming your buffers are
>always multiple of sector), but why then not SetEndOfFile? it doesn't
>involve write to the file, anyway (other than possible zero-filling newly
>allocated clusters).

That's the first thing I tried. But SetEndOfFile only allows you to mark the
current position as the end of the file. You would have to move the current
file pointer first to the desired position using SetFilePointer or
SetFilePointerEx. But those functions have the same limitations as ReadFile
and WriteFile with respect to positioning when using unbuffered IO.

From the SDK SetFilePointerEx documentation:

"If the hFile handle was opened with the FILE_FLAG_NO_BUFFERING flag
set, an application can move the file pointer only to sector-aligned positions."

Slava M. Usov
Aug 14, 2004, 8:23:34 PM
"Jean Cyr" <jc...@online.nospam> wrote in message
news:#26#bElgEH...@TK2MSFTNGP12.phx.gbl...

[...]

> The actual file size as reported by Windows, has byte resolution. Let me
> explain in more practical terms. Lets say that I wanted to copy file A to
> file B using unbuffered IO, and that file A is 513 bytes long. File A
> obviously occupies 2 HD sectors, even though Windows reports it as 513
> bytes long.

No. NTFS can store small files very efficiently, sharing sectors with other
small files and system data.

S


Jean Cyr
Aug 14, 2004, 10:05:28 PM
"Slava M. Usov" <stripit...@gmx.net> wrote:

>> The actual file size as reported by Windows, has byte resolution. Let me
>> explain in more practical terms. Lets say that I wanted to copy file A to
>> file B using unbuffered IO, and that file A is 513 bytes long. File A
>> obviously occupies 2 HD sectors, even though Windows reports it as 513
>> bytes long.
>
>No. NTFS can store small files very efficiently, sharing sectors with other
>small files and system data.

Ok, so tell me how I would create such a file, say 10 bytes long, using
unbuffered IO.

Slava M. Usov
Aug 15, 2004, 10:02:53 AM
"Jean Cyr" <jc...@online.nospam> wrote in message
news:OPsGXxmg...@TK2MSFTNGP11.phx.gbl...

I don't think you can. NTFS may be smart enough to "compact" files when they
shrink below some threshold, but I do not know for sure.

If I were writing an intelligent copy utility, I would use the unbuffered
mode only to write the whole clusters of a file, and write its tail in the
normal mode. And I would probably use the unbuffered mode only for files
beyond some threshold, 256K and up.

S


Jean Cyr
Aug 15, 2004, 10:49:11 AM
"Slava M. Usov" <stripit...@gmx.net> wrote:

>> >No. NTFS can store small files very efficiently, sharing sectors with
>> >other small files and system data.
>>
>> Ok, so tell me how I would create such a file, say 10 bytes long, using
>> unbuffered IO.
>
>I don't think you can. NTFS may be smart enough to "compact" files when they
>shrink below some threshold, but I do not know for sure.

That is essentially the answer I'm trying to get confirmed by MS.

>If I were writing an intelligent copy utility, I would use the unbuffered
>mode only to write the whole clusters of a file, and write its tail in the
>normal mode. And I would probably use the unbuffered mode only for files
>beyond some threshold, 256K and up.

My experiments show that the overhead of closing and then re-opening the file
cancels out any performance gains that could be realized using unbuffered IO,
at least for the distribution of file sizes we typically encounter. Is there a
way to switch from unbuffered to buffered mode without incurring the overhead
of re-opening the file?

Alexander Grigoriev
Aug 15, 2004, 12:01:43 PM
You may want to use unbuffered mode for handling big files, because it saves
you from file cache bloat (a very nasty flaw in Windows memory management).
For such files, the overhead of closing and reopening a file is not significant.

If you work with a lot of small files in sequence, you may open many at
once, read/write them, then close them. This saves you the disk seeks between
the directory area and the data area.

"Jean Cyr" <jc...@online.nospam> wrote in message

news:e97$HctgEH...@TK2MSFTNGP11.phx.gbl...

Jean Cyr
Aug 15, 2004, 12:58:12 PM
"Alexander Grigoriev" <al...@earthlink.net> wrote:

>You may want to use unbuffered mode for handling big files, because it saves
>you from file cache bloat (very nasty flaw in Windows memory management).
>For such files, overhead of closing/reopening a file is not significant.

Thanks. That is probably the approach I will end up using.

>If you work with a lot of small files in sequence, you may open many at
>once, read/write them, close. This saves you the disk seeks from the
>directory area to the data area.

Unfortunately, our algorithm does not lend itself easily to this approach. A
good idea, though.
