Handling a million small files


John Dove

May 5, 2005, 11:20:39 AM

Hi,
how good is the Windows 2000 Server filesystem at handling one
million 512-byte files?
Are there any performance numbers?

Thanks in advance.
John

David Lowndes

May 5, 2005, 11:41:28 AM

>how good is the Windows 2000 Server filesystem at handling one
>million 512-byte files?
>Are there any performance numbers?

Doing what sort of operations?

Dave

Carl Daniel [VC++ MVP]

May 5, 2005, 12:01:44 PM

John Dove wrote:
> Hi,
> how good is the Windows 2000 Server filesystem at handling one
> million 512-byte files?
> Are there any performance numbers?

Caveat - I've never had a million files in a single directory. I've had over
100,000 and the NTFS filesystem performed far better than *nix filesystems
with the same size/number of files.

As long as you're using NTFS volumes (not FAT32), you should be OK. NTFS
stores directories as a B-Tree, much as a database would organize a
clustered index. With files of only 512 bytes, the entire file may fit
within the MFT record, so disk fragmentation should be quite low. You'll
probably want to use a volume management tool (typically a disk
defragmenter) to extend the reserved MFT space to accommodate such a large
number of files without causing fragmentation of the MFT. You'll want at
least 1GB of MFT space; I'd go for 2GB or more.
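
If memory serves, the reserved MFT zone can also be enlarged through the
documented NtfsMftZoneReservation registry value (a DWORD from 1, the
default, up to 4, the largest; it only takes effect when volumes are next
mounted, so plan on a reboot). A rough sketch, with error handling kept to
a minimum:

#include <windows.h>
#include <stdio.h>

// Link with advapi32.lib; needs administrative rights.
int main()
{
    HKEY key;
    LONG rc = RegOpenKeyExA(HKEY_LOCAL_MACHINE,
        "SYSTEM\\CurrentControlSet\\Control\\FileSystem",
        0, KEY_SET_VALUE, &key);
    if (rc != ERROR_SUCCESS) {
        printf("RegOpenKeyEx failed: %ld\n", rc);
        return 1;
    }

    DWORD zone = 2;  // 1 = default reservation, 4 = largest
    rc = RegSetValueExA(key, "NtfsMftZoneReservation", 0, REG_DWORD,
                        (const BYTE*)&zone, sizeof(zone));
    RegCloseKey(key);
    return rc == ERROR_SUCCESS ? 0 : 1;
}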

Warning: browsing to a folder containing 1 million files with Explorer will
bring your machine to its knees as Explorer tries to read the entire
directory and build a list view of it. This is an Explorer problem, not a
filesystem problem.

-cd


Slava M. Usov

May 5, 2005, 12:31:32 PM

"John Dove" <anon...@discussions.microsoft.com> wrote in message
news:0d6e01c55185$fec7e2e0$a601...@phx.gbl...

About one year ago I ran a very simple test: create N empty files in one
directory, then open each file and write its filename into the file, then
delete the files. Results:

files               10000   100000  1000000  2000000
create, seconds         0       11      270      592
update, seconds         1       29     1047     1991
delete, seconds         1       31      656     1359

It was not Windows 2000 Server though, it was Windows 2003 Server, on a
plain vanilla ATA disk formatted with NTFS. I was not tracking the disk
usage, but I seem to remember it was no more than 4G, probably less.
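
For the curious, the test program was essentially this shape of loop (a
simplified sketch from memory, not the exact code, with error checking
stripped):

#include <windows.h>
#include <stdio.h>
#include <string.h>

const int N = 1000000;   // number of files, all in the current directory

static void make_name(char* buf, int i)
{
    sprintf(buf, "%07d.txt", i);
}

int main()
{
    char name[MAX_PATH];
    DWORD t0, written;

    t0 = GetTickCount();                 // create pass: empty files
    for (int i = 0; i < N; ++i) {
        make_name(name, i);
        HANDLE h = CreateFileA(name, GENERIC_WRITE, 0, NULL,
                               CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
        CloseHandle(h);
    }
    printf("create: %lu ms\n", GetTickCount() - t0);

    t0 = GetTickCount();                 // update pass: write the name into each file
    for (int i = 0; i < N; ++i) {
        make_name(name, i);
        HANDLE h = CreateFileA(name, GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        WriteFile(h, name, (DWORD)strlen(name), &written, NULL);
        CloseHandle(h);
    }
    printf("update: %lu ms\n", GetTickCount() - t0);

    t0 = GetTickCount();                 // delete pass
    for (int i = 0; i < N; ++i) {
        make_name(name, i);
        DeleteFileA(name);
    }
    printf("delete: %lu ms\n", GetTickCount() - t0);
    return 0;
}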

S


Slava M. Usov

May 5, 2005, 12:35:23 PM

"Carl Daniel [VC++ MVP]" <cpdaniel_remove...@mvps.org.nospam>
wrote in message news:OKjC4uYU...@TK2MSFTNGP10.phx.gbl...

[...]

> As long as you're using NTFS volumes (not FAT32)

FAT32 can have 64K files in one directory, at most. There are also
per-volume limitations.

S


Worldnet

May 5, 2005, 10:15:11 PM

"John Dove" wrote:

John,

FWIW, we have an application here running on Win2K with lots of files spread
across two arrays (both formatted NTFS). The average file size is around
44K, and no file is larger than 1MB. The average directory has around 28,000
files and the highest count for a single directory is around 57,000 files.
We're currently at around 1.35 million files total (we got the second array
when the first one started to run out of space at around 770,000 files).

We've never had any kind of performance problem with file creation or
reading. An audit process that enumerates all the individual files with
FindFirstFile and friends seems to run OK (but it runs unattended once in a
blue moon so we've never really checked it for performance). In fact the
first array I mentioned also holds the OS and SQL Server. (The second array
holds nothing but the files for this application.)
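
(For completeness, the audit is nothing fancier than the standard
FindFirstFile/FindNextFile loop per directory, roughly as below; the
D:\data path is just a placeholder, not our real layout.)

#include <windows.h>
#include <stdio.h>

// Count the files in one directory and total their sizes.
int main()
{
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA("D:\\data\\*", &fd);
    if (h == INVALID_HANDLE_VALUE) {
        printf("FindFirstFile failed: %lu\n", GetLastError());
        return 1;
    }

    unsigned __int64 bytes = 0;
    unsigned long files = 0;
    do {
        if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            ++files;
            bytes += ((unsigned __int64)fd.nFileSizeHigh << 32) + fd.nFileSizeLow;
        }
    } while (FindNextFileA(h, &fd));
    FindClose(h);

    printf("%lu files, %I64u bytes\n", files, bytes);
    return 0;
}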

Of course, because of a problem someone else already pointed out, we never
use Explorer to examine any of the directories unless we decide the server
needs a good thrashing :)

Craig


John Dove

May 6, 2005, 5:37:18 AM

Thanks to everyone who contributed.
Browsing the folder doesn't matter, as we use an ad-hoc explorer with a
virtual list view.

I'm interested in performance times for opening and reading a random file
from the directory, and for updating and saving it back to the NTFS
filesystem.
Carl, what volume management tool do you mean?
Also, do you suggest I split that single directory into 26 a-to-z folders?

Markus Zingg

May 6, 2005, 9:09:25 AM

>FAT32 can have 64K files in one directory, at most. There are also
>per-volume limitations.

This statement - while being correct - at first caused some objection
here :-)

There is no reason for this limit implicit in the FAT32 filesystem "per
se", since directories (including the root directory) are no different
from clustered files. However, quickly digging through "fatformat.pdf"
brought up the relevant parts, which I share here for others.

Markus

<------- excerpt from fatformat.pdf below this point -------->

· Similarly, a FAT file system driver must not allow a directory (a
file that is actually a container for other files) to be larger than
65,536 * 32 (2,097,152) bytes.

NOTE: This limit does not apply to the number of files in the
directory. This limit is on the size of the directory itself and has
nothing to do with the content of the directory. There are two reasons
for this limit:

1. Because FAT directories are not sorted or indexed, it is a bad idea
to create huge directories; otherwise, operations like creating a new
entry (which requires every allocated directory entry to be checked to
verify that the name doesn’t already exist in the directory) become
very slow.

2. There are many FAT file system drivers and disk utilities,
including Microsoft’s, that expect to be able to count the entries in
a directory using a 16-bit WORD variable. For this reason,
directories cannot have more than 16-bits worth of entries.
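
(Put differently: each directory entry is 32 bytes, so 2,097,152 / 32 =
65,536 entries at most - and since a long file name occupies several
entries, the practical ceiling is usually well below 64K files per
directory.)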


Doron Holan [MS]

May 7, 2005, 11:58:23 AM

I think you will want to disable 8.3 auto-naming on NTFS if you do this.
8.3 auto-naming can cost you some performance that you can reclaim.
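
(For Windows 2000 that means setting the NtfsDisable8dot3NameCreation
DWORD to 1 under HKLM\SYSTEM\CurrentControlSet\Control\FileSystem and
rebooting; if I remember right, on XP/2003 "fsutil behavior set
disable8dot3 1" does the same. Either way it only affects names created
after the change.)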

d

--
Please do not send e-mail directly to this alias. This alias is for
newsgroup purposes only.
This posting is provided "AS IS" with no warranties, and confers no rights.


"John Dove" <anon...@discussions.microsoft.com> wrote in message

news:109601c5521f$32340160$a601...@phx.gbl...

Joe Richards [MVP]

May 8, 2005, 10:22:56 AM

Honestly, I think for something like that you may want to consider some sort
of database, like say SQL Server. Store the files in rows in the DB in a
single OS file. I don't have perf numbers, but my gut says you will see that
SQL Server performs considerably better in that specific layout of lots of
small files.

--
Joe Richards Microsoft MVP Windows Server Directory Services
www.joeware.net

Slava M. Usov

May 9, 2005, 7:09:31 AM

"Joe Richards [MVP]" <humore...@hotmail.com> wrote in message
news:%23Uflul9...@TK2MSFTNGP09.phx.gbl...

> Honestly, I think for something like that you may want to consider some
> sort of database, like say SQL Server. Store the files in rows in the DB in
> a single OS file. I don't have perf numbers, but my gut says you will see
> that SQL Server performs considerably better in that specific layout of
> lots of small files.

I do not think so. The performance metrics that I posted here indicate ~3300
inserts/s, ~1000 updates/s and ~1400 deletes/s on an NTFS volume. If I had
run a SQL DB on _that_ same hardware [a vanilla ATA drive on a commodity
PC], I would hardly have obtained more than 2000 transactions/s; most
likely, I would have struggled to do better than 1000 transactions/s.

S


um

May 12, 2005, 1:41:04 AM

"Slava M. Usov" wrote
> "John Dove" wrote

>
> > how good is Windows 2000 Server filesystem in handling one
> > million 512-bytes files?
> > Are there any performance numbers?
> >
> About one year ago I ran a very simple test: create N empty files in one
> directory, then open each file and write its filename into the file, then
> delete the files. Results:
>
> files               10000   100000  1000000  2000000
> create, seconds         0       11      270      592
> update, seconds         1       29     1047     1991
> delete, seconds         1       31      656     1359
>
> It was not Windows 2000 Server though, it was Windows 2003 Server, on a
> plain vanilla ATA disk formatted with NTFS. I was not tracking the disk
> usage, but I seem to remember it was no more than 4G, probably less.

For most practical uses you would need to run the test with random access,
i.e. not always accessing the files in the same order. Under Win2000 Pro
that makes a big difference in the performance numbers.
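
A sketch of what that could look like - assuming, purely for illustration,
that the files are named 0000000.txt through 0999999.txt as in the earlier
test - is below. Reboot or otherwise flush the cache between creating the
files and reading them back, or you mostly measure the cache:

#include <windows.h>
#include <stdio.h>
#include <string>
#include <vector>
#include <algorithm>

int main()
{
    // Build the list of file names, then shuffle so the files are read
    // in a different order than they were created in.
    const int N = 1000000;
    std::vector<std::string> names;
    names.reserve(N);
    char buf[MAX_PATH];
    for (int i = 0; i < N; ++i) {
        sprintf(buf, "%07d.txt", i);
        names.push_back(buf);
    }
    std::random_shuffle(names.begin(), names.end());

    char data[512];
    DWORD got, t0 = GetTickCount();
    for (size_t i = 0; i < names.size(); ++i) {
        HANDLE h = CreateFileA(names[i].c_str(), GENERIC_READ, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h != INVALID_HANDLE_VALUE) {
            ReadFile(h, data, sizeof(data), &got, NULL);
            CloseHandle(h);
        }
    }
    printf("random-order read: %lu ms\n", GetTickCount() - t0);
    return 0;
}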

