ext2 performance with large directories

jtnews

Jun 1, 2001, 10:03:41 AM

Anyone know how ext2 performance degrades
with large directories?

I'm developing a software program
which stores 100,000 or more files in
one single directory.

Is directory entry lookup based on some
hashing type of scheme? Or is it a linear
lookup?

el...@no.spam

Jun 1, 2001, 12:50:43 PM

In article <3B17A0BA...@optonline.net>,
jtnews <jtn...@optonline.net> wrote:

It's a linear lookup, i.e. it gets very slow.

--
http://www.spinics.net/linux/

jurriaan kalkman

Jun 1, 2001, 2:41:16 PM

There are patches floating around the linux-kernel mailing list to deal
with this problem, but the general consensus seems to be that such programs
are shitty and should be avoided. ReiserFS may do better, BTW.

Good luck,
Jurriaan
--
In that case, I shall prepare my Turnip Surprise.
And the surprise is?
There's nothing else in it except turnip.
Baldrick on Haute Cuisine
GNU/Linux 2.4.5-ac4 SMP/ReiserFS 2x1402 bogomips load av: 0.01 0.01 0.00

Patrick Draper/Austin/Sector 7 USA, Inc.

Jun 1, 2001, 3:02:06 PM

A typical solution is to break up the directory into a large tree. The
directories are named according to the files contained inside them.

Example:

All files with names starting with 'a' go into /a. Same with other
letters. If that doesn't break it up enough, start going with two
letters, or three letters, in a tree arrangement.

/a
/a/aa
/a/ab
/a/ac
/a/ad
/b
/b/ba
/b/bb

and so on. The reason for the tree is to ensure that no directory
has too many files. Your program would then look at the filename to get
the right path to the file it's looking for. A good way to do it is to
write a function that, given a filename, returns the path where you would
expect to find the file.

To see another example of this, take a look at your terminfo database,
which on my Debian system is in /usr/share/terminfo. I have 2139 files
in that hierarchy, which was enough to warrant splitting it into a tree.
You should definitely do this if you have 100,000 files.
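A minimal sketch of the filename-to-path function suggested above, in Python. The name `path_for`, the `root` and `depth` parameters, and the choice to pad names shorter than `depth` with `_` are illustrative assumptions, not part of any existing tool. Level i of the tree is named after the first i characters of the filename, matching the /a, /a/aa, /a/ab layout shown above.

```python
import os


def path_for(filename: str, root: str = "/data", depth: int = 2) -> str:
    """Return the path where `filename` is expected to live (hypothetical helper).

    Level i of the tree (1-based) is named after the first i characters of
    the filename, so "abacus" with depth=2 maps to root/a/ab/abacus.
    """
    if not filename or "/" in filename or filename in (".", ".."):
        raise ValueError("filename must be a plain name without '/'")
    if depth < 0:
        raise ValueError("depth must be >= 0")
    # Pad short names so every level gets a directory name; '_' is an arbitrary choice.
    padded = filename.ljust(depth, "_")
    levels = [padded[:i] for i in range(1, depth + 1)]
    return os.path.join(root, *levels, filename)


print(path_for("abacus"))                                     # /data/a/ab/abacus
print(path_for("b"))                                          # /data/b/b_/b
print(path_for("xterm", root="/usr/share/terminfo", depth=1))  # /usr/share/terminfo/x/xterm
```

Splitting on name prefixes is easy to reason about, but it can be uneven when many names share a prefix. Hashing the filename and using the first few hex digits of the digest as directory names spreads files more evenly, at the cost of paths that are no longer human-guessable.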

Anonymous

Jun 1, 2001, 11:08:57 PM

el...@no.spam wrote:

> It's a linear lookup, i.e. it gets very slow.

One thing I noticed is that even for as many
as 60,000 directory entries, the performance
isn't all that bad.

I wonder why?

Linus Torvalds

Jun 4, 2001, 1:28:58 AM

In article <3B1858C9...@optonline.net>,

Anonymous <anon...@anonymous.anonymous> wrote:
>el...@no.spam wrote:
>
>> It's a linear lookup, i.e. it gets very slow.
>
>One thing I noticed is that even for as many
>as 60,000 directory entries, the performance
>isn't all that bad.
>
>I wonder why?

Depending on your access patterns, the directory cache will kick in, and
do most of the real work.

And the dcache uses a pretty efficient hashing mechanism, regardless of
what the underlying filesystem is doing.

But you should realize that the dcache is nothing but a cache, and while
very good for most normal loads you can still get into nasty performance
behaviour by having the "wrong" access patterns.

Linus
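To see why a linear on-disk lookup can still feel fast, and what a "wrong" access pattern looks like, here is a toy Python model: a linearly scanned directory with a small hashed cache in front of it. This is an illustration under stated assumptions, not how the Linux dcache is implemented. The class names, the fixed `capacity`, and the LRU eviction policy are all hypothetical, and the kernel's dcache is sized and reclaimed very differently. It only shows the general effect described above: a working set that fits in the cache rarely reaches the linear scan, while cycling through more names than the cache holds misses every time.

```python
from collections import OrderedDict


class LinearDirectory:
    """Toy stand-in for an unindexed directory: each lookup scans the entries in order."""

    def __init__(self, names):
        self.entries = list(names)
        self.comparisons = 0  # total entries examined across all lookups

    def lookup(self, name):
        for i, entry in enumerate(self.entries):
            self.comparisons += 1
            if entry == name:
                return i
        return None


class CachedLookup:
    """Toy hashed cache in front of a LinearDirectory (assumed LRU, fixed capacity)."""

    def __init__(self, directory, capacity):
        self.directory = directory
        self.capacity = capacity
        self.cache = OrderedDict()  # name -> result, oldest use first
        self.hits = 0
        self.misses = 0

    def lookup(self, name):
        if name in self.cache:
            self.hits += 1
            self.cache.move_to_end(name)  # mark as most recently used
            return self.cache[name]
        self.misses += 1
        result = self.directory.lookup(name)  # slow path: linear scan
        self.cache[name] = result
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result


names = [f"file{i:05d}" for i in range(60_000)]

# Friendly pattern: a small working set looked up over and over.
good = CachedLookup(LinearDirectory(names), capacity=1_000)
for _ in range(100):
    for name in names[:500]:
        good.lookup(name)

# "Wrong" pattern: cycling through more names than the cache can hold.
bad = CachedLookup(LinearDirectory(names), capacity=1_000)
for _ in range(3):
    for name in names[:2_000]:
        bad.lookup(name)

print(good.hits, good.misses, good.directory.comparisons)  # 49500 500 125250
print(bad.hits, bad.misses, bad.directory.comparisons)     # 0 6000 6003000
```

The second pattern is the classic LRU worst case: each new lookup evicts exactly the entry that will be needed next, so every lookup falls through to the linear scan.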

el...@no.spam

Jun 4, 2001, 9:58:14 PM

In article <9ff6aq$2l3$1...@cesium.transmeta.com>,
Linus Torvalds <torv...@cesium.transmeta.com> wrote:

>Depending on your access patterns, the directory cache will kick in, and
>do most of the real work.
>
>And the dcache uses a pretty efficient hashing mechanism, regardless of
>what the underlying filesystem is doing.
>
>But you should realize that the dcache is nothing but a cache, and while
>very good for most normal loads you can still get into nasty performance
>behaviour by having the "wrong" access patterns.

IIRC INN was great at having the "wrong" patterns. But storing news as
one file per article is probably one of the worst things you can do.

cLIeNUX user

Jun 5, 2001, 12:03:23 AM

humb...@smart.net

How was Japan?

Rick Hohensee
301-595-4063