I am definitely planning to split the images into directories by size,
which will at least divide the total by the number of distinct sizes
(though on the higher end that could still leave 150,000 - 175,000
images per directory, which is still a pretty big number). I don't know
whether this will be a problem, or whether there is really anything to
worry about at all, but it is better to ask those who have been there
and done that, or who are at least a bit more familiar with pushing
the limits of Unix resources, than to wonder whether it will work.
My experience suggests that either ls has a lousy sort routine or
it takes a long time to gather the metadata.
When I've had to deal with a huge number of files in a directory,
I could get the list very quickly in Python using os.listdir
even though ls was slow. If you're in that situation again, see
whether the '-f' (unsorted) flag makes a difference, or use '-1'
to see whether it's all the stat calls.
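To separate the two suspects (sorting vs. per-file stat calls), here is a minimal timing sketch. The path and the helper names are just placeholders for illustration; the point is that os.listdir reads the directory in one pass, while anything -F-like adds one stat() per entry:

```python
import os
import time


def list_fast(path="."):
    """List entries without stat()ing each file, roughly what
    'ls -1f' does (no sort, no per-file metadata)."""
    t0 = time.time()
    names = os.listdir(path)  # one pass over the directory
    return names, time.time() - t0


def list_with_stat(path="."):
    """Same listing, but lstat() every entry, which is roughly
    what colorized ls or 'ls -F' has to do."""
    t0 = time.time()
    names = os.listdir(path)
    for name in names:
        os.lstat(os.path.join(path, name))  # one syscall per entry
    return names, time.time() - t0


names, fast = list_fast(".")
_, slow = list_with_stat(".")
print(len(names), "entries:", fast, "s without stat,", slow, "s with stat")
```

On a directory with hundreds of thousands of entries, the gap between the two timings tells you how much of ls's slowness is the stat calls rather than the sort.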
It all depends on the file system you are using, and somewhat on the
operations you typically perform. I assume this is ufs/ffs, where a
directory is a linear list of all its files.
This raises a performance concern for access: to look up an individual
file, you may need to scan the entire directory. The size of a
directory entry depends on the length of the name. Assuming file
names of 10 characters, each entry is about 20 bytes, so a directory
with 500,000 image file names requires about 10 MB on disk. Each
directory lookup could therefore require reading 10 MB from disk,
which might be noticeable. For 6,000 entries, the directory is only
120 kB, which might not be noticeable.
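The arithmetic above can be checked in a couple of lines; the 20-bytes-per-entry figure is the text's assumption for ~10-character names (a real ufs entry also carries a fixed header and padding, so treat this as an order-of-magnitude estimate):

```python
def directory_bytes(num_entries, entry_bytes=20):
    """Rough on-disk size of a linear ufs/ffs directory,
    assuming a fixed per-entry cost (an approximation)."""
    return num_entries * entry_bytes


print(directory_bytes(500_000))  # -> 10000000, i.e. ~10 MB
print(directory_bytes(6_000))    # -> 120000, i.e. ~120 kB
```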
In 4.4+, there is a kernel compile-time option, UFS_DIRHASH,
which builds an in-memory hash table for directories,
speeding up lookups significantly. This requires, of course, enough
main memory to actually hold the hash table.
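For reference, enabling it is a single line in the kernel configuration file (the option name is as given above; the comment is mine):

```
options 	UFS_DIRHASH	# in-memory hash tables for large directories
```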
FreeDB (CD database) stores one file per CD in one directory per
category. The "misc" category/directory on my FreeBSD 5.3 system
currently contains around 481,571 small files. The "rock"
directory/category contains 449,208 files.
As some have said, ls is *very* slow on these directories, but otherwise
there don't seem to be any problems.
> FreeDB (CD database) stores one file per CD in one directory per
> category. The "misc" category/directory on my FreeBSD 5.3 system
> currently contains around 481,571 small files. The "rock"
> directory/category contains 449,208 files.
> As some have said, ls is *very* slow on these directories, but otherwise
> there don't seem to be any problems.
I assume you're all using Linux. The GNU version of ls does two things
that slow it down. The System V and BSD versions were pretty much
identical, in that they processed the argv array in whatever order the
shell passed it in. The GNU version reorders the argv array and stuffs
all the arguments into a queue. No big deal if you're just running ls
on one directory, but for ls <multiple directory names> it can slow
things down for a large argv and/or a recursive/deep ls.
The other thing it does differently from SysV/BSD ls is that it allows
default options in an environment variable. If those settings specify
always using color, directory processing slows _way_ down, exactly as
with the -F option. That's because the color and -F options _require_
a stat() on each and every file in the directory.
Plain ls with no options (or an old SysV/BSD ls, which shipped without
such defaults) runs nearly as fast as os.listdir() in Python, because
it doesn't need a stat().
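If you do want -F-style markers without paying a full stat() per file, os.scandir can often read the file type straight from the directory entry. This is a partial sketch, not a full ls -F (it skips the executable '*' marker, and on filesystems that don't report the type in the directory entry the is_dir/is_symlink calls fall back to a stat internally):

```python
import os


def classify_entries(path="."):
    """Tag names roughly the way 'ls -F' does: '/' for directories,
    '@' for symlinks, nothing for everything else. os.scandir can
    usually answer these from the dirent itself, avoiding a
    separate stat() per file."""
    tagged = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                tagged.append(entry.name + "/")
            elif entry.is_symlink():
                tagged.append(entry.name + "@")
            else:
                tagged.append(entry.name)
    return tagged


print(classify_entries("."))
```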
The only thing faster, from a shell user's viewpoint, is 'echo *'. That
may not be much help;-)
Ivan Van Laningham
God N Locomotive Works
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
I would expect the various GUI file managers to give unpredictable
results; I would also not rely on remotely mounting the bigdir