On 10/18/2021 9:25 AM, Dimiter_Popoff wrote:
> The devil is not that black (literally translating a Bulgarian
> saying), as you can see. Worst-fit allocation is of course crucial
> to getting such figures; the mainstream OSes don't do it, so things
> there must be much worse.
I think a lot depends on the amount of "churn" the filesystem
experiences in normal operation. E.g., the "system" disk on
the workstation I'm using today has about 800G in use of 1T total.
But, the vast majority of it is immutable -- binaries, libraries,
etc. So, there's very low fragmentation (because I "build" the
disk in one shot, instead of incrementally revising and
"updating" its contents).
By contrast, the other disks in the machine all see a fair bit of
turnover as things get created, revised and deleted.
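(For anyone following along, the worst-fit policy under discussion is
easy to sketch: always carve a request out of the *largest* free
extent, so the remaining holes stay big and fragmentation is slow to
accumulate. The free-list representation below is hypothetical, not
dps's actual structures.)

```python
# Sketch of worst-fit allocation: carve the request out of the
# LARGEST free extent so remaining holes stay large and contiguous.
# (Hypothetical free-list of (start, length) pairs; not dps's real
# on-disk structures.)

def worst_fit_alloc(free_extents, size):
    """free_extents: list of (start, length); returns start or None."""
    # Pick the largest extent; if even it can't satisfy the request,
    # nothing can.
    best = max(free_extents, key=lambda e: e[1], default=None)
    if best is None or best[1] < size:
        return None
    free_extents.remove(best)
    start, length = best
    if length > size:
        # Return the unused tail of the extent to the free list.
        free_extents.append((start + size, length - size))
    return start

free = [(0, 10), (100, 50), (200, 20)]
addr = worst_fit_alloc(free, 30)   # carves from the 50-block extent
```

A best-fit or first-fit allocator would have nibbled the small holes
first; worst-fit deliberately spends the big ones so free space stays
usable longer.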
>>> Even on popular filesystems which have long forgotten how to do worst
>>> fit allocation and have to defragment their disks not so infrequently.
>>> But I think they have to access at least 3 locations to get to a file;
>>> the directory entry, some kind of FAT-like thing, then the file.
>>> Unlike dps, where 2 accesses are enough. And of course dps does
>>> worst-fit allocation, so defragmenting is just unnecessary.
>>
>> I think directories are cached. And, possibly entire drive structures
>> (depending on how much physical RAM you have available).
>
> Well of course they must be caching them, especially since there are
> gigabytes of RAM available.
Yes, but *which*? And how many (how "much")? I would assume memory
set aside for caching files and file system structures is dynamically
managed -- if a process looks at lots of directories but few files
vs. a process that looks at few directories but many files...
> I know what dps does: it caches longnamed directories, which
> coexist with the old 8.4 ones in the same filesystem and work
> faster than the 8.4 ones, which typically don't get cached (these
> were done to work well even on floppies: a directory entry update
> writes back only the sector(s) it occupies, crossing sector
> boundaries if needed, etc.). Then in dps the CAT (cluster
> allocation tables) are cached all the time (do that for a 500G
> partition without the cache and enjoy reading all 4 megabytes
> each time the CAT is needed to allocate new space... it can be
> done; in fact the caches are enabled upon boot explicitly on a
> per LUN/partition basis).
>
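(Just to sanity-check the 4 megabyte figure: it drops out naturally
*if* the CAT is a simple allocation bitmap, one bit per cluster, with
16 KB clusters -- both of which are my assumptions, not a statement
about dps's real layout.)

```python
# Back-of-envelope: why a 500G partition's cluster allocation table
# comes out near 4 MB *if* it is one bit per cluster with 16 KB
# clusters.  (Both figures are assumptions; dps's real layout may
# differ.)

partition_bytes = 500 * 10**9
cluster_bytes   = 16 * 1024              # assumed cluster size
clusters        = partition_bytes // cluster_bytes
cat_bytes       = clusters // 8          # one bit per cluster
print(cat_bytes / 2**20)                 # roughly 3.6 MiB
```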
Yes, support for "bad block management" can be done outside the
drive. In the sanitizer's case, it has to report on whether or not
it was able to successfully "scrub" every PHYSICAL sector that
might contain user data (for some "fussy" users).
So, if it appears that a sector may have been remapped (visible
as a drop in instantaneous access rate), I query the drive's
bad sector statistics to see if I should just abort the process
now and mark the drive to be (physically) shredded -- *if*
it belongs to a "fussy" user. Regardless (for fussy users),
I will query those stats at the end of the operation to see
if they have changed during the process.
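The timing heuristic itself is nothing exotic -- flag any read whose
latency jumps far above a running average, since a remapped sector
forces a seek out to the drive's spare pool. A sketch (the threshold
and smoothing constants here are arbitrary placeholders, and real code
would follow up by reading the drive's reallocation stats through the
host's pass-through interface):

```python
# Sketch of the remap heuristic: flag reads whose latency jumps far
# above a running (exponentially smoothed) average.  A remapped
# sector forces a seek to the drive's spare pool, so it stands out.
# (Threshold/smoothing constants are placeholders; measured
# latencies are supplied by the caller.)

def find_suspects(latencies, factor=10.0):
    """latencies: per-LBA read times; returns LBAs that look remapped."""
    suspects, avg = [], None
    for lba, dt in enumerate(latencies):
        if avg is not None and dt > factor * avg:
            suspects.append(lba)            # possible remapped sector
        # Exponential moving average of "normal" latency.
        avg = dt if avg is None else 0.9 * avg + 0.1 * dt
    return suspects

# e.g. one slow read among uniform fast ones:
find_suspects([1.0] * 5 + [50.0] + [1.0] * 4)   # -> [5]
```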
But, again, that's a very specific application with different
prospects for optimization. E.g., there's no file system
support required as the disk is just a bunch of data blocks
(sectors) having no particular structure nor meaning. (So,
no need for filesystem code at all! One can scrub a SPARC
disk just as easily as a Mac!)
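In outline the scrub really is that dumb -- overwrite every sector,
verify the pattern came back, and record anything that didn't (a
hypothetical in-memory "disk" stands in for the raw device here, and a
real pass would use multiple patterns, not just zeros):

```python
# Sketch of a filesystem-agnostic scrub: the disk is just N sectors,
# so overwrite each one and verify the pattern reads back.  No
# structure is assumed, which is why a SPARC disk and a Mac disk
# are handled identically.  (In-memory "disk" stands in for the
# raw device; a real scrub would use several overwrite patterns.)

SECTOR = 512
PATTERN = b"\x00" * SECTOR

def scrub(disk):
    """disk: mutable list of sector-sized byte strings."""
    failed = []
    for lba in range(len(disk)):
        disk[lba] = PATTERN              # write pass
        if disk[lba] != PATTERN:         # read-back verify
            failed.append(lba)           # candidate for the shredder
    return failed
```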
>> I'll mock up some synthetic loads and try various thread-spawning
>> strategies to see the sorts of performance I *might* be able
>> to get -- with different "preexisting" media (to minimize my
>> impact on that).
>>
>> I'm sure I can round up a dozen or more platforms to try -- just
>> from stuff I have lying around here! :>
>
> I think this will give you plenty of an idea how to go about it.
> Once you know the limit you can run at some reasonable figure
> below it and be happy. Getting more precise figures about all
> that is neither easy nor will it buy you anything.
I suspect "1" is going to end up as the "best compromise". So,
I'm treating this as an exercise in *validating* that assumption.
I'll see if I can salvage some of the performance monitoring code
from the sanitizer to give me details from which I might be able
to ferret out "opportunities". If I start by restricting my
observations to non-destructive synthetic loads, then I can
pull a drive and see how it fares in a different host while
running the same code, etc.
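The harness needn't be fancy, either: run the same synthetic load at
several thread counts and compare wall-clock time. A sketch, with
time.sleep() standing in for a real block write (like real I/O, it
releases the interpreter lock, so the threads genuinely overlap):

```python
import threading
import time

# Sketch of the "how many concurrent worker threads?" experiment:
# run a fixed amount of synthetic work at several thread counts and
# compare wall-clock times.  time.sleep() is a stand-in for a real
# block write.

def synthetic_io(blocks):
    for _ in range(blocks):
        time.sleep(0.001)                # pretend block write

def run(nthreads, total_blocks=40):
    threads = [threading.Thread(target=synthetic_io,
                                args=(total_blocks // nthreads,))
               for _ in range(nthreads)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0

for n in (1, 2, 4):
    print(n, run(n))
```

With a real drive as the bottleneck the curve should flatten (or
invert) quickly -- which is exactly the "1 is the best compromise"
hypothesis this would validate or refute.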