Micky wrote:
>
> I'm in over my head here but doesn't more efficiency in memory
> management equate to better use of the swapfile? For example, iiuc
> now most programs read ahead, so that while you're reading page 21 to
> 25 of the file, the program foresees that you will soon need page 26
> to 30, or maybe even 31 to 35, and in background it gets it from the
> swapfile to RAM so that it's ready when you get there, and even if you
> page ahead.
Then you haven't seen just how ridiculous this is getting :-)
On the Win10 machine, I have a copy of Macrium. It
has a conversion routine, to convert a .mrimg backup
file to a .vhd file.
I open Task Manager and watch.
The operation starts with a lot of prefetch. Around
5GB of memory is actually booked by the operation.
The destination drive is slower than the source drive.
The output file will be about the same size (the .vhd
is about the same size as the .mrimg, if the .mrimg
wasn't compressed).
The booked memory continues to increase.
Soon, 10GB of memory is used, some for read prefetch,
some for write cache.
Eventually, the program is done. I click quit.
I look over at the hard drive status LED. It's
still lit and going full tilt. The Task Manager
memory thing indicates that 5GB of memory is
"draining" and that is what keeps the hard drive
light running. When this caching mechanism gets near
the end, it stops running the disk full tilt. It
"burps" out smaller write operations in pulses.
The write activity at the end, is a declining
write curve.
Eventually, the disk drive settles down, and it
looks like the caching mechanism is now drained.
So if a person was measuring the "time to complete"
the operation, it would be from clicking the button
to start the operation in Macrium, until the last
"write burp" to the drive.
Well, how much does that gain us ?
The operation cannot go faster than the destination
drive is willing to go (in this case). At some point,
either the source disk or the destination disk is
an issue.
*******
When the first desktop computers existed, there
wasn't any overlapping I/O. Certainly, on a dual
floppy drive machine, you could blame having only
one floppy controller and two drives on the cable for
it. But the software was also blocking the operations,
and only allowing one outstanding operation at a time.
+----------+           +----------+
| Read #1  |           | Read #2  |
+----------+-----------+----------+-----------+
           | Write #1  |          | Write #2  |
           +-----------+          +-----------+
Later, OSes like Windows acquired non-blocking
operations, intended to support overlapped I/O.
It was up to the application to make the right calls,
so many programs continued to do it the old way.
The first program I saw here, to do overlapped
I/O, was Robocopy.
x <------ Program running -------> X
+----------+-----------+
| Read #1  | Read #2   |
+----------+-----------+-----------+
           | Write #1  | Write #2  |
           +-----------+-----------+
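The overlapped pattern in that diagram can be sketched
with one reader thread and one writer thread handing
buffers through a queue (a portable Python sketch of the
idea, not the actual Win32 OVERLAPPED calls Robocopy
would use):

```python
import queue
import threading

CHUNK = 1 << 20  # copy in 1 MiB buffers

def overlapped_copy(src_path, dst_path):
    # The reader fills the queue while the writer drains it,
    # so a read and a write can be in flight at the same time
    # instead of strictly alternating.
    q = queue.Queue(maxsize=4)

    def reader():
        with open(src_path, "rb") as src:
            while True:
                buf = src.read(CHUNK)
                q.put(buf)      # an empty bytes object signals EOF
                if not buf:
                    break

    t = threading.Thread(target=reader)
    t.start()
    with open(dst_path, "wb") as dst:
        while True:
            buf = q.get()
            if not buf:
                break
            dst.write(buf)
    t.join()
```

While the writer is busy flushing chunk N, the reader is
already pulling chunk N+1 off the source drive, which is
the whole trick.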
With the large prefetch and large write buffer case
I've seen just recently, I'm not going to try to do
an ASCII Art diagram of that, but in essence, the
difference is like this. The program running portion
can appear shorter, but some hardware is still huffing
and puffing after the fact.
x <- Program running -> X
+----------+-----------+
| Read #1  | Read #2   |
+----------+-----------+-----------+
           | Write #1  | Write #2  |
           +-----------+-----------+
                        X <- Cache -> X
                             Drains
I'm having trouble seeing whether this new behavior
is a big win or not. This could be due to the
application using MapViewOfFile(), but I can't really
be sure of that.
So yes, there are instances of prefetch going on.
Even Explorer in Win10 attempts prefetch, as it
affects the appearance of the progress graph during
a file copy.
There are, in fact, a couple of RAM buffering options.
If you read a file, the contents are left in memory.
md5sum file.txt
md5sum file.txt
On the first run, the command gobbles data at 100MB/sec.
It is limited by the disk drive.
On the second run, it gobbles data at 300MB/sec. Why ?
The system file cache (which can use all unallocated
memory), holds a copy of the file. As long as the
file system is convinced the cached copy is the latest,
and nothing has purged the system file cache, you see
a performance speedup. The king at this was Win2K,
where the system file cache was every bit as good as
the competing ones (SunOS or Solaris may have had this
well before any desktop OS; MacOSX has a good system
file cache too). Modern Windows versions find more
excuses not to use it. It's still there, though. For
example, the defragmenter will not refer to any files
contained in the system file cache; it does
read_uncached() instead, for "safety".
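The effect is easy to reproduce. A hedged Python sketch
of the md5sum experiment above (the actual speedup
depends on the machine, so the sketch just hashes the
same file twice and times each pass):

```python
import hashlib
import time

def timed_md5(path):
    # Hash a file and report how long the read took. On the
    # second call, the data is usually served out of the OS
    # file cache rather than off the disk, so it runs faster.
    start = time.perf_counter()
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest(), time.perf_counter() - start
```

Call it twice on the same large file: the digests match,
and the second elapsed time is typically a fraction of
the first, exactly like the md5sum runs above.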
This is separate from the MapViewOfFile or similar concept.
The memory in that case is "charged to the system"
and you can see the activity in Task Manager. The
system file cache, by contrast, has no visual
representation at all. So some MapViewOfFile activity,
as it acquires RAM and is charged for it, could be
purging a portion of the system file cache.
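For what it's worth, the MapViewOfFile idea can be
sketched portably with Python's mmap module, which wraps
MapViewOfFile on Windows and mmap() on Unix; the mapped
pages are charged to the process, which is why they show
up in Task Manager:

```python
import mmap

def sum_bytes_mapped(path):
    # Map the file into the process address space and touch
    # every byte. The pages pulled in by the mapping count
    # against this process, unlike system file cache pages.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return sum(m[:])  # m[:] is the file contents as bytes
```

The application never issues an explicit read(); touching
the mapped region faults the data in, page by page.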
In short, there is lots going on behind the scenes.
More than I can keep track of. And some of it
is downright silly. It distorts progress bars (when
a file copy pre-fetches part of the copy from the
system file cache) and also creates dangerous
situations (from the user's perspective), as when 5GB
of write cache memory drains to disk and takes a
whole minute to do it.
Any buffering on writes, should be short enough
so the bad battery on my UPS isn't an issue on
a power failure (power drops, before the 5GB of writes
are done).
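If "done" is supposed to mean the data is actually on
the disk, the program has to flush explicitly rather
than trust the write cache to drain in time. A minimal
sketch, assuming the drive honors the flush request:

```python
import os

def write_durably(path, data):
    # write() only hands the bytes to the OS write cache.
    # flush() empties Python's own buffer, and fsync() asks
    # the OS to push the cached pages down to the device
    # before we claim the operation is complete.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

That shrinks the window where a power drop (or a tired
UPS battery) can eat writes the program already reported
as finished, though a drive's own onboard cache can
still lie about it.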
Most of the time, on a modern OS, when I look
at the file transfer graph, my mouth is open
and I have that "WTF" look on my face. Because
the numbers in the graph are nonsense, and the
usage of RAM for stuff is a root cause. But many
times, things can't go any faster than the slowest
hard drive, so it's all a merry joke.
And if you ever see a drive deliver only half
of what you were expecting, check the "alignment".
I had a 4K sector hard drive, where I had to
realign it, to get the damn thing to run at
the proper speed.
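The alignment problem is plain arithmetic: on a 4K-sector
("Advanced Format") drive, a partition that does not
start on a 4096-byte boundary makes every 4K cluster
straddle two physical sectors, roughly doubling the work.
A small sketch of the check (the 63-sector start is the
classic XP-era offender):

```python
SECTOR = 4096  # physical sector size of an Advanced Format drive

def is_aligned(start_lba_512):
    # Partition tables report the start as a count of 512-byte
    # LBAs. The partition is aligned when that offset, converted
    # to bytes, lands on a 4096-byte physical sector boundary.
    return (start_lba_512 * 512) % SECTOR == 0

print(is_aligned(63))    # classic XP-era start: misaligned
print(is_aligned(2048))  # modern 1 MiB start: aligned
```

Realigning the partition to a 1 MiB boundary (LBA 2048),
as modern partitioners do by default, is what restores
the proper speed.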
Paul