Does anybody know why memory access to memory allocated with
"CreateFileMapping(...)" and "MapViewOfFile(...)" is so much slower than
access to memory allocated by "new"? Does some flag have to be set?
I need the memory access for the function "StretchDIBits(...)".
I'll be happy about any answer!
Marco Kalweit
If hFile is (HANDLE)0xFFFFFFFF, the <...snip...> function creates a
file-mapping object of the specified size backed by the operating-system
paging file rather than by a named file in the file system.
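For example, something like this (only a sketch; the 16 MB size is an
arbitrary placeholder):

HANDLE hMap = CreateFileMapping((HANDLE)0xFFFFFFFF,   // backed by the paging file
                                NULL, PAGE_READWRITE,
                                0, 16 * 1024 * 1024,  // high/low parts of the size
                                NULL);                // unnamed mapping
void *p = MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, 0);  // map the whole object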
Thanks,
PeterM
--
Bogus ZZZ added to address. Remove for reply.
Marco Kalweit <marco....@fh-zwickau.de> wrote in message
news:7j05l5$k8$1...@linuxnt.hrz.fh-zwickau.de...
I hate to sound authoritative like this, but you're wrong, Peter. Regardless
of what value you pass down as the handle, the memory will always be backed
by a file. In the case of (HANDLE)-1 it is just the paging file, but it is
*still* a file. Similarly, there is no way (for a ring-3 application) to
ensure the memory it uses is actually in RAM. It may be in RAM *now*, then it
may be paged out without you noticing it - and vice versa!

What makes the paging-file-backed sections special is that when you reference
a new page of an ordinary section (mapped to some file - not the paging
file), the page must first be read from the disk into RAM, and this is slow.
When you reference a *new* page of a paging-file-backed section, the system
does not have to read anything; it just uses one of the pre-allocated zeroed
pages (NT keeps a number of them, depending on actual memory usage), and this
is fast. But once you have written something to the page, the two cases are
no longer different - both have to be read from disk if paged out. This is
why using paging-file-backed sections is normally faster - but when the
overall memory usage is high, it makes little difference.

Finally, the difference is minimal if the section is used heavily - I mean if
you read/write, then read/write again, and then one more time and so on; the
chances are the system will keep the pages in RAM, and things will be fast in
either case. Conversely, if you open/create the section only to read and
write it once, the difference in performance will be huge - but it's a
questionable practice to create sections with such a short lifetime, and if
you do, you're just trading performance for something else.
> first parameter. It explains it in the help file, but it's poorly written.
> Here's a slightly edited version...
>
> If hFile is (HANDLE)0xFFFFFFFF, the <...snip...> function creates a
> file-mapping object of the specified size backed by the operating-system
> paging file rather than by a named file in the file system.
Absolutely. See that it *never* says anything about RAM?
> Marco Kalweit <marco....@fh-zwickau.de> wrote in message
> news:7j05l5$k8$1...@linuxnt.hrz.fh-zwickau.de...
> > Hello,
> >
> > Does anybody know why memory access to memory allocated with
> > "CreateFileMapping(...)" and "MapViewOfFile(...)" is so much slower than
> > access to memory allocated by "new"? Does some flag have to be set?
Because the memory is in a file, as Peter correctly indicated. If you need
to use memory-mapped files (sections) and still need fast access (these may
be somewhat contradictory depending on your use of them), you'll have to
minimize the number of creations/destructions of the sections. Try to reuse
the sections instead of creating new ones. Or, if you're opening the
sections to files that store images that you want to bitblt on the screen
*fast*, you might try to read the contents of the section before calling
bitblt(), like this:
HANDLE section = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, NULL); // 'file' is the handle from CreateFile()
void *buffer = MapViewOfFile(section, FILE_MAP_READ, 0, 0, 0);
DWORD file_size = SIZE_OF_THE_FILE_YOU_GOT_EARLIER;
volatile DWORD bit_bucket;
// touch one DWORD in every 4K page so the whole view gets paged in
for( DWORD i = 0; i < file_size; i += 0x1000 )
{
    bit_bucket = *(volatile DWORD*)((BYTE*)buffer + i);
}
The code above touches every page in the section, ensuring that bitblt()
will be reading RAM, not disk.
--
Slava
Please send any replies to this newsgroup.
I'm sure that the pages which I need are swapped in.
The problem is: the image I want to blit is very large (50 - 150 MB). I
cannot load the whole image into memory at one time; this is the reason I
used file-mapped memory. Normally I blit only a piece of the image to the
window.
On the first blit, the page where the image piece is located must be swapped
in. But on the next blit the page is already in memory, and I think it should
be very fast to blit it. But the blit is very slow (although the page is in
RAM).
If I load the piece myself into a memory block (created by *new*) and blit
that to the window, it is very fast.
I don't understand it: both memory regions are RAM, but there is a speed
difference.
I hope you understand me and have an answer to this problem.
Marco
Wow. Thanks for adding to my understanding of memory mapped files. I'm
going to have to re-read your post a couple of times to make sure I fully
understand what's going on. I have been using page file backed memory
mapped files for over a year in a project of mine and now I want to explore
if I am encountering any hidden performance issues. One question - is there
a way to force (or trick or beg or plead...) the OS into keeping shared
memory in memory? The machine I'm running on has 256 MB of RAM.
> I'm sure that the pages which I need are swapped in.
> The problem is: the image I want to blit is very large (50 - 150 MB). I
> cannot load the whole image into memory at one time; this is the reason I
> used file-mapped memory. Normally I blit only a piece of the image to the
> window.
> On the first blit, the page where the image piece is located must be
> swapped in. But on the next blit the page is already in memory, and I think
> it should be very fast to blit it. But the blit is very slow (although the
> page is in RAM).
When you bitblt for the second time, do you process the same piece of the
image as the first time?
Could you post the piece of your code that does it? I mean the code that
handles the memory and bitblts.
> One question - is there a way to force (or trick or beg or plead...)
> the OS into keeping shared memory in memory?
> The machine I'm running on has 256 MB of RAM.
When you have plenty of memory like this, and don't share gigabytes of it,
you normally don't have to do anything to keep the pages in RAM. Just use the
memory. If you don't use it, the system will notice that some pages merely
occupy physical memory and will page them out.

If you want your process to lie dormant and still have the memory in RAM, you
might try to increase the minimum working set of the process. Then, after
setting the minimum working set, you may use VirtualLock() to lock the pages
(just setting the minimum working set does not guarantee the memory will
always be present). Beware, though, that SetProcessWorkingSetSize() requires
rights that are normally given only to administrators, and setting the
working set size unreasonably high and *then* locking pages may bring the
system to a crawl.

Generally, when the system has enough memory, it keeps most pages in RAM and
does not try to trim the working sets to the bare minimum (in most cases it
does not even try to trim them down to the maximum!). But as the free RAM
goes down, the system gradually becomes less and less liberal.
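A minimal sketch of that approach; the working-set sizes below are just
placeholders, and error handling is omitted:

#include <windows.h>

// Raise the working-set limits, then pin the buffer's pages in RAM.
// As noted above, this may require rights normally held only by administrators.
BOOL PinBuffer(void *buffer, DWORD size)
{
    HANDLE process = GetCurrentProcess();

    // Make the working-set limits large enough to hold the locked pages.
    if( !SetProcessWorkingSetSize(process,
                                  4 * 1024 * 1024,     // minimum, placeholder
                                  16 * 1024 * 1024) )  // maximum, placeholder
        return FALSE;

    // Lock the buffer's pages into physical memory.
    return VirtualLock(buffer, size);
}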
Good to hear. My shared memory allocation is only about 2.3 MB, so it
doesn't sound like I'll be getting into any real trouble. Thanks again for
the info!
Thanks,
PeterM
--
Bogus ZZZ added to address. Remove for reply.
Slava M. Usov <stripit...@usa.net> wrote in message
news:#HiTIGTr#GA.70@cppssbbsa03...
Here is the relevant code that shows a piece of the image:
void CTarga::Create(const char* szFileName)
{
    ...
    m_hFile = ::CreateFile(m_szFileName, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL /*| FILE_FLAG_RANDOM_ACCESS */,
                           NULL);
    ...
    m_hFileMap = CreateFileMapping(m_hFile, NULL, PAGE_READONLY | SEC_RESERVE,
                                   0, 0, NULL);
    ...
    m_pTarga = MapViewOfFile(m_hFileMap, FILE_MAP_READ, 0, 0, 0);
    ...
}
void CTarga::Draw(HDC hDC, int nXDest, int nYDest, WORD wDestWidth,
                  WORD wDestHeight, WORD wXSrc, WORD wYSrc,
                  WORD wSrcWidth, WORD wSrcHeight)
{
    ...
    /* Compute the start address of the piece I want to show inside the
       data block of the whole image */

    // base address
    pBits = (LPBYTE)m_pTarga + sizeof(m_TargaHeader) +
            m_TargaHeader.bLengthImageIdentifikation +
            m_TargaHeader.bColorMapEntrySize / 8 * m_TargaHeader.wColorMapLength;

    // offset of the first line to show
    lLetzteZeile = m_TargaHeader.wHight - wYSrc - wSrcHeight;
    lLetzteZeile = (lLetzteZeile < 0) ? 0 : lLetzteZeile;
    pBits += lLetzteZeile * m_TargaHeader.wWidth * m_TargaHeader.bColordepth / 8;

    // bitmap info structure (adjust the height)
    m_pBitmapInfo->bmiHeader.biHeight = (wSrcHeight > m_TargaHeader.wHight)
                                        ? m_TargaHeader.wHight
                                        : wSrcHeight;

    /* this call takes about 400 ms - that is too long */
    nError = StretchDIBits(hDC, nXDest, nYDest, wDestWidth, wDestHeight,
                           wXSrc, 0, wSrcWidth, wSrcHeight,
                           pBits, m_pBitmapInfo, DIB_RGB_COLORS, SRCCOPY);
    ...
}
The function *Draw* blits into a memory DC.
If I have to show the same piece again, I only blit the memory DC into the
client DC (this takes only 17 ms). I know that the memory DC is compatible
with the client DC and therefore faster than a DIB blit, but if I load the
piece of the image into memory created by *new* and call StretchDIBits(...)
with that memory, it also runs very fast.
I have also tried to blit from the file-mapped memory directly to the client
DC; it is not faster than blitting into a memory DC.
I have also built a version which loads an image window into RAM that is
bigger than the image piece I want to show. If the user scrolls out of the
range of the image window that is kept in RAM, a new image window is loaded.
As long as the client window stays inside the in-memory image window, it
works very fast (no smooth scrolling, but really fast). The problem is that
if the memory window is too big for RAM, pieces of it are loaded from the
hard disk and then swapped out to the page file.
Using file-mapped memory should be faster for that, because the memory
already has a place in the page file and does not have to be swapped out
first. That should save time.
OK, I hope you understand my thoughts.
Thanks, Marco
And I come to the conclusion that there must be a difference between memory
allocated by *new* and memory allocated by *MapViewOfFile*, under the
assumption that the pages of the file-mapped memory are swapped in.
[code snipped - nothing suspicious detected]
> I have also built a version which loads an image window into RAM that is
> bigger than the image piece I want to show. If the user scrolls out of the
> range of the image window that is kept in RAM, a new image window is
> loaded. As long as the client window stays inside the in-memory image
> window, it works very fast (no smooth scrolling, but really fast). The
> problem is that if the memory window is too big for RAM, pieces of it are
> loaded from the hard disk and then swapped out to the page file.
>
> Using file-mapped memory should be faster for that, because the memory
> already has a place in the page file and does not have to be swapped out
> first. That should save time.
So you are saying that after you called StretchDIBits() to render one image
area, and then called it to render *another* image area, it was slow again?
But this is as it should be. When you touched the memory during the first
run, only the pages corresponding to the first area were paged in; when you
referenced the memory during the second run, the system had to page in
*different* pages. The only case in which the second run will be fast is when
you're rendering *the same* area of your image, because all the required
pages are already in RAM.

And, Marco, you're a bit confused: using file-mapped memory does not mean
that "the memory already has a place in the page file and does not have to be
swapped out first", no. When you create a file mapping and map a view of it,
this does not fetch all pages from disk to RAM. The pages are only fetched
when you actually access them - i.e., when you try to access page number n
and it's not in memory, a page fault occurs (much like the "Access Violation"
exception), and the system tries to bring the page into RAM by paging it in
from the corresponding file. When you have fetched too many pages from your
file, the system cannot fetch any more (it has run out of RAM), so it has to
discard one or more pages you fetched before, just like it does with pages
that you allocate with "new".
> And I come to the conclusion that there must be a difference between memory
> allocated by *new* and memory allocated by *MapViewOfFile*, under the
> assumption that the pages of the file-mapped memory are swapped in.
When you do it the "new" way, you have a buffer in memory (several megabytes,
right?) that you fill from the file at once, and then call StretchDIBits().
Reading one large block of data from the file is very efficient in terms of
time (it is only *one* request to the system). By contrast, when you use the
memory mapping, you don't read the file yourself; you just call
StretchDIBits() - and the system reads the file for you, this time in a much
less efficient manner, one page at a time.
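A minimal sketch of the "read to buffer, then blit" idea; hFile, dwOffset,
dwSize and pBitmapInfo are assumed to be computed the same way as in your
CTarga code:

// Read the pixel block for the visible piece in one request, then blit it.
BYTE *pBuffer = new BYTE[dwSize];

SetFilePointer(hFile, dwOffset, NULL, FILE_BEGIN);

DWORD dwRead = 0;
ReadFile(hFile, pBuffer, dwSize, &dwRead, NULL);   // one large, sequential read

StretchDIBits(hDC, nXDest, nYDest, wDestWidth, wDestHeight,
              0, 0, wSrcWidth, wSrcHeight,
              pBuffer, pBitmapInfo, DIB_RGB_COLORS, SRCCOPY);

delete [] pBuffer;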
If you're after a "smooth scroll" solution, I suggest you stop using
memory-mapped files and just use a <read to buffer> - <bitblt buffer>
approach. You may then make the buffer three times larger than the visible
image size, like this:
1/3 of the buffer: the part of the image that is just *before* the visible part
1/3 of the buffer: the visible part of the image
1/3 of the buffer: the part of the image that is just *after* the visible part
When the user scrolls the image, you just bitblt the prev/next parts of the
buffer, while reading in (in another thread or using async I/O) the next
parts of the image. Of course, this is only an idea, not a complete working
solution...
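Just to illustrate the bookkeeping (the band size and the reader thread are
assumptions, not working code):

#include <windows.h>

#define LINES_PER_BAND 256   // placeholder: image lines held in one band

// Three equally sized bands of pixel data; a reader thread refills bands.
struct ScrollBuffer
{
    BYTE *pBands[3];   // [0] = above the view, [1] = visible, [2] = below
    long  lTopLine;    // first image line held in pBands[0]
};

// When the user scrolls down past the visible band, rotate the pointers:
// the old "above" band becomes free and is refilled asynchronously with
// the band that now lies below the view.
void ScrollDown(ScrollBuffer *sb)
{
    BYTE *pFree = sb->pBands[0];
    sb->pBands[0] = sb->pBands[1];
    sb->pBands[1] = sb->pBands[2];
    sb->pBands[2] = pFree;            // hand this band to the reader thread
    sb->lTopLine += LINES_PER_BAND;
}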
>Thanks for your detailed answers.
>But I should have specified the problem better.
>It is clear that if the page is swapped out, it takes a lot of time to load
>the page from the hard disk. That is OK.
>
>I'm sure that the pages which I need are swapped in.
>The problem is: the image I want to blit is very large (50 - 150 MB). I
>cannot load the whole image into memory at one time; this is the reason I
>used file-mapped memory. Normally I blit only a piece of the image to the
>window.
>On the first blit, the page where the image piece is located must be swapped
>in. But on the next blit the page is already in memory, and I think it
>should be very fast to blit it. But the blit is very slow (although the page
>is in RAM).
It is true that the first read of a virtual page will cause the page to be
swapped in. But a virtual page that has been swapped in has a special status
after that first swap-in: pages in that state can be swapped out again
immediately. In your case, the blitting process is going through the virtual
pages sequentially. The OS probably swaps out the first pages of your image
just to swap in the later pages. If you were blitting from a small rectangle
of that image, repeatedly, it's possible that the virtual pages for that
rectangle would become established enough, after several accesses, that the
blit would be faster.
Are you using the file-map-oriented features of CreateDIBSection?
This web page has some interesting remarks, under "Data memory
paging":
http://msdn.microsoft.com/library/techart/msdn_gameperf.htm
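In case it helps, CreateDIBSection() can take a file-mapping handle directly,
so the DIB bits live inside the mapped section. A rough sketch, where
hSection, pBitmapInfo and dwOffset (which must be DWORD-aligned) are assumed
to exist already:

// Create a DIB whose pixel bits live inside an existing file mapping.
void *pvBits = NULL;
HBITMAP hDib = CreateDIBSection(hDC, pBitmapInfo, DIB_RGB_COLORS,
                                &pvBits, hSection, dwOffset);
// pvBits now points into the mapped view; select hDib into a memory DC
// and BitBlt()/StretchBlt() from there.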
--
Mike Enright
mi...@cetasoft.com
(Not speaking for Cetasoft)
Thanks for your good hints and help. I will optimize the "new" way.
Memory-mapped files are probably not well suited for blitting a big image.
The memory buffer will need approximately 20 - 30 MB; I think that will work.
Ciao, Marco
In a test I ran recently to try to find what was causing committed bytes to
rise, I did not find this to be the case. Physical memory usage started out
high, as one would expect for starting up a number of processes
(initialization code which can be discarded after it executes). Then, over
the course of 5 days, physical memory used decreased the whole time. On the
test machine (128 meg), at the end of the test, Task Manager showed that 80
megs or so of the RAM was free. This would seem to contradict your statement
that the system will keep pages in memory if there is space for it.
Somewhere in the depths of MSDN, there was a statement that NT tries to keep
half the RAM for file cache, yet Task Manager never showed it over 10 megs.
Any explanation? Or reference (I really could use it, as I'm finding it
necessary to know these things)?
Also, any known reason why paged pool allocs/memory would rise without
seeming to charge it to a specific process (that's what the above test
showed as the problem)?
Neil Gilmore
ra...@raito.com
... my statement. I'm not the one who coded it, anyway. What I said was my
belief, backed up by my personal experience. Incidentally, my experience
agrees with what the documentation says. However, I do understand that the
*actual* behavior may be more complex than that.

Keep in mind that the defaults for the minimum and maximum working set size
are 200K and 1380K (NT 4.0, SP 5). You say that about 48 megs were used...
let's see. NT 4.0 needs 12 megs just to run the kernel. That leaves 36 megs
for processes. As I'm writing this, my machine is running 29 processes, which
is, I guess, a bit higher than typical...
A typical scenario would be (mem usage from my machine though):
system 200K
session manager (SMSS.EXE) 0K (hmm, all paged out)
win32 subsystem (CSRSS.EXE) 1212K
winlogon 44K
standard services (SERVICES.EXE) 3240K
security subsystem (LSASS.EXE) 1036K
spooler (SPOOLSS.EXE) 168K
license manager (LLSSRV.EXE) 584K
RPC subsystem (RPCSS.EXE) 1328K
explorer 4132K
and probably a couple more task-specific processes. The "mandatory"
processes take only 11944K ~ 11.7M ~ 12M
Oh, yeah, file cache is 21 megs. As you say below, more than 10 megs were
never used, so we're subtracting only 10M.
Thus, we have 14 megs for your process. Isn't it enough for it? It's about
ten times higher than its max working set. This stands in firm agreement
with "when the system has enough memory, it keeps most of pages in RAM, and
does not try to trim the working sets to the bare minimum".
Well, this is all just artificial arithmetic that is most likely irrelevant
to your situation. Give us more facts - which processes are running, how much
memory is used by each, etc.
Also notice that some executables may be marked as "trim my working set
aggressively", so they're always kept at a bare minimum. If your processes
reference lots of pages *often* and the machine has lots of free RAM, and
the processes are not being trimmed aggressively, their working sets are in
RAM.
> Somewhere in the depths of MSDN, there was a statement that NT tries to
> keep half the RAM for file cache, yet Task Manager never showed it over 10
> megs.
Well, what the Task Manager shows me is 21 megs of file cache. Probably you
never referenced that many files for them to be kept in the cache.
> Any explanation? Or reference (I really could use it, as I'm finding it
> necessary to know these things)?
Sorry, this kind of information is not easy to find in one place. Pieces of
it are scattered all over MSDN, articles, etc. One good starting point is
"Inside Windows NT", either edition.
In MSDN, read the introductory articles on virtual memory management and on
SetProcessWorkingSetSize().
>
> Also, any known reason why paged pool allocs/memory would rise without
> seeming to charge it to a specific process (that's what the above test
> showed as the problem)?
This could be a driver issue. You stated that the RAM usage was continually
decreasing, yet the allocs count was climbing. This indicates that the
memory allocated was then returned to the pool, then allocated again. What's
wrong with it?
OK, I understand the arithmetic (and the useful example). I know that my
apps, once up and running, don't use a lot of their code, and that it can be
paged out. I guess the actual working set sizes are the part causing my
confusion. I can see where the system would keep the working set high if
there was RAM available.
>Also notice that some executables may be marked as "trim my working set
>aggressively", so they're always kept at a bare minimum. If your processes
>reference lots of pages *often* and the machine has lots of free RAM, and
>the processes are not being trimmed aggressively, their working sets are in
>RAM.
Unknown. It's set to whatever Delphi sets it to.
>> Somewhere in the depths of MSDN, there was a statement that NT tries to
>> keep half the RAM for file cache, yet Task Manager never showed it over 10
>> megs.
>
>Well, what the Task Manager shows me is 21 megs of file cache. Probably you
>never referenced that many files for them to be kept in the cache.
Well, on my own machine it shows 15. The test machine never went above 10.
>> Any explanation? Or reference (I really could use it, as I'm finding it
>> necessary to know these things)?
>
>Sorry, this kind of information is not easy to find in one place. Pieces of
>it are scattered all over MSDN, articles, etc. One good starting point is
>"Inside Windows NT", either edition.
So it's the same situation as most of the docs (I would insert a smiley, but
I don't often smile when dealing with MSDN).
>> Also, any known reason why paged pool allocs/memory would rise without
>> seeming to charge it to a specific process (that's what the above test
>> showed as the problem)?
>
>This could be a driver issue. You stated that the RAM usage was continually
>decreasing, yet the allocs count was climbing. This indicates that the
>memory allocated was then returned to the pool, then allocated again. What's
>wrong with it?
Well, pool memory usage is also rising, so I don't think it's being returned
to the pool, but we're already discussing that over in the kernel group. And
is the allocs counter per second, or what? If it's a running total, then I'd
expect it to rise. If it's not, then there's definitely a problem.
Neil Gilmore
ra...@raito.com
JD
Neil Gilmore wrote in message ...
[snip]
> FYI - there was a bug fixed in NT4 SP4 that caused non-paged pool leaks if
> you had file system filter drivers installed. Since most virus scanners are
> one of these, this may be your problem if you're pre-SP4.
>
Well, John, we are SP3, and the test machine is running Norton's anti-virus.
However, the problems I see are in paged pool, not the non-paged pool. I'd like
to move to a more recent SP, if I can coerce the third-party stuff into running
(which may be possible, but not immediately obvious).
Neil Gilmore
ra...@raito.com
...
> >Also notice that some executables may be marked as "trim my working set
> >aggressively", so they're always kept at a bare minimum. If your processes
> >reference lots of pages *often* and the machine has lots of free RAM, and
> >the processes are not being trimmed aggressively, their working sets are
> >in RAM.
>
>
> Unknown. It's set to whatever Delphi sets it to.
I don't know if the Delphi environment includes the standard Win32 tools,
but, anyway, you can get them for free from the MSDN site. You don't have to
download the entire MSDN; get only the tools. Its linker, LINK.EXE, can
decode binaries. Use
link -dump /headers <executable file>
For example, the output for the SMSS.EXE:
D:\WINNT\system32>link -dump /headers smss.exe
Microsoft (R) COFF Binary File Dumper Version 5.10.7303
Copyright (C) Microsoft Corp 1992-1997. All rights reserved.
Dump of file smss.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (i386)
5 number of sections
36D5E9BA time date stamp Fri Feb 26 03:24:26 1999
0 file pointer to symbol table
27B number of symbols
E0 size of optional header
312 characteristics
Executable
Aggressively trim working set
32 bit word machine
Debug information stripped
....
There is a lot of data following that, but we're only interested in the
above. As you can see, SMSS.EXE is linked with the "Aggressively trim working
set" option set. This option, by the way, is set on all of the "mandatory"
processes (SMSS, CSRSS, winlogon, etc.); the only exception is LLSSRV.EXE
(the license service).
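If I remember correctly, the same characteristic can be set on an existing
binary with EDITBIN from those tools - treat the exact switch name as an
assumption and check the tool's built-in help:

editbin /WS:AGGRESSIVE myapp.exe    (sets "Aggressively trim working set")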
Slava M. Usov wrote in message ...
>I don't know if the Delphi environment includes the standard Win32 tools,
>but, anyway, you can get them for free from the MSDN site. You don't have to
>download the entire MSDN; get only the tools. Its linker, LINK.EXE, can
>decode binaries. Use
Thanks for the info. We do have the tools, so I used them. It doesn't show
aggressive trimming for the apps. It shows (somebody out there may care what
Delphi puts in): Executable, Line numbers stripped, Symbols stripped, Bytes
reversed, 32 bit word machine. I'm pretty sure I just misunderstood what
you were saying the first time about the system keeping pages in RAM.
Neil Gilmore
ra...@raito.com
Doesn't Delphi have a dialog (perhaps somewhere deep) that lets you specify
these options? Aggressive memory trimming may be good (or may not be),
especially for services.
It may have them somewhere, but not under the project options, which seems
to be the most likely place for it, as that's where the compiler and linker
settings seem to be. I rather doubt that I'd really want to write a service
in Delphi. It seems to be pretty set on doing things its own way. I'm really
more comfortable with C++, which is what all the next stuff is being
developed in.
Neil Gilmore
ra...@raito.com