Agner Fog <ag...@dtu.dk> wrote:
> Thomas Koenig wrote:
>>>>What is your strategy for memory-mapped I/O?
>
>>> The user process is calling a system function. The system call
>>> instruction has a feature for sharing a block of memory. The system
>>> function does not have access to everything, as current systems
>>> have, but only to the memory block shared by the application. The
>>> system function may use DMA or whatever the system designer
>
>>So, each shared memory block would then have its own entry in the
>>table for each of the processes, correct?
>
> There is no "the table". Each thread has its own memory map.
> You only add one entry to the map of each thread.
>
>>Do I see it correctly that an application which, for whatever reason,
>>opened 1000 files via memory-mapped I/O at the same time would run
>>out of descriptors?
>
> The shared memory block belongs to the application process. The process needs no
> extra map entry. When you call the system function and enter system mode,
> you switch to a system map with one entry for the memory block shared by the process,
> and one entry for the DMA I/O area.
> You cannot call multiple I/O functions simultaneously unless they are in separate
> threads. Each thread has its own memory map.
>
>>Same thing could apply for shared memory between processes.
> This case is discussed in chapter 9.4 p. 123 in the ForwardCom manual.
> Several possible solutions are discussed there.
I've read this chapter, and I have to say I find it less than
convincing.
Let me comment on a few selected paragraphs.
# The memory space may become fragmented despite the use of these
# techniques. Problems that can result in memory fragmentation are
# listed below
# Recursive functions can use unlimited stack space. We may require
# that the programmer specifies a maximum recursion level.
Programmers are likely to reject that - this feels a bit like the
old days of mainframes.
# Allocation of variable-size arrays on the stack using the alloca
# function in C. We may require that the programmer specifies a
# maximum size.
Already discussed upthread - JCL REGION parameter, here we come.
# Script languages and byte code languages. It is difficult to predict
# the required size of stack and heap when running interpreted or
# emulated code. It is recommended to use a just-in-time compiler
# instead.
[...]
Reads like a recommendation not to use Emacs, Perl or awk, at least
not on anything serious.
# Unpredictable number of threads without protection. The required
# stack size for a thread may be computed in advance,
Are there papers or other resources which show how far this is,
in fact, possible?
# but in some
# cases it may be difficult to predict the number of threads that
# a program will generate.
Probably, yes.
[...]
# Shared memory for inter-process communication. This requires extra
# entries in the memory map as explained below.
And that is not a solved problem, I believe.
Skipping down a bit, the manual has
# If one program needs to communicate with a large number of other
# programs then we can use one of these solutions: (1) let the program
# that needs many connections own the shared memory and give each
# of its clients access to one part of it,
This implicitly assumes client-server, and would not, for example,
address a PGAS language. And PGAS is used for high performance
applications.
# (2) run multiple threads in (or multiple instances of) the program
# that needs many connections so that each thread has access to only
# one shared memory block,
That is quite wasteful - creating a thread just to avoid a memory
map entry trades it for additional task switching overhead.
# (3) let multiple communication channels use the same shared memory
# block or parts of it,
If you have (let's say) a web server application, you will want
protection against buffer overruns as much as possible.
# (4) communicate through function calls
Quite some overhead.
# (5) communicate through network sockets
Even more.
# (6) communicate through files.
... which should not be memory mapped, so again quite a high overhead.
Jumping up again...
# Memory mapping of files and devices. This practice should be
# avoided if possible.
I believe most databases use memory-mapped files, in order to
combine speed and persistence. This would pretty much exclude
that application from your architecture.
More generally, memory-mapped files are very fast in current
architectures. If you penalize software (and programmers) who
have used this for speed, you will lose a large part of them.
More generally, I am (as I wrote previously) concerned about
swapping out whole processes vs. only unused pages, and about
large processes. ps shows me, on my 16GB home system on which
I type this, /usr/lib/xorg/Xorg with a virtual memory size of
24.3 GB and a resident size of 94 MB. And chromium has several
threads with a virtual memory size of 1.1 TB. Neither of these is
a problem if only the active pages are in memory.
Plus, there is the question of what to do with a process whose
virtual memory size exceeds total memory.
You can argue that both programs are incredibly wasteful, but
they work under the current memory management, where yours
would not.
And swapping takes much more time than paging out (or never
paging in) irrelevant pages.
So, if your architecture without TLB is implemented, my current
conclusion is that it would not work well for
- databases
- web servers using lots of shared memory
- computers used interactively (laptops, desktops)
- high-performance computing using shared memory to communicate
(at least for some PGAS variants)
In this sense, I mostly share your conclusion that
[big snip]
> Right now, it looks like a ForwardCom without TLB will be optimal for many applications
> in dedicated systems, while a ForwardCom with TLB and page tables will be optimal for
> the most demanding server applications.
except for a subset of them (databases and web servers might
have to be redesigned at least, and might not perform as well
even if they are).
However, in this way, you would introduce a dependency on
microarchitecture: code which assumes pages and heavily uses
memory-mapped I/O and shared memory will perform badly, or not at
all, on systems lacking paging and a TLB. Would it be wise
to expose that microarchitectural detail to the ISA? Hmm...