Embedded linux: With or without MMU

MMJ

Feb 5, 2008, 10:27:24 AM

Hi all,

As a follow-up to my thread "Embedded Linux Vs. Real time Linux" I have another
question regarding embedded systems based on Linux.

Is it possible to run an ordinary Linux on a 32-bit architecture that does
not include MMU hardware? In that case, what is the idea of ucLinux if you
can use an ordinary distro (if ported, of course)? Also, if it's possible,
what is required in terms of kernel setup? In my head there must be a lot of
kernel code that is irrelevant because of the lacking MMU.

I would like a CPU to include an MMU in order to avoid tricky memory
violation bugs and problems with heap fragmentation. It also gives me
perfect separation between the different threads of execution in my
application(s), and also between my high-level application code and
the low-level kernel code (drivers and such)..... Any other reason to want an
MMU included?

Is the performance loss from using an MMU dependent only on the hardware
architecture of the MMU, or is it also software (Linux kernel) dependent? Do
you have any idea of the performance loss when using the MMU hardware?

Again sorry for the noob questions.....Brave new world ;)

Best Regards

MMJ

Michael Schnell

Feb 5, 2008, 12:30:55 PM

to MMJ

MMJ wrote:
> Hi all,
>
> In extend of my thread "Embedded Linux Vs. Real time Linux" I have another
> question regarding embedded systems based on Linux.
>
> Is it possible to run an ordinary linux on an 32bit architecture that does
> not include MMU hardware? In that case what is the Idea of ucLinux if you
> can use and ordinary distro (if ported offcourse)? Also if it's possible
> what is required in term of kernel setup, in my head there most be allot of
> kernel code that is irrelevant because of the lacking MMU.

No, that is why it's called "full" Linux (if that is what you mean by
"ordinary").

But the "official" kernel distribution does support MMU-less CPUs by
selecting the appropriate architecture. Some architectures always come
without an MMU, and for some (such as ARM) an MMU version and an MMU-less
version are supplied (selectable in the Kernel configuration). The
MMU-less Linux version is called µCLinux.

>
> I would like a CPU to include MMU for the reasson of avoiding tricky memory
> violation bugs and problems with heap fragmentation. Also it gives me the
> perfect separation between the different threads of execution in my
> application(s) but also between the high level of my application code and
> the low level kernel code (drivers and such).....any other reasson to want a
> MMU included ?

Debugging might be better, as a user process going wild can't destroy
the Kernel.

If you have a safety-critical application an MMU provides a better
fall-back behavior of the device in case part of the user software fails.

>
> Is the performance loss by using a MMU only dependant on the hardware
> architecture of the MMU or is it also software (linux kernel) dependant.

The MMU itself slows down the CPU (fast, cheap CPUs like Blackfin don't
have MMUs, and with virtual CPUs in FPGAs MMUs are often avoided due to
performance considerations). Moreover, the Kernel needs to deal with
programming the MMU, and with ill-designed hardware (such as ARM) the MMU
is "behind" the cache (viewed from the CPU) and thus the cache needs to
be cleared whenever the MMU content gets changed (i.e. with any task
switch). (The x86 CPUs are fine in that respect.)

> Do
> you have any idea of the performance loss when using the MMU hardware?

That depends greatly on the application. If you mainly run a single task
(and you don't have the choice of a faster CPU chip) you will not notice
any slowdown, but if you do many task switches the MMU might pose a
problem.

-Michael

MMJ

Feb 6, 2008, 2:29:19 AM

> No that is why it's called "full" Linux (if that is what you mean by
> "ordinary").
>
> But the "official" kernel distribution does support MMU-less CPU by
> selecting the appropriate architecture. Some architectures always come
> without an MMU and for some (such as ARM) an MMU-version and an MMU-less
> version is supplied (selectable in the Kernel configuration). The MMU-less
> Linux version is called µCLinux.

Does the lack of an MMU impose many limitations on which system calls
can be used in user software? As far as I know, ucLinux does not
include the full Linux API?

> If you have a safety-critical application an MMU provides a better
> fall-back behavior of the device in case part of the user software fails.

So another point here might be boot-time. If my application should crash the
kernel will keep living and should be able to restart the application
without booting the board?

> The MMU itself slows down the CPU (fast cheap CPUs like Blackfin don't
> have MMUs with virtual CPUs in FPGAs often MMUs are avoided due to
> performance considerations). Moreover the Kernel needs to deal with
> programming the MMU and with ill-designed hardware (such as ARM) the MMU
> is "behind" the cache (viewed from the CPU) and thus the cache needs to be
> cleared whenever the MMU content gets changed (i.e. with any task switch).
> (The x86 CPUs are fine on that behalf).

Is this "mistake" in design general? How about MIPS and PPC ?

> That depends greatly on the application. If you mainly run a single task
> (and you don't have the choice of a faster CPU chip) you will not notice
> any slow down, but if you do many task switches the MMU might provide a
> problem.

Great info. Why are task switches a problem? Is it because the "cached table
entries" must be refreshed when changing memory space?

gordy

Feb 6, 2008, 3:59:27 AM

On Feb 6, 8:29 pm, "MMJ" <S...@aldrig.com> wrote:
> > No that is why it's called "full" Linux (if that is what you mean by
> > "ordinary").
>
> > But the "official" kernel distribution does support MMU-less CPU by
> > selecting the appropriate architecture. Some architectures always come
> > without an MMU and for some (such as ARM) an MMU-version and an MMU-less
> > version is supplied (selectable in the Kernel configuration). The MMU-less
> > Linux version is called µCLinux.
>
> Does the lack of the MMU cause many limitations of which systems calls that
> is possible to use in user software? ucLinux does as far as I know not
> include the full Linux API?

The biggest problem is that the lack of an MMU means that it is
impossible to support the fork() system call. Not having this system
call means that many Linux applications cannot be supported.

> > If you have a safety-critical application an MMU provides a better
> > fall-back behavior of the device in case part of the user software fails.
>
> So another point here might be boot-time. If my application should crash the
> kernel will keep living and should be able to restart the application
> without booting the board?

No MMU means that your Kernel may be totally trashed or, more likely,
corrupted in some perverse fashion that causes weird stuff to happen.
Your best guardian against this is a system watchdog that forces a
microprocessor reset if the watchdog goes off.
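A minimal sketch of the watchdog idea, assuming the board exposes its hardware watchdog through the standard Linux watchdog character device (commonly /dev/watchdog, but the path depends on the platform). The helper names watchdog_open and watchdog_kick are hypothetical; the ioctl requests come from <linux/watchdog.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: open the watchdog device `dev` and ask for a
 * timeout of `timeout_s` seconds. Returns a file descriptor, or -1 if
 * the device cannot be opened. Opening the device starts the timer. */
int watchdog_open(const char *dev, int timeout_s)
{
    assert(dev != NULL && timeout_s > 0);

    int fd = open(dev, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Not every driver lets you change the timeout; ignore a failure
     * here and keep the driver's default. */
    (void)ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s);
    return fd;
}

/* Hypothetical helper: tell the watchdog the system is still alive.
 * Returns 0 on success, -1 on error. */
int watchdog_kick(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}
```

The application's main loop would call watchdog_kick() at an interval shorter than the timeout. If the application hangs, or the kernel has been corrupted badly enough that the loop stops running, the kicks stop and the hardware resets the board. Be careful when trying this on a desktop machine: once the device is open, a missing kick really does reboot it.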

> > The MMU itself slows down the CPU (fast cheap CPUs like Blackfin don't
> > have MMUs with virtual CPUs in FPGAs often MMUs are avoided due to
> > performance considerations). Moreover the Kernel needs to deal with
> > programming the MMU and with ill-designed hardware (such as ARM) the MMU
> > is "behind" the cache (viewed from the CPU) and thus the cache needs to be
> > cleared whenever the MMU content gets changed (i.e. with any task switch).
> > (The x86 CPUs are fine on that behalf).
>
> Is this "mistake" in design general? How about MIPS and PPC ?

Every OS that uses an MMU runs more slowly than one that does not. The
degree of slowdown, however, is quite small for general purpose CPUs
(e.g. x86, 68K, MIPS, etc). For DSP chips such slowdowns would be a
killer since running small amounts of code very quickly is what they
are designed to do. DSPs are not designed to run a general purpose OS.

> > That depends greatly on the application. If you mainly run a single task
> > (and you don't have the choice of a faster CPU chip) you will not notice
> > any slow down, but if you do many task switches the MMU might provide a
> > problem.
>
> Great info. Why is task switches a problem? Is it because the "cached table
> entries" must be refreshed when changeing memory space?

In general your response is correct. A task switch will require the MMU
hardware to be reloaded for the new task (i.e. process). However, this
operation is normally highly optimised for each processor architecture
that an OS supports and runs as fast as possible. The scope for
speeding up existing code in this area is very small. You would need to
be a genius from another dimension to get another factor of 2 out of
the existing code for reloading an MMU (or TLB).

Of course, under Linux, some task switches are really thread context
switches. The great thing about threads is that all threads share the
same address space. Thus a task switch that is a thread context switch
does not require an MMU reload to occur. They are, therefore, faster.

AZ Nomad

Feb 6, 2008, 7:52:46 AM

On Wed, 6 Feb 2008 08:29:19 +0100, MMJ <Sp...@aldrig.com> wrote:
>> No that is why it's called "full" Linux (if that is what you mean by
>> "ordinary").
>>
>> But the "official" kernel distribution does support MMU-less CPU by
>> selecting the appropriate architecture. Some architectures always come
>> without an MMU and for some (such as ARM) an MMU-version and an MMU-less
>> version is supplied (selectable in the Kernel configuration). The MMU-less
>> Linux version is called µCLinux.

>Does the lack of the MMU cause many limitations of which systems calls that
>is possible to use in user software? ucLinux does as far as I know not
>include the full Linux API?

>> If you have a safety-critical application an MMU provides a better
>> fall-back behavior of the device in case part of the user software fails.

>So another point here might be boot-time. If my application should crash the
>kernel will keep living and should be able to restart the application
>without booting the board?

If you have an application crash on a system without an MMU, you probably
don't want to just restart the app. The app may very well have caused
memory corruption to any other process or even the kernel and those
errors might not show for hours or even days.

Michael Schnell

Feb 6, 2008, 3:36:07 PM

to MMJ

>
> Does the lack of the MMU cause many limitations of which systems calls that
> is possible to use in user software? ucLinux does as far as I know not
> include the full Linux API?
>

AFAIK the API calls are quite different with µCLinux, but you never make
direct Linux API calls. You need to link your application against one of
the µC-aware libraries (instead of glibc), and thus you will notice no
difference in "normal" applications. OK, there is no "fork" and you need
to use vfork instead, which works a bit differently, but you presumably will
not use fork a lot anyway in your own code in an embedded device. If you
want to do multitasking, you will presumably use the pthread library.

>> If you have a safety-critical application an MMU provides a better
>> fall-back behavior of the device in case part of the user software fails.
>
> So another point here might be boot-time. If my application should crash the
> kernel will keep living and should be able to restart the application
> without booting the board?

You will need a hardware watchdog, of course, to detect crashing.
Supposedly µCLinux will boot faster than full Linux.

>
> Is this "mistake" in design general? How about MIPS and PPC ?

I don't know anything about MIPS. I do suppose that the PPC MMU/Cache
system works like that of the X86, as this is originally meant for
Desktops as well.

>
> Great info. Why is task switches a problem? Is it because the "cached table
> entries" must be refreshed when changeing memory space?
>

If there are no task switches there in fact is no difference in the
(single) running application whether it runs in full or µCLinux.

(Besides the potential difference in hardware execution speed) With any
task switch the OS has to do some work to reprogram the MMU (and with
ARM the cache gets invalidated).

-Michael

Morten M. Jørgensen

Feb 8, 2008, 4:31:00 AM

> AFAIK the API calls are quite different with µCLinux, but you never do
> direct Linux API calls. You need to link your application against one of
> the µC-aware libraries (instead of gLibC) and thus you will notice no
> difference in "normal" applications. OK there is no "fork" and you need
> to use vfork instead that works a bit different, but you supposedly will
> not use fork a lot anyway in your own code in an embedded device. If you
> want to do multitasking with your supposedly will use the pthread library.

In terms of MMU performance loss I guess that the pthread lib will do
best, since (AFAIK) multiple POSIX threads within the same
application will run in the same address space? A fork() call will
(AFAIK again) spawn a replica of the process in another address
space - or is this assumption wrong?

Won't I run into problems when importing general Linux applications
that I need in my system if I run uCLinux?

> Supposedly µCLinux will boot faster than full Linux.

Again because of the lack of a virtual memory system?

--
MMJ

Michael Schnell

Feb 8, 2008, 10:47:11 AM

to "Morten M. Jørgensen"

>
> In terms of MMU performance lack I guess that the pthread lib will do
> best - since (AFAIK) multiple POSIX threads within the same
> application will run in the same address space? a fork() call will
> (AFAIK again) spawn a replicate of the process in another address
> space - or is this assumption wrong?

You are right that by default threads (e.g. created by pthread) use the
same address space, while different processes (if not on µCLinux) (e.g.
created by fork) by default use different address spaces. But whether that
means that switching threads has much less overhead, I can't say. Between
the two threads the Kernel must run, and it uses a different address space
than the threads. So the MMU might be involved anyway.
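A small sketch of the "different address spaces" point for fork() on a system with an MMU. The function and variable names are made up for illustration. The child overwrites a global, and the parent still sees its own untouched copy afterwards.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Looks shared, but after fork() each process has its own copy. */
static int shared_looking_value = 1;

/* Returns the value the parent sees after a forked child has
 * overwritten its own copy of the global, or -1 on error. */
int value_seen_by_parent_after_fork(void)
{
    shared_looking_value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed (always the case without an MMU) */

    if (pid == 0) {
        shared_looking_value = 42;  /* changes the child's copy only */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    assert(WIFEXITED(status));

    return shared_looking_value;    /* parent's copy, untouched by the child */
}
```

With threads the result would be different: a write by one thread to the same global is visible to every other thread in the process, which is why no address-space switch is needed between them.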

>
> Won't I run into problems when importing general Linux application
> that I need in my systems - if I run uCLinux?

As said, "normal" applications (that can be linked against µCLinux-aware
libraries) should not notice the difference.

>
>> Supposedly µCLinux will boot faster than full Linux.
>
> Again because of the lack of a virtuel memory system?
>

I do suppose so.

-Michael

David Brown

Feb 8, 2008, 12:08:19 PM

Morten M. Jørgensen wrote:
>> AFAIK the API calls are quite different with µCLinux, but you never do
>> direct Linux API calls. You need to link your application against one of
>> the µC-aware libraries (instead of gLibC) and thus you will notice no
>> difference in "normal" applications. OK there is no "fork" and you need
>> to use vfork instead that works a bit different, but you supposedly will
>> not use fork a lot anyway in your own code in an embedded device. If you
>> want to do multitasking with your supposedly will use the pthread library.
>
> In terms of MMU performance lack I guess that the pthread lib will do
> best - since (AFAIK) multiple POSIX threads within the same
> application will run in the same address space? a fork() call will
> (AFAIK again) spawn a replicate of the process in another address
> space - or is this assumption wrong?
>
> Won't I run into problems when importing general Linux application
> that I need in my systems - if I run uCLinux?
>

A good indication of whether or not a given Linux application will run
under ucLinux is if it has a mingw windows port. Windows does not
implement a "fork", so programs compiled with mingw (which is a fairly
minimal wrapper) can't use fork - they must use "vfork" for new
processes. If the application has a cygwin windows port but no mingw
port, then it *may* use "fork", since cygwin implements it (slowly).

Michael Schnell

Feb 8, 2008, 4:21:57 PM

to David Brown

> A good indication of whether or not a given Linux application will run
> under ucLinux is if it has a mingw windows port. Windows does not
> implement a "fork", ...

IMHO, "normal" applications don't spawn other independent applications.
So there is no need for fork anyway.

Moreover, the "standard" purpose of fork() is not to let the running
application run twice, but to spawn a different executable file. And
this is done with vfork() (nearly) exactly as with fork(). AFAIR, I
read that in full Linux you can use vfork() as well (though
it's not recommended) and in µCLinux you can only use vfork(). It looks
like in most cases the only porting effort is adding the "v" :).

-Michael

Michael Schnell

Feb 9, 2008, 6:32:06 AM

to David Brown

I found an appropriate reference:

http://www.unixguide.net/unix/programming/1.1.2.shtml

The basic difference between the two is that when a new process is
created with vfork(), the parent process is temporarily suspended, and
the child process might borrow the parent's address space. This strange
state of affairs continues until the child process either exits, or
calls execve(), at which point the parent process continues.

-Michael

Michael Schnell

Feb 9, 2008, 6:36:26 AM

to David Brown

IMHO this means if you do the "normal" stuff (i.e. just starting a
program from a file):

    if (!vfork()) {
        exec?(...);
        _exit(127);   /* only reached if the exec call failed */
    }

it does not matter if you use fork() or vfork().
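For reference, a fuller sketch of that pattern that also waits for the child. The helper name spawn_and_wait is hypothetical, and execv() is just one member of the exec family picked for the example. The child does nothing between vfork() and execv(), so the same code behaves identically if vfork is replaced by fork on a full Linux.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with `argv` and return
 * its exit status, or -1 if the child could not be created or did not
 * exit normally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;              /* vfork itself failed */

    if (pid == 0) {
        /* Child: it borrows the parent's memory, so do nothing but exec. */
        execv(path, argv);
        _exit(127);             /* reached only if execv failed */
    }

    /* Parent: resumes once the child has called execv or _exit. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note the _exit() rather than exit() in the failure path: exit() would run the parent's atexit handlers and flush its stdio buffers from inside the child.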

-Michael

Michael Schnell

Feb 9, 2008, 6:43:36 AM

to David Brown

Another reference that also talks about the differences between fork and
vfork that you will see if the child does not immediately call exec?().

-Michael

http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/archive/2006/08/msg00070.html

If your app does an exec immediately after the fork, then it's usually
really easy. Just replace fork with vfork, and that's about it.

If not, then life is more difficult - you need to very carefully audit,
and sometimes re-factor, the application flow around the fork.

Things to remember:

1. The parent blocks until the child calls exec() or _exit()

2. The child shares all data and stack with parent, so must not return
from a function call (unwinds the stack).

3. Any variable modifications (local or global) by the child must be
carefully checked for side effects in the parent. Watch for side
effects in library calls, like the "errno" variable.

Michael Schnell

Feb 9, 2008, 6:46:29 AM

Here is a rather detailed article regarding the topic

http://www.linuxjournal.com/article/7221

-Michael

David Brown

Feb 11, 2008, 5:23:28 AM

There are three common uses that I can think of for (v)fork from
"normal" applications. One is in forking servers, another is for
executing external subtasks, and the third is for splitting a task into
parallel executed parts.

In the first case, things like webservers will often fork new processes
to handle incoming connections. In this situation, it must be a real
fork, since the new process keeps the same code and inherits things like
file handles from the parent. This is a traditional unix server
arrangement, and does not work well under ucLinux or windows ("fork" in
*nix is extremely efficient using COW, but very slow if you don't have
an MMU and must copy everything, or if the OS simply doesn't support the
concept). Such servers need to be heavily modified to work without fork
- they need to either use select() and other such asynchronous
techniques, or they must use threads instead of processes. (Modern
apache, for example, uses a mixture of forks and threads.)
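As a rough illustration of the select() alternative, here is a sketch of one pass of a single-process event loop. The function name serve_ready is hypothetical, and a real server would pass sockets and dispatch each request rather than just counting bytes.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until at least one descriptor in
 * fds[0..n-1] is readable, then read once from each ready descriptor.
 * Returns the total number of bytes read, or -1 on error. */
long serve_ready(const int fds[], int n)
{
    assert(fds != NULL && n > 0);

    fd_set readable;
    int maxfd = -1;
    FD_ZERO(&readable);
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* Block until something is readable; no child process per client. */
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    long total = 0;
    char buf[256];
    for (int i = 0; i < n; i++) {
        if (!FD_ISSET(fds[i], &readable))
            continue;
        ssize_t got = read(fds[i], buf, sizeof buf);
        if (got < 0)
            return -1;
        total += got;   /* a real server would handle the request here */
    }
    return total;
}
```

One process services all connections in turn, so nothing has to be copied or forked per client. The price is that any handler that blocks stalls every other client.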

For applications that fork off external subtasks, you normally see a
fork/exec pair, often connected by a pipe to the parent. This sort of
structure is normally fairly easily modified to a vfork.
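A sketch of such a pipe-connected subtask using vfork. The name run_and_capture is hypothetical. It assumes Linux behaviour, where the vfork child has its own descriptor table, so the dup2()/close() calls before the exec do not disturb the parent; strictly by POSIX the child should call nothing but an exec function or _exit().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` and copy up to cap-1
 * bytes of its standard output into `out` (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
long run_and_capture(const char *path, char *const argv[],
                     char *out, size_t cap)
{
    assert(path && argv && out && cap > 0);

    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t pid = vfork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: only descriptor shuffling, then exec. */
        dup2(p[1], STDOUT_FILENO);
        close(p[0]);
        close(p[1]);
        execv(path, argv);
        _exit(127);             /* reached only if execv failed */
    }

    close(p[1]);                /* parent keeps the read end only */

    size_t used = 0;
    ssize_t got;
    while (used < cap - 1 &&
           (got = read(p[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)got;
    out[used] = '\0';
    close(p[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (long)used;
}
```

Everything the child needs (the pipe, the argument vector) is prepared by the parent before the vfork, which is what makes the conversion from fork straightforward.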

For applications that use fork to parallelise (is that a word?) their
execution (keeping the same binary, but with different processes
executing different parts of the code), it is probably better to
re-write using threads. Traditionally, *nix was bad at thread handling
(there was no standardisation, and it was very unclear how threads
relate to processes for scheduling). Since fork was so cheap on *nix,
there was no real need for threads - unlike on windows, where fork is
expensive so threads were needed. But modern linux and ucLinux handle
threads well, making it a good choice in many situations.

MMJ

Feb 11, 2008, 9:47:24 AM

Hi Michael,

Thx for all your great answers! I'll look into all the links you have thrown!

BR

--
MMJ

Xenu The Enturbulator

Feb 12, 2008, 4:15:37 AM

Michael Schnell wrote:

> (Besides the potential difference in hardware execution speed) With any
> task switch the OS has to do some work to reprogram the MMU (and with
> ARM the cache gets invalidated).

Are you sure about this ? If it were true, the performance would
completely suck. Remember, ARM was originally designed for desktop use
as well, AFAIK .. wasn't it used in those Acorn Archimedes machines ?

David Brown

Feb 12, 2008, 5:41:33 AM

The first ARM was for the Acorn Archimedes machines, but it did not have
an MMU, and I'm not sure that it even had a cache (it ran at 8 MHz IIRC,
and in those days memory was not much slower than cpus).

There are two ways to handle cache and MMU - you can cache by physical
address (which causes slower access to the cached data, as addresses
need to be translated before accessing the cache), or you can cache by
virtual address (which is faster for the cpu to access as the logical
addresses are used directly, but it requires a cache flush when changing
the MMU maps).
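A toy model of why a cache addressed by virtual address needs a flush when the MMU map changes. Everything here is hypothetical and far simpler than real hardware: one word per page, one cache line per virtual page, and no tags.

```c
#include <assert.h>
#include <stdbool.h>

enum { PAGES = 4 };

/* Hypothetical toy machine with a virtually addressed cache. */
struct toy {
    int  memory[PAGES];   /* physical memory, indexed by physical page   */
    int  map[PAGES];      /* current MMU map: virtual -> physical page   */
    bool valid[PAGES];    /* cache, indexed by virtual page: line filled? */
    int  cached[PAGES];   /* cache, indexed by virtual page: cached word  */
};

/* Read through the cache; the MMU map is consulted only on a miss. */
int toy_read(struct toy *t, int vpage)
{
    assert(vpage >= 0 && vpage < PAGES);
    if (!t->valid[vpage]) {
        t->cached[vpage] = t->memory[t->map[vpage]];
        t->valid[vpage] = true;
    }
    return t->cached[vpage];
}

/* Install a new map (think: task switch), optionally flushing the cache. */
void toy_switch(struct toy *t, const int new_map[PAGES], bool flush)
{
    for (int i = 0; i < PAGES; i++) {
        t->map[i] = new_map[i];
        if (flush)
            t->valid[i] = false;
    }
}
```

If virtual page 0 is read, the map is then switched without a flush, and page 0 is read again, the cache hit returns the previous task's data. With the flush the second read misses and goes through the new map. A cache addressed by physical address would translate before the lookup and so would never return the stale word, at the cost of doing that translation on every access.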

I don't know which method the ARM uses. I've a vague feeling that on
larger processors, L1 caches use virtual addresses while L2 (and L3) use
physical addresses, but that could be wrong.

Michael Schnell

Feb 12, 2008, 1:37:07 PM

to David Brown

An excellent explanation, IMHO !

Thanks,
-Michael

Michael Schnell

Feb 12, 2008, 1:41:57 PM

to Xenu The Enturbulator

> Are you sure about this ?

I'm not sure how it really works (you would need to take a look at the
Linux source code). But as the application can't access the
Kernel's memory and the Kernel can, _something_ needs to be done
_somehow_ when entering and leaving Kernel land.

-Michael

Michael Schnell

Feb 12, 2008, 1:43:59 PM

to David Brown

> I don't know which method the ARM uses.

With ARM the cache is between the MMU and the CPU. Thus you need to
flush it when the MMU table is changed.

Thus with ARM, for high performance you either don't use the MMU or do as
few task switches as possible.

-Michael