Where to site the memory allocator

James Harris

Dec 16, 2009, 6:16:02 PM
IIRC the original Unix had malloc and free in the kernel. Whether it
did or not, I'd like to ask you folks for your opinions on where the
allocator should be. AIUI Unix now only has brk/sbrk in the kernel and
malloc etc have been moved to the language domain.

(Of course, on a paged machine really it's the address space that gets
allocated rather than memory.)

What options are there?

Option 1. Userspace allocator.
* Disadvantage: Memory control structures can be overwritten by a
rogue pointer.

Option 2. Kernelspace allocator.
* Disadvantage: Requires kernel call.

Option 3. Hybrid - Userspace called first, calls kernel where needed.
* Disadvantage: Memory control structures can be overwritten by a
rogue pointer.

Not sure if it's relevant but the old "text + data .... heap ....
stack" model seems inflexible. With modern apps there may be multiple
stacks and there are apps which would like to use as much of 4Gbytes
as possible on a 32-bit machine. We also sometimes use memory more
cleverly by playing games with the page bits. So the simple brk/sbrk
now appears to me to be simplistic. Corrections welcome!

Any comments on how best to address these issues?

James

Maxim S. Shatskih

Dec 17, 2009, 12:06:26 AM
> IIRC the original Unix had malloc and free in the kernel.

IIRC it had brk/sbrk and user-mode malloc based on brk/sbrk

--
Maxim S. Shatskih
Windows DDK MVP
ma...@storagecraft.com
http://www.storagecraft.com

BGB / cr88192

Dec 17, 2009, 1:35:59 AM

"James Harris" <james.h...@googlemail.com> wrote in message
news:9ef85636-becd-4809...@j4g2000yqe.googlegroups.com...

the "generic" way here is to implement something along the lines of mmap or
VirtualAlloc.

in this case, the app allocates memory in the form of "mappings" (sort of
like malloc, but page-based), and then implements malloc by dividing up
these bigger chunks of memory into smaller memory objects.
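A minimal sketch of the scheme described above, assuming a POSIX system with mmap(): one mapping is requested from the kernel and small objects are carved out of it in user space. The names (arena, arena_alloc, ...) are hypothetical, and this is a simple bump allocator, so individual objects cannot be freed; a real malloc would keep free lists and map further chunks when one runs out.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* One page-granular "mapping" obtained from the kernel and carved up in
   user space.  Hypothetical names; bump allocation only, so single objects
   cannot be freed, only the whole mapping. */
typedef struct {
    unsigned char *base;  /* start of the mapping */
    size_t size;          /* bytes mapped */
    size_t used;          /* bytes handed out so far */
} arena;

/* One kernel call: obtain `size` bytes of zero-filled address space. */
static int arena_init(arena *a, size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* No kernel call: return the next suitably aligned piece of the mapping. */
static void *arena_alloc(arena *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start)
        return NULL;      /* a real malloc would map another chunk here */
    a->used = start + n;
    return a->base + start;
}

/* Hand the whole mapping back to the kernel. */
static void arena_release(arena *a) {
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```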

likewise, these mappings may also be used to allocate stacks, regions to
load libraries into, ...

for example:
VirtualAlloc a region for loading a DLL into;
read the PE/COFF image into this region, and fix it up for the correct load
address (note that this simplistic 1:1 loading strategy assumes that the
file-layout matches the image-layout, and I am not sure if this much is
guaranteed, as the PE/COFF spec doesn't say much on this...).

note that kernel involvement may be needed to set up the initial process
image (such as loading an EXE and core DLLs).

after initial loading, then the app takes over, and itself may grab memory
for things like malloc, ... via the kernel-provided facilities.

it is not likely to be too much different for an ELF-based system (except that
I think Linux uses 'ld.so', which is separate from the kernel but manages app
loading, although personally I am not sure how it works exactly...).

or, FWIW, someone could even use something along the lines of a MS-DOS
'.COM' file, although there are technical reasons for why this would kind of
suck...


> James


Rod Pemberton

Dec 17, 2009, 4:56:34 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:9ef85636-becd-4809...@j4g2000yqe.googlegroups.com...
> AIUI Unix now only has brk/sbrk in the kernel and
> malloc etc have been moved to the language domain.

I thought Linux converted to mmap().

> IIRC the original Unix had malloc and free in the kernel. [...]
> AIUI Unix now only has brk/sbrk in the kernel [...]

I thought either brk or sbrk was in the Unix kernel.

> (Of course, on a paged machine really it's the address space that gets
> allocated rather than memory.)

?

> What options are there?
>
> Option 1. Userspace allocator.
> * Disadvantage: Memory control structures can be overwritten by a
> rogue pointer.

safety: prohibit pointers
safety: insert bounds checking code
privilege: RW for OS. RO for user.
slower (?): less privilege slows code execution
relocate (?): don't put memory control structures near memory that uses them
to reduce chance of "rogue" pointer overwrite
checksum (?): ...

Two selectors with different rights and privilege can cover the same memory
region, yes? Different rights and privilege can be set up via page mapping?

> Option 2. Kernelspace allocator.
> * Disadvantage: Requires kernel call.

overhead: kernel call vs. user-space call
less safe (?): more code with more privilege allows for more potential
breaches, "privilege escalation" attacks
faster (?): fewer privilege related checks via cpu hardware when code runs
with more privilege

Overhead for both should be similar for non-privileged OS. For privileged
OS, how much overhead do the SYSENTER/SYSEXIT or similar methods add versus
a call/jmp instruction?

> Option 3. Hybrid - Userspace called first, calls kernel where needed.
> * Disadvantage: Memory control structures can be overwritten by a
> rogue pointer.

**AND**

> * Disadvantage: Requires kernel call.

Hybrid should have disadvantages of _both_, yes? If not, is it hybrid?

OK, back to option 3 with both disadvantages:

> Option 3. Hybrid - Userspace called first, calls kernel where needed.
> * Disadvantage: Memory control structures can be overwritten by a
> rogue pointer.
> * Disadvantage: Requires kernel call.

I think you should make a distinction between "hybrid" or mixed and "dual".
"Dual" being two allocators and "hybrid" being part of allocator in
user-space and part in kernel-space. I wasn't quite clear as to what you
meant. Some of my comments below are for one, some for the other...

* Disadvantage: Requires two allocators to be coded.
* Advantage: Userspace gets large blocks from the kernel, manages with own
allocator. User-space allocator can be optimized for smaller
allocations/reclamations. Kernel space allocator optimized for larger block
allocations for processes, and kernel memory usage. User-space allocator
reduces kernel calls.

It might be useful to push out all allocator code not required to be in the
kernel to the user space. That'll probably leave a small amount of code in
the kernel as well as the overhead of a kernel call. If in a privileged OS,
it reduces the amount of code run with more privilege. Safer? Having most
of the allocator in user-space might also ease reimplementation or changing
of the allocator. It might also allow different allocators on a per process
basis. OTOH, the allocator will be used heavily. The overhead of many
kernel calls may be a bottleneck.

> Not sure if it's relevant but the old "text + data .... heap ....
> stack" model seems inflexible.

How so?

> With modern apps there may be multiple
> stacks

It's all just memory. I see no reason why heap, stack, data, are treated
distinctly. If so designed, they could all be allocated from the same
memory allocator. What are multiple stacks but one big stack divided?

> and there are apps which would like to use as much of 4Gbytes
> as possible on a 32-bit machine.

True. That can almost be done in a flat memory model. IIRC, PCI devices
can be memory mapped way up there. But with paging, won't you run out of
space in the page tables, esp. if a page size is 4k, when trying to map all
that memory?

> We also sometimes use memory more
> cleverly by playing games with the page bits. So the simple brk/sbrk
> now appears to me to be simplistic. Corrections welcome!

Didn't I post links to different allocators? I have yet to study them.
But, you mean to say that none of them were a good fit with paging?
http://groups.google.com/group/alt.os.development/msg/a67a97467bbcfffd


Rod Pemberton


Jake Waskett

Dec 18, 2009, 5:29:20 AM
On Wed, 16 Dec 2009 15:16:02 -0800, James Harris wrote:

> Option 1. Userspace allocator.
> * Disadvantage: Memory control structures can be overwritten by a rogue
> pointer.

On pretty much any system, userspace programs can cause themselves to
crash in some way or another, so I wonder whether this is a significant
disadvantage. It seems to me that poorly written programs will be
unreliable in any environment.

In embedded systems, especially those with small memories (think 8-bit
systems), it can be an advantage to share the allocator code between the
"kernel" (often just an executive) and the user programs.

I once devised an interesting design in which a real-time kernel actually
relied entirely on an external thread (executing in the kernel's address
space context) for allocating the kernel's own data structures. The
kernel kept pools of various kinds of objects available for allocation,
and sent a message to a designated thread (thus waking it up) when a pool
was empty (or down to the emergency level). As long as that thread had a
sufficiently high priority, it was able to donate memory to the pool
before the kernel had to start failing system calls. A similar mechanism
was used for switching address space contexts - the kernel just scheduled
threads, and was essentially unaware of the MMU.

Rod Pemberton

Dec 18, 2009, 6:12:25 AM
"Jake Waskett" <ja...@waskett.org> wrote in message
news:4QIWm.42230$IZ1....@newsfe19.ams2...

> On Wed, 16 Dec 2009 15:16:02 -0800, James Harris wrote:
>
> > Option 1. Userspace allocator.
> > * Disadvantage: Memory control structures can be overwritten by a rogue
> > pointer.
>
> On pretty much any system, userspace programs can cause themselves to
> crash in some way or another, so I wonder whether this is a significant
> disadvantage. It seems to me that poorly written programs will be
> unreliable in any environment.
>

The problem is actually deeper than that. To prevent a user from "trashing"
the system, you must prevent the user from executing binary code. But,
that's exactly what a cpu is designed to do: execute binary code. I.e.,
contradiction.

In a system without privileges, if the user has the ability to store data,
then the user has the ability to execute code. One of the best ways to
limit the user's ability to execute code is to use an interpreter. The user
is limited to a few interpreter functions and the "sandboxed" memory used by
the interpreter. But even then you can't stop the user from executing
binary code. If the user only has access to a text editor, then the user
can store data. If executed, this stored "text data" will function as
binary code. Being able to store data can help the user bypass the
limitations of the interpreter and directly execute binary code. Of course,
you can't implement _any_ useful program without allowing the user to store
data somehow...

If an interpreter is only a partial solution, how do you prevent the user
from executing binary code and still do work? Or, how do you severely limit
what binary code the user can execute? Privileges on x86 can be used to
prevent code in the "wrong" memory regions, i.e., "text data", from being
executed. Privileges on x86 can also be used to reduce the instruction set
executable by the user. If you use an interpreted, threaded, stack language
like FORTH, the executable code can be much less than 4k, which will fit in
a single 4k page. All other pages or memory regions in the entire system can
then be marked as non-executable via the XD/NX bits, or be range-restricted
via segmentation. If you allow a user to execute binary code instead of
using an interpreter, you're asking for a crash.


Rod Pemberton


Marven Lee

Dec 19, 2009, 8:04:26 AM

James Harris wrote:
> Option 3. Hybrid - Userspace called first, calls kernel where needed.
> * Disadvantage: Memory control structures can be overwritten by a
> rogue pointer.

Use a system call like mmap()/VirtualAlloc() to allocate largish blocks of
memory that are a multiple of the page size, then let user-mode library
functions malloc() and free() carve up these blocks into smaller
allocations. If more memory is needed, call mmap() again and carve
that up.

The kernel structures that manage mapped regions created with mmap()
can't be corrupted by user-mode code. The structures used by malloc() to
carve up mmap() areas can be corrupted, but at least the problem is
contained in one process.


My system calls for allocating address space are...

vm_addr UMap (vm_addr addr, vm_size len, uint32 prot, uint32 flags)

int UUnmap (vm_offset addr)

Very similar to mmap()/munmap(), but unlike POSIX munmap(), which takes
two arguments (start address and length), I only specify the address
to unmap.


On top of this my malloc() is implemented as a slab-allocator.

malloc() allocates large blocks of memory (called slabs) and carves
each slab into smaller chunks of equal size. It is these chunks that
are dished out by malloc(). In my case the slabs are 64k in size and
also -aligned- on a 64k boundary through a flag passed to UMap().
I use this alignment as a quick way to find the slab header, as described
below.

Each 64k slab is carved up into many chunks of equal size, with a
header at the beginning of each slab. I maintain several lists of
slabs with different chunk sizes. I've forgotten exactly what I use, but
it's usually something like lists of slabs containing 32, 64, 128,
256, 512, 1024, 2048 or 4096-byte chunks.

So a 64k slab with 64 byte chunks looks like...

------- 0
Slab Header
------- 128
chunk 0
------- 192
chunk 1
------- 256
chunk 2
-------- 320
chunk 3
-------- 384
....
...

-------- 64k

Each chunk is either allocated memory or, when free, holds the pointer
that links it into the list of free chunks; there are no separate list
nodes between chunks.

The slab header is attached to a linked list of other slabs that
contain the same sized chunks of memory. The slab header
of a particular slab maintains a list of free chunks in the slab.

So malloc() searches the appropriate slab list until it finds a
slab with free chunks; a chunk is then removed from the slab's
free list and returned to the user.

free() basically rounds the address down to the nearest 64k
boundary to find the slab header, which is why I specify that slabs
are allocated on 64k boundaries by UMap(). The chunk can
then be freed by placing it on the slab's free list. If all chunks
in the slab are free, the slab itself can be removed from
the list of slabs and freed with UUnmap().
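A user-space approximation of this slab scheme, assuming POSIX mmap() in place of UMap(). Because mmap() only promises page alignment, the sketch maps 128 KiB and trims it to an aligned 64 KiB slab. The 128-byte header and the 32 to 4096-byte chunk classes follow the post; the names are hypothetical. Unlike the lazily growing slab described further on, this version threads every chunk onto the free list up front (so it touches the whole slab), it has no large-object path, and it isn't thread-safe.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SLAB_SIZE   ((size_t)64 * 1024)  /* slabs are 64 KiB and 64 KiB aligned */
#define HEADER_SIZE ((size_t)128)        /* room reserved for the slab header */
#define NUM_CLASSES 8                    /* chunk sizes 32, 64, ..., 4096 */

typedef struct chunk { struct chunk *next; } chunk;  /* stored inside free chunks */

typedef struct slab {
    struct slab *next;   /* next slab with the same chunk size */
    chunk *free;         /* this slab's free chunks */
    size_t chunk_size;
    size_t in_use;       /* chunks currently allocated */
} slab;

_Static_assert(sizeof(slab) <= HEADER_SIZE, "slab header must fit");

static slab *classes[NUM_CLASSES];  /* one slab list per chunk size */

/* mmap() only guarantees page alignment: map twice the size, trim the slack. */
static slab *map_aligned_slab(void) {
    size_t len = 2 * SLAB_SIZE;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    uintptr_t up = ((uintptr_t)p + SLAB_SIZE - 1) & ~(uintptr_t)(SLAB_SIZE - 1);
    unsigned char *s = (unsigned char *)up;
    if (s > p)
        munmap(p, (size_t)(s - p));                         /* leading slack */
    if (p + len > s + SLAB_SIZE)
        munmap(s + SLAB_SIZE, (size_t)(p + len - (s + SLAB_SIZE)));  /* trailing */
    return (slab *)s;
}

static int class_of(size_t n) {
    for (int c = 0; c < NUM_CLASSES; c++)
        if (n <= ((size_t)32 << c))
            return c;
    return -1;   /* too large: would be mapped directly instead */
}

static void *slab_malloc(size_t n) {
    int c = class_of(n ? n : 1);
    if (c < 0)
        return NULL;
    slab *s = classes[c];
    while (s && !s->free)               /* find a slab with a free chunk */
        s = s->next;
    if (!s) {                           /* none: map a new slab and carve it */
        if (!(s = map_aligned_slab()))
            return NULL;
        s->chunk_size = (size_t)32 << c;
        s->in_use = 0;
        s->free = NULL;
        size_t count = (SLAB_SIZE - HEADER_SIZE) / s->chunk_size;
        for (size_t i = count; i-- > 0;) {   /* free list in address order */
            chunk *ch = (chunk *)((unsigned char *)s + HEADER_SIZE + i * s->chunk_size);
            ch->next = s->free;
            s->free = ch;
        }
        s->next = classes[c];
        classes[c] = s;
    }
    chunk *ch = s->free;
    s->free = ch->next;
    s->in_use++;
    return ch;
}

static void slab_free(void *ptr) {
    if (!ptr)
        return;
    /* Round down to the 64 KiB boundary to find the header. */
    slab *s = (slab *)((uintptr_t)ptr & ~(uintptr_t)(SLAB_SIZE - 1));
    chunk *ch = ptr;
    ch->next = s->free;
    s->free = ch;
    if (--s->in_use == 0) {             /* slab now empty: unlink and unmap */
        slab **pp = &classes[class_of(s->chunk_size)];
        while (*pp != s)
            pp = &(*pp)->next;
        *pp = s->next;
        munmap(s, SLAB_SIZE);
    }
}
```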

Allocations larger than 4096 bytes can be mmaped directly,
but still aligned on 64k boundaries. When free() is called
on a 64k aligned object, it knows it is a large object so
UUnmap() is called directly. Since my kernel knows the
length of the object allocated with UMap() there is no
need for a length argument to be stored in user-space.
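With POSIX munmap() the length does have to be remembered somewhere in user space. One common approach, sketched here with hypothetical names, is a small header in front of the returned pointer. Note that this sketch doesn't reproduce the 64k alignment, so the alignment test for large objects described above would need some other signal, and the header is exactly the kind of in-band metadata a rogue pointer can overwrite.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Header placed at the start of each large mapping.  The union keeps the
   pointer handed to the caller aligned for any object type. */
typedef union {
    size_t len;          /* length of the whole mapping, header included */
    max_align_t align;
} big_header;

/* Map a large object directly and remember its length for munmap(). */
static void *big_alloc(size_t n) {
    if (n > SIZE_MAX - sizeof(big_header))
        return NULL;                        /* size would overflow */
    size_t len = sizeof(big_header) + n;
    big_header *h = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (h == MAP_FAILED)
        return NULL;
    h->len = len;
    return h + 1;                           /* memory after the header */
}

/* POSIX munmap() needs the length, so read it back from the header. */
static void big_free(void *p) {
    if (!p)
        return;
    big_header *h = (big_header *)p - 1;
    munmap(h, h->len);
}
```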

I use a memory overcommit policy in my kernel, or so
it seems according to what I've read on other OSes.
Memory ranges created by UMap() aren't guaranteed
underlying pages of memory. The pages are mapped
in on demand. So it is possible for malloc() to succeed
but for the OS to run out of pages during a page fault.
I think Linux and FreeBSD use (or used to use) this
policy; I don't think Windows does. I liked this
description of it: http://lwn.net/Articles/104179/
I might change this policy.

I use this as an optimization in my slab allocator:
64k of address space is allocated, but initially only
the first 4k is used, holding the slab header and
as many chunks as fit in 4k. The slab then
grows up to the maximum size of 64k.
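On a kernel that does not overcommit, a similar effect can be had by reserving address space first and enabling pages as the slab grows. The sketch below assumes POSIX mmap()/mprotect(): the region starts as PROT_NONE and pages are made read-write on demand, much like VirtualAlloc's separate reserve and commit steps. The names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Address space reserved up front; only the prefix [0, committed) is usable. */
typedef struct {
    unsigned char *base;
    size_t reserved;    /* bytes of address space held */
    size_t committed;   /* bytes currently readable/writable */
} growable;

/* Reserve address space with no access rights at all. */
static int grow_reserve(growable *g, size_t reserve) {
    void *p = mmap(NULL, reserve, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    g->base = p;
    g->reserved = reserve;
    g->committed = 0;
    return 0;
}

/* Make at least `want` bytes usable, rounded up to whole pages. */
static int grow_commit(growable *g, size_t want) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t target = (want + page - 1) / page * page;
    if (target > g->reserved)
        return -1;                      /* beyond the reservation */
    if (target <= g->committed)
        return 0;                       /* already usable */
    if (mprotect(g->base + g->committed, target - g->committed,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    g->committed = target;
    return 0;
}
```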

My kernel system calls and the in-kernel data
structures for managing memory created with the
UMap() and UUnmap() system calls are described
in this post...

http://groups.google.co.uk/group/alt.os.development/msg/d6d6fd78ff96452c?hl=en

I do have some ideas on how to modify my memory
management structures to implement copy-on-write
and maybe memory mapped files, but haven't touched
my OS in a while. Don't have Cygwin installed, nor
any tools to view rar archives so haven't read my source
either, everything is on my old computer and website.
I really need to set up Cygwin on this laptop.

--
Marv

J de Boyne Pollard

Dec 24, 2009, 11:58:13 PM
JH> Not sure if it's relevant but the old "text + data .... heap ....
JH> stack" model seems inflexible.

... which is why a different model has been in use for some 22 years
at least. Neither Windows nor OS/2 ever used such a model. (OS/2 had
to cope with single processes comprising multiple executable images,
with multiple text and data segments, multiple threads, with multiple
stack segments, and multiple heaps, right from its very first version
in 1987.) Unix took a while to catch up, but Linux and Unices haven't
done things this way for a fair number of years now, either.

JH> So the simple brk/sbrk now appears to me to be simplistic.
JH> Any comments on how best to address these issues?

For starters: Get yourself some decent operating systems internals
books for _modern_ operating systems (where "modern" is _more modern
than Unix System V R3_) and read them. The syscall-level memory
management functions, upon which RTL-level heap management is layered,
are things like VirtualAlloc/VirtualFree (in Win32), DosAllocMem/
DosFreeMem/DosSetMem (in OS/2), and mmap/mprotect (in OpenBSD). (OS/2
and Win32 also provide most of the layer above, in the form of non-
syscall system library functions for managing heaps: DosSubSetMem/
DosSubAllocMem/DosSubFreeMem/DosSubUnsetMem and HeapCreate/HeapAlloc/
HeapFree/HeapDestroy respectively.)

James Harris

Jan 14, 2010, 3:48:14 PM
On 17 Dec 2009, 05:06, "Maxim S. Shatskih"
<ma...@storagecraft.com.no.spam> wrote:

> > IIRC the original Unix had malloc and free in the kernel.
>
> IIRC it had brk/sbrk and user-mode malloc based on brk/sbrk

I don't know. I have a copy of Lions' Commentary on Unix. It includes
routines malloc and mfree and a routine called sbreak (which is not
called by malloc). There seem to be no brk or sbrk calls.

This isn't the malloc as directly called in C. It has the coremap as
its first parameter. It allocates first fit and returns the base of
the allocated space.
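For comparison, here is a modern-C sketch of that style of allocator: a resource map of free (size, address) extents kept in address order and ended by a zero-size entry, allocated first fit, with frees merged into their neighbours. It paraphrases the technique rather than reproducing the V6 source; the names map_alloc and map_free are hypothetical (the V6 routines are malloc and mfree), and, as in V6, an address of 0 doubles as the failure value, so address 0 must never be in the map. map_free assumes the array has room for one more entry.

```c
#include <assert.h>
#include <stddef.h>

/* One free extent; the map ends with an entry whose size is 0. */
struct map { unsigned size, addr; };

/* First fit: carve from the first extent that is big enough; 0 means failure. */
static unsigned map_alloc(struct map *mp, unsigned size) {
    for (struct map *bp = mp; bp->size; bp++) {
        if (bp->size >= size) {
            unsigned a = bp->addr;
            bp->addr += size;
            bp->size -= size;
            if (bp->size == 0)               /* extent used up: close the gap */
                do { bp[0] = bp[1]; } while ((bp++)->size);
            return a;
        }
    }
    return 0;
}

/* Return [addr, addr + size) to the map, merging with adjacent extents. */
static void map_free(struct map *mp, unsigned size, unsigned addr) {
    struct map *bp = mp;
    while (bp->size && bp->addr <= addr)     /* first extent above addr */
        bp++;
    if (bp > mp && bp[-1].addr + bp[-1].size == addr) {   /* merge below */
        bp[-1].size += size;
        if (bp->size && addr + size == bp->addr) {        /* ...and above */
            bp[-1].size += bp->size;
            do { bp[0] = bp[1]; } while ((bp++)->size);
        }
    } else if (bp->size && addr + size == bp->addr) {     /* merge above */
        bp->addr = addr;
        bp->size += size;
    } else if (size) {                                     /* new extent */
        struct map ins = { size, addr };
        for (;;) {                         /* shift later entries up by one */
            struct map t = *bp;
            *bp++ = ins;
            if (t.size == 0) { *bp = t; break; }
            ins = t;
        }
    }
}
```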

James

James Harris

Jan 14, 2010, 4:50:15 PM
On 17 Dec 2009, 09:56, "Rod Pemberton" <do_not_h...@nohavenot.cmm>
wrote:
...

> > AIUI Unix now only has brk/sbrk in the kernel and
> > malloc etc have been moved to the language domain.
>
> I thought Linux converted to mmap().

Do you mean that malloc calls mmap to allocate more address space
rather than calling brk or sbrk? That would be significant if it does.

>
> > IIRC the original Unix had malloc and free in the kernel. [...]
> > AIUI Unix now only has brk/sbrk in the kernel [...]
>
> I thought either brk or sbrk was in the Unix kernel.

See my reply to Max. There was an sbreak call in the version in Lions'
Commentary.

>
> > (Of course, on a paged machine really it's the address space that gets
> > allocated rather than memory.)
>
> ?

Where paging is in play I would only expect address space to be
allocated by the memory allocator. The physical memory probably won't
be allocated until required by a page fault.
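This is easy to observe on Linux or a BSD. The sketch below (hypothetical function name) maps a region, writes to the first page only, and uses mincore(), which is not part of POSIX, to count resident pages. Typically only the first page is resident; with transparent huge pages the count can be higher, but pages far from the one touched remain absent.

```c
#define _DEFAULT_SOURCE   /* mincore() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `pages` pages, write to the first one only, and ask the kernel which
   pages are resident.  Returns the resident count, or -1 on error. */
static long resident_after_one_touch(size_t pages, int *first, int *last) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    unsigned char *vec = malloc(pages);     /* one status byte per page */
    long count = -1;
    if (vec) {
        p[0] = 1;                           /* page fault: a frame is allocated */
        if (mincore(p, len, vec) == 0) {
            count = 0;
            for (size_t i = 0; i < pages; i++)
                count += vec[i] & 1;        /* bit 0 set = page is resident */
            *first = vec[0] & 1;
            *last = vec[pages - 1] & 1;
        }
        free(vec);
    }
    munmap(p, len);
    return count;
}
```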

>
> > What options are there?
>
> > Option 1. Userspace allocator.
> > * Disadvantage: Memory control structures can be overwritten by a
> > rogue pointer.
>
> safety: prohibit pointers

True. That would work. OK for address spaces containing memory-safe
code.

> safety: insert bounds checking code

Would also work.

> privilege: RW for OS.  RO for user.
> slower (?): less privilege slows code execution

> relocate (?): don't put memory control structures near memory that uses them
> to reduce chance of "rogue" pointer overwrite

Not a guarantee but yes this would be a helpful safeguard. It would
stop there being that little 8-byte or so gap between allocations.
That may or may not be a good thing in terms of performance. Need to
consider cache line contention in innermost loops. Something that I
doubt existing compilers generally do.
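As an illustration of keeping the bookkeeping away from the blocks, here is a fixed-size block pool whose allocation state lives in a separate bitmap. Sizes and names are assumptions. Because the metadata isn't stored next to each block, an overrun from one block cannot corrupt it directly, and free() can reject pointers it never handed out. A static array only puts the bitmap elsewhere in memory; making it truly safe from stray writes would need page protection or a kernel-side copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64
#define NUM_BLOCKS 256

/* The blocks themselves: no per-block header anywhere in here. */
static _Alignas(max_align_t) unsigned char blocks[NUM_BLOCKS][BLOCK_SIZE];
/* Allocation state, one bit per block, kept out of band. */
static uint8_t used[NUM_BLOCKS / 8];

static bool bit_get(size_t i) { return used[i / 8] >> (i % 8) & 1; }

static void bit_set(size_t i, bool v) {
    if (v) used[i / 8] |= (uint8_t)(1u << (i % 8));
    else   used[i / 8] &= (uint8_t)~(1u << (i % 8));
}

static void *block_alloc(void) {
    for (size_t i = 0; i < NUM_BLOCKS; i++)
        if (!bit_get(i)) { bit_set(i, true); return blocks[i]; }
    return NULL;
}

/* Returns false, changing nothing, for pointers outside the pool, not on a
   block boundary, or already free (a double free). */
static bool block_free(void *p) {
    uintptr_t base = (uintptr_t)blocks, addr = (uintptr_t)p;
    if (addr < base || addr >= base + sizeof blocks)
        return false;
    if ((addr - base) % BLOCK_SIZE != 0)
        return false;
    size_t i = (addr - base) / BLOCK_SIZE;
    if (!bit_get(i))
        return false;
    bit_set(i, false);
    return true;
}
```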

> checksum (?): ...
>
> Two selectors with different rights and privilege can cover the same memory
> region, yes?  Different rights and privilege can be set up via page mapping?

Yes
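On the paging side, the same physical pages can indeed be mapped twice with different rights. A minimal POSIX sketch with hypothetical names: an unlinked temporary file backs the memory, one MAP_SHARED view is read-write and the other read-only, and a write through the writable view is visible through the read-only one, while a write through the read-only view would fault. Selectors aren't shown.

```c
#define _DEFAULT_SOURCE   /* mkstemp(), ftruncate() under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two views of the same pages with different access rights. */
typedef struct {
    unsigned char *rw;        /* read-write view */
    const unsigned char *ro;  /* read-only view: writes here would fault */
    size_t len;
} dual_view;

static int dual_map(dual_view *v, size_t len) {
    char path[] = "/tmp/dualviewXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);             /* the pages live on while the mappings exist */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                /* the mappings keep the file alive */
    if (rw == MAP_FAILED || ro == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, len);
        if (ro != MAP_FAILED) munmap(ro, len);
        return -1;
    }
    v->rw = rw;
    v->ro = ro;
    v->len = len;
    return 0;
}

static void dual_unmap(dual_view *v) {
    munmap(v->rw, v->len);
    munmap((void *)v->ro, v->len);
}
```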

> > Option 2. Kernelspace allocator.
> > * Disadvantage: Requires kernel call.
>
> overhead: kernel call vs. user-space call
> less safe (?): more code with more privilege allows for more potential
> breaches, "privilege escalation" attacks

No, more safe. At least, more safe if written properly. IMHO it is
essential that any privileged routine checks *all* passed-in
parameters. Trust no-one. Always check.

> faster (?): fewer privilege related checks via cpu hardware when code runs
> with more privilege

I doubt it. At least I cannot think of any significant examples on
x86.

> Overhead for both should be similar for non-privileged OS.  For privileged
> OS, how much overhead do the SYSENTER/SYSEXIT or similar methods add versus
> a call/jmp instruction?

The overhead should be smaller than people generally expect. Frankly
I'm not sure where fear over system calls comes from unless it's from
horror stories on old architectures.

Some operating systems may save and restore a lot of state
unnecessarily, again giving system calls a bad name.

I never thought of this before but maybe the slow saving of state is
due to writing kernel code in a high level language. Even an optimiser
is not going to know where the call comes from and so pretty much
everything must be saved.

By contrast, my interrupt handling code in assembler is written to
initially save just eax and edx before it decides what to do. If all
it has to do is update counters and other housekeeping and return I
don't think I save anything else. I may save two other 32-bit
registers in some cases. I can't remember off hand. IIRC my thought
was that saving registers in twos to adjacent memory allows them to
take advantage of the CPU's write buffers. You just can't work at that
level of detail in a HLL.
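A rough way to put numbers on the call-versus-syscall question, assuming Linux and glibc: time an ordinary function call against getpid issued through syscall(), which enters the kernel every time. The function names are hypothetical, and the results depend heavily on the CPU, the kernel and any side-channel mitigations, so no figures are given here.

```c
#define _GNU_SOURCE          /* declares syscall() in glibc's <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static long plain(long x) { return x + 1; }
/* Calling through a volatile pointer keeps the compiler from inlining it away. */
static long (*volatile plain_ptr)(long) = plain;

static double elapsed_ns(const struct timespec *t0) {
    struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0->tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0->tv_nsec);
}

/* Average cost per iteration of an ordinary call and of a minimal system call. */
static void compare_call_costs(long iters, double *call_ns, double *sys_ns) {
    struct timespec t0;
    volatile long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink += plain_ptr(i);
    *call_ns = elapsed_ns(&t0) / (double)iters;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink += syscall(SYS_getpid);   /* a real kernel entry and exit each time */
    *sys_ns = elapsed_ns(&t0) / (double)iters;
}
```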

>
> > Option 3. Hybrid - Userspace called first, calls kernel where needed.
> > * Disadvantage: Memory control structures can be overwritten by a
> > rogue pointer.
>
> **AND**
>
> > * Disadvantage: Requires kernel call.
>
> Hybrid should have disadvantages of _both_, yes?  If not, is it hybrid?

No. It does retain at least the speed _advantage_ of user-space
because there would be far fewer calls to privileged mode.

> OK, back to option 3 with both disadvantages:
>
> > Option 3. Hybrid - Userspace called first, calls kernel where needed.
> > * Disadvantage: Memory control structures can be overwritten by a
> > rogue pointer.
> > * Disadvantage: Requires kernel call.
>
> I think you should make a distinction between "hybrid" or mixed and "dual".
> "Dual" being two allocators and "hybrid" being part of allocator in
> user-space and part in kernel-space.  I wasn't quite clear as to what you
> meant.  Some of my comments below are for one, some for the other...
>
> * Disadvantage: Requires two allocators to be coded.
> * Advantage: Userspace gets large blocks from the kernel, manages with own
> allocator.  User-space allocator can be optimized for smaller
> allocations/reclamations.  Kernel space allocator optimized for larger block
> allocations for processes, and kernel memory usage.  User-space allocator
> reduces kernel calls.
>
> It might be useful to push out all allocator code not required to be in the
> kernel to the user space.  That'll probably leave a small amount of code in
> the kernel as well as the overhead of a kernel call.  If in a privileged OS,
> it reduces the amount of code run with more privilege.  Safer?  Having most
> of the allocator in user-space might also ease reimplementation or changing
> of the allocator.  It might also allow different allocators on a per process
> basis.  OTOH, the allocator will be used heavily.  The overhead of many
> kernel calls may be a bottleneck.

The last point is an excellent one. Even though I anticipate a cheaper-
than-normal syscall interface some apps do make very heavy use of
memory allocation and deallocation.

Incidentally, as an aside, I have a 'killer' idea for fast memory
allocation and deallocation requests - but that's off topic here. :-|

I'm not so sure about pushing too much allocator code out for
security. Userspace can only update memory management structures that
rogue pointers can also update. The kernel can update memory
structures that are protected from misbehaving apps.

I'd go with the idea of having different allocators, though. What I'm
writing at the moment distinguishes address spaces. So there could be
a different allocation mechanism per address space and address spaces
would be protected from one another. (They could share memory if
desired which would lessen the protection somewhat.)

>
> > Not sure if it's relevant but the old "text + data .... heap ....
> > stack" model seems inflexible.
>
> How so?

As mentioned, many ways such as

1. multithreading needs multiple stacks

2. the stacks need to be sized, not unlimited until they run into the
heap

3. multiple instances of a process shouldn't require the code to
appear in each one

>
> > With modern apps there may be multiple
> > stacks
>
> It's all just memory.  I see no reason why heap, stack, data, are treated
> distinctly.  If so designed, they could all be allocated from the same
> memory allocator.

I agree.

> What are multiple stacks but one big stack divided?

Well, individual stacks should have limits top and bottom.
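One way to give each stack hard limits at both ends, assuming POSIX mmap()/mprotect(): reserve the region with no access and open up only the middle, leaving an inaccessible guard page below and above, so running off either end faults instead of silently overwriting a neighbour. The names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* A fixed-size stack with one inaccessible guard page at each end. */
typedef struct {
    unsigned char *mapping;   /* whole region, guards included */
    size_t mapping_len;
    unsigned char *lo, *hi;   /* usable stack is [lo, hi); it grows down from hi */
} guarded_stack;

static int stack_create(guarded_stack *s, size_t usable) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    usable = (usable + page - 1) / page * page;   /* whole pages only */
    size_t len = usable + 2 * page;
    unsigned char *m = mmap(NULL, len, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    /* Open up the middle; the first and last pages stay PROT_NONE. */
    if (mprotect(m + page, usable, PROT_READ | PROT_WRITE) != 0) {
        munmap(m, len);
        return -1;
    }
    s->mapping = m;
    s->mapping_len = len;
    s->lo = m + page;
    s->hi = m + page + usable;
    return 0;
}

static void stack_destroy(guarded_stack *s) {
    munmap(s->mapping, s->mapping_len);
}
```

The usable range could then be handed to a thread with pthread_attr_setstack(&attr, s.lo, s.hi - s.lo).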

> > and there are apps which would like to use as much of 4Gbytes
> > as possible on a 32-bit machine.
>
> True.  That can almost be done in a flat memory model.  IIRC, PCI devices
> can be memory mapped way up there.  But with paging, won't you run out of
> space in the page tables, esp. if a page size is 4k, when trying to map all
> that memory?

Can you say a bit more about what you are thinking of here? For paging
on x86 unless playing games with the page tables each process needs a
minimum of two, maybe three pages for the page tables. (I'm including
the page dir in this.) If each page dir entry controls the allocation
of 4Mbytes, that will allow allocation of 4 Meg or 8 Meg of address
space. A maximum 32-bit paging structure is 4Mbytes + 4k, i.e. about a
thousandth of the ram in a 4Gbyte machine. That doesn't sound too bad.
There may be many processes but not all processes would use that much.
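The arithmetic above, worked through for classic 32-bit x86 paging without PAE (assuming 4 KiB pages and 4-byte entries, so 1024 entries per page table or page directory):

```latex
\begin{aligned}
\text{one page table} &: 1024 \times 4\,\text{KiB} = 4\,\text{MiB of address space}\\
\text{tables to cover } 4\,\text{GiB} &: 4\,\text{GiB} / 4\,\text{MiB} = 1024 \text{ tables} \times 4\,\text{KiB} = 4\,\text{MiB}\\
\text{full structure} &: 4\,\text{MiB} + 4\,\text{KiB (page directory)}\\
\frac{4\,\text{MiB} + 4\,\text{KiB}}{4\,\text{GiB}} &= \frac{1025}{1\,048\,576} \approx \frac{1}{1023}
\end{aligned}
```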

>
> > We also sometimes use memory more
> > cleverly by playing games with the page bits. So the simple brk/sbrk
> > now appears to me to be simplistic. Corrections welcome!
>
> Didn't I post links to different allocators?  I have yet to study them.
> But, you mean to say that none of them were a good fit with paging?
> http://groups.google.com/group/alt.os.development/msg/a67a97467bbcfffd

I think I had a look at all of the links that worked but it was brief.
The problem with many documented solutions is that they have already
come up with a complete answer. I don't really want to take on board
too much of what someone else has done - at least until and unless I
get stuck. I might miss something innovative. I am open to individual
ideas, though. Comments made by folks on this newsgroup are often
better for generating new ideas.

James

James Harris

Jan 14, 2010, 5:05:10 PM
On 18 Dec 2009, 10:29, Jake Waskett <j...@waskett.org> wrote:
> On Wed, 16 Dec 2009 15:16:02 -0800, James Harris wrote:
> > Option 1. Userspace allocator.
> > * Disadvantage: Memory control structures can be overwritten by a rogue
> > pointer.
>
> On pretty much any system, userspace programs can cause themselves to
> crash in some way or another, so I wonder whether this is a significant
> disadvantage.  It seems to me that poorly written programs will be
> unreliable in any environment.

True. I mustn't forget that. The problem I have with it is where the
error manifests itself. I don't mind if user code corrupts itself.
That's the author's problem. If I write a memory allocator, though,
and trust the pointers passed in to me, as I suspect frees and reallocs
generally do, I could do all kinds of damage.

>
> In embedded systems, especially those with small memories (think 8-bit
> systems), it can be an advantage to share the allocator code between the
> "kernel" (often just an executive) and the user programs.
>
> I once devised an interesting design in which a real-time kernel actually
> relied entirely on an external thread (executing in the kernel's address
> space context) for allocating the kernel's own data structures.  The
> kernel kept pools of various kinds of objects available for allocation,
> and sent a message to a designated thread (thus waking it up) when a pool
> was empty (or down to the emergency level).  As long as that thread had a
> sufficiently high priority, it was able to donate memory to the pool
> before the kernel had to start failing system calls.

Where did it get the memory from?

>  A similar mechanism
> was used for switching address space contexts - the kernel just scheduled
> threads, and was essentially unaware of the MMU.

I take it the memory management thread ran in its own protected
memory?

James

Rod Pemberton

Jan 14, 2010, 6:25:56 PM
"James Harris" <james.h...@googlemail.com> wrote in message
news:259ee4dd-535e-4786...@m16g2000yqc.googlegroups.com...

> On 17 Dec 2009, 09:56, "Rod Pemberton" <do_not_h...@nohavenot.cmm>
> wrote:
...
> > > AIUI Unix now only has brk/sbrk in the kernel and
> > > malloc etc have been moved to the language domain.
> >
> > I thought Linux converted to mmap().
>
> Do you mean that malloc calls mmap to allocate more address space
> rather than calling brk or sbrk? That would be significant if it does.
>

Yes, that's what I thought... You'll have to search or ask someone to
confirm.

> > > (Of course, on a paged machine really it's the address space that gets
> > > allocated rather than memory.)
> >
> > ?
>
> Where paging is in play I would only expect address space to be
> allocated by the memory allocator. The physical memory probably won't
> be allocated until required by a page fault.

Ah...

> Incidentally, as an aside, I have a 'killer' idea for fast memory
> allocation and deallocation requests - but that's off topic here. :-|

...

> > > Not sure if it's relevant but the old "text + data .... heap ....
> > > stack" model seems inflexible.
> >
> > How so?
>
> As mentioned, many ways such as
>
> 1. multithreading needs multiple stacks
>

Multiple processes can use the same stack. Collision or corruption needs to
be prevented. Using multiple stacks is an "easy" solution for that.

> 2. the stacks need to be sized, not unlimited until they run into the
> heap
>

Has the stack ever run into the heap - unintentionally? How much stack does
an application need? Those questions always run through my mind. I could
see the heap colliding with the stack... But, it seems to me there is
always "more than enough". By "sizing" the stack, I'm assuming you mean
limiting the stack to a specific size. If so, a stack overflow could occur.
Your choices are to "bail out", attempt to recover, or force programmer to
manage the stack via control functions. Nobody wants to do extra coding to
be "stack safe". Recovering from a stack overflow seems more problematic to
me than having an excessively large stack.

> 3. multiple instances of a process shouldn't require the code to
> appear in each one

How do you handle the logistics of executing the same code, in the same
memory space, for multiple data sets in different memory spaces? This
"reeks" of parallelism. You can go there. I don't want to go there. :-)

If you remember some of my past statements, that seems problematic to me.
IMO, for safety, they should be kept separate.

> > > and there are apps which would like to use as much of 4Gbytes
> > > as possible on a 32-bit machine.
> >
> > True. That can almost be done in a flat memory model. IIRC, PCI devices
> > can be memory mapped way up there. But with paging, won't you run out of
> > space in the page tables, esp. if a page size is 4k, when trying to map all
> > that memory?
>
> Can you say a bit more about what you are thinking of here?

Well, my experience is with paging and non-paging DPMI hosts. The
non-paging DPMI hosts can access all memory and all memory is physically
mapped, i.e., flat, base 0, 4GB. With a paging DPMI host using 4KB page
size, AIUI, they can't physically map all memory. Supposedly, with a 4KB
page size, one needs more page tables than can be created to physically map
all memory. With a 4MB page size, they've got no problem physically mapping
all memory.

> For paging
> on x86 unless playing games with the page tables each process needs a
> minimum of two, maybe three pages for the page tables. (I'm including
> the page dir in this.) If each page dir entry controls the allocation
> of 4Mbytes that will allow allocation of 8 Meg or 12 Meg of address
> space. A maximum 32-bit paging structure is 4Mbytes + 4k, i.e. about a
> thousandth of the ram in a 4Gbyte machine. That doesn't sound too bad.
> There may be many processes but not all processes would use that much.

Ok.


Rod Pemberton


James Harris

Jan 19, 2010, 12:19:43 PM
On 14 Jan, 23:25, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> > > > Not sure if it's relevant but the old "text + data .... heap ....
> > > > stack" model seems inflexible.
>
> > > How so?
>
> > As mentioned, many ways such as
>
> > 1. multithreading needs multiple stacks
>
> Multiple processes can use the same stack.

They can? As much as I run that phrase over in my mind I can't see
what you mean. Perhaps you mean with cooperative multitasking. When
preempting AFAICS you need a stack per thread.

>  Collision or corruption needs to
> be prevented.  Using multiple stacks is an "easy" solution for that.

I suppose it could be called easy. I'm not aware of another *sensible*
way. What other way did you have in mind?

>
> > 2. the stacks need to be sized, not unlimited until they run into the
> > heap
>
> Has the stack ever run into the heap - unintentionally?  How much stack does
> an application need?  Those questions always run through my mind.  I could
> see the heap colliding with the stack...  But, it seems to me there is
> always "more than enough".

True, this is probably nearly always the case on a PC system because
there is so much address space available and we don't generally write
apps with such a large footprint. I have run an app which simulates
routers, though, and, for it, memory is limited.

>  By "sizing" the stack, I'm assuming you mean
> limiting the stack to a specific size.

Yes. If there are to be multiple threads in a process doesn't
something need to say how large a stack will be? It could be the app
or it could be a default. You could think of this either as a stack
limit or a minimum separation between stacks.

>  If so, a stack overflow could occur.
> Your choices are to "bail out", attempt to recover, or force programmer to
> manage the stack via control functions.  Nobody wants to do extra coding to
> be "stack safe".  Recovering from a stack overflow seems more problematic to
> me than having an excessively large stack.

Sure. It depends on how many threads and stacks exist in the same
memory space.

>
> > 3. multiple instances of a process shouldn't require the code to
> > appear in each one
>
> How do you handle the logistics of executing the same code, in the same
> memory space, for multiple data sets in different memory spaces?  This
> "reeks" of parallelism.  You can go there.  I don't want to go there.  :-)
>
> If you remember some of my past statements, that seems problematic to me.
> IMO, for safety, they should be kept separate.

Yes, we have discussed stuff like this. Let's not go back there!

...

> > Can you say a bit more about what you are thinking of here?
>
> Well, my experience is with paging and non-paging DPMI hosts.  The
> non-paging DPMI hosts can access all memory and all memory is physically
> mapped, i.e., flat, base 0, 4GB.  With a paging DPMI host using 4KB page
> size, AIUI, they can't physically map all memory.  Supposedly, with a 4KB
> page size, one needs more page tables than can be created to physically map
> all memory.  With a 4MB page size, they've got no problem physically mapping
> all memory.

Maybe an issue with the DPMI implementation? I think that for a 4Gbyte
address space the max page tables size is 4Mbytes + 4k. This is only
1/1000th of the address space. The proportion of the RAM will
naturally be dependent on how much RAM there is in the box. A smaller
set of page tables will be required for most processes.
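To make the arithmetic concrete, here is a small C sketch. It assumes classic 32-bit x86 paging with 4 KiB pages and no PAE (PAE or 4 MiB PSE pages change the numbers). One page directory is always needed, plus one 4 KiB page table for each 4 MiB of address space actually mapped. The function name `paging_overhead` is hypothetical.

```c
#include <stdint.h>

/* Classic 32-bit x86 paging (no PAE): 4 KiB pages, 1024 entries per table. */
#define PAGE_SIZE        4096u
#define ENTRIES_PER_TBL  1024u
#define BYTES_PER_PT     (PAGE_SIZE * ENTRIES_PER_TBL)  /* one table maps 4 MiB */

/* Bytes of paging structures needed to map `span` bytes of address space:
   one page directory plus one page table per 4 MiB (rounded up). */
uint64_t paging_overhead(uint64_t span)
{
    uint64_t tables = (span + BYTES_PER_PT - 1) / BYTES_PER_PT;
    return PAGE_SIZE /* page directory */ + tables * PAGE_SIZE;
}
```

For the full 4 GiB this gives 1024 tables plus the directory, i.e. 4 MiB + 4 KiB, roughly 1/1024 of the address space. A process that maps only 12 MiB needs three tables plus the directory, 16 KiB in all. With 4 MiB (PSE) pages a directory entry maps a whole 4 MiB region directly, so no page tables are needed at all.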

I'm not sure where DPMI comes into the OS equation. Maybe booting or
testing? As you know, IMHO bare metal may be easier than beginning
from a DOS system. There's more to do but less baggage left over to
learn about. Sorry if I'm making some wrong assumptions about your
plans here.

James

Rod Pemberton

Jan 19, 2010, 8:45:26 PM
"James Harris" <james.h...@googlemail.com> wrote in message
news:65768760-8aa5-45b6...@34g2000yqp.googlegroups.com...

On 14 Jan, 23:25, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> > > > Not sure if it's relevant but the old "text + data .... heap ....
> > > > stack" model seems inflexible.
>
> > > How so?
>
> > As mentioned, many ways such as
>
> > 1. multithreading needs multiple stacks
>
> > Multiple processes can use the same stack.
>
> They can? As much as I run that phrase over in my mind I can't see
> what you mean. Perhaps you mean with cooperative multitasking. When
> preempting AFAICS you need a stack per thread.
>

Are you thinking single core? With multiple cores, I can see the need for
each core to have its own stack. But, with a single core, there is only
one location where code is being executed at any point in time. So, only
one process/thread is executing at a time. I.e., it should be possible for
multiple processes to use the same stack since only one process etc. can
access the stack at a time. Each process uses the stack, cleans up what it
did so the stack is returned to the state used by the prior process. Even with
interrupts or preempting this should work with multiple processes.

(My current OS is written almost entirely in C. It only has one stack for
everything, interrupts included. I.e., if it's a problem, I haven't found
it yet...)

> > Collision or corruption needs to
> > be prevented. Using multiple stacks is an "easy" solution for that.
>
> I suppose it could be called easy. I'm not aware of another *sensible*
> way. What other way did you have in mind?

Oh, I never said *sensible*... :-) The whole point of coding your own OS
is to do things your way. Isn't it? E.g., I know the layout for my
keyboard tables and their shift states isn't sensible, in terms of speed.
But, it's organized.

IMO, as long as each process only accesses its data stored on the stack,
there is no problem. Implementation details left to the reader.

Let's take C. C uses the same stack for storage and retrieval of both data
and control flow information. Data being the parameters and locals. Each
of these activities could be viewed as being done by a separate process.
If we view them as processes, the "control flow" process puts control flow
information onto the stack which is woven in-between the data information
that the "data flow" process puts onto the stack. By design, each "process" is
kept separate or only accesses its data on the stack. This works because
they are "in-sync", or ping-pong, so they don't overwrite each other's
information. I.e., only one active user of the stack at a time.

> If there are to be multiple threads in a process doesn't
> something need to say how large a stack will be? It could be the app
> or it could be a default. You could think of this either as a stack
> limit or a minimum separation between stacks.

Say there is. The thread ran into the wall. What do you do?

1) Do you stop the thread? If the stack limit is a hard boundary, it can't
allocate more space on the stack. But, it needs more stack space to
continue executing. If you stop threads because they keep hitting the stack
boundaries, no work gets done. There's no point in executing anything. So,
you don't want to stop threads. (Except when the user selects to terminate
them.) You need a bigger stack.

2) Do you allocate more stack space? If you do, your stack is disjointed or
non-contiguous. I.e., a stack pointer which is following the stack elements
as a contiguous array won't be able to cross the boundary or gap created by
two different memory allocations. You could move, reallocate, and copy the
stack. Ah, but you've got paging enabled, so you remap memory. But now
the code pointers and addresses to the data that was in the region which was
remapped are incorrect. You changed their addresses via remapping. How do
you correct these? Since you don't know which memory is going to get
remapped to increase a stack, every pointer or address will have to be an
indirect pointer. You need tables of indirect pointers. Let's hope you
didn't locate the indirect pointer tables anywhere near a stack. If it gets
relocated...

What's *sensible*?... ;-)

> As you know, IMHO bare metal may be easier than beginning
> from a DOS system.

Well, I recently tested/reworked some Multi-Boot (e.g., GRUB bootable)
assembly code to start up a C routine. It worked. So, I grafted it onto my
OS. I *almost* got my OS to build Multi-Boot versions. Since they didn't
build, I couldn't check to see if the header was correctly located. I had
to change the build sequence to manually link in the assembly code. That
caused a slight problem. Manually linking doesn't link to the default
compiler libraries. And, I'm using quite a few functions in the default
libraries. I.e., missing functions... I'm hoping to either 1) figure out
how to specify the default C libraries or 2) rewrite the OS to be
independent of the C libraries, which needs to be done anyway. #1 should be
fairly easy for DJGPP, hopefully. I'd like to get that working before I do
#2 (rewrite). I think #1 might be a problem for OpenWatcom... If I ever
get my compiler working, that won't be needed. In which case, I'll need to
do another rewrite to convert to my compiler. Unfortunately, that may take
as long as the OS. So, I'll probably still be developing in parallel.


Rod Pemberton

Alexei A. Frounze

Jan 20, 2010, 5:49:15 AM
On Jan 19, 5:45 pm, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> Well, I recently tested/reworked some Multi-Boot (e.g., GRUB bootable)
> assembly code to start up a C routine.  It worked.  So, I grafted it onto my
> OS.  I *almost* got my OS to build Multi-Boot versions.  Since they didn't
> build, I couldn't check to see if the header was correctly located.  I had
> to change the build sequence to manually link in the assembly code.  That
> caused a slight problem.  Manually linking doesn't link to the default
> compiler libraries.  And, I'm using quite a few functions in the default
> libraries.  I.e., missing functions...  I'm hoping to either 1) figure out
> how to specify the default C libraries or 2) rewrite the OS to be
> independent of the C libraries, which needs to be done anyway.  #1 should be
> fairly easy for DJGPP, hopefully.  I'd like to get that working before I do
> #2 (rewrite).  I think #1 might be a problem for OpenWatcom...  If I ever
> get my compiler working, that won't be needed.  In which case, I'll need to
> do another rewrite to convert to my compiler.  Unfortunately, that may take
> as long as the OS.  So, I'll probably still be developing in parallel.
>
> Rod Pemberton

Standard C functions like string functions are relatively trivial to
implement. sprintf() is OK too, unless you go into things like floats
and *. Sometimes it's even hard to implement them in a significantly
different manner than in the existing libraries. I wrote or partially
borrowed a replacement subset of standard functions for Turbo C and
Watcom. Pretty easy, especially if you don't need all the weird things for
full compatibility. I had the following:
abort
access
atexit
chdir
chmod
closedir
fclose
fcloseall
feof
fflush
fgetc
fgetpos
fgets
flushall
fnmatch
fopen
fputc
fputs
fread
free
fseek
fsetpos
ftell
fwrite
getch
getcwd
gmtime
localtime
longjmp
malloc
memchr
memcmp
memcpy
memset
mkdir
mktime
movedata
opendir
printch
printf
pwd
readdir
remove
rewind
rewinddir
rmdir
setjmp
sprintf
stat
strcat
strchr
strcmp
strcpy
strcspn
strlen
strlwr
strncat
strncmp
strncpy
strpbrk
strrchr
strspn
strstr
strtok
strupr
time
unlink
utime
vsprintf

Sounds like a whole lot, but it's all very doable and is about enough
to make a teeny tiny OS running. My FAT demo used nothing from the
supplied standard libraries but header files and a few hard-coded or
expected to be such and such subroutines (copy/shift/mul/div for
Borland/Turbo C/C++ and mul/div for Open Watcom).
I believe you can do it too. :)
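As a rough illustration of how small these can be, here is a hedged C sketch of a few of them, plus the integer-to-text core that a minimal sprintf() needs for %u and %x. It is not Alex's code. The `k_` prefix is a hypothetical naming choice so the functions don't collide with the host library while testing.

```c
#include <limits.h>
#include <stddef.h>

/* Byte-at-a-time versions: slow but obviously correct. */
void *k_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

void *k_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

size_t k_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Compares as unsigned char, as the standard strcmp does. */
int k_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Unsigned value to text in `base` (caller must pass 2..16). Digits are
   produced least significant first, then reversed into `buf`. */
char *k_utoa(unsigned long v, char *buf, unsigned base)
{
    char tmp[sizeof v * CHAR_BIT + 1];
    int i = 0;
    do {
        tmp[i++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v);
    char *out = buf;
    while (i)
        *out++ = tmp[--i];
    *out = '\0';
    return buf;
}
```

One caveat when dropping the host library: with `gcc -ffreestanding`, GCC still expects memcpy, memmove, memset and memcmp to exist and may emit calls to them itself, so a kernel has to provide those exact names as well.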

Alex

James Harris

Jan 20, 2010, 8:37:49 AM
On 20 Jan, 01:45, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> > > 1. multithreading needs multiple stacks
>
> > > Multiple processes can use the same stack.
>
> > They can? As much as I run that phrase over in my mind I can't see
> > what you mean. Perhaps you mean with cooperative multitasking. When
> > preempting AFAICS you need a stack per thread.
>
> Are you thinking single core?  With multiple cores, I can see the need for
> each core to have its own stack.  But, with a single core, there is only
> one location where code is being executed at any point in time.  So, only
> one process/thread is executing at a time.  I.e., it should be possible for
> multiple processes to use the same stack since only one process etc. can
> access the stack at a time.  Each process uses the stack, cleans up what it
> did so the stack is returned to the state used by the prior process.  Even with
> interrupts or preempting this should work with multiple processes.

Either single or multiple cores. For example, what about the call
chain? Say function A called B called C. The return addresses (and
normally activation records too) would be on the stack. If you preempt
the process at this point and switch to another process what happens?
Assume the second process tries to return from a called function. Does
it return to process 1's B routine?
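The problem can be shown with a toy model in C, where each stack frame is tagged with the process that pushed it. In the shared case, process 2 was preempted earlier inside one of its own calls; process 1 then makes two calls on the same stack and is preempted; when process 2 resumes and returns, it pops process 1's frame. With one stack per task the scheduler just switches the stack pointer, and each return finds its own frame. The model and names are illustrative assumptions, not real scheduler code.

```c
/* Toy model of a call stack: each frame records which process pushed it. */
#define DEPTH 16

typedef struct { int owner; } frame;
typedef struct { frame f[DEPTH]; int sp; } stack_model;

static void do_call(stack_model *s, int owner) { s->f[s->sp++].owner = owner; }
static int  do_return(stack_model *s)          { return s->f[--s->sp].owner; }

/* One stack shared by both processes. Returns the owner of the frame that
   process 2's return pops. */
int shared_stack_demo(void)
{
    stack_model s = { .sp = 0 };
    do_call(&s, 2);        /* P2 calls a function, then is preempted */
    do_call(&s, 1);        /* P1: A calls B */
    do_call(&s, 1);        /* P1: B calls C, then is preempted */
    return do_return(&s);  /* P2 resumes and returns: pops P1's frame -> 1 */
}

/* One stack per process; the scheduler swaps stack pointers. */
int per_task_stack_demo(void)
{
    stack_model p1 = { .sp = 0 }, p2 = { .sp = 0 };
    do_call(&p2, 2);
    do_call(&p1, 1);
    do_call(&p1, 1);
    return do_return(&p2); /* P2 returns on its own stack -> 2 */
}
```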

>
> (My current OS is written almost entirely in C.  It only has one stack for
> everything, interrupts included.  I.e., if it's a problem, I haven't found
> it yet...)

I can see that interrupts should be OK as long as they clean up the
stack on exit; though, normally, interrupts run privileged while the
apps they interrupt run non-privileged - i.e. on separate stacks. In
x86 terms they have PL0 and PL3 stacks respectively.
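For reference, this is the mechanism that supplies the PL0 stack on x86. In 32-bit protected mode, when an interrupt or trap moves the CPU from ring 3 to ring 0, the CPU loads SS:ESP from the ss0/esp0 fields of the current TSS, so the handler never runs on the application's PL3 stack. Below is a sketch of the 32-bit TSS layout as a C struct. The helper name is hypothetical; a kernel that uses a single TSS would call something like it on every task switch.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit x86 task-state segment, laid out as in the Intel SDM. */
struct tss32 {
    uint32_t link;                    /* previous-task link (low 16 bits) */
    uint32_t esp0, ss0;               /* ring-0 stack (ss0: low 16 bits) */
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs, ldt;  /* selectors in low 16 bits */
    uint16_t trap;                    /* bit 0: debug trap on task switch */
    uint16_t iomap_base;              /* offset of the I/O permission bitmap */
};

_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss32, esp0) == 4, "esp0 at offset 4");
_Static_assert(offsetof(struct tss32, ss0) == 8, "ss0 at offset 8");

/* Hypothetical helper: before resuming a user-mode task, point esp0 at
   the top of that task's kernel stack so its next interrupt lands there. */
void tss_set_kernel_stack(struct tss32 *tss, uint16_t kernel_ss,
                          uint32_t kernel_esp)
{
    tss->ss0  = kernel_ss;
    tss->esp0 = kernel_esp;
}
```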

At a guess: is your whole OS currently single threaded (like DOS)? Or
do you have a preemptive scheduler that switches between processes?

>
> > > Collision or corruption needs to
> > > be prevented. Using multiple stacks is an "easy" solution for that.
>
> > I suppose it could be called easy. I'm not aware of another *sensible*
> > way. What other way did you have in mind?
>
> Oh, I never said *sensible*...  :-)   The whole point of coding your own OS
> is to do things your way.  Isn't it?

Well, yes but to a point. :-)

Seriously, I agree. We can do it whatever way makes sense to us. I'm
just trying to understand how you can say, "Multiple processes can use
the same stack."

>  E.g., I know the layout for my
> keyboard tables and their shift states isn't sensible, in terms of speed.
> But, it's organized.
>
> IMO, as long as each process only accesses its data stored on the stack,
> there is no problem.  Implementation details left to the reader.
>
> Let's take C.  C uses the same stack for storage and retrieval of both data
> and control flow information.  Data being the parameters and locals.  Each
> of these activities could be viewed as being done by a separate process.
> If we view them as processes, the "control flow" process puts control flow
> information onto the stack which is woven in-between the data information
> that the "data flow" process puts onto the stack.  By design, each "process" is
> kept separate or only accesses its data on the stack.  This works because
> they are "in-sync", or ping-pong, so they don't overwrite each other's
> information.  I.e., only one active user of the stack at a time.

Cooperative multitasking (not preemptive) where each process cleans up
the stack before it will switch to another? Basically coroutines?
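If that is the model, one way to picture it is a run-to-completion scheduler. Each task is a step function that does a bounded piece of work and returns, so its frames are gone before the next task runs, and a single stack really is enough. The catch is that any state that must survive between steps has to live in the task structure, not in locals. Here is a hedged C sketch; all names are hypothetical.

```c
#include <stdbool.h>

#define MAX_TASKS 16   /* sketch limit */

/* A task is a step function plus the state it needs between steps.
   Each call does a bounded amount of work and RETURNS, so all of its
   frames are popped before the next task runs: one stack suffices. */
typedef struct task {
    bool (*step)(struct task *self);  /* returns false when finished */
    int counter;                      /* state kept here, not in locals */
    int limit;
} task;

/* Example step: count up to `limit`, one increment per turn. */
bool count_step(task *t)
{
    t->counter++;
    return t->counter < t->limit;
}

/* Round-robin until every task has finished. Returns the number of steps
   run, or -1 if there are too many tasks for this sketch. */
int run_all(task *tasks, int n)
{
    bool done[MAX_TASKS] = { false };
    int steps = 0, live = n;
    if (n > MAX_TASKS)
        return -1;
    while (live > 0) {
        for (int i = 0; i < n; i++) {
            if (done[i])
                continue;
            steps++;
            if (!tasks[i].step(&tasks[i])) {
                done[i] = true;
                live--;
            }
        }
    }
    return steps;
}
```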

>
> > If there are to be multiple threads in a process doesn't
> > something need to say how large a stack will be? It could be the app
> > or it could be a default. You could think of this either as a stack
> > limit or a minimum separation between stacks.
>
> Say there is.  The thread ran into the wall.  What do you do?

Kill it. More accurately, raise an exception which would normally lead
to termination of the thread unless that exception had been trapped.
The trap handler would need to run without extra stack space which
could be difficult.
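On a hosted POSIX system, the "handler has no stack left" problem is usually solved with an alternate signal stack. The overflow faults on the guard page below the stack, and the SIGSEGV handler is told (SA_ONSTACK) to run on a separate, pre-allocated stack registered with sigaltstack(). The sketch below assumes a Linux/glibc-style POSIX environment and simply terminates, i.e. "kill it". The function names are hypothetical. In a 32-bit x86 kernel, the rough analogue is taking the fault through a task gate, which switches to a different TSS and therefore a different stack.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Runs on the alternate stack, so it works even when the thread's own
   stack is exhausted. Only async-signal-safe calls are used here. */
static void on_fault(int sig)
{
    static const char msg[] = "fatal: stack overflow or bad access\n";
    (void)sig;
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}

/* Returns 0 on success. The alternate stack is per-thread, so each
   thread that wants this protection must call it. */
int install_overflow_trap(void)
{
    stack_t ss;
    ss.ss_size  = SIGSTKSZ;
    ss.ss_sp    = malloc(ss.ss_size);
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL)
        return -1;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }

    struct sigaction sa;
    sa.sa_handler = on_fault;
    sa.sa_flags   = SA_ONSTACK;   /* run the handler on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```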

>
> 1) Do you stop the thread?  If the stack limit is a hard boundary, it can't
> allocate more space on the stack.  But, it needs more stack space to
> continue executing.  If you stop threads because they keep hitting the stack
> boundaries, no work gets done.  There's no point in executing anything.  So,
> you don't want to stop threads.  (Except when the user selects to terminate
> them.)  You need a bigger stack.

Either the stack is unbounded or it has a limit. Allowing the thread
to specify its stack size should give a guarantee of address space
available.

>
> 2) Do you allocate more stack space?  If you do, your stack is disjointed or
> non-contiguous.  I.e., a stack pointer which is following the stack elements
> as a contiguous array won't be able to cross the boundary or gap created by
> two different memory allocations.  You could move, reallocate, and copy the
> stack.  Ah, but you've got paging enabled, so you remap memory.  But now
> the code pointers and addresses to the data that was in the region which was
> remapped are incorrect.  You changed their addresses via remapping.  How do
> you correct these?  Since you don't know which memory is going to get
> remapped to increase a stack, every pointer or address will have to be an
> indirect pointer.  You need tables of indirect pointers.  Let's hope you
> didn't locate the indirect pointer tables anywhere near a stack.  If it gets
> relocated...

As I think you are saying, paging allows remapping of physical memory
but not the addresses that are seen by programs.

I've never considered a discontiguous stack. I'm not sure I want to!

*If* pointers to the stack are allowed (as in C) the stack has to stay
where it is put and cannot be unbounded. If it runs into other memory, the
thread should be killed; but the process should be allowed to specify the
stack size it needs so that it can avoid an overflow.

If pointers to the stack are not allowed there may be other
possibilities....
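Here is a sketch of "let the process specify the size" with a hard wall. It reserves exactly the requested stack (rounded up to whole pages) and makes the page just below it inaccessible. Running past the bottom then faults immediately instead of corrupting neighbouring memory. This uses POSIX mmap/mprotect on a hosted system and assumes a downward-growing stack, as on x86. The type and function names are hypothetical.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* A fixed-size, downward-growing stack with one guard page below it. */
typedef struct {
    void  *base;   /* start of the mapping, i.e. the guard page */
    size_t total;  /* guard page + usable bytes */
    void  *lo;     /* lowest usable byte */
    void  *top;    /* one past the highest usable byte: initial stack pointer */
} guarded_stack;

/* Returns 0 on success, -1 on failure. `usable` is rounded up to pages. */
int guarded_stack_alloc(guarded_stack *gs, size_t usable)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    usable = (usable + page - 1) / page * page;
    size_t total = usable + page;

    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    /* Any push past the bottom of the stack now faults instead of
       silently overwriting whatever lies below. */
    if (mprotect(p, page, PROT_NONE) != 0) {
        munmap(p, total);
        return -1;
    }
    gs->base  = p;
    gs->total = total;
    gs->lo    = (char *)p + page;
    gs->top   = (char *)p + total;
    return 0;
}

void guarded_stack_free(guarded_stack *gs)
{
    munmap(gs->base, gs->total);
}
```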

>
> What's *sensible*?...   ;-)
>
> > As you know, IMHO bare metal may be easier than beginning
> > from a DOS system.
>
> Well, I recently tested/reworked some Multi-Boot (e.g., GRUB bootable)
> assembly code to start up a C routine.  It worked.  So, I grafted it onto my
> OS.  I *almost* got my OS to build Multi-Boot versions.  Since they didn't
> build, I couldn't check to see if the header was correctly located.  I had
> to change the build sequence to manually link in the assembly code.  That
> caused a slight problem.  Manually linking doesn't link to the default
> compiler libraries.  And, I'm using quite a few functions in the default
> libraries.  I.e., missing functions...  I'm hoping to either 1) figure out
> how to specify the default C libraries or 2) rewrite the OS to be
> independent of the C libraries, which needs to be done anyway.  #1 should be
> fairly easy for DJGPP, hopefully.  I'd like to get that working before I do
> #2 (rewrite).  I think #1 might be a problem for OpenWatcom...  If I ever
> get my compiler working, that won't be needed.  In which case, I'll need to
> do another rewrite to convert to my compiler.  Unfortunately, that may take
> as long as the OS.  So, I'll probably still be developing in parallel.

I presume that means adding Grub's multiboot header

http://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format

into the compiled and linked output. Can you add the multiboot header
as part of the code segment - even as assembler declared storage
embedded in the C if necessary? It seems it only needs to be in the
first 8k.
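The Multiboot spec permits that. The header is just three 32-bit words (magic 0x1BADB002, flags, checksum). It must be 4-byte aligned and lie entirely within the first 8192 bytes of the image. Below is a hedged C sketch that uses GCC's section attribute so a linker script can place the header first, for example with `.text : { *(.multiboot) *(.text*) }` inside the SECTIONS block of a GNU ld script. The names are my own. For an ELF image, the extra address fields enabled by flag bit 16 (the a.out kludge) are not needed, so these three words suffice.

```c
#include <stdint.h>

#define MB_HEADER_MAGIC 0x1BADB002u
#define MB_PAGE_ALIGN   (1u << 0)   /* align boot modules on 4 KiB boundaries */
#define MB_MEMORY_INFO  (1u << 1)   /* ask for mem_* fields in the info struct */
#define MB_FLAGS        (MB_PAGE_ALIGN | MB_MEMORY_INFO)

struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;   /* magic + flags + checksum == 0 (mod 2^32) */
};

/* Own section so the linker script can put it first; `used` keeps the
   compiler from discarding an apparently unreferenced object. */
__attribute__((section(".multiboot"), aligned(4), used))
const struct multiboot_header mb_header = {
    MB_HEADER_MAGIC,
    MB_FLAGS,
    (uint32_t)-(MB_HEADER_MAGIC + MB_FLAGS),
};
```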

Using Grub is only one step back from using DOS. You still need to
cope with Grub's way of doing things. Though when your kernel starts
you do at least have a bare machine, AIUI, without DOS having hooked
interrupt routines and programmed hardware.

I'm surprised you have no way to include compiler libraries when
manually linking. Are there no command line options on your linker
which allow that? It's interesting that you have managed to call
standard C library functions. IIRC Linux calls specialised versions
such as printk or similar because it has no access to clib. That comes
later. I've not even started the high level language parts yet mainly
due to considering how to link them - maybe similar to the issue you
are having.

James

Rod Pemberton

Jan 20, 2010, 11:56:45 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:748bece3-3d9f-4448...@k17g2000yqh.googlegroups.com...

On 20 Jan, 01:45, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> > > > 1. multithreading needs multiple stacks
> >
> > > > Multiple processes can use the same stack.
> >
> > > They can? As much as I run that phrase over in my mind I can't see
> > > what you mean. Perhaps you mean with cooperative multitasking. When
> > > preempting AFAICS you need a stack per thread.
> >
> > Are you thinking single core? With multiple cores, I can see the need for
> > each core to have its own stack. But, with a single core, there is only
> > one location where code is being executed at any point in time. So, only
> > one process/thread is executing at a time. I.e., it should be possible for
> > multiple processes to use the same stack since only one process etc. can
> > access the stack at a time. Each process uses the stack, cleans up what it
> > did so the stack is returned to the state used by the prior process. Even with
> > interrupts or preempting this should work with multiple processes.
>
> Either single or multiple cores. For example, what about the call
> chain? Say function A called B called C. The return addresses (and
> normally activation records too) would be on the stack. If you preempt
> the process at this point and switch to another process what happens?
> Assume the second process tries to return from a called function. Does
> it return to process 1's B routine?

Yes, you're correct. Preemptive switching could mess up the control flow.
A possible fix might be to mix preemptive and cooperative multi-tasking.
I.e., the pre-emptive interrupt could set a flag and quickly return, then
the code could cooperatively switch conditionally based on the flag state.

> At a guess: is your whole OS currently single threaded (like DOS)?

Did you even need to ask? :-) It's very basic and incomplete. There is no
task or process switching, except interrupts and procedure calls, if those
qualify. There is no ability to execute non-kernel code. It's PL0, flat,
non-paging.

> Either the stack is unbounded or it has a limit. Allowing the thread
> to specify its stack size should give a guarantee of address space
> available.

Ah, assuming the programmer, compiler, user, or thread is smart, or
sufficiently smart to correctly determine that.

I've got no idea how many nested levels of procedure calls there are in C
code of mine. I'd have to profile it to determine a realistic number. It
could be 3. It could be 50000. I.e., if it was my OS and my code, I could
make a guess, but I'd have no idea whether that guess was way to the left,
way to the right, just about right, etc. I could tweak it until most stuff
seems to work. But, then I could predict that as soon as I started to run
something more intensive, I'll be tweaking again.

> I presume that means adding Grub's multiboot header

Assembly setup and MB header for DJGPP, yes. DJGPP needs the a.out "kludge"
settings. OpenWatcom produces ELF which Grub boots without need of a
header.

> Can you add the multiboot header
> as part of the code segment

Supposedly, one can use a linker script to control how the objects are
combined. I.e., the linker script should allow one to direct the MB header
to be placed at the start of the file.

I don't know if you recall, but I was trying to get by without using linker
scripts, since I was using two compilers. I wanted the compiler to put the
header where it needed to be without extra complexity. I had problems
getting the header below 8k. But, that was without the assembly or entry
point. That might be the critical difference.

I know that the small sized example I tested works. I don't know if the
much larger OS will end up having the header down low or not after linking,
since I didn't get it to build. The linker might put it somewhere else. My
thoughts were that since the assembly code with the MB header contains the
entry address that the linker(s) would likely place the code low.

I previously tried to control the placement by alphabetical naming as
suggested on a few forums. That didn't work. My final two solutions, if
the other stuff doesn't work, are 1) stub program 2) linker script.

> ... - even as assembler declared storage
> embedded in the C if necessary?

? Even with the first part of the sentence, I'm not sure what you're asking.

> It seems it only needs to be in the
> first 8k.

Yes.

> Using Grub is only one step back from using DOS.

...

> You still need to
> cope with Grub's way of doing things.

How is that any different than some other bootloader?

If I were to complete some of my assembly code and create a bootloader, I'd
basically have two flavors: simplest and safest. Either way, the OS the
bootloader is attempting to start has to cope with what was done before it
was started.

The only real issue I've seen so far with Multi-boot is how to get the MB
header below 8k with C code. With a small assembly file it's easy, just put
it where you want it. For C, you can supposedly use a linker script to
fix that.

> You still need to
> cope with Grub's way of doing things.

Multi-boot's startup state is pretty clean:

32-bit PM
CS code selector
DS=ES=FS=GS=SS data selector
base 0
flat memory
4GB range
EAX is set to a magic value
EBX is set to point to a struct
A20 enabled
CR0.PG clear (no paging)
CR0.PE set (PM)
EFLAGS IF and VM clear
OS must set up a new GDT, an IDT, and ESP (and/or stack)
no modifications to the IVT and PICs
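The EAX and EBX items are what the kernel's C code typically checks first. Below is a hedged sketch. It assumes the assembly entry stub pushes EBX and EAX and then calls into C with them as arguments. Only the leading fields of the Multiboot information structure are declared, and the function name is hypothetical.

```c
#include <stdint.h>

#define MB_BOOTLOADER_MAGIC 0x2BADB002u  /* value the loader leaves in EAX */
#define MB_INFO_MEMORY      (1u << 0)    /* mem_lower/mem_upper are valid */

/* Leading fields of the structure EBX points to; the real structure
   continues with more fields that this sketch does not read. */
struct mb_info_prefix {
    uint32_t flags;
    uint32_t mem_lower;   /* KiB of memory starting at address 0 */
    uint32_t mem_upper;   /* KiB of memory starting at 1 MiB */
};

/* Returns the reported KiB (lower + upper), or 0 if we were not loaded by
   a Multiboot loader or no memory information was supplied. */
uint32_t mb_memory_kib(uint32_t eax_magic, const struct mb_info_prefix *info)
{
    if (eax_magic != MB_BOOTLOADER_MAGIC || info == 0)
        return 0;
    if (!(info->flags & MB_INFO_MEMORY))
        return 0;
    return info->mem_lower + info->mem_upper;
}
```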

> Though when your kernel starts
> you do at least have a bare machine, AIUI, without DOS having hooked
> interrupt routines and programmed hardware.

The code is 32-bit, i.e., no need for 16-bit interrupt routines, i.e., IVT,
after a certain point. If I get it to boot via Grub, then I'll get to see
if I missed something, like PIC or EOI etc, that DOS could've touched. I
reprogram everything that I've implemented so far. Hopefully, that's a
decent hardware bug filter. I know that I missed a register in my video
code, etc.

> I'm surprised you have no way to include compiler libraries when
> manually linking. Are there no command line options on your linker
> which allow that?

Uh, let's make sure we're on the same page. If I ever get my compiler to
work, I've got no method to link. At this point, it'd be cut-n-paste text
together into a single file for an assembler... By "manually", I meant
linking with DJGPP's or OpenWatcom's linker but compiling the C code and
linking via separate commands. E.g., instead of a single command
(simplified):

gcc -o OS.exe OS.c

"manually":

gcc -c OS.c
ld -o blah blah blah

DJGPP (GCC, LD) has a flag to pass the directory of a library (-L) and another
to pass the library name (-l). But, I don't know which library/libraries. And, the
path length with all the other compiler stuff was too long for the command
line. It might be fine in a .bat. OpenWatcom's linker uses an odd
scripting language I haven't been able to locate good info on.

> It's interesting that you have managed to call
> standard C library functions.

I'm not using my compiler to compile my OS yet. My OS still needs the DJGPP
and/or OpenWatcom compilers until I rewrite it a few times. The compilers
DJGPP and OpenWatcom are linking my OS to their C libraries. My OS, since I
was starting it from DOS, was compiled as a normal 32-bit DPMI executable.
For DJGPP and OW, the compilers use default settings to pull in the standard
C libraries for linking an executable. When you compile "manually", the
libraries aren't automatically included. It's like declaring no standard
libraries or freestanding.

> IIRC Linux calls specialised versions
> such as printk or similar because it has no access to clib. That comes
> later.

I'm currently using a few C library functions which are independent of the
OS.
...

Ok, I'm using 17. 2 aren't needed - used in debugging. 2 are in, out to
ports. 2 are compiler's specific version of cli, sti. (OpenWatcom clears
direction flag...) 5 are string functions: 4 simple + sprintf. 6 are for
DJGPP related C code and/or DOS startup. That probably means 3 or so for OW
compiler. I'll have to review those. A few will get removed for
Multi-boot. The OS might be dependent on the others... :-( I may need to
code equivalents.


Rod Pemberton


James Harris

Jan 20, 2010, 2:57:15 PM
On 20 Jan, 16:56, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:

...

> > Either single or multiple cores. For example, what about the call
> > chain? Say function A called B called C. The return addresses (and
> > normally activation records too) would be on the stack. If you preempt
> > the process at this point and switch to another process what happens?
> > Assume the second process tries to return from a called function. Does
> > it return to process 1's B routine?
>
> Yes, you're correct.  Preemptive switching could mess up the control flow.
> A possible fix might be to mix preemptive and cooperative multi-tasking.
> I.e., the pre-emptive interrupt could set a flag and quickly return, then
> the code could cooperatively switch conditionally based on the flag state.

Rather than do that I think it would be simpler to implement different
stacks but ... just on your idea of a flag I have a similar plan which
would make task switching more efficient. It's a bit off topic but I
write it here purely for interest.

Imagine there are two status words, both normally zero and both
accessible only to privileged code. The idea is that a privileged
routine can set the first word to say it should not be timesliced. For
example, it might set the word if it is handling a system call and the
call will terminate soon. Of course, if the entire syscall is known to
be short it can set the word as soon as it starts. Then when the
timeslice expires, if the scheduler sees this word set it doesn't
invoke reschedule but sets the other word saying a timeslice is
pending and goes back to the privileged procedure. Finally, the
privileged routine before going back to user mode checks the second
word. If set the task switch is taken at this point. If not set it
just goes back to the caller as normal. In either case the two words
end up back at zero.
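Here is a minimal single-CPU sketch of the two-word scheme in C. The variable and function names are hypothetical, and task_switch() is a stub that only counts. A real kernel would perform the exit check with interrupts disabled, so the timer cannot fire between its steps.

```c
/* Two words, both normally zero and writable only by privileged code. */
static volatile int no_timeslice;    /* set: do not preempt right now */
static volatile int switch_pending;  /* set: a timeslice expired meanwhile */

static int switches;                 /* sketch only: counts task switches */
static void task_switch(void) { switches++; }

/* Start of a short privileged routine (e.g. a brief system call). */
void syscall_enter_short(void) { no_timeslice = 1; }

/* Timer path: switch now, or just note that a switch is owed. */
void timeslice_expired(void)
{
    if (no_timeslice)
        switch_pending = 1;   /* go straight back to the privileged routine */
    else
        task_switch();
}

/* Just before returning to user mode. Both words end up back at zero,
   and the deferred switch (if any) is taken here. */
void syscall_exit(void)
{
    int owed = switch_pending;
    no_timeslice = 0;
    switch_pending = 0;
    if (owed)
        task_switch();
}

int switch_count(void) { return switches; }
```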

The main idea is to stop a task switch happening while in a privileged
procedure that would otherwise require a lock but it should also cut
down on the number of context switches. It's a bit like disabling the
task-switch function while allowing other interrupts to continue to
happen.

A bit off topic I know but your comment on the flag reminded me of it.

> > Either the stack is unbounded or it has a limit. Allowing the thread
> > to specify its stack size should give a guarantee of address space
> > available.
>
> Ah, assuming the programmer, compiler, user, or thread is smart, or
> sufficiently smart to correctly determine that.
>
> I've got no idea how many nested levels of procedure calls there are in C
> code of mine.  I'd have to profile it to determine a realistic number.  It
> could be 3.  It could be 50000.  I.e., if it was my OS and my code, I could
> make a guess, but I'd have no idea whether that guess was way to the left,
> way to the right, just about right, etc.  I could tweak it until most stuff
> seems to work.  But, then I could predict that as soon as I started to run
> something more intensive, I'll be tweaking again.

Perfectly valid points. I don't know, either. How can a programmer be
expected to size a stack in the general case? All I know is it must
be done. There's no way to let an arbitrary number of stacks (one per
thread) share an address space without placing them at certain points
in that space. And they can't normally be moved around once started.
This is less of an issue with a 64-bit address space but it could be
an issue with 32-bit addressing.

I would say it's not a case of "tweaking." I expect a stack will be
4k, 64k, 1M, or 16M etc. Unless address space is very tight there's no
need to be too precise. Just make sure it is plenty big enough.
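A small sketch of rounding a requested size up to one of those classes. It assumes the classes are exactly 4 KiB, 64 KiB, 1 MiB and 16 MiB (each 16 times the previous one) and that anything larger is refused. `stack_class_size()` is a hypothetical helper.

```c
#include <stddef.h>

/* Round a requested stack size up to the next class: 4K, 64K, 1M, 16M.
   Returns 0 if the request is larger than the biggest class. */
size_t stack_class_size(size_t requested)
{
    size_t size = 4096;                   /* smallest class: one 4 KiB page */
    while (size < requested) {
        if (size >= ((size_t)16 << 20))   /* already at 16 MiB: refuse */
            return 0;
        size *= 16;                       /* 4K -> 64K -> 1M -> 16M */
    }
    return size;
}
```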

It might be a good idea to keep high water marks for stacks and log
or otherwise record them on thread exit. Page granularity should be
enough.
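One common way to get such a high water mark is to fill each new stack with a known pattern when the thread is created. On thread exit, scan from the deepest end for the first page that no longer holds the pattern. The sketch below assumes a downward-growing stack (as on x86) and 4 KiB pages. The pattern value and the names are hypothetical. A used word that happens to equal the pattern can make the result read slightly low.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAGE_SIZE 4096u
#define STACK_FILL      0xDEADBEEFu   /* hypothetical "never touched" pattern */

/* Fill a fresh stack region with the pattern before the thread starts. */
void stack_paint(uint32_t *base, size_t bytes)
{
    for (size_t i = 0; i < bytes / sizeof *base; i++)
        base[i] = STACK_FILL;
}

/* Pages of a downward-growing stack that were ever used.
   'base' is the lowest address; the stack pointer started at base + bytes. */
size_t stack_pages_used(const uint32_t *base, size_t bytes)
{
    size_t words_per_page = STACK_PAGE_SIZE / sizeof *base;
    size_t pages = bytes / STACK_PAGE_SIZE;

    /* Walk up from the deepest page. The first page with any overwritten
       word marks the high water mark. */
    for (size_t p = 0; p < pages; p++) {
        const uint32_t *page = base + p * words_per_page;
        for (size_t w = 0; w < words_per_page; w++)
            if (page[w] != STACK_FILL)
                return pages - p;
    }
    return 0;
}
```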

>
> > I presume that means adding Grub's multiboot header
>
> Assembly setup and MB header for DJGPP, yes.  DJGPP needs the a.out "kludge"
> settings.  OpenWatcom produces ELF which Grub boots without need of a
> header.
>
> > Can you add the multiboot header
> > as part of the code segment

...


> > ... - even as assembler declared storage
> > embedded in the C if necessary?
>
> ?  Even with the first part of the sent., I'm not sure what you're asking.

Yes, I wasn't too clear. I was thinking that if you need the header to
be in the bottom 8k and you are using a compiler that puts the code
section (.text) early in the object file you might be able to code
something like the following at the start of your kernel C source
file.

/* Set up multiboot header */
asm {
jmp code_start
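dd 0x1BAD_B002 #Magic number
dd 0x0001_0000 #Flags (bit 16: use the load addresses in the header)
dd 0 - 0x1BAD_B002 - 0x0001_0000 #Checksum: magic + flags + checksum must be 0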
...etc...
code_start: nop
}
/* Multiboot header set up, now begin the code */

I'm not sure if you could populate the entries directly. You might
need some asm code in there as well to be the initial entry point but
the idea is that this sits in the first 8k of the linked file.
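If the compiler can put an initialized object into a section of your choosing, the header can also be plain C data instead of asm. This sketch assumes a GCC-compatible compiler and an ELF kernel, where the bit 16 address fields aren't needed. The section name `.multiboot` is hypothetical, and a linker script (or link order) still has to place it 32-bit aligned within the first 8k.

```c
#include <stdint.h>

#define MB_MAGIC    0x1BADB002u
#define MB_FLAG_MEM (1u << 1)   /* ask the loader for mem_lower/mem_upper */

#if defined(__GNUC__) && defined(__ELF__)
/* 'used' stops the compiler discarding an object nothing references. */
#define MB_SECTION __attribute__((section(".multiboot"), aligned(4), used))
#else
#define MB_SECTION               /* place the header by other means, e.g. link order */
#endif

struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;           /* magic + flags + checksum must be 0 (mod 2^32) */
};

MB_SECTION const struct multiboot_header mb_header = {
    MB_MAGIC,
    MB_FLAG_MEM,
    (uint32_t)-(MB_MAGIC + MB_FLAG_MEM)
};
```

The checksum is computed by the compiler, so changing the flags can't leave a stale checksum behind.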

>
> > It seems it only needs to be in the
> > first 8k.
>
> Yes.
>
> > Using Grub is only one step back from using DOS.
>
> ...
>
> > You still need to
> > cope with Grub's way of doing things.
>
> How is that any different than some other bootloader?

Well, you've got the multiboot header to set up (and place in the
first 8k - which may not be easy). By contrast a simple bootloader
might load a flat binary and jump to the first byte.
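For comparison, this is roughly what a Multiboot loader has to do to find the header. It looks for the magic number at each 32-bit aligned offset in the first 8192 bytes and accepts it only if magic + flags + checksum wraps to zero. The hosted C sketch below could double as a build-time check on a kernel image. `find_multiboot_header()` is a hypothetical name, and the code assumes a little-endian (x86) image.

```c
#include <stddef.h>
#include <stdint.h>

#define MB_MAGIC        0x1BADB002u
#define MB_SEARCH_BYTES 8192u

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* File offset of a valid Multiboot header, or -1 if none is found
   at a 4-byte aligned offset within the first 8 KiB. */
long find_multiboot_header(const unsigned char *image, size_t len)
{
    size_t limit = len < MB_SEARCH_BYTES ? len : MB_SEARCH_BYTES;

    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t magic = rd32(image + off);
        uint32_t flags = rd32(image + off + 4);
        uint32_t sum   = rd32(image + off + 8);
        if (magic == MB_MAGIC && (uint32_t)(magic + flags + sum) == 0)
            return (long)off;
    }
    return -1;
}
```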

Grub will also occupy some of the memory with GDT, IDT etc; and,
possibly, some of its own code may still be relevant or in use such as
PMode interrupt handlers. In real mode the only memory in use will be
the real mode IVT and the BDA at the bottom and the EBDA and ACPI
tables at the top of the first 640k ... I think.

There are advantages to Grub such as enabling A20 - hopefully
reliably. Personally I'd rather do that myself so I can log details of
what didn't work, what worked and how long they took. Grub can switch
to PMode but that in itself is very easy.

It's handy that Grub allows the whole file to be in 32-bit mode
especially if it can all be written in a high level language - though
some asm may still be required.

If you get Grub to start you in PMode you don't get a chance to obtain
some info that could have been obtained from 16-bit BIOS calls. There
are some things in there that are worth logging at least.

You have to work with Grub's boot information structure. There are a
lot of optional fields in there possibly leading to more complex code.
What do you do on a machine which doesn't have the fields you expect?
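One answer is to test the `flags` word before touching any optional field. The sketch below declares only the leading part of the Multiboot (version 1) information structure. It picks a memory source from bit 6 (mmap_length/mmap_addr valid) and bit 0 (mem_lower/mem_upper valid). The enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Leading part of the Multiboot information structure, up to mmap_addr. */
struct mb_info {
    uint32_t flags;
    uint32_t mem_lower;    /* KiB below 1 MiB, valid if flags bit 0 */
    uint32_t mem_upper;    /* KiB above 1 MiB, valid if flags bit 0 */
    uint32_t boot_device;  /* flags bit 1 */
    uint32_t cmdline;      /* flags bit 2 */
    uint32_t mods_count;   /* flags bit 3 */
    uint32_t mods_addr;
    uint32_t syms[4];      /* a.out or ELF symbol info, flags bit 4 or 5 */
    uint32_t mmap_length;  /* valid if flags bit 6 */
    uint32_t mmap_addr;
};

#define MBI_MEM  (1u << 0)
#define MBI_MMAP (1u << 6)

enum mem_source { MEM_NONE, MEM_BASIC, MEM_MAP };

/* Decide where memory information will come from, never reading a
   field whose flag bit is clear. */
enum mem_source mb_memory_source(const struct mb_info *mbi)
{
    if (mbi->flags & MBI_MMAP)
        return MEM_MAP;    /* walk the full memory map */
    if (mbi->flags & MBI_MEM)
        return MEM_BASIC;  /* only lower/upper sizes are known */
    return MEM_NONE;       /* the loader passed nothing: need another source */
}
```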

By contrast a simple loader requires the programmer to get information
as defined in RBIL. I guess RBIL is much more complex.

Grub's memory map, if present, "is guaranteed to list all standard ram
that should be available for normal use." But what about ACPI reclaim
areas (type 3)?
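Walking the map is simple once the entry format is clear. Each entry is a 32-bit `size` followed by a 64-bit base address, a 64-bit length and a 32-bit type. `size` doesn't count itself and may be larger than 20. Later revisions of the specification describe type 1 as available RAM and type 3 as usable memory holding ACPI information; older texts treat every value other than 1 as reserved. The sketch below sums both kinds separately. The function name is hypothetical, and the code assumes a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MMAP_AVAILABLE    1u
#define MMAP_ACPI_RECLAIM 3u

/* Sum available and ACPI-reclaimable bytes in a Multiboot memory map.
   memcpy avoids unaligned reads of the packed entries. */
void mmap_totals(const unsigned char *map, uint32_t map_len,
                 uint64_t *avail, uint64_t *acpi_reclaim)
{
    const unsigned char *p = map, *end = map + map_len;

    *avail = *acpi_reclaim = 0;
    while ((size_t)(end - p) >= 24) {
        uint32_t size, type;
        uint64_t length;
        memcpy(&size,   p,      4);
        memcpy(&length, p + 12, 8);   /* base_addr sits at p + 4 */
        memcpy(&type,   p + 20, 4);
        if (size < 20)
            break;                    /* malformed entry: stop rather than loop */
        if (type == MMAP_AVAILABLE)
            *avail += length;
        else if (type == MMAP_ACPI_RECLAIM)
            *acpi_reclaim += length;  /* reusable once the ACPI tables are read */
        p += size + 4;                /* 'size' excludes its own 4 bytes */
    }
}
```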

...

> Multi-boot's startup state is pretty clean:

OK

...

> > I'm surprised you have no way to include compiler libraries when
> > manually linking. Are there no command line options on your linker
> > which allow that?
>
> Uh, let's make sure we're on the same page.  If I ever get my compiler to
> work, I've got no method to link.  At this point, it'd be cut-n-paste text
> together into a single file for an assembler...  By "manually", I meant
> linking with DJGPP's or OpenWatcom's linker but compiling the C code and
> linking via separate commands.  E.g., instead of a single command
> (simplified):
>
> gcc -o OS.exe OS.c
>
> "manually":
>
> gcc  -c OS.c
> ld -o blah blah blah
>
> DJGPP (GCC, LD) has a flag to pass the directory of a library and another to
> pass the library name.  But, I don't know which library/libraries.

Maybe someone else will know....

>  And, the
> path length with all the other compiler stuff was too long for the command
> line.  It might be fine in a .bat.  OpenWatcom's linker uses an odd
> scripting language I haven't been able to locate good info on.
>
> > It's interesting that you have managed to call
> > standard C library functions.
>
> I'm not using my compiler to compile my OS yet.  My OS still needs the DJGPP
> and/or OpenWatcom compilers until I rewrite it a few times.  The compilers
> DJGPP and OpenWatcom are linking my OS to their C libraries.  My OS, since I
> was starting it from DOS, was compiled as a normal 32-bit DPMI executable.
> For DJGPP and OW, the compilers use default settings to pull in the standard
> C libraries for linking an executable.  When you compile "manually", the
> libraries aren't automatically included.  It's like declaring no standard
> libraries or free standing.

OK

>
> > IIRC Linux calls specialised versions
> > such as printk or similar because it has no access to clib. That comes
> > later.
>
> I'm currently using a few C library functions which are independent of the
> OS.
> ...
>
> Ok, I'm using 17.  2 aren't needed - used in debugging.  2 are in, out to
> ports.  2 are compiler's specific version of cli, sti.  (OpenWatcom clears
> direction flag...)  5 are string functions: 4 simple + sprintf.  6 are for
> DJGPP related C code and/or DOS startup.  That probably means 3 or so for OW
> compiler.  I'll have to review those.  A few will get removed for
> Multi-boot.  The OS might be dependent on the others...  :-(  I may need to
> code equivalents.

OK

James

Rod Pemberton

Jan 21, 2010, 5:03:05 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:90d9de57-801d-4e32...@r19g2000yqb.googlegroups.com...

On 20 Jan, 16:56, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
>
> I'm not sure if you could populate the entries directly.

That is one of the reasons why I'm now using an assembly header. IIRC, one
of the compilers exported some values, while the other one didn't.

> > > You still need to
> > > cope with Grub's way of doing things.
> >
> > How is that any different than some other bootloader?
>
> Well, you've got the multiboot header to set up (and place in the
> first 8k - which may not be easy). By contrast a simple bootloader
> might load a flat binary and jump to the first byte.

.com's are nice for startup. But, my OS is larger than a .com. Also, I'm
not too interested in coding a bootloader, although I've written some partial
code. I'd rather not have to completely code an entire bootloader in
addition to my other projects. Multi-boot and implementations like Grub,
Grub4DOS, and MBLOAD, are readily available. If there is another widely
available OS boot protocol, I'd look into that also. I'm not even sure if
Grub is still being used. Linux might be using something else today.

> Grub will also occupy some of the memory with GDT, IDT etc;

Which is no longer executing or needed, after it transfers control to your
OS...

> and,
> possibly, some of its own code may still be relevant or in use such as
> PMode interrupt handlers.

Of course, your OS won't be calling PM interrupts in the bootloader's IDT,
will it? And, your OS should immediately set up its own IDT...

> In real mode the only memory in use will be
> the real mode IVT and the BDA at the bottom and the EBDA and ACPI
> tables at the top of the first 640k ... I think.

I'm not sure about ACPI. I *really* haven't looked at the more "modern"
standards, except some ATA/ATAPI. E.g., I'm using PICs, not APICs yet.
I've got to keep things to a minimum, and that's becoming more and more
important to me.

> There are advantages to Grub such as enabling A20 - hopefully
> reliably.

Good point. I haven't looked. It's outside my current "scope" of activity.
Given the problems with A20, perhaps we should...

> Personally I'd rather do that myself

I agree.

I've got no problem writing C code for A20, except that it'd be 32-bit code
because of the DJGPP compiler, when it needs to be 16-bit code.

> so I can log details of
> what didn't work

Personally, I prefer no messages upon bootup. I like DOS' lack of messages.
Linux's (any Unix...) bootup messages really annoy me. I don't have a
problem with Linux saving logs that I can look at later, at my leisure and
discretion. It's useful with Linux for finding issues since it never seems
to boot identically twice... As for my OS, I don't want to be writing lots
of text to the screen. IMO, it's a big slowdown even on high end video
cards. I'm also not interested in saving logs. I'd prefer that the code
just works. I don't want logs. I prefer not to look at them. It's extra
complexity in the OS. It also requires a few things I've found to cause
problems with other OSes: requires writing to boot device, requires
sufficient space on boot device to store logs, overwrites recently deleted
tracks preventing undeletes or recovery of recently deleted files, etc. I
want to be able to boot without the boot device or its data being modified.
(Deja vu... I must've mentioned that previously.)

> what worked and how long they took.

Not interested. I'm assuming that it works, since I coded and tested it.
Not the best decision, but I'm only one person. I'm striving for some
"fault tolerance" in my OS so that if there is a missing device or problem,
I can still boot and run the OS. If I can't boot, I can't fix the problem.
The machine must boot even if it's a "bad" boot. OSes that can't boot
because a file is missing or corrupt, or fail to boot or crash because they
can't communicate with a device, annoy me to no end. Let me boot!
Unfortunately, one usually needs to boot to fix the problem with such OSes.

> If you get Grub to start you in PMode you don't get a chance to obtain
> some info that could have been obtained from 16-bit BIOS calls.

True. Some BIOSes support that 32-bit interface, "BIOS 32 Service
Directory"... (I always forget about that.) Otherwise, v86 mode or x86
emulation will be needed, perhaps for video calls.

> There
> are some things in there that are worth logging at least.

If you recall, my OS doesn't determine available memory. That's one thing I
may need from real-mode. The Multi-boot structure *can* pass memory
information. If Grub is used, I'm assuming it will. If some other
Multi-boot loader is used, it doesn't have to pass a filled in structure.

> You have to work with Grub's boot information structure. There are a
> lot of optional fields in there possibly leading to more complex code.

My code doesn't use it, yet. But, I could use and will need some of the
information, especially memory sizing and ranges. Of the methods of memory
sizing I've seen, the only non-BIOS solution I'm aware of is memory probing,
which supposedly is problematic with PCI memory mapped devices and/or APIC.
If there is a way around this, I haven't seen it or figured it out yet. Of
course, I haven't tried to read any spec.'s on APIC or PCI yet.

> What do you do on a machine which doesn't have the fields you expect?

At this point, I don't expect any... I wouldn't mind having the memory
fields, but AFAIR, almost nothing is required for the Multi-boot header,
depending on the flags.

> Grub's memory map, if present, "is guaranteed to list all standard ram
> that should be available for normal use." But what about ACPI reclaim
> areas (type 3)?

I'm aware of, know what it's for, have spec.'s for, but not familiar with OS
related issues and programming of ACPI...


Rod Pemberton

James Harris

Jan 21, 2010, 6:17:40 AM
On 21 Jan, 10:03, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:

...

> > I'm not sure if you could populate the entries directly.
>
> That is one of the reasons why I'm now using an assembly header.  IIRC, one
> of the compilers exported some values, while the other one didn't.

Having looked at the multiboot header in more detail it looks like you
might be able to do it all in the header. What follows is untested but
is intended to be complete.

To be clear, as long as you can embed assembler in your kernel's C
code the intention is to do away with the external assembler file.
Everything should be in C.


----------

/* Dip into assembler to set up the multiboot header */
asm {
jmp code_start

MY_CODE_BASE equ 1 << 20 #Where to load

#
#Set up multiboot header (MBH) values
#

#MBH identification
MBH_MAGIC equ 0x1BAD_B002
MBH_FLAGS equ 0x0001_0002
MBH_CHECKSUM equ 0 - MBH_MAGIC - MBH_FLAGS

#Where and what to load
MBH_HEADER_ADDR equ mbh_start #Address of the header itself (assumes this block is assembled to run at MY_CODE_BASE)
MBH_LOAD_ADDR equ MY_CODE_BASE
MBH_LOAD_END_ADDR equ 0 #Load until end of file
MBH_BSS_END_ADDR equ 0 #Don't zero a bss
MBH_ENTRY_ADDR equ code_start

#Video parameters
MBH_MODE_TYPE equ 1 #Video mode text
MBH_WIDTH equ 0 #No preference
MBH_HEIGHT equ 0 #No preference
MBH_DEPTH equ 0 #No preference

#The MBH_FLAGS value, above, says
# bit 16: use the load addresses herein
# bit 1: fill in the mem_ values in info structure


#Define the multiboot header
align 4 #The header must be 32-bit aligned (the jmp above is not 4 bytes long)
mbh_start:
dd MBH_MAGIC
dd MBH_FLAGS
dd MBH_CHECKSUM
dd MBH_HEADER_ADDR
dd MBH_LOAD_ADDR
dd MBH_LOAD_END_ADDR
dd MBH_BSS_END_ADDR
dd MBH_ENTRY_ADDR
dd MBH_MODE_TYPE
dd MBH_WIDTH
dd MBH_HEIGHT
dd MBH_DEPTH

code_start: nop
times 31 nop #Visual info in a memory dump

#
# Execution begins here
#
# C code should have been compiled to be
# position independent. If not possible
# some assembly-written relocation may be
# needed here.
#
# If PIC code is possible just start.
#

jmp _os_start

align 32 #Alignment for start of C code, if desired
}

void os_start(void) /* OS C code begins here; "_os_start" above assumes a leading-underscore symbol prefix, as DJGPP uses */
{
    ...
}

----------
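With bit 16 set, the loader decides where in the file to start loading from the header fields themselves. Per the Multiboot specification, loading starts at the offset where the header was found minus (header_addr - load_addr), and load_addr must not exceed header_addr. A small C check of that arithmetic shows why header_addr must be the address of the header, not of the start of the image. For an image linked at 1 MiB with the header at file offset 4, header_addr has to be 0x100004 for loading to begin at the first byte of the file. The helper name is hypothetical.

```c
#include <stdint.h>

/* File offset at which a Multiboot loader starts copying (flags bit 16).
   Returns -1 if the fields are inconsistent: load_addr above header_addr,
   or a start offset that would fall before the beginning of the file. */
long mb_load_file_offset(uint32_t header_offset, uint32_t header_addr,
                         uint32_t load_addr)
{
    if (load_addr > header_addr || header_addr - load_addr > header_offset)
        return -1;
    return (long)(header_offset - (header_addr - load_addr));
}
```

With header_addr equal to load_addr, the same header would make the loader skip the first 4 bytes of the file and shift every address by 4.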


Any help? If it works there will be no issues with linking or scripts.


James

Rod Pemberton

Jan 21, 2010, 6:52:49 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:5caa07fe-3e9b-43b4...@c34g2000yqn.googlegroups.com...

On 21 Jan, 10:03, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
> > > I'm not sure if you could populate the entries directly.
> >
> > That is one of the reasons why I'm now using an assembly header. IIRC, one
> > of the compilers exported some values, while the other one didn't.
>
> Having looked at the multiboot header in more detail it looks like you
> might be able to do it all in the header.

For DOS, the executables are for DPMI. That means there is a large block of
startup and setup code prior to the C code. IIRC, I couldn't get the header
to be stored within the first 8k.

> To be clear, as long as you can embed assembler in your kernel's C
> code the intention is to do away with the external assembler file.
> Everything should be in C.

Yes, for certain environments, that could/should work.


Rod Pemberton


James Harris

Jan 21, 2010, 8:42:03 AM
On 21 Jan, 10:03, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:

...

> > > > You still need to
> > > > cope with Grub's way of doing things.
>
> > > How is that any different than some other bootloader?
>
> > Well, you've got the multiboot header to set up (and place in the
> > first 8k - which may not be easy). By contrast a simple bootloader
> > might load a flat binary and jump to the first byte.
>
> .com's are nice for startup.  But, my OS is larger than a .com.  Also, I'm
> not too interested in coding a bootloader, although I've written some partial
> code.  I'd rather not have to completely code an entire bootloader in
> addition to my other projects.  Multi-boot and implementations like Grub,
> Grub4DOS, and MBLOAD, are readily available.  If there is another widely
> available OS boot protocol, I'd look into that also.  I'm not even sure if
> Grub is still being used.  Linux might be using something else today.

You don't need to write a .com to start at the first byte. All you
need is a flat binary - i.e. without a header. How to get that? You
can generate one from an assembler for sure. Not sure if you can
compile to get one. Maybe that depends on your linker. Something like
a sys file target should do. Or maybe no linker is needed. That would
be good as linkers are often OS-specific.

Any other ways if your linker won't play ball?

1. After linking concatenate the compiled and linked file on to the
back of an assembler stub.

copy /b asm_stub + linked_file kernel_image

2. Nasm has an INCBIN directive. Maybe other assemblers have similar.
It can pull a binary file such as your C executable directly into code
being assembled:

bits 32
cpu 386

; ... startup code here ...

align 64
incbin "compiled and linked C kernel file"


> > Grub will also occupy some of the memory with GDT, IDT etc;
>
> Which is no longer executing or needed, after it transfers control to your
> OS...
>
> > and,
> > possibly, some of its own code may still be relevant or in use such as
> > PMode interrupt handlers.
>
> Of course, your OS won't be calling PM interrupts in the bootloader's IDT,
> will it?  And, your OS should immediately set up its own IDT...


I accept both your points as long as interrupts and NMI are disabled
before you write outside the load module's addresses.

>
> > In real mode the only memory in use will be
> > the real mode IVT and the BDA at the bottom and the EBDA and ACPI
> > tables at the top of the first 640k ... I think.
>
> I'm not sure about ACPI.  I *really* haven't looked at the more "modern"
> standards, except some ATA/ATAPI.   E.g., I'm using PICs, not APICs yet.
> I've got to keep things to a minimum, and that's becoming more and more
> important to me.

No need to worry about it (in this context). IIRC ACPI tables are
above the conventional memory top returned by int 0x12. My point was
that as long as real-mode code avoids the known (or nearly known - the
exact byte address has been a long term source of discussion) low
memory values and keeps beneath the int 0x12 upper limit the code will
be safe from overwriting anything active.
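As a sketch of that rule, a real-mode scratch buffer counts as safe if it starts at or above 0x500 (after the IVT at 0x000-0x3FF and the BDA from 0x400) and ends at or below the conventional memory top that int 0x12 reports in KiB. The 0x500 boundary is the commonly used figure rather than a guarantee; given the long-running discussion about the exact address, a more cautious margin is reasonable. The helper name is hypothetical.

```c
#include <stdint.h>

#define LOW_RESERVED_END 0x500u   /* IVT 0x000-0x3FF plus BDA from 0x400 */

/* int 0x12 returns the KiB of conventional memory in AX; anything at or
   above that (EBDA, etc.) is treated as in use. */
int real_mode_range_safe(uint32_t start, uint32_t len, uint16_t int12_kib)
{
    uint32_t top = (uint32_t)int12_kib * 1024u;

    /* Written as start <= top - len to avoid overflow in start + len. */
    return start >= LOW_RESERVED_END && len <= top && start <= top - len;
}
```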

> > There are advantages to Grub such as enabling A20 - hopefully
> > reliably.
>
> Good point.  I haven't looked.  It's outside my current "scope" of activity.
> Given the problems with A20, perhaps we should...

I've been looking at the A20 info you posted on another thread and
into other information on A20. I think this is something to come back
to as a separate topic.

> > Personally I'd rather do that myself
>
> I agree.
>
> I've got no problem writing C code for A20, except that it'd be 32-bit code
> because of the DJGPP compiler, when it needs to be 16-bit code.

I'll make some comments below where you and I seem to see things
differently but don't be offended. I think the differences are
interesting and worth exploring but I'm not saying you should think as
I do!


>
> > so I can log details of
> > what didn't work
>
> Personally, I prefer no messages upon bootup.  I like DOS' lack of messages.
> Linux's (any Unix...) bootup messages really annoy me.   I don't have a
> problem with Linux saving logs that I can look at later, at my leisure and
> discretion.  It's useful with Linux for finding issues since it never seems
> to boot identically twice...  As for my OS, I don't want to be writing lots
> of text to the screen.

When I said to log details I meant just that. They don't need to
appear on screen though it can be helpful if some of them do.

>  IMO, it's a big slowdown even on high end video
> cards.

I can't see this. How can writing text to the screen be a big
slowdown?

>  I'm also not interested in saving logs.  I'd prefer that the code
> just works.  I don't want logs.  I prefer not to look at them.  It's extra
> complexity in the OS.

If you are always running on the same machine I can see that logs are
not needed. Once you solve a problem it's fixed.

If you want your OS to run on different machines, however, logs can
record what the OS found as it started up: devices discovered, tests
that failed, time taken for devices to respond, devices that responded
but not as expected, and other information. These can be invaluable
both when developing and if something goes wrong later.

>  It also requires a few things I've found to cause
> problems with other OSes: requires writing to boot device,

It is not necessary to write to the boot device. I currently write the
logs to an in-memory buffer (because I don't have working disk
drivers). A lot of useful text can be placed in a little bit of
memory. Once I can write to disk I will do so but it doesn't have to
be the boot device.
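A minimal sketch of such an in-memory log. It assumes a fixed static array that wraps when full, so the newest entries survive. `klog()` and `KLOG_SIZE` are hypothetical names. The sketch uses the C library's `vsnprintf` for brevity; a kernel without a C library would substitute its own formatter. A kernel would also need a lock or interrupt masking if several CPUs or interrupt handlers log.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define KLOG_SIZE 4096u

static char   klog_buf[KLOG_SIZE];
static size_t klog_head;      /* total bytes ever written; index = head % size */

/* Append one formatted line to the ring buffer; the oldest text is overwritten. */
void klog(const char *fmt, ...)
{
    char line[128];
    va_list ap;

    va_start(ap, fmt);
    int n = vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if ((size_t)n >= sizeof line)
        n = (int)sizeof line - 1;   /* vsnprintf truncated the text */

    for (int i = 0; i < n; i++)
        klog_buf[klog_head++ % KLOG_SIZE] = line[i];
    klog_buf[klog_head++ % KLOG_SIZE] = '\n';
}
```

Once a disk driver works, the buffer can be written out from `klog_head % KLOG_SIZE` onwards to get the entries in order.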

> requires
> sufficient space on boot device to store logs, overwrites recently deleted
> tracks preventing undeletes or recovery of recently deleted files, etc.  I
> want to be able to boot without the boot device or its data being modified.
> (Deja vu...  I must've mentioned that previously.)

I agree. I don't necessarily want to modify the boot device.

>
> > what worked and how long they took.
>
> Not interested.  I'm assuming that it works, since I coded and tested it.
> Not the best decision, but I'm only one person.

If you are the only user and the OS will only ever run on one machine
I can see your viewpoint. Even then *I* would still prefer logs that I
can refer back to but I can see where you are coming from.

If you ever want to run on another machine reports can be essential.

As a small example, remember the Toshiba Tecra problem Linux had. AIUI
the keyboard controller was slow and Linux didn't wait long enough -
and didn't check for success. Yet rather than noticing and reporting
Linux just carried on. Apart from checking and not just assuming it
had waited long enough I would much rather record in the logs how long
the keyboard controller took and what responses were received. This
can be especially important for A20 since there are reputed to be many
variants. On bootup I would like to see metrics for all such
transactions (but not reports on each time through a loop: one report
per test type is enough).
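To illustrate checking and recording rather than assuming, here is a sketch of waiting for the 8042 keyboard controller's input buffer to empty. Status port 0x64 has bit 1 set while the controller hasn't yet taken the last byte. The wait uses a bounded number of polls and returns how many were needed, so that figure can be logged once per test type. The status read is passed in through a hypothetical `read_status` hook so the logic can be tried outside the kernel; in the kernel it would wrap a read of port 0x64. Poll counts are only a rough proxy for time, and a calibrated timer would be better.

```c
#include <stdint.h>

#define KBC_STATUS_IBF 0x02u   /* status bit 1: input buffer full */

/* Poll until the input buffer is empty or max_polls is reached.
   Returns the number of polls needed, or -1 on timeout. */
long kbc_wait_input_empty(uint8_t (*read_status)(void), long max_polls)
{
    for (long polls = 1; polls <= max_polls; polls++)
        if (!(read_status() & KBC_STATUS_IBF))
            return polls;
    return -1;
}

/* Stand-in for the real port read, for trying the logic: busy for 3 reads. */
static int fake_reads;
static uint8_t fake_status(void)
{
    return ++fake_reads <= 3 ? KBC_STATUS_IBF : 0;
}
```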

Information in logs can be extracted and collated later. Imagine
having a database of machine type against A20 response. The more
machine types were in there the more useful it would be for us to know
what certain machines require.

Admittedly I may be a bit of an information freak. I like to record
information as I do. For example, once running I have some code to
tally the logarithm of how long certain tasks take. The idea is to
allow me to look at some in-memory arrays to see the distribution of
how long different functions took.
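A sketch of that kind of tally, assuming durations arrive as integer tick counts from some timer (the source is left open). Bucket i counts durations d with floor(log2 d) == i, and bucket 0 also takes d == 0. The names are hypothetical.

```c
#include <stdint.h>

#define LOG2_BUCKETS 64

/* floor(log2(d)); returns 0 for d == 0 or d == 1. */
static unsigned ilog2_u64(uint64_t d)
{
    unsigned n = 0;
    while (d >>= 1)
        n++;
    return n;
}

/* One histogram per measured function: buckets[i] counts durations in [2^i, 2^(i+1)). */
void tally_duration(uint32_t buckets[LOG2_BUCKETS], uint64_t ticks)
{
    buckets[ilog2_u64(ticks)]++;
}
```

Sixty-four counters per function cover any 64-bit duration, which keeps the arrays small enough to inspect directly in memory.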

Basically, I'd rather *see* what's going on than guess.

>  I'm striving for some
> "fault tolerance" in my OS so that if there is a missing device or problem,
> I can still boot and run the OS.  If I can't boot, I can't fix the problem.

True. But if you have a bad boot where do you go to find out what was
bad about the boot? If you have no logs I suppose you could have a
separate diagnostic program which runs through the same steps as the
boot code, but why? Surely it's better to see what went wrong at the
time and it's far less effort.

> The machine must boot even if it's a "bad" boot.  OSes that can't boot
> because a file is missing or corrupt, or fail to boot or crash because they
> can't communicate with a device, annoy me to no end.

Me too. Getting an OS to boot what it can is essential. Just as
important is conveying information about what went wrong, if anything.
Like any program the OS should not just stop or fail without saying
why.

One thing that can help to make the OS reliable is to place all the
essential functions in one file. As long as the file gets loaded the
OS can run and communicate. (There should also be diagnostics if the
file doesn't load.)

...

> > There
> > are some things in there that are worth logging at least.
>
> If you recall, my OS doesn't determine available memory.  That's one thing I
> may need from real-mode.  The Multi-boot structure *can* pass memory
> information.  If Grub is used, I'm assuming it will.  If some other
> Multi-boot loader is used, it doesn't have to pass a filled in structure.

It's a good idea to know what memory is free before writing to it -
i.e. anything outside your load module. As you know int 0x15-e820 is
ideal for that - falling back to other methods as needed.
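Once the E820 entries have been collected, checking a range before writing to it can be as simple as the sketch below. It assumes the entries have been copied into a convenient in-memory struct (not the raw 20-byte BIOS record) with type 1 meaning usable RAM. It doesn't merge adjacent entries, so a range spanning two touching usable entries is conservatively rejected. It also rejects any overlap with a non-usable entry, since BIOS maps can overlap. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;      /* 1 = usable RAM; other values reserved, ACPI, etc. */
};

/* Nonzero if [start, start + len) lies wholly inside one usable entry
   and overlaps no entry of any other type. */
int e820_range_usable(const struct e820_entry *map, size_t n,
                      uint64_t start, uint64_t len)
{
    uint64_t end = start + len;
    int inside = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t b = map[i].base, e = map[i].base + map[i].length;
        if (map[i].type == 1) {
            if (start >= b && end <= e)
                inside = 1;
        } else if (start < e && b < end) {
            return 0;   /* overlaps a non-usable range */
        }
    }
    return inside;
}
```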

>
> > You have to work with Grub's boot information structure. There are a
> > lot of optional fields in there possibly leading to more complex code.
>
> My code doesn't use it, yet.  But, I could use and will need some of the
> information, especially memory sizing and ranges.  Of the methods of memory
> sizing I've seen, the only non-BIOS solution I'm aware of is memory probing,
> which supposedly is problematic with PCI memory mapped devices and/or APIC.
> If there is a way around this, I haven't seen it or figured it out yet.  Of
> course, I haven't tried to read any spec.'s on APIC or PCI yet.

Yes, probing memory is not good and not necessary these days. E820 or,
perhaps, Grub's info structure is really the only way.

James

James Harris

Jan 21, 2010, 8:53:14 AM
On 21 Jan, 11:52, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:

...

> For DOS, the executables are for DPMI.  That means there is a large block of
> startup and setup code prior to the C code.  IIRC, I couldn't get the header
> to be stored within the first 8k.

DPMI? ISTM that all that's needed is 32-bit code. We can create our
own environment. Are you loading DOS and making DOS calls from within
your kernel?

Could you do away with DPMI and DOS stuff?

James

Rod Pemberton

Jan 22, 2010, 4:31:59 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:a25ea72f-41b1-4e0b...@c34g2000yqn.googlegroups.com...

On 21 Jan, 11:52, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
> >
> > DPMI? ISTM that all that's needed is 32-bit code.

You need a _method_ to 1) load your OS, and 2) transfer execution from some
executing 32-bit code to the 32-bit code in your OS. A bootloader is the
usual method for both. Instead of using a bootloader, I used DOS. It loads
and transfers execution to DOS executables. My OS is a DOS executable.
(Alternately, I'm attempting Multi-boot builds, which aren't.) DOS
executables using DPMI set up 32-bit PM and transfer execution to compiled
32-bit C code. Once my code is executing with DOS and DPMI, I don't want
them around. So, I do a few things to gain total execution control (below)
and dispense with DOS and DPMI.

> > Are you loading DOS and making DOS calls from within
> > your kernel?

Ok, I explained this to someone a while back. I don't recall offhand if
that was you...

I start DOS. I load a TSR. I run my OS. My OS is compiled as a DOS DPMI
executable. DOS loads and executes the executable. The executable has
16-bit startup code and 32-bit application code. The 16-bit startup loads a
DPMI host which is 32-bit in this case. The startup using DPMI puts the cpu
into 32-bit mode. Then it calls my C main() also 32-bit. This is the way a
32-bit DPMI application normally works (approximately). At this point, I
cause the DPMI host to exit, that returns execution to 16-bit mode.
Normally, that would also cause the application to exit to DOS. The TSR
traps the exit. Then, the TSR does the same stuff as a bootloader. It
sets up A20, GDT, switches from RM to PM, etc. TSR does OS specific stuff.
Once 32-bit PM is properly set up, it transfers control back to 32-bit C
code. From that point on, there is no DOS, DPMI calls, etc. GDT, IDT,
PIC's, video, keyboard, mouse, etc. are setup.

> > Could you do away with DPMI and DOS stuff?

There is no DOS or DPMI in the main OS. The application load and initial
application startup uses some DPMI. While that is a part of the OS, it's
not really a part of the OS proper.

If I did, I couldn't start from DOS... DOS executables are what DOS C
compilers produce. Originally, I wanted it to be able to run DPMI
applications produced by the DJGPP and OpenWatcom compilers. Yes, I think
the direction of my OS is moving away from DOS. I'm hoping my compiler will
be completed sometime. At which time, I won't need DOS application
compatibility to work with the compiler's output.


Rod Pemberton


Rod Pemberton

Jan 22, 2010, 4:32:17 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:7297b486-f1ab-48cf...@l30g2000yqb.googlegroups.com...

On 21 Jan, 10:03, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
> >
> > IMO, it's a big slowdown even on high end video
> > cards.
>
> I can't see this. How can writing text to the screen be a big
> slowdown?

Try displaying a large amount of text and then try sending a large amount of
text to a file. For DOS and Win98 console, there is a huge difference in
speed. Now that I've got a Linux system, maybe I should check it too.

> If you want your OS to run on different machines, however, logs can
> record what the OS found as it started up: devices discovered, tests
> that failed, time taken for devices to respond, devices that responded
> but not as expected, and other information. These can be invaluable
> both when developing and if something goes wrong later.

Ben Lunt's OS tests many things.

> As a small example, remember the Toshiba Tecra problem Linux had.

Never heard of it.

> AIUI
> the keyboard controller was slow and Linux didn't wait long enough -
> and didn't check for success.

I've got one old keyboard with a really slow controller. It took me a while
to figure out why I was experiencing A20 issues (with DOS).

> Yet rather than noticing and reporting
> Linux just carried on.

Excellent, in that it continued. Not excellent in that it might've been a
critical failure and probably should've stopped. It depends on the OS load
location and OS size.

Linux was messing with A20... "For every reckless act, there is a price."

Isn't the bootup state of A20 what is desired for A20? Could they have
toggled it by accident... ?

> Admittedly I may be a bit of an information freak.

Nothing wrong with that. I'm attempting to temper total user control,
safety, and fault tolerance, with simplicity, practicality, usefulness,
responsiveness, and time constraints. (ROFL)

> I like to record
> information as I do.

...


> Basically, I'd rather *see* what's going on than guess.

I agree. I don't keep logs. For debugging, I do display a bunch of
information, mostly hex, on the screen. But, that isn't a part of the
normal build. It's pretty though. Lots of blinky and changing characters.
;)

> For example, once running I have some code to
> tally the logarithm of how long certain tasks take. The idea is to
> allow me to look at some in-memory arrays to see the distribution of
> how long different functions took.

I decided to "get it working first". I figured I could work out trivial
bugs, poorly optimized code, bad design, etc. once the OS was
self-supporting.
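James's logarithmic timing tally can be sketched as a power-of-two histogram: bucket i counts durations whose integer log2 is i, so 64 counters cover every possible 64-bit tick count. This is a minimal sketch assuming durations arrive as raw tick deltas (for example from a timer or the TSC); `tally_duration`, `ilog2_u64` and `task_hist` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical timing histogram: bucket i counts durations d with
   floor(log2(d)) == i. A duration of 0 is counted in bucket 0. */
#define LOG_BUCKETS 64

static uint32_t task_hist[LOG_BUCKETS];

/* Integer floor(log2(x)) using only shifts, so it works in a
   freestanding kernel without libm. Returns 0 for x == 0 and x == 1. */
static unsigned ilog2_u64(uint64_t x)
{
    unsigned r = 0;
    while (x >>= 1)
        r++;
    return r;
}

/* Record one measured duration. */
void tally_duration(uint64_t ticks)
{
    task_hist[ilog2_u64(ticks)]++;
}
```

Reading `task_hist` in a debugger then shows the distribution directly: each bucket covers twice the time range of the one before it.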

> But if you have a bad boot where do you go to find out what was
> bad about the boot?

Primary solution: take a guess, disable code, recompile, repeat as needed.
Hopefully, the OS startup is or becomes simple and reliable enough that that
isn't an issue. (Hint to any Linux programmers who startup graphics modes
immediately after boot... use text mode.)

> One thing that can help to make the OS reliable is to place all the
> essential functions in one file.

My entire OS code is in one file. Well, that's almost true. There are a
few small pieces that aren't. At some point, it may need to be broken into
smaller files. But, I figured that separating the OS into many files would
interfere with my being able to restructure the code.  A lot of that has been
done, and more is still likely to be done.  So, I didn't want to create a "blind spot" by
not being able to follow the code structure among multiple files.

> It's a good idea to know what memory is free before writing to it -
> i.e. anything outside your load module.

It's all free. Any (RAM) memory "outside my load module", isn't part of the
OS... I.e., available for use. ;-)

Yeah, I know by "free" you meant that neither ROM memory nor a memory mapped
device is written to, and you weren't referring to all that RAM memory
outside the OS. :-)


Rod Pemberton


James Harris

Jan 22, 2010, 1:12:23 PM
On 22 Jan, 09:31, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> > > DPMI? ISTM that all that's needed is 32-bit code.
>
> You need a _method_ to 1) load your OS, and 2) transfer execution from some
> executing 32-bit code to the 32-bit code in your OS.  A bootloader is the
> usual method for both.  Instead of using a bootloader, I used DOS.  It loads
> and transfers execution to DOS executables.  My OS is a DOS executable.
> (Alternately, I'm attempting Multi-boot builds, which aren't.)

Sure. I was talking *only* about Grub. When I asked if you could
include the offered asm data-and-code fragment at the beginning of
your C source, you lost me when you said that you had "a large block
of startup and setup code prior to the C code" apparently relevant to
DOS and DPMI. I couldn't work out why you needed *any* DOS/DPMI stuff
if you were going to load the OS from Grub. Having read it again I
think you may mean that this startup code is *not* needed under Grub
but that it's too much work to change the code to remove it. Anywhere
close?

Another possibility just struck me: the large block of startup and
setup code could be something added by the compiler or by the linker
and for some related reason it can't be easily removed?

No, I mean as part of the executable - i.e. could you not do away with
this other stuff in your executable code. You'll have probably already
answered this question by the time you get here in the post but at the
time of writing I don't understand why your kernel image still has
this other stuff in it when you are booting with Grub.

>
> If I did, I couldn't start from DOS.  DOS executables are what DOS C
> compilers produce.  Originally, I wanted it to be able to run DPMI
> applications produced by the DJGPP and OpenWatcom compilers.  Yes, I think
> the direction of my OS is moving away from DOS.  I'm hoping my compiler will
> be completed sometime.  At which time, I won't need DOS application
> compatibility to work with the compiler's output.

Given this final paragraph a final guess as to the root problem: is it
that these C compilers will not produce plain 32-bit code - they
always target something that requires a matching run-time library or
at least a C library which is OS-specific? ... Some minutes later
(sounds like a line from Batman!) No, that can't be it. I see from one
of your comments that OpenWatcom can produce Elf which Grub will run
directly.... Time passes... (What was that from? Hunt the Wumpus
wasn't it?) No, I'll go back to the previous thought. From other
comments you made I'm now thinking you would have to make substantial
changes to your OS for it to generate Elf.

I think you've explained parts of your setup a few times on Usenet.
Why not set up a web page with the info? You could just refer people
to that and save time and typing. Plus the web page could show the
whole picture - with graphics and animations of course...!

James

James Harris

Jan 22, 2010, 2:22:12 PM
On 22 Jan, 09:32, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:

...

> > > IMO, it's a big slowdown even on high end video
> > > cards.
>
> > I can't see this. How can writing text to the screen be a big
> > slowdown?
>
> Try displaying a large amount of text and then try sending a large amount of
> text to a file.  For DOS and Win98 console, there is a huge difference in
> speed.  Now that I've got a Linux system, maybe I should check it too.

I've found Windows to be *abysmal* at scrolling text in a command
window. I don't know why it is so bad but with lots to scroll the CPU
is overloaded.

By contrast take Putty. Text can scroll extremely quickly in it with
hardly any effect on the CPU. So the slowness in writing to the screen
in a command window is probably a software issue.
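To illustrate why scrolling need not load the CPU: in an 80x25 text mode, scrolling one line is a single copy of 24 rows plus clearing the bottom row. A minimal sketch, assuming the usual colour text layout (one character byte and one attribute byte per cell). A kernel would point `screen` at physical 0xB8000; here it is an ordinary array so the logic can run anywhere. `console_putc` and `scroll_up` are hypothetical names.

```c
#include <stdint.h>

#define COLS 80
#define ROWS 25

/* Each cell: low byte = character, high byte = attribute. On real
   hardware this would be the text buffer at 0xB8000. */
static uint16_t screen[ROWS * COLS];
static unsigned cur_row, cur_col;

static const uint16_t BLANK = (uint16_t)(0x07 << 8 | ' '); /* grey on black */

/* Move every row up by one and clear the bottom row. */
static void scroll_up(void)
{
    for (unsigned i = 0; i < (ROWS - 1) * COLS; i++)
        screen[i] = screen[i + COLS];
    for (unsigned i = (ROWS - 1) * COLS; i < ROWS * COLS; i++)
        screen[i] = BLANK;
}

/* Write one character at the cursor, wrapping and scrolling as needed. */
void console_putc(char c)
{
    if (c == '\n') {
        cur_col = 0;
        cur_row++;
    } else {
        screen[cur_row * COLS + cur_col] = (uint16_t)(0x07 << 8 | (uint8_t)c);
        if (++cur_col == COLS) {
            cur_col = 0;
            cur_row++;
        }
    }
    if (cur_row == ROWS) {
        scroll_up();
        cur_row = ROWS - 1;
    }
}
```

Scrolling costs about 2,000 16-bit copies per line, which is why a console that redraws or re-renders the whole window on every line (as a slow command window may) is a software choice rather than a hardware limit.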

...

> > As a small example, remember the Toshiba Tecra problem Linux had.
>
> Never heard of it.
>
> > AIUI
> > the keyboard controller was slow and Linux didn't wait long enough -
> > and didn't check for success.
>
> I've got one old keyboard with a really slow controller.  It took me a while
> to figure out why I was experiencing A20 issues (with DOS).

I see you've started a new thread on this. That's good. I'll reply on
that presently.

> > Yet rather than noticing and reporting
> > Linux just carried on.
>
> Excellent, in that it continued.  Not excellent in that it might've been a
> critical failure and probably should've stopped.  It depends on the OS load
> location and OS size.

I can't agree. A20 is too important. OS initialisation code should
stop if it cannot enable it and flag it up so the user knows there is
a problem.
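A sketch of "detect, then stop" for A20, using the common wrap-around test: with A20 masked, an address 1 MiB up aliases the one below it. Physical memory is simulated by an array and `a20_gate` stands in for the hardware, so the function names and the 0x500 test address are assumptions for illustration. Real code would use volatile pointers and the machine's actual enable method.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simulated physical memory so the check runs as ordinary code. */
#define SIM_MEM_SIZE 0x200000u      /* 2 MiB: covers x + 1 MiB */

static uint8_t sim_mem[SIM_MEM_SIZE];
static bool a20_gate;               /* false = address line 20 masked */

/* With A20 masked, line 20 reads as 0, so 0x100000 + x aliases x. */
static uint32_t effective(uint32_t addr)
{
    return a20_gate ? addr : (addr & ~(uint32_t)0x100000);
}

static uint8_t mem_read(uint32_t addr)             { return sim_mem[effective(addr)]; }
static void    mem_write(uint32_t addr, uint8_t v) { sim_mem[effective(addr)] = v; }

/* Write different bytes to x and x + 1 MiB. If the second write changes
   the first, the addresses alias and A20 is off. Both bytes are restored. */
bool a20_enabled(void)
{
    const uint32_t lo = 0x000500, hi = lo + 0x100000;
    uint8_t saved_lo = mem_read(lo), saved_hi = mem_read(hi);
    bool enabled;

    mem_write(lo, 0x00);
    mem_write(hi, 0xFF);
    enabled = (mem_read(lo) == 0x00);

    mem_write(hi, saved_hi);
    mem_write(lo, saved_lo);
    return enabled;
}

/* Hypothetical enable routines used to exercise init_a20(). */
static void sim_enable_works(void)  { a20_gate = true; }
static void sim_enable_broken(void) { /* controller ignored the request */ }

/* Try to enable A20 and report failure instead of carrying on; a
   negative result means the caller should print a message and halt. */
int init_a20(void (*try_enable)(void))
{
    if (a20_enabled())
        return 0;
    try_enable();
    return a20_enabled() ? 0 : -1;
}
```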

As a general point there's a balance to be struck in responses to
errors. IMHO a program should try to *detect* all relevant errors.
Then it can take a sensible decision on what to do next: report it and
stop or record it and carry on.

For example, a potential problem I've seen in some kbc code is it may
not detect a fault. It loops waiting for one of the 0x64 bits to show
ready. If the kbc hasn't shown ready in 65,536 loops it carries on
regardless. IMHO it's far better to write code to detect the timeout
and make an appropriate response - even if that response is to say
that the timeout must be longer.
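James's point about the 0x64 polling loop can be sketched as a wait that reports its timeout instead of falling through. Assumptions: the status port is read through a hypothetical `status_reader` hook (a kernel would use an `inb(0x64)` routine), and the limit is a poll count as in the code James describes. A deadline based on a timer would be more robust, since poll counts vary with CPU speed.

```c
#include <stdint.h>

/* Status-register bits of the 8042 controller (read from port 0x64). */
#define KBC_STATUS_OBF 0x01  /* output buffer full: data ready at 0x60 */
#define KBC_STATUS_IBF 0x02  /* input buffer full: controller still busy */

/* Hypothetical port-read hook; in a kernel this would read port 0x64. */
typedef uint8_t (*status_reader)(void);

enum kbc_result { KBC_OK = 0, KBC_TIMEOUT = -1 };

/* Wait until the input buffer is empty (safe to send a command), but
   give up after max_polls reads and say so rather than carry on. */
enum kbc_result kbc_wait_write_ready(status_reader read_status,
                                     uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((read_status() & KBC_STATUS_IBF) == 0)
            return KBC_OK;
    }
    return KBC_TIMEOUT;
}

/* Simulated controller for testing: reports busy for sim_busy_polls reads. */
static uint32_t sim_busy_polls;

static uint8_t sim_read_status(void)
{
    if (sim_busy_polls > 0) {
        sim_busy_polls--;
        return KBC_STATUS_IBF;
    }
    return 0;
}
```

The caller then decides: log the timeout and retry with a longer limit, or report it and stop.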

>
> Linux was messing with A20...  "For every reckless act, there is a price."
>
> Isn't the bootup state of A20 what is desired for A20?  Could they have
> toggled it by accident... ?

IIRC A20 is enabled by default in the kb controller. If so something
BIOS-like must disable it on boot. Then we have to enable it again!

...

> > But if you have a bad boot where do you go to find out what was
> > bad about the boot?
>
> Primary solution: take a guess, disable code, recompile, repeat as needed.

Could be an expensive solution - especially if you are trying to boot
past problems *and* not logging them which I believe you said you do.

> Hopefully, the OS startup is or becomes simple and reliable enough that that
> isn't an issue.  (Hint to any Linux programmers who startup graphics modes
> immediately after boot... use text mode.)

Yes, graphics standards are a whole big area. We have other
priorities. Was it Tanenbaum who said, "the nice thing about standards
is that there are so many to choose from"?

...

> > It's a good idea to know what memory is free before writing to it -
> > i.e. anything outside your load module.
>
> It's all free.  Any (RAM) memory "outside my load module", isn't part of the
> OS...  I.e., available for use.  ;-)

Not sure if you are serious given the smileys in this and next
paragraph but it sounds like you stay within the compiled space - at
least for now.

>
> Yeah, I know by "free" you meant that neither ROM memory nor a memory mapped
> device is written to, and you weren't referring to all that RAM memory
> outside the OS.  :-)

I meant all the RAM outside the compiled image. If your OS stays
within its compiled space (data + bss) then, yes.

Of course, there's always the stack. Out of interest, given that you
don't use paging and, AIUI, don't read a memory map on startup where
do you place the stack?

James

Rod Pemberton

Jan 23, 2010, 6:25:45 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:81ac70a0-e39f-4169...@p8g2000yqb.googlegroups.com...

> IMHO a program should try to *detect* all relevant errors.

IMHO a program should try to *not generate* any relevant errors.

> For example, a potential problem I've seen in some kbc code is it may
> not detect a fault. It loops waiting for one of the 0x64 bits to show
> ready. If the kbc hasn't shown ready in 65,536 loops it carries on
> regardless.

Oh, you'd *love* my current keyboard code. As I mentioned previously I
think, I have no check for a keyboard, although I have a comment with
intent. But, that's not the real kicker. The real kicker is that if the
keyboard code fails to operate properly, the OS will hang. Love that? The
while() loops for the keyboard controller, currently, have no method to
timeout, i.e., infinite. I'm aware of it. But, it's one of those "it'll
get done much later or when it presents a problem" issues.

> > > But if you have a bad boot where do you go to find out what was
> > > bad about the boot?
> >
> > Primary solution: take a guess, disable code, recompile, repeat as
> > needed.
>
> Could be an expensive solution - especially if you are trying to boot
> past problems *and* not logging them which I believe you said you do.

If it starts and runs from DOS, it should start and run from Grub/Multiboot.
I can't confirm yet, since I couldn't build. However, I think I "correct"
or "discard" everything DOS "touches". At present, I can work on it from
DOS instead of a boot.

> Not sure if you are serious given the smileys in this and next
> paragraph but it sounds like you stay within the compiled space - at
> least for now.

Yes, I "stay within the compiled space", but that's because I can't execute
anything except the kernel, and I have no memory map. I was just pointing
out that:

1) everything -almost- is RAM memory - not ROM or memory mapped devices
2) that the "compiled space" is where the OS is in (RAM) memory
3) that (almost) everything outside the "compiled space" can be used by the
OS - since the OS isn't located there.

The current problem is I don't have a memory map of the non-RAM areas. For
Grub/Multiboot, it can provide the info. For the TSR startup, I've got no
method yet to determine memory ranges. I'd need to extend the TSR startup
for "int 0x15-e820" or the half-dozen others, or use v86, or an x86
emulator, or maybe figure out the BIOS32 stuff...
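When Grub/Multiboot is the loader, the memory map arrives ready-made. A minimal sketch of walking it, based on the Multiboot (version 1) layout, in which `mmap_addr`/`mmap_length` in the boot information point to entries made of a 32-bit `size`, a 64-bit base, a 64-bit length and a 32-bit type, where `size` does not count itself. `struct mem_region` and `parse_mb_mmap` are hypothetical, and little-endian x86 is assumed.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* One region from the Multiboot (v1) memory map, which GRUB builds from
   BIOS int 0x15, eax=0xE820 where available. */
struct mem_region {
    uint64_t base;
    uint64_t length;
    uint32_t type;      /* 1 = available RAM; anything else: don't touch */
};

/* Walk the buffer at mmap_addr of mmap_length bytes. Each entry's 'size'
   field excludes itself, so the next entry is at entry + size + 4.
   Fields are copied with memcpy because entries are packed and the
   64-bit fields are not naturally aligned. */
size_t parse_mb_mmap(const uint8_t *mmap, uint32_t mmap_length,
                     struct mem_region *out, size_t max_out)
{
    size_t n = 0;
    uint32_t off = 0;

    while (off + 4 <= mmap_length && n < max_out) {
        uint32_t size;
        memcpy(&size, mmap + off, 4);
        if (size < 20 || size > mmap_length - off - 4)
            break;                      /* malformed entry: stop, don't guess */
        memcpy(&out[n].base,   mmap + off + 4,  8);
        memcpy(&out[n].length, mmap + off + 12, 8);
        memcpy(&out[n].type,   mmap + off + 20, 4);
        n++;
        off += size + 4;
    }
    return n;
}
```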

That gives me an idea.  It might be worthwhile to convert the TSR to be
Multiboot - with any needed enhancements so the TSR transfer still
functions. I could then test Multiboot entry code without a Multiboot
build. It might be easier to rework too. Currently, there are two startup
entry points. That would merge them.

> > > It's a good idea to know what memory is free before writing to it -
> > > i.e. anything outside your load module. As you know int 0x15-e820 is
> > > ideal for that - falling back to other methods as needed.
> >
> > [...]
>
> I meant all the RAM outside the compiled image.
>

All the RAM outside the compiled image is free for use immediately after
your OS is loaded. Either nothing has used it, or something has but you can
reuse it, and for the most part, it's RAM. "int 0x15-e820" is used to
determine non-RAM locations: ROM, memory mapped devices, voids. Yes? Well,
I look at it that way. Everything is RAM except that which is not, but that
which is not may be RAM buffered underneath... Once you know the address
ranges where non-RAM is and your OS is, you can start doling out "all the
RAM outside the compiled image" and keeping track of it, i.e.,
memory-management, e.g., for stack or applications. (Cringe, deja vu
again...)
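The "doling out" step can be sketched as: take each usable range from the memory map, cut the kernel image out of it, and hand out pages from what is left. A minimal sketch assuming page-granular allocation and kernel bounds passed in as parameters (they would typically come from linker-script symbols). `subtract_range` and `bump_alloc_page` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* A free physical range [start, end). */
struct range { uint64_t start, end; };

/* Remove the kernel image [kstart, kend) from one usable range, writing
   up to two remaining pieces to out. Returns how many pieces remain. */
size_t subtract_range(struct range r, uint64_t kstart, uint64_t kend,
                      struct range out[2])
{
    size_t n = 0;

    if (kend <= r.start || kstart >= r.end) {   /* no overlap */
        out[n++] = r;
        return n;
    }
    if (r.start < kstart)
        out[n++] = (struct range){ r.start, kstart };
    if (kend < r.end)
        out[n++] = (struct range){ kend, r.end };
    return n;
}

/* Hand out whole, page-aligned pages from one free range, lowest address
   first. Returns 0 when the range is exhausted, so callers must not pass
   a range that begins at physical address 0. */
uint64_t bump_alloc_page(struct range *r)
{
    uint64_t p = (r->start + PAGE_SIZE - 1) & ~(uint64_t)(PAGE_SIZE - 1);

    if (p + PAGE_SIZE > r->end)
        return 0;
    r->start = p + PAGE_SIZE;
    return p;
}
```

A bump allocator never frees, so it only suits early boot, before a real memory manager takes over the remaining ranges.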

> If your OS stays
> within its compiled space (data + bss) then, yes.

Except for the OS, stack, and non-RAM, what's to worry about?

> Of course, there's always the stack. Out of interest, given that you
> don't use paging and, AIUI, don't read a memory map on startup where
> do you place the stack?

Did you ask me this before?

The current Grub/Multiboot attempt puts the stack at 0x7c00.  That was used for programming
convenience.  The Multiboot header is set up to load the OS at 1MB+.  DOS TSR
startup: inherited stack location from DPMI host, i.e., IIRC, below 1MB
somewhere - while app is 1MB or above. IIRC, I needed a small amount of
stack data for the transfer, probably call/ret. I didn't bother to reset
the stack to a "safe" location. It should already be. They (OS and stack)
shouldn't collide. They are in the same locations as a DOS DPMI
application.

An OS "wants to be" located in unimpeded RAM. Post 386, that means above
1MB. So, RAM areas below 1MB are good for a stack, as long as sufficient
memory is available. 512k? 640k? The only "things" I can think of at the
moment that wouldn't be in "compiled space" are: non-RAM, kernel stack,
applications and/or their stacks. The kernel sets up the GDT and IDT
without any memory management available. So, they should be in "compiled
space".
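Rod's reasoning that RAM below 1MB suits a stack can be sketched as a small check. Assumptions: the Multiboot `mem_lower` field (valid when bit 0 of the boot information flags is set) gives the amount of conventional memory in KiB starting at address 0, and `choose_low_stack` is a hypothetical helper, not code from Rod's OS.

```c
#include <stdint.h>

/* Below 0x500 sit the real-mode IVT (0x000-0x3FF) and the BIOS data
   area (0x400-0x4FF), so no stack goes lower than this. */
#define LOW_MEM_FLOOR   0x500u
/* Conventional memory ends at 0xA0000, where video memory begins. */
#define LOW_MEM_CEILING 0xA0000u

/* Pick the initial stack pointer for a kernel stack below 1 MiB.
   mem_lower_kb: Multiboot mem_lower (KiB of memory from address 0).
   lowest_allowed: end of anything else already placed down there.
   Returns a 16-byte aligned stack top, or 0 if stack_size won't fit. */
uint32_t choose_low_stack(uint32_t mem_lower_kb, uint32_t stack_size,
                          uint32_t lowest_allowed)
{
    uint32_t top = mem_lower_kb >= 640u ? LOW_MEM_CEILING
                                        : mem_lower_kb * 1024u;
    uint32_t bottom_limit = lowest_allowed > LOW_MEM_FLOOR ? lowest_allowed
                                                           : LOW_MEM_FLOOR;

    top &= ~0xFu;                   /* the stack grows down from here */
    if (top < bottom_limit || top - bottom_limit < stack_size)
        return 0;
    return top;
}
```

Checking the fit explicitly turns "they shouldn't collide" into something the startup code can verify and report.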


Rod Pemberton


Rod Pemberton

Jan 23, 2010, 6:27:49 AM

"James Harris" <james.h...@googlemail.com> wrote in message
news:e9d16ff3-eeb5-433e...@f12g2000yqn.googlegroups.com...

On 22 Jan, 09:31, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> I couldn't work out why you needed *any* DOS/DPMI stuff
> if you were going to load the OS from Grub.

I've been using the compilers with DPMI as a target for so long, that I
didn't think to look at what target the objects are for the Grub/Multiboot
builds I'm attempting. The ELF could be Linux... (kaboom!) If they don't
produce the same compiled assembly for a DPMI host, like for my TSR startup
method, I'll need to make some changes to the C code. There are some things
I need to look into on this. Remainder of response delayed for a while...


RP


Rod Pemberton

Jan 23, 2010, 7:00:21 AM

"Rod Pemberton" <do_no...@havenone.cmm> wrote in message
news:hjemgr$g1f$1...@speranza.aioe.org...

In the delayed response, I started to mention that I wanted to have the OS
compiled as a single file: a DOS DPMI .exe, *and* have it startable by
*both* the Grub/Multiboot startup and my DOS TSR method. I still like the
all-in-one idea. As I mentioned to someone previously, I could compile in
the Grub/Multiboot header in C even if it is located above 8KB. Then, I
could write a stub application that locates the header, copies and prepends
it to the OS. However, my notes on the C version indicated I had some
problems. Those were 1) locating below 8KB 2) self-referenced data in
header 3) dynamic data in header. One or the other will fall into line
eventually.
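The stub that locates the header would, in effect, repeat the search a Multiboot loader performs: 32-bit aligned offsets within the first 8192 bytes, looking for the magic 0x1BADB002 with a checksum that makes magic + flags + checksum equal zero modulo 2^32. A minimal sketch of that search; `find_mb_header` is a hypothetical name, and copying the found header to the front of the image is left to the caller.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MB_HEADER_MAGIC 0x1BADB002u
#define MB_SEARCH_LIMIT 8192u   /* header must lie wholly in the first 8 KiB */

/* The three mandatory Multiboot (v1) header fields. A non-ELF image such
   as a DOS .exe would also need flags bit 16 and the address fields. */
struct mb_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;    /* magic + flags + checksum == 0 (mod 2^32) */
};

/* Search an OS image as a Multiboot loader does. Returns the header's
   offset, or -1 if no valid header is found within the limit. */
long find_mb_header(const uint8_t *image, size_t image_len)
{
    size_t limit = image_len < MB_SEARCH_LIMIT ? image_len : MB_SEARCH_LIMIT;

    for (size_t off = 0; off + sizeof(struct mb_header) <= limit; off += 4) {
        struct mb_header h;
        memcpy(&h, image + off, sizeof h);
        if (h.magic == MB_HEADER_MAGIC &&
            (uint32_t)(h.magic + h.flags + h.checksum) == 0)
            return (long)off;
    }
    return -1;
}
```

Running it over the compiled .exe would show directly whether the CRT startup has pushed a C-defined header past the 8KB limit.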

Anyway, remainder of response delayed for a while until I look at the
obj's...


Rod Pemberton


Rod Pemberton

Jul 19, 2010, 6:15:42 AM
"James Harris" <james.h...@googlemail.com> wrote in message
news:e9d16ff3-eeb5-433e...@f12g2000yqn.googlegroups.com...

On 22 Jan, 09:31, "Rod Pemberton" <do_not_h...@havenone.cmm> wrote:
...
> I couldn't work out why you needed *any* DOS/DPMI stuff
> if you were going to load the OS from Grub.
>

I didn't reread the thread, too much to reread. This post was comprised of
multiple responses. I tried to clean things up. So, hopefully, there
aren't too many contradictions.  Let's backtrack a bit...

At the very beginning, I wasn't writing an OS. Originally, I had no need
for Grub. My original goal was to allow me to replace the DPMI host used
with a DOS DPMI .exe with my own DPMI host. I was, and currently I still
am, using two C compilers.  Each C compiler has its own, different DPMI
host. I wanted to use the same DPMI host for both compilers. I didn't want
to figure out how to insert my unwritten, future DPMI host into both C
compilers' compile chains.  The logical conclusion from that is I needed to
"unload" the DPMI host after the .exe was loaded into memory. Eventually, I
figured out a method to restart the 32-bit application code while releasing
the compiler's DPMI host from memory. This uses a small TSR to
save/transfer a few registers. Unloading the DPMI host occurs after the DOS
executable has been loaded into memory and run. That allowed me to setup or
install my own DPMI host after the .exe was loaded into memory. But, I
never wrote my own DPMI host. Instead, that idea evolved into writing an OS
using that method to start my OS' code. A DOS DPMI .exe is comprised of a
16-bit crt0.o startup, 32-bit DPMI host, and a 32-bit application. To run
an OS, you just need to run the 32-bit application portion. You don't want
that other stuff.  So, after "unloading" the DPMI host there isn't a DOS DPMI
.exe in memory; just the 32-bit application portion is left in memory,
perhaps along with crt0.o.  DOS acts as a bootloader - actually a memory loader - for
the 32-bit application. I.e., I got rid of many of the code dependencies
(16-bit startup, DPMI, etc.) created by using a DOS DPMI compiler.

Obviously, if I was starting from scratch, I might choose a linker file,
only one C compiler, a compiler with support for creating host independent
code, etc. All the things most of you guys use GCC for...

To respond to your concerns: *in theory*, no DOS or DPMI code is needed
for a pure Grub or Multiboot start. This requires that the C code has zero
dependencies. Currently, I'm using two compilers written for DOS. I.e.,
it's likely some issues will arise since these compilers weren't designed to
produce OS code. I'm working on my own C-like compiler so I don't have to
deal with some of these problems. I can do what I want.

For the "historical" reasons above, my current OS startup doesn't work from
a bootloader. I like my original startup method and would like to keep it.
It's convenient. But, I'm trying to get Grub or Multiboot to work. One
goal while using the two compilers, was to have the OS compiled as a single
file: a DOS DPMI .exe. Another goal is to keep it startable by both the
Grub or Multiboot startup and DOS TSR method. Compiling the OS as an .exe
for those two compilers inserts the DPMI based CRT startup into the .exe.
That adds a block of data which makes it difficult to put the Grub startup
in the first 8KB. Grafting on an assembly based Grub or Multiboot startup
was a recent development. Well, I guess that's about a year old now... I
was originally trying to locate the header within the .exe but those
attempts failed due to the size of the DPMI based CRT startup. If I rewrite
it to eliminate or disable the DOS TSR startup method, it should be
possible, in theory, to not have any DOS/DPMI stuff. But, the DJGPP
compiler has a number of dependencies I've recently found which may bust
that theory. The OW (OpenWatcom) compiler seems to have fewer limitations.

Since the Grub or Multiboot builds don't build yet, I can't confirm if DJGPP
or OW will remove any or all of the DPMI code. DJGPP can build
non-executables, or DJGPP .exe's - which are stubbed with 16-bit DPMI host -
can be un-stubbed to leave a linked COFF executable. Using a COFF
executable may prove to eliminate a few issues. IMO, DJGPP should've used
COFF executables - without binding to DPMI and a 16-bit startup - in the
first place... DJGPP only supports one target: DOS. So, while I believe
DJGPP will remove DPMI based code for non-executables, I don't know for
sure. Recent experiments indicate many interdependencies between the DJGPP
C startup crt0.o and the C libraries. I.e., you basically cannot have one
without the other. OpenWatcom cannot build non-executables, except for one
object type. But, OpenWatcom can build executables for many targets. It
can build ELF, which shouldn't have DPMI. Will OW's ELF code be for DOS or
Linux syscall API? I don't know.

I won't have all the answers until I can make the OS even less dependent
than it is. Booting via Grub or Multiboot will probably expose a few of
those issues or others. I guess once I get them to build for Grub or
Multiboot, I can retry locating the header in the C code...  I've decided
I'd eventually like to remove the OW-specific code, mostly assembly.  For
now, it's a good filter on a few ANSI C errors that GCC 3.x doesn't catch.


My OS does have dependencies and far more than I was aware of:

1) uses a handful of C library functions which are independent of the
host OS. I can remove these.
2) DJGPP libraries are dependent on linking with crt0.o - use of my Grub asm
doesn't work with the C libraries...
3) DJGPP libraries are dependent on correct setup of segment registers in
crt0.o for the C libraries - i.e., dependent on crt0.o or equivalent code...
4) compiled as .exe - DJGPP compiled code dependent on three variables in
crt0.o - i.e., dependent on crt0.o or equivalent code...
5) dependent on a DOS and BIOS based interrupt or syscall interface
e.g., C code eventually calls DOS Int 0x21 and BIOS Int 0x13 instead of
Linux Int 0x80 or my own syscall/interrupt interface
6) dependent on a C library which calls DOS and BIOS syscall interface
- I don't absolutely need anything from the C libraries, but was using OS
independent functions for speed and ease of programming.

Most of those are resolvable, I think, by either eliminating calls to C
library routines or having my own C or C-like compiler.
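Eliminating the C library calls usually means supplying a handful of byte routines yourself. A minimal sketch, assuming the OS only needs the memory functions; they carry a `k` prefix here only so they do not clash with a host libc. GCC documents that even a freestanding build expects `memcpy`, `memmove`, `memset` and `memcmp` to be provided, so a real kernel usually defines them under those exact names.

```c
#include <stddef.h>

/* Freestanding byte routines with no C library dependency. */
void *kmemset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

void *kmemcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Like kmemcpy, but safe when the regions overlap: copy backwards when
   the destination lies above the source. */
void *kmemmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (d < s) {
        while (n--)
            *d++ = *s++;
    } else if (d > s) {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dst;
}

int kmemcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (; n--; p++, q++)
        if (*p != *q)
            return *p - *q;
    return 0;
}
```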


My next, lofty goals are to:

1) eliminate the C library routines and as many purely C dependencies as
possible. This step won't eliminate underlying implementation issues, like
segment setup.
2) Grub or Multiboot startup
3) hopefully preserve my DOS TSR startup method, but this is not a
requirement
4) take the OS as far as I can with the DOS compilers
5) try code on Linux - it won't have the correct syscall interface...
6) finish one of my compilers so I have control over:
6a) the compiler
6b) the interrupt or syscall interface the compiler generates code for
7) create my own syscall API
8) rework DOS and BIOS related interrupt routines for my syscall API
9) eliminate OW code...
10) convert DJGPP code to my compiler

I meant to write some IDE code and then FS code. Those go into the list
somewhere... It'd be nice to do them while the codebase can still be
compiled by DJGPP and OW.


Rod Pemberton
