BSD2.9, sbrk() and signal()

Jonathan Harston

Jul 18, 2022, 4:46:13 PM

Is there something special on BSD2.9 with sbrk() or signal()?
If I claim memory and stick the stack there, and set a signal,
when the signal is triggered everything blows up with the
signal dispatcher in the kernel unable to access memory to
stack to call the signal handler.

This doesn't happen if I leave the stack at the value on
entry at &FF00+xxx, and doesn't happen with Bell Unix v5/6/7.
I can't just leave the stack at &FFxx as it will descend
across I/O memory at &E000 upwards.

I've stripped my test code down to this:

ORG 0
EQUW &0107 ; magic number, also branch to code
EQUW _DATA%-_TEXT% ; size of text
EQUW _BSS%-_DATA% ; size of initialised data
EQUW _END%-_BSS% ; size of uninitialised data
EQUW &0000 ; size of symbol data
EQUW _ENTRY%-_TEXT% ; entry point
EQUW &0000 ; not used
EQUW &0001 ; no relocation info
;
ORG 0
_TEXT%:
_ENTRY%:
mov #1,r0
trap 4 ; write(stdout, "RUN"..)
equw msg1
equw msg2-msg1
;
trap 17 ; sbrk(&E000)
equw &E000
bcs quit ; failed
mov #1,r0
trap 4 ; write(stdout, "SBRK"..)
equw msg2
equw msg3-msg2
;
trap 48 ; signal(SIGQUIT, quit)
equw 3
equw quit
mov #1,r0
trap 4 ; write(stdout, "SET"..)
equw msg3
equw msg4-msg3
;
mov #&E000,sp ; put stack at top of memory
mov #1,r0
trap 4 ; write(stdout, "STOP"..)
equw msg4
equw msg5-msg4
stop:
jmp stop
quit:
mov #1,r0
trap 4 ; write(stdout, "QUIT"..)
equw msg5
equw msg6-msg5
clr r0
trap 1 ; exit

_DATA%:
msg1: equb "RUN",13,10
msg2: equb "SBRK",13,10
msg3: equb "SET",13,10
msg4: equb "STOP",13,10
msg5: equb "QUIT",13,10
msg6:
align
_BSS%:
_END%:

When run, this should display:
RUN
SBRK
SET
STOP
then, when I press Ctrl-\, it should display
QUIT
and cleanly exit.

Instead it does this:

Berkeley Unix 2.9BSD
:login: root
Welcome to the 2.9BSD (Berkeley) UNIX system.
# cd usr/jgh
# infile
RUN
SBRK
SET
STOP
^\Segmentation fault (core dumped)
#

It does this for any signal I try, and any value of SP I set up.
The memory claimed up to &E000 is actually there: if I put in a
little loop to dump memory from 0000 to DFFF it happily dumps
it all out:
0000 blah blah blah blah
etc.
DFF0 blah blah blah blah
E000 Segmentation fault (core dumped) (as expected, it's gone
past the end of memory)

Compiling a short bit of C does as expected:

#include <signal.h>
int null()
{ exit(); }

int main()
{ signal(3,null); while(1) ; }

And on examination the generated code is essentially identical to
the above handmade PDP11 code.... *other* than setting the
sbrk() and stack.

So. What is BSD2.9 sbrk() or signal() doing or not doing such
that the signal dispatcher in the kernel explodes trying to
JSR PC,signalhandler?

jgh

Bob Eager

Jul 18, 2022, 5:28:04 PM

On Mon, 18 Jul 2022 13:46:12 -0700, Jonathan Harston wrote:

> So. What is BSD2.9 sbrk() or signal() doing or not doing such that the
> signal dispatcher in the kernel explodes trying to JSR PC,signalhandler

Is there any signal trampoline code at the base of the stack? Or is that
only on the VAX?

--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org

Johnny Billquist

Jul 19, 2022, 5:18:14 AM

On 2022-07-18 22:46, Jonathan Harston wrote:
> Is there something special on BSD2.9 with sbrk() or signal()?
> If I claim memory and stick the stack there, and set a signal,
> when the signal is triggered everything blows up with the
> signal dispatcher in the kernel unable to access memory to
> stack to call the signal handler.

Nothing I'm aware of.

> This doesn't happen if I leave the stack at the value on
> entry at &FF00+xxx, and doesn't happen with Bell Unix v5/6/7.
> I can't just leave the stack at &FFxx as it will descend
> across I/O memory at &E000 upwards.

What are you talking about? User programs don't have I/O memory in the
top page.

What assembler is this? I find the notation a bit weird. Is #&E000
really the constant E000? Or is it whatever value is located at E000?

With DEC assemblers, I guess it would be
MOV #160000,SP

and with the normal Unix assembler, it would be
MOV $160000,SP

Looking a few lines higher up, it certainly looked like you used just #n
for a literal constant, and not #&, so this all looks very fishy to me.

Can you run the code with a debugger and verify that you are getting a
good value? Also, what kind of CPU are you running on? I'm sort of
wondering, in case this is something like an 11/70, if the stack limit
register is used by the OS (I haven't checked). That could play tricks
with you if so. But I wouldn't think the system plays with the stack
limit register. My primary suspect is that you are not actually setting
the SP to what you think you are.

Johnny

Johnny Billquist

Jul 19, 2022, 10:03:42 AM

By the way - another thing. This can't have been compiled and run on a
2.9BSD system. Where did you get those syscall numbers from?

I am quite certain the numbers have not changed between 2.9BSD and
2.11BSD, and I can tell:

4 is write
17 is *not* sbrk, but chflags. sbrk is 69.
48 is *not* signal, but getegid. signal is just a wrapper around sigvec,
which is 108.

However, even the call to write is wrong. In 2.11BSD (and I believe
2.9BSD), you basically should have all arguments on the stack.

So, something like this:

mov $4,-(sp)
mov $L4,-(sp)
mov $1,-(sp)
clr -(sp)
sys 4
clr (sp)
sys 1

.globl
.data
L4:.byte 146,157,157,12,0

Would print out a small 4 byte text and exit.

I have little clue what you are actually doing, but your code looks very
weird, basically.

Johnny

Jonathan Harston

Jul 19, 2022, 11:21:27 AM

On Tuesday, 19 July 2022 at 15:03:42 UTC+1, Johnny Billquist wrote:
> By the way - another thing. This can't have been compiled and run on a
> 2.9BSD system. Where did you get those syscall numbers from?

https://www.tuhs.org/cgi-bin/utree.pl?file=2.9BSD/usr/src/sys/sys/sysent.c
and elsewhere.

> I am quite certain the numbers have not changed between 2.9BSD and
> 2.11BSD, and I can tell:
> 4 is write
> 17 is *not* sbrk, but chflags. sbrk is 69.
> 48 is *not* signal, but getegid. signal is just a wrapper around sigvec,
> which is 108.

My BSD2.11 agrees with you, my BSD2.9 does not.

> However, even the call to write is wrong. In 2.11BSD (and I believe
> 2.9BSD), you basically should have all arguments on the stack.

That's what I thought, and I was initially pulling my hair out with
everything falling over with the parameters on the stack, but with
BSD2.9 they are definitely inline. Here is the assembler for signal()
from the above C snippet:

\ On entry: sp=>ret, signum, func
_signal:
MOV R5,-(SP) :\ Save R5, sp=>R5, ret, signum, func
MOV SP,R5 :\ R5=>stack frame
MOV &0004(R5),R1 :\ R1=signum
CMP R1,#&0014 :\ CMP MAXSIG
BCC L00B8 :\ Too big, bad signum
MOV &0006(R5),R0 :\ R0=func
MOV R1,&0170 :\ Store signum in TRAP
ASL R1 :\ signum*2, index into dispatch table
MOV &0178(R1),-(SP) :\ Stack old entry (default=0)
MOV R0,&0178(R1) :\ Store func in table
MOV R0,&0172 :\ Also store func in TRAP in case not C function
BEQ L00A4 :\ If zero, jump to pass to TRAP to turn off
BIT #&0001,R0 :\ Is func.b0=1?
BNE L00A4 :\ Also jump to pass to TRAP to disable
ASL R1 :\ R1 is now signum*4
ADD #&00C0,R1 :\ Index into jump block
MOV R1,&0172 :\ Store this as func in TRAP
L00A4:
TRAP &00 :\ TRAP indir
EQUW &016E :\ signal, signum, func
BCS L00BC :\ Error, jump to return it
BIT #&0001,R0 :\ Was old func bit 0 set?
BEQ L00B2 :\ No, skip past to return it
MOV R0,(SP) :\ Yes, overwrite stacked old func
L00B2:
MOV (SP)+,R0 :\ R0=old func from table
MOV (SP)+,R5 :\ Restore R5
RTS PC :\ Return

(snip)

sigtrap:
TRAP &30 :\ 016E 30 89 0.
HALT :\ 0170 00 00 ..
HALT :\ 0172 00 00 ..

At L00A4 it's definitely doing a Bell-style indirect TRAP with inline
parameters.

jgh

Johnny Billquist

Jul 19, 2022, 12:00:31 PM

Very unexpected, and surprising that the numbers for the syscalls would
have changed between 2.9BSD and 2.11BSD.
Especially since there seems to be some kind of 2.9 compatibility in 2.11.

But anyway. Let's assume that is correct then. What about the argument
to sbrk that I pointed at? It certainly looks broken to me. (And I'm
still confused by what assembler you are using.)

Johnny

Jonathan Harston

Jul 19, 2022, 12:22:09 PM

When I do:

clr r1 ; r1=top of memory, start at &10000
mov #&8911,TRAP_BUF ; SYS sbrk
InitMemLp:
sub #256,r1 ; Step down 256 bytes
mov r1,TRAP_BUF+2 ; Store as TRAP argument
TRAP 0 ; SYS sbrk,addr
EQUW TRAP_BUF
bcs InitMemLp ; Memory not claimable, try a bit less
rts pc

TRAP_BUF:
EQUW 0
EQUW 0
EQUW 0
EQUW 0

I end up with E000 as the highest top of memory I can claim.
The addresses at E000 upwards are inaccessible as they are
"elsewhere". My documentation tells me that Exxx is used
for paged memory access, and Fxxx is I/O access; "E000+ is
I/O" is just lazy shorthand for "not addresses I (ie the
code) can access".

Jonathan Harston

Jul 19, 2022, 12:34:36 PM

On Tuesday, 19 July 2022 at 17:00:31 UTC+1, Johnny Billquist wrote:
> But anyway. Let's assume that is correct then. What about the argument
> to sbrk that I pointed at? It certainly looks broken to me. (And I'm
> still confused by what assembler you are using.)

It's definitely E000, and on Bell V5/V6/V7 I end up with memory all the way
up to DFFF that I can use, and set the stack to E000 and fully descend
all the way to the bottom of memory. This is code that has been working
for almost 20 years on V5/V6/V7. Adding a debug display confirms it's E000:

0001 4E48 0000 0000 0000 0000 FF9A (at ENTRY)
RUN
0001 4E48 0000 0000 0000 0000 FF9A (after sbrk)
SBRK
SET
0005 4E48 0000 0000 0000 0000 E000 (after mov #&E000,sp)

STOP
^\Segmentation fault (core dumped)

The debug display is r0 r1 r2 r3 r4 r5 r6

Johnny Billquist

Jul 19, 2022, 1:47:17 PM

Things sound rather crazy, but I suspect I should really look at 2.9
before I say too much.

But hardware wise, the PDP-11 usually has the I/O page at E000 in
kernel space. Physically it's at the top 8K, but with the MMU you could
in theory map things around as much as you'd like. And any sane OS does
not have the I/O page mapped for user space programs. E000 and F000 are
on the same page, and cannot be mapped separately from each other, so
your notes about paged memory and I/O access cannot be literally
correct. But I have no idea what your notes are actually about in the
end, so it's hard to say that much.

But hardware wise, there are only 8 pages on the PDP-11, and they start
at 0, 2000, 4000, 6000, 8000, A000, C000 and E000, if we talk hex.

And the notation where you seem to sometimes use #& for literals and
sometimes just # is still unclear to me. What does the '&' signify?

Is this really assembled on 2.9BSD? If so, it must be using a completely
different assembler than 2.11BSD, which I find very surprising.

Also, sbrk is about the *increment* to the dataspace you request. It's
not an absolute number. So if sbrk(E000) works, that means you probably
have all the memory, because when your program starts you already have
at least one page of memory for the code and initial data. So growing by
more than 7 pages is unlikely to ever succeed. But that does not mean the
top of the memory you have is at E000. It means you managed to *add*
E000 bytes of data to your space.

Johnny

Jonathan Harston

Jul 27, 2022, 6:11:58 PM

As a temporary work-around, this "works":

mov sp,r1
loop:
tst -(sp)
cmp sp,#END+64 ; push stack all the way down to just above data section
bhi loop
mov sp,r0 ; r0=bottom of memory area
mov r1,sp ; r1=top of memory area

...but it leaves an inaccessible "gap" between the end of the uninitialised
data section and the memory forced into the memory map.

Jonathan Harston

Jul 29, 2022, 11:10:47 AM

I did a load of digging and single-stepping through the BSD2.9 code
and think I've worked out what's going on - or at least enough to
get a solution that's workable enough for me.

In Bell PDP11 Unix, the signal dispatcher pages in the entire
process before calling its signal handler. BSD 2.9 appears to
do a form of lazy task switching in that it pages in the text
and stack segments and waits for actual memory access to
trigger any of the data to be paged in.

This works fine if the stack is in the stack segment, but if
the stack is in the data segment, it falls over because pushing
to the stack doesn't trigger the paging in.

This may not be completely accurate, but close enough to
work out what's happening.

Further digging finds there's a 'nostk' function call that
disconnects the SP register from the stack segment and allows
you to put it anywhere. See:
https://minnie.tuhs.org/cgi-bin/utree.pl?file=2.9BSD/usr/man/cat2/nostk.2

So, I add a call to nostk to my code, with careful shuffling
around to avoid the stack disappearing under my feet:

; Assume the only thing on the stack is a return address
; Any stack frame above SP has already been processed
;
clr r1 ; r1=top of memory, start at &0000-256=&FF00
mov #&8911,TRAP_BUF ; SYS brk
.IO_InitMemLp

sub #256,r1 ; Step down 256 bytes
mov r1,TRAP_BUF+2

trap 0 ; SYS brk,addr
equw TRAP_BUF
bcs IO_InitMemLp ; Memory not claimable, try a bit less
; ; r1=top of claimed memory
; ; and Carry is clear
mov (sp),r0 ; Get return address
mov r1,sp ; Put stack at top of claimed memory
mov r0,-(sp) ; and push the return address on it
trap 58 ; sys local,nostk
equw TRAP_NOSTK
bcs nostk_not_needed ; nostk doesn't exist, don't need it anyway
; ; nb: SIGSYS caught to set Cy and return
...
rts pc
...

TRAP_NOSTK:
trap 4 ; local nostk

TRAP_BUF:
trap 0
equw 0
equw 0
equw 0

Johnny Billquist

Jul 29, 2022, 11:46:55 AM

I just want to make a quick comment on parts of this.
I doubt 2.11BSD is much different from 2.9BSD. With regards to pages,
there is no demand paging in general. All pages are loaded and mapped
before the process can run. However, the stack page can grow dynamically.
And that is where nostk() comes in. Because by default, page 7 is
reserved for the stack, and it grows downward, and can continue into
page 6 and so on. Which means any memory allocation with brk() or sbrk()
cannot grow any further than up to the lowest stack page.

Now, if you don't actually want to use the default stack behavior in
your process, you can call nostk(), and that frees page 7, so that brk()
and sbrk() can actually get memory all the way to the top.

This also should clearly tell you that the I/O page is not in the memory
map of user processes, since that also would normally be sitting at page
7, where you have your stack.

But nostk() doesn't actually have any impact on your actual stack as such.
It frees memory that would otherwise be reserved for the stack normally.
Not calling nostk() doesn't prevent you from setting your stack to
somewhere else. It just reduces the amount of memory you have left to
play with.

Johnny

Walter F.J. Müller

Aug 1, 2022, 4:43:12 AM

For 2.11BSD, which most likely is similar to 2.9BSD in this respect, the situation is clear.
The [nostk man page](https://www.retro11.de/ouxr/211bsd/usr/man/cat2/nostk.0.html) makes a simple statement: _"its use is discouraged"_ .
The implementation is simple, see
https://www.retro11.de/ouxr/211bsd/usr/src/sys/pdp/kern_pdp.c.html#s:_nostk
https://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/vm_proc.c.html#s:_expand
The call `expand(0,S_STACK)` releases the stack segment(s).
Keep in mind that this also includes the environment context.

Walter F.J. Müller

Aug 2, 2022, 5:02:56 AM

2.11BSD has more syscalls than 2.9BSD. For the common ones, the syscall numbers are similar, but not always identical.
See the sysent tables
https://www.retro11.de/ouxr/29bsd/usr/src/sys/sys/sysent.c.html#s:_sysent
https://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/init_sysent.c.html#s:_sysent

'setuid' is 23 in 2.9BSD and 45 in 2.11BSD.

Beyond that a remark on octal and hex. Earlier on in this thread I saw a lot of HEX numbers.
All PDP-11 hardware documentation uses octal.
The MACRO-11 assembler has binary, octal and decimal radix, but no hex.
HEX is an alien in the PDP-11 hardware world, and shouldn't be used.

Johnny Billquist

Aug 2, 2022, 5:33:09 PM

On 2022-08-02 11:02, Walter F.J. Müller wrote:
> 2.11BSD has more syscalls than 2.9BSD. For the common ones, the syscall numbers are similar, but not always identical.
> See the sysent tables
> https://www.retro11.de/ouxr/29bsd/usr/src/sys/sys/sysent.c.html#s:_sysent
> https://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/init_sysent.c.html#s:_sysent
>
> 'setuid' is 23 in 2.9BSD and 45 in 2.11BSD.

There are also differences in how parameters are passed. With 2.11 it's
always on the stack. With 2.9 it's some stack, some register content.

> Beyond that a remark on octal and hex. Earlier on in this thread I saw a lot of HEX numbers.
> All PDP-11 hardware documentation uses octal.
> The MACRO-11 assembler has binary, octal and decimal radix, but no hex.
> HEX is an alien in the PDP-11 hardware world, and shouldn't be used.

I would agree that it feels more "normal" to use octal. But it's
incorrect that MACRO-11 doesn't have hex. It does. You just need a new
enough version; ^X is in there.

Johnny