[ Google ate my first mail after I'd spent an hour and a half on it.
Grr. So here's my second attempt. Hopefully it makes sense.
And apologies for the delay; I've not felt like replying to news for
the past week or so. ]
On Jun 8, 12:35 am, Matthew Phillips <spam20...@yahoo.co.uk> wrote:
> In message <63452a9752.Matt...@sinenomine.freeserve.co.uk>
> on 28 May 2012 Matthew Phillips wrote:
>
> > In message <b498a2ed-3973-49cd-b3a8-a53b6333a...@q2g2000vbv.googlegroups.com>
[this might seem a little matter-of-fact - the original draft of my mail
had me working out what was going on, but this time I know what's going
on, so it's a little more fact and fewer steps-to-find-out]
That I can help with. If you use BTSDump to list the code following the
failure (I don't know how far forward it looks, but I think it's about
64 bytes or so, depending on how you've got the system configured),
you'll see that it's checking for a '.' and a '*' character. The code
just before the failure is looking for a '*' character and then doing
some processing. This made me think that it was part of the printf
code. A quick search further back in the SharedCLibrary in Zap shows
that it's in _vfprintf - the RSBLT/EORLT operation is, I think, unique
within the module.
Anyhow, the code before the failing case (which isn't part of the code
path taken) is dealing with a varargs variant of the width specifier
(ie getting the value from the passed registers for a width specifier
like "%*s"). The failing code is something like this:
LDR r1, <SCL static address>
LDR r12, [r10, #-540] ; get C Library statics offset
ADD r1, r12,r1 ; get the address of the variable
LDRB r2, [r1,r8]
TST r2, #&20
This is essentially doing an 'isdigit' call on the character. The
'isdigit()' macro expands to a test of a bit in the locale character
type data, of which &20 is (I guess) the bit that means 'this is a
digit'.
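To make the LDRB/TST pair concrete, here is a minimal C sketch of a table-driven isdigit. It assumes, as guessed above, that bit &20 marks a digit; the names `CT_DIGIT`, `ctype_table`, `init_ctype_table` and `table_isdigit` are hypothetical stand-ins, not the SharedCLibrary's real internals.

```c
/* Hypothetical character-class bit: the text above guesses that &20
   means "this is a digit" in the locale character type data. */
#define CT_DIGIT 0x20

/* Hypothetical stand-in for the locale's character type table. In the
   SCL the real table is reached through the C library statics, which
   is why finding it involves the [sl-540] offset. */
static unsigned char ctype_table[256];

/* Mark '0'..'9' as digits, as a "C" locale table would. */
static void init_ctype_table(void)
{
    for (int c = '0'; c <= '9'; c++)
        ctype_table[c] |= CT_DIGIT;
}

/* Same shape as the disassembly:
     LDRB r2, [r1, r8]   ; r2 = table[character]
     TST  r2, #&20       ; is the digit bit set?  */
#define table_isdigit(c) (ctype_table[(unsigned char)(c)] & CT_DIGIT)
```

In the real failure the code never gets as far as the TST: the table address is built from [r10, #-540], so with a bad R10 one of the loads aborts first.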
So you're dying in a printf routine whilst it's trying to use the C
library statics to access the locale data structures to determine if
you've given it a digit or not whilst working out the width field
specifier.
> It has to be pretty fundamental because my callback
> function never gets called by DragAnObject_Start. I can tell this because I
> modified the application to write some debug to a file at various stages, and
> the line at the start of the callback function is not written to file but the
> line immediately before the call to DragAnObject_Start is written. (Or,
> perhaps, the debug function could be responsible for triggering the address
> exception in the same way that the code did before I added the debug.)
Bingo - if you used [fs]printf to write your debug information, you've
just caused the problem that it is reporting.
Refer to my previous comment about using C functions within the
callback (albeit in the context of calling setjmp/longjmp):
"Because you don't have a C environment set up, you shouldn't rely on
any of the operations within the callback function working in C
environment."
So I'm not sure that the dump you've got is telling you anything other
than 'the debug code is killing your application'.
> Changing the code back so that bit 17 is set and bit 18 is clear seems to
> solve the problem on RISC OS 6.20. I had better test on a few other versions
> of RISC OS before I can be happy it is fixed.
>
> I'm not totally sure about my interpretation of the dump, though, as the
> value of R10 given in the register listing looks like it could have 540
> deducted from it and still be in application memory, so on the face of it I
> don't see a complete explanation.
I think you're looking at the wrong bit - the 'Register dump' at
the top is the reconstructed C library view on the User mode
registers, not the ones that were in place at the time of the
exception. Look at the dump of registers under 'Exception
register dump' and you'll see that R10 = 0, and so consequently
[R10-540] would be bad.
> It still sounds to me as though there are no disadvantages for the C
> environment in having bit 17 set, so I'll stick with that if it works. I
> hope this is useful to other programmers in the future puzzling over the PRM
> documentation! I've certainly learned quite a lot.
I wouldn't do that. It sets up the environment in a way that is not
correct for your application.
Ok, here are the scenarios:
Normally when you are running your USR mode application, R10 = stack
limit, which is base of stack+540. The base of the stack contains data
which is important to the C application:
[sl-540] = Offset from linkage address to C library statics
[sl-536] = Offset from linkage address to user code statics (ie the
           stuff you write)
In a normal USR mode application:
[sl-540] = the offset of the application's C library workspace from
           the linkage location of the C library statics for the
           version of the SCL which was present when the application
           initialised
[sl-536] = 0 (because the user code statics are at the address they
           are linked at)
Every time you use a CLib function which needs access to one of its
statics, it offsets the address of the variable by [sl-540].
Every time you use a variable in your code you access it directly (you
do not use [sl-536] to offset the address).
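A small C model of those two rules follows. It is only a sketch: the stack chunk is a plain byte array, the names `read_entrail`, `write_entrail`, `clib_static_address` and `app_static_address` are hypothetical, and addresses are modelled as 32-bit integers rather than real pointers.

```c
#include <stdint.h>
#include <string.h>

/* Distances below the stack limit (sl, R10) of the two entrails words. */
#define CLIB_STATICS_SLOT 540 /* [sl-540]: offset to the C library statics */
#define USER_STATICS_SLOT 536 /* [sl-536]: offset to the user code statics */

/* Read one 32-bit entrails word at sl - slot; memcpy sidesteps any
   alignment or aliasing concerns about the byte pointer. */
static int32_t read_entrail(const unsigned char *sl, int slot)
{
    int32_t value;
    memcpy(&value, sl - slot, sizeof value);
    return value;
}

static void write_entrail(unsigned char *sl, int slot, int32_t value)
{
    memcpy(sl - slot, &value, sizeof value);
}

/* A C library static is always found at its linkage address + [sl-540]
   (32-bit wrap-around, as on the ARM). */
static uint32_t clib_static_address(const unsigned char *sl, uint32_t linkage)
{
    return linkage + (uint32_t)read_entrail(sl, CLIB_STATICS_SLOT);
}

/* Ordinary application code ignores [sl-536] and uses the linkage
   address as-is. */
static uint32_t app_static_address(uint32_t linkage)
{
    return linkage;
}
```

With `sl = chunk + 540`, [sl-540] is the first word of the chunk, which is exactly the "stack limit = base of stack + 540" arrangement described above.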
sl itself is the stack limit, and if your functions were to hit upon
that limit a stack extension would occur. The data at the base of the
stack, which includes the two values mentioned but also includes
pointers to the previous and next stack chunks, would be copied to a
new chunk (and the pointers to the chunks updated).
So if you have R10=0, code linked as a USR mode application will crash
when any C library statics are accessed, but it can access the
application statics ok (because the offset at [sl-536] will not be
used).
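As a worked example of why R10 = 0 is fatal: the load `LDR r12, [r10, #-540]` computes its address modulo 2^32, so it reads from &FFFFFDE4, nowhere near application memory. The sketch below does the same arithmetic; `in_app_space` and its `app_end` parameter are hypothetical helpers standing in for "the current top of application memory".

```c
#include <stdint.h>

/* RISC OS application space starts at &8000. */
#define APP_SPACE_START 0x8000u

/* Address the C library loads its statics offset from: sl - 540,
   computed as the ARM does, modulo 2^32. */
static uint32_t clib_offset_slot(uint32_t sl)
{
    return sl - 540u;
}

/* Rough plausibility check; app_end is a hypothetical parameter for
   the top of the application's memory. */
static int in_app_space(uint32_t addr, uint32_t app_end)
{
    return addr >= APP_SPACE_START && addr < app_end;
}
```

With a sensible sl (say &9000+540) the slot lands back at &9000, inside application space; with sl = 0 it wraps to the top of the address map.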
Now let's consider the usage with R10 = SVC stack base+540, which is
what the DragAnObject module sets up. DragAnObject sets nothing else
up. The entry point is intended for modules which previously set up
the stack entrails to contain the values it needs.
Let's explain that a bit more. The module entry sequence used by CMHG
(and CMunge, and anything else that wants to use SCL within the module)
does the following (I've omitted things that aren't relevant):
* Preserve registers
* Work out stack base; all SVC stacks are based at a megabyte boundary,
  so this is simply clearing the bottom 20 bits.
* Preserve the two values found at the base of the stack - these are
  sl-540 and sl-536.
* Store the offset from the module linkage location to the C library
  statics in sl-540.
* Store the offset from the module linkage location to the module
  statics in sl-536.
* Set FP = 0 (frame pointer of 0 indicates the end of backtrace)
* Enter the module code at the C routine specified.
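The "work out stack base" step is plain bit arithmetic. A minimal C sketch, assuming only what is stated above (SVC stacks start on a megabyte boundary, and sl sits 540 bytes above the base); the sample sp value in the check is made up.

```c
#include <stdint.h>

/* SVC stacks start on a megabyte boundary, so clearing the bottom
   20 bits of any address inside the stack yields its base. */
static uint32_t svc_stack_base(uint32_t sp)
{
    return sp & ~(uint32_t)0xFFFFF;
}

/* The sl (R10) value that places [sl-540] and [sl-536] at the
   first two words of the stack. */
static uint32_t svc_stack_limit(uint32_t sp)
{
    return svc_stack_base(sp) + 540u;
}
```

So for a hypothetical sp of &01C01F40 the base is &01C00000 and sl is &01C0021C.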
So if, at this point, the module calls DragAnObject_Start the
stack entrails contain the right values for the module, so any
callback function that is called (within the original calling
module) would work just fine. The important thing here is that
the callback function *is a C function*. Normally you wouldn't
do that. The normal way to provide a callback function in a C
module is to use an explicit generic veneer declaration in
CMHG/CMunge to set up the stack entrails for you.
Anyhow, that it differs from the more normal way of handling the
entry point isn't relevant in this case - except that it is the
root cause of the confusion.
The module code which accesses its statics will always use the
offset at [sl-536]; this is where it differs from the
application code that you would normally compile. The -zm switch
that you compile modules with changes this behaviour. Why do
modules need this ? Because the address that they are linked at
cannot contain a direct pointer to its module workspace. A
module at &21001324 could not simply have a pointer in the code
to the workspace at &2105678. Why ? One reason is that the
modules can be multiply instantiated, so the code may be used
with different module workspaces, so you have to have a
different area for each instance. Another reason is that the
module might be in ROM, and the workspace is dynamic in RAM.
So, this value exists in order that you can reference different
areas. APCS uses the term 'static base' for this type of
referencing, and it's possible to build your code to use a
static base register, instead of this offset. SCL doesn't and to
get into how much easier things might be if we used a static
base would probably drift too far off topic for most people to
care.
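To see why the offset lets one copy of the code serve several instances, here is a C sketch of -zm style static access. The linkage and workspace addresses are hypothetical, and `instance_offset` / `zm_static_address` are illustrative names, not anything the compiler actually emits.

```c
#include <stdint.h>

/* Value an entry veneer would store at [sl-536] for one instance:
   where this instance's copy of the statics lives, minus the address
   the statics were linked at (modulo 2^32, so it may wrap). */
static uint32_t instance_offset(uint32_t workspace, uint32_t linkage)
{
    return workspace - linkage;
}

/* Code compiled with -zm finds each static at its linkage address
   plus [sl-536], so the same instructions serve every instance. */
static uint32_t zm_static_address(uint32_t linked_at, uint32_t user_offset)
{
    return linked_at + user_offset;
}
```

A static linked 8 bytes into the statics resolves to workspace+8 for whichever instance's offset is currently installed at the base of the stack.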
(just to drift very slightly off topic here, I just found
some ARM documentation that explicitly states that there are
"functions that do not use any static data of any kind, for
example fprintf()" - a fact that is not true of the SCL, as
fprintf does use static data, as I've described above)
Returning to the matter in hand....
The module's code will work because the values are available on
the stack, because DragAnObject set up R10 to point to the right
place - if it had been set up with R10 = 0, it would crash because
it referenced address -540 as we've seen above.
Assuming all is well, the module code returns to DragAnObject
and it restores things:
* Restore the old [sl-540] and [sl-536] contents
* Restore registers
* Set any flags necessary
* Return to OS
So you can see that under normal circumstances the stack base will
either be set to the offsets for the C library statics and module
statics from the module that is currently executing, or the previous
values preserved.
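The preserve/install/restore discipline can be modelled in a few lines of C. This is a sketch, not CMHG's generated code: `entrails`, `module_enter` and `module_exit` are hypothetical names, and the two words are held in a struct rather than at a real stack base.

```c
#include <stdint.h>

/* Hypothetical model of the two words at the base of the SVC stack. */
typedef struct {
    int32_t clib_offset; /* [sl-540] */
    int32_t user_offset; /* [sl-536] */
} entrails;

/* Entry: hand back the old words and install this module's offsets. */
static entrails module_enter(entrails *base, int32_t clib, int32_t user)
{
    entrails previous = *base;
    base->clib_offset = clib;
    base->user_offset = user;
    return previous;
}

/* Exit: restore whatever was there before entry. */
static void module_exit(entrails *base, entrails previous)
{
    *base = previous;
}
```

Nested entries unwind cleanly as long as every `module_enter` is matched by a `module_exit`. If a module aborts and its exit step never runs, its values are simply left behind.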
If a module were to exit from within the code that had stored values
at the base of the stack (eg because it aborted within its SWI
handler) there will remain values at the base of the stack which
relate to the module which exited.
The value of [sl-536] would be that of the offset from the
linkage address of the module statics to the module statics
within the module's workspace. This is of no use to anyone but that
module.
The value of [sl-540] would be that of the offset from the
linkage address of the SharedCLibrary at the time that the module
initialised to the C library statics within that module.
So, consider the situation where nothing has gone wrong and the base
of the stack contains (0,0) - no offset present. If your user mode
application called DragAnObject with R10 pointing to the base of the
SVC stack+540, and called back into your application:
* Access to your application statics would be fine - because your
application code never uses [sl-536] to access its statics.
* Access to the C library statics wouldn't be right, but would
probably work so long as the values they contained were not
important. Why ?
Because the C library statics addresses that are being loaded
are the linkage addresses, and they're being offset by 0.
So the code will dereference values off the end of the
SharedCLibrary - the linkage address for statics in a module is
'as if' the statics were located immediately following the code.
So, for example, a read of the locale tables might get random
data, and in some cases that might work out just fine.
The important thing here is that unless there was a pointer in
the C library statics, it wouldn't crash. It would just get
the wrong data.
Now consider the state where the stack contains the data for some
other module, because it crashed as described above. If your user
mode application called DragAnObject with R10 pointing to the base
of the SVC stack+540 (which contains these offsets for the module
that broke), and called back into your application:
* Again, access to your application statics is fine, for the same
reasons.
* Access to the C library statics might seem to work just fine,
unless the module that crashed was killed. Why?
Because the offset in [sl-540] is the offset from the C library
linkage location to the statics of the failing module.
Consequently, any access to the statics works just fine, and
is actually a valid set of statics. Well. Valid so long as
the module didn't select a different locale, or whatever.
Standard file descriptors (stdout, stdin, stderr) will be the
descriptors /for the module/ not for the application. Similarly
the locale settings will be those for the module not the
application. And so on.
Consider the case where the base of the stack contains random values,
or values which do not result in a valid block of memory when applied
to the linkage locations of the static data in the SCL. In this case
you're pretty much guaranteed to get a crash.
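The three scenarios can be put side by side in a toy C model: the same C library access lands in a different place depending only on the word at [sl-540]. All addresses and the statics size are invented for illustration, and the model cannot know whether "unknown" memory is actually mapped; it just labels the case that usually aborts.

```c
#include <stdint.h>

/* Hypothetical addresses, chosen only for illustration. */
#define SCL_STATICS_LINKAGE 0x03A80000u /* "as if" just after the SCL code */
#define STATICS_SIZE        0x00001000u /* size of one copy of the statics */
#define APP_CLIB_STATICS    0x00009000u /* the application's own copy */
#define MOD_CLIB_STATICS    0x02104000u /* a crashed module's copy */

typedef enum {
    OWN_STATICS,          /* correct: the application's statics */
    OTHER_MODULE_STATICS, /* valid memory, but another client's state */
    PAST_END_OF_SCL,      /* offset 0: whatever follows the SCL code */
    UNKNOWN_MEMORY        /* anything else: in practice, likely an abort */
} statics_target;

/* Offset a client would have stored at [sl-540] (modulo 2^32). */
static uint32_t offset_for(uint32_t statics_copy)
{
    return statics_copy - SCL_STATICS_LINKAGE;
}

/* Non-zero if addr lies in the block of statics starting at start;
   unsigned wrap-around makes addresses below start fail the test. */
static int in_block(uint32_t addr, uint32_t start)
{
    return addr - start < STATICS_SIZE;
}

/* Where do C library statics accesses land for a given [sl-540]? */
static statics_target classify(uint32_t clib_offset)
{
    uint32_t base = SCL_STATICS_LINKAGE + clib_offset;
    if (in_block(base, APP_CLIB_STATICS)) return OWN_STATICS;
    if (in_block(base, MOD_CLIB_STATICS)) return OTHER_MODULE_STATICS;
    if (in_block(base, SCL_STATICS_LINKAGE)) return PAST_END_OF_SCL;
    return UNKNOWN_MEMORY;
}
```

Only the first outcome is correct, and nothing in the callback path arranges for it.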
Ok, so step back further. We cannot rely on the value of those two
addresses from a given application - we don't know what they might be
and any of the above might have happened (or those addresses may be
updated to be markers in some other version of the system - yes that
could be useful).
And if we cannot rely on the value of those addresses, we cannot have
C code called with them set to any random values. Therefore... a
callback from DragAnObject which sets R10 to the SVC stack base+540
WILL NOT WORK if you call an application C function. That it may work
sometimes is no justification.
If you can see anywhere in the above justification which is wrong and
means that it should work, please let me know - it's very possible that
in the time that I've spent away from RISC OS I've forgotten something
about how it functions. There are, I'm sure (/there ought to be)
people who know the way that the C library interactions work who can
corroborate or correct the above, and I'd be happy to be corrected if
someone can show where I'm wrong. It's been 6 or 7 years, so ... I can
easily misremember.
Anyhow, I hope I've convinced you that using bare C functions is not
going to work. DragAnObject was not meant to be used by applications -
and the documentation further confused things by implying things which
were not the case. The above description applies to all versions of
RISC OS (unless anyone's changed the way in which C modules work, which
is possible but ... tricky).
DragAnObject always felt a little unfinished to me. It doesn't quite do
what you want and really I think you'd be better off having it as a
library rather than a module. Maybe.
But, what can you do to mitigate the problem ? My guess would be to
create a veneer that does the right thing for you. That's all that the
modules do.
Here's a quick guess at how I'd provide the veneer.
--------8<--------
EXPORT callDragAnObject_Start
; _kernel_oserror *callDragAnObject_Start(unsigned long flags,
;                                         void (*cfunc)(void *private),
;                                         void *private,
;                                         bbox_t *dragbox,
;                                         bbox_t *boundingbox);
;
; Call DragAnObject_Start, providing the rendering function as a C
; function that we can operate on.
; => R0 = flags
; R1 = pointer to C function to call
; R2 = private value to pass to C function in R0
; R3 = pointer to 16-byte block containing box
; [sp+0] = pointer to optional 16-byte block containing bounding box
; <= R0 = pointer to error block, or 0 if no error.
callDragAnObject_Start SIGNATURE
MOV ip, sp
STMFD sp!, {v1,fp,ip,lr,pc}
SUB fp, ip, #4 ; fp -> pc on the stack
LDR r4, [fp, #4] ; r4(v1) = stacked value
ORR r0, r0, #(1<<16)+(1<<17) ; set the function callback
; and as C function flags
STMFD sp!, {r1,r2,sl} ; set up our registers
ADR r1, dao_callback_veneer ; use our veneer
MOV r2, sp ; workspace is on our stack
SWI XDragAnObject_Start
MOVVC r0, #0
[ {CONFIG}=32
LDMDB fp, {v1,fp,sp,pc}
|
LDMDB fp, {v1,fp,sp,pc}^
]
; Callback veneer, called by DragAnObject to render.
; Must call down to our C function, having set up the stack entrails
; properly.
; Will be entered in SVC mode.
; => R0 = C function pointer
; R1 = private value to pass in R0 to C function
; R2 = SL from our old veneer
; Can corrupt all R0-R10
dao_callback_veneer SIGNATURE
STMFD sp!, {r0, fp, lr} ; keep C function pointer; fp is not ours
                        ; to corrupt (only R0-R10 are)
LDR v1, [sl, #-540] ; preserve SVC CLib statics
LDR v2, [sl, #-536] ; preserve SVC app statics
LDR v3, [r2, #-540] ; our CLib statics
LDR v4, [r2, #-536] ; our app statics
STR v3, [sl, #-540] ; replace SVC CLib statics
STR v4, [sl, #-536] ; replace SVC app statics
MOV fp, #0 ; stop backtrace here
MOV r0, r1 ; private value is the C function's argument
MOV lr, pc
LDR pc, [sp] ; call function (the stacked R0)
STR v1, [sl, #-540] ; restore SVC CLib statics
STR v2, [sl, #-536] ; restore SVC app statics
LDMFD sp!, {r0, fp, pc} ; restore fp and return
--------8<--------
I'm not completely sure that that's right - I'm quite rusty on my
ARM code.
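For anyone who finds the register shuffling hard to follow, here is the same idea as a C model. It only illustrates the swap performed by dao_callback_veneer: `entrails`, `callback_veneer`, `probe` and `probe_renderer` are hypothetical names, and the "stacks" are plain structs rather than real memory at sl-540.

```c
#include <stdint.h>

/* Hypothetical model of the two words at the base of a stack. */
typedef struct {
    int32_t clib_offset; /* [sl-540] */
    int32_t user_offset; /* [sl-536] */
} entrails;

typedef void (*render_fn)(void *private_word);

/* Same steps as dao_callback_veneer: save the SVC stack's words, copy
   in the application's words (reached through the application sl that
   callDragAnObject_Start passed along in R2), call the C function,
   then put the SVC words back. */
static void callback_veneer(entrails *svc_base, const entrails *app_base,
                            render_fn fn, void *private_word)
{
    entrails saved = *svc_base; /* LDR v1, v2 from [sl, #-540/-536] */
    *svc_base = *app_base;      /* LDR v3, v4 via R2; STR into SVC stack */
    fn(private_word);           /* private value passed in R0 */
    *svc_base = saved;          /* STR v1, v2 back */
}

/* Example renderer for checking the model: copies out the SVC words
   it sees while it runs. */
typedef struct {
    const entrails *svc_base;
    entrails seen;
} probe;

static void probe_renderer(void *private_word)
{
    probe *p = private_word;
    p->seen = *p->svc_base;
}
```

While the renderer runs it sees the application's offsets; once it returns, the SVC stack holds whatever it held before, so other SVC-mode C code is unaffected.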
As for the 'fix' which was meant to call back to USR mode: any such
fix needs to be able to restore SL, which is in R10, and R10 is not
available as part of the SWI entry parameters. Therefore there is no
legal way of getting the value of R10 within the SWI call, so you
couldn't restore the C environment sufficiently for it to be useful.
Maybe I missed something about how they finessed that particular
'fix', but I feel that trying to fix a piece of code to allow
a documentation error to work probably isn't a good plan. Calling
down from a SWI call into USR mode I'm certain is a bad plan.
I'm pretty certain that calling back into an application through
a callback in SVC mode is also a bad plan but there you go.
Certainly you can do everything that DragAnObject does in a library
in your application, which would avoid the unsafe callback, albeit
losing the advantage of having a block of code already in the
system.
> The other interesting thing is that some of the other calls my application
> makes to DragAnObject_Start are working fine. They have different callback
> functions but go through the same Drag_Object function as the problematic
> call, and were also using bit 17 clear, bit 18 set. It must have been down
> to stack usage and whether statics were accessed or the calling convention
> needed reading. And interesting that there were no problems on 4.02 or 4.39,
> as well as the RO5 branch, of course.
I imagine that the change in the way that applications are
called may have affected this. I'm not certain, but my
recollection is that the stack flattening performed by the
system does not manipulate the two words at the bottom of the
stack to clear them, so they would be left. Consequently the two
stack entrails values are probably the ones from AIF. Maybe. I'm
not certain and further speculation probably isn't worthwhile as
I've already shown that you're relying on values that you did
not initialise and therefore the behaviour is going to be
undefined.
Anyhow, I hope that using BTSDump has shown it to be useful as a
diagnostic tool, even if it merely tells you here that you broke
things by putting debugging code in.
If I were looking at the code now, I'd be very tempted to add a
PBTS point in DragAnObject, as it's transferring control to
another chunk of code. That would allow you to trace the
transition to the C code (or any other code) and produce a
useful backtrace for BTSDump if anything went wrong.
--
Gerph
... Every nightmare that I dream;
make it mean something better.