
clang bug


muta...@gmail.com

Mar 1, 2023, 11:06:28 AM
Hello.

Shouldn't there be two mov instructions for 2 parameters?

I only see the second.

.ident "Android (7714059, based on r416183c1) clang version 12.0.8 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee)"

mystart takes 2 parameters. I can see the second parameter:

movl %eax, 4(%esp)

but I don't see the first parameter, which I expect to be inserted
at 0(%esp)


.text
.p2align 4
.globl _start
.type _start, @function
_start:
.LFB0:
.cfi_startproc
endbr32
pushl %ebx
.cfi_def_cfa_offset 8
.cfi_offset 3, -8
subl $8, %esp
.cfi_def_cfa_offset 16
leal 12(%esp), %eax
subl $8, %esp
.cfi_def_cfa_offset 24
movl %eax, paul
leal 24(%esp), %eax
pushl %eax




.text
.file "linstart.c"
.globl _start # -- Begin function _start
.p2align 4, 0x90
.type _start,@function
_start: # @_start
# %bb.0:
pushl %esi
subl $8, %esp
leal 12(%esp), %eax
movl %eax, paul
leal 16(%esp), %eax
movl %eax, 4(%esp)
calll __mystart
movl %eax, %esi
movl %eax, (%esp)
calll __exita
movl %esi, %eax
addl $8, %esp
popl %esi
retl
.Lfunc_end0:



/* Startup code for Linux */
/* written by Paul Edwards */
/* released to the public domain */

#include "errno.h"
#include "stddef.h"

/* malloc calls get this */
static char membuf[31000000];
static char *newmembuf = membuf;

extern int __mystart(int argc, char **argv);
extern int __exita(int rc);

int *paul;

#ifdef NEED_MPROTECT
extern int __mprotect(void *buf, size_t len, int prot);

#define PROT_READ 1
#define PROT_WRITE 2
#define PROT_EXEC 4
#endif

/* We can get away with a minimal startup code, plus make it
a C program. There is no return address. Instead, on the
stack is a count, followed by all the parameters as pointers */

int _start(char *p)
{
int rc;
char *argv[2] = { "prog", NULL };

#ifdef NEED_MPROTECT
/* make malloced memory executable */
/* most environments already make the memory executable */
/* but some certainly don't */
/* there doesn't appear to be a syscall to get the page size to
ensure page alignment (as required), and I read that some
environments have 4k page sizes but mprotect requires 16k
alignment. So for now we'll just go with 16k */
size_t blksize = 16 * 1024;
size_t numblks;

newmembuf = membuf + blksize; /* could waste memory here */
newmembuf = newmembuf - (unsigned int)newmembuf % blksize;
numblks = sizeof membuf / blksize;
numblks -= 2; /* if already aligned, we wasted an extra block */
rc = __mprotect(newmembuf,
numblks * blksize,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (rc != 0) return (rc);
#endif

/* I don't know what the official rules for ARM are, but
looking at the stack on entry showed that this code
would work */
#ifdef __ARM__

#if defined(__UNOPT__)
rc = __mystart(*(int *)(&p + 5), &p + 6);
#else
rc = __start(*(int *)(&p + 6), &p + 7);
#endif

#else
paul = (int *)(&p - 1);
rc = __mystart(*(int *)(&p - 1), &p);
/* rc = __start(1, argv); */
#endif
__exita(rc);
return (rc);
}


void *__allocmem(size_t size)
{
return (newmembuf);
}


#if defined(__WATCOMC__)

#define CTYP __cdecl

/* this is invoked by long double manipulations
in stdio.c and needs to be done properly */

int CTYP _CHP(void)
{
return (0);
}

/* don't know what these are */

void CTYP cstart_(void) { return; }
void CTYP _argc(void) { return; }
void CTYP argc(void) { return; }
void CTYP _8087(void) { return; }

#endif
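
The 16k alignment step in the NEED_MPROTECT branch above can be looked at in isolation. This is only a sketch: the function name align_up_like_source is made up for illustration, and it uses uintptr_t from C99's stdint.h rather than the (unsigned int) cast in the source. It mirrors the same arithmetic: always step forward one block, then drop the remainder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the arithmetic in _start:
   newmembuf = membuf + blksize;
   newmembuf = newmembuf - (address of newmembuf) % blksize; */
uintptr_t align_up_like_source(uintptr_t addr, size_t blksize)
{
    addr += blksize;              /* step forward one whole block */
    return addr - addr % blksize; /* drop the remainder to land on a boundary */
}
```

An address that is already aligned still moves forward by a full block, which is why the source subtracts 2 from numblks rather than 1.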



Holy cow I need a real computer

muta...@gmail.com

Mar 1, 2023, 11:53:51 PM
Ok, I'm back on my computer now and was able to raise a bug report:

https://github.com/llvm/llvm-project/issues/61112

BFN. Paul.

Alexei A. Frounze

Mar 2, 2023, 12:34:09 AM
Without looking any further: you're taking the address of a parameter (p)
and doing pointer arithmetic on it. The only valid arithmetic here
is adding 0 or 1, not 5, 6 or 7. And even if you add 1 to &p,
you aren't allowed to dereference the result (as in *(&p + 1)).
You can only subtract the 1 back (as in (&p + 1) - 1) or compute
the difference between that and &p (as in (&p + 1) - &p, or the other way around).
Your shady pointer manipulation is undefined behavior, and
I can almost guarantee your clang bug will soon be closed as invalid.
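
A minimal sketch of the operations that are allowed on the address of a parameter. The function names are made up for illustration and are not from the posted source.

```c
#include <stddef.h>

/* Distance from &p to the one-past-the-end pointer &p + 1. */
ptrdiff_t one_past_distance(char *p)
{
    char **base = &p;    /* a single object behaves like an array of one element */
    char **end = &p + 1; /* valid to form and compare, but not to dereference */
    return end - base;   /* valid: both pointers belong to the same "array" */
}

/* Stepping one past the object and back again yields the original address. */
int round_trip_ok(char *p)
{
    return (&p + 1) - 1 == &p;
}
```

Anything further out, such as &p + 5 or *(&p + 1), is outside what the standard defines, so the compiler is free to assume it never happens.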

Write clean code. OK, write cleaner code. Maybe refresh your ANSI C
to get there.

Alexey

muta...@gmail.com

Mar 2, 2023, 1:47:09 AM
"undefined behavior" shouldn't mean "let's leave the stack
uninitialized, and not even issue a warning". It just means
you can't guarantee what the result will be.

But you can still give a sensible result.

In this case, I am trying to inspect the stack. I'm using
knowledge of the x86 as to what the stack looks like,
and the calling convention.

Yes. That code won't work on every platform in the world.

But it should work perfectly fine when I know the layout.

It is not in the spirit of C to catch every undefined behavior
and throw an error. And in this case, even throwing an error
would be better than silent failure and garbage.

Even if setting a pointer to the arbitrary value 0xb8000 and
indexing up 1000 bytes is undefined behavior, that doesn't
mean it should be disallowed either.

BFN. Paul.

Alexei A. Frounze

Mar 2, 2023, 3:30:12 AM
On Wednesday, March 1, 2023 at 10:47:09 PM UTC-8, muta...@gmail.com wrote:
> On Thursday, March 2, 2023 at 1:34:09 PM UTC+8, Alexei A. Frounze wrote:
> > On Wednesday, March 1, 2023 at 8:53:51 PM UTC-8, muta...@gmail.com wrote:
> > > Ok, I'm back on my computer now and was able to raise a bug report:
> > >
> > > https://github.com/llvm/llvm-project/issues/61112
> > >
> > > BFN. Paul.
> > Without looking any further, you're taking address of a parameter (p),
> > and doing pointer arithmetic on it. The only valid arithmetic here
> > would be adding 0 or 1, but not 5, 6 or 7. But even if you add 1 to &p,
> > you aren't allowed to dereference it (as in *(&p + 1)).
> > You can only subtract the 1 back (as in (&p + 1) - 1) or compute
> > a difference of that and &p (as in (&p + 1) - &p or the other way around).
> > Your shady pointer manipulation is undefined behavior and
> > I almost guarantee your clang bug being soon closed as invalid.
> >
> > Write clean code. OK, write cleaner code. Maybe, refresh your ANSI C
> > to get there.
> "undefined behavior" shouldn't mean "let's leave the stack
> uninitialized, and not even issue a warning". It just means
> you can't guarantee what the result will be.

This is wishful thinking. The standard means that all bets are off,
regardless of what you think should or should not happen.

> But you can still give a sensible result.

Sure. Garbage in, garbage out, as is the case here, is a sensible result.

> In this case, I am trying to inspect the stack. I'm using
> knowledge of the x86 as to what the stack looks like,
> and the calling convention.

If the compiler inlines some code or uses link-time optimization,
there may be no stack at all. This likely won't happen if the caller
is some assembly code, but throw in enough C and it becomes
a possibility.

> Yes. That code won't work on every platform in the world.
>
> But it should work perfectly fine when I know the layout.

If you use an ancient compiler or disable optimizations in a modern
one, maybe.

> It is not in the spirit of C to catch every undefined behavior
> and throw an error. And in this case, even throwing an error
> would be better than silent failure and garbage.

The standard does not require an implementation to detect undefined
behavior at compile or run time, or to report it. In some cases such
detection is impractically expensive, if not outright impossible.

> Even if setting a pointer to the arbitrary value 0xb8000 and
> indexing up a 1000 bytes is undefined behavior, doesn't
> mean it should be disallowed either.

That's less relevant to your current problem, I think.

Alex

Alexei A. Frounze

Mar 2, 2023, 3:40:44 AM
If you really want to manipulate stuff directly on the stack,
declare a structure type with those things that will be on
the stack. On the caller side push the struct contents to the
stack and pass ESP as an argument to your C function
(make sure to abide by the ABI w.r.t. saved and non-saved
registers and ESP alignment (ESP may be expected to be
a multiple of 8 or 16, not just a multiple of 4)).
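
A rough sketch of what the C side of that could look like. Everything here is an assumption for illustration: the struct name, the accessor names, and the layout (a word-sized count followed by the argument pointers, as the comment in the posted source describes). The flexible array member needs C99. The assembly stub that passes ESP in is not shown.

```c
#include <stddef.h>

/* Assumed layout of the words at the initial stack pointer:
   a count, then that many argument pointers, then NULL. */
struct entry_stack {
    long argc;    /* one machine word holding the argument count */
    char *argv[]; /* C99 flexible array member: the argument pointers */
};

/* Hypothetical accessors; sp is the stack pointer value passed in
   by the assembly caller. */
int stack_argc(struct entry_stack *sp)
{
    return (int)sp->argc;
}

char **stack_argv(struct entry_stack *sp)
{
    return sp->argv;
}
```

With this shape the C code only ever dereferences a pointer it was handed, so the compiler has no reason to treat the accesses as out of bounds of a parameter.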

Alex

muta...@gmail.com

Mar 2, 2023, 3:59:40 AM
On Thursday, March 2, 2023 at 4:30:12 PM UTC+8, Alexei A. Frounze wrote:

> > "undefined behavior" shouldn't mean "let's leave the stack
> > uninitialized, and not even issue a warning". It just means
> > you can't guarantee what the result will be.

> This is wishful thinking. The standard means that all bets are off
> regardless of your thinking of what should or should not be.

Sure. The standard doesn't stop the compiler from
detecting the unportable code and reformatting my
hard disk. That doesn't mean it should. There is a
sensible thing to do, and it could have done it.

There's even an asshole thing to do - throw a compile
error.

But it's simply diabolical to introduce a random value.

> > But you can still give a sensible result.

> Sure. Garbage in, garbage out, as is the case here, is a sensible result.

No it isn't. The perfectly fine stack manipulation is the
sensible result.

> > In this case, I am trying to inspect the stack. I'm using
> > knowledge of the x86 as to what the stack looks like,
> > and the calling convention.

> If the compiler inlines some code or uses link-time optimization,
> there may be no stack at all. This likely won't happen if the caller
> is some assembly code, but throw in enough C and it becomes
> a possibility.

The caller is an external unit of work. The parameter passing
mechanism needs to be fixed in place.

It is the equivalent of assembler.

That is what I want - a compile time option that assumes that
this is an independent body of work called by assembler.

If only ancient compilers support that concept, then fine,
I'm going to use a modern ancient compiler.

> > Yes. That code won't work on every platform in the world.
> >
> > But it should work perfectly fine when I know the layout.

> If you use an ancient compiler or disable optimizations in a modern
> one, maybe.

Or use gcc. It didn't have a problem.

> > It is not in the spirit of C to catch every undefined behavior
> > and throw an error. And in this case, even throwing an error
> > would be better than silent failure and garbage.

> The standard does not require to detect undefined behavior
> at compile or run time and inform of it. In some cases such
> detection is impractically expensive if not outright impossible.

See above about reformatting the hard disk.

> > Even if setting a pointer to the arbitrary value 0xb8000 and
> > indexing up a 1000 bytes is undefined behavior, doesn't
> > mean it should be disallowed either.

> That's less relevant to your current problem, I think.

No. It's exactly relevant.

> If you really want to manipulate stuff directly on the stack,
> declare a structure type with those things that will be on
> the stack. On the caller side push the struct contents to the
> stack and pass ESP as an argument to your C function
> (make sure to abide by the ABI w.r.t. saved and non-saved
> registers and ESP alignment (ESP may be expected to be
> a multiple of 8 or 16, not just a multiple of 4)).

I can't control the external caller.

Which is the Linux OS.

BFN. Paul.