Moral: UB is UB; if you don't want UB, don't invoke UB.
Phil
--
If GML was an infant, SGML is the bright youngster far exceeds
expectations and made its parents too proud, but XML is the
drug-addicted gang member who had committed his first murder
before he had sex, which was rape. -- Erik Naggum (1965-2009)
This isn't really like Torvalds; he usually has low tolerance for stupid
mistakes. I'm guessing from the quote perhaps he saw some of the vids
but didn't actually see the defective code, which I'm confident he'd
want repaired.
In that particular code snippet, it's pretty obvious either the check
for NULL is A) useless and should be removed (which is what GCC is
doing) or B) very important and the pointer should not be dereferenced
without being checked against NULL (which is what the exploit is...
exploiting).
Based on the exploit, it probably should be (B), and one patch (not
(yet) in the kernel tree) does exactly that. From Brad Spengler's
exploit.c:
Fix appears on July 6th:
http://lkml.org/lkml/2009/7/6/19
As you'll note in the fix, the problem was a NULL tun variable,
which from the source should have been unexploitable due to the
immediate check for !tun. The dereference of tun to grab ->sk would
have caused only a crash (or not, if NULL was mapped). So how was
the bug exploitable before but fixed by moving one line of code?
The reason is that because the tun ptr is used before the check for
!tun, the compiler assumes that in dereferencing tun a fault will
occur, removing any need to later check tun for being NULL. So the
!tun check actually does not exist in the compiled code. Normally
this would be fine, if you could actually ensure that nothing is
mapped at NULL, but as this (and previous exploits) have proven,
that's not the case :) So the fix moves the dereference of tun
until after !tun is checked -- this time the !tun check actually
exists in the code and the vulnerability/exploitability is removed.
You can see a reference to this sort of problem here:
http://osdir.com/ml/gcc.cross-compiling.arm/2007-10/msg00003.html
The kernel should be compiled with -fno-delete-null-pointer-checks
to remove the possibility of these kinds of vulnerabilities turning
exploitable in the future which would be impossible to spot at the
source level without this knowledge.
>Moral: UB is UB; if you don't want UB, don't invoke UB.
This really is a great example of all bets being off. The compiler
says, "Well, you blindly dereference it here, so there's no point in
checking it against NULL right after that." Works great for pointers
that aren't NULL, and would work great without the dereference. But
when you dereference the NULL pointer, things go to hell.
-Beej
PS. Interestingly, it appears legal to dereference a NULL pointer as
long as you take the address of it immediately thereafter:
int *a = NULL, *b, c;
b = &*a; // ok, same as "b = a"
c = *a; // bad
(C99 6.5.3.2p3-4, fn87)
I would have thought that this didn't count as dereferencing the
pointer. Dereferencing gives you a value; you can't apply & to a value
to ask "where is this stored?". Not that I'm an expert.
I'd pick one small but fairly important nit with this analysis:
It's not that the compiler assumes dereferencing NULL will cause a
fault, but that dereferencing NULL causes undefined behavior. U.B.
is not a promise of a fault, nor a promise of death and taxes, nor
a promise of eternal bliss in Paradise: It's just "U."
In other words, we need not assume that the compiler has made
some sort of error by assuming the program would end with a crash
at the NULL dereference. The compiler merely observes that if the
pointer was NULL at the dereference, then the program invoked U.B.
at that point and its future actions are "U." The compiler need
not reason (incorrectly) that "If we survive the dereference, then
the pointer is non-NULL." Instead, it reasons "If the behavior is
defined, then the pointer is non-NULL" -- in which case, the test
for NULL can be omitted. All it can do is change one manifestation
of U.B. into a different one, but since both are equally "U" it
makes no difference what manifestation manifests itself.
Some U.B. is welcome, some is unwelcome. All U.B. is U.
--
Eric Sosman
eso...@ieee-dot-org.invalid
You are right that it does not count as a dereference but for another
reason -- namely that the two operators are defined in such a way that
there is no dereference.
If it were for your reason, then
int i = 42;
int *a = &i, *b;
b = &*a;
and, by extension, constructs like
&an_array[idx]
would not be possible. Expressions of the form *E and E1[E2] are
lvalues, which is, to a first approximation, code for values whose
address you _can_ take.
--
Ben.
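As a small self-contained sketch of Ben's point (the names are mine,
not from the thread): both address-of forms below reduce to plain
pointer arithmetic, and nothing is ever loaded through the pointer.

#include <stdio.h>

int main(void)
{
    int arr[4] = {0, 1, 2, 3};
    int *p = NULL;

    /* C99 6.5.3.2p3: &*p is equivalent to p itself; *p is never
       evaluated, so no dereference of the null pointer occurs. */
    int *q = &*p;

    /* Likewise &arr[2] is defined as arr + 2, not "the address of
       the value fetched from arr[2]". */
    int *r = &arr[2];

    printf("q=%p r=%p\n", (void *)q, (void *)r);
    return 0;
}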
Oh, yes, there's a lot of hearsaid (I think I may have invented
that verb) quotes involved. Without context it's hard to tell
who was actually suffering from impaired judgement.
> In that particular code snippet, it's pretty obvious either the check
> for NULL is A) useless and should be removed (which is what GCC is
> doing) or B) very important and the pointer should not be dereferenced
> without being checked against NULL (which is what the exploit is...
> exploiting).
>
> Based on the exploit, it probably should be (B), and one patch (not
> (yet) in the kernel tree) does exactly that. From Brad Spengler's
> exploit.c:
>
> Fix appears on July 6th:
> http://lkml.org/lkml/2009/7/6/19
> As you'll note in the fix, the problem was a NULL tun variable,
> which from the source should have been unexploitable due to the
That predicate (to be unexploitable) in this context makes me mad!
> immediate check for !tun. The dereference of tun to grab ->sk would
> have caused only a crash (or not, if NULL was mapped). So how was
> the bug exploitable before but fixed by moving one line of code?
> The reason is that because the tun ptr is used before the check for
> !tun, the compiler assumes that in dereferencing tun a fault will
> occur, removing any need to later check tun for being NULL. So the
> !tun check actually does not exist in the compiled code. Normally
...
>
>>Moral: UB is UB; if you don't want UB, don't invoke UB.
>
> This really is a great example of all bets being off.
Amen! You read it like I read it, obviously.
> The compiler
> says, "Well, you blindly dereference it here, so there's no point in
> checking it against NULL right after that." Works great for pointers
> that aren't NULL, and would work great without the dereference. But
> when you dereference the NULL pointer, things go to hell.
>
> -Beej
>
> PS. Interestingly, it appears legal to dereference a NULL pointer as
> long as you take the address of it immediately thereafter:
>
> int *a = NULL, *b, c;
> b = &*a; // ok, same as "b = a"
> c = *a; // bad
>
> (C99 6.5.3.2p3-4, fn87)
Yup, that seems to be a special case - you're not actually
dereferencing it.
That's nice to hear. I'll be checking in some UB into linux-omap in
the next few days! (But I don't believe there's a single .c file
in the whole linux kernel that doesn't involve UB, so it'll fit
right in!)
If I read it well I think it is a compiler that modifies well-behaved code
to UB. "Although the code correctly checks to make sure the tun variable
doesn't point to NULL, the compiler removes the lines responsible for that
inspection during optimization routines." Although this sentence is
slightly incomprehensible.
--
dik t. winter, cwi, science park 123, 1098 xg amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
> If I read it well I think it is a compiler that modifies well-behaved code
> to UB. "Although the code correctly checks to make sure the tun variable
> doesn't point to NULL, the compiler removes the lines responsible for that
> inspection during optimization routines." Although this sentence is
> slightly incomprehensible.
No. The problem lies within the original code.
http://isc.sans.org/diary.html?storyid=6820
struct sock *sk = tun->sk; // initialize sk with tun->sk
...
if (!tun) return POLLERR; // if tun is NULL return error
On the basis of the code sample presented, this isn't the
situation. The code is not well-behaved, and invokes undefined
behavior, full stop. Yes (by report), the compiler removes the
test for NULL, but this does not introduce undefined behavior,
it merely changes (probably) the nature of the undefined behavior
already in progress. Since a program has no right to rely on
*anything* once it indulges in U.B., it can in particular not
rely on a NULL test occurring.
After the train goes off the rails, the settings of the
switches not yet encountered make no difference to its route.
I think your nit has another nit which needs to be picked. :-)
I haven't looked at the original code, but from the thread I've read so far
here, it sounds like it's something like this:
if ( tun->sk ... && tun != NULL )
{
}
rather than:
if (tun != NULL && tun->sk ... )
{
}
While C may not define what happens when you dereference a NULL pointer, the
O/S may. Since the compiler "knows" that it's compiling for Linux, and it
"knows" that dereferencing NULL will cause a trap, it can remove the check,
because it can never be false. (That is, if tun were NULL, the check would
never be reached.)
> In other words, we need not assume that the compiler has made
> some sort of error by assuming the program would end with a crash
> at the NULL dereference. The compiler merely observes that if the
> pointer was NULL at the dereference, then the program invoked U.B.
> at that point and its future actions are "U." The compiler need
> not reason (incorrectly) that "If we survive the dereference, then
> the pointer is non-NULL." Instead, it reasons "If the behavior is
> defined, then the pointer is non-NULL" -- in which case, the test
> for NULL can be omitted. All it can do is change one manifestation
> of U.B. into a different one, but since both are equally "U" it
> makes no difference what manifestation manifests itself.
>
> Some U.B. is welcome, some is unwelcome. All U.B. is U.
--
Kenneth Brody
No need to assume anything about the OS.
The compiler 'knows' that dereferencing a pointer will cause UB if that
pointer is NULL, so from then on it is allowed to assume that that pointer
is not NULL.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
if tun==0, the line above dereferences tun and reads the memory
at address (0+offset(sk)); as far as I remember this is full UB,
and here I think it should segfault
for the code to be OK it should be:
start_routine
    /* start: declare all variables in the routine */
    struct sock *sk;
    ...
    /* end: declare all variables in the routine */
    /* start: code of the routine */
    if(tun==0) return POLLERR;
    if(someother) return ELSEERROR;
    ...
    sk=tun->sk;
The wrong line above is sk=tun->sk, i.e. (*tun).sk == ((*0) + offset(sk)):
it reads the address in 0 (not the address (0+offset(sk))).
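To make the offset arithmetic concrete, a minimal sketch (struct
tun_like is a made-up stand-in, not the real kernel structure):

#include <stdio.h>
#include <stddef.h>

struct tun_like {
    int   other;   /* some member placed before sk */
    void *sk;
};

int main(void)
{
    /* tun->sk compiles to a load from (char *)tun + offsetof(...),
       so with tun == 0 the access lands at this nonzero offset. */
    printf("sk lives at offset %zu\n", offsetof(struct tun_like, sk));
    return 0;
}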
The thing I find odd about that web page is the claim that "Because a
source code audit of the vulnerable code would never find this
vulnerability (well, actually, it is possible but I assure you that
almost everyone would miss it)." While I admit that the vulnerability
itself might not be obvious, would a source code audit normally ignore
such an obvious example of undefined behavior?
Possibly I haven't tried hard enough, but I haven't located the
context of the above code. However, since 'tun' can be null, and is
being dereferenced without bothering to check for that possibility,
such code would never pass a code review in my group unless the
mechanism whereby 'tun' could be null were extremely inobvious. It's
one of the simplest things we check; it's right up there with making
sure that indexes are within limits before using them to index an
array. Does this make us unusual?
It's like this in structure:
void foo(int *a)
{
    int b = *a;
    if (a == NULL) return;
    .
    .   // exploit works here because "if" test has been optimized out
    .
}
Actual code below from kernel 2.6.30.1 drivers/net/tun.c. 2.6.30.2 was
just released and this issue was patched, with the relevant part of the
patch shown below, as well.
-Beej, time for a rebuild... :)
------------------------------ 8< ------------------------------
static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
{
        struct tun_file *tfile = file->private_data;
        struct tun_struct *tun = __tun_get(tfile);
        struct sock *sk = tun->sk;
        unsigned int mask = 0;

        if (!tun)
                return POLLERR;

        DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name);

        poll_wait(file, &tun->socket.wait, wait);

        if (!skb_queue_empty(&tun->readq))
                mask |= POLLIN | POLLRDNORM;

        if (sock_writeable(sk) ||
            (!test_and_set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
             sock_writeable(sk)))
                mask |= POLLOUT | POLLWRNORM;

        if (tun->dev->reg_state != NETREG_REGISTERED)
                mask = POLLERR;

        tun_put(tun);
        return mask;
}
------------------------------ 8< ------------------------------
-        struct sock *sk = tun->sk;
+        struct sock *sk;
         unsigned int mask = 0;
 
         if (!tun)
                 return POLLERR;
 
+        sk = tun->sk;
------------------------------ 8< ------------------------------
commit 3f8fd3f9f677ce452556aca82473b7fcac370830
Author: Mariusz Kozlowski <m.koz...@tuxland.pl>
Date: Sun Jul 5 19:48:35 2009 +0000
tun/tap: Fix crashes if open() /dev/net/tun and then poll() it.
(CVE-2009-1897)
commit 3c8a9c63d5fd738c261bd0ceece04d9c8357ca13 upstream.
Fix NULL pointer dereference in tun_chr_pool() introduced by commit
33dccbb050bbe35b88ca8cf1228dcf3e4d4b3554 ("tun: Limit amount of queued
packets per device") and triggered by this code:
int fd;
struct pollfd pfd;
fd = open("/dev/net/tun", O_RDWR);
pfd.fd = fd;
pfd.events = POLLIN | POLLOUT;
poll(&pfd, 1, 0);
Reported-by: Eugene Kapun <abacaba...@gmail.com>
Signed-off-by: Mariusz Kozlowski <m.koz...@tuxland.pl>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
> static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
> {
>         struct tun_file *tfile = file->private_data;
>         struct tun_struct *tun = __tun_get(tfile);
>         struct sock *sk = tun->sk;
>         unsigned int mask = 0;
>
>         if (!tun)
>                 return POLLERR;
>
>         DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name);
>
>         poll_wait(file, &tun->socket.wait, wait);
I can't quite see why removing the test for tun == NULL would be the
root cause of a vulnerability here. There must be some severe hacking
going on to make __tun_get return a null pointer. What if that hacking
made it return (struct tun_struct *) 4 instead? That would have passed
the null pointer test anyway, whether it is there or not, so how would
that be a different situation?
Just to pick the nit once more, I think you've restated the analysis
whose nit I tried to pick ... My point was that the optimization is
valid even if control *does* reach the check (where the check was, that
is). If the pointer was NULL, the behavior is undefined and removing
the check doesn't make it any more or less undefined than it already
was. The Linux environment doesn't enter into the matter, nor does the
likelihood of a trap or other crash: undefined is UN-defined, and it
simply doesn't matter what the code does afterwards.
Y'know how a couple times a week some newbie will ask "I did the
following invalid thing and my program didn't crash; is my compiler
buggy?" Or almost equivalently, "Whaddya mean, my code is invalid?
I tried it, and it worked fine!" And we all patiently explain that
Undefined Behavior is not a guarantee of a crash, nor of a desired
outcome, nor of demons flying from the nasal cavity, nor of anything
else in particular. The analysis whose nits we nit-wits are trying
to pick makes this same error in assuming that U.B. means "crash" --
or, rather, in arguing that the compiler is at fault for so assuming.
It's not that "The dereference will crash if the pointer is NULL,
so the test will never be reached unless it's vacuous, therefore the
test can be removed." It's something a little subtler, a little less
definite, more like "The dereference produces U.B. if the pointer is
NULL, so if the test is reached there are two cases: (1) either all
is well and there's no U.B., meaning that the pointer wasn't NULL and
the test can be removed because it's vacuous, or (2) the pointer was
NULL and we're in U.B.-land, so the test can be removed because we're
already careening out of control and what we do next doesn't matter."
do { !pick(nit); } while(underInfluence(oratory));
A source code audit? I sure hope not. But I imagine a _security_ audit
might, considering the number of people who think the problem is limited
to how the Linux kernel might react rather than the compiler. They'd
note the bug, classify it as just a crash, not something the program
would do further which could be exploited, and move on. Hopefully
reporting the bug anyway, but not as a security problem.
I don't know how a security audit works, but it has to mean they are not
required to find and report every bug they can.
> Possibly I haven't tried hard enough, but I haven't located the
> context of the above code.
Actually you've seen enough. A pointer is followed, and the very next
line says it may have been a null pointer. I don't see how the source
could make it easier to spot a possible null dereference.
--
Hallvard
Nor any reason to put your 'knows' below in quotes:
> The compiler 'knows' that dereferencing a pointer will cause UB if that
> pointer is NULL,
Yes. Because it's compiling a C program, and that's what the C language
standard says about following NULL pointers.
> so from then on it is allowed to assume that that pointer is not NULL.
Right.
--
Hallvard
Since it checks it for null *after* the dereference it would fail any
code review I attended even if it was *proved* that tun could never be
null. After all, the test is in the wrong place. If it was proved that
tun could never be null I might accept removing the test completely (as
the compiler is doing), but I would never accept it being left in place.
Oh, and since the reason for the test being in the wrong place was that
you can't (in C90) declare variables after the first statement, it could
be changed to...
if (!tun) return POLLERR;
else {
    struct sock *sk = tun->sk;
    /* more code */
}
So the argument that the variables have to be initialised on declaration
(which I understood to be part of the reason for the code being as it
was) on its own is *not* a reason for introducing this bug!
> It's
> one of the simplest things we check; it's right up there with making
> sure that indexes are within limits before using them to index an
> array. Does this make us unusual?
Well, a lot of code assumes the arguments are valid. However, ensuring
that the test for null (if it exists) comes before the use of the
variable is pretty basic, and I would expect every organisation to
require that it be done in the right order.
Of course, it would have been nice if gcc had warned about this when it
did the optimisation.
--
Flash Gordon
From a C standards perspective, that optimization is undoubtedly
correct; the dereference of NULL invokes UB, so the implementation can
do whatever it wants after that point.
However, the problem is that in Linux/x86 _kernel_ code, dereferencing
NULL is defined (by the implementation) to _not_ trap and thus what the
compiler "knows" is wrong; the optimization is not valid in that
specific context, though it happens to be valid in Linux _user_ code.
S
--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Isaac Jaffe
Then Linux in effect needs a language extension which allows that.
Which is what the 'gcc -fno-delete-null-pointer-checks' provides.
Or, for that matter, picking optimization flags which they know will do
that. Or they could use assembly, or some trick or other which they
know the supported compiler won't optimize this way, or...
--
Hallvard
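One way to watch that flag in action is a sketch like this (file and
function names are hypothetical): compare the assembly produced by
"gcc -O2 -S check.c" with and without -fno-delete-null-pointer-checks.

/* check.c */
#include <stddef.h>

int deref_then_check(int *p)
{
    int v = *p;        /* dereference first ...                */
    if (p == NULL)     /* ... so at -O2 this test is normally  */
        return -1;     /* deleted as dead; the flag keeps it   */
    return v;
}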
The optimization is valid, trap or no trap. The dereference
produces undefined behavior (which may or may not involve a trap),
and after that all bets are off. Once U.B. strikes, all optimizations
and transformations and perversions of the code are valid; the code
has no defined behavior that the compiler is obliged to preserve.
You might want to read again. The 'fix' patch demonstrates the error
most succinctly.
> "Although the code correctly checks to make sure the tun variable
> doesn't point to NULL, the compiler removes the lines responsible for that
> inspection during optimization routines." Although this sentence is
> slightly incomprehensible.
Because it's obfuscatorily, or simply ignorantly, presented.
The only time the check can detect potential null pointer dereferences
is after one has already happened. In which case, it's at liberty to
check Scott Nudds' rear end for null pointers instead of the function's
parameter.
"indulges", eh? Nice. Better than "invokes"!
Coverity would flash a huge warning light, I'm sure. I don't have
any association with whoever makes coverity, and even today I marked
two supposed "static buffer overruns" with "Coverity's braindead",
but given the number of things it does find, I can't recommend it
highly enough.
Currently mulling over something flagged in an AES-related module,
with a 99% certainty that it's a nasty bug,
Agreed.
On the other hand, a conforming C compiler *could* assume that the
code it generates will run under Linux, and assume whatever actual
behavior Linux guarantees for a null pointer dereference. It would
then not remove the null pointer test, even though it follows an
attempt to dereference the pointer. (Removing the test is an
optimization; compilers aren't required to perform optimizations.)
But I don't think there's much doubt that the error is in the code,
not in the compiler.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
But, isn't an implementation allowed to define what happens in situations
that the Standard leaves undefined? What if the platform it's running on
explicitly states that dereferencing NULL is allowed?
--
Kenneth Brody
Some programs "flaunt" their undefined behavior ...
I'm curious what dereferencing a null pointer is defined to do.
Surely it's not something that a programmer would do intentionally.
A conforming implementation (say, gcc in some special mode) *could*
assume that dereferencing a null pointer has well-defined behavior; it
then wouldn't be allowed to remove the null pointer test that follows
the dereference. But this would only be for the benefit of code
that's buggy anyway.
It would have been nice if gcc had warned about removing the check;
that doesn't require anything Linux-kernel-specific.
Ah. I missed the little tidbit about this being kernel-level code. Thanks.
--
Kenneth Brody
Yes. The language Standard can go only so far, describing a
sort of "least common denominator" for all C implementations. The
implementations themselves (perhaps in response to other Standards)
may well add guarantees that C itself does not provide. For example,
the API's for dynamic libraries in POSIX rely on being able to convert
data pointers to and from function pointers, something C itself does
not promise will work.
> What if the platform
> it's running on explicitly states that dereferencing NULL is allowed?
Then a compiler for the platform would have to take it into
account. The problem in this Linux foofaraw might be characterized as
erroneous code (that'd be my choice), or as using the wrong compiler
for the environment (taking the view that different compiler flags
choose different compilers from one big "gcc package").
From the C point of view, the compiler's action in deleting the
test is entirely justifiable. From the C-as-in-Linux-kernel-space
point of view, it's not. The kernel (it seems) is written in a
C-with-extras language but compiled by a C-with-fewer-extras compiler;
whether that's a problem with the kernel code or with the compiler
is a question for the philosophers.
But it ain't a problem with C or the C Standard, as claimed.
I'm not familiar enough with the Linux kernel to know offhand why it's
defined... It might be that traps are disabled, or the Interrupt
Descriptor Table* is mapped there, or something else entirely. The
point is that you won't get a segfault like you (and the compiler) would
expect in user code (also defined by the implementation).
(* The IDT must live at physical addresses 0-1023 on x86, which is a
large part of why buggy DOS programs crashed the computer so frequently:
writing through a null pointer would corrupt the IDT, so the next time
that interrupt fired, the CPU would wander off and start executing
something that probably wasn't code, trap to the corrupted IDT again, etc.)
> A conforming implementation (say, gcc in some special mode) *could*
> assume that dereferencing a null pointer has well-defined behavior; it
> then wouldn't be allowed to remove the null pointer test that follows
> the dereference.
GCC _does_ have an option to disable that optimization; however,
apparently nobody had thought that it needed to be used. If they had
known it was needed, they would have fixed the code and made it
unnecessary again.
> But this would only be for the benefit of code that's buggy anyway.
... unless, of course, dereferencing the null pointer was done
deliberately, e.g. to access the IDT on a DOS system.
> It would have been nice if gcc had warned about removing the check;
> that doesn't require anything Linux-kernel-specific.
The only time such a warning would be justified is when the optimization
might be invalid, i.e. only in the Linux kernel and a few other cases
even rarer.
> Apparently one of the side effects of undefined behaviour is
> random finger-pointing:
> http://www.theregister.co.uk/2009/07/17/linux_kernel_exploit/
> I don't think anyone can walk away with their head held high
> from that little mess.
Oh, I do. The C Standard Committee can walk away without blame, and so
can the gcc crowd. The fault lies entirely with the kernel jocks.
Richard
...or they could just write code that is correct.
Richard
> Possibly I haven't tried hard enough, but I haven't located the
> context of the above code. However, since 'tun' can be null, and is
> being dereferenced without bothering to check for that possibility,
> such code would never pass a code review in my group unless the
> mechanism whereby 'tun' could be null were extremely inobvious. It's
> one of the simplest things we check; it's right up there with making
> sure that indexes are within limits before using them to index an
> array. Does this make us unusual?
Hell, no.
Richard
Sure, it's permitted; writing non-portable code with behavior that is
undefined by the C standard, when the implementation itself provides a
definition, is a completely ordinary activity in the C programming
world. However, it should be done with great care and a clear
understanding of what the implementation's defined behavior is.
> ... What if the platform it's running on
> explicitly states that dereferencing NULL is allowed?
Note: NULL is a C standard library macro; you're looking for an
adjective - the correct spelling of the adjective is 'null'.
That's precisely what happened in this case. The defined behavior for
gcc (depending upon which command line options are chosen) includes
treating the pointer's value as if it were guaranteed to be non-null,
in all subsequent code. It would appear that the authors were unaware
of that definition of the behavior, and that's what went wrong.
I disagree. The check was written explicitly in the source code.
Optimizing it away is valid, but the fact that the compiler is
eliminating user-written code indicates (at least in this case) that
the code shouldn't have been written in the first place, that the code
itself is buggy.
For example, if I write:
int foo(int *p)
{
    int x = *p;
    if (p == NULL) {
        return -1;
    }
    /* ... */
}
the compiler is free to remove the if statement, but I'd like it to
warn me that I've done something stupid.
I think that's stretching things a bit. "Behavior" refers (mostly) to
run-time behavior; there is no defined behavior in this case. Based
on the lack of any definition of the behavior if the pointer is null,
the compiler is free (but not required) to assume that the pointer is
non-null; any behavior in that case is permitted for the null case as
well.
It's the run-time behavior of the entire program that becomes
undefined, as far as the C standard is concerned, not just the
statement whose execution rendered the behavior undefined.
> ... there is no defined behavior in this case.
The C standard provides no definition for the behavior, but gcc does,
as it is permitted to do. The man page for gcc (v4.2.2) defines what
-fdelete-null-pointer-checks does, and states that this option is
turned on by default when using any of the options -O2, -O3, or -Os.
I was basically curious to see whether the author had any good reason
to trust that a valid value was stored in 'tun'. Obviously, since
the vulnerability was exploited, whatever reason the author had was
incorrect; but was it at least plausible? Since writing that message,
I've seen the context, and it involves executing a function that might
or might not have seemed trustworthy, depending upon what that
function's definition was. Apparently it was not trustworthy - but was
it supposed to be?
Of course, having the null-checking code after the pointer is
dereferenced is pointless, whether or not there seemed to be
justification for assuming that it was dereferenceable.
> Dik T. Winter wrote:
>> In article <87zlb0t...@kilospaz.fatphil.org> Phil Carmody
>> <thefatphi...@yahoo.co.uk> writes:
>> > Apparently one of the side effects of undefined behaviour is random
>> > finger-pointing:
>> > http://www.theregister.co.uk/2009/07/17/linux_kernel_exploit/
>> > I don't think anyone can walk away with their head held high from
>> > that little mess.
>> >
>> > Moral: UB is UB; if you don't want UB, don't invoke UB.
>>
>> If I read it well I think it is a compiler that modifies well-behaved
>> code to UB. "Although the code correctly checks to make sure the tun
>> variable doesn't point to NULL, the compiler removes the lines
>> responsible for that inspection during optimization routines."
>> Although this sentence is slightly incomprehensible.
Strictly speaking it is UB.
But, given the fact that the kernel-source must be considered
freestanding code, IMHO different rules _could_ apply.
( I don't know what GCC's -ffreestanding flag actually does; it possibly
just avoids library dependencies)
IIRC in freestanding environments the compiler *may* define UB to
anything it wants. (correct me if I am wrong)
I can imagine the compiler user _wants_ the "intended behaviour" (IB) to
be "If I dereference null pointers, I know what I'm doing. Trust me ..."
How else could a kernel ever address memory at location=0 ?
Seen in this light, the compiler should *not* be allowed to optimize out
the null-test, since the initialiser does not *test* anything, it just
proves that it did not trap.
> On the basis of the code sample presented, this isn't the
> situation. The code is not well-behaved, and invokes undefined
> behavior, full stop. Yes (by report), the compiler removes the test for
> NULL, but this does not introduce undefined behavior, it merely changes
> (probably) the nature of the undefined behavior already in progress.
> Since a program has no right to rely on *anything* once it indulges in
> U.B., it can in particular not rely on a NULL test occurring.
>
> After the train goes off the rails, the settings of the
> switches not yet encountered make no difference to its route.
But, I agree: the code is deceptive. The dereference in the initialiser
obfuscates things, but saves a line of code.
AvK
In the case at hand, the compiler is reported to have been
given a command-line switch to enable the optimization. So you
seem to be asking for a warning whenever the compiler does just
what you instructed it to do, which seems a bit peculiar ...
There was, I guess, some confusion since the switch that
did the code removal was not explicit on the command line, but
implied by the presence of an "omnibus" or "macro" switch that
essentially means "enable a lot of aggressive optimizations."
So although the person doing the compilations asked for useless
code to be removed, he might not have realized it. Still, he
*did* ask for it, and it would seem odd for the compiler to warn
that he got what he asked for.
However, there are other dereferences that will _not_ cause a crash in
a typical environment, and that still cause undefined behaviour if the
pointer p is a null pointer:
1. char* q = p->x; // When x has type "array of char"
2. int* q = &p->x; // When x has type "int"
3. int b = (p >= p);
4. T* q = p + 0; // When p has type "T*"
5. ptrdiff_t d = p - p;
Each of these causes undefined behaviour, and could cause the compiler
to assume that p != NULL.
For example, a compiler can always replace (p + 0 != NULL) with TRUE.
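For instance, a sketch along these lines (mine, assuming the reading of
the standard given above) never loads memory, yet already licenses the
assumption that p is non-null:

#include <stddef.h>

int never_loads(int *p)
{
    int *q = p + 0;        /* UB if p is a null pointer      */
    ptrdiff_t d = p - p;   /* likewise UB for a null pointer */
    (void)q;
    (void)d;

    /* having seen the arithmetic above, a compiler may fold this
       whole test to 1 */
    return p != NULL;
}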
> In the case at hand, the compiler is reported to have been
> given a command-line switch to enable the optimization. So you
> seem to be asking for a warning whenever the compiler does just
> what you instructed it to do, which seems a bit peculiar ...
I assume that the compiler always does just what I instruct it to
do... Doesn't mean it shouldn't give warnings!
Now seriously: I look at warnings as a tool to give me hints where the
compiler believes that what I wrote might not be what I wanted to
write. For example, if I write "if (i = 0)" then as an intelligent
human programmer you would probably say "this is most likely not what
you wanted to write". Similar, if I had seen that code I would have
thought "this is most likely not what the programmer intended", unless
I saw a comment that explains why it is done. So I would say if the
compiler would give a warning here, that would be great.
But giving that warning _intelligently_, that is difficult. I might
have a set of macros that do null pointer checks out of paranoia -
macros that are never supposed to be used with a null pointer, but
they still check, just in case. It would be normal to have a pointer
dereference, followed by use of such a macro. Or take asserts. assert
is supposed to be used to catch programming errors. So almost by
definition, I would use assert (cond) only if I _know_ that cond
_must_ be true (because just because I _know_ that it _must_ be true
doesn't mean it is actually true).
Well, it's difficult. I would want the compiler to optimise things
away - I often write code so that it is most readable and rely on the
compiler to optimise things away. And I would want it to give warnings
where this optimisation is caused by a bug, and not in the other
cases. Very difficult to get right.
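A made-up example of the kind of paranoid macro described above, where
a warning about the deleted check would be pure noise:

#include <assert.h>

struct widget { int id; };

/* never supposed to see a null pointer, but checks anyway */
#define WIDGET_ID(w) (assert((w) != NULL), (w)->id)

int twice_the_id(struct widget *w)
{
    int a = w->id;        /* dereference first ...                 */
    int b = WIDGET_ID(w); /* ... so the assert's test is dead to
                             the optimizer (and gone entirely once
                             NDEBUG is defined)                    */
    return a + b;
}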
On x86, only in real mode is there an IDT in a fixed location. Which
certainly does not apply to the Linux kernel in the usual
circumstances. In any protected mode, there's a CPU register which
points to the IDT (with complications, this being x86, of course, and
with a completely different table format than in real mode).
It's not quite clear what you're saying, but all of the
interpretations I can think of make that statement wrong.
One possibility is that you're talking about the circumstances where
behavior is UB. Those are defined by the C standard (though in some
cases, they are defined only by the absence of a definition for the
behavior). A freestanding implementation of C is not allowed to change
those definitions. However, those definitions impose somewhat fewer
requirements on freestanding implementations than on hosted ones. The
most important differences are that the startup need not be called main
(), and has an implementation-defined interface, and most of the C
standard library becomes optional. None of those differences are
relevant in this context.
Another possibility is that you're referring to the behavior that
actually occurs when the C standard says that the behavior is
undefined. All implementations, freestanding or hosted, have complete
freedom to define such behavior any way they want - that's precisely
what "undefined behavior" means.
> I can imagine the compiler user _wants_ the "intended behaviour" (IB) to
> be "If I dereference null pointers, I know what I'm doing. Trust me ..."
>
> How else could a kernel ever address memory at location=0 ?
Please note: a null pointer need not refer to location==0, and a
pointer referring to location==0 need not be a null pointer; this is
entirely up to the implementation. Try not to confuse the two
concepts. An integer constant expression with a value of 0, converted
into a pointer, is guaranteed to convert into a null pointer. However,
it's not meaningful, within the C standard, to even talk about which
memory location a null pointer points at; if an implementation chooses
to allow derefencing such pointers, it need not define the resulting
behavior as "refer to the object of the appropriate type residing at
location=0".
How a kernel addresses the memory at location==0 is entirely
implementation-specific; there's no portable way to write such code
(though dereferencing a null pointer probably has that effect on many
platforms). An implementation is not required to provide a way to do
this; if that fact renders it impossible to compile the kernel for a
particular implementation, then you shouldn't use that implementation
to compile the kernel.
> Seen in this light, the compiler should *not* be allowed to optimize out
> the null-test, since the initialiser does not *test* anything, it just
> proves that it did not trap.
It's trivial to fix this code to do what is actually desired.
However, if the kernel developer really feels a need to write
defective code, and to have the compiler produce the desired results
despite the fact that the code is defective, then a different compiler
should have been used (such as gcc -fno-delete-null-pointer-checks).
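A purely illustrative sketch of that distinction; the standard
guarantees that the first function returns a null pointer, and says
nothing at all about the second:

#include <stdint.h>

char *a_null_pointer(void)
{
    return (char *)0;   /* the integer constant 0 converts to a null
                           pointer, whatever its representation */
}

volatile char *machine_address_zero(void)
{
    uintptr_t zero = 0;            /* a runtime integer, not an
                                      integer constant expression */
    return (volatile char *)zero;  /* implementation-specific: on many
                                      platforms this designates machine
                                      address 0, but C promises nothing */
}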
> Moi wrote:
>>
>> IIRC in freestanding environments the compiler *may* define UB to
>> anything it wants. (correct me if I am wrong)
>
> It's not quite clear what you're saying, but all of the interpretations
> I can think of make that statement wrong.
I meant: "the compiler is allowed to _define_ what it does in case of UB"
(e.g. in case of a NULL pointer: handle it as if it were not null).
>
> Another possibility is that you're referring to the behavior that
> actually occurs when the C standard says that the behavior is undefined.
> All implementations, freestanding or hosted, have complete freedom to
> define such behavior any way they want - that's precisely what
> "undefined behavior" means.
I guess this is the case I was referring to.
>
>> I can imagine the compiler user _wants_ the "intended behaviour" (IB)
>> to be "If I dereference null pointers, I know what I'm doing. Trust me
>> ..."
>>
>> How else could a kernel ever address memory at location=0 ?
>
> Please note: a null pointer need not refer to location==0, and a pointer
> referring to location==0 need not be a null pointer; this is entirely up
> to the implementation. Try not to confuse the two concepts. An integer
I know. This is c.l.c ;-)
I know as well that on the x86 a NULL pointer is represented as "all bits
zero". But this is not important, IMO; even if the NULL pointer were
represented as "all bits 1", the dereference could be Ok (apart from
alignment, of course), the code _could_ be valid, *given that NULL
pointer dereference is allowed* (and does not trap)
>
> How a kernel addresses the memory at location==0 is entirely
> implementation-specific; there's no portable way to write such code
> (though dereferencing a null pointer probably has that effect on many
> platforms). An implementation is not required to provide a way to do
> this; if that fact renders it impossible to compile the kernel for a
> particular implementation, then you shouldn't use that implementation to
> compile the kernel.
But -given it allowed the dereference- it should not optimize out the
null pointer test. Nothing was tested by the initializer.
I know: this is all about implementing/defining UB.
Slippery slope...
>
>> Seen in this light, the compiler should *not* be allowed to optimize
>> out the null-test, since the initialiser does not *test* anything, it
>> just proves that it did not trap.
>
> It's trivial to fix this code to do what is actually desired. However,
> if the kernel developer really feels a need to write defective code, and
> to have the compiler produce the desired results despite the fact that
> the code is defective, then a different compiler should have been used
> (such as gcc -fno-delete-null-pointer-checks).
That's what I said in the last paragraph. (which you snipped)
AvK
... but Linux, which is also part of the implementation, _does_ define
what happens when you dereference a null pointer. In user code, you get
SIGSEGV; in kernel code, you read from address 0. So, the behavior is
not undefined in this particular context. Given that the Linux kernel
code obviously does not run in any other context than the Linux
implementation, it's a stretch to call it UB.
Sure, an implementation is free to define what the consequences of a
null pointer dereference are, in a way that causes no problem for such
code. An implementation is also free to define what those consequences
are, in a way that causes an immediate segmentation fault, which seems
to be the only other option you recognize.
The point, however, is that the standard is much more lenient than you
are; it allows an infinite number of other more complicated options than
those two. Even if an implementation chooses not to generate an
immediate segmentation fault, it's still also free to have the
consequences of the dereference be that, for instance, the next call to
printf() will display the string "<author> doesn't know how to avoid
dereferencing null pointers", regardless of which arguments are passed
to printf().
>> How a kernel addresses the memory at location==0 is entirely
>> implementation-specific; there's no portable way to write such code
>> (though dereferencing a null pointer probably has that effect on many
>> platforms). An implementation is not required to provide a way to do
>> this; if that fact renders it impossible to compile the kernel for a
>> particular implementation, then you shouldn't use that implementation to
>> compile the kernel.
>
>
> But -given it allowed the dereference- it should not optimize out the
> null pointer test. Nothing was tested by the initializer.
If you want to impose such a requirement, just select the appropriate
command line option for gcc. However, there's no external requirement
(such as the C standard) that prohibits such optimizations.
Ah, I thought I'd read somewhere that the protected mode IDT was kept at
the same location, but my memory was faulty.
Thinking through it more, even if the IDT _were_ at physical address
zero, dereferencing a null pointer (virtual address zero) still probably
wouldn't read the IDT since it wouldn't be mapped there. The Linux
kernel (in the normal 2+2GB mode) operates with the page mappings of the
calling process, which shouldn't have _any_ page mapped at that virtual
address. The only logical conclusion I can come up with for why you
don't get a trap is that the kernel, upon entry, disables traps -- or at
least that particular one -- as a protection against bad pointers from
user code and the CPU fudges any reads from unmapped virtual addresses.
(Note: It could also be mapped to a page marked unreadable by user code,
but either way the result is the same.)
There is no stretching involved. It's just a technical piece of jargon
whose meaning cannot be deduced simply by examining the ordinary meaning
of the individual English words that make it up. In the context of the C
standard, the term "undefined behavior" refers to behavior "for which
this International Standard imposes no requirements" (3.4.3p1), It
doesn't matter how many other sources do impose requirements, it doesn't
matter how authoritative those other sources are; it's still "undefined
behavior" as far as the C standard is concerned.
>
>"Boon" <root@localhost> ha scritto nel messaggio
>news:4a647789$0$294$7a62...@news.club-internet.fr...
>> Dik T. Winter wrote:
>>
>>> If I read it well I think it is a compiler that modifies well-behaved code
>>> to UB. "Although the code correctly checks to make sure the tun variable
>>> doesn't point to NULL, the compiler removes the lines responsible for that
>>> inspection during optimization routines." Although this sentence is
>>> slightly incomprehensible.
>>
>> No. The problem lies within the original code.
>>
>> http://isc.sans.org/diary.html?storyid=6820
>>
>> struct sock *sk = tun->sk; // initialize sk with tun->sk
>
>if tun==0, the line above dereferences tun and reads the memory
>of address (0+offset(sk)); as far as I remember this is full UB,
>and here I think it should segfault
You have no idea what address a NULL pointer will attempt to
reference, if it even attempts to reference memory at all. (There is
no reason the internal representation of NULL cannot be 0xdeadbeef.)
In the overall scheme of things, the number of systems for which the
term segfault has meaning is relatively small. Furthermore, even on
those systems, there is no requirement for undefined behavior to
manifest itself this way.
You really need to stop trying to impose the limits of your experience
on the real world. Don't feel bad. None of us has enough experience
to do that reliably.
--
Remove del for email
The Linux kernel is only *part* of the implementation, gcc is also part
of the implementation. So if the behaviour can be considered to be
defined, it can be considered to be defined the way gcc implements it,
i.e. that subsequent checks for null will be ignored.
In any case, calling it UB as far as C is concerned is not stretching it
at all, it is precisely what it is as far as C is concerned, and gcc
makes use of the license this provides.
I would imagine that the gcc optimisation is designed to be triggered in
other circumstances where the test actually makes sense. I.e. you have
in one TU a function which is small enough to be inlined and does the
check, and before some of the calls the pointer passed has already been
checked and used.
--
Flash Gordon
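A sketch (types and names invented) of the kind of case where that
deletion is presumably a pure win:

struct node { int value; int flag; };

static int get_flag(struct node *n)
{
    if (n == 0)          /* defensive check inside the helper */
        return 0;
    return n->flag;
}

int sum(struct node *n)
{
    int v = n->value;        /* n is already dereferenced here, so */
    return v + get_flag(n);  /* after inlining, the helper's null
                                check is provably dead and can go  */
}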
if i have to rewrite the code the result would be this (R and G are my
macros for return and goto):

static unsigned int tun_chr_poll(struct file* file, poll_table* wait)
{struct tun_file *tfile;
 struct tun_struct *tun;
 struct sock *sk;
 unsigned int mask=0;

 if(file==0||wait==0) R POLLERR;
 tfile = file->private_data;
 if((tun = __tun_get(tfile))==0) R POLLERR;
 sk=tun->sk;
 DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name);
 if( poll_wait(file, &tun->socket.wait, wait)==ZERROR )
   {mask=ZERROR; G label;}
 // is there no error condition for this function?
 // the same for all the other functions: no error conditions?
 if(skb_queue_empty(&tun->readq)==0) mask|=(POLLIN|POLLRDNORM);
 if(sock_writeable(sk)==0)
   {if(test_and_set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags)==0)
      if(sock_writeable(sk)) mask|=(POLLOUT|POLLWRNORM);
   }
 else mask|=(POLLOUT|POLLWRNORM);
 if (tun->dev->reg_state != NETREG_REGISTERED)
   mask = POLLERR;
label: ;
 tun_put(tun);
 return mask;
}
> static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
> {
>         struct tun_file *tfile = file->private_data;
there is another UB here: if file==0 this dereferences 0 too
>         struct tun_struct *tun = __tun_get(tfile);
>         struct sock *sk = tun->sk;
>         unsigned int mask = 0;
>
>         if (!tun)
>                 return POLLERR;
>
>         DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name);
>
>         poll_wait(file, &tun->socket.wait, wait);
>
>         if (!skb_queue_empty(&tun->readq))
>                 mask |= POLLIN | POLLRDNORM;
>
>         if (sock_writeable(sk) ||
>             (!test_and_set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
>              sock_writeable(sk)))
>                 mask |= POLLOUT | POLLWRNORM;
>
>         if (tun->dev->reg_state != NETREG_REGISTERED)
>                 mask = POLLERR;
>
>         tun_put(tun);
>         return mask;
> }
While you're right, I'm going to speculate that if file is NULL in this
function, there are Bigger Problems to worry about (i.e. file not being
NULL is a precondition.) But now we're way OT. :)
-Beej
> if i have to rewrite the code the result would be this
[...]
> if(file==0||wait==0) R POLLERR;
why would we care?
But that is not shown in the referred article on The Register, and that
is what I based my remark on.
--
dik t. winter, cwi, science park 123, 1098 xg amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Where in that article is a 'fix patch'?
Huh? Since the Linux kernel defines the result of following a NULL
pointer, correct code consists of following that definition. (In the
contexts where it is relevant, anyway. I hope they don't mean all
contexts.)
Since standard C renders that undefined, correct code for that is not
written as pure standard C. Since correct code is not written in
standard C, correct build process does not compile it as standard C.
--
Hallvard
>> The C standard provides no definition for the behavior, but gcc does,
>> as it is permitted to do. The man page for gcc (v4.2.2) defines what
>> -fdelete-null-pointer-checks does, and states that this option is
>> turned on by default when using any of the options -O2, -O3, or -Os.
>
> ... but Linux, which is also part of the implementation,
The kernel is only part of the implementation if you're using gcc to
compile user code (i.e. "hosted implementation"). If you're compiling the
kernel, you have to assume a freestanding implementation (i.e. it's not
gcc's fault that malloc() doesn't work for kernel code).
> _does_ define what happens when you dereference a null pointer.
It defines (or at least decides) what happens when you access the first
page of memory. That isn't necessarily the same thing as defining the
semantics of null pointer dereference (which is the compiler's
prerogative, not the kernel's).
> In user code, you get SIGSEGV; in kernel code, you read from address 0.
In user code, you get SIGSEGV if you read from address 1, too; but that
isn't a null pointer.
> So, the behavior is
> not undefined in this particular context. Given that the Linux kernel
> code obviously does not run in any other context than the Linux
> implementation, it's a stretch to call it UB.
Why "obviously"? Surely gcc isn't expected to recognise kernel code and
treat it differently?
FWIW, I think it's a stretch to say that the kernel is attempting to
"define" undefined or implementation-defined behaviour in terms of the C
standard(s). Bear in mind that not all kernel code is written in C.
The Linux kernel can't define that; it's determined by the
implementation of C being used to compile the kernel. When the
compiler is gcc, if it is used with compiler options that optimize
away null pointer checks after a given pointer has been dereferenced,
then the malfunctions caused by that optimization are part of the
behavior, defined by that implementation, of the results of
dereferencing a null pointer.
Did you read my next paragraph?
> it's determined by the
> implementation of C being used to compile the kernel.
No, what C implementations the Linux kernel can use is determined by
what the Linux kernel and the kernel code needs from it.
> When the
> compiler is gcc, if it is used with compiler options that optimize
> away null pointer checks after a given pointer has been dereferenced,
> then the malfunctions caused by that optimization are part of the
> behavior, defined by that implementation, of the results of
> dereferencing a null pointer.
Which possibly means Linux has bought itself a world of trouble by
depending on a non-standard C feature which gcc does not provide. Or
maybe it's all a storm in a teacup, since I don't know how much of the
kernel code we're talking about here, and since gcc now does provide a
flag which sounds like it defines the compiler extension Linux needs.
--
Hallvard
Yes. It didn't contain anything that would justify either re-wording
that statement or reconsidering my decision to clip the paragraph.
> > it's determined by the
> > implementation of C being used to compile the kernel.
>
> No, what C implementations the Linux kernel can use is determined by
> what the Linux kernel and the kernel code needs from it.
There's no contradiction there; each C implementation determines how
it implements pointers, and in particular, how it handles dereferences
of null pointers. The kernel developers decide which C implementations
implement pointers in a fashion that is compatible with the design of
the kernel.
> > When the
> > compiler is gcc, if it is used with compiler options that optimize
> > away null pointer checks after a given pointer has been dereferenced,
> > then the malfunctions caused by that optimization are part of the
> > behavior, defined by that implementation, of the results of
> > dereferencing a null pointer.
>
> Which possibly means Linux has bought itself a world of trouble by
> depending on a non-standard C feature which gcc does not provide.
But gcc does provide it - just choose -fno-delete-null-pointer-checks.
More importantly, there's no good reason for the code to be written in
such a way that it relies on this feature; just correct the code and
the kernel will no longer be relying on it. The code doesn't do
anything useful with the result of dereferencing the null pointer.
There are no serious costs associated with changing the code so that it
checks the pointer for null before dereferencing it, rather than
after. It's just a completely pointless coding error.
Ditto. I don't see how _anyone_ could look at that code and not think
it should be improved.
> If it was proved that tun could never be null I might accept removing
> the test completely (as the compiler is doing), but I would never accept
> it being left in place.
I do not accept that one can adequately prove tun could never be null.
There are too many times I've seen arguments that "can't be null" turn
out to be null in practice due to other bugs.
For instance, a past employer had to add null pointer checks to every
function in a several-million-LOC project because we kept finding bug
after bug where null pointers showed up in "impossible" places and
caused the product to crash due to unprotected dereferences; years
later, we finally found a tiny defect in an interrupt routine (pedants:
pretend I said signal handler) that, once every few trillion executions,
would zero out part of the stack at exactly the right place and time so
that a valid pointer passed to a function would be transformed into a
null pointer. Until we found it, though, the "unnecessary" null pointer
checks saved our customers untold headaches.
> Oh, and since the reason for the test being in the wrong place was that
> you can't (in C90) declare variables after the first statement, it could
> be changed to...
>
> if (!tun) return POLLERR;
> else {
>         struct sock *sk = tun->sk;
>         /* more code */
> }
That's just ugly. Given all of the GCCisms that the Linux kernel relies
on, why not make use of a few C99 features that it supports?
> So the argument that the variables have to be initialised on declaration
> (which I understood to be part of the reason for the code being as it
> was) on its own is *not* a reason for introducing this bug!
Bah. If you want to adhere to that rule, you can still do this:
struct sock *sk = NULL;
if (!tun) return POLLERR;
sk = tun->sk;
In the particular case we're talking about, as I understand it, the
Linux kernel code wasn't *deliberately* using any such non-standard
feature. Dereferencing a null pointer might be permitted in that
context, but the code in question did so accidentally and did not
make any intentional use of the result. (If I've gotten this wrong,
I have no doubt someone will correct me.)
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Yes, this particular case was a bug. But this is about the general case
- quoting Stephen Sprunk upthread:
"However, the problem is that in Linux/x86 _kernel_ code, dereferencing
NULL is defined (by the implementation) to _not_ trap"
--
Hallvard
Then we are not having the same discussion.
>>> it's determined by the
>>> implementation of C being used to compile the kernel.
>>
>> No, what C implementations the Linux kernel can use is determined by
>> what the Linux kernel and the kernel code needs from it.
>
> There's no contradiction there; each C implementation determines how
> it implements pointers, and in particular, how it handles dereferences
> of null pointers. The kernel developers decide which C implementations
> implement pointers in a fashion that is compatible with the design of
> the kernel.
Yes. Then what are you disagreeing with me about? Why are you denying
that the kernel can define what happens when you follow a null pointer?
>>> When the
>>> compiler is gcc, if it is used with compiler options that optimize
>>> away null pointer checks after a given pointer has been dereferenced,
>>> then the malfunctions caused by that optimization are part of the
>>> behavior, defined by that implementation, of the results of
>>> dereferencing a null pointer.
>>
>> Which possibly means Linux has bought itself a world of trouble by
>> depending on a non-standard C feature which gcc does not provide.
>
> But gcc does provide it - just choose -fno-delete-null-pointer-checks.
Except apparently the build process didn't use it. Which also might
mean it doesn't do exactly what Linux expects. For that matter, the
option is a decade younger than Linux. Anyway, just speculating since
I'm not about to dig into the Linux documents and there is rather more
FUD than facts in this thread.
> More importantly, there's no good reason for the code to be written in
> such a way that it relies on this feature; just correct the code and
> the kernel will no longer be relying on it.
There's nothing to correct if it's not a bug. It's not a bug to depend
on a compiler extension, if that's what they were doing (and documenting
that they were doing it).
> The code doesn't do
> anything useful with the result of dereferencing the null pointer.
> There's no serious costs associated with changing the code so that it
> checks the pointer for null before dereferencing it, rather than
> after. it's just a completely pointless coding errror.
The particular code in this case, no. That was a bug. But that's not
what I was replying to, nor what you said you disagreed with me about.
--
Hallvard
I think he means "tun could never be NULL with subsequent defined
behavior" in this case. If it is NULL, there's no requirement to do
anything particularly well-behaved after the dereference, and this
especially means not bothering to redundantly check if it's NULL at that
point.
tun       effect
--------  --------------------------------------------------------------
not NULL  if (tun == NULL) is always false
NULL      if (tun == NULL) is UB (because of the previous dereference)
So for all cases of tun, the compiler is free and happy to remove the if
statement, and it apparently did.
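To make the licensed transformation concrete, here's a self-contained
sketch; the struct definitions and the POLLERR value are placeholders of
mine, not the real kernel types:

struct sock { int dummy; };                /* placeholder types, not the */
struct tun_struct { struct sock *sk; };    /* real kernel definitions    */
#define POLLERR 0x8                        /* placeholder value          */

/* As written in the buggy code: */
static int poll_as_written(struct tun_struct *tun)
{
        struct sock *sk = tun->sk;  /* UB when tun == NULL, so the   */
        if (!tun)                   /* compiler may assume tun is    */
                return POLLERR;     /* non-null and delete this test */
        return sk != 0;             /* stand-in for the rest         */
}

/* What the optimizer is entitled to emit instead: */
static int poll_as_optimized(struct tun_struct *tun)
{
        struct sock *sk = tun->sk;
        return sk != 0;
}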
-Beej
The implementation may specify that a pointer with all zero bits
in its internal representation is dereferencable, the C standard
permits that (though one has to work for it). That's different
from a NULL pointer though.
I've worked on a system where a NULL pointer, e.g. (void*)0,
was represented by 0x8000000 (and any address with the top bit
set would cause a fault).
Phil
--
If GML was an infant, SGML is the bright youngster far exceeds
expectations and made its parents too proud, but XML is the
drug-addicted gang member who had committed his first murder
before he had sex, which was rape. -- Erik Naggum (1965-2009)
WOH! Quite the opposite! I dereference NULL pointers frequently.
They're part of the BUG() macro that linux defines for my platform.
It's basically the please-die-horribly-right-now command (like an
assert(0), say). On my platform the all-0-bits address dereference
is the easiest way to invoke that, and the implementation turns a
NULL pointer dereference into an all-0-bits address dereference.
The whole thing relies on cooperation between the chip and the
compiler, but it never claimed, or even tried, to be portable.
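A minimal sketch of the idea (my own macro name; the real Linux BUG()
also records file and line, and on most architectures uses a dedicated
trap instruction instead):

/* Force a fatal trap by storing through address zero.  Undefined
 * behaviour as far as ISO C is concerned; it works only because
 * this particular chip and compiler guarantee a fault here. */
#define MY_BUG() (*(volatile int *)0 = 0)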
> A conforming implementation (say, gcc in some special mode) *could*
> assume that dereferencing a null pointer has well-defined behavior; it
> then wouldn't be allowed to remove the null pointer test that follows
> the dereference. But this would only be for the benefit of code
> that's buggy anyway.
>
> It would have been nice if gcc had warned about removing the check;
> that doesn't require anything Linux-kernel-specific.
I'd like to see what one of the deeper static source code analysis
tools would have said about it. The fact that gcc was able to optimise
the check away implies that it knew it could ignore something
the programmer wrote, and you're right, that's possibly worthy of
flagging.
Warning: you don't know what you're doing at line 123
False assumptions may lead to false conclusions. You seem to be
unaware of the LIDT instruction which has only existed in the
instruction set for 26 years. Yes, even I'm surprised it's 26,
but it is.
Quite possibly - but until you are more specific about what you found
objectionable in my statement, that's the best response I can give you.
>>>> it's determined by the
>>>> implementation of C being used to compile the kernel.
>>> No, what C implementations the Linux kernel can use is determined by
>>> what the Linux kernel and the kernel code needs from it.
>> There's no contradiction there; each C implementation determines how
>> it implements pointers, and in particular, how it handles dereferences
>> of null pointers. The kernel developers decide which C implementations
>> implement pointers in a fashion that is compatible with the design of
>> the kernel.
>
> Yes. Then what are you disagreeing with me about? Why are you denying
> that the kernel can define what happens when you follow a null pointer?
Because the kernel designers can define what they want to have happen,
and on the basis of that definition rule out the use of certain
compilers, but it's the compiler that they chose which defines what
actually happens.
>> More importantly, there's no good reason for the code to be written in
>> such a way that it relies on this feature; just correct the code and
>> the kernel will no longer be relying on it.
>
> There's nothing to correct if it's not a bug. It's not a bug to depend
> on a compiler extension. If that's what they were doing (and specifying
> that they were doing).
It's a bug if dependence on the compiler extension was unintentional,
which seems to be what happened in this case. The code looks very much
like the possibility that 'tun' might be null was given only pro-forma
consideration, and as a result the test for that possibility was misplaced.
For what possible reason should they NOT replace:
struct sock *sk = tun->sk;
if (!tun)
        return POLLERR;

with

struct sock *sk;
if (!tun)
        return POLLERR;
sk = tun->sk;
?
Ok -- but does it *ever* make sense for kernel code to depend on this?
Hypothesis: Attempting to dereference a null pointer in Linux kernel
code always indicates a bug. The environment arranges not to trap
on such a dereference only for the purpose of avoiding drastic
consequences for such bugs, not to enable any useful behavior.
The code in question was compiled by gcc with
-fdelete-null-pointer-checks turned on, either explicitly or as a
side-effect of the other options chosen. That implementation of C does
not define the behavior that results from dereferencing NULL in the
manner he described, so his statement was incorrect. It's the C compiler
which defines such things, not the kernel. The kernel might come with
documentation that specifies something else, but that documentation has
no control over what actually happens; only the compiler has that control.
Absolutely correct in the general case. In this specific case,
however (Linux on x86 compiled with gcc), a null pointer happens to be
all-bits-zero, and the implementation may rely on that guarantee.
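The classic place that distinction bites is zero-filling storage that
holds pointers; a small sketch (my own function, nothing kernel-specific):

#include <string.h>

#define TBL_SIZE 16

void init_table(void *tbl[TBL_SIZE])
{
        /* Representation-dependent: produces null pointers only where
         * the null pointer is all-bits-zero (true for gcc on
         * Linux/x86, not guaranteed by ISO C). */
        memset(tbl, 0, TBL_SIZE * sizeof tbl[0]);

        /* Portable: 0 in pointer context always converts to the null
         * pointer, whatever its bit pattern. */
        for (size_t i = 0; i < TBL_SIZE; i++)
                tbl[i] = 0;
}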
But they never ran into the situation anyway, so they didn't need
to walk away. They can stand back and feel smug, if they so desire,
as they have the right to.
I spotted some more UB in the core kernel yesterday, and I noticed that
the maintainer of the component it was part of was the same guy who
contributed the UB above. (Un)fortunately it's not exploitable ;-)
I was trying to find C&V in the standard yesterday, but failing -
can someone give me a pointer or reference to where it says the
following is baaaaaaaad.
struct { int x; int foo[32]; int bar[32]; } *p;
//...
p->foo[32] = thing();
p->bar[0] = IdontcarewhatIjustdidasImrepairingitnow();
Much obliged,
The implementation may define anything it wants in cases of UB.
It may define a dog to have 5 tails, for example.
> I can imagine the compiler user _wants_ the "intended behaviour" (IB) to
> be "If I dereference null pointers, I know what I'm doing. Trust me ..."
>
> How else could a kernel ever address memory at location=0 ?
I can think of at least two methods.
1) it can use an implementation which doesn't use the all-bits-zero value
as the null pointer. (I've seen this on some embedded systems where, for
example, address 0 needed to be accessible.)
2) it can use an implementation which supplies a
writeAtAddress0(char const*,size_t) function.
You appear to be confusing NULL with the all-bits-zero address. I admit,
the ability to use ``0'' as a null pointer constant invites such
misinterpretation, but it's still a misinterpretation.
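To make that concrete, here's a sketch of how code might reach machine
address zero on an implementation that documents its integer-to-pointer
conversion; the function name is mine, and the conversion is
implementation-defined (C99 6.3.2.3p5), not portable:

#include <stdint.h>

volatile unsigned char *machine_address_zero(void)
{
        /* Laundering the zero through a non-constant object matters:
         * a *constant* 0 cast to a pointer type is a null pointer
         * constant and need not designate machine address zero. */
        uintptr_t zero = 0;
        return (volatile unsigned char *)zero;
}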
> Seen in this light, the compiler should *not* be allowed to optimize out
Too late to apply your rules; it already is allowed according to the
rules of the language.
Processors do not define what address value the NULL pointer will
represent, C implementations do.
If you're trying to imply that I claimed the fix patch was *in* the
article, you're up for a slapping. (But only a gentle one, as English
isn't your first language, though judging by your normal usage that's
hard to believe.)
However, if you're genuinely interested in seeing the most succinct demonstration
of what's UB in the buggy code, then the fix patch is two clicks away, via
the exploit. Didn't your mother teach you about the wonders of hypertext?!
> I was trying to find C&V in the standard yesterday, but failing -
> can someone give me a pointer or reference to where it says the
> following is baaaaaaaad.
>
> struct { int x; int foo[32]; int bar[32]; } *p;
> //...
> p->foo[32] = thing();
> p->bar[0] = IdontcarewhatIjustdidasImrepairingitnow();
In C89, 3.3.2.1 (Array subscripting) and 3.3.6 (Additive operators)
"""
A postfix expression followed by an expression in square brackets [] is a
subscripted designation of a member of an array object. The definition of
the subscript operator [] is that E1[E2] is identical to (*(E1+(E2)))
"""
and
"""
Thus if P points to a member of an array object, the expression P+1 points to
the next member of the array object. Unless both the pointer operand and the
result point to a member of the same array object, or one past the last member
of the array object, the behavior is undefined. Unless both the pointer operand
and the result point to a member of the same array object, or the pointer
operand points one past the last member of an array object and the result points
to a member of the same array object, the behavior is undefined if the result is
used as the operand of a unary * operator.
"""
p->foo[32] invokes UB.
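Even where the layout happens to put p->foo + 32 and p->bar at the same
address (no padding), the defined spelling of that store names the member
that actually contains the slot; a sketch:

struct s { int x; int foo[32]; int bar[32]; } obj, *p = &obj;

void repair_demo(void)
{
        p->foo[32] = 1;  /* UB: stores one past the end of foo       */
        p->bar[0]  = 1;  /* defined: names the same slot through the */
                         /* member that actually contains it         */
}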
Many many thanks - I didn't think of looking there!
Bringing into C99-land (in particular n1256):
6.5.6 Additive operators
8   When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i-n-th elements of the array object, provided they exist. Moreover,
if the expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the array
object, and if the expression Q points one past the last element of
an array object, the expression (Q)-1 points to the last element of
the array object. If both the pointer operand and the result point to
elements of the same array object, or one past the last element of
the array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined. If the result points one past
the last element of the array object, it shall not be used as the
operand of a unary * operator that is evaluated.
I don't think that's worded perfectly, but the meat is all there.
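Worded imperfectly or not, the boundary cases fall out cleanly enough;
a small illustration:

void bounds_demo(void)
{
        int a[32];
        int *end = a + 32;    /* defined: one past the last element */
        int *last = end - 1;  /* defined: back to the last element  */

        /* a + 33 would be UB (beyond one-past-the-end), and *end
         * would be UB: a one-past-the-end pointer may be formed
         * and compared, but never dereferenced. */
        *last = 0;
}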
Thanks again,
Phil
--
Ug, googoogroups is icky!
>>>> The Linux kernel can't define that;
>>> (...snip..)
>>
>> In the particular case we're talking about, as I understand it, the
>> Linux kernel code wasn't *deliberately* using any such non-standard
>> feature. Dereferencing a null pointer might be permitted in that
>> context, but the code in question did so accidentally and did not
>> make any intentional use of the result. (If I've gotten this wrong,
>> I have no doubt someone will correct me.)
>
> Yes, this particular case was a bug. But this is about the general case
> - quoting Stephen Sprunk upthread:
>
> "However, the problem is that in Linux/x86 _kernel_ code, dereferencing
> NULL is defined (by the implementation) to _not_ trap"
I don't agree with that assessment.
Several people are claiming that the fact that the kernel maps RAM into
the first page means that the kernel developers are trying to *define* C's
null pointer semantics. But in the absence of actual evidence, that's pure
conjecture.
Perhaps, but my article was based on that article, and then you stated
that I had to read it again. I can find nothing more in it than what I
have already written.
> However, if you're genuinely interested in seeing the most succinct demonstration
> of what's UB in the buggy code, then the fix patch is two clicks away, via
> the exploit. Didn't your mother teach you about the wonders of hypertext?!
Not exactly two clicks with the configuration I am using: first download
a gzipped tar-file containing the exploit, then look for links in the
downloaded source files and follow them... However, why should I expect
to find a fix in the exploit?
I'm not doubting you for a minute, but I'd like someone else to
independently confirm each of the above.
Doesn't gcc use (2) for its offsetof() macro? (Answering self: yes,
if there isn't a __compiler_offsetof macro; I don't know when that's
the case, but it seems that if it's defined, it's defined to be
__builtin_offsetof.)
Cheers,
Phil
Obviously a dereference of a null pointer, so undefined.
>> 2. int* q = &p->x; // When x has type "int"
Equivalent to
int *q = &((*p).x)
So the & is not directly applied to the result of the *; the rule that
& and * can cancel out therefore does not apply, and this is still a
dereference of a null pointer, hence undefined behaviour.
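A side-by-side sketch of the two cases under the C99 cancellation rule:

struct s2 { int x; } *np = 0;

void address_demo(void)
{
        struct s2 *q1 = &*np;  /* defined in C99: & and * cancel and */
                               /* nothing is evaluated (6.5.3.2p3)   */
        int *q2 = &np->x;      /* UB: the & applies to (*np).x, a    */
                               /* member lvalue, so the cancellation */
                               /* rule does not apply                */
        (void)q1;
        (void)q2;
}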
>> 3. int b = (p >= p);
A comparison other than for (in)equality when the pointers do not point
to (or one past the end of) the same object (as they do not point to
*any* object). So undefined behaviour.
>> 4. T* q = p + 0; // When p has type "T*"
Addition to a pointer is only defined if the pointer points to an object
(or one past the end of an object), so undefined behaviour.
>> 5. ptrdiff_t d = p - p;
Subtraction of pointers from each other when the pointers do not point
to (or one past the end of) the same object (as they do not point to
*any* object). So undefined behaviour.
>> Each of these causes undefined behaviour, and could cause the compiler
>> to assume that p != NULL.
>>
>> For example, a compiler can always replace (p + 0 != NULL) with TRUE.
>
> I'm not doubting you for a minute, but I'd like someone else to
> independently confirm each of the above.
>
> Doesn't gcc use (2) for its offsetof() macro? (Answering self - yes,
> if there isn't a __compiler_offsetof macro, but I don't know when
> that's the case, it seems that if it's defined, it's defined to be
> __builtin_offsetof.)
The implementation is allowed to use anything it wants in its headers to
achieve the desired result, including things which are undefined
behaviour if *you* do them. It would be legal for an implementation to
provide the following definition of the offsetof macro
#define offsetof(type,field) (size_t)(*0)
Just as long as it implemented some magic to make it work (for instance,
completely ignoring the definition and just doing the right thing). I
doubt you will see it done, but it would not make the implementation
non-conforming as far as I can see.
In fact, one reason for some of the macros (including offsetof) that the
implementation is required to provide is because *you* cannot implement
them without invoking undefined behaviour!
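For reference, the traditional definition many implementations actually
ship is the null-pointer trick itself; if *you* write it, it's UB, which
is exactly why <stddef.h> has to provide it. A sketch, with my own macro
name:

#include <stddef.h>  /* for size_t */

#define my_offsetof(type, member) ((size_t)&((type *)0)->member)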
--
Flash Gordon
>>> 1. char* q = p->x; // When x has type "array of char"
>>> 2. int* q = &p->x; // When x has type "int"
>>> 3. int b = (p >= p);
>>> 4. T* q = p + 0; // When p has type "T*"
>>> 5. ptrdiff_t d = p - p;
[SNIP - 5 expositions on why they're UB]
Many thanks. I agree with all the reasoning, so we're all on the same
page. I still think that section 8 of additive operators needs a rewrite
though.
I must say that I'm a bit disappointed by #4. If any merit could be
squeezed out of the use of any of them, that was the only drop I
could detect.
I notice that gcc-4.3 is smart enough to fold the explicit
int cmp(void *p, void *q)
{
        return p == q || p > q;
}
into a single test:
cmp:
        movl    8(%esp), %eax
        cmpl    %eax, 4(%esp)
        setae   %al
        movzbl  %al, %eax
        ret
So at least there's no apparent cost for (one instance of) going out
of your way if you know that NULLs are possible pointer values (that
you wish to compare as if they are less than any real pointer to your
array - this is probably a bad design, but is at least workable).
Glad to be of help :-)
> I agree with all the reasoning, so we're all on the same
> page.
Well, I've had some training as a pedantic git...
> I still think that section 8 of additive operators needs a rewrite
> though.
>
> I must say that I'm a bit disappointed by #4. If any merit could be
> squeezed out of the use of any of them, that was the only drop I
> could detect.
I can't think of any real situation where knowing it was valid would
help. After all, in the real world the 0 would be a value in a
variable (why would you add a constant 0?), so you would have to know
that in every situation where p was null the value added would be 0,
and I can't imagine when that would be true! Oh, and when null is
possible you could always guard the addition with a test, as in the
sketch below!
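Something like this (my own function; it assumes a non-null p points
into an array with at least n elements after it):

#include <stddef.h>

int *advance(int *p, ptrdiff_t n)
{
        /* Defined even when p is null: the addition is only
         * performed on a pointer that points at an object. */
        return p ? p + n : NULL;
}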
> I notice that gcc-4.3 is smart enough to fold the explicit
> int cmp(void *p, void *q)
> {
>         return p == q || p > q;
> }
> into a single test:
> cmp:
>         movl    8(%esp), %eax
>         cmpl    %eax, 4(%esp)
>         setae   %al
>         movzbl  %al, %eax
>         ret
I'm too out of practice (and out of date) on x86 assembler to read that,
so I'll take your word it is sensible...
> So at least there's no apparent cost for (one instance of) going out
> of your way if you know that NULLs are possible pointer values (that
> you wish to compare as if they are less than any real pointer to your
> array - this is probably a bad design, but is at least workable).
Well, compilers are pretty good at the easy optimisations.
--
Flash Gordon
> On 19 July, 21:29, Beej Jorgensen <b...@beej.us> wrote:
> > PS. Interestingly, it appears legal to dereference a NULL pointer as
> > long as you take the address of it immediately thereafter:
> >
> >   int *a = NULL, *b, c;
> >   b = &*a; // ok, same as "b = a"
> >   c = *a;  // bad
> >
> > (C99 6.5.3.2p3-4, fn87)
>
> I would have thought that this didn't count as dereferencing the
> pointer. Dereferencing gives you a value; you can't apply & to a value
> to ask "where is this stored?". Not that I'm an expert.
In C semantics dereferencing a valid (data) pointer gives you an
*lvalue*, which you can take the & of; any sane compiler collapses
these to a no-op (just keeps the pointer). The C99 change makes it
official even for a null pointer, which doesn't point to a valid
object. Most other uses -- not all -- of the lvalue coerce it to an
rvalue, which is what many people would call the actual dereference.
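A compact illustration of the distinction:

void lvalue_demo(void)
{
        int x = 42;
        int *p = &x, *n = 0;

        int *q = &*p;  /* & and * cancel: q == p, x is not accessed */
        int *r = &*n;  /* well defined even for a null pointer in   */
                       /* C99: nothing is evaluated (6.5.3.2p3)     */
        int v = *p;    /* the lvalue *p coerced to an rvalue: the   */
                       /* load most people mean by "dereference"    */
        (void)q; (void)r; (void)v;
}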