[LLVMdev] built-in longjmp and setjmp

Akira Hatanaka

Apr 12, 2011, 4:56:36 PM
to llv...@cs.uiuc.edu
Does the X86 backend (or any other backend) correctly implement support for __builtin_setjmp and __builtin_longjmp?
I don't get the correct result when I compile and run the following code with clang.

# clang foo.c -O3; ./a.out

#include <stdio.h>
void *buf[20];
void __attribute__((noinline))
foo (void)
{
  __builtin_longjmp (buf, 1);
}

int
main (int argc, char** argv)
{
  if (__builtin_setjmp (buf))
    {
      printf("return\n");
      return 0;
    }

  printf("call foo\n");
  foo ();

  return 1;
}

Jim Grosbach

Apr 12, 2011, 5:38:50 PM
to Akira Hatanaka, llv...@cs.uiuc.edu
ARM/Darwin implements them. I'm not aware of any others.

That said, they are designed for internal use by the compiler for exception handling. Calling them directly like this is very much not recommended. Using the system library setjmp()/longjmp() functions is preferred.

-Jim

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Akira Hatanaka

Apr 12, 2011, 5:54:25 PM
to Jim Grosbach, llv...@cs.uiuc.edu
What would be the best way to convert built-in setjmp and longjmp to library calls?
Should it be implemented in clang or in backends?

Jim Grosbach

Apr 12, 2011, 6:15:19 PM
to Akira Hatanaka, llv...@cs.uiuc.edu
If you want an automated method, then using the source code re-writer interfaces in clang is probably a reasonable starting place. Just modifying the source code manually is probably easier, though, to be honest.

As a moderate caveat to all of this, there are some bits of code out there that use these builtins that are very tightly coupled to the compiler (the Linux kernel used to do this, I think, and maybe still does). Those sorts of situations are unlikely to be solved satisfactorily by moving to library calls (performance reasons, usually). The appropriate solution there will be very situation specific and will likely involve refactoring the implementations in question to some degree.

Regards,
Jim

John McCall

Apr 12, 2011, 8:51:13 PM
to Jim Grosbach, Akira Hatanaka, llv...@cs.uiuc.edu
On Apr 12, 2011, at 3:15 PM, Jim Grosbach wrote:
> If you want an automated method, then using the source code re-writer interfaces in clang is probably a reasonable starting place. Just modifying the source code manually is probably easier, though, to be honest.
>
> As a moderate caveat to all of this, there are some bits of code out there that use these builtins that are very tightly coupled to the compiler (the Linux kernel used to do this, I think, and maybe still does). Those sorts of situations are unlikely to be solved satisfactorily by moving to library calls (performance reasons, usually). The appropriate solution there will be very situation specific and will likely involve refactoring the implementations in question to some degree.

Are these intrinsics really prohibitively difficult to implement? I'm not suggesting that you (or anyone else) particularly needs to do them, but is there more to them than clobbering all registers and then lowering to a quick series of instructions which save/restore the current IP, SP, and (maybe?) FP? Something seems very wrong if rewriting a custom refactoring tool to turn builtin_setjmp/longjmp into library calls could possibly be simpler than just adding support for these intrinsics to one or two more targets.

John.

Akira Hatanaka

Apr 13, 2011, 12:51:16 PM
to John McCall, llv...@cs.uiuc.edu
It seems straightforward to implement, if it just needs to be functionally correct.

I have another question about setjmp/longjmp. When the following program is compiled and run with the argument 10 (./a.out 10), should it print 10 or 23? I ask because it prints 23 when compiled with gcc and 10 when compiled with clang. If it is supposed to print 23, it seems to me that saving and clobbering registers is not enough to guarantee the correctness of the compiled program.

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

jmp_buf buf;

void __attribute__((noinline))
sub2 (void)
{

  longjmp (buf, 1);
}

int
main (int argc, char** argv)
{
  int n = atoi(argv[1]), r;

  if ((r = setjmp (buf)))
    {
      printf("n = %d\n", n);
      return 0;
    }

  n += 13;
  sub2 ();

  return n;
}

Akira Hatanaka

Apr 13, 2011, 1:01:18 PM
to John McCall, llv...@cs.uiuc.edu
After further investigation, I found that it prints 10 when it is compiled with gcc with option -O3.

Jakob Stoklund Olesen

Apr 13, 2011, 1:05:09 PM
to Akira Hatanaka, llv...@cs.uiuc.edu

On Apr 13, 2011, at 9:51 AM, Akira Hatanaka wrote:

> int
> main (int argc, char** argv)
> {
> int n = atoi(argv[1]), r;
>
> if ((r = setjmp (buf)))
> {
> printf("n = %d\n", n);
> return 0;
> }

Non-volatile local variables are not preserved by setjmp(), so this program can print whatever it wants.

/jakob

Eli Friedman

Apr 13, 2011, 1:05:36 PM
to Akira Hatanaka, llv...@cs.uiuc.edu

Neither output is wrong.

C99 7.13.2.1p3:
All accessible objects have values, and all other components of the
abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration
that are local to the function containing the invocation of the
corresponding setjmp macro that do not have volatile-qualified type and
have been changed between the setjmp invocation and longjmp call are
indeterminate.

-Eli

Akira Hatanaka

Apr 27, 2011, 2:38:45 PM
to llv...@cs.uiuc.edu
I have another basic question about setjmp/longjmp.

When I compile and run the following program, is it expected that the global variable gi2 will be incremented twice? It seems that the code generated with clang and llc increments it only once (lines 37-43 of the attached file).

$ clang setjmp6.c -o setjmp6.arm.ll -emit-llvm -O3 -S -ccc-host-triple arm-unknown-darwin -ccc-clang-archs arm
$ llc setjmp6.arm.ll -o setjmp6.arm.s

#include <stdio.h>
#include <stdlib.h>
void *buf[20];

int gi2 = 0;

void __attribute__ ((noinline)) sub2 (void)
{
  __builtin_longjmp (buf, 1);
}

int
main (int argc, char **argv)
{
  int n = atoi (argv[1]);
  int r = __builtin_setjmp (buf);
  ++gi2;

  if (r)
    {
      printf ("setjmp %d\n", n + gi2);
      return 0;
    }

  sub2 ();

  return 0;
}


On Wed, Apr 13, 2011 at 10:05 AM, Jakob Stoklund Olesen <stok...@2pi.dk> wrote:

On Apr 13, 2011, at 9:51 AM, Akira Hatanaka wrote:

> int
> main (int argc, char** argv)
> {
>   int n = atoi(argv[1]), r;
>
>   if ((r = setjmp (buf)))
>     {
>       printf("n = %d\n", n);
>       return 0;
>     }


/jakob


setjmp6.arm.s
dag.main.dot

Duncan Sands

Apr 27, 2011, 3:25:22 PM
to llv...@cs.uiuc.edu
Hi Akira,

> When I compile and run the following program, is it expected that global
> variable gi2 will be incremented twice? It seems that the code generated with
> clang and llc increments it only once (line 37-43 of attached file).

try marking gi2 volatile.

Ciao, Duncan.

Akira Hatanaka

Apr 27, 2011, 4:22:08 PM
to llv...@cs.uiuc.edu
I declared gi2 as "volatile" and I think gi2 is still incremented only once.
Here is a snippet of the code. Lines 39-42 increment gi2.

According to the standard, shouldn't ++gi2 be executed twice regardless of whether gi2 is volatile or not? Isn't the missing chain from the EH_SJLJ_SETJMP node to the load/store nodes that access gi2 causing this problem (please see the attached file in my previous email)?

# line 39 - 47
 ldr r1, LCPI1_1
 ldr r2, [r1]
 add r2, r2, #1
 str r2, [r1]
 add r4, pc, #8              @ eh_setjmp begin
 str r4, [r0, #4]
 mov r0, #0
 add pc, pc, #0
 mov r0, #1                  @ eh_setjmp end

 ...
LCPI1_1:
  .long _gi2
  .align  2
f.s

Jim Grosbach

Apr 27, 2011, 5:03:33 PM
to Akira Hatanaka, llv...@cs.uiuc.edu
There is no C standard to follow for these builtins. You are expecting them to behave as if they were the standard library calls. They are not equivalent and the naming similarity is an unfortunate historical artifact. Use the standard library functions instead.

-Jim


Akira Hatanaka

Apr 27, 2011, 6:45:39 PM
to Jim Grosbach, llv...@cs.uiuc.edu
Okay. I understand that builtin functions do not have to behave exactly like the standard library functions. What I wanted to know is what the code generated by llvm (clang + llc) should look like (I am working on the Mips back-end now). I assume users of __builtin_setjmp/longjmp expect some particular behavior, even if the builtins aren't the same as the library functions. If the code generated for arm-darwin is acceptable, does that mean code that accesses a global variable can be freely moved above the call to setjmp?

Jim Grosbach

Apr 27, 2011, 6:55:53 PM
to Akira Hatanaka, llv...@cs.uiuc.edu
The builtins are for internal compiler use in the context of SjLj exception handling. Any other use, including any direct calls of the builtins in user code, are a bad idea with no guaranteed behaviour. That they're exposed at all is, again, for historical purposes. Don't use them.

-Jim

Joerg Sonnenberger

Apr 27, 2011, 7:08:13 PM
to llv...@cs.uiuc.edu
On Wed, Apr 27, 2011 at 03:55:53PM -0700, Jim Grosbach wrote:
> The builtins are for internal compiler use in the context of SjLj
> exception handling. Any other use, including any direct calls of the
> builtins in user code, are a bad idea with no guaranteed behaviour.
> That they're exposed at all is, again, for historical purposes. Don't use them.

Why is longjmp converted into calls to the builtin then?
See PR 8765.

Joerg

Akira Hatanaka

Apr 27, 2011, 7:04:10 PM
to Jim Grosbach, llv...@cs.uiuc.edu
Okay. Are you saying that you shouldn't use __builtin functions in general in your program or just __builtin_setjmp/longjmp? Also, are there any warnings issued by either clang or llvm if they are used in your program?

Jim Grosbach

Apr 27, 2011, 7:11:27 PM
to Akira Hatanaka, llv...@cs.uiuc.edu
Just __builtin_setjmp() and __builtin_longjmp(). They're "special" in all sorts of not-very-fun ways. Builtins in general are completely fine.

I'm not aware of any diagnostic regarding them. Off the top of my head, adding one, especially when using something like the -pedantic option, sounds perfectly reasonable. The CFE guys would have a better handle on that aspect of things.

-Jim

Jim Grosbach

Apr 27, 2011, 7:25:30 PM
to Joerg Sonnenberger, llv...@cs.uiuc.edu

On Apr 27, 2011, at 4:08 PM, Joerg Sonnenberger wrote:

> On Wed, Apr 27, 2011 at 03:55:53PM -0700, Jim Grosbach wrote:
>> The builtins are for internal compiler use in the context of SjLj
>> exception handling. Any other use, including any direct calls of the
>> builtins in user code, are a bad idea with no guaranteed behaviour.
>> That they're exposed at all is, again, for historical purposes. Don't use them.
>
> Why is longjmp converted into calls to the builtin then?
> See PR 8765.


Hi Joerg,

If I follow what's happening in PR8765 correctly, it's a bit different. setjmp/longjmp calls are never lowered to the builtin EH intrinsics. Unfortunately, "builtin" is a bit of an overloaded term. :( Something else is recognizing the "setjmp" name as special and is doing something with it (e.g., SelectionDAGISel checks for it as well as a few other "returns twice" functions).

-Jim

Joerg Sonnenberger

Apr 27, 2011, 8:24:20 PM
to llv...@cs.uiuc.edu
On Wed, Apr 27, 2011 at 04:25:30PM -0700, Jim Grosbach wrote:
>
> On Apr 27, 2011, at 4:08 PM, Joerg Sonnenberger wrote:
>
> > On Wed, Apr 27, 2011 at 03:55:53PM -0700, Jim Grosbach wrote:
> >> The builtins are for internal compiler use in the context of SjLj
> >> exception handling. Any other use, including any direct calls of the
> >> builtins in user code, are a bad idea with no guaranteed behaviour.
> >> That they're exposed at all is, again, for historical purposes. Don't use them.
> >
> > Why is longjmp converted into calls to the builtin then?
> > See PR 8765.
>
>
> Hi Joerg,
>
> If I follow what's happening in PR8765 correctly, it's a bit different.
> setjmp/longjmp calls are never lowered to the builtin EH intrinsics.

Yes, this is not about using them for exception handling.

> Unfortunately, "builtin" is a bit of an overloaded term. :( Something
> else is recognizing the "setjmp" name as special and is doing something
> with it (e.g., SelectionDAGISel checks for it as well as a few other
> "returns twice" functions).

I suppose this is the normal common library name detection logic. The
point of my inquiry in this context is whether the mapping to the builtin
gives anything over just applying the "returns twice" attribute.
Doing only the latter would not have issues with the external name
mangling.

Joerg
