ATS and Clang-3.5


gmhwxi

Jan 26, 2015, 7:05:49 PM
to ats-lan...@googlegroups.com
FYI.

I once used (Clang-3.2 -O2) to compile ATS2 successfully.

Today, I noted that neither (Clang-3.4 -O2) nor (Clang-3.5 -O2) can succeed in
compiling ATS2. The patsopt generated by these compilers crashes immediately.
However, (Clang-3.5 -O1) can do it. So I strongly suspect that new optimizations
added in Clang-3.4 and Clang-3.5 are the cause of this failure. This is a very
unfortunate situation!

For now, (Gcc-4.8 -O2) is the only optimizing compiler that can compile ATS2.

Raoul Duke

Jan 26, 2015, 7:09:04 PM
to ats-lang-users
things like this, and the bugs seen in the Tahoe LAFS project, make me
hold the opinion that compiler optimizations are basically kinda
almost purely bad deals with the devil.

Brandon Barker

Jan 26, 2015, 7:19:03 PM
to ats-lang-users

I should test ICC.. but I am scared.

--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at http://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/0489dc9f-652e-4a55-aa6e-030ac740561d%40googlegroups.com.

Greg Fitzgerald

Jan 26, 2015, 8:18:43 PM
to ats-lan...@googlegroups.com
> Today, I noted that neither (Clang-3.4 -O2) nor (Clang-3.5 -O2) can succeed in
> compiling ATS2. The patsopt generated by these compilers crashes immediately.

This class of error is expected if ATS2 depends on
undefined behavior. For example, if you use memcpy() instead of
memmove() to copy overlapping regions, then bumping up to -O2 may pull
in a vectorized version, which could manifest itself as memory
corruption. To weed out runtime instances of undefined behavior,
compile and link your code with "-fsanitize=undefined", and then run
'patsopt' from the command line. If undefined behavior is detected,
the runtime will report an error message on stderr.

Another useful one is "-fsanitize=address", which is used the same way.
Just before segfaulting, it should spit out a stack trace of where the
memory violation occurred. That might solve your problem as well, but
for this particular case (upgrading the compiler or enabling more
optimizations) checking for undefined behavior first should be better
for determining the root cause.
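
To make the overlapping-copy hazard concrete, here is a minimal sketch. The helper name shift_right is made up for illustration and has nothing to do with the ATS2 sources. It uses memmove(), which is specified to handle overlapping source and destination; calling memcpy() on the same arguments would be undefined behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first `count` bytes of `buf` to the
   position `shift` bytes further along in the same buffer. Source and
   destination overlap whenever shift < count, so memmove() is used;
   memcpy() would be undefined behavior for such arguments. */
void shift_right(char *buf, size_t count, size_t shift) {
    memmove(buf + shift, buf, count);
}
```

The caller is responsible for making sure that buf has room for at least shift + count bytes.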

-Greg

Hongwei Xi

Jan 26, 2015, 8:51:26 PM
to ats-lan...@googlegroups.com
By using gdb, I first guessed that this was due to the compilation of
a file of the name "pats_constraint_solve.dats". So I did the following:

1. Compiling pats_constraint_solve.dats using (gcc-4.8 -O2)
2. Compiling the rest using (clang-3.5 -O2)

It actually worked!

So the good news is that this issue seems to be contained in one single file.


Hongwei Xi

Jan 26, 2015, 8:58:23 PM
to ats-lan...@googlegroups.com
I followed your suggestions.

There were 22M bytes of leaks reported (-fsanitize=address)
There were no undefined behaviors reported (-fsanitize=undefined)

Leaks are expected because I turned off GC.

As I see it, the chance of fixing such a bug is really next to none.

Greg Fitzgerald

Jan 27, 2015, 12:36:29 AM
to ats-lan...@googlegroups.com
> As I see it, the chance of fixing such a bug is really next to none.

Fixing clang 3.4 or 3.5? That sounds accurate. But if we move
quickly we might be able to squeeze a fix into 3.6. Can you send me
an isolated C fragment?

Thanks,
Greg

Hongwei Xi

Jan 27, 2015, 3:11:19 AM
to ats-lan...@googlegroups.com
I finally found a way to circumvent the problem.


implement{a}
myintvec_cffgcd
  {n}(iv, n) = let
//
var res
  : myint(a) = myint_make_int<a> (0)
val p_res = &res
//
// HX-2015-01-27:
// fixing a bug in (clang-3.5 -O2)
//
// HX: a dummy to force clang
// *not* to optimize the value stored in [res]
val ((*void*)) = ptr_as_volatile(p_res)
//

If the call to ptr_as_volatile is removed, then clang-3.5
may use the (linear) value in [res] repeatedly, causing this
value to be freed more than once.
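
For readers who do not know ATS, the idea of a "dummy" volatile access can be sketched in plain C. The macro below is hypothetical and is not claimed to be how ATS defines ptr_as_volatile; it only shows the general trick of reading an object once through a volatile-qualified lvalue, which the compiler is not allowed to drop.

```c
/* Hypothetical macro, NOT the actual ATS definition of ptr_as_volatile:
   perform one read of the first byte of *ptr through a volatile lvalue
   and discard the result. */
#define TOUCH_AS_VOLATILE(ptr) ((void)*(volatile char *)(ptr))

int touch_demo(void) {
    int res = 7;
    int *p_res = &res;
    TOUCH_AS_VOLATILE(p_res);  /* the read itself cannot be optimized away */
    return res;
}
```

Whether such a touch is enough to change what an optimizer does with the rest of the function is compiler-specific; it is a workaround, not a guarantee from the C standard.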

The C code is attached. Searching for 'ptr_as_volatile' should lead
you to the spot where the bug occurs.

pats_constraint3_solve_dats.c

Hongwei Xi

Jan 27, 2015, 3:15:13 AM
to ats-lan...@googlegroups.com
Now clang-3.5 can be used to compile ATS2 under both Linux and OSX:

https://travis-ci.org/githwxi/ATS-Postiats-test

Cheers!

Hongwei Xi

Jan 27, 2015, 1:44:26 PM
to ats-lan...@googlegroups.com
But asking programmers to write optimized code on their own would be a lot worse!

I think it is time to ask/find someone to use CompCert to compile ATS2 :)



Greg Fitzgerald

Jan 27, 2015, 5:49:36 PM
to ats-lan...@googlegroups.com
Thanks for doing that work to ensure the bug is in clang and not in
ATS-generated code. I opened a discussion on the llvmdev email list
and added a bug report at llvm.org:

http://llvm.org/bugs/show_bug.cgi?id=22360

I'll keep you posted on progress. Feel free to jump in on the discussion.

-Greg

Barry Schwartz

Jan 27, 2015, 7:06:46 PM
to ats-lan...@googlegroups.com
On Mon, Jan 26, 2015 at 7:09 PM, Raoul Duke <rao...@gmail.com> wrote:
> things like this, and the bugs seen in the Tahoe LAFS project, make me
> hold the opinion that compiler optimizations are basically kinda
> almost purely bad deals with the devil.

Hongwei Xi <gmh...@gmail.com> wrote:
> But asking programmers to write optimized code on their own would be a lot worse!

Hear, hear!

Such code too often has the opposite effect, as well, at least in the
long run.

Greg Fitzgerald

Jan 27, 2015, 7:30:59 PM
to ats-lan...@googlegroups.com
From Reid Kleckner:

So this is almost undoubtedly a setjmp issue. Relevant C standard quote
[7.13.2.1 p5]:

"""
All accessible objects have values, and all other components of the abstract
machine have state, as of the time the longjmp function was called, except that the
values of objects of automatic storage duration that are local to the function containing
the invocation of the corresponding setjmp macro that do not have volatile-qualified type
and have been changed between the setjmp invocation and longjmp call are
indeterminate.
"""

As the variable tmp502 is not declared volatile, its value is indeterminate. I
think this is working as intended with nothing to do in LLVM.
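
As a minimal sketch of the pattern the quoted paragraph does allow: a local that is changed after setjmp and then read (here, returned) after longjmp is declared volatile. The names recovery_point, fail and compute_with_recovery are invented for this illustration and do not come from the ATS-generated code.

```c
#include <setjmp.h>

static jmp_buf recovery_point;

/* Hypothetical helper standing in for a raised exception. */
static void fail(void) {
    longjmp(recovery_point, 1);
}

/* Returns the value stored in `result` just before the jump. */
int compute_with_recovery(void) {
    /* volatile: changed after setjmp and read after longjmp */
    volatile int result = 0;
    if (setjmp(recovery_point) == 0) {
        result = 1000000;
        fail();  /* does not return */
    }
    return result;
}
```

Without the volatile qualifier, the value returned here would be indeterminate under the paragraph quoted above, which appears to be the situation tmp502 was in.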

Barry Schwartz

Jan 27, 2015, 9:14:03 PM
to ats-lan...@googlegroups.com
Greg Fitzgerald <gar...@gmail.com> wrote:
> As the variable tmp502 is not declared volatile, its value is indeterminate. I
> think this is working as intended with nothing to do in LLVM.

My worldview is sustained. I keep the C standard nearby because it is
so precise about things.

gmhwxi

Jan 27, 2015, 9:29:03 PM
to ats-lan...@googlegroups.com
Thanks!

I would really like to understand the issue.


>>As the variable tmp502 is not declared volatile, its value is indeterminate

When is it indeterminate? During the longjmp phase?

Why does it matter whether it is indeterminate or not? It is not used during the
longjmp phase.

Greg Fitzgerald

Jan 27, 2015, 9:37:32 PM
to ats-lan...@googlegroups.com, Reid Kleckner

gmhwxi

Jan 27, 2015, 10:50:40 PM
to ats-lan...@googlegroups.com
>>Why does it matter whether it is indeterminate or not? It is not used during the
>>longjmp phase.

I just noticed that this is not true. It is used as the return value of the function call.

I strongly suggest that this (reading from a variable holding indeterminate value)
be reported by clang as an undefined behavior. Having a rule in the standard is one thing.
Pointing out violations of this rule is another.

--Hongwei

PS: I went ahead and removed the use of exceptions in my code in this case (thus
requiring no setjmp/longjmp). Everything is smooth so far. It is a great feeling.
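
For comparison, here is a minimal C sketch of what "no exceptions" can look like: failures are reported through return values, so no setjmp/longjmp is involved at all. The functions checked_div and sum_quotients are hypothetical stand-ins and are unrelated to the actual change made in the ATS2 constraint solver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical routine that would otherwise raise an exception:
   it reports failure through its return value instead. */
bool checked_div(int num, int den, int *out) {
    if (den == 0)
        return false;
    *out = num / den;
    return true;
}

/* Sums the quotients nums[i] / dens[i]. On the first failing division
   it returns false and leaves *total untouched. */
bool sum_quotients(const int *nums, const int *dens, size_t n, int *total) {
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        int q;
        if (!checked_div(nums[i], dens[i], &q))
            return false;  /* propagate the error to the caller */
        acc += q;
    }
    *total = acc;
    return true;
}
```

The price is that every caller has to check and forward the status, but all control flow stays visible to the optimizer.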

Shea Levy

Jan 27, 2015, 10:54:53 PM
to ats-lan...@googlegroups.com
I would guess that finding a violation of this rule would be very expensive for the compiler while relying on this rule being followed results in real optimizations. llvm has put a lot of emphasis on fast compilation times.

But perhaps this is exactly what the sanitize=undefined is for, so maybe clang could have longjmp somehow put traps in all non-volatile stack variables. But I don’t know how the sanitization stuff is even implemented so this might be nonsense :)

~Shea

gmhwxi

Jan 28, 2015, 12:59:15 AM
to ats-lan...@googlegroups.com
I thought that I had understood the issue; I apparently did not.


""""
All accessible objects have values, and all other components of the abstract
machine have state, as of the time the longjmp function was called, except that the
values of objects of automatic storage duration that are local to the function containing
the invocation of the corresponding setjmp macro that do not have volatile-qualified type
and have been changed between the setjmp invocation and longjmp call are
indeterminate.
""""

If you use (clang -O1) to compile the following code and then execute the generated object code,
you should get:

second
main: x = 1000000

If you use (clang -O2), you should get

second
main: x = 0

I understood this behavior: the value in x is really indeterminate; either 0 or 1000000 is okay.

What I do not understand is that you get the same results even if x is heap-allocated. This does
not follow from the above description: if x is heap-allocated, it is considered a state; so its content
should not be indeterminate. Right?

#include <stdio.h>
#include <setjmp.h>

static jmp_buf buf;

void second(int *p) {
  *p = 1000000;
  printf("second\n");         // prints
  longjmp(buf, 1);            // jumps back to where setjmp was called - making setjmp now return 1
}

void first(int *p) {
  second(p);
  printf("first\n");          // does not print
}

int main() {
  int x = 0;
  if (!setjmp(buf)) {
    first(&x);                // when executed, setjmp returns 0
  } else {                    // when longjmp jumps back, setjmp returns 1
    printf("main: x = %i\n", x);  // prints
  }
  return 0;
}




gmhwxi

Jan 28, 2015, 1:11:59 AM
to ats-lan...@googlegroups.com
Please try (clang -O1) and (clang -O2) on the following code:

#include <stdio.h>
#include <stdlib.h>

#include <setjmp.h>

static jmp_buf buf;

void second(int *p) {
  *p = 1000000;

  printf("second: *p = %i\n", *p);

  longjmp(buf, 1);            // jumps back to where setjmp was called - making setjmp now return 1
}

void first(int *p) {
  second(p);
  printf("first\n");          // does not print
}

int main() {

  /*
  int x = 0;
  */

  int *p;
  p = malloc(sizeof(int));
  /*
  printf("main: p = %p\n", p);
  */

  if (p == 0)
  {
    fprintf(stderr, "malloc: failed!\n"); exit(1);
  }

  *p = 0;

  if (!setjmp(buf)) {
    first(p);                 // when executed, setjmp returns 0
  } else {                    // when longjmp jumps back, setjmp returns 1
    printf("main: *p = %i\n", *p);  // prints
  }
  return 0;
}

gmhwxi

Jan 28, 2015, 1:24:51 AM
to ats-lan...@googlegroups.com
>>llvm has put a lot of emphasis on fast compilation times.

If one uses a flag like -fsanitize=undefined, then it is probably reasonable to assume that
compilation time is not considered a key factor for the moment :)

Greg Fitzgerald

Jan 28, 2015, 1:43:27 AM
to ats-lan...@googlegroups.com
Yep, still looks like a clang bug to me. I'll keep digging.

-Greg

Greg Fitzgerald

Jan 28, 2015, 1:49:09 AM
to ats-lan...@googlegroups.com
Note that it works as expected if you change "int" to "volatile int".
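
One possible reading of that suggestion, sketched for the heap-allocated case: the pointee type becomes volatile int, so every access through p is a real load or store. The function name heap_value_after_longjmp is made up here, and the program is restructured as a function only to keep the sketch short.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf buf;

static void second(volatile int *p) {
    *p = 1000000;
    longjmp(buf, 1);  /* back to the setjmp call below */
}

/* Returns the value read through p after the longjmp,
   or -1 if malloc fails. */
int heap_value_after_longjmp(void) {
    volatile int *p = malloc(sizeof *p);
    int seen;
    if (p == NULL)
        return -1;
    *p = 0;
    if (setjmp(buf) == 0) {
        second(p);  /* does not return */
    }
    seen = *p;      /* volatile read: must be loaded from memory */
    free((void *)p);
    return seen;
}
```

Note that p itself is never modified between setjmp and longjmp; only the heap object it points to changes.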

-Greg

Greg Fitzgerald

Jan 28, 2015, 1:56:19 AM
to sh...@shealevy.com, ats-lan...@googlegroups.com
> But perhaps this is exactly what the sanitize=undefined is for

Yes, exactly.


> so maybe clang could have longjmp somehow put traps in all non-volatile stack variables.
> But I don’t know how the sanitization stuff is even implemented so this might be nonsense :)

Not nonsense. That'd be a nice contribution to make. If you're
interested in implementing it:

https://github.com/llvm-mirror/compiler-rt/tree/master/lib/ubsan
http://llvm.org/docs/GettingStarted.html#git-mirror

-Greg

Yannick Duchêne

Jan 28, 2015, 7:11:17 AM
to ats-lan...@googlegroups.com
I have heard it said that GCC -O2 is even safer than GCC -O1, -O0, or -Os, as GCC -O2 is the most tested. Unfortunately, I have no reference to provide for this assertion.
 

Shea Levy

Jan 28, 2015, 12:15:18 PM
to ats-lan...@googlegroups.com
Note that in this case, the printf is undefined behavior: p may point to the heap, but p itself is on the stack and thus indeterminate.

gmhwxi

Jan 28, 2015, 12:31:58 PM
to ats-lan...@googlegroups.com
But p is unchanged; it is a const; so it should be determinate, I think.
...

Shea Levy

Jan 28, 2015, 12:36:31 PM
to ats-lan...@googlegroups.com
It’s not even const, but either way the standard is clear here that non-volatile stack variables are indeterminate.

I am not at all experienced enough to know how this kind of thing is actually implemented, but I could imagine for example a setjmp/longjmp impl that ran longjmp in an entirely new stack frame, with only the volatile variables copied from before. Or one that filled the stack with garbage first.

Shea Levy

Jan 28, 2015, 1:24:06 PM
to ats-lan...@googlegroups.com
Or, of course, the variable could live in a register that is simply not restored if it’s not volatile.
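
The register scenario is the usual reason given for the volatile rule. A small sketch of a retry loop (the names retry_point and attempts_until_success are invented for the example): the counter is changed after setjmp and read after longjmp, so it is declared volatile to keep it in memory rather than in a register whose contents may be rolled back.

```c
#include <setjmp.h>

static jmp_buf retry_point;

/* Counts how many passes are made before `needed` is reached. */
int attempts_until_success(int needed) {
    /* volatile: must survive the jump back to retry_point */
    volatile int attempts = 0;
    setjmp(retry_point);             /* re-entered on every retry */
    attempts = attempts + 1;
    if (attempts < needed)
        longjmp(retry_point, 1);     /* try again */
    return attempts;
}
```

The parameter needed is never modified, so it does not need the qualifier.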

Barry Schwartz

Jan 28, 2015, 1:26:58 PM
to ats-lan...@googlegroups.com
gmhwxi <gmh...@gmail.com> wrote:
> But p is unchanged; it is a const; so it should be determinate, I think.

I’m looking at the 2011 standard. To me Section 7.13.2.1 seems pretty
clear that the value has to be ‘actually changed’ to be considered
indeterminate. So I would think an optimizer ought to be conservative
about it.

gmhwxi

Jan 28, 2015, 2:00:04 PM
to ats-lan...@googlegroups.com

It is pretty clear to me that Section 7.13.2.1 was written by compiler-writers.
The main purpose of this section is not really to clarify the use of
setjmp/longjmp. If I were to be sarcastic about it, I would say that its main purpose
is probably to cover their rear ends :)

First, setjmp/longjmp is straightforward to implement if optimization is not
a concern. Essentially, you just attach labels to the stack so that you know how
to roll back the stack when longjmp is called.

However, if we want to optimize away some memory accesses (that is, caching
some memory contents), then there is a challenge here. Because one never knows
when longjmp is to take place after setjmp is called, being conservative means that
everything on the stack should be treated as being volatile. Probably this is too
conservative a stance for the compiler-writer to take, so Section 7.13.2.1 shifts the
burden to the programmer: it is the programmer's responsibility to clearly indicate what
should be treated as being volatile.

This is just like the fine print in the brochure sent by a credit card company.

It is not the end of the story yet. Section 7.13.2.1 does not say anything about
inlining a function that calls setjmp. Is it allowed? If it is, then what about the
stack-allocated variables in the function that calls the inlined function? Should
they be handled according to Section 7.13.2.1 as well?
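
The "labels attached to the stack" picture can be sketched from the user's side as a small stack of saved contexts, which is roughly how exception handling gets layered on top of setjmp/longjmp. Everything below (handlers, depth, pending_code, throw_error, run_protected and the sample bodies) is hypothetical and only illustrates the idea; it is not how ATS or any particular C library implements it. All state that changes between setjmp and longjmp is kept in static storage, so Section 7.13.2.1 does not make it indeterminate.

```c
#include <setjmp.h>
#include <stdlib.h>

#define MAX_HANDLERS 8

static jmp_buf handlers[MAX_HANDLERS]; /* one saved context per active handler */
static int depth = 0;                  /* number of active handlers */
static int pending_code = 0;           /* error code carried across the jump */

/* Unwind to the innermost active handler. */
void throw_error(int code) {
    if (depth == 0)
        abort();                       /* nothing is there to catch it */
    pending_code = code;
    depth--;                           /* pop the handler before jumping to it */
    longjmp(handlers[depth], 1);
}

/* Runs body(); returns 0 on normal completion, the thrown code otherwise,
   or -1 if too many handlers are already active. */
int run_protected(void (*body)(void)) {
    if (depth == MAX_HANDLERS)
        return -1;
    if (setjmp(handlers[depth]) != 0)
        return pending_code;           /* arrived here via throw_error */
    depth++;
    body();
    depth--;
    return 0;
}

/* Sample bodies, for illustration only. */
void ok_body(void) {}
void failing_body(void) { throw_error(42); }
void nested_body(void) {
    (void)run_protected(failing_body); /* the inner handler absorbs 42 */
    throw_error(7);                    /* the outer handler sees 7 */
}
```

Calling run_protected(nested_body) exercises two nested handlers. Note that run_protected has no locals that change between setjmp and longjmp, which is what keeps it within the rule under discussion.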

gmhwxi

Jan 28, 2015, 2:59:05 PM
to ats-lan...@googlegroups.com
I so wished that there had been a so-called "reference" implementation for C.
Just like Scheme.

Barry Schwartz

Jan 28, 2015, 3:07:45 PM
to ats-lan...@googlegroups.com
gmhwxi <gmh...@gmail.com> wrote:
> I so wished that there had been a so-called "reference" implementation for
> C.
> Just like Scheme.

Or a ‘Trip Test’.

gmhwxi

Jan 28, 2015, 3:53:33 PM
to ats-lan...@googlegroups.com

I read the assembly output from (clang -O2). It is
pretty clear that *p in the printf-call is replaced with
0. This would be correct if setjmp were an ordinary function.
In this case, the compiler should have assumed that setjmp may
modify the content of anything that has an address.

Let us treat setjmp specially because it is special. However, it is not so simple because
setjmp could be called something else. There is no easy way for the compiler to tell whether
a function call may or may not invoke setjmp.

This is clearly a compiler bug. But the bug cannot really be fixed (without undesirable consequences).

One possibility is to reword Section 7.13.2.1 so that heap-variables are also covered. But that
would be too much even for compiler-writers :)

Reid Kleckner

Jan 29, 2015, 1:46:22 PM
to ats-lan...@googlegroups.com
Resending to get through googlegroups...

------------

On Tue, Jan 27, 2015 at 6:37 PM, Greg Fitzgerald <gar...@gmail.com> wrote:
+Reid

On Tue, Jan 27, 2015 at 6:29 PM, gmhwxi <gmh...@gmail.com> wrote:
> Thanks!
>
> I would really like to understand the issue.
>
>>>As the variable tmp502 is not declared volatile, its value is
>>> indeterminate
>
> When is it indeterminate? During the longjmp phase?
>
> Why does it matter whether it is indeterminate or not? It is not used during
> the
> longjmp phase.

tmp502 is indeterminate *after* the longjmp. Consider this small example:

#include <setjmp.h>
int main(int argc, char **argv) {
  int x = 42;
  jmp_buf buf;
  if (setjmp(buf))
    return x;
  x = 13;
  longjmp(buf, 1);
}

LLVM immediately optimizes it to this:

#include <setjmp.h>
int main(int argc, char **argv) {
  jmp_buf buf;
  if (setjmp(buf))
    return 42; // 'x = 42' dominates 'return x', x does not escape, so setjmp cannot modify it, constant propagate
  // x = 13; // dead, no one uses x after this.
  longjmp(buf, 1); // noreturn, control ends here
}

setjmp and longjmp allow you to create control flow that the compiler cannot see. Compilers model them for the most part as simple library functions that can only modify memory that has previously escaped. Given that I can write simple setjmp/longjmp wrappers that hide them from the compiler, it is basically impossible to do otherwise.

Therefore, the C standard suggests that if you want to see updates to 'x' after setjmp and before longjmp, you use the 'volatile' type qualifier. This pins it in stack memory and ensures that optimizers will not forward stores to loads. This program will return 13 instead of 42:

#include <setjmp.h>
int main(int argc, char **argv) {
  volatile int x = 42;
  jmp_buf buf;
  if (setjmp(buf))
    return x;
  x = 13;
  longjmp(buf, 1);
}

gmhwxi

Jan 29, 2015, 2:21:29 PM
to ats-lan...@googlegroups.com
Thanks!

What should happen if x is not on the stack? Say you do


int *p;
p = malloc(sizeof(int));
*p = 42;
//
// replace x with *p from this point on
//

I noted that clang does the same after this change. Is this according to
the C standard?

Barry Schwartz

Jan 29, 2015, 3:12:20 PM
to ats-lan...@googlegroups.com
'Reid Kleckner' via ats-lang-users <ats-lan...@googlegroups.com> wrote:
> tmp502 is indeterminate *after* the longjmp. Consider this small example:
>
> #include <setjmp.h>
> int main(int argc, char **argv) {
> int x = 42;
> jmp_buf buf;
> if (setjmp(buf))
> return x;
> x = 13;
> longjmp(buf, 1);
> }

This certainly satisfies the criterion that the value of x be changed.

> LLVM immediately optimizes it to this:
>
> #include <setjmp.h>
> int main(int argc, char **argv) {
> jmp_buf buf;
> if (setjmp(buf))
> return 42; // 'x = 42' dominates 'return x', x does not escape, so
> setjmp cannot modify it, constant propagate
> // x = 13; // dead, no one uses x after this.
> longjmp(buf, 1); // noreturn, control ends here
> }

This looks valid to me, in that if you did not have the ‘x=13’
statement the function would always return the unchanged (and
therefore supposedly determinate) value 42.

I’d have to study tmp502 to see what is going on with that.


gmhwxi

Jan 29, 2015, 3:22:58 PM
to ats-lan...@googlegroups.com

My current concern is this one:

#include <stdlib.h>

#include <setjmp.h>
int main(int argc, char **argv) {

  int *p;
  p = malloc(sizeof(int));

  *p = 42;
  jmp_buf buf;
  if (setjmp(buf))
    return *p;
  *p = 13;
  longjmp(buf, 1);
}

Try

clang abcde.c; ./a.out; echo $? // should see 13
clang -O2 abcde.c; ./a.out; echo $? // should see 42

My current understanding is that this is a BUG.

Raoul Duke

Jan 29, 2015, 3:39:47 PM
to ats-lang-users
All I can (re) learn from this is how much C sucks? If it is this hard
to figure out if something is a bug or not... Sheesh. :-}

gmhwxi

Jan 29, 2015, 4:24:48 PM
to ats-lan...@googlegroups.com

But this is all due to C being a REAL programming language :)

Greg Fitzgerald

Jan 29, 2015, 4:32:04 PM
to ats-lan...@googlegroups.com
http://xkcd.com/1312/

Barry Schwartz

Jan 29, 2015, 4:59:31 PM
to ats-lan...@googlegroups.com
gmhwxi <gmh...@gmail.com> wrote:
> My current concern is this one:
>
> #include <stdlib.h>
> #include <setjmp.h>
> int main(int argc, char **argv) {
> int *p;
> p = malloc(sizeof(int));
> *p = 42;
> jmp_buf buf;
> if (setjmp(buf))
> return *p;
> *p = 13;
> longjmp(buf, 1);
> }
>
> Try
>
> clang abcde.c; ./a.out; echo $? // should see 13
> clang -O2 abcde.c; ./a.out; echo $? // should see 42
>
> My current understanding is that this is a BUG.

That looks like a bug to me, too.

All accessible objects have values, and all other components of
the abstract machine have state, as of the time the longjmp
function was called, except that the values of objects of
automatic storage duration that are local to the function
containing the invocation of the corresponding setjmp macro that
do not have volatile-qualified type and have been changed between
the setjmp invocation and longjmp call are indeterminate.

The value of the object in automatic storage has not changed;
therefore the value should be determinate; also determinate should be
the value at storage to which the auto variable points, _as of the
time the longjmp function was called_.

Greg Fitzgerald

Jan 29, 2015, 5:00:15 PM
to ats-lan...@googlegroups.com
So I had a quick chat on the llvm irc channel about this. The general
mood is that setjmp/longjmp is a dirty hack for people trying to get C
to do things that it's not meant for - exception handling. The
'volatile' solution is a hack on hack - a convenient escape hatch that
already existed in the language and could be used by those few people
in the world attempting to hack exception handling into C. While it
sounds like we have a solution, have you considered targeting C++ or
LLVM bitcode instead of C?

Regarding your malloc() example. Since that memory is non-volatile
and the C spec gives itself the space to treat setjmp() as a
subroutine, that constant of 42 would still be propagated to "return
*p" and make it "return 42". The malloc() would probably stick around
even though it appears dead. That's because someone may override
malloc() and add side-effects. If you did whole-program optimization,
I'd think the malloc() would get the boot and your program would be
reduced to:

#include <setjmp.h>
int main(int argc, char **argv) {
  jmp_buf buf;
  if (setjmp(buf))
    return 42;
  longjmp(buf, 1);
}


-Greg


Barry Schwartz

Jan 29, 2015, 5:05:37 PM
to ats-lan...@googlegroups.com
Raoul Duke <rao...@gmail.com> wrote:
> All I can (re) learn from this is how much C sucks? If it is this hard
> to figure out if something is a bug or not... Sheesh. :-}

IMO C is basically a high-level _assembly_ language, primarily for
PDP-11 but implemented for ‘everything’. That’s why it has these
pointers and all the trouble associated with them. The notations of C
correspond largely to PDP-11 instructions.

This is not entirely a bad thing. The main alternative would be for
ATS to be able to generate a zillion different assembly languages!

Shea Levy

Jan 29, 2015, 5:06:39 PM
to ats-lan...@googlegroups.com
All that about it being a hack is true, but the standard is pretty clear nonetheless that the heap example should work.

From the perspective of ATS, I guess the solution is to avoid relying on changes to values between setjmp and longjmp wherever possible. From an llvm perspective they should either stop doing constant propagation of this form (or perhaps there's a less invasive fix?) or openly acknowledge divergence from the standard.
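
One way to avoid relying on values changed between setjmp and longjmp is to carry the information in the value that longjmp hands back to setjmp. A minimal sketch, with invented names (env, check, classify and the ERR_ constants); it is not taken from the ATS runtime:

```c
#include <setjmp.h>

enum { ERR_NONE = 0, ERR_NEGATIVE = 1, ERR_TOO_LARGE = 2 };

static jmp_buf env;

/* Hypothetical validator that "throws" by jumping back to env. */
static void check(int n) {
    if (n < 0)
        longjmp(env, ERR_NEGATIVE);
    if (n > 100)
        longjmp(env, ERR_TOO_LARGE);
}

/* Returns ERR_NONE if n passes the check, or the error code otherwise.
   No local variable is modified between setjmp and longjmp. */
int classify(int n) {
    switch (setjmp(env)) {
    case 0:
        check(n);
        return ERR_NONE;
    case ERR_NEGATIVE:
        return ERR_NEGATIVE;
    case ERR_TOO_LARGE:
        return ERR_TOO_LARGE;
    default:
        return -1;
    }
}
```

Using setjmp directly as the controlling expression of a switch is one of the contexts the C standard explicitly permits, and the error code travels through the jump itself rather than through a local.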

Barry Schwartz

Jan 29, 2015, 5:09:04 PM
to ats-lan...@googlegroups.com
gmhwxi <gmh...@gmail.com> wrote:
> But this is all due to C being a *REAL* programming language :)
>
> On Thursday, January 29, 2015 at 3:39:47 PM UTC-5, Raoul Duke wrote:
> >
> > All I can (re) learn from this is how much C sucks? If it is this hard
> > to figure out if something is a bug or not... Sheesh. :-}

I could be convinced otherwise, but it looks to me as if LLVM is being
too aggressive to optimize in the presence of pointers.

Greg Fitzgerald

Jan 29, 2015, 5:35:04 PM
to ats-lan...@googlegroups.com
On Thu, Jan 29, 2015 at 2:08 PM, Barry Schwartz
<chemoe...@chemoelectric.org> wrote:
> I could be convinced otherwise, but it looks to me as if LLVM is being
> too aggressive to optimize in the presence of pointers.

It's an interesting case. It's not the malloc() to look at, but the
code just after:

*p = 42;
return *p;

The compiler can still respect memory by changing it to:

*p = 42;
return 42;

That is, setjmp() won't begin until 42 is flushed to memory. However,
the value 42 is very much local. I think constants fit comfortably
into the definition "automatic storage duration that are local to the
function". You get the same bug using:

*p = argc;
return *p;

argc is inlined into the return statement.

-Greg

gmhwxi

Jan 29, 2015, 6:30:52 PM
to ats-lan...@googlegroups.com

I don't feel that it should be interpreted this way:


""""
All accessible objects have values, and all other components of the abstract
machine have state, as of the time the longjmp function was called, except that the
values of objects of automatic storage duration that are local to the function containing
the invocation of the corresponding setjmp macro that do not have volatile-qualified type
and have been changed between the setjmp invocation and longjmp call are
indeterminate.
""""

A constant cannot be changed. This paragraph clearly talks about left-values (objects).
The purpose of this paragraph is very clear: It tries to shift the responsibility to the programmer
in case of an "over-optimization". There is really no other way to do it.

By the way, "indeterminate" is not completely indeterminate. Essentially, it means that
the compiler may use the value available at the point where longjmp starts (not optimizing)
or the value saved at the point where setjmp happens (optimizing).

gmhwxi

Jan 29, 2015, 6:46:58 PM
to ats-lan...@googlegroups.com
A Haskell program essentially executes in two stages:

Stage 1: The program executes to generate another program (of the type IO(...))
Stage 2: The latter program gets executed by something like runIO: running the IO monad.

When people say that a Haskell program is pure, they mean that Stage 1 is pure.
Stage 2 cannot be pure and is not supposed to be pure.

I think of Stage 1 as a "philosopher" and Stage 2 as a "plumber". A language being "real" means,
in my vocabulary, a language that can do the work of a "plumber" :)

Raoul Duke

Jan 29, 2015, 6:48:00 PM
to ats-lang-users
http://conal.net/blog/posts/the-c-language-is-purely-functional

Barry Schwartz

Jan 29, 2015, 7:01:38 PM
to ats-lan...@googlegroups.com
Greg Fitzgerald <gar...@gmail.com> writes:
> I think constants fit comfortably
> into the definition "automatic storage duration that are local to the
> function".

I think it’s probably meant to correspond to the meaning of the
keyword ‘auto’, and thus would _not_ include literal numerals:

Section 6.2.4:

An object whose identifier is declared with no linkage and without
the storage-class specifier static has automatic storage duration,
as do some compound literals. The result of attempting to
indirectly access an object with automatic storage duration from a
thread other than the one with which the object is associated is
implementation-defined.

The terminology seems never to be clearly defined (the index points to
the above), but I think the analogy to the ‘auto’ keyword is a good
guide.

Barry Schwartz

Jan 29, 2015, 7:06:24 PM
to ats-lan...@googlegroups.com
I think we are seeing something people have known for a long while:
that C was not designed with automatic optimization in mind, and it
shows.

:)

gmhwxi

Jan 29, 2015, 7:40:41 PM
to ats-lan...@googlegroups.com

Conal Elliott :)

He and I had the same PhD adviser.

I haven't seen him for at least 12 years. The last time I met him,
he was giving a talk on FRP in Haskell. During the talk, he gave a demo
involving animation. I still remember this vividly: the animation stopped
somewhere in the middle because GC just kicked in :)

On Thursday, January 29, 2015 at 6:48:00 PM UTC-5, Raoul Duke wrote:
http://conal.net/blog/posts/the-c-language-is-purely-functional

gmhwxi

Jan 29, 2015, 7:44:40 PM
to ats-lan...@googlegroups.com
Should have added this as well:

While it was an amusing moment, it did force me to think about
how functional programming can be implemented without GC support.

Raoul Duke

Jan 29, 2015, 8:06:01 PM
to ats-lang-users
> While it was an amusing moment, it did force me to think about
> how functional programming can be implemented without GC support.

regions? reference counting?

gmhwxi

Jan 29, 2015, 8:36:09 PM
to ats-lan...@googlegroups.com

Techniques like regions and reference counting sound too much like
engineering. I am a logician by training :)

See:

http://www.sciencedirect.com/science/article/pii/0304397588901004

I could not find a free copy of the above paper.

Raoul Duke

Jan 29, 2015, 8:54:53 PM
to ats-lang-users
Linear logic as in Concurrent Clean style Uniqueness Types?

Raoul Duke

Jan 29, 2015, 8:56:01 PM
to ats-lang-users

Raoul Duke

Jan 29, 2015, 8:58:54 PM
to ats-lang-users
> Linear logic as in Concurrent Clean style Uniqueness Types?

"If a uniquely typed argument is not returned again in the function
result it has become garbage (the reference has dropped to zero). Due
to the fact that uniqueness typing is static the object can be garbage
collected (see Chapter 2) at compile time."

?

Hongwei Xi

Jan 29, 2015, 9:43:33 PM
to ats-lan...@googlegroups.com
Clean finds a way to use linear logic.
Essentially, Clean uses linear logic to do reference counting at compile-time.
IMO, this is a very superficial way to use linear logic.

In physics, if a theory is developed, it should be tested in reality (the physical
world). To me, a "good" linear type theory should be able to explain how C works.
I see C as the "reality" testbed for a type theory. In this regard, C is very special.
Let me give a comparison. Haskell certainly has a sophisticated, well-developed type
theory. But this type theory only explains how things work in the abstract computational
model of Haskell (Stage 1).





Raoul Duke

Jan 30, 2015, 2:36:48 AM
to ats-lang-users
If the compiler could insert the malloc's and free's that a human does
in C, then that seems to just end up as reference counting? As much as
I like GC in some ways, I don't utterly dislike e.g. ARC.
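
For reference, a hand-written version of reference counting in C might look like the sketch below. The names `rc_box`, `rc_new`, `rc_retain`, `rc_release` and `rc_live` are hypothetical, not taken from ATS or any library; the point is only to show where the `malloc` and `free` calls would end up if a compiler inserted them.

```c
#include <assert.h>
#include <stdlib.h>

/* A heap cell that carries its own reference count. */
typedef struct {
    int refs;
    int value;
} rc_box;

static int live_boxes = 0;  /* bookkeeping for the demo only */

/* Allocate a box holding 'value' with one owner; returns NULL on failure. */
rc_box *rc_new(int value)
{
    rc_box *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->refs = 1;
    b->value = value;
    live_boxes++;
    return b;
}

/* Register one more owner of the box. */
rc_box *rc_retain(rc_box *b)
{
    assert(b != NULL && b->refs > 0);
    b->refs++;
    return b;
}

/* Drop one owner; the last release frees the storage. */
void rc_release(rc_box *b)
{
    assert(b != NULL && b->refs > 0);
    if (--b->refs == 0) {
        live_boxes--;
        free(b);
    }
}

/* Number of boxes allocated and not yet freed. */
int rc_live(void)
{
    return live_boxes;
}
```

Every `rc_retain` has to be matched by an `rc_release`; ARC-style schemes have the compiler emit those calls instead of the programmer.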

gmhwxi

Jan 30, 2015, 2:32:04 PM
to ats-lan...@googlegroups.com
I will start another thread on this. It has been buried too deep!

Yannick Duchêne

Jan 31, 2015, 4:32:51 AM
to ats-lan...@googlegroups.com


On Tuesday, January 27, 2015 at 7:44:26 PM UTC+1, gmhwxi wrote:
But asking programmers to write optimized on their own would be a lot worse!

I think it is time to ask/find someone to use CompCert to compile ATS2 :)

Hi! :D

The [documentation](http://compcert.inria.fr/man/manual.pdf) says

-- begin of quote --
1.4.2 The supported C dialect
Chapter 5 specifies the dialect of the C language that CompCert C accepts as input language. In summary,
CompCert C supports all of ISO C 99 [4], with the following exceptions:
   • switch statements must be structured as in MISRA-C [1]; unstructured switch, as in Duff’s device,
is not supported.
   • Variable-length arrays are not supported.
   • longjmp and setjmp are not guaranteed to work.
-- end of quote --

So unfortunately, this may not solve the setjmp/longjmp issue.
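
For readers who have not seen it, the "unstructured switch" excluded in the quoted list looks roughly like this in standard C. The function name `duff_copy` is made up, and this is only an illustration of the construct: per the quoted manual CompCert C would not accept it, while other C99 compilers should.

```c
#include <assert.h>
#include <stddef.h>

/* Duff's device: the case labels jump into the middle of the do-while
   body, so the switch is not "structured" in the MISRA-C sense. */
void duff_copy(char *to, const char *from, size_t count)
{
    assert(to != NULL && from != NULL);
    if (count == 0)
        return;

    size_t n = (count + 7) / 8;  /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Code generated from ATS that avoids this pattern and variable-length arrays would not be affected by the first two exceptions.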

Also, `./configure` seems to recognize the `CC` environment variable passed to it, but does not propagate it. I had to manually edit the Makefile to set `CCOMP=compcert` in three places. Note: the CompCert command's name is `ccomp`, but since that name is already used throughout the ATS build's source directory, I created a symbolic link named `compcert` pointing to `ccomp` (to avoid ambiguity).

Actually, it complains about an unrecognized option `-O2`, so I guess there are other files to be manually edited that `./configure` does not handle.

I'll continue later, and if I succeed in building Postiats 0.1.8 with CompCert, I will report in a thread (and explain how I did it).

Brandon Barker

Jan 31, 2015, 7:34:49 AM
to ats-lang-users
This definitely sounds like a good and necessary companion compiler to ATS. 

It looks like the compiler is partly Coq and partly OCaml, but mostly Coq.


Brandon Barker

Jan 31, 2015, 9:16:07 PM
to ats-lang-users
On Friday, I happened across Plauger's book, which I'm now tempted to buy (didn't know about it previously). I did find the following relevant quotes ... :)






--
Brandon Barker
brandon...@gmail.com

Yannick Duchêne

Feb 1, 2015, 6:26:05 AM
to ats-lan...@googlegroups.com


On Sunday, February 1, 2015 at 3:16:07 AM UTC+1, Brandon Barker wrote:
On Friday, I happened across Plauger's book, which I'm now tempted to buy (didn't know about it previously). I did find the following relevant quotes ... :)




Indeed, the CompCert manual says the same. In addition to the previous quote, here is another one, on page 40:

“ **The CompCert C compiler has no special knowledge of the setjmp and longjmp functions, treating
them as ordinary functions that respect control flow.** It is therefore not advised to use these two functions
in CompCert-compiled code. To prevent misoptimisation, it is crucial that all local variables that
are live across an invocation of setjmp be declared with volatile modifier. ”
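
A minimal C sketch of the advice in that quote, assuming nothing beyond the standard `<setjmp.h>` interface (the names `env`, `bail_out` and `count_before_jump` are made up for illustration). The local that is modified after `setjmp` and read again after `longjmp` is declared `volatile`, so its value is well defined on the second return:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;  /* saved context; hypothetical name */

/* Jump back to the setjmp call that filled 'env'. */
static void bail_out(void)
{
    longjmp(env, 1);
}

/* Counts up to 'limit', then leaves via longjmp.
   'done' is live across setjmp and changes before the longjmp,
   so it is declared volatile, as the CompCert manual advises. */
int count_before_jump(int limit)
{
    volatile int done = 0;

    assert(limit >= 0);
    if (setjmp(env) != 0)
        return done;        /* second return: reached through longjmp */

    while (done < limit)
        done = done + 1;
    bail_out();
    return -1;              /* not reached */
}
```

Without `volatile`, the C standard leaves the value of `done` indeterminate after the `longjmp`, which is the kind of thing that tends to show up only at higher optimization levels.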
 