[llvm-dev] Ambiguity in the nofree function attribute

Philip Reames via llvm-dev

unread,

Apr 9, 2021, 3:05:10 PM4/9/21

to llvm...@lists.llvm.org, Artur Pilipenko

I've stumbled across a case related to the nofree attribute where we seem to have inconsistent interpretations of the attribute semantic in tree. I'd like some input from others as to what the "right" semantic should be.

The basic question is does the presence of nofree prevent the callee from allocating and freeing memory entirely within it's dynamic scope? At first, it seems obvious that it does, but that turns out to be a bit inconsistent with other attributes and leads to some surprising results.

For reference in the following discussion, here is the current wording for the nofree function attribute in LangRef:

"This function attribute indicates that the function does not, directly or indirectly, call a memory-deallocation function (free, for example). As a result, uncaptured pointers that are known to be dereferenceable prior to a call to a function with the nofree attribute are still known to be dereferenceable after the call (the capturing condition is necessary in environments where the function might communicate the pointer to another thread which then deallocates the memory)."

For discussion purposes, please assume the concurrency case has been separately proven. That's not the point I'm getting at here.

The two possible semantics as I see them are:

Option 1 - nofree implies no call to free, period

This is the one that to me seems most consistent with the current wording, but it prevents the callee from allocating storage and freeing it entirely within it's scope. This is, for instance, a reasonable thing a target might want to do when lowering large allocs. This requires transforms to be careful in stripping the attribute, but isn't entirely horrible.

The more surprising bit is that it means we can not infer nofree from readonly or readnone. Why? Because both are specified only in terms of memory effects visible to the caller. As a result, a readnone function can allocate storage, write to it, and still be readonly. Our current inference rules for readnone and readonly do exploit this flexibility.

The optimizer does currently assume that readonly implies nofree. (See the accessor on Function) Removing this substantially weakens our ability to infer nofree when faced with a function declaration which hasn't been explicitly annotated for nofree. We can get most of this back by adding appropriate annotations to intrinsics, but not all.

Option 2 - nofree applies to memory visible to the caller

In this case, we'd add wording to the nofree definition analogous to that in the readonly/readnone specification. (There's a subtlety about the precise definition of visible here, but for the moment, let's hand wave in the same way we do for the other attributes.)

This allows us to infer nofree from readonly, but essentially cripples our ability to drive transformations within an annotated function. We'd have to restrict all transforms and inference to cases where we can prove that the object being affected is visible to the caller.

The benefit is that this makes it slightly easier to infer nofree in some cases. The main impact of this is improving ability to reason about dereferenceability for uncaptured objects over calls to functions for which we inferred nofree.

The downside of this is that we essentially loose all ability to reason about nofree in a context free manner. For a specific example of the impact of this, it means we can't infer dereferenceability for an object allocated in F, and returned (e.g. not freed), in the scope of F.

This breaks hoisting and vectorization improvements (e.g. unconditional loads instead of predicated ones) I've checked in over the last few months, and makes the ongoing deref redefinition work substantially harder. https://reviews.llvm.org/D100141 shows what this looks like code wise.

My Take

At first, I was strongly convinced that option 1 was the right choice. So much so in fact that I nearly didn't bother to post this question. However, after giving it more thought, I've come to distrust my own response a bit. I definitely have a conflict of interest here. Option 2 requires me to effectively cripple several recent optimizer enhancements, and maybe even revert some code which becomes effectively useless. It also makes a project I'm currently working on (deref redef) substantially harder.

On the other hand, the inconsistency with readonly and readnone is surprising. I can see an argument for that being the right overall approach long term.

So essentially, this email is me asking for a sanity check. Do folks think option 1 is the right option? Or am I forcing it to be the right option because it makes things easier for me?

Philip

Nicolai Hähnle via llvm-dev

unread,

Apr 10, 2021, 5:43:10 AM4/10/21

to Philip Reames, llvm...@lists.llvm.org, Artur Pilipenko

Hi Philip,

could you explain the downsides of Option 2 more, perhaps with examples? They seem pretty non-obvious to me at a first glance.

Naively, I'd argue that programming language semantics tend to always be understood under "as-if" rules, which seems to imply that Option 2 is the right one. If a callee allocates and immediately frees memory, how can the caller even tell?

Cheers,

Nicolai

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--

Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Johannes Doerfert via llvm-dev

unread,

Apr 11, 2021, 12:48:22 PM4/11/21

to Nicolai Hähnle, Philip Reames, llvm...@lists.llvm.org, Artur Pilipenko

I generally agree with Nicolai and I feel I'm missing information
here. Comments inlined.

>> *Option 1* - nofree implies no call to free, period

>>
>> This is the one that to me seems most consistent with the current wording,
>> but it prevents the callee from allocating storage and freeing it entirely
>> within it's scope. This is, for instance, a reasonable thing a target
>> might want to do when lowering large allocs. This requires transforms to
>> be careful in stripping the attribute, but isn't entirely horrible.
>>
>> The more surprising bit is that it means we can not infer nofree from
>> readonly or readnone. Why? Because both are specified only in terms of
>> memory effects visible to the caller. As a result, a readnone function can
>> allocate storage, write to it, and still be readonly. Our current
>> inference rules for readnone and readonly do exploit this flexibility.

I thought this was not a big problem as it seems it only comes into play if we have user annotated
readonly/none declarations. If so, we should expose a way to attach other attributes, like nofree, to
the user as well. If not, could you elaborate in what other situations this occurs?
(That said, the Attributor can derive readonly/readnone even if there are memory operations inside a function
as long as it looks from the outside as-if it was readonly/readnone. While we don't do that for nofree and
friends, we could.)

>> The optimizer does currently assume that readonly implies nofree. (See
>> the accessor on Function) Removing this substantially weakens our ability
>> to infer nofree when faced with a function declaration which hasn't been
>> explicitly annotated for nofree. We can get most of this back by adding
>> appropriate annotations to intrinsics, but not all.
>>

>> *Option 2* - nofree applies to memory visible to the caller

>>
>> In this case, we'd add wording to the nofree definition analogous to that
>> in the readonly/readnone specification. (There's a subtlety about the
>> precise definition of visible here, but for the moment, let's hand wave in
>> the same way we do for the other attributes.)
>>
>> This allows us to infer nofree from readonly, but essentially cripples our
>> ability to drive transformations within an annotated function. We'd have
>> to restrict all transforms and inference to cases where we can prove that
>> the object being affected is visible to the caller.
>>
>> The benefit is that this makes it slightly easier to infer nofree in some
>> cases. The main impact of this is improving ability to reason about
>> dereferenceability for uncaptured objects over calls to functions for which
>> we inferred nofree.
>>
>> The downside of this is that we essentially loose all ability to reason
>> about nofree in a context free manner. For a specific example of the
>> impact of this, it means we can't infer dereferenceability for an object
>> allocated in F, and returned (e.g. not freed), in the scope of F.

I don't understand. Do you mean:
```
void* F(void *A, void *B) {
void *ptr = malloc(8);
free(A);
unknown();
return ptr;
}
```
If so, why could we not determine that `ptr` is deref(8) at the return.
We can also attach `nofree` to B. Both seem to be possible even if F
is not `nofree`. What am I missing?

~ Johannes

>>
>> This breaks hoisting and vectorization improvements (e.g. unconditional
>> loads instead of predicated ones) I've checked in over the last few months,
>> and makes the ongoing deref redefinition work substantially harder.
>> https://reviews.llvm.org/D100141 shows what this looks like code wise.
>>

>> *My Take*

Nuno Lopes via llvm-dev

unread,

Apr 12, 2021, 1:58:48 PM4/12/21

to Philip Reames, llvm...@lists.llvm.org, Artur Pilipenko

I like option 2. I agree that allowing functions to allocate & deallocate memory is useful.

Option 1 is super hard to infer. Plus it necessarily hits fewer cases, as the whole call-graph would need to consist of nofree calls. Option 2 doesn’t have such requirement.

Nofree is most useful for callers to know that the dereferenceability of any pointer they have is kept across the call. In general, function attributes are there to help callers. Otherwise they would be at most a cache for analyses that you can do locally (ok, plus information that frontends can give, but clang can’t give you nofree I guess).

I quickly scanned the test cases affected by your patch, and those seem to be “easily” recoverable. Those functions don’t have any call to free nor are those pointers passed to other non-nofree calls, so you can assume that any dereferenceable argument remains so throughout the whole function. Requires a bit more work, but doable.

Nuno

Philip Reames via llvm-dev

unread,

Apr 13, 2021, 5:33:59 PM4/13/21

to Johannes Doerfert, Nicolai Hähnle, llvm...@lists.llvm.org, Artur Pilipenko

Replying to this to explain points raised, but I think I'm convinced
option #2 is the overall right one. See my top level response to follow
in near future.

Philip

This is exactly the case I am mentioning. If we adopt option #1, we can
not infer nofree in a function which allocates memory, writes to it, and
then frees it. If the function did nothing else, we could infer
readonly. As a result, inferring nofree from the presence of readonly
is unsound because one uses the "as if" framing, and one does not.

(I also expand on this example below.)

The example I was referring to is the following:

void* g() {
void *A = malloc(8);
external(A);
for (int i = 0; i < 10000; i++) {
if (!C)
sum += *(int*)A;
}
return A;
}

If we assume we know that g is nofree - because the frontend told us say
- what does that tell us? Under option #1, we can hoist the load and
annotate the return with deref(8). Under option #2, we can not because
external might call free(A) internally. Option #1 disallows this;
Option #2 does not.

To help further explain, consider the following toy example:

void f() nofree {

void *A = malloc(8);
if (C) free(A);
for (int i = 0; i < 10000; i++) {
if (!C)
sum += *(int*)A;
}
if (!C) free(A);
}

Under option 2, this function is nofree because the only memory freed is
allocated within the function. Under option 1 it would not be inferred
as such, and if annotated manually would be full UB since we free along
all paths after saying we didn't.

With option 2, we can not use context insensitive reasoning to hoist the
load from A outside the loop. Today, for a function annotated nofree,
we would assume that A can't be freed and thus it's safe to hoist the
load above the loop. That's the ability we loose by using option #2
over option #1.

That flexibility is quite important optimization wise when we have
something like:

void f() nofree {
void *A = malloc(8);
unknown(A);
for (int i = 0; i < 10000; i++) {
if (!C)
sum += *(int*)A;
}
unknown2(A);
}

It is worth acknowledging that we *might* be able to prove nofree for
unknown and unknown2 depending on their bodies.

One important point to highlight that might be causing confusion is that
the source of the nofree attributes in these examples can be external to
LLVM. They do not have to be inferred, and might be provided by a
language level rule.

Hm, having just said that, I think I just shot my own argument in the
foot. None of the use cases we originally identified for the deref
stuff actually provide this semantic for a function externally. (Well,
the GC semantics does, but we'd also found another way to achieve that
without needing the attribute to have that semantic.)

>
> I don't understand. Do you mean:
> ```
> void* F(void *A, void *B) {
> void *ptr = malloc(8);
> free(A);
> unknown();
> return ptr;
> }
> ```
> If so, why could we not determine that `ptr` is deref(8) at the return.

True, but only by using context sensitive capture tracking on uses of
ptr. If ptr escaped before the unknown call, we would *not* be able to
mark the return. That is the lost context insensitive reasoning I'm
describing. If you tweak your example to remove the free and replace it
with an unknown external call, option #1 allows you to trivially prove
your fact of interest, and option #2 does not.

Philip Reames via llvm-dev

unread,

Apr 13, 2021, 5:38:15 PM4/13/21

to Nuno Lopes, llvm...@lists.llvm.org, Artur Pilipenko

On 4/12/21 10:58 AM, Nuno Lopes wrote:

I like option 2. I agree that allowing functions to allocate & deallocate memory is useful.

Option 1 is super hard to infer. Plus it necessarily hits fewer cases, as the whole call-graph would need to consist of nofree calls. Option 2 doesn’t have such requirement.

I think the consensus seems to be in favor of option #1.

Nofree is most useful for callers to know that the dereferenceability of any pointer they have is kept across the call. In general, function attributes are there to help callers. Otherwise they would be at most a cache for analyses that you can do locally (ok, plus information that frontends can give, but clang can’t give you nofree I guess).

It's that last clause that's the entire difference between the two approaches. :)

I quickly scanned the test cases affected by your patch, and those seem to be “easily” recoverable. Those functions don’t have any call to free nor are those pointers passed to other non-nofree calls, so you can assume that any dereferenceable argument remains so throughout the whole function. Requires a bit more work, but doable.

Two points in response to your last paragraph:

1) I think you're over analyzing deliberately reduced minimal test case for the current logic and assuming from that the original examples look exactly like the reduced cases.

2) I do think the need for context sensitive logic is a major increase in difficulty over context insensitive. Your point about being able to special case a context insensitive analysis when all callees are nofree is a good one, and probably worth implementing.

Philip Reames via llvm-dev

unread,

Apr 13, 2021, 5:43:58 PM4/13/21

to llvm...@lists.llvm.org, Artur Pilipenko

Replying to self to summarize takeaways.

The consensus in responses seems to be strongly in favor of option #2. My main hesitation with option #2 has always been the lost ability for a frontend to provide the stronger scoped fact, but in the process of writing a detailed response to Johannes downthread, I realized we don't have a motivating example frontend which actually benefits from said ability. Given that, it seems like option #2 wins over option #1.

It seems stopping to ask for a sanity check was well warranted in this case. :)

I will go ahead and cleanup https://reviews.llvm.org/D100141 into a real patch. I will give it a couple of days (concurrent with review) to give anyone who might disagree with this decision a chance to see this thread and chime in.

Philip

Reply all

Reply to author

Forward