Secondly, have you looked into a volatile store / load to an alloca?
That should work with PNaCl and WebAssembly.
E.g.
define i32 @blackbox(i32 %arg) {
entry:
  %p = alloca i32
  store volatile i32 10, i32* %p  ; or store %arg
  %v = load volatile i32, i32* %p
  ret i32 %v
}
-- Sanjoy
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Why does this need to be an intrinsic (as opposed to a generic "unknown function" to llvm)?
> Secondly, have you looked into a volatile store / load to an alloca? That should work with PNaCl and WebAssembly.
> E.g.
> define i32 @blackbox(i32 %arg) {
> entry:
>   %p = alloca i32
>   store volatile i32 10, i32* %p  ; or store %arg
>   %v = load volatile i32, i32* %p
>   ret i32 %v
> }
This would not prevent dead code elimination from removing it. The
intrinsic would need to have some sort of a side-effect in order to be
preserved in all cases. Are you concerned about cases where the user of
the intrinsic is dead?
-Krzysztof
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
I'm very unclear on why you think a generic black box intrinsic will have any different performance impact ;-)

I'm also unclear on what the goal with this intrinsic is. I understand the symptoms you are trying to solve - what exactly is the disease? IE you say "I'd like to propose a new intrinsic for use in preventing optimizations from deleting IR due to constant propagation, dead code elimination, etc." But why are you trying to achieve this goal?
Benchmarks that can be const prop'd/etc away are often meaningless.
Past that, if you want to ensure a particular optimization does a particular thing on a benchmark, ISTM it would be better to generate the IR, run opt (or build your own pass-by-pass harness), and then run "the passes you want on it" instead of "trying to stop certain passes from doing things to it".
To add on to what Danny and Krzysztof have said, this proposal doesn’t make a lot of sense to me. You want this intrinsic to inhibit (some) optimizations, but you simultaneously want it not to have a performance impact. Those are contradictory goals. Worse, the proposal doesn’t specify what optimizations should/should not be allowed for this intrinsic, since apparently you want at least some applied. Is CSE allowed? DCE? PRE?
On Mon, Nov 2, 2015 at 9:16 PM, Daniel Berlin <dbe...@dberlin.org> wrote:
> I'm very unclear on why you think a generic black box intrinsic will have any different performance impact ;-)
> I'm also unclear on what the goal with this intrinsic is. I understand the symptoms you are trying to solve - what exactly is the disease? IE you say "I'd like to propose a new intrinsic for use in preventing optimizations from deleting IR due to constant propagation, dead code elimination, etc." But why are you trying to achieve this goal?

It's a cleaner design than current solutions (as far as I'm aware).

> Benchmarks that can be const prop'd/etc away are often meaningless.

A benchmark that's completely removed is even more meaningless, and the developer may not even know it's happening. I'm not saying this intrinsic will make all benchmarks meaningful (and I can't), I'm saying that it would be useful in Rust in ensuring that tests/benches aren't invalidated simply because a computation wasn't performed.

> Past that, if you want to ensure a particular optimization does a particular thing on a benchmark, ISTM it would be better to generate the IR, run opt (or build your own pass-by-pass harness), and then run "the passes you want on it" instead of "trying to stop certain passes from doing things to it".

True, but why would you want to force that speed bump onto other developers? I'd argue that's more hacky than the inline asm.
I wonder the same. Richard, maybe we just need something more specific
in Rust? Like something that only clobbers memory? Or just the
variable? Seems like we could do with specialized `black_box`
functions. AIUI our `black_box` got extended to prevent more
optimizations as it became obvious that the compiler could still
"defeat" it. Maybe we need to take a step back and say "ok, we'll have
to think a bit harder and decide how hard we'll be on the optimizer"?
Is there anything that speaks against that and requires an intrinsic?
Cheers,
Björn
I don't see how this is any different from volatile markers on loads/stores or memory barriers or several other optimizer blocking devices. They generally end up crippling the optimizers without much added benefit.
Would it be possible to stop the code motion you want to block by explicitly exposing data dependencies? Or simply disabling some optimizations with pragmas?
On Tue, Nov 3, 2015 at 12:29 PM, Richard Diamond <wic...@vitalitystudios.com> wrote:
> It's a cleaner design than current solutions (as far as I'm aware).

For what, exact, well-defined goal? Trying to make certain specific optimizations not work does not seem like a goal unto itself. It's a thing you are doing to achieve something else, right? (Because if not, it has a very well defined and well supported solution - set up a pass manager that runs the passes you want.) What is the something else? IE what is the problem that led you to consider this solution?

> A benchmark that's completely removed is even more meaningless, and the developer may not even know it's happening.

Write good benchmarks? No, seriously, i mean, you want benchmarks that test what users will see when the compiler works, not benchmarks that test what users would see if they were to suddenly turn off parts of the optimizers ;)

> True, but why would you want to force that speed bump onto other developers? I'd argue that's more hacky than the inline asm.

Speed bump? Hacky? It's a completely normal test harness? That's, in fact, why llvm uses it as a test harness?
The common use case I've seen for a black box like construct is when writing microbenchmarks. In particular, you're generally looking for a way to "sink" the result of a computation without having that sink outweigh the cost of the thing you're trying to measure.
Common alternate approaches are to use a volatile store (so that it can't be eliminated or sunk out of loops) or a call to an external function with a cheap calling convention.
As an example:
int a = 5; // initialization is not visible to compiler
int b = 7;
void add_two_globals() {
  sink(a + b);
}
If what I'm looking into is the code generation around addition, this is a very useful way of testing the entire compiler - frontend, middle end, and backend.
I'll note that we use such a framework extensively.
What I'm not clear on is why this needs to be an intrinsic. Why does a call to an external function or a volatile store not suffice?
On Fri, Nov 06, 2015 at 10:31:23AM -0600, Richard Diamond via llvm-dev wrote:
> On Tue, Nov 3, 2015 at 2:50 PM, Diego Novillo <dnov...@google.com> wrote:
>
> > I don't see how this is any different from volatile markers on
> > loads/stores or memory barriers or several other optimizer blocking
> > devices. They generally end up crippling the optimizers without much added
> > benefit.
> >
>
> Volatile must touch memory (right?). Memory is slow.
No, it just must not be optimised away. The CPU is still free to cache
it.
Now, as I stated in the proposal, `test::black_box` currently uses no-op inline asm to "read" from its argument in a way the optimizations can't see. Conceptually, this seems like something that should be modelled in LLVM's IR rather than by hacks higher up the IR food chain, because the root problem is caused by LLVM's optimization passes (most of the time this code optimization is desired, just not here). Plus, it seems others have used other tricks to achieve similar effects (ie volatile), so why shouldn't there be something to model this behaviour?

> Write good benchmarks? No, seriously, i mean, you want benchmarks that test what users will see when the compiler works, not benchmarks that test what users would see if they were to suddenly turn off parts of the optimizers ;)

But users are also not testing how fast deterministic code which LLVM is completely removing can go. This intrinsic prevents LLVM from correctly thinking the code is deterministic (or that a value isn't used) so that measurements are (at the very least, the tiniest bit) meaningful.

> Speed bump? Hacky? It's a completely normal test harness? That's, in fact, why llvm uses it as a test harness?

I mean I wouldn't write a harness or some other type of workaround for something like this: Rust doesn't seem to be the first to have encountered this issue, so it seems nonsensical to require every project using LLVM to carry a separate harness or other workaround. LLVM's own documentation suggests that adding an intrinsic is the best choice moving forward anyway: "Adding an intrinsic function is far easier than adding an instruction, and is transparent to optimization passes. If your added functionality can be expressed as a function call, an intrinsic function is the method of choice for LLVM extension." (from http://llvm.org/docs/ExtendingLLVM.html). That sounds perfect to me.

At any rate, I apologize for my original hand-waviness; I am young and inexperienced.
#define PUBLISH_WRITES_TO_VAR(X) __asm__ __volatile__("":: "m"((X)))
#define OBSERVE_WRITES_TO_VAR(X) __asm__ __volatile__("": "=m"((X)))
Thank you,
Steven Stewart-Gallus
<snip>
I think the fundamental thing you're missing is that benchmarks are an
exercise in if/then:
*If* a user exercises this API, *then* how well would it perform?
Of course, in the case of a user, the data could come from anywhere, and
go anywhere - the terminal, a network socket, whatever.
However, in a benchmark, all the data comes from (and goes to) places the
compiler can see.
Thus, it's necessary to make the compiler _pretend_ the data came from
and goes to a "black box", in order for the benchmarks to even *remotely*
resemble what they're meant to test.
This is actually distinct from #1, #2, _and_ #3 above - quite simply,
what is needed is a way to simulate a "real usage" scenario without
actually contacting the external world.
One thing that volatile doesn't do is escape results that have been written to memory.
The proposed blackbox intrinsic is modeled as reading and writing any pointed to memory, which is useful.
I also think blackbox will be a lot easier for people to use than empty volatile inline asm and volatile loads and stores. That alone seems worth something. :)
Hi Richard,
why don't you use an inline assembly that returns your argument in a register ?
For example:
----
int foo(int a, int b)
{
int c=a+b+10;
__asm__ volatile ("":"=r"(c):"0"(c):"memory");
return c+20;
}
---
results in: (Note that the +10 and +20 were not combined)
---
foo: # @foo
.cfi_startproc
# BB#0:
leal 10(%rdi,%rsi), %eax
#APP
#NO_APP
addl $20, %eax
retq
.Lfunc_end0:
.size foo, .Lfunc_end0-foo
.cfi_endproc
--
At llvm-ir level, it looks like:
---
define i32 @foo(i32 %a, i32 %b) #0 {
%1 = add i32 %a, 10
%2 = add i32 %1, %b
%3 = tail call i32 asm sideeffect "", "=r,0,~{memory},~{dirflag},~{fpsr},~{flags}"(i32 %2) #1, !srcloc !1
%4 = add nsw i32 %3, 20
ret i32 %4
}
---
Greetings,
Jeroen Dobbelaere
From: llvm-dev [mailto:llvm-dev...@lists.llvm.org]
On Behalf Of Richard Diamond via llvm-dev
Sent: Tuesday, November 03, 2015 12:58 AM
To: llvm...@lists.llvm.org
Subject: [llvm-dev] [RFC] A new intrinsic, `llvm.blackbox`, to explicitly prevent constprop, die, etc optimizations
Hey all,
I'd like to propose a new intrinsic for use in preventing optimizations from deleting IR due to constant propagation, dead code elimination, etc.
# Background/Motivation
I think the idea is to model the intrinsic as a normal external function call:
- Can read/write escaped memory
- Escapes pointer args
- Functionattrs cannot infer anything about it
- Returns a pointer which may alias any escaped data
On Wed, Nov 11, 2015 at 10:32 AM, Reid Kleckner <r...@google.com> wrote:
> I think the idea is to model the intrinsic as a normal external function call:
> - Can read/write escaped memory
> - Escapes pointer args
> - Functionattrs cannot infer anything about it
> - Returns a pointer which may alias any escaped data

As you point out so nicely, there is already a list of stuff that external function calls may do, but we may be able to prove things about them anyway due to attributes, etc. So it's not just an external function call, it's a super-magic one.

Now, can we handle that? Sure. For example, i can move external function calls if i can prove things about their dependencies, and the above list is not sufficient to prevent me from moving (or PRE'ing) most of the blackbox calls that just take normal non-pointer args. Is that going to be okay? (Imagine, for example, LTO modes where i can guarantee i have the entire program, etc. You still want blackbox to be magically special in these modes, even though nothing else is.)
On Wed, Nov 11, 2015 at 10:41 AM, Daniel Berlin <dbe...@dberlin.org> wrote:
> So it's not just an external function call, it's a super-magic one.

Right, an external function, with a definition that the compiler will never find.

> Now, can we handle that? Sure. For example, i can move external function calls if i can prove things about their dependencies, and the above list is not sufficient to prevent me from moving (or PRE'ing) most of the blackbox calls that just take normal non-pointer args. Is that going to be okay? (Imagine, for example, LTO modes where i can guarantee i have the entire program, etc. You still want blackbox to be magically special in these modes, even though nothing else is.)

Sure, the compiler can reorder all the memory accesses to non-escaped memory as it sees fit across the barrier. That's part of the normal modelling of external calls. I don't know how you could CSE it, though. Any call you can't reason about can always use inline asm to talk to external devices or issue a write syscall. I don't know how you could practically deploy a super-duper LTO mode that doesn't allow that as part of its model.

The following CFG simplification would be legal, as it also fits the normal model of an external call:

  if (cond) y = llvm.blackbox(x)
  else      y = llvm.blackbox(x)
  -->
  y = llvm.blackbox(x)

I don't see how this is special. It just provides an overloaded intrinsic whose definition we promise to never reason about. Other than that it follows the same familiar rules that function calls do.
I do agree this is a concern.
>> I don't know how you could practically deploy a super-duper LTO mode
>> that doesn't allow that as part of its model.
>>
>>
> Sure.
>
>
>> The following CFG simplification would be legal, as it also fits the
>> normal model of an external call:
>> if (cond) y =llvm.blackbox(x)
>> else y = llvm.blackbox(x)
>> -->
>> y = llvm.blackbox(x)
>>
>> I don't see how this is special. It just provides an overloaded
>> intrinsic whose definition we promise to never reason about. Other than
>> that it follows the same familiar rules that function calls do.
>>
>>
> You have now removed some conditional evaluation and jumps. Those
> would normally take benchmark time.
> Why is that okay?
Because the original post in terms of wanting to inhibit specific
optimizations was a flawed way of describing the problem.
Reid's explanation of "an external function that LLVM is not allowed to
reason about the body of" is a much better explanation, as a good
benchmark will place llvm.blackbox() exactly where real code would call,
say, getrandom() (on input) or printf() (on output).
However, as the function call overhead of said external function isn't
part of the _developer's_ code, and not something they can make faster in
case of slow results, it's not relevant to the benchmarks - thus, using
an _actual_ external function is suboptimal, even leaving aside that with
LTO and such, llvm may STILL infer things about such functions, obviating
the benchmark.
Perhaps the best explanation is that it's about *simulating the
existence* of a "perfectly efficient" external world.
>>
>>
>> Reid's explanation of "an external function that LLVM is not allowed to
>> reason about the body of" is a much better explanation, as a good
>> benchmark will place llvm.blackbox() exactly where real code would
>> call,
>> say, getrandom() (on input) or printf() (on output).
>>
>>
>
>> However, as the function call overhead of said external function isn't
>> part of the _developer's_ code,
>
>
> This isn't call overhead though.
> It's a conditional and two calls someone wrote in some benchmark code.
> That's not call overhead ;-)
I meant the prologue/epilogue of the external function, but James'
response is relevant there.
> It's just that i've proven the condition has no side effects and doesn't
> matter, so i eliminated it.
Yes. That's perfectly fine. You could do the exact same thing with
getrandom()'s result, or printf() calls.
That was my point.
> Thus, I'm trying to ask the question: "Will the use case really still be
> served if we let us eliminate these conditionals as useless, when the
> whole point is to let people test the overhead of things the compiler
> wanted to eliminate because it thinks they are useless"
> ;-)
And my answer was "Yes, emphatically so, as you're continually restating
what I consider a deeply flawed summation of what it's trying to solve."
> the whole point is to let people test the overhead of things the
> compiler wanted to eliminate because it thinks they are useless"
This is _incorrect_.
The point is to _model the behavior of the benchmarked code AS IF the
data goes to and comes from a place we know nothing about_.
These are fundamentally different things, which is why I keep restating
it.
This is the wrong question. The correct question is: what useful
benchmark cannot trivially factor out the overhead of the external
function call? Yes, if you do microbenchmarking it can be measurable.
But the point is that the overhead should be extremely predictable and
stable. As such, it can be easily calibrated and removed from the cost
of whatever you are really trying to measure. Given that the
instrumentation in general has some latency, you won't get around
calibration anyway.
Joerg
Hey all,
I apologize for my delay with my reply to you all (two tests last week, with three more coming this week).
I appreciate all of your inputs. Based on the discussion, I’ve refined the scope and purpose of llvm.blackbox, at least as it pertains to Rust’s desired use case. Previously, I left the intrinsic only vaguely specified, and based on the resulting comments, I’ve arrived at a more well defined intrinsic.
Specifically:
All other optimizations are fair game.
The above is a bit involved to be sure, and seeing how this intrinsic isn't critical, I'm fine with leaving it at the worst case (ie read/write mem + other side effects) for now.
Alex summed it up best: “[..] it’s about simulating the existence of a “perfectly efficient” external world.” This intrinsic would serve as an aid for benchmarking, ensuring benchmark code is still relevant after optimizations are performed on it, and is an attempt to create a dedicated escape hatch to be used in place of the alternatives I’ve listed below.
In no particular order:
Not ideal for benchmarking (isn’t guaranteed to cache), nonetheless I made an attempt to measure the effects on Rustc’s set of benchmarks. However, I found an issue with rustc which blocks attempts to measure the effect: https://github.com/rust-lang/rust/issues/29663.
Rust’s current solution. Needs stack space.
Won’t work for any type which is bigger than a register; at least not without creating a rustc intrinsic anyway to make the asm operate piecewise on the register native component types of the type if need be. And how is rustc to know exactly which are the register sized or smaller types? rustc mostly leaves such knowledge to LLVM.
Good idea, but the needed logistics would make it ugly.
`test::black_box` as `noinline`: Also not ideal, because of the mandatory function call overhead.
Impossible for Rust; generics are monomorphised into the crate in which they are used (ie the resulting function in the IR won't ever be external to the module using it). Also, Rust doesn't allow function overloading, so C++-style explicit specialization is out as well. This also suffers from the same aforementioned call overhead.
Again, comments are welcome.
Richard Diamond
On Mon, Nov 16, 2015 at 10:03 AM, James Molloy via llvm-dev
<llvm...@lists.llvm.org> wrote:
> You don't appear to have addressed my suggestion to not require a perfect
> external world, instead to measure the overhead of an imperfect world (by
> using an empty benchmark) and subtracting that from the measured benchmark
> score.
In microbenchmarks, performance is not additive. You can't compose
two pieces of code and predict that the benchmark results will be the
sum of the individual measurements.
Any benchmarking, and especially microbenchmarking, should not be
primarily about measuring the relative performance change. It is a
small scientific experiment, where you don't just get numbers -- you
need to have an explanation why are you getting these numbers. And,
especially for microbenchmarks, having an explanation, and a way to
validate it, as well as one's assumptions, is critical.
In large system benchmarks performance is not additive either -- when
you have multiple subsystems, cores and queues. But this does not
mean that system-level benchmarks are not useful. As any benchmarks,
they need interpretation.
The thing is, the black box function one can implement in the language
would not be a perfect substitute for the real producer or consumer.
I don't know about Rust, but in other high-level languages,
implementing a black box as a generic function might cause an overhead
due to the way generic functions are implemented, higher than the
overhead of a regular function call. For example, the value might
need to be moved to the heap to be passed as an unconstrained generic
parameter. This wouldn't be the case in real code, where the function
would be non-generic, and possibly even inlined into the callee.
Dmitri
--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <grib...@gmail.com>*/
If such a function can be implemented with an IR-level intrinsic, it
can be inlined (removing all the call overhead), but still keep the
opaque semantics.
Dmitri