ISPC as a shader language alternative?


Syoyo Fujita

Oct 1, 2013, 1:09:54 AM
to ispc-...@googlegroups.com
Hello ISPC-ers,

I've been thinking about ISPC as an alternative shader language (like the
RenderMan Shading Language), and there is demand for calling C/C++
functions such as texture() and trace() from ISPC, like this:

typedef float<4> vector;

struct ShadeSample
{
// output
vector Oi;
vector Ci;

// input
vector Cs;
vector Os;
float u;
float v;
};

export void
shader(uniform struct ShadeSample samples[], uniform int n)
{

}


I know this breaks SIMD efficiency, but it's very handy for traditional
C/C++/RenderMan SL shader writers.

Questions are:

1) Is there a better way to call C/C++ functions from ISPC?

2) Is it possible to pass a struct (or an array of structs) to a C/C++
function from ISPC? I tried it; sometimes it seems to work and sometimes
it doesn't. (Calling a C/C++ function with a struct argument is not
covered in the ISPC documentation: http://ispc.github.io/ispc.html)

Thanks in advance.

Syoyo Fujita

Oct 1, 2013, 1:15:30 AM
to ispc-...@googlegroups.com
Oops, I sent WIP text... sorry.

The shader() function will be something like this:


extern "C" uniform vector texture(uniform float u, uniform float v);

export void
shader(uniform struct ShadeSample samples[], uniform int n)
{
    foreach (i = 0 ... n) {
        samples[i].Ci = texture(samples[i].u, samples[i].v);
    }
}

Dmitry Babokin

Oct 1, 2013, 2:07:51 PM
to ispc-...@googlegroups.com
1) I don't think so. Do you have any ideas for a better way?

2) It should work. Could you submit an example where it doesn't work?

Dmitry



Syoyo Fujita

Oct 2, 2013, 12:35:31 PM
to ispc-...@googlegroups.com
Hello Dmitry,

On Wed, Oct 2, 2013 at 3:07 AM, Dmitry Babokin <bab...@gmail.com> wrote:
> 1) I think no. Any ideas what would be a better way?

For example, support for "varying" arguments in external functions
(i.e., mostly identical to a C function).
It seems varying is not yet supported for external functions.

Imagine an incoherent ray tracing example in ISPC. In this case trace()
will be called in an incoherent manner, and "uniform" calls with SIMD
lane masking don't work well.
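
The cost described above can be pictured in plain C++. This is only a sketch of what a per-lane callout amounts to, not ISPC's generated code: `scalar_callback` is a hypothetical stand-in for texture() or trace(), and the mask width and lane count are assumptions. In ISPC itself the serialization would typically be written with a construct such as `foreach_active`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar callback standing in for texture()/trace().
inline float scalar_callback(float u, float v) { return 0.5f * (u + v); }

// Emulates a "uniform" callout from a SIMD gang: every lane whose bit is
// set in `mask` makes its own scalar call, one after another.
// Returns the number of scalar calls that were made.
inline int call_per_active_lane(const float* u, const float* v, float* out,
                                std::uint32_t mask, int lanes) {
    int calls = 0;
    for (int lane = 0; lane < lanes; ++lane) {
        if ((mask >> lane) & 1u) {
            out[lane] = scalar_callback(u[lane], v[lane]);
            ++calls;
        }
    }
    return calls;
}
```

With an incoherent mask most of the gang sits idle during each scalar call, which is why a callback that accepts varying arguments is attractive.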

> 2) It should work. Could you submit an example when it doesn't work?

Here it is,

https://github.com/syoyo/ispc/tree/shader/examples/shader

Tested with ispc 1.5.1 (dev) on Mountain Lion.

In the default config it at least runs (no seg fault), but I get 0.0
instead of 0.5 for Ci[0].
I also tried passing the argument to the external function by reference
or by value, but that causes a compile error or seg faults.

I don't know whether this is a bug or an ISPC language restriction.

Thanks in advance,

Syoyo

Dmitry Babokin

Oct 3, 2013, 8:17:19 AM
to ispc-...@googlegroups.com
Hi Syoyo,

Supporting "varying" in external functions is being discussed now, but there are several issues with it. The most important is that it's platform dependent, which makes the solution non-portable between hardware platforms: the C/C++ part would need to be different for SSE and AVX. Moreover, we wouldn't be able to generate "multi-target" object files, since they provide a single entry point with auto-dispatch inside that calls the different implementations. If you have an idea for a design without these drawbacks, I'd be interested to hear it.

As for the failure, I'll have a look at it.

Dmitry Babokin

Oct 3, 2013, 9:28:56 AM
to ispc-...@googlegroups.com
Syoyo,

Regarding the failure: it seems to work for me. I tried it on my MacBook Pro with LLVM 3.3 and gcc 4.8. It produced an image with a nice gradient (attached).

Why are you producing .bc files and loading them dynamically? Why not produce object files and link them as usual?

When doing it "dynamically" and running through the JIT, I would expect all kinds of problems with mismatched target information.

To debug the problem, I would suggest writing several tests that fill the data structures on the ISPC side and read them on the C++ side, to check whether the layouts match. It may also help to read beyond the structure boundaries to confirm that it is represented exactly as you expect.
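
A minimal sketch of the C++ half of such a test, under assumptions: the ISPC side is imagined to fill one sample with the float ramp 0, 1, 2, ... in member order, and the `ShadeSample` mirror below is hand-written rather than taken from an ISPC-generated header. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

struct float4 { float v[4]; };

// Hand-written mirror of the uniform ShadeSample from the first message.
struct ShadeSample {
    float4 Oi, Ci;  // output
    float4 Cs, Os;  // input
    float u, v;
};

constexpr std::size_t kFloatsPerSample = sizeof(ShadeSample) / sizeof(float);

// Reads the index-th float from raw memory without aliasing issues.
inline float float_at(const void* base, std::size_t index) {
    float f;
    std::memcpy(&f, static_cast<const char*>(base) + index * sizeof(float),
                sizeof f);
    return f;
}

// True when the buffer holds the ramp 0, 1, 2, ..., count-1.
inline bool holds_ramp(const void* base, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        if (float_at(base, i) != static_cast<float>(i)) return false;
    return true;
}

// Prints the C++ view of the layout for comparison with the ISPC side.
inline void print_layout() {
    std::printf("sizeof(ShadeSample) = %zu\n", sizeof(ShadeSample));
    std::printf("offsetof(Ci) = %zu\n", offsetof(ShadeSample, Ci));
    std::printf("offsetof(u)  = %zu\n", offsetof(ShadeSample, u));
}
```

If the pointer handed to C++ really refers to one uniform sample, `holds_ramp` should succeed; if the memory is laid out differently, the first mismatching index shows where the two views diverge.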

Dmitry.


output.ppm

Syoyo Fujita

Oct 3, 2013, 11:24:26 AM
to ispc-...@googlegroups.com
Hello Dmitry,

> Regarding the fail. It seems to work for me. i tried it on my MacBook Pro
> with LLVM 3.3 and gcc4.8. It produced an image with nice gradient
> (attached).

Actually, it fails. The red channel should be 0.5 (or 127 in the final pixel color).
(The PPM is too large to attach, so I didn't attach reference/bug PPM images :-) )

As I commented in shader.ispc:

// shader.ispc
foreach (i = 0 ... n) {
    varying struct ShadeSample ss = samples[i];
    muda(&ss, &ss.Cs); // OK

    samples[i].Ci[0] = ss.Ci[0]; // expects 0.5, but get 0.0
}

ss.Ci[0] should be 0.5, as set in the muda() function,

// main.cc
static void muda(
    struct ispc::ShadeSample* env,
    ispc::float4* org)
{
    env->Ci.v[0] = 0.5;
}

but it returns an unexpected 0.0, even though I am passing the struct by pointer.
If this doesn't work, we cannot return a struct value from the C/C++
world to the ISPC world.



> Why are you producing .bc files and loading them dynamically? Why not
> producing object files and linking them as usual?

This enables live, interactive editing of the shader, similar to what
I am doing with plain C/C++ and clang/LLVM:

https://docs.google.com/file/d/0B8AXAVUQ9BiKaU13WWM4dG9aYzg/edit?usp=sharing

>
> When doing it "dynamically" and running through JIT, I would expect all kind
> of problems with mismatching target information.

Hmm... if so, it would be a problem of LLVM, not ISPC.

> To debug the problem, I would suggest writing several tests, which fill the
> data structures on ISPC side and reading them on C++ side to understand if
> the layout is matching. And probably reading beyond structure boundaries to
> understand that it's represented exactly as you expecting it.

I would like to, but in this case it fails to compile or seg faults
before it gets to run :-<
(Please read the comments in shader.ispc carefully; I also added a
comment with the expected value.)

--
Syoyo

Dmitry Babokin

Oct 4, 2013, 2:42:56 PM
to ispc-...@googlegroups.com
Syoyo,

Now I understand the problem. You declare "varying struct ShadeSample". A varying struct is laid out as a struct of varyings, i.e. for AVX you'll have:
struct {
    float4 Oi[8];
    float4 Ci[8];
    float4 Cs[8];
    float4 Os[8];
    float u[8];
    float v[8];
};

So the C code assumes that the pointer points to a uniform struct, while it actually points to a varying struct of this kind, and the C code performs an incorrect assignment.

Hint: to debug this kind of issue, print out the addresses of variables on both the C and ISPC sides. In ISPC, use the "print()" function, which has printf semantics except that you don't specify the type after %, for example: print("ISPC value is %\n", value);
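
A small C++ sketch of the mismatch, assuming the 8-lane layout exactly as written in the struct above (the real ordering inside each widened member may differ between ISPC versions and targets). The names `UniformShadeSample`, `VaryingShadeSample`, and `set_ci_red` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr int kLanes = 8;  // assumed gang size, matching the AVX example

struct float4 { float v[4]; };

// What the C++ callback believed it was given: a single sample.
struct UniformShadeSample {
    float4 Oi, Ci, Cs, Os;
    float u, v;
};

// What it was actually given, following the layout shown above.
struct VaryingShadeSample {
    float4 Oi[kLanes];
    float4 Ci[kLanes];
    float4 Cs[kLanes];
    float4 Os[kLanes];
    float u[kLanes];
    float v[kLanes];
};

// Byte offsets of Ci under each view of the same memory.
constexpr std::size_t kUniformCiOffset = offsetof(UniformShadeSample, Ci);
constexpr std::size_t kVaryingCiOffset = offsetof(VaryingShadeSample, Ci);

// A lane-aware write that targets the intended element.
inline void set_ci_red(VaryingShadeSample* env, int lane, float value) {
    env->Ci[lane].v[0] = value;
}
```

Under this layout a write through the uniform view lands at byte 16, which is still inside the varying struct's `Oi` block; the varying `Ci` block only starts at byte 128. That would explain why the ISPC side still reads 0.0 from Ci[0].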


>> When doing it "dynamically" and running through JIT, I would expect all kind
>> of problems with mismatching target information.

>Hmm... if so, it would be a problem of LLVM, not ISPC.

OK, I have never played with the LLVM JIT; I hope they handle it correctly. My concern is that LLVM bitcode is not designed as a portable format (as many people think it is). For example, the documentation specifically notes that the datalayout string guarantees nothing: the target machine (i.e. code gen) is expected to agree with the datalayout string, but could actually diverge from it. Perhaps I misunderstand the design. If it works, then OK :)


Your video is totally awesome! Thanks for sharing it!

Dmitry.

Syoyo Fujita

Oct 7, 2013, 6:54:12 AM
to ispc-...@googlegroups.com
Hello Dmitry,

> Now I understand the problem. You declare "varying struct ShadeSample".
> Varying structs is laid out like it's a struct of varyings. I.e. for AVX
> you'll have:
> struct {
> float4 Oi[8];
> float4 Ci[8];
> float4 Cs[8];
> float4 Os[8];
> float u[8];
> float v[8];
> };
>
> So C code assumes that the pointer points to uniform struct, while it's
> varying of this kind. So the C code just does incorrect assignment.

Ah, okay. I think I'm starting to understand how ISPC handles structs at
the C/C++ boundary. I'll dig into the ISPC internals.

>>> When doing it "dynamically" and running through JIT, I would expect all
>>> kind
>>> of problems with mismatching target information.
>
>>Hmm... if so, it would be a problem of LLVM, not ISPC.
>
> Ok, I never played with LLVM JIT, I hope they handle it correctly. My
> concern is that LLVM bitcode is not designed as a portable format (as many
> people think of it). For example, they specifically notice in documentation
> that datalayout string guarantees nothing, target machine (i.e. code gen) is
> expected to agree with datalayout string, but could actually diverge from
> it. Probably it's my misunderstanding of the design. If it work - then ok :)

Usually LLVM bitcode is portable across OSes on the same CPU architecture,
but as you pointed out, one problem I have already faced is the
packing/layout of structures on different OSes and compilers.
I've avoided this problem by using explicit padding (adding dummy
variables to the struct).

Otherwise it mostly works well. At least on x86, I can run LLVM
bitcode on Windows that was compiled on Linux or Darwin.

> You video is totally awesome! Thanks for sharing it!

Thanks!
If the ISPC version works well once the problem discussed here is fixed,
things could be much nicer, and it would be a good example of ISPC
use!

Dmitry Babokin

Oct 7, 2013, 10:41:39 AM
to ispc-...@googlegroups.com
>Usually LLVM bitcode is portable across OS on same CPU architecture,
>but as you pointed out, one problem I have already faced is
>packing/layout of structure on different OS-es and compilers.
>I've avoid this problem using explicit padding(adding dummy variables
>to struct).

>Other mostly works well. At least on x86 platform, I can run LLVM
>bitcode on windows which is compiled on linux or darwin.

Unfortunately, bitcode is not portable between OSes in the general case (even on the same CPU).
It may work, though. The problem is not only the datalayout (default alignments differ) but also
the different function calling ABIs. For simple cases it will most likely work, but functions with multiple
parameters (especially structure types, floating point values, and hardware vector register types) will
definitely break the code. This is a problem because ABI lowering happens in clang (i.e. in the front end,
when IR is generated), not in codegen. So be careful when calling library and other external functions.

>If ISPC version works well by fixing the problem discussed here,
>things could be much more nicer and it would be good example of ISPC
>use!

Thanks for bringing attention to this problem, we will definitely work on it.

Dmitry.

Syoyo Fujita

Oct 7, 2013, 11:54:29 AM
to ispc-...@googlegroups.com
Hello Dmitry,

>>Other mostly works well. At least on x86 platform, I can run LLVM
>>bitcode on windows which is compiled on linux or darwin.
>
> Unfortunately, bitcode is not portable between OSes in general case (even on
> the same CPU).
> It may work though. And the problem is not only datalayout (default
> alignments are different), but
> different function calling ABI. I.e. for simple cases it will work the most
> likely, but for multiple
> parameters (especially structure types, floating point values and hardware
> vector register types will
> definitely break the the code). And this is a problem as ABI lowering
> happens in clang (i.e. in FE,
> when IR is generated), but not in codegen. So, be careful calling library
> and other external functions.

Yeah! My 10 years of PPC/AltiVec, x86/SSE, Cell/SPE, and other SIMD
architecture coding experience show that ABI incompatibility is an
annoying problem.

Possibly the only solution I've found is to pass everything by pointer
and not use the return value, e.g.

vec4 muda(vec4 a, float b, struct bora c);
->
void muda(vec4* ret, vec4* a, float* b, struct bora* c);

I use this technique in my C/C++ JIT shader, in the MUDA compiler
( http://lucille.sourceforge.net/muda/ ), and in other production
code (Mac/Linux/Win) using LLVM that I'm developing. At least so far it
has been working well.
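
A sketch of the pointer-only convention in C++, for illustration only: the contents of `vec4` and `bora`, the arithmetic inside `muda`, and the added `const` qualifiers are assumptions, not taken from the MUDA sources.

```cpp
#include <cassert>

struct vec4 { float v[4]; };
struct bora { float scale; };  // hypothetical contents

// Pointer-only form: no struct, float, or vector value crosses the
// boundary in registers, so the call relies only on pointer passing.
// C linkage keeps the symbol name predictable for a JIT to resolve.
extern "C" void muda(vec4* ret, const vec4* a, const float* b,
                     const bora* c) {
    for (int i = 0; i < 4; ++i)
        ret->v[i] = a->v[i] * (*b) * c->scale;
}
```

The caller allocates the result and passes its address, for example `vec4 r; muda(&r, &a, &b, &c);`, so the same call shape can be used from the JIT-compiled side and the host side.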

This might not produce efficient code (the calling convention might
force the compiler to emit register-to-memory copies for vector
values, for example), but the heavy compute kernels usually live in
the ISPC (or other SIMD language) world, so inefficient code at the
C/C++ boundary won't be critical to overall performance.


>>If ISPC version works well by fixing the problem discussed here,
>>things could be much more nicer and it would be good example of ISPC
>>use!
>
> Thanks for bringing attention to this problem, we will definitely work on
> it.

Also, it would be interesting if an ISPC JIT shader could be combined
with the Embree ray tracing kernels.

--
Syoyo