Plans for SIMD?


Soeren Balko

Mar 27, 2014, 9:01:23 PM
to native-cli...@googlegroups.com
My (legacy) application, which I successfully ported to PNaCl, contains some #ifdef-guarded assembly for processor-specific SIMD instructions (MMX, SSE, AVX and friends). As these aren't supported in PNaCl, the performance of my PNaCl module suffers quite substantially (~5x slower than native code). I saw that SIMD support is on PNaCl's roadmap and was wondering how this is going to be implemented. Specifically, what will the programming model be? Along the lines of the GCC (or LLVM) SIMD compiler intrinsics? Or rather a platform-neutral set of vector types and basic operations (like the JavaScript SIMD extensions)? Or something even more abstract and proprietary to PNaCl, like the SIMD libraries from Intel and AMD?

And is there a rough timeline for when this might be available in PNaCl?

Thanks!
Soeren

JF Bastien

Mar 27, 2014, 9:31:50 PM
to native-cli...@googlegroups.com
Hi Soeren,

I'm currently working on adding support for the GCC/LLVM vector extensions to PNaCl as a starting point. The advantage here is that we don't need to define a special syntax, and it allows us to support most types we want while being well supported on most hardware platforms. I expect that this will be in the canary SDK toolchain in ~2 weeks, and will get to Chrome stable in version 36.

After this we'll add portable intrinsics to support vector operations, some of which will also make it to Chrome version 36, with more rolling out later. They won't be quite like x86 or quite like ARM since we want portability, but we want to be able to hit great performance on all platforms, even MIPS. We'll also put effort towards making the APIs we offer pretty familiar, so that the porting costs are as small as possible (in some cases we expect to create portable SIMD libraries, or add PNaCl SIMD support to existing libraries so that the porting cost is zero for users of those libraries). We have fairly demanding users, so performance should get better as we tune these further; we nonetheless expect good performance gains from the start.

Once we have an acceptable level of support for hand-written SIMD we'll also experiment with auto-vectorization on the toolchain side, so that optimized builds will get performance gains even without hand-writing SIMD.
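
To illustrate the kind of code auto-vectorization targets, here is a minimal sketch of a plain scalar loop that vectorizers can usually handle. The function name scale_and_add is hypothetical, and whether a given toolchain actually vectorizes it depends on its optimization settings; nothing here is specific to PNaCl.

```c
#include <stddef.h>

/* A plain scalar loop with no dependencies between iterations.
 * The restrict qualifiers promise that the arrays do not overlap,
 * which is typically what allows a vectorizer to process several
 * elements per iteration. (Hypothetical example function.) */
void scale_and_add(float *restrict dst, const float *restrict a,
                   const float *restrict b, float k, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = a[i] * k + b[i];
}
```

The source contains no vector types at all, so the same file builds unchanged on toolchains with or without SIMD support.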

Hope this helps,

JF


--
You received this message because you are subscribed to the Google Groups "Native-Client-Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to native-client-di...@googlegroups.com.
To post to this group, send email to native-cli...@googlegroups.com.
Visit this group at http://groups.google.com/group/native-client-discuss.
For more options, visit https://groups.google.com/d/optout.

Soeren Balko

Mar 27, 2014, 10:46:55 PM
to native-cli...@googlegroups.com
Fantastic - thanks for the detailed response! Can't wait to see it landing in canary!

Soeren

Soeren Balko

Apr 23, 2014, 7:15:46 AM
to native-cli...@googlegroups.com
Just curious: are the GCC/LLVM vector extensions by now available in pepper_canary? I have been looking out for a header file defining the PNaCl-specific "generic" SIMD intrinsics but wasn't able to find one. 

The native client source has a file simd.h, which comes with a fairly comprehensive set of "builtin" intrinsics. Will these be the ones that make it into pepper?

Soeren 


JF Bastien

Apr 23, 2014, 10:55:49 AM
to native-cli...@googlegroups.com
> Just curious: are the GCC/LLVM vector extensions by now available in pepper_canary? I have been looking out for a header file defining the PNaCl-specific "generic" SIMD intrinsics but wasn't able to find one.
>
> The native client source has a file simd.h, which comes with a fairly comprehensive set of "builtin" intrinsics. Will these be the ones that make it into pepper?

That file won't be part of the SDK until we've finalized the API it exposes.

The GCC/LLVM vector extensions don't require any headers though. I fixed one outstanding bug yesterday which hasn't made its way to canary yet. It looks like the NaCl version in Chrome hasn't been updated for a while; I'll try to make sure this happens today, and the vector extensions should then be usable. Example:

#include <stdint.h>
#include <stdio.h>

/* vector_size is given in bytes: 16 bytes = 4 x uint32_t (one 128-bit vector). */
typedef uint32_t v32 __attribute__((vector_size(16)));

__attribute__((noinline))
void add(v32 *dst, const v32 *lhs, const v32 *rhs) {
  *dst = *lhs + *rhs;
}

int main(void) {
  v32 a = { 1, 2, 3, 4 };
  v32 b = { 5, 6, 7, 8 };
  v32 res;
  add(&res, &a, &b);
  for (size_t i = 0; i < 4; ++i)
    printf("%u\n", res[i]);
  return 0;
}

This is of course just a simple start; we'll add much more in the near future. You can follow along on the bug:

Sören Balko

Apr 24, 2014, 8:33:36 AM
to native-cli...@googlegroups.com
Fantastic! That’s (almost) everything I need for now. Using some quirks, I think I can work around the missing intrinsics (like for _mm_min_pd, _mm_max_pd, _m_psadbw, etc. from xmmintrin.h/emmintrin.h) and still be faster than the scalar operations.
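
As a rough idea of what such a workaround could look like with only the vector extensions, here is a sketch of lane-wise min/max for two doubles built from a comparison mask. The names f64x2_t, i64x2_t, min_f64x2 and max_f64x2 are made up for illustration and are not part of any PNaCl header; NaN and signed-zero behavior has not been checked against _mm_min_pd/_mm_max_pd.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector types: two doubles, and two 64-bit
 * integers used to hold comparison masks and bit patterns. */
typedef double  f64x2_t __attribute__((vector_size(16)));
typedef int64_t i64x2_t __attribute__((vector_size(16)));

/* Lane-wise minimum: the comparison yields all-ones lanes where a < b,
 * and the mask then picks the bits of a or b for each lane. */
static f64x2_t min_f64x2(f64x2_t a, f64x2_t b) {
  i64x2_t m = (i64x2_t)(a < b);
  return (f64x2_t)(((i64x2_t)a & m) | ((i64x2_t)b & ~m));
}

/* Lane-wise maximum, same pattern with the comparison flipped. */
static f64x2_t max_f64x2(f64x2_t a, f64x2_t b) {
  i64x2_t m = (i64x2_t)(a > b);
  return (f64x2_t)(((i64x2_t)a & m) | ((i64x2_t)b & ~m));
}
```

The explicit casts are bit reinterpretations between vectors of the same size, so no values are converted.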

Other than that, I wonder if you can provide compatibility wrappers for the non-portable x86 SIMD (for now, the 128 bit MMX and SSE variants) intrinsics. I understand that portability is a design goal. Then again, most legacy code using platform-specific SIMD intrinsics would probably have an x86 variant. Not sure to what extent the NEON, AltiVec and whatever-the-MIPS-SIMD-instruction-set is named can be mapped to MMX/SSE or had to be emulated.

And btw, is this now also supported in the current Chrome Canary?

On a remotely related note: are there plans to also support OpenCL in PNaCl at some point? WebCL does not seem to be going anywhere (happy to be educated otherwise) and doing GPGPU based on OpenGL ES/WebGL is somewhat painful.

Thanks!
Soeren



Soeren Balko, PhD
Founder & Director
zfaas Pty Ltd
Brisbane, QLD
Australia



JF Bastien

Apr 24, 2014, 12:04:25 PM
to native-cli...@googlegroups.com

> Fantastic! That’s (almost) everything I need for now. Using some quirks, I think I can work around the missing intrinsics (like for _mm_min_pd, _mm_max_pd, _m_psadbw, etc. from xmmintrin.h/emmintrin.h) and still be faster than the scalar operations.

Our plan is to support min/max in a similar way, and to add explicit support for saturation. This will probably come in the release after this one.
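
Until explicit saturation support exists, unsigned saturating addition can be emulated with the vector extensions alone. This is only a sketch of one common approach, not the planned PNaCl API; u8x16_t and adds_u8x16 are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical type: sixteen unsigned 8-bit lanes in one 128-bit vector. */
typedef uint8_t u8x16_t __attribute__((vector_size(16)));

/* Unsigned saturating add: lanes that would wrap are clamped to 255. */
static u8x16_t adds_u8x16(u8x16_t a, u8x16_t b) {
  u8x16_t sum = a + b;                    /* wraps modulo 256 in each lane */
  u8x16_t wrapped = (u8x16_t)(sum < a);   /* 0xFF in lanes where it wrapped */
  return sum | wrapped;                   /* wrapped lanes become 0xFF */
}
```

The trick relies on the fact that an unsigned sum is smaller than either operand exactly when the addition overflowed.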
 
> Other than that, I wonder if you can provide compatibility wrappers for the non-portable x86 SIMD (for now, the 128 bit MMX and SSE variants) intrinsics. I understand that portability is a design goal. Then again, most legacy code using platform-specific SIMD intrinsics would probably have an x86 variant. Not sure to what extent the NEON, AltiVec and whatever-the-MIPS-SIMD-instruction-set is named can be mapped to MMX/SSE or had to be emulated.

Yes, this is also on our roadmap, but getting the basic support in first takes priority, so it'll take a short while for us to write the compatibility shims.

> And btw, is this now also supported in the current Chrome Canary?

Unfortunately not :-(
There's an unrelated bug blocking the NaCl → Chrome dependency update, we think it'll get fixed today.

> On a remotely related note: are there plans to also support OpenCL in PNaCl at some point? WebCL does not seem to be going anywhere (happy to be educated otherwise) and doing GPGPU based on OpenGL ES/WebGL is somewhat painful.

Point taken. Our current plan with regards to GPU compute is to support the same thing Chrome does in general (since it requires little work from our team), but as we get PNaCl to be more feature-complete we may end up driving other "native code on the web" initiatives such as these.

Soeren Balko

May 2, 2014, 1:05:15 AM
to native-cli...@googlegroups.com
I haven't had any luck compiling a simple piece of vector code with the latest pepper_canary toolchain:

typedef uint32_t v32_t __attribute__((vector_size(128)));

int main(void) {
  v32_t v32 = (v32_t) {1, 2, 3, 4};
}

Instead, I get this:

Function main disallowed: bad result type:   %.compoundliteral.bc = bitcast i8* %.compoundliteral to <32 x i32>*
Function main disallowed: bad pointer:   store <32 x i32> <i32 1, i32 2, i32 3, i32 4, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>, <32 x i32>* %.compoundliteral.bc, align 1
Function main disallowed: bad result type:   %.compoundliteral.bc1 = bitcast i8* %.compoundliteral to <32 x i32>*
Function main disallowed: bad pointer:   %0 = load <32 x i32>* %.compoundliteral.bc1, align 1
Function main disallowed: bad result type:   %v32.bc = bitcast i8* %v32 to <32 x i32>*
Function main disallowed: bad pointer:   store <32 x i32> %0, <32 x i32>* %v32.bc, align 1
LLVM ERROR: PNaCl ABI verification failed

My interpretation is that vector support hasn't arrived in the officially distributed toolchains. The README from pepper_canary says this:

Native Client Tools Bundle
Version: 36
Chrome Revision: 266749
Native Client Revision: 13018
Build Date: 2014/04/28 23:49:09

And this seems to be the latest version there is. I also tried setting up the development code line locally (following instructions from here: http://www.chromium.org/nativeclient), but had only very modest success (in fact, even gclient sync fails unless I tweak the subversion URLs to use https - however building with scons fails entirely). 

I would assume, vector support simply hasn't arrived in the toolchain version I have installed, no?

Soeren

Andrey Khalyavin

May 2, 2014, 1:27:10 AM
to native-cli...@googlegroups.com
In vector declarations vector_size is measured in bytes, not bits. http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
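
A small sketch to make the bytes-versus-lanes point concrete (the type names and the LANES macro are invented for this example): vector_size(16) with 4-byte elements gives 4 lanes, while vector_size(128) gives 32 lanes, which matches the <32 x i32> type in the verifier errors above.

```c
#include <stddef.h>
#include <stdint.h>

/* vector_size counts bytes, so the lane count is bytes / sizeof(element). */
typedef uint32_t u32x4_t  __attribute__((vector_size(16)));   /* 16 bytes  */
typedef uint32_t u32x32_t __attribute__((vector_size(128)));  /* 128 bytes */

/* Hypothetical helper macro: number of lanes in a vector type. */
#define LANES(vec_type, elem_type) (sizeof(vec_type) / sizeof(elem_type))

static size_t lanes_u32x4(void)  { return LANES(u32x4_t, uint32_t); }
static size_t lanes_u32x32(void) { return LANES(u32x32_t, uint32_t); }
```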

-- Andrey Khalyavin

Soeren Balko

May 2, 2014, 5:23:43 AM
to native-cli...@googlegroups.com
You are right, of course. After changing it to vector_size(16), it still doesn't compile, though.

JF Bastien

May 2, 2014, 12:14:55 PM
to native-cli...@googlegroups.com
Hi Soeren,

Unfortunately we've had a few unrelated failures which have held things up. The changes are currently in Chrome (all except the shuffle changes, which will come in soon), but the SDK build is broken on Windows. We're working on it right now, and it should be the last holdup before vectors make it to the canary SDK. Sorry for the holdup, it was quite unexpected!

JF



Soeren Balko

May 3, 2014, 9:57:50 PM
to native-cli...@googlegroups.com
Hi,

Thanks heaps - it's been in pepper_canary since yesterday - great work! I had some initial success with some toy examples, which do compile. One useful addition would be to support the ternary ?: operator with vector types, which works in g++ (not in gcc for some reason) - see here: http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

test_simd.c:15:15: error: used type 'char __attribute__((ext_vector_type(16)))' where arithmetic, pointer, or vector
      type is required
vecu8_t m = v1>v2 ? v1-v2 : v2-v1; // works in g++

The error message seems to indicate that vector types are supported - but for some reason it still refuses to compile. m, v1, and v2 are vecu8_t, defined like so:

typedef uint8_t vecu8_t __attribute__((vector_size(16)));
 
Not sure if this is supposed to work in PNaCl and it is not a major show stopper at any rate, but would help me save some operations, hence optimize the code. The alternative is this:

vecu8_t result = v1>v2;
vecu8_t m1 = result & (v1-v2), m2 = ~result & (v2-v1);
vecu8_t m = m1+m2;

which would probably be more costly.
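
For comparison, the mask-based alternative can be wrapped in a small helper so call sites read almost like the ternary form. This is only a sketch; select_u8 and absdiff_u8 are hypothetical names, and the explicit cast on the comparison result is there to avoid depending on implicit vector conversions.

```c
#include <stdint.h>

typedef uint8_t vecu8_t __attribute__((vector_size(16)));

/* Lane-wise select: takes a where the mask lane is all-ones
 * and b where it is all-zeros. */
static vecu8_t select_u8(vecu8_t mask, vecu8_t a, vecu8_t b) {
  return (mask & a) | (~mask & b);
}

/* Lane-wise absolute difference, equivalent in intent to
 * v1 > v2 ? v1 - v2 : v2 - v1. */
static vecu8_t absdiff_u8(vecu8_t v1, vecu8_t v2) {
  vecu8_t gt = (vecu8_t)(v1 > v2);   /* 0xFF where v1 > v2, else 0x00 */
  return select_u8(gt, v1 - v2, v2 - v1);
}
```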

Any thoughts?

Soeren


JF Bastien

May 4, 2014, 2:01:25 AM
to native-cli...@googlegroups.com


> Thanks heaps - it's been in pepper_canary since yesterday - great work! I had some initial success with some toy examples, which do compile. One useful addition would be to support the ternary ?: operator with vector types, which works in g++ (not in gcc for some reason) - see here: http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

That's a limitation of Clang, and probably not hard to fix. Either approach should generate the same code, though; if it doesn't, that's a performance bug that should be fixed too.

We're finishing up a few more things for the Chrome 36 release, I can look at this afterwards. It should only be a frontend change, so Chrome version won't matter much (any fix there will be usable from pepper canary, and work in older Chrome).

Soeren Balko

May 4, 2014, 6:57:27 AM
to native-cli...@googlegroups.com
If it's merely syntactic sugar, it may not be worthwhile in the first place. 

On a different note, I could compile some code with added vector instructions, but was unable to run it in Chrome Canary (Mac, 36.0.1973.2 canary). I get an error on the Javascript console stating "NativeClient: PnaclCoordinator: PNaCl Translator Error: Unknown type code in type table: 12" which I assume means that this version of Chrome doesn't have vector support as yet...

JF Bastien

May 4, 2014, 11:50:47 AM
to native-cli...@googlegroups.com


> If it's merely syntactic sugar, it may not be worthwhile in the first place. 
>
> On a different note, I could compile some code with added vector instructions, but was unable to run it in Chrome Canary (Mac, 36.0.1973.2 canary). I get an error on the Javascript console stating "NativeClient: PnaclCoordinator: PNaCl Translator Error: Unknown type code in type table: 12" which I assume means that this version of Chrome doesn't have vector support as yet...

I assume so, but to be sure do you have a way we can reproduce this?

JF Bastien

May 4, 2014, 1:45:15 PM
to native-cli...@googlegroups.com

> > If it's merely syntactic sugar, it may not be worthwhile in the first place.
> >
> > On a different note, I could compile some code with added vector instructions, but was unable to run it in Chrome Canary (Mac, 36.0.1973.2 canary). I get an error on the Javascript console stating "NativeClient: PnaclCoordinator: PNaCl Translator Error: Unknown type code in type table: 12" which I assume means that this version of Chrome doesn't have vector support as yet...
>
> I assume so, but to be sure do you have a way we can reproduce this?

Just to clarify, 12 is indeed naclbitc::TYPE_CODE_VECTOR so it does look like the error you'd get.

Soeren Balko

May 4, 2014, 9:17:58 PM
to native-cli...@googlegroups.com
I tried with the simplest program conceivable - please see here for the source: http://pastebin.com/D11C6GdA

The versions of Chrome I tried are Chromium 36.0.1975.0 (268124) and Chrome 36.0.1974.2 canary (all on OSX). Will try on Ubuntu later today to confirm. I reckon this is a simple issue of the feature not having progressed to the canary builds.

Soeren

Derek Schuff

May 6, 2014, 12:17:20 PM
to native-cli...@googlegroups.com
Hi Soeren,
The PNaCl translator component is updated separately from the rest of Chrome. The dev and canary channels should now be getting the updates automatically. Check your PNaCl translator version in chrome://nacl and look for a version >= 0.1.0.13104. If you don't have that you can try checking chrome://components and click the "check for update" button under pnacl (then you'll just have to wait a bit: it's not a fantastic UI) but you should get the update fairly quickly then.

Soeren Balko

May 7, 2014, 3:29:25 AM
to native-cli...@googlegroups.com, dsc...@google.com
Thanks, that has helped - I wasn't aware that PNaCl has its own update lifecycle, definitely good to know!

I got a bit further and could successfully run simple vectorized code. However, there seems to be a bug related to compiling with -O3. 

When I compile the code here: http://pastebin.com/YNJbJ1v8 with -O3 like so:

../nacl_sdk/pepper_canary/toolchain/linux_pnacl/bin/pnacl-clang  -I../nacl_sdk/pepper_canary/include -O3 test_simd.c -L../nacl_sdk/pepper_canary/lib/pnacl/Release -lppapi -o test_simd.pexe

followed by the usual:

../nacl_sdk/pepper_canary/toolchain/linux_pnacl/bin/pnacl-finalize test_simd.pexe

I get the following error on the Javascript console: 

NativeClient: PnaclCoordinator: PNaCl Translator Error: Error reading bitcode file: Invalid argument 

Without the -O3, it works like a charm. Without -O3 I would probably lose the extra performance from vectorizing the code elsewhere, which is why I think this needs to be looked at.

Ideas, anyone?

Soeren


Nicholas Fullagar

May 7, 2014, 12:11:14 PM
to native-cli...@googlegroups.com, Derek Schuff
Hi Soeren, we've been looking at what might be the same issue.  Here's a small repro - if it is possible to boil yours down, it would be interesting to also have.  thx -nicholas

#include <stdint.h>  // int32_t
#include <stdio.h>   // printf

#define NOINLINE inline __attribute__((noinline))

// 128 bit vector types
typedef int32_t i32x4_t __attribute__ ((vector_size (16)));
typedef float f32x4_t __attribute__ ((vector_size (16)));

// Vector helper functions
NOINLINE f32x4_t min(f32x4_t a, f32x4_t b) {
  i32x4_t m = a < b;
  return (f32x4_t)(((i32x4_t)a & m) | ((i32x4_t)b & ~m));
}

NOINLINE f32x4_t max(f32x4_t a, f32x4_t b) {
  i32x4_t m = a > b;
  return (f32x4_t)(((i32x4_t)a & m) | ((i32x4_t)b & ~m));
}

int example_main_FAILS(int argc, char* argv[]) {
  f32x4_t a = {1.0f, 2.0f, 3.0f, 4.0f};
  f32x4_t b = {-1.0f, 3.0f, -4.0f, 6.0f};
  f32x4_t c = min(a, b);
  c = max(a, c);
  printf("c: %f %f %f %f\n", c[0], c[1], c[2], c[3]);
  return 0;
}

int example_main_PASSES(int argc, char* argv[]) {
  f32x4_t a = {1.0f, 2.0f, 3.0f, 4.0f};
  f32x4_t b = {-1.0f, 3.0f, -4.0f, 6.0f};
  f32x4_t c = min(a, b);
  f32x4_t d = max(a, b);
  printf("c: %f %f %f %f\n", c[0], c[1], c[2], c[3]);
  printf("d: %f %f %f %f\n", d[0], d[1], d[2], d[3]);
  return 0;
}



JF Bastien

May 7, 2014, 2:23:19 PM
to native-cli...@googlegroups.com, Derek Schuff, Nicholas Fullagar, David Sehr
Hi Soeren,

We have a fix for the issue, an oversight on my part:

It'll have to make its way through to the SDK and Chrome, but we'll try to speed things through today.

Thanks for bearing with us and reporting the bug!

JF

Soeren Balko

May 7, 2014, 11:19:37 PM
to native-cli...@googlegroups.com, Derek Schuff
Hi Derek,

Did you check out the link I posted? Basically, it comes down to this function:

typedef uint16_t vec_uint16_t __attribute__((vector_size(16)));

#define LENGTH 64

static int testfunc(uint8_t *a, uint8_t *b) {
    vec_uint16_t sumv = {0, 0, 0, 0, 0, 0, 0, 0};
    vec_uint16_t v1, v2, gt;

    for (int x=0; x<LENGTH; x+=8) {
        v1 = (vec_uint16_t){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
        v2 = (vec_uint16_t){b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]};

        gt = v1>v2;
        sumv += (gt & (v1-v2)) + (~gt & (v2-v1));
    }
    return sumv[0]+sumv[1]+sumv[2]+sumv[3]+
           sumv[4]+sumv[5]+sumv[6]+sumv[7];
}
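
Note that the loop as posted reads a[0]..a[7] and b[0]..b[7] on every iteration. Assuming the intent is a sum of absolute differences over all LENGTH bytes, a variant that walks both buffers could look like the sketch below; sad_u8 is a hypothetical name, and this is not the code behind the pastebin link.

```c
#include <stdint.h>

typedef uint16_t vec_uint16_t __attribute__((vector_size(16)));

#define LENGTH 64

/* Sum of absolute differences over LENGTH bytes, 8 lanes per step.
 * Assumes both buffers hold at least LENGTH bytes. */
static int sad_u8(const uint8_t *a, const uint8_t *b) {
  vec_uint16_t sumv = {0};
  for (int x = 0; x < LENGTH; x += 8) {
    /* Widen 8 bytes to 16-bit lanes so the running sums cannot wrap
     * (at most 8 steps * 255 per lane). */
    vec_uint16_t v1 = {a[x],     a[x + 1], a[x + 2], a[x + 3],
                       a[x + 4], a[x + 5], a[x + 6], a[x + 7]};
    vec_uint16_t v2 = {b[x],     b[x + 1], b[x + 2], b[x + 3],
                       b[x + 4], b[x + 5], b[x + 6], b[x + 7]};
    vec_uint16_t gt = (vec_uint16_t)(v1 > v2);  /* all-ones where v1 > v2 */
    sumv += (gt & (v1 - v2)) | (~gt & (v2 - v1));
  }
  int total = 0;
  for (int i = 0; i < 8; ++i)  /* horizontal sum of the 8 lanes */
    total += sumv[i];
  return total;
}
```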

Soeren

Soeren Balko

May 7, 2014, 11:21:09 PM
to native-cli...@googlegroups.com, Derek Schuff, Nicholas Fullagar, David Sehr
You guys are amazing! Will watch out for the fix to arrive in Chrome and the native client toolchain.

Soeren