barrier() only waits for all invocations in a single work group. There is no synchronization between work groups.
The first barrier makes sure the tile is fully loaded: every invocation in the work group must have finished writing its part of the tile to shared memory before any invocation reads it.
The second barrier makes sure every invocation has finished accumulating the current tile into acc. Without it, a faster invocation could overwrite the tile with the next tile's data while other invocations are still reading it.
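To make the two barriers concrete, here is a minimal sketch of a tiled matrix multiply in GLSL ES 3.10. It is not the shader from this thread: the buffer names, the uniform N, and the 16x16 tile size are assumptions for illustration, and N is assumed to be a multiple of the tile size.

```glsl
#version 310 es
precision highp float;

// Hypothetical names and sizes, chosen only for this illustration.
const uint TS = 16u;                        // tile size, must match local_size
layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly buffer MatA { float A[]; };
layout(std430, binding = 1) readonly buffer MatB { float B[]; };
layout(std430, binding = 2) writeonly buffer MatC { float C[]; };

uniform uint N;                             // square N x N matrices, N % TS == 0 assumed

shared float Asub[TS][TS];
shared float Bsub[TS][TS];

void main() {
    uint row = gl_LocalInvocationID.y;
    uint col = gl_LocalInvocationID.x;
    uint globalRow = gl_GlobalInvocationID.y;
    uint globalCol = gl_GlobalInvocationID.x;

    float acc = 0.0;
    for (uint t = 0u; t < N / TS; ++t) {
        // Each invocation loads one element of each tile into shared memory.
        Asub[row][col] = A[globalRow * N + (t * TS + col)];
        Bsub[row][col] = B[(t * TS + row) * N + globalCol];

        // First barrier: the whole tile must be loaded before anyone reads it.
        barrier();

        for (uint k = 0u; k < TS; ++k) {
            acc += Asub[row][k] * Bsub[k][col];
        }

        // Second barrier: everyone must be done reading this tile
        // before the next iteration overwrites it.
        barrier();
    }

    C[globalRow * N + globalCol] = acc;
}
```

Both barrier() calls sit in uniform control flow (the loop bound depends only on the uniform N), which compute shaders require. Whether a memoryBarrierShared() call is also needed in front of each barrier() is the question raised further down in the thread, so this sketch leaves it out rather than answering it.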
On the D3D side, the translation is as follows:
barrier -> GroupMemoryBarrierWithGroupSync
memoryBarrierShared -> GroupMemoryBarrier
memoryBarrierAtomicCounter -> DeviceMemoryBarrier
memoryBarrierBuffer -> DeviceMemoryBarrier
memoryBarrierImage -> DeviceMemoryBarrier
memoryBarrier -> AllMemoryBarrier
Details can be found here.
Regards,
Jiajia
As I understand it, barrier() only waits for all invocations in a WG to arrive (it does not wait for their writes).
barrier(CLK_LOCAL_MEM_FENCE); is the only such operator in OpenCL.
barrier(CLK_LOCAL_MEM_FENCE); in OpenCL looks like memoryBarrierShared(), not barrier().
So does barrier(); only wait for all invocations in a WG, without guaranteeing that the writes accumulated into acc are accurate?
In OpenGL, shared memory may not be initialized at its declaration. CUDA does not permit that either.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#assignment-operator
Would you please check whether this automatically inserted zero initialization is causing the problem?
Can you quote the sentence that says initializing shared memory is not permitted in OpenGL? I can only find: 'Variables declared as shared may not have initializers and their contents are undefined at the beginning of shader execution.'
In WebGL, undefined behavior is not allowed, for security reasons. We should clearly specify what happens wherever the native behavior is undefined. Here are some discussions about webgl2-compute security and shared memory initialization in ANGLE.
Yes, I have confirmed it, but I don't have a good fix yet. For now, I have simply disabled the initialization of shared memory. Feel free to comment on the bug.
1. Shall I remove the memoryBarrierShared(); call in WebGL2-compute (ES 3.1 based)?
2. How does it work in D3D11?
In my investigation, I also found that this worked well on D3D before your fix (simply disabling the initialization of shared memory). So at least [forcing initialization at declaration] + [forcing synchronization at the top of the main function] may work well. But synchronization hurts performance, as we know...
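Written out at the GLSL level, the "initialize + synchronize at the top of main" idea would look roughly like the sketch below. This is only an illustration of the ordering problem, not what ANGLE actually generates (its output is HLSL). The array name, its size, and the result buffer are hypothetical.

```glsl
#version 310 es
precision highp float;

layout(local_size_x = 64) in;

// Hypothetical output buffer and scratch array, for illustration only.
layout(std430, binding = 0) buffer Result { float result[]; };

shared float scratch[64];

void main() {
    uint i = gl_LocalInvocationIndex;

    // Emulated zero initialization: each invocation clears one element.
    scratch[i] = 0.0;

    // Without this barrier, a slow invocation could still be clearing
    // an element that a fast invocation has already written below.
    barrier();

    // The user's code starts here.
    scratch[i] = float(i);
    barrier();
    result[gl_GlobalInvocationID.x] = scratch[63u - i];
}
```

In this sketch each invocation only clears the element it later writes itself, so the hazard would not actually occur here. It becomes real as soon as the clearing is distributed differently from the later writes (for example, one invocation clearing the whole array), which is why the extra barrier, and its cost, comes up.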