How to do external memory synchronization across processes

梁蕴锋

Nov 3, 2022, 5:36:58 AM

to angleproject

Hi, I have a question about Vulkan's external memory usage that has been bothering me for a while. I have done some research online but am still confused about some of the concepts and details.

I know it's not particularly related to ANGLE, but as an ANGLE follower I know how wonderful the project is, and I'm sure you could enlighten me on this one. Here is the problem, simplified:

1. There are three processes, A, B, and S, each with its own VkInstance.
2. In process S, I allocate a VkDeviceMemory Sm, initialize its contents, and export a Win32 handle.
3. I pass the handle to A. In process A, I create a VkDeviceMemory Am by importing the handle, so Am should now be an alias of Sm.
4. I create a VkImage Ai in A and bind Am to Ai.
5. I repeat steps 3 and 4 for process B, creating memory object Bm and VkImage Bi.
6. In A and B, I sample the images in shaders to draw something.

When I test these steps on my GTX 1650 with the latest driver, everything works well, without any visible defects. Here are my questions:

1. After the memory is initialized, it remains read-only. In that case, can I skip all synchronization?
In my testing, having no synchronization seems to be OK. I wonder whether this is correct at the Vulkan API level, or whether it just happens to work because of this particular driver's behavior.

2. If synchronization must be performed, how exactly should it be done?
Should I wait on a semaphore before any draw call that uses the resource, and signal it afterwards, in every queue submission in both A and B?
Doing that would serialize all reads of the resource, right? Performance-wise, is that the best way to go?

3. If A and B both do random reads and writes to the images, how should they synchronize?
Is there a standard way or a best practice?

I have found some related articles about interoperation between Vulkan and other APIs, such as OpenGL, but none of them cleared this up for me. Any help will be much appreciated, thanks!

Yours

Liang.

Shahbaz Youssefi

Nov 3, 2022, 10:37:52 AM

to angleproject

Hi,

In your example, you'd need synchronization from S to A, and also from S to B. This is to make sure that:

- S finishes initialization before A and B start using the image

- The relevant caches are flushed / invalidated for the data written by S to be available / visible to A and B

It _works_ for you now probably because:

- Timing happens to make S finish before A and B on the GPU

- On your particular GPU, your methods of writing to and reading from the image happen to be on the same cache path

And to answer your questions:

1 and 2. Not _all_ of it. As I mentioned above, you'd need synchronization for S->A and S->B. You don't need to synchronize A and B with each other because, as you mentioned, both only read.

2 and 3. Since this is happening between instances, I believe you'd need to export your semaphore from S and import it into A and B. That's done similarly to how you exported the memory itself. Watch out for binary semaphores: I don't believe you can have both processes wait on the same semaphore, so you'd need two semaphores, one per consumer.

Cheers,

梁蕴锋

Nov 4, 2022, 9:11:50 PM

to angleproject

Thank you very much for the reply, certainly very helpful!

梁蕴锋

Nov 11, 2022, 11:24:35 PM

to angleproject

Hi, after the previous discussion I did some experiments to understand more about synchronization in Vulkan. I now have some follow-up questions.

Here is what I did, still with three processes, S, A, and B:
1. In S, I allocate a VkDeviceMemory and export the Win32 handle to A and B.
2. In A, I create a VkImage, import the memory handle, and bind the imported memory to the image.
3. In A, I initialize the VkImage with a compute shader or a copy operation. The shared memory should be initialized after this.
4. In A, I use the image to render something every frame.

At this point everything works fine: process A renders correctly, at 30 FPS.
After a significant amount of time, about one minute, I start B and do the following:
5. In B, I create a VkImage, import the memory handle, and bind the imported memory to the image.
6. In B, I use the image to render something every frame.

On several AMD cards I tested, A's rendering stays correct the whole time, while B's rendering is wrong. Here are the snapshots:

Note the blue background images, which are sharing external memory. Here are my questions:

1. In the steps above, I did not use a semaphore to synchronize A and B, because I had waited long enough that B would not start reading until A's writes were done,
so no semaphore is needed in this scenario. Am I right?

2. Process B seems to be reading the memory A wrote, but interpreting it differently. The image from A appears to be cut into blocks and rearranged in B.
Why is that? I don't think this is a cache problem (in which case the content should be blank or undefined, right?).

3. Is this caused by the missing semaphore synchronization between A and B, or is it related to the image layout transition or the queue ownership transfer?
What can I do to make it right?

I know that many operations may be missing here, but I want to understand the exact cause of the issue before fixing it.
I find the Vulkan spec a bit vague and hard to follow on inter-instance synchronization.

I also tested on several Nvidia cards, and none of them show the issue. It only appears on AMD PC cards. Is there any chance that this is a driver issue (although unlikely)?

Thanks!

On Thursday, November 3, 2022 at 22:37:52 UTC+8, <syou...@chromium.org> wrote:

Shahbaz Youssefi

Nov 15, 2022, 10:13:44 AM

to angleproject

Hi,

First, I'm assuming A stops using the image once B starts, right?

That said, short answers to your questions are:

On Friday, November 11, 2022 at 11:24:35 PM UTC-5 lyf....@gmail.com wrote:

Note the blue background images, which are sharing external memory. Here are my questions:

1. In the steps above, I did not use a semaphore to synchronize A and B, because I had waited long enough that B would not start reading until A's writes were done,
so no semaphore is needed in this scenario. Am I right?

No, you'd still need a semaphore. A may have finished, but its caches could still be dirty.

2. Process B seems to be reading the memory A wrote, but interpreting it differently. The image from A appears to be cut into blocks and rearranged in B.
Why is that? I don't think this is a cache problem (in which case the content should be blank or undefined, right?).

It could be a cache problem, but more likely an image layout problem.

3. Is this caused by the missing semaphore synchronization between A and B, or is it related to the image layout transition or the queue ownership transfer?
What can I do to make it right?

I don't know the details of your app, but if B uses the image with the wrong layout then yes, you'd get a mess. You need to do the queue family ownership transfer (QFOT) and pass the layout information from A to B. Alternatively, you can skip that, but then B needs to transition the image from UNDEFINED, which discards A's rendering.

I know that many operations may be missing here, but I want to understand the exact cause of the issue before fixing it.
I find the Vulkan spec a bit vague and hard to follow on inter-instance synchronization.

I also tested on several Nvidia cards, and none of them show the issue. It only appears on AMD PC cards. Is there any chance that this is a driver issue (although unlikely)?

Nvidia might very well be using the same memory layout for multiple Vulkan layouts, while AMD could be using different ones.
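
To illustrate why a layout mismatch can look like blocks being cut up and rearranged, here is a toy model in plain C with no Vulkan involved. It assumes an invented 8x8 single-byte image stored as 4x4 tiles. This is not the layout of any real GPU, only a stand-in for "two different memory arrangements of the same image". One side writes texels with the tiled addressing, and the other reads the same bytes with linear addressing.

```c
#include <stddef.h>

enum { WIDTH = 8, HEIGHT = 8, TILE = 4 };

/* Offset of texel (x, y) when rows are stored one after another. */
static size_t linear_offset(size_t x, size_t y)
{
    return y * WIDTH + x;
}

/* Offset of texel (x, y) when the image is stored as TILE x TILE blocks:
 * blocks in row-major order, texels in row-major order inside a block. */
static size_t tiled_offset(size_t x, size_t y)
{
    size_t tiles_per_row = WIDTH / TILE;
    size_t tile_index = (y / TILE) * tiles_per_row + (x / TILE);
    size_t within_tile = (y % TILE) * TILE + (x % TILE);
    return tile_index * TILE * TILE + within_tile;
}

/* Writes texel value (y * WIDTH + x) using the tiled addressing, then reads
 * the same memory back using the linear addressing, as two processes would
 * if they disagreed about how the image is arranged in memory. */
static void write_tiled_read_linear(unsigned char out[HEIGHT][WIDTH])
{
    unsigned char memory[WIDTH * HEIGHT];
    for (size_t y = 0; y < HEIGHT; ++y)
        for (size_t x = 0; x < WIDTH; ++x)
            memory[tiled_offset(x, y)] = (unsigned char)(y * WIDTH + x);
    for (size_t y = 0; y < HEIGHT; ++y)
        for (size_t x = 0; x < WIDTH; ++x)
            out[y][x] = memory[linear_offset(x, y)];
}
```

In this model every texel value is still present after the mismatched read, just at the wrong position. For example, the texel read at (4, 0) is the one that was written at (0, 1). That matches the symptom of intact but rearranged content better than a stale cache would.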

Thanks!

You're welcome. As your questions are getting more detailed, please ask further questions on stackoverflow.com.
