How to use shared-memory inter-process communication


Yuya MARUYAMA

Dec 13, 2015, 10:55:27 PM
to ROS SIG NG ROS
I am researching ROS 2 (alpha2) and having trouble getting shared-memory inter-process communication to work on Connext and OpenSplice using ros2/examples (talker and listener).
Judging from the performance of the examples, inter-process communication goes over sockets. I would like to know whether I need extra configuration in DDS or ROS 2 to use shared memory.
Could you tell me how to enable shared-memory inter-process communication?

Regards,
Maruyama

William Woodall

Dec 13, 2015, 11:43:01 PM
to ROS SIG NG ROS
Inter-process shared memory is a function of the underlying middleware. If you're using OpenSplice, then I'd recommend having a look at their deployment guide: http://www.prismtech.com/download-documents/1331

Connext is configured to use shared memory by default, and on my Mac I often run into exhausted shared memory when running tests in parallel, for which there is a workaround: https://community.rti.com/kb/osx510

However, we do enable discovery via loopback in Connext so that we can do Connext/OpenSplice compatibility tests: https://github.com/ros2/rmw_connext/blob/45596c37d7aca76b1806751b96863b0378f55af5/rmw_connext_shared_cpp/src/shared_functions.cpp#L246-L260

I believe this doesn't prevent Connext-to-Connext communication from occurring over shared memory, but I haven't verified that myself. If you're only using Connext, then you could try changing that option and recompiling our code. If you see a difference, please open an issue against rmw_connext: https://github.com/ros2/rmw_connext/issues

If you're using eProsima's FastRTPS, then you may not get shared memory transport on the local host, but I'm not sure about that (I don't actually know if they support it).

Also, we're working on giving a good out-of-the-box experience with each of the supported vendors, and that includes good performance, but we haven't quite gotten to the point where that is the case. I'm certain there are plenty of options and optimizations we still need to make.

Hope that helps,

--
You received this message because you are subscribed to the Google Groups "ROS SIG NG ROS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-ng-ro...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
William Woodall
ROS Development Team

Yuya MARUYAMA

Dec 14, 2015, 1:34:47 AM
to ROS SIG NG ROS
Thank you for your quick reply.

On Monday, December 14, 2015 at 1:43:01 PM UTC+9, William Woodall wrote:
Inter-process shared memory is a function of the underlying middleware. If you're using OpenSplice, then I'd recommend having a look at their deployment guide: http://www.prismtech.com/download-documents/1331

I will read this guide and adjust OpenSplice's configuration. 


Connext is configured to use shared memory by default, and on my Mac I often run into exhausted shared memory when running tests in parallel, for which there is a workaround: https://community.rti.com/kb/osx510

However, we do enable discovery via loopback in Connext so that we can do Connext/OpenSplice compatibility tests: https://github.com/ros2/rmw_connext/blob/45596c37d7aca76b1806751b96863b0378f55af5/rmw_connext_shared_cpp/src/shared_functions.cpp#L246-L260

I believe this doesn't prevent Connext to Connext communication from occurring over shared memory, but I haven't verified that myself. If you're only using Connext then you could try changing that option and recompiling our code. If you see a difference, please open an issue against rmw_connext: https://github.com/ros2/rmw_connext/issues

 
I ran talker and listener from ros2/examples locally using Connext and ROS 2 (alpha2) on Linux (Ubuntu 14.04). However, during the transport I observed RTPS packets in Wireshark, so the transport seems to happen without shared memory. I did not change RTI Connext's configuration after installation. Is this a ROS 2 problem or a Connext problem?


Regards,
Maruyama

Morgan Quigley

Dec 14, 2015, 1:47:16 AM
to ros-sig...@googlegroups.com
Did the observed RTPS messages contain data submessages, or were
they just discovery or negotiation submessages? The default
configuration of Connext (or any DDS implementation) will
periodically broadcast RTPS messages on the LAN to discover other
participants. Depending on the configuration of the DDS implementation,
though, many implementations will use shared memory to send their
peer-to-peer data submessages if they discover that both participants
are on the same machine.

It can be confusing to sift through all the RTPS network traffic, but
if you flip through the packets in Wireshark after the initial flurry
of discovery/negotiation messages (i.e., the network traffic a few
seconds after the nodes start talking to each other), you can look at
the ASCII representation pane to see if the data messages actually are
being sent, since you'd see the "Hello, World" strings in there.

Dirk Thomas

Dec 14, 2015, 1:51:06 AM
to ROS SIG NG ROS
Hi Maruyama,

I ran talker and listener from ros2/examples locally using Connext and ROS 2 (alpha2) on Linux (Ubuntu 14.04). However, during the transport I observed RTPS packets in Wireshark, so the transport seems to happen without shared memory. I did not change RTI Connext's configuration after installation. Is this a ROS 2 problem or a Connext problem?

For Connext we currently force local traffic to be sent over the loopback interface (https://github.com/ros2/rmw_connext/blob/095582313cb30b8b0a3f2394f2d81fe49aa3be83/rmw_connext_shared_cpp/src/shared_functions.cpp#L253-L264). Otherwise a Connext process wouldn't talk to an OpenSplice process at all, which we need for cross-vendor testing. Currently there is no option to toggle that behavior, but there clearly should be one in the future. Maybe you can comment the property out and try it again?

Cheers,
- Dirk

Yuya MARUYAMA

Dec 14, 2015, 7:20:50 AM
to ROS SIG NG ROS
Morgan and Dirk, I appreciate your help.


On Monday, December 14, 2015 at 3:51:06 PM UTC+9, Dirk Thomas wrote:

For Connext we currently force local traffic to be sent over the loopback interface (https://github.com/ros2/rmw_connext/blob/095582313cb30b8b0a3f2394f2d81fe49aa3be83/rmw_connext_shared_cpp/src/shared_functions.cpp#L253-L264). Otherwise a Connext process wouldn't talk to an OpenSplice process at all, which we need for cross-vendor testing. Currently there is no option to toggle that behavior, but there clearly should be one in the future. Maybe you can comment the property out and try it again?

 
After commenting out the property and rebuilding ROS 2, shared-memory inter-process communication works on Connext. I have confirmed that no RTPS packets containing messages appear.

However, despite using shared memory, the transport takes much longer than ROS 1 loopback transport. Does the shared-memory transport still require serialization/deserialization because ROS 2 is not yet optimized? Or I wonder whether converting DDS messages into ROS messages adds non-trivial overhead at the current stage of ROS 2.


Regards,
Maruyama

Geoffrey Biggs

Dec 14, 2015, 8:30:25 AM
to ros-sig...@googlegroups.com
Connext DDS still marshals messages even when using shared memory. I'm told this is done so that the various tools (logger, observer, etc.) work. To avoid marshalling, you need to use the native ROS 2 intra-process transport, which obviously requires putting your nodes in the same process.

Geoff
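Geoff's point can be illustrated without any middleware: once publisher and subscriber share an address space, the publisher can hand over ownership of the message instead of marshalling it. A minimal sketch (the types and names below are illustrative, not the rclcpp API):

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <string>

// Illustrative message type; stands in for a ROS 2 message.
struct StringMsg {
  std::string data;
};

// A toy intra-process "topic": ownership of the message is handed over,
// nothing is serialized, so large payloads cost only a pointer move.
class IntraProcessTopic {
public:
  void publish(std::unique_ptr<StringMsg> msg) {
    queue_.push(std::move(msg));
  }
  std::unique_ptr<StringMsg> take() {
    if (queue_.empty()) {
      return nullptr;
    }
    auto msg = std::move(queue_.front());
    queue_.pop();
    return msg;
  }
private:
  std::queue<std::unique_ptr<StringMsg>> queue_;
};
```

Because ownership moves with the `unique_ptr`, nothing is copied or serialized; this is the property the native intra-process transport exploits, at the cost of requiring both nodes to live in the same process.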

Dirk Thomas

Dec 14, 2015, 10:55:29 AM
to ROS SIG NG ROS
Hi Maruyama,

However, despite using shared memory, the transport takes much longer than ROS 1 loopback transport.

Without knowing exactly how you measure the time, it is difficult to provide any feedback. I would certainly not expect significantly different results from these two approaches. Maybe adding a simple benchmark to the tests would be good to ensure it performs as expected - but we haven't gotten to that yet.

Also there are numerous DDS and vendor-specific configuration options which might affect the performance.

Cheers,
- Dirk

Yuya MARUYAMA

Dec 16, 2015, 11:57:28 PM
to ROS SIG NG ROS
Thank you for your reply.
Without knowing exactly how you measure the time, it is difficult to provide any feedback. I would certainly not expect significantly different results from these two approaches. Maybe adding a simple benchmark to the tests would be good to ensure it performs as expected - but we haven't gotten to that yet.
On Linux (not in a VM), small messages (up to ~64 KB) are not significantly different from ROS 1; I noted that ROS 2 is a little slower than ROS 1, in spite of shared memory. I expected this transport to be faster than ROS 1 thanks to shared memory. (I measured the time from *->publish(msg) to the callback with clock_gettime().) In addition, when I transport large messages, the difference between ROS 1 and ROS 2 becomes larger and non-trivial.
Also there are numerous DDS and vendor-specific configuration options which might affect the performance.
I need to learn more about DDS configuration. I'll study it.

In addition to the above, I would like to know about the ROS 2 development environments. Is ROS 2 developed mainly on Mac or Linux (with/without a VM)?


Best regards,
Maruyama
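For reference, the publish-to-callback measurement described above can be sketched as follows. The `publish` hook here is a stand-in that must eventually cause the callback to run (here, synchronously); in the real test it would be the rclcpp publisher, so this is a sketch of the timing pattern, not of the ROS 2 API:

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <functional>

// Returns CLOCK_MONOTONIC time in nanoseconds.
static int64_t now_ns() {
  timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Measures the time from "publish" to the subscriber callback.
// `publish` stands in for publisher->publish(msg); it receives the
// callback that records the arrival time.
static int64_t measure_latency_ns(
  const std::function<void(std::function<void()>)> & publish)
{
  const int64_t t_publish = now_ns();
  int64_t t_callback = 0;
  publish([&]() { t_callback = now_ns(); });  // callback stamps arrival
  return t_callback - t_publish;
}
```

Using CLOCK_MONOTONIC (rather than CLOCK_REALTIME) avoids jumps from NTP adjustments; for cross-process measurements both processes must read the same clock.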

Yuya MARUYAMA

Dec 17, 2015, 12:09:15 AM
to ROS SIG NG ROS
Thank you for your advice.


On Monday, December 14, 2015 at 10:30:25 PM UTC+9, Geoffrey Biggs wrote:

Connext DDS still marshals messages even when using shared memory. I'm told that this is done so that the various tools (logger, observer, etc.) work. To avoid marshalling, you need to use the native ROS2 intra-process transport, which obviously requires putting your nodes in the same process.

Geoff
I will use the intra-process transport to avoid marshalling.

Best regards,
Maruyama

Dirk Thomas

Dec 17, 2015, 1:24:38 AM
to ROS SIG NG ROS
On Wed, Dec 16, 2015 at 8:57 PM, Yuya MARUYAMA <cir...@gmail.com> wrote:
In addition to the above, I would like to know about the ROS 2 development environments. Is ROS 2 developed mainly on Mac or Linux (with/without a VM)?

Most of us work directly on an Ubuntu Trusty system right now. William runs OS X by default. Only for Windows do we regularly use VMs or separate machines.

But all three platforms are being tested by our CI jobs (on native machines).

Cheers,
- Dirk

William Woodall

Dec 17, 2015, 1:52:18 AM
to ROS SIG NG ROS
On Wed, Dec 16, 2015 at 8:57 PM, Yuya MARUYAMA <cir...@gmail.com> wrote:
Thank you for your reply.
Without knowing exactly how you measure the time, it is difficult to provide any feedback. I would certainly not expect significantly different results from these two approaches. Maybe adding a simple benchmark to the tests would be good to ensure it performs as expected - but we haven't gotten to that yet.
On Linux (not in a VM), small messages (up to ~64 KB) are not significantly different from ROS 1; I noted that ROS 2 is a little slower than ROS 1, in spite of shared memory. I expected this transport to be faster than ROS 1 thanks to shared memory. (I measured the time from *->publish(msg) to the callback with clock_gettime().) In addition, when I transport large messages, the difference between ROS 1 and ROS 2 becomes larger and non-trivial.

In one of the upcoming alpha sprints we'll probably take a closer look at performance versus ROS 1. As for the shared memory transport being slow, there is a very good chance we're doing something unintelligent which causes the discrepancy :).

ROS 1 should actually be very close in performance to a DDS shared memory transport, and I don't expect a significant performance gain (if any) with inter-process communication on the same machine. This is because TCP over the local loopback is very efficient.

For inter-process communication across the network, one thought off the top of my head is that you can try to increase the UDP packet size if your system and network support it. This post on the RTI community site was helpful when I was prototyping this stuff: https://community.rti.com/comment/233#comment-233

There's a similar setting for OpenSplice in their config.xml, you can see our default one here (already at 64k): https://github.com/ros2/rmw_opensplice/blob/master/opensplice_cmake_module/config/ros_ospl.xml#L30

Increasing the UDP packet size (and maybe the UDP packet buffer size) on your system should improve performance. It means DDS needs to split your message into fewer packets, which means less overhead in transport and fewer system calls.

I think there are some things we can improve to provide a better "large message" profile which also performs relatively well for small messages. We've already planned for the idea of different profiles for different use cases, but we haven't taken the time to refine them. If you find any good configurations, I'd be interested to try them.
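As a sketch of the system-side knob mentioned above: on Linux, the per-socket UDP send buffer can be enlarged with setsockopt(SO_SNDBUF). The function below is illustrative and vendor-independent; the DDS message/packet size limits themselves live in each vendor's QoS or config file:

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// Requests a larger send buffer on a UDP socket and returns the size the
// kernel actually granted (Linux doubles the request and caps it at
// net.core.wmem_max). Returns -1 on error.
static int set_udp_send_buffer(int sock, int bytes) {
  if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) != 0) {
    return -1;
  }
  int granted = 0;
  socklen_t len = sizeof(granted);
  if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &granted, &len) != 0) {
    return -1;
  }
  return granted;
}
```

The analogous receive-side option is SO_RCVBUF; both are capped by the net.core.wmem_max / net.core.rmem_max sysctls, which may also need raising system-wide.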
Also there are numerous DDS and vendor-specific configuration options which might affect the performance.
I need to know more knowledge about DDS configuration. I'll study it.

In addition to the above, I would like to know environments of ros2 development. Is ros2 implemented mainly on Mac or Linux (with/without VM)?

There's a mixture. Most people at OSRF use Linux, but I use OS X. We have CI for Windows, OS X, and Linux.




Yuya MARUYAMA

Dec 17, 2015, 5:03:12 AM
to ROS SIG NG ROS
William and Dirk, I really appreciate your replies.

On Thursday, December 17, 2015 at 3:52:18 PM UTC+9, William Woodall wrote:

In one of the upcoming alpha sprints we'll probably take a closer look at performance versus ROS 1. As for the shared memory transport being slow, there is a very good chance we're doing something unintelligent which causes the discrepancy :).

ROS 1 should actually be very close in performance to a DDS shared memory transport, and I don't expect a significant performance gain (if any) with inter-process communication on the same machine. This is because TCP over the local loopback is very efficient.

For inter-process communication across the network, one thought off the top of my head is that you can try to increase the UDP packet size if your system and network support it. This post on the RTI community site was helpful when I was prototyping this stuff: https://community.rti.com/comment/233#comment-233

There's a similar setting for OpenSplice in their config.xml, you can see our default one here (already at 64k): https://github.com/ros2/rmw_opensplice/blob/master/opensplice_cmake_module/config/ros_ospl.xml#L30

Increasing the UDP packet size (and maybe the UDP packet buffer size) on your system should improve performance. It means DDS needs to split your message into fewer packets, which means less overhead in transport and fewer system calls.

I think there are some things we can improve to provide a better "large message" profile which also performs relatively well for small messages. We've already planned for the idea of different profiles for different use cases, but we haven't taken the time to refine them. If you find any good configurations, I'd be interested to try them.


I'll learn and tune the DDS configurations (Connext and OpenSplice) for large messages.


Best regards,
Maruyama
