master discovery vs connectivity state (was mm revival, next steps)


Jeff Rousseau <jrousseau@aptima.com>

May 2, 2012, 3:47:44 PM
to ros-s...@googlegroups.com
After running through some zeroconf_implementation examples, I see now that it provides a ROS service for Zeroconf services and a couple ros-topics for following network-service add/removes. Due to the nomenclature clash I’ll differentiate the two ‘services’ by using the terms network-service and ros-service.

> From: ros-s...@googlegroups.com [mailto:ros-s...@googlegroups.com] On Behalf Of Daniel Stonier
> Sent: Wednesday, May 02, 2012 9:43 AM
> To: ros-s...@googlegroups.com
> Subject: Re: [ros-sig-mm] Re: [Ros-sig-mm] mm revival, next steps
...
> The app manager is responsible for what is publicly exposed by a robot, not the zeroconf node. In addition, it also
> sets up callbacks/ros pubsubs to keep track of zeroconf services as they come online or offline (or out of wireless
> range).

Daniel, in your experience have you reliably gotten remove_service (network-service) callbacks to trigger when machines providing network-services go out of network suddenly? In the past I’ve found this hasn’t always been the case with certain SD implementations I’ve played with—which results in some odd situations where you think a service still exists after a network disconnect...

Which leads me to bring up the confusion between network-service availability (where nodes with topics/ros-services are either running/publishing or not) and machine connection-state (online/offline); can we bunch these two together as if they’re the same thing and get away with it? My guess is: no.

Currently the master backend relies on XMLRPC (HTTP over TCP) calls and doesn’t actually know anything about network disconnects: it assumes a persistent network connection where client nodes call ‘unregister’ methods over RPC. The current master sync node in multimaster_experimental similarly relies on XMLRPC. To support a common service and connectivity abstraction we’d have to push the SD callbacks into the core of the master, making an SD protocol implementation like Zeroconf a hard requirement. I’m not 100% convinced this is a good idea.

In the relatively distant past Ken has suggested re-implementing the master API using something like Redis that has built-in support for replication and TTL (I assume for master sync & registrations?). However, something like this might be approaching “ROS 2.0” territory…
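
To make the TTL idea concrete, here is a minimal in-memory sketch in plain Python. It does not use Redis and is not taken from Ken's proposal; the class name `TTLRegistry` and its methods are hypothetical. It only illustrates registrations that disappear on their own unless the node keeps refreshing them, which is the property the current 'unregister'-based master lacks.

```python
import time


class TTLRegistry:
    """Registrations that expire unless refreshed (hypothetical sketch)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl          # seconds a registration stays valid
        self.clock = clock      # injectable so the expiry logic can be tested
        self._expires = {}      # (topic, node_uri) -> expiry timestamp

    def register(self, topic, node_uri):
        # Registering again acts as a heartbeat and pushes the expiry forward.
        self._expires[(topic, node_uri)] = self.clock() + self.ttl

    def unregister(self, topic, node_uri):
        # Explicit unregister still works when the node shuts down cleanly.
        self._expires.pop((topic, node_uri), None)

    def publishers(self, topic):
        # Drop anything whose lease has run out, then list what is left.
        now = self.clock()
        self._expires = {k: t for k, t in self._expires.items() if t > now}
        return sorted(uri for (tp, uri) in self._expires if tp == topic)
```

A node that drops off the network simply stops refreshing, so its entries age out after `ttl` seconds without anyone having to detect the disconnect.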

PS Could you elaborate on what DARC is? Google seems to return some irrelevant results.

From: ros-s...@googlegroups.com [mailto:ros-s...@googlegroups.com] On Behalf Of Daniel Stonier
Sent: Wednesday, May 02, 2012 9:43 AM
To: ros-s...@googlegroups.com
Subject: Re: [ros-sig-mm] Re: [Ros-sig-mm] mm revival, next steps


On 2 May 2012 20:01, Jeff Rousseau <jrou...@cs.uml.edu> wrote:
Ah, simple misunderstanding.  I was browsing the source in the repo instead of reading the tutorials. Sorry for the confusion.  I had found the ZeroConfNode source and thought it was an example of how to write a zeroconf-enabled node.  I'll run through the tutorials

 
No worries.

Daniel.
 
On Wed, May 2, 2012 at 3:55 AM, Daniel Stonier <d.st...@gmail.com> wrote:

On 1 May 2012 23:38, Jeff Rousseau <jrou...@aptima.com> wrote:
The avahi/dbus node I have is rather primitive—the source is currently unpublished (I’d have to go through contracts/legal to publish it as BSD).  All it does though is read two ros params for “name” and “port” and add a service a la (python):
 
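import avahi
import dbus

bus = dbus.SystemBus()
server = dbus.Interface(bus.get_object(avahi.DBUS_NAME, avahi.DBUS_PATH_SERVER),
                        avahi.DBUS_INTERFACE_SERVER)
# entry group, renamed from "in" because "in" is a reserved word in Python
group = dbus.Interface(bus.get_object(avahi.DBUS_NAME,
                                      server.EntryGroupNew()),
                       avahi.DBUS_INTERFACE_ENTRY_GROUP)
group.AddService(avahi.IF_UNSPEC, avahi.PROTO_UNSPEC, dbus.UInt32(0),
                 self.robot_name, "_ros._tcp", "", "",
                 dbus.UInt16(self.port), "")
group.Commit()  # the service is only announced once the group is committed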
 
My aim was for “public” masters to discover each other, using the inherent ability to query a list of topics/services using the XMLRPC API that already exists.
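
As a rough illustration of querying a discovered master over the existing XMLRPC API, the sketch below uses the Master API call `getSystemState`. The function names and the default caller id `/mm_discovery` are made up for the example; only the `getSystemState` call and its return layout come from the ROS Master API.

```python
import xmlrpc.client


def connect(master_uri):
    # Creating the proxy does not open a connection; calls do.
    return xmlrpc.client.ServerProxy(master_uri)


def list_topics_and_services(master, caller_id="/mm_discovery"):
    """Return (topics, services) currently advertised on a ROS master.

    `master` is anything exposing the Master API getSystemState call,
    for example the proxy returned by connect().
    """
    code, msg, state = master.getSystemState(caller_id)
    if code != 1:  # Master API convention: 1 means success
        raise RuntimeError("getSystemState failed: %s" % msg)
    publishers, _subscribers, services = state
    # Each entry is [name, [node, node, ...]]; only the names are kept here.
    topics = sorted(name for name, _nodes in publishers)
    service_names = sorted(name for name, _nodes in services)
    return topics, service_names
```

With a master found through discovery, usage would look like `list_topics_and_services(connect("http://robot_a:11311"))`, where the URI is whatever host and port the zeroconf lookup resolved.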
 
So it looks like our approaches differ in that mine aims for discovery at the master level, while the zeroconf_implementation targets the node level (judging by a quick look in ZeroconfNode.cpp).  I’d like to hear arguments for putting the node developer in charge of how their node should be exposed on the network—it feels like a possible maintenance nightmare if I had to (re)configure every node and what topics/services it exposed on my network.  It seems like a simple master-level white list would suffice.  
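
A master-level white list could be as small as the sketch below. The function name `filter_public` and the glob-style patterns are assumptions made for illustration; nothing like this exists in the packages discussed here.

```python
from fnmatch import fnmatchcase


def filter_public(names, whitelist):
    """Keep only the topic/service names matching a whitelist pattern."""
    # "*" in a pattern matches any run of characters, including "/".
    return [n for n in names if any(fnmatchcase(n, pat) for pat in whitelist)]
```

For example, a whitelist of `["/scan", "/camera/*"]` would expose the laser scan and everything under the camera namespace while keeping `/rosout` and `/diagnostics` private, and an empty whitelist exposes nothing.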

Jeff, are you on the same page I am? There is no ZeroconfNode.cpp and there is nothing in the zeroconf_avahi package targeting the node level or even related to exposing a node's pubsubs and services. As you say, that is a configuration nightmare unless it's something built into the system (something I think DARC is aiming at).

There is the ZeroconfNode class, but the only reason it has been labelled with a 'Node' reference is that it runs as a standalone node. It is only responsible for publishing and discovering zeroconf services and doesn't know anything else about the rest of the system it is running with. I thought the API was fairly clear on that.

We use it to publish a ros master, exactly as you do. We also use it to discover a 'building manager' master that can be used with the multimaster app manager. And lastly, we also use it to advertise the multimaster app manager so that the building master can invite it to a multi-robot system. The app manager is responsible for what is publicly exposed by a robot, not the zeroconf node. In addition, it also sets up callbacks/ros pubsubs to keep track of zeroconf services as they come online or offline (or out of wireless range).

We're talking about the same thing, zeroconf_avahi just provides a few extra zeroconf related bells and whistles.
 
The tutorials should provide you with a clear idea of what it is doing.

Daniel.

 
From: ros-s...@googlegroups.com [mailto:ros-s...@googlegroups.com] On Behalf Of Daniel Stonier
Sent: Sunday, April 29, 2012 1:45 AM
To: ros-s...@googlegroups.com
Subject: Re: [ros-sig-mm] Re: [Ros-sig-mm] mm revival, next steps
 
 
On 24 April 2012 20:55, Jeff Rousseau <jrou...@aptima.com> wrote:
I currently have a simple node that uses Avahi to broadcast a named master with a specified port (so you can run many discoverable masters on one machine).  However, I may soon be forced to move to Bonjour due to a customer requirement.  As long as our service/proto strings match, it doesn’t really matter what mDNS solution we use as long as it’s largely Zeroconf compatible.
 
I agree. I've been using _ros-master._tcp, _ros-master._tcp and _app-manager._tcp (advertising a variant of willow's multimaster style app manager instead of the ros master). I do think zeroconf implementations would benefit from a consistent ros api (pub/subs) though. That way one could move a suite of programs watching/reacting to a zeroconf node's list of services from linux (avahi) to apple (avahi) to java-based (jmdns) without a code change. 
 
 
For our service/protocol string we just use “_ros._tcp” (obviously for TCP transport)
 
What kind of “interface” did you have in mind? A ROS service & latched topic for master availability events perhaps?
 
Yes, very similar. I wanted to be able to notify when services appeared and disappeared rather than just resolving on the expectation that it was going to be there 100%. Both avahi and jmdns provide hooks for that (though that level of jmdns is still rather experimental). 
 
End result, I have both a c++ library and a node interface for doing that kind of thing (zeroconf_implementations) and the jmdns interface just has a jar file which wraps the same functionality rather more simply (zeroconf_android). These are still very early implementations and I'm happy to iterate on what's there.
 
Do you have some code up and about?
 
Daniel.
 
 
From: ros-s...@googlegroups.com [mailto:ros-s...@googlegroups.com] On Behalf Of Daniel Stonier
Sent: Tuesday, April 24, 2012 3:39 AM
To: ros-s...@googlegroups.com
Subject: [ros-sig-mm] Re: [Ros-sig-mm] mm revival, next steps
 
We've been using zeroconf mdns/dns-sd with avahi (linux) and jmdns (android) for a while now also. Having reasonably consistent 'ros' interfaces for configuring and setting these alongside a Bonjour implementation would be great. 
 
What kind of custom mDNS solution did you create?
 
 
The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.



 
 





--
Phone : +82-10-5400-3296 (010-5400-3296)
Home: http://snorriheim.dnsdojo.com/
Yujin Robot: http://www.yujinrobot.com/
Embedded Ros : http://www.ros.org/wiki/eros
Embedded Control Libraries: http://snorriheim.dnsdojo.com/redmine/wiki/ecl






Jeff Rousseau <jrousseau@aptima.com>

May 9, 2012, 9:45:31 AM
to ros-s...@googlegroups.com
Let me know if I'm way off base (again). I'm trying to figure out where discovery fits into a MM solution, and whether it’s a required component at all.

Perhaps network-service removes could trigger an XMLRPC call to the foreign masters to unregister, but that only works when nodes go down and not the network itself.

> -----Original Message-----
> From: ros-s...@googlegroups.com [mailto:ros-sig-
> m...@googlegroups.com] On Behalf Of Jeff Rousseau
> <jrou...@aptima.com>
> Sent: Wednesday, May 02, 2012 3:48 PM
> To: ros-s...@googlegroups.com
> Subject: [ros-sig-mm] master discovery vs connectivity state (was mm
> revival, next steps)
>
> After running through some zeroconf_implementation examples, I see now
> that it provides a ROS service for Zeroconf services and a couple ros-topics
> for following network-service add/removes. Due to the nomenclature clash
> I’ll differentiate the two ‘services’ by using the terms network-service and
> ros-service.
>

Jeff Rousseau <jrousseau@aptima.com>

May 9, 2012, 9:57:33 AM
to ros-s...@googlegroups.com
Errata inline:

> -----Original Message-----
> From: ros-s...@googlegroups.com [mailto:ros-sig-
> m...@googlegroups.com] On Behalf Of Jeff Rousseau
> <jrou...@aptima.com>
> Sent: Wednesday, May 09, 2012 9:46 AM
> To: ros-s...@googlegroups.com
> Subject: [ros-sig-mm] RE: master discovery vs connectivity state (was mm
> revival, next steps)
>
> Let me know if I'm way off base (again). I'm trying to figure out where
> discovery fits into a MM solution--and whether or not it’s a required
> component or not.
>
> Perhaps network-service removes could trigger an XMLRPC call to the foreign
> masters to unregister, but that only works when nodes go down and not the
> network itself.

I meant _foreign_ network-service and _local_ master in the above statement, and this may or may not work for network disconnects, depending on the SD implementation (I have to do further tests with Avahi).
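
A sketch of what that could look like: when the network-service for a foreign master is removed, unregister from the local master everything that was synced in from it. The bookkeeping dictionary and the function name are hypothetical; the `unregisterPublisher`, `unregisterSubscriber` and `unregisterService` calls are the existing Master API methods, each taking `(caller_id, name, api_uri)`.

```python
# Maps the kind of registration to the Master API method that undoes it.
UNREGISTER_METHODS = {
    "publisher": "unregisterPublisher",
    "subscriber": "unregisterSubscriber",
    "service": "unregisterService",
}


def drop_foreign_registrations(local_master, registrations, foreign_uri):
    """Unregister everything that was synced in from `foreign_uri`.

    `registrations` maps a foreign master URI to a list of
    (kind, node_name, name, api_uri) tuples recorded when the sync node
    registered them locally; this bookkeeping is an assumption of the sketch.
    """
    removed = []
    for kind, node_name, name, api_uri in registrations.pop(foreign_uri, []):
        method = UNREGISTER_METHODS.get(kind)
        if method is None:
            continue  # unknown kinds are skipped rather than guessed at
        # The node's own name is used as caller_id so the master can match
        # the registration that was made on the node's behalf.
        getattr(local_master, method)(node_name, name, api_uri)
        removed.append((kind, name))
    return removed
```

This would be called from the remove_service callback (or from a polling loop), so it is only as reliable as the removal notification itself.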

Armstrong-Crews, Nicholas - 1002 - MITLL

May 9, 2012, 10:24:29 AM
to ros-s...@googlegroups.com
Hi all, sorry for long radio silence.

For our use case (two or three robots running around with flaky wifi and sometimes out of wifi range), the features of service discovery, mDNS, and "automatic, simple, no-configuration" are not required. Yes, they would be nice, but I'm perfectly happy to assign each robot a static IP, manually specify routes, and trust my IP layer to tell me when there's a link from robot A to robot B. But I'd rather not have to write that logic into every single node.

Not being a network programming guru, I've had a hard time following these proceedings... but would you say that this simple use case is satisfied by any of the existing or proposed solutions? It seems like "rosmaster uses TCP" breaks everything.

Cheers,
-Nick

Daniel Stonier

May 10, 2012, 5:24:43 AM
to ros-s...@googlegroups.com
On 3 May 2012 04:47, Jeff Rousseau <jrou...@aptima.com> wrote:
> After running through some zeroconf_implementation examples, I see now that it provides a ROS service for Zeroconf services and a couple ros-topics for following network-service add/removes. Due to the nomenclature clash I’ll differentiate the two ‘services’ by using the terms network-service and ros-service.

To clarify the nomenclature further: 'discovery' covers two rather separate functionalities.
  1. detection and resolution of IPs and port numbers.
  2. awareness of network-services coming up and going down (robots moving in and out of wireless range, turning on/off).
zeroconf_implementations is experimenting with both: 1) can be done via a polling/listing mechanism and 2) via callbacks.
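
The two mechanisms are related: the add/remove events of 2) can be derived from the polling results of 1) by comparing consecutive listings. A minimal sketch, where the function name and the `(name, service_type)` snapshot format are illustrative assumptions rather than the zeroconf_implementations API:

```python
def diff_services(previous, current):
    """Compare two snapshots of discovered network-services.

    Each snapshot is an iterable of (name, service_type) pairs; the result
    is the (added, removed) pair a callback-based API would have reported.
    """
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)
```

Polling this way only notices a disappearance once the underlying listing drops the entry, so it inherits whatever cache lifetime the zeroconf daemon applies.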

1) Would be really nice to have. Just like printers: you can use zeroconf to find them, fall back to hostnames if that fails, then IPs if that fails too. Relying on IPs alone fails when you have DHCP, and relying on hostnames or IPs alone is OK in the research lab or for one-off systems, but setting up 50+ installations in classrooms and training teachers about IPs and port numbers... scary ;)
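
The printer-style fallback chain can be sketched as an ordered list of lookups, tried until one succeeds. Everything here is illustrative: `resolve_master` and `by_hostname` are hypothetical names, and the zeroconf lookup is left as a caller-supplied callable since its API depends on the implementation used.

```python
import socket


def resolve_master(lookups):
    """Try each lookup in order and return the first (host, port) found.

    `lookups` is a list of zero-argument callables that return a (host, port)
    pair or None, or raise OSError/LookupError when resolution fails.
    """
    for lookup in lookups:
        try:
            result = lookup()
        except (OSError, LookupError):
            continue  # this method failed, fall through to the next one
        if result:
            return result
    return None


def by_hostname(hostname, port):
    # socket.gethostbyname raises socket.gaierror (an OSError) on failure.
    return lambda: (socket.gethostbyname(hostname), port)
```

A call such as `resolve_master([zeroconf_lookup, by_hostname("robot_a", 11311), lambda: ("192.168.1.20", 11311)])` would then try zeroconf first, the hostname second and a configured address last; the host names and address are placeholders.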

2) Comments below.


> From: ros-s...@googlegroups.com [mailto:ros-s...@googlegroups.com] On Behalf Of Daniel Stonier
> Sent: Wednesday, May 02, 2012 9:43 AM
> To: ros-s...@googlegroups.com
> Subject: Re: [ros-sig-mm] Re: [Ros-sig-mm] mm revival, next steps
...
> The app manager is responsible for what is publicly exposed by a robot, not the zeroconf node. In addition, it also
> sets up callbacks/ros pubsubs to keep track of zeroconf services as they come online or offline (or out of wireless
> range).

> Daniel, in your experience have you reliably gotten remove_service (network-service) callbacks to trigger when machines providing network-services go out of network suddenly? In the past I’ve found this hasn’t always been the case with certain SD implementations I’ve played with, which results in some odd situations where you think a service still exists after a network disconnect...

This is a good question. We have seen the same thing. However, many of the problems we've discovered so far have been due to our poor usage of it rather than zeroconf itself. That being the case, I'd like to test it rigorously to at least determine why it is unreliable, rather than assuming it is a zeroconf problem and not a consequence of our usage.

If we could help improve the tools available for this - it'd save reinventing the wheel and I can't see any reason why it wouldn't be as reliable as any other network monitoring solution, but...needs exploring.

> Which leads me to bring up the confusion between network-service availability (where nodes with topics/ros-services are either running/publishing or not) and machine connection-state (online/offline); can we bunch these two together as if they’re the same thing and get away with it? My guess is: no.

Is machine connection-state really of interest? I'm only really concerned with whether the advertised network-service is available and resolvable.
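
One way to check "available" independently of the zeroconf cache is to try opening a TCP connection to the resolved host and port. This is a generic sketch with a hypothetical function name, not something zeroconf_avahi provides:

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolved name
        return False
```

A successful connect only shows that something is listening on that port; it says nothing about whether the master behind it is healthy, and on flaky wifi the timeout needs tuning.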

> Currently the master backend relies on XMLRPC (HTTP over TCP) calls and doesn’t actually know anything about network disconnects: it assumes a persistent network connection where client nodes call ‘unregister’ methods over RPC. The current master sync node in multimaster_experimental similarly relies on XMLRPC. To support a common service and connectivity abstraction we’d have to push the SD callbacks into the core of the master, making an SD protocol implementation like Zeroconf a hard requirement. I’m not 100% convinced this is a good idea.

I concur: I'm not convinced about putting this dependency in the master either.

However, does the master have to know about connections? I'd rather see everything in a separate node. Let the master do its job, and move the job of network-service monitoring to another node on the same machine (via zeroconf or otherwise). A robot in this situation can go out of range, come back, and unregister itself. That seems to work for us, but it does somewhat ignore a problem for MM: persistence is built into the system through the master's XML-RPC registrations.

> In the relatively distant past Ken has suggested re-implementing the master API using something like Redis that has built-in support for replication and TTL (I assume for master sync & registrations?).

I need to read up on this so I can talk sensibly about it. I'll try to do some reading while at ICRA. Sometimes it feels like Ken is flying at 10,000 feet and I'm still down near sea level.
 
> However, something like this might be approaching “ROS 2.0” territory…
>
> PS Could you elaborate on what DARC is? Google seems to return some irrelevant results.

One of the guys that was working with Troy is working on a masterless system that might potentially be very suited to MM (github - darc).  It looks very long term though so I don't know if this is tied in with ROS 2.0 territory or what that entails. I'll get some coding time for mm-related frameworks shortly, so it would be good to plan with 2.0 in mind so we don't waste effort.

Is there a chance we can meet up at Roscon and perhaps pull one of the willow guys to enlighten us?

Regards,

Daniel Stonier

May 10, 2012, 5:48:32 AM
to ros-s...@googlegroups.com
On 9 May 2012 23:24, Armstrong-Crews, Nicholas - 1002 - MITLL <nickarmst...@ll.mit.edu> wrote:
> Hi all, sorry for long radio silence.
>
> For our use case (two or three robots running around with flaky wifi and sometimes out of wifi range), the features of service discovery, mDNS, and "automatic, simple, no-configuration" are not required. Yes, they would be nice, but I'm perfectly happy to assign each robot a static IP, manually specify routes, and trust my IP layer to tell me when there's a link from robot A to robot B. But I'd rather not have to write that logic into every single node.


We're not talking about configuring that info in every node; that is absolutely not practical. Zeroconf, at the moment, is sitting to the side, to be used or not.
 
> Not being a network programming guru, I've had a hard time following these proceedings... but would you say that this simple use case is satisfied by any of the existing or proposed solutions? It seems like "rosmaster uses TCP" breaks everything.

Certainly, that's probably the biggest issue.

Piyush Khandelwal

May 10, 2012, 12:10:44 PM
to ROS Multi-Master Special Interest Group
Hi all,

I would like to chime in that my use case is fairly similar to Nick's,
perhaps with the following changes:
1) We are trying to set up a ROS-enabled system in our department. This
system will use any publicly available department machine, and as many
robots as we can assimilate into the system (we'll start at 2-3 and
build our way up).
2) We need a solution to be scalable - possibly 20-50 machines (even
more if possible). All the machines are connected over ethernet, but
the robots will run over flaky wifi.
3) We would like the solution to be portable and as close to zero
configuration as possible, as we'll try to get users unfamiliar with
the inner workings of ROS to write nodes for their own research.
Essentially a user will power on a Robot or start an application on a
workstation and it is ready to go.

I suspect my requirements are probably already part of what this SIG
is trying to accomplish. I just tested the multimaster master_sync
script. Although there are some issues with manually specifying the
topics (and probably a number of other problems that I have not faced
yet), it works largely as expected. Since I don't completely
understand the discussions in this thread, I'll start a new thread in
hopes of getting a clear explanation of the problems with the current
multimaster solutions in terms of reliability and scalability.

Thanks!
Piyush
