Undefined behaviour at startup, shutdown

Mike Purvis

unread,

Feb 5, 2015, 11:06:46 AM2/5/15

to ros-sig-ro...@googlegroups.com, Paul Bovbel

Hi all,

This is a follow-on to my concurrency question from a few months ago. We put some workarounds in place for this on Jackal, which temporarily resolved it, but upon deploying ros_control on Husky, we are now seeing similar behaviour come up on that platform. The symptom is out of control commands sent to the actuators, under certain conditions around robot startup and shutdown. Seeing it now on a second platform which shares almost no code with the original one has focused our debugging attention on ros_control itself, and our particular implementations of RobotHW. Some immediate questions which are not obvious to me from the docs and reading source:

Does the ControllerManager::update() method under any circumstances modify output the joint hardware interface values?
In the ControllerManager::update() method, what are the circumstances under which one is supposed to pass the reset_controllers boolean as true?
How is the transition from no-controller to controller and back handled? Are there circumstances where this transition could lead to undefined behaviour if not managed properly?

In general, is there a good reference implementation for a simple RobotHW class? I worked primarily from the documentation and diff_drive_controller's diffbot unit test, but seeing a vetted, end-to-end example would be helpful in understanding what I might be doing wrong. Is the RobotHW code for either the REEM base or PAL mobile base publicly available somewhere? Ours are here:

https://github.com/jackal/jackal_robot/blob/indigo-devel/jackal_base/src/jackal_base.cpp#L45

https://github.com/husky/husky_base/blob/indigo-devel/src/husky_base.cpp#L39

Neither are a strictly realtime control loop, but neither is actually doing anything closed loop (diff_drive_controller is open-loop), and both are doing USB comms to the hardware anyway.

Are there any specific suggested debugging strategies for this type of issue? Obviously, out of control behaviour at any time is completely unacceptable, so we'd like to not only fix it for our own robots, but also improve the documentation and examples so that other implementors can avoid this potentially dangerous pitfall.

Mike

Adolfo Rodríguez Tsouroukdissian

unread,

Feb 6, 2015, 7:12:46 AM2/6/15

to Mike Purvis, ros-sig-ro...@googlegroups.com, Paul Bovbel

On Thu, Feb 5, 2015 at 5:06 PM, Mike Purvis <mpu...@clearpathrobotics.com> wrote:

Hi all,

This is a follow-on to my concurrency question from a few months ago. We put some workarounds in place for this on Jackal, which temporarily resolved it, but upon deploying ros_control on Husky, we are now seeing similar behaviour come up on that platform. The symptom is out of control commands sent to the actuators, under certain conditions around robot startup and shutdown. Seeing it now on a second platform which shares almost no code with the original one has focused our debugging attention on ros_control itself, and our particular implementations of RobotHW. Some immediate questions which are not obvious to me from the docs and reading source:
Does the ControllerManager::update() method under any circumstances modify output the joint hardware interface values?

No, controllers do that. Obviously, RobotHW can also manipulate these values, as it has access to the raw data.

The cotroller manager does bookkeeping, but does not control resources itself.

In the ControllerManager::update() method, what are the circumstances under which one is supposed to pass the reset_controllers boolean as true?

Whenever you want to trigger a stop -> start -> update sequence for all controllers, instead of only doing update. Typically, this is done when recovering from a protective stop.

How is the transition from no-controller to controller and back handled? Are there circumstances where this transition could lead to undefined behaviour if not managed properly?

That's entirely up to you, and how you implement your RobotHW. My suggestion for the inital case of no controllers running is to initialize your commands with semantic zero values, or no-op values (some people use NaNs for this). Once you go back from a controller to no-controller configuration, typically controllers take care of doing sane things upon stopping.

In general, is there a good reference implementation for a simple RobotHW class? I worked primarily from the documentation and diff_drive_controller's diffbot unit test, but seeing a vetted, end-to-end example would be helpful in understanding what I might be doing wrong. Is the RobotHW code for either the REEM base or PAL mobile base publicly available somewhere?

No, as they leverage proprietary code. I think that real-time aspects aside, revamping the gazebo_ros_control package would be a good way to showcase common ros_control usage patterns. This has been on my TODO list for a while, I just haven't found a means to get it done.

Ours are here:

https://github.com/jackal/jackal_robot/blob/indigo-devel/jackal_base/src/jackal_base.cpp#L45
https://github.com/husky/husky_base/blob/indigo-devel/src/husky_base.cpp#L39

Neither are a strictly realtime control loop, but neither is actually doing anything closed loop (diff_drive_controller is open-loop), and both are doing USB comms to the hardware anyway.

Are there any specific suggested debugging strategies for this type of issue?

The problem as you describe it is quite open. Try to narrow it down, divide and conquer:

- Does it happen in the absence of controllers?

- Can you reproduce it in simulation?, not only Gazebo, but try writing a dummy hardware interface that implements perfect kinematic control, if possible.

One general piece of advice: Resource handles in ros_control typically contain _pointers_ to the raw data. Make sure to not invalidate them. There are many subtle ways in which you could unknowingly do this. A typical example (untested) would be:

const int n = 42;

std::vector<double> cmd_vec;

std::vector<FooHandle> h_vec;

cmd_vec.reserve(n); // Comment-out to enter the twilight zone

for (int i=0; i < n; ++i)
{

cmd_vec.push_back(0.0); // Might trigger reallocation of vector, invalidation of all previous handles

FooHandle h(&cmd_vec.back());

h_vec.push_back(h);

}

In the example above, if the 'cmd_vec.reserve(n)' call is omitted, it's possible that pushing back to 'cmd_vec' will reallocate it to a completely different memory region, invalidating all previously created handles. This is something that will happen only sometimes, and it depends on factors you don't have control of.

If you don't reserve(), you can also first create all the raw data, then the handles (in separate loops). Whatever you do, keep an eye on not invalidating pointers.

Obviously, out of control behaviour at any time is completely unacceptable, so we'd like to not only fix it for our own robots, but also improve the documentation and examples so that other implementors can avoid this potentially dangerous pitfall.

Keep us posted,

Adolfo.

Mike

--
You received this message because you are subscribed to the Google Groups "ROS/Orocos Robot Control Special Interest Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-co...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Adolfo Rodríguez Tsouroukdissian
Senior robotics engineer
adolfo.r...@pal-robotics.com
http://www.pal-robotics.com

PAL ROBOTICS S.L
c/ Pujades 77-79, 4º4ª
08005 Barcelona, Spain.
Tel. +34.93.414.53.47
Fax.+34.93.209.11.09
Skype: adolfo.pal-robotics
Facebook - Twitter - PAL Robotics YouTube Channel

AVISO DE CONFIDENCIALIDAD: Este mensaje y sus documentos adjuntos, pueden contener información privilegiada y/o confidencial que está dirigida exclusivamente a su destinatario. Si usted recibe este mensaje y no es el destinatario indicado, o el empleado encargado de su entrega a dicha persona, por favor, notifíquelo inmediatamente y remita el mensaje original a la dirección de correo electrónico indicada. Cualquier copia, uso o distribución no autorizados de esta comunicación queda estrictamente prohibida.CONFIDENTIALITY NOTICE: This e-mail and the accompanying document(s) may contain confidential information which is privileged and intended only for the individual or entity to whom they are addressed. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of this e-mail and/or accompanying document(s) is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender at the above e-mail address.

pbo...@clearpathrobotics.com

unread,

Feb 13, 2015, 1:25:15 PM2/13/15

to ros-sig-ro...@googlegroups.com, mpu...@clearpathrobotics.com, pbo...@clearpathrobotics.com, adolfo.r...@pal-robotics.com

Hello all,

We've narrowed down the issue to an inconvenient jump in wall time that was passing a negative period to ControllerManager::update().

While this is something that needs to be corrected on the client side, is there a mechanism for reporting the issue from ControllerManager?

Would it be reasonable to issue a warning and bump negative periods to ros::Duration(0), to prevent out-of-control behaviour with diff_drive_controller, and potentially others?

Paul

Mike

To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-control+unsub...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

G.A. vd. Hoorn - 3ME

unread,

Feb 13, 2015, 1:52:41 PM2/13/15

to ros-sig-ro...@googlegroups.com

On 13/02/15 19:25, pbo...@clearpathrobotics.com wrote:
> Hello all,
>
> We've narrowed down the issue to an inconvenient jump in wall time that was
> passing a negative period to ControllerManager::update().
>
> While this is something that needs to be corrected on the client side, is
> there a mechanism for reporting the issue from ControllerManager?
>
> Would it be reasonable to issue a warning and bump negative periods to
> ros::Duration(0), to prevent out-of-control behaviour with
> diff_drive_controller, and potentially others?

I'm not intimately familiar with ros_control internals, but I've seen
many systems use CLOCK_MONOTONIC instead of CLOCK_REALTIME to avoid
clock adjustments having an effect on things like delta-T calculations.

Would it be an idea to use that here as well?

If ros::Time::now() is used, it appears it is using CLOCK_REALTIME [1].

Gijs

[1]
https://github.com/strawlab/ros_comm/blob/6f7ea2feeb3c890699518cb6eb3d33faa15c5306/utilities/rostime/src/time.cpp#L103

> On Friday, February 6, 2015 at 7:12:46 AM UTC-5, Adolfo Rodríguez
> Tsouroukdissian wrote:
>>
>>
>>
>> On Thu, Feb 5, 2015 at 5:06 PM, Mike Purvis <mpu...@clearpathrobotics.com
>> <javascript:>> wrote:
>>
>>> Hi all,
>>>
>>> This is a follow-on to my concurrency question from a few months ago. We
>>> put some workarounds in place for this on Jackal, which temporarily
>>> resolved it, but upon deploying ros_control on Husky, we are now seeing
>>> similar behaviour come up on that platform. The symptom is out of control
>>> commands sent to the actuators, under certain conditions around robot
>>> startup and shutdown. Seeing it now on a second platform which shares
>>> almost no code with the original one has focused our debugging attention on
>>> ros_control itself, and our particular implementations of RobotHW. Some
>>> immediate questions which are not obvious to me from the docs and reading
>>> source:
>>>

>>> - Does the ControllerManager::update() method under any circumstances

>>> modify output the joint hardware interface values?
>>>
>>> No, controllers do that. Obviously, RobotHW can also manipulate these
>> values, as it has access to the raw data.
>>
>> The cotroller manager does bookkeeping, but does not control resources
>> itself.
>>
>>>

>>> - In the ControllerManager::update() method, what are the

>>> circumstances under which one is supposed to pass the reset_controllers
>>> boolean as true?
>>>
>>> Whenever you want to trigger a stop -> start -> update sequence for all
>> controllers, instead of only doing update. Typically, this is done when
>> recovering from a protective stop.
>>
>>>

>>> - How is the transition from no-controller to controller and back

>>> email to ros-sig-robot-co...@googlegroups.com <javascript:>
>>> .

>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Adolfo Rodríguez Tsouroukdissian
>> Senior robotics engineer

>> adolfo.r...@pal-robotics.com <javascript:>

>> http://www.pal-robotics.com
>>
>> PAL ROBOTICS S.L
>> c/ Pujades 77-79, 4º4ª
>> 08005 Barcelona, Spain.
>> Tel. +34.93.414.53.47
>> Fax.+34.93.209.11.09
>> Skype: adolfo.pal-robotics

>> Facebook <http://www.facebook.com/palrobotics1> - Twitter
>> <http://twitter.com/#%21/palrobotics> - PAL Robotics YouTube Channel
>> <http://www.youtube.com/user/PALRobotics>

pbo...@clearpathrobotics.com

unread,

Feb 13, 2015, 3:36:01 PM2/13/15

to ros-sig-ro...@googlegroups.com

I've switched our robots to using a monotonic time source instead of ros::Time (system/wall time), which alleviates the problem.

I was thinking more along the lines that feeding a negative period to cm::update should trigger a warning of some sort, and a fallback behaviour such as zero-ing negative periods. This would go a long way in preventing programmer error from creating uncontrolled robots.

>>> email to ros-sig-robot-control+unsub...@googlegroups.com <javascript:>

Adolfo Rodríguez Tsouroukdissian

unread,

Feb 17, 2015, 6:06:13 AM2/17/15

to G.A. vd. Hoorn - 3ME, ros-sig-ro...@googlegroups.com

On Fri, Feb 13, 2015 at 7:52 PM, G.A. vd. Hoorn - 3ME <g.a.van...@tudelft.nl> wrote:

On 13/02/15 19:25, pbo...@clearpathrobotics.com wrote:

Hello all,

We've narrowed down the issue to an inconvenient jump in wall time that was
passing a negative period to ControllerManager::update().

The signature of the controller_manager::update method takes in the current (system) time and control loop period. It's been explicitly left up to the implementer to choose the best mechanism for providing these values.

http://docs.ros.org/indigo/api/controller_manager/html/c++/classcontroller__manager_1_1ControllerManager.html#a340a4947a82cb93c3bcbb42ca8dc645c

While this is something that needs to be corrected on the client side, is
there a mechanism for reporting the issue from ControllerManager?

No, the controller manager does not use the time and control loop duration values, it just passes them on to the controllers.

Would it be reasonable to issue a warning and bump negative periods to
ros::Duration(0),

I would not issue log statements from the control thread, which is potentially realtime, at least until we don't have a real-time safe logging mechanism.

Negative periods are being generated on your end, not by ros_control. Why not deal with the issue on your end as well, in your hardware abstraction?. It might make sense for you to bump negative periods to zero, but that's a specific policy that I'd not enforce for everybody. Other systems might want to react differently, maybe even enter an emergency state before operating under unexpected conditions.

to prevent out-of-control behaviour with
diff_drive_controller, and potentially others?

Controllers should strive to enforce or at least document their operating assumptions. I can think of controllers that are able to operate with negative periods, although they are not (at least typically) used like that.

I'm not intimately familiar with ros_control internals, but I've seen many systems use CLOCK_MONOTONIC instead of CLOCK_REALTIME to avoid clock adjustments having an effect on things like delta-T calculations.

Would it be an idea to use that here as well?

If ros::Time::now() is used, it appears it is using CLOCK_REALTIME [1].

Time sources are left to the implementer of a robot hardware abstraction, as different choices make sense on different deployment scenarios. I recommend using a monotonic source for computing control periods, as opposed to using system time differences (current - previous). The latter is not a good practice because system time is subject to changing at different rates (eg. ntp slew) or even discretely (eg. user calling `date --set=some_arbitrary_date`).

Adolfo.

Gijs

[1] https://github.com/strawlab/ros_comm/blob/6f7ea2feeb3c890699518cb6eb3d33faa15c5306/utilities/rostime/src/time.cpp#L103

email to ros-sig-robot-control+unsub...@googlegroups.com <javascript:>

To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-control+unsub...@googlegroups.com.

Mike Purvis

unread,

Feb 17, 2015, 12:03:20 PM2/17/15

to ros-sig-ro...@googlegroups.com, g.a.van...@tudelft.nl

On Tuesday, February 17, 2015 at 6:06:13 AM UTC-5, Adolfo Rodriguez wrote:

Negative periods are being generated on your end, not by ros_control. Why not deal with the issue on your end as well, in your hardware abstraction?. It might make sense for you to bump negative periods to zero, but that's a specific policy that I'd not enforce for everybody. Other systems might want to react differently, maybe even enter an emergency state before operating under unexpected conditions.

...

Time sources are left to the implementer of a robot hardware abstraction, as different choices make sense on different deployment scenarios. I recommend using a monotonic source for computing control periods, as opposed to using system time differences (current - previous). The latter is not a good practice because system time is subject to changing at different rates (eg. ntp slew) or even discretely (eg. user calling `date --set=some_arbitrary_date`).

We have corrected the issue in our immediate situation by switching to monotonic time as supplied by boost::chrono::steady_clock. However, this is a terrifying failure mode and IMO a really easy mistake for a novice implementor to make. The symptom (an occasionally out of control robot) has little in it which would lead the implementor to this cause, and should it occur for a novice, would reflect very poorly in their mind on the ros_control framework. That I asked for help at least twice, and even linked to within a few lines of the point where this_time - last_time was passed to the update() method, but no one spotted it as a problem is, I feel, significant.

In my opinion:

diff_drive_controller's vel/accel limit logic must be made robust to negative dt (probably by treating it as zero, and therefore permitting no acceleration in that update cycle).
the ros_control framework as a whole should find a way to warn users when the update() method is passed a negative dt. Having a full featured realtime-safe logging facility would be ideal, but failing that, some kind of flag which triggers this one message to be published elsewhere would be a very welcome.

With some design guidance, Paul and I can provide PRs for either or both of these.

In any case, I/we won't make this particular mistake again, but it's important to maintain an awareness that ros_control + diff_drive_controller is almost certainly the easiest route to getting a basic navigation-capable robot brought up. The degree of the malfunction combined with the difficulty in tracking it down means to me that the core framework should contain detection and warnings for this scenario.

M.

Adolfo Rodríguez Tsouroukdissian

unread,

Feb 18, 2015, 4:19:16 AM2/18/15

to Mike Purvis, ros-sig-ro...@googlegroups.com, G.A. vd. Hoorn - 3ME

On Tue, Feb 17, 2015 at 6:03 PM, Mike Purvis <mpu...@clearpathrobotics.com> wrote:

On Tuesday, February 17, 2015 at 6:06:13 AM UTC-5, Adolfo Rodriguez wrote:
Negative periods are being generated on your end, not by ros_control. Why not deal with the issue on your end as well, in your hardware abstraction?. It might make sense for you to bump negative periods to zero, but that's a specific policy that I'd not enforce for everybody. Other systems might want to react differently, maybe even enter an emergency state before operating under unexpected conditions.
...
Time sources are left to the implementer of a robot hardware abstraction, as different choices make sense on different deployment scenarios. I recommend using a monotonic source for computing control periods, as opposed to using system time differences (current - previous). The latter is not a good practice because system time is subject to changing at different rates (eg. ntp slew) or even discretely (eg. user calling `date --set=some_arbitrary_date`).

We have corrected the issue in our immediate situation by switching to monotonic time as supplied by boost::chrono::steady_clock. However, this is a terrifying failure mode and IMO a really easy mistake for a novice implementor to make. The symptom (an occasionally out of control robot) has little in it which would lead the implementor to this cause, and should it occur for a novice, would reflect very poorly in their mind on the ros_control framework. That I asked for help at least twice, and even linked to within a few lines of the point where this_time - last_time was passed to the update() method, but no one spotted it as a problem is, I feel, significant.

I understand your concern, and I'm OK for giving this issue more notoriety. I make a few suggestions below. On your example code that uses differences between ros::Time::now() stamps. Although using a monotonic clock is a fail-safe solution, there are usecases where what you did would be plausible. Simulated settings are the obvious example, but if you have control over your system clock, and you know that nobody is modifying it (manually or automatically), you can get ros::Time::now() to behave monotonically. I admit it's not really worth it with equally simple and more robust alternatives out there.

In my opinion:
diff_drive_controller's vel/accel limit logic must be made robust to negative dt (probably by treating it as zero, and therefore permitting no acceleration in that update cycle).
the ros_control framework as a whole should find a way to warn users when the update() method is passed a negative dt. Having a full featured realtime-safe logging facility would be ideal, but failing that, some kind of flag which triggers this one message to be published elsewhere would be a very welcome.
With some design guidance, Paul and I can provide PRs for either or both of these.

Two things come to mind:

1. Documentation: Write some decent entry-level tutorials, and put them in a highly visible place. Right now there are a few examples in the ros_control wiki [1] that could be revised and updated. The timing issue could be made explicit there, both in code and in the explanation.

2. Code: Make the controller manager publish diagnostic messages. I find this much more interesting than logging. We could compute some health metrics and report them at low frequency. Negative periods could be one. I'd love to also publish when an update cycle overruns, but this needs to be done in a realtime-safe manner.

3. More code: Make controller test suites exercise unintended operating conditions that could arise in practice.

[1] https://github.com/ros-controls/ros_control/wiki

Adolfo.

In any case, I/we won't make this particular mistake again, but it's important to maintain an awareness that ros_control + diff_drive_controller is almost certainly the easiest route to getting a basic navigation-capable robot brought up. The degree of the malfunction combined with the difficulty in tracking it down means to me that the core framework should contain detection and warnings for this scenario.

M.

--

You received this message because you are subscribed to the Google Groups "ROS/Orocos Robot Control Special Interest Group" group.

To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-co...@googlegroups.com.

Mike Purvis

unread,

Feb 18, 2015, 9:25:05 PM2/18/15

to ros-sig-ro...@googlegroups.com, mpu...@clearpathrobotics.com, g.a.van...@tudelft.nl

1. Documentation: Write some decent entry-level tutorials, and put them in a highly visible place. Right now there are a few examples in the ros_control wiki [1] that could be revised and updated. The timing issue could be made explicit there, both in code and in the explanation.

Right now there are some docs/tutorials on the ROS Wiki and others on the Github Wiki, and neither are really complete or up-to-date. At risk of introducing yet another thing, I wonder about a ros_control_tutorials package which contains actual compiling, runnable examples which can be studied, along with tutorials prepared as Sphinx pages, which suck in live source snippets. Perhaps as a compiling, CIed, version-controlled thing, it would be more likely to be maintained?

I guess there are two questions about such an undertaking: first, what level of completeness would have to be hit before we could actually begin to remove wiki content and instead point users at the live examples, and second, what even should the live examples be? Part of the promise of ros_control is that most users don't have to write controllers, and that you can get started in simulation with no code at all. So what would make sense to present as a self-contained example which can run anywhere? What "hardware" would lead to a simple, but representative implementation?

2. Code: Make the controller manager publish diagnostic messages. I find this much more interesting than logging. We could compute some health metrics and report them at low frequency. Negative periods could be one. I'd love to also publish when an update cycle overruns, but this needs to be done in a realtime-safe manner.

Perhaps a starting point could be another component in realtime_tools which extends RealtimePublisher to a RealtimeDiagnosticPublisher. You'd have to set it up and do all the string formatting and publishing on the non-RT thread (perhaps a singleton instance shared between all RealtimeDiagnosticPublisher instances?), and then basically have typed boxes available where the RT logic can stick the values to be sent out (including canned strings).

3. More code: Make controller test suites exercise unintended operating conditions that could arise in practice.

I feel way less qualified in this area, though I agree that it would be good to have. The diff_drive_controller limit fix should certainly include a test which verifies that it behaves safely in the face of a negative dt.

M.

Adolfo Rodríguez Tsouroukdissian

unread,

Feb 19, 2015, 6:08:43 AM2/19/15

to Mike Purvis, ros-sig-ro...@googlegroups.com, G.A. vd. Hoorn - 3ME

On Thu, Feb 19, 2015 at 3:25 AM, Mike Purvis <mpu...@clearpathrobotics.com> wrote:

1. Documentation: Write some decent entry-level tutorials, and put them in a highly visible place. Right now there are a few examples in the ros_control wiki [1] that could be revised and updated. The timing issue could be made explicit there, both in code and in the explanation.

Right now there are some docs/tutorials on the ROS Wiki and others on the Github Wiki, and neither are really complete or up-to-date. At risk of introducing yet another thing, I wonder about a ros_control_tutorials package which contains actual compiling, runnable examples which can be studied, along with tutorials prepared as Sphinx pages, which suck in live source snippets. Perhaps as a compiling, CIed, version-controlled thing, it would be more likely to be maintained?

After the recent rosindex (beta) announcement, I'm tempted to host documentation in-place with the code. After migrating the docs to a single place, with a uniform and consistent look and feel, we can offload the current ROS wiki contents. Rather than removing the ROS wiki pages, they could provide a link to the actual doc site. Compiling, runnable examples would definitely rock. No more bitrot for them.

I am well aware that the documentation of ros_control and friends is currently fragmented between the ROS wiki, Github wikis and the ROScon'14 talk. It's not that there is no documentation, but that there is no single go-to place, and this is not good. The thing is that this is a work package that I have not yet been able to justify, and nobody else has volunteered so far to take on it. If you'd like to champion at least part of this effort, I'd be happy to contribute as much as I can.

I guess there are two questions about such an undertaking: first, what level of completeness would have to be hit before we could actually begin to remove wiki content and instead point users at the live examples, and second, what even should the live examples be? Part of the promise of ros_control is that most users don't have to write controllers, and that you can get started in simulation with no code at all. So what would make sense to present as a self-contained example which can run anywhere? What "hardware" would lead to a simple, but representative implementation?

I would say that a diff-drive with a simple arm, and maybe a gripper would be enough to exemplify most of ros_control. I agree that a single, simple platform used across most tutorials would be great for simplicity and consistency's sake.

2. Code: Make the controller manager publish diagnostic messages. I find this much more interesting than logging. We could compute some health metrics and report them at low frequency. Negative periods could be one. I'd love to also publish when an update cycle overruns, but this needs to be done in a realtime-safe manner.

Perhaps a starting point could be another component in realtime_tools which extends RealtimePublisher to a RealtimeDiagnosticPublisher. You'd have to set it up and do all the string formatting and publishing on the non-RT thread (perhaps a singleton instance shared between all RealtimeDiagnosticPublisher instances?), and then basically have typed boxes available where the RT logic can stick the values to be sent out (including canned strings).

There are two alternatives for pushing info out to ROS at lower frequencies:

1. Frequency divider + realtime publisher:
You publish content from the realtime loop once every n cycles. You typically pre-allocate resources and use a realtime publisher. This is more or less what you're proposing above.

2. ROS timer and shared data:

You compute a set of values in the realtime loop and store them in a shared data structure. For diagnostics purposes, these could be counters and timestamps for events like negative periods, overruns, etc. This structure would be protected by a synchronization primitive such that it can be read from a callback associated to a ROS timer. The upside is that you can offload some non real-time processing, such as pretty-printing messages, to a non real-time thread. As synchronization primitive, RealtimeBuffer would not do, as it's meant for non-rt writes, rt reads. A simple lock could do for an initial implementation.

This last aspect raises the need for a good lock-free synchronization primitive, which `realtime_tools` lacks. This was recently brought up here:
https://github.com/ros-controls/realtime_tools/issues/14

3. More code: Make controller test suites exercise unintended operating conditions that could arise in practice.

I feel way less qualified in this area, though I agree that it would be good to have. The diff_drive_controller limit fix should certainly include a test which verifies that it behaves safely in the face of a negative dt.

We can at least open issues when such problems are identified, just to have a record that they exist.

Thanks for your comments, you bring up very relevant subjects.

Adolfo.

M.

--
You received this message because you are subscribed to the Google Groups "ROS/Orocos Robot Control Special Interest Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-co...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Adolfo Rodríguez Tsouroukdissian
Senior robotics engineer
adolfo.r...@pal-robotics.com

http://www.pal-robotics.com

PAL ROBOTICS S.L
c/ Pujades 77-79, 4º4ª
08005 Barcelona, Spain.
Tel. +34.93.414.53.47
Fax.+34.93.209.11.09
Skype: adolfo.pal-robotics

Facebook - Twitter - PAL Robotics YouTube Channel

Reply all

Reply to author

Forward