Hi all,This is a follow-on to my concurrency question from a few months ago. We put some workarounds in place for this on Jackal, which temporarily resolved it, but upon deploying ros_control on Husky, we are now seeing similar behaviour come up on that platform. The symptom is out of control commands sent to the actuators, under certain conditions around robot startup and shutdown. Seeing it now on a second platform which shares almost no code with the original one has focused our debugging attention on ros_control itself, and our particular implementations of RobotHW. Some immediate questions which are not obvious to me from the docs and reading source:
- Does the ControllerManager::update() method under any circumstances modify output the joint hardware interface values?
- In the ControllerManager::update() method, what are the circumstances under which one is supposed to pass the reset_controllers boolean as true?
- How is the transition from no-controller to controller and back handled? Are there circumstances where this transition could lead to undefined behaviour if not managed properly?
In general, is there a good reference implementation for a simple RobotHW class? I worked primarily from the documentation and diff_drive_controller's diffbot unit test, but seeing a vetted, end-to-end example would be helpful in understanding what I might be doing wrong. Is the RobotHW code for either the REEM base or PAL mobile base publicly available somewhere?
Ours are here:Neither are a strictly realtime control loop, but neither is actually doing anything closed loop (diff_drive_controller is open-loop), and both are doing USB comms to the hardware anyway.Are there any specific suggested debugging strategies for this type of issue?
Obviously, out of control behaviour at any time is completely unacceptable, so we'd like to not only fix it for our own robots, but also improve the documentation and examples so that other implementors can avoid this potentially dangerous pitfall.
--Mike
You received this message because you are subscribed to the Google Groups "ROS/Orocos Robot Control Special Interest Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-co...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Mike
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-control+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
>>> email to ros-sig-robot-control+unsub...@googlegroups.com <javascript:>
On 13/02/15 19:25, pbo...@clearpathrobotics.com wrote:
Hello all,
We've narrowed down the issue to an inconvenient jump in wall time that was
passing a negative period to ControllerManager::update().
While this is something that needs to be corrected on the client side, is
there a mechanism for reporting the issue from ControllerManager?
Would it be reasonable to issue a warning and bump negative periods to
ros::Duration(0),
to prevent out-of-control behaviour with
diff_drive_controller, and potentially others?
I'm not intimately familiar with ros_control internals, but I've seen many systems use CLOCK_MONOTONIC instead of CLOCK_REALTIME to avoid clock adjustments having an effect on things like delta-T calculations.
Would it be an idea to use that here as well?
If ros::Time::now() is used, it appears it is using CLOCK_REALTIME [1].
Gijs
[1] https://github.com/strawlab/ros_comm/blob/6f7ea2feeb3c890699518cb6eb3d33faa15c5306/utilities/rostime/src/time.cpp#L103
email to ros-sig-robot-control+unsub...@googlegroups.com <javascript:>
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-control+unsub...@googlegroups.com.
Negative periods are being generated on your end, not by ros_control. Why not deal with the issue on your end as well, in your hardware abstraction?. It might make sense for you to bump negative periods to zero, but that's a specific policy that I'd not enforce for everybody. Other systems might want to react differently, maybe even enter an emergency state before operating under unexpected conditions.
...
Time sources are left to the implementer of a robot hardware abstraction, as different choices make sense on different deployment scenarios. I recommend using a monotonic source for computing control periods, as opposed to using system time differences (current - previous). The latter is not a good practice because system time is subject to changing at different rates (eg. ntp slew) or even discretely (eg. user calling `date --set=some_arbitrary_date`).
On Tuesday, February 17, 2015 at 6:06:13 AM UTC-5, Adolfo Rodriguez wrote:Negative periods are being generated on your end, not by ros_control. Why not deal with the issue on your end as well, in your hardware abstraction?. It might make sense for you to bump negative periods to zero, but that's a specific policy that I'd not enforce for everybody. Other systems might want to react differently, maybe even enter an emergency state before operating under unexpected conditions....Time sources are left to the implementer of a robot hardware abstraction, as different choices make sense on different deployment scenarios. I recommend using a monotonic source for computing control periods, as opposed to using system time differences (current - previous). The latter is not a good practice because system time is subject to changing at different rates (eg. ntp slew) or even discretely (eg. user calling `date --set=some_arbitrary_date`).We have corrected the issue in our immediate situation by switching to monotonic time as supplied by boost::chrono::steady_clock. However, this is a terrifying failure mode and IMO a really easy mistake for a novice implementor to make. The symptom (an occasionally out of control robot) has little in it which would lead the implementor to this cause, and should it occur for a novice, would reflect very poorly in their mind on the ros_control framework. That I asked for help at least twice, and even linked to within a few lines of the point where this_time - last_time was passed to the update() method, but no one spotted it as a problem is, I feel, significant.
In my opinion:
- diff_drive_controller's vel/accel limit logic must be made robust to negative dt (probably by treating it as zero, and therefore permitting no acceleration in that update cycle).
- the ros_control framework as a whole should find a way to warn users when the update() method is passed a negative dt. Having a full featured realtime-safe logging facility would be ideal, but failing that, some kind of flag which triggers this one message to be published elsewhere would be a very welcome.
With some design guidance, Paul and I can provide PRs for either or both of these.
In any case, I/we won't make this particular mistake again, but it's important to maintain an awareness that ros_control + diff_drive_controller is almost certainly the easiest route to getting a basic navigation-capable robot brought up. The degree of the malfunction combined with the difficulty in tracking it down means to me that the core framework should contain detection and warnings for this scenario.M.
--
You received this message because you are subscribed to the Google Groups "ROS/Orocos Robot Control Special Interest Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-co...@googlegroups.com.
1. Documentation: Write some decent entry-level tutorials, and put them in a highly visible place. Right now there are a few examples in the ros_control wiki [1] that could be revised and updated. The timing issue could be made explicit there, both in code and in the explanation.
2. Code: Make the controller manager publish diagnostic messages. I find this much more interesting than logging. We could compute some health metrics and report them at low frequency. Negative periods could be one. I'd love to also publish when an update cycle overruns, but this needs to be done in a realtime-safe manner.
3. More code: Make controller test suites exercise unintended operating conditions that could arise in practice.
1. Documentation: Write some decent entry-level tutorials, and put them in a highly visible place. Right now there are a few examples in the ros_control wiki [1] that could be revised and updated. The timing issue could be made explicit there, both in code and in the explanation.Right now there are some docs/tutorials on the ROS Wiki and others on the Github Wiki, and neither are really complete or up-to-date. At risk of introducing yet another thing, I wonder about a ros_control_tutorials package which contains actual compiling, runnable examples which can be studied, along with tutorials prepared as Sphinx pages, which suck in live source snippets. Perhaps as a compiling, CIed, version-controlled thing, it would be more likely to be maintained?
I guess there are two questions about such an undertaking: first, what level of completeness would have to be hit before we could actually begin to remove wiki content and instead point users at the live examples, and second, what even should the live examples be? Part of the promise of ros_control is that most users don't have to write controllers, and that you can get started in simulation with no code at all. So what would make sense to present as a self-contained example which can run anywhere? What "hardware" would lead to a simple, but representative implementation?
2. Code: Make the controller manager publish diagnostic messages. I find this much more interesting than logging. We could compute some health metrics and report them at low frequency. Negative periods could be one. I'd love to also publish when an update cycle overruns, but this needs to be done in a realtime-safe manner.Perhaps a starting point could be another component in realtime_tools which extends RealtimePublisher to a RealtimeDiagnosticPublisher. You'd have to set it up and do all the string formatting and publishing on the non-RT thread (perhaps a singleton instance shared between all RealtimeDiagnosticPublisher instances?), and then basically have typed boxes available where the RT logic can stick the values to be sent out (including canned strings).
3. More code: Make controller test suites exercise unintended operating conditions that could arise in practice.I feel way less qualified in this area, though I agree that it would be good to have. The diff_drive_controller limit fix should certainly include a test which verifies that it behaves safely in the face of a negative dt.
M.--
You received this message because you are subscribed to the Google Groups "ROS/Orocos Robot Control Special Interest Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-robot-co...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.