We have testing!

Michael Wimble

Jan 2, 2026, 1:15:30 AM

to hbrob...@googlegroups.com

2026 01 01 Thursday

When someone recently asked me what I did for safety in my robot, I spent a few days looking over notes, code, and system documentation, and created a brief paper describing the safety system I’ve actually implemented, plus a few bits about what was intended but isn’t quite there yet.

About 4 years ago, I created the robot Raven, the predecessor of my current robot, Sigyn. You can see a description of Raven at https://wimblerobotics.wimble.org/wp/2021/09/19/an-overview-of-puck/.

One of its main goals was to implement a much better safety system. Basically, if I cannot trust my robot, it’s useless to me. I need to know that the battery isn’t about to die or catch fire, that the robot doesn’t smash into things, that it hasn’t fallen over and is crying its eyes out.

There are actually a fair number of things that can go wrong with a robot, and some of the bad things happen fairly often. For instance, the robot can try to drive over something that can get tangled up in its wheels. The robot can get lost, not knowing exactly where it is and maybe not knowing how to get somewhere else. The robot can just drain its battery and shut down without warning. The robot can lose connection to the network and not be able to talk to me or figure out what to do next.

So Raven began getting mostly software upgrades to detect or predict various kinds of failures. It got redundant current monitors for the motors. When the robot runs into something like a wall, something has already gone wrong. That’s one kind of failure: it should have known it was about to run into the wall and not gone there. But there’s a second problem: when the robot hits a wall, the motors will either slow way down or even stop. In both cases, bad things happen; in particular, it’s very possible for the copper wires inside the motor to actually melt.

This can cause magic smoke to start coming out of the robot. It definitely means that I’m going to have to spend hundreds of dollars to replace the motor. In fact, the motors I use are not even made anymore, so it will be very expensive and take a fairly long time to find replacements and redesign the robot to use the new motors.

Raven had multiple sensors to detect when things got near. No sensor that I can buy, though, works really well for all of the dangers in a house. Cameras don’t work well in the dark. Lasers don’t see glass very well. SONAR sensors don’t deal well with sound-absorbing coatings, like soft, cushy furniture. Raven tried to get around this by using all of those kinds of sensors at the same time. But each sensor has its own set of problems and limitations as well. Avoiding obstacles by a robot is a really hard problem.

One of the safety systems I tried in Raven was to implement “rings of protection”. If the nearest known obstacle was, say, six inches or more away, the robot operated normally. If the robot detected an obstacle between four and six inches away, however, it knew something had gone wrong, and the sensor signaled the robot to slow way down, on the theory that maybe the calculations were just backed up and, given a bit of time, the software could catch up with handling all of the data and move away from the obstacle. If the robot found something less than four inches away, though, it knew that the software trying to figure out where to go and avoid hitting things had failed, so the sensor told the robot to just stop. Ideally, it would then reach out to me for help.
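A minimal C++ sketch of those three rings, using the six-inch and four-inch boundaries described for Raven. The names (ProtectionRing, classifyObstacle, speedScale) and the 0.25 slow-down factor are hypothetical, not Raven’s actual code.

```cpp
// Hypothetical sketch of Raven-style "rings of protection".
// ProtectionRing, classifyObstacle, and speedScale are illustrative names.
enum class ProtectionRing { Normal, SlowDown, Stop };

constexpr double kSlowRingInches = 6.0;  // at or beyond this: operate normally
constexpr double kStopRingInches = 4.0;  // below this: stop and ask for help

// Map the distance to the nearest known obstacle onto a ring.
ProtectionRing classifyObstacle(double nearest_obstacle_inches) {
  if (nearest_obstacle_inches >= kSlowRingInches) return ProtectionRing::Normal;
  if (nearest_obstacle_inches >= kStopRingInches) return ProtectionRing::SlowDown;
  return ProtectionRing::Stop;
}

// Scale applied to the commanded speed in each ring. The 0.25 slow-down
// factor is an assumption; the text only says "slow way down".
double speedScale(ProtectionRing ring) {
  switch (ring) {
    case ProtectionRing::Normal:   return 1.0;
    case ProtectionRing::SlowDown: return 0.25;
    case ProtectionRing::Stop:     return 0.0;
  }
  return 0.0;  // unreachable; keeps compilers from warning
}
```

Keeping the classification separate from the response makes it easy to test the boundaries (exactly 6 and exactly 4 inches) without any sensor attached.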

Even things like temperature sensors entered into the safety equation. Not only do you want the temperature of everything that could be measured to stay within a safe range, but you want to measure the rate of change of temperature. For instance, when the robot moves, the motors heat up. When the robot gets, say, a power cord or an old sock that was left on the floor wrapped around the wheel axle, the motors start slowing down and they heat up pretty quickly.

Raven mostly dealt with trying to sense whatever safety information could be derived from each of the kinds of sensors it had. Accelerometers not only gave information about the change of speed of the robot but also told whether the robot was tilting over, or was even about to tilt over. Voltage monitors not only told whether a battery needed charging, but they could help predict how long the robot could continue to run before the battery died.

Sigyn has picked up where Raven left off. Sigyn is quite a bit more complex than Raven: it has at least 6 computers in it, and each of them has to deal with safety in some way. In particular, you have to work out just how the computers talk to each other to signal the safety information they know about. Sigyn also wants to handle problems by self-healing. If the robot detects the motors are overheating, it doesn’t want to just turn off the robot; it wants to slow down the motors, or stop them for just a little while. If the temperature returns to a safe value, there is no need to get me involved.

I had a lot of code for safety in my robot, but it wasn’t all well defined, it wasn’t all written completely, it wasn’t all hooked up to actually do something in response to safety concerns.

My paper talked a lot about my robot’s safety, but I had to keep it simple and short. It was just a teaser about how safety worked at a high level. That began to change starting on December 31, 2025.

For the last two days I’ve spent a huge part of my mortality fixing the safety system. Not only have I actually completed some of the code that hadn’t been finished before, but I’ve begun tweaking how safety actually works. That meant the whole philosophy of how to deal with safety has begun to change. Conceptually, at a high level, it’s still the same, but the actual lines of code and data structures have changed quite a bit.

Even more importantly, I now have rearchitected a lot of code to support actual testing of safety. Most of the time, I can’t actually get the robot to fail for a test. It’s not practical for me to tilt the robot in a lot of directions to see if it actually detects that it’s about to fall over. It’s not practical for me to aim a blowtorch at the motors to see if the robot detects that the motors are heating up too quickly. It’s not good to actually drain the battery until it fails—each time that the battery dies, it shortens the life of the expensive battery and makes it more likely to go up in flames in the future.

So, by using “dependency injection”, all of my hardware can be swapped out by a bit of text in a file that says I don’t want to talk to a real voltage sensor or a real accelerometer, but to a fake device. Now my tests can talk to, say, my fake thermometer and say, “pretend to sense a 20°C temperature. Now wait 3.7 seconds and pretend the thermometer reads 23.7°C,” and so on. Now my code that looks for how quickly the motor temperature sensor is ramping up can actually get exercised. I can see that the code detects the unsafe condition. I can see that a safety condition message gets sent to the safety coordinator software component. I can see that the safety coordinator asks the motors to shut down or asks the navigation planner to slow the motors down. Then I can reverse the temperature rise and see if the safety system “self-heals” and turns the motors back on or lets them run fast again.
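A minimal C++ sketch of the coordinator side of that flow. Everything here is hypothetical (MiniSafetyCoordinator, report(), motorSpeedScale(), and the speed policy are illustrative names and assumptions, not Sigyn’s code); only the severity names come from the list Michael gives later in the thread. The point is the self-healing: a monitor that reports NORMAL again clears its fault, and the overall state falls back on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>

// Severity names follow the list Michael posts later in the thread.
enum class Severity { NORMAL, WARNING, DEGRADED, EMERGENCY_STOP, SYSTEM_SHUTDOWN };

// Hypothetical minimal coordinator, not Sigyn's SafetyCoordinator.
// Each monitor reports the current severity for a named condition; the
// coordinator keeps the latest report per source, and the overall state is
// the worst one still active. Reporting NORMAL clears that source, which is
// what lets the system self-heal without a human.
class MiniSafetyCoordinator {
 public:
  void report(const std::string& source, Severity severity) {
    if (severity == Severity::NORMAL) {
      active_.erase(source);
    } else {
      active_[source] = severity;
    }
  }

  Severity overall() const {
    Severity worst = Severity::NORMAL;
    for (const auto& entry : active_) worst = std::max(worst, entry.second);
    return worst;
  }

  // Assumed response policy: full speed while merely watching a WARNING,
  // half speed when DEGRADED, motors off for anything worse.
  double motorSpeedScale() const {
    switch (overall()) {
      case Severity::NORMAL:
      case Severity::WARNING:
        return 1.0;
      case Severity::DEGRADED:
        return 0.5;
      default:
        return 0.0;
    }
  }

  std::size_t activeFaultCount() const { return active_.size(); }

 private:
  std::map<std::string, Severity> active_;
};
```

In the thermometer scenario, the temperature monitor would report a non-NORMAL severity while the rise is too steep and NORMAL once it settles; nothing outside the coordinator has to be told to “undo” the response.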

It’s the start of not only designing a better safety system but actually verifying that the various algorithms work correctly, and that all the interconnected parts of the robot “do the right thing” when something goes wrong.

It’s a great start of the year for my robot adventures.

Oh, have I ever mentioned that everything about robots is hard?

camp .

Jan 2, 2026, 5:44:57 AM

to hbrob...@googlegroups.com

> Oh, have I ever mentioned that everything about robots is hard?

Sergei's always saying it's "easy." :-]]]

- Camp

--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/hbrobotics/1A7E9A31-FD95-4505-9668-6572EAC4D5F4%40gmail.com.

Sergei Grichine

Jan 2, 2026, 12:32:31 PM

to hbrob...@googlegroups.com

Well, Camp, my philosophy is to "search for the easiest way to solve the hardest problem at hand and, in the process, make it easy for others to repeat". And, hopefully, to get some feedback to further polish the solution.

But back to Michael’s very insightful essay.

“Dependency injection” usually implies something fixed at compile time, and a file with test parameters also suggests something static. Ideally, you want two testing scenarios:
- a robot in a Gazebo simulation with pre-programmed or manually triggered faults
- a real robot with faults that can be triggered from the command line, a script, or a GUI

When it comes to testing, there are two common patterns: SITL and HITL. The PX4 team has perfected this approach and has a solid track record of using both.

Google AI says the following:

In the context of 2026 robotics and autonomous systems development, SITL and HITL are hierarchical testing patterns used to validate flight or motion control logic before real-world deployment.

1. SITL (Software-in-the-Loop)

SITL involves running the entire system—including the control software (firmware) and the physical environment—entirely on a development computer.

How it Works: The autopilot or control code is compiled to run on a standard OS (like Linux or Windows) rather than an embedded chip. It communicates with a simulator (e.g., Gazebo, AirSim, or FlightGear) that provides "fake" sensor data (GPS, IMU) and receives "fake" motor commands.
Best For: Rapid iteration, testing flight plans, refactoring code, and initial tuning of control gains.
Advantage: Fast and low-cost; requires no physical hardware and can often run faster than real-time.
Limitation: It does not account for hardware-specific constraints like processor latency, memory limits, or electrical noise.

2. HITL (Hardware-in-the-Loop)

HITL incorporates the actual physical flight controller (the "hardware") into the simulation loop.

How it Works: The real firmware runs on the actual embedded hardware (e.g., a Pixhawk or custom MCU). This hardware is connected to a computer running a simulation environment. The simulator sends simulated sensor data to the physical hardware via serial/USB, and the hardware responds by sending back control signals as if it were flying.
Best For: Validating hardware-specific behaviors, testing I/O connectivity, and ensuring the CPU can handle the computational load in real-time.
Advantage: High fidelity; it reveals "hidden" issues like threading bugs, timing latencies, or driver errors that SITL cannot catch.
Limitation: More complex to set up; requires physical hardware and must run in strict real-time.

And, from the PX4 docs:

- https://docs.px4.io/main/en/simulation/#sitl-simulation-environment

- https://docs.px4.io/main/en/simulation/failsafes

- https://docs.px4.io/main/en/simulation

I hope this helps.

Best Regards,
-- Sergei

Thomas Messerschmidt

Jan 2, 2026, 1:55:13 PM

to hbrob...@googlegroups.com, hbrob...@googlegroups.com

To make my robot safe, I just tell it that it must follow Isaac Asimov’s “Three Laws of Robotics.” 😆 But, I am guessing that your methods should work just as well. 😁

Thomas Messerschmidt

-

Need something prototyped, built or coded? I’ve been building prototypes for companies for 15 years. I am now incorporating generative AI into products.

Contact me directly or through LinkedIn:

https://www.linkedin.com/in/ai-robotics/

On Jan 1, 2026, at 10:15 PM, Michael Wimble <mwi...@gmail.com> wrote:

--

James H Phelan

Jan 2, 2026, 2:00:33 PM

to hbrob...@googlegroups.com

Asimov's stories detail how the "Three Laws of Robotics" fail in multiple cases!

Paging Dr Calvin! Paging Dr Susan Calvin!

James

James H Phelan
"Nihil est sine ratione cur potius sit quam non sit"
Leibniz

Thomas Messerschmidt

Jan 2, 2026, 2:04:32 PM

to hbrob...@googlegroups.com, hbrob...@googlegroups.com

😆

Thomas Messerschmidt

On Jan 2, 2026, at 11:00 AM, 'James H Phelan' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:

Michael Wimble

Jan 2, 2026, 2:06:41 PM

to hbrob...@googlegroups.com

Well, I suppose the testing I’m describing is SITL (software in the loop). More specifically, it’s dependency injection, which in terms of testing means that dependencies are replaceable. My temperature monitor (driver) depends on an analog temperature sensor. With dependency injection, I can run my code with a real temperature sensor or a mock temperature sensor.

With the ability to inject the decision about what kind of temperature sensor to use, I can start up the code normally, and then an independent testing program can inject a mock temperature sensor as well as a mock Arduino device. With those, I can simulate the analog-to-digital converter and control the timing registers so that millis() and micros() return whatever I want, allowing me to simulate a temperature rise and fall over time.
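Here is a rough C++ sketch of what a host-side mock clock can look like, assuming the production code calls millis() and micros() and the test build links these stand-ins instead of the Arduino core. Only arduino_mock::reset() appears in the post; currentMicros(), setMillis(), advanceMillis(), and the sampleDue() example are hypothetical.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for the Arduino clock. The post mentions
// arduino_mock::reset(); the other names here are illustrative.
namespace arduino_mock {
inline uint32_t& currentMicros() {
  static uint32_t micros_now = 0;  // simulated time since "boot"
  return micros_now;
}
inline void reset() { currentMicros() = 0; }
inline void setMillis(uint32_t ms) { currentMicros() = ms * 1000u; }
inline void advanceMillis(uint32_t ms) { currentMicros() += ms * 1000u; }
}  // namespace arduino_mock

// In a host test build these take the place of the Arduino core's functions,
// so production code that calls millis()/micros() sees the simulated clock.
inline uint32_t micros() { return arduino_mock::currentMicros(); }
inline uint32_t millis() { return arduino_mock::currentMicros() / 1000u; }

// Example code under test: decide whether a 20 Hz (50 ms) sample is due.
inline bool sampleDue(uint32_t& last_sample_ms, uint32_t period_ms = 50) {
  const uint32_t now = millis();
  if (now - last_sample_ms < period_ms) return false;
  last_sample_ms = now;
  return true;
}
```

Because the clock only moves when the test says so, a test can replay “20°C now, 23.7°C 3.7 seconds later” deterministically, and much faster than real time.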

Here’s an example. I have a driver for the temperature sensors. Some interesting functions are:

class TemperatureMonitor : public Module {
public:
  static TemperatureMonitor& getInstance();
  
  // Sensor access interface
  float getTemperature(uint8_t sensor_index, bool fahrenheit = false) const;
  bool isSensorValid(uint8_t sensor_index) const;
  bool isTemperatureCritical(uint8_t sensor_index) const;
  uint8_t getSensorCount() const;
  const TemperatureSensorStatus& getSensorStatus(uint8_t sensor_index) const;
  const TemperatureSystemStatus& getSystemStatus() const { return system_status_; }
  
  // Configuration interface
  void updateConfig(const TemperatureMonitorConfig& config) { config_ = config; }
  const TemperatureMonitorConfig& getConfig() const { return config_; }
  void configureSensor(uint8_t sensor_index, const TemperatureSensorConfig& sensor_config);
  
  // Dependency injection for testing
  void setAnalogReader(IAnalogReader* reader) { analog_reader_ = reader; }

  // ... remaining members (config_, system_status_, analog_reader_, etc.) omitted
};

Especially look at the setAnalogReader function. The TemperatureMonitor class needs an analog sensor reader. That analog reader is defined as an abstract class with only two functions:

class IAnalogReader {
public:
  virtual ~IAnalogReader() = default;

  /**
   * @brief Read raw analog value from specified pin
   *
   * @param pin Analog pin number to read
   * @return Raw ADC value (0-4095 for 12-bit resolution)
   *
   * Production: Calls Arduino analogRead()
   * Testing: Returns injected test value
   */
  virtual uint16_t readAnalog(uint8_t pin) = 0;

  /**
   * @brief Set analog read resolution
   *
   * @param bits Resolution in bits (typically 10-12 for Teensy)
   *
   * Production: Calls Arduino analogReadResolution()
   * Testing: No-op or stores for validation
   */
  virtual void setAnalogResolution(uint8_t bits) = 0;
};

By default, when the Teensy board code starts up, it uses the real hardware subclass which is pretty simple and calls the normal Arduino hardware functions to set the analog resolution and read an analog value:

#include <Arduino.h>  // analogRead(), analogReadResolution()

class AnalogReaderHW : public IAnalogReader {
public:
  AnalogReaderHW() = default;
  ~AnalogReaderHW() override = default;

  uint16_t readAnalog(uint8_t pin) override {
    return analogRead(pin);
  }

  void setAnalogResolution(uint8_t bits) override {
    analogReadResolution(bits);
  }
};

So the constructor in TemperatureMonitor looks like:

// Static hardware reader for production use (no heap allocation)
static AnalogReaderHW hardware_reader;

TemperatureMonitor::TemperatureMonitor()
    : Module(),
      config_(),
      system_status_(),
      temp_state_(TempState::UNINITIALIZED),
      warning_start_time_ms_(0),
      critical_start_time_ms_(0),
      thermal_protection_engaged_(false),
      last_status_report_time_ms_(0),
      last_diagnostic_report_time_ms_(0),
      last_sensor_scan_time_ms_(0),
      last_safety_check_time_ms_(0),
      system_start_time_ms_(0),
      total_system_readings_(0),
      total_system_errors_(0),
      last_performance_update_ms_(0),
      analog_reader_(&hardware_reader) {  // Initialize with hardware reader
  // ... remainder of the constructor body omitted
}

And you see that the last initializer in the constructor says that, by default (i.e., when not testing), TemperatureMonitor uses an analog reader class that talks to the real hardware. But when testing, after TemperatureMonitor is instantiated, the testing software comes along and replaces the analog reader with a mock class:

class SafetyCoordinatorTest : public ::testing::Test {
protected:
  void SetUp() override {
    // Reset mock Arduino state before each test
    arduino_mock::reset();
    
    // Create mock analog reader
    mock_reader = std::make_unique<MockAnalogReader>();
    
    // Get instances and inject mock
    temp_monitor = &TemperatureMonitor::getInstance();
    temp_monitor->setAnalogReader(mock_reader.get());
    // ... remainder of SetUp(), plus TearDown() and the fixture members, omitted
  }
};

Now, for instance, I have a test that can set the mock temperature sensor to read any value I want. My temperature monitor software uses an EMA (exponential moving average) low-pass filter to remove noise from the actual hardware readings. That means there is a lag in tracking the real-world temperature. My code doesn’t read the temperature sensor once and believe it; it needs several readings. The EMA filter gives more weight to recent readings, which keeps it responsive to current trends while still reducing noise. So it isn’t sufficient for a test to just set the fake analog reading I want; the test has to take multiple readings over time and wait for the filtered temperature to stabilize.
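To see why a test must feed several readings, here is a minimal EMA sketch in C++ that counts how many readings it takes to settle after a step change. The smoothing factor is an assumption (the thread doesn’t give Sigyn’s value), and EmaFilter and samplesToSettle are hypothetical names.

```cpp
#include <cmath>

// Minimal EMA low-pass filter. alpha (0 < alpha <= 1) is an assumed value;
// the thread doesn't say what smoothing factor Sigyn uses.
class EmaFilter {
 public:
  explicit EmaFilter(double alpha) : alpha_(alpha) {}

  double update(double sample) {
    if (!initialized_) {
      value_ = sample;  // seed with the first reading
      initialized_ = true;
    } else {
      value_ = alpha_ * sample + (1.0 - alpha_) * value_;
    }
    return value_;
  }

  double value() const { return value_; }

 private:
  double alpha_;
  double value_ = 0.0;
  bool initialized_ = false;
};

// How many readings after a step change until the filtered value is within
// `tolerance` of the new temperature: the lag a test has to wait out.
// The iteration cap guards against alpha values that never converge.
int samplesToSettle(double alpha, double from_c, double to_c, double tolerance) {
  EmaFilter filter(alpha);
  filter.update(from_c);
  int n = 0;
  while (std::fabs(filter.value() - to_c) > tolerance && n < 10000) {
    filter.update(to_c);
    ++n;
  }
  return n;
}
```

With alpha = 0.2, a step from 20°C to 30°C takes 14 readings (0.7 s at 20 readings per second) before the filtered value is within 0.5°C of the new temperature. A smaller alpha means less noise but a longer wait, which is the trade-off the tests need to pin down.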

On my real robot, I have a temperature sensor on each motor housing. It’s a simple, 3-terminal sensor (power, ground, analog out) that is epoxied to the motor housing. It is subject to both analog sensing noise and induced noise on the cable that connects the sensor to the analog-to-digital converter in the Teensy 4.1 chip. When the motors run, they begin to warm up. If the motor isn’t running at the commanded speed, though, like if you grab the wheel to slow it down (similar to climbing a hill) or, much worse, you cause the motor to actually stop when it’s trying to move, the temperature rises rather quickly in the motor. I’m not just interested in whether the long-term temperature of the motor is within some range of values; I need to know if the motor is going to melt the windings in the near future.

One of my safety features is to compute the slope of the temperature rise. It would be impractical to test my code on real hardware to see whether the size of the filter buffer, the weights of the exponential filter, and such will let me detect a motor stall correctly. The safety system mustn’t miss a prediction of melting windings by being too slow to react, and it must not falsely predict a meltdown when the motor is operating normally.

This is a fairly good example of the kind of test I need, but it’s part of a whole series of tests. For context on the list below: my EMA filter has a circular buffer holding the last 50 samples; the sensor is currently read 20 times a second; and at least 5 readings must take place before the monitor considers that it has enough data to predict the actual temperature. Here’s what I have after one day of generating tests. It tests the temperature monitor; its interaction with the safety coordinator, which integrates the safety reports from all of the hardware and controls the safety response; and also that the performance monitor computes statistics correctly:

Test Summary (23 Tests Total)

Temperature Monitor Tests (17 tests in temperature_monitor_test.cpp)

TrendCalculation_InsufficientData: Verifies that temperature trend returns 0.0 when fewer than 5 samples are available
TrendCalculation_MinimumSamples: Tests trend calculation with exactly 5 samples (minimum required)
TrendCalculation_FullBuffer: Tests circular buffer with all 50 samples filled
TrendCalculation_AfterWraparound: Verifies correct oldest/newest sample identification after buffer wraps
TrendCalculation_StableTemperature: Tests that stable temperature produces zero trend
TrendCalculation_CoolingTrend: Tests negative temperature slope detection
MockArduino_TimeControl: Verifies mock Arduino time functions work correctly
MockArduino_AnalogIO: Tests analog pin value injection
MockAnalogReader_TemperatureInjection: Tests temperature-to-ADC conversion via mock
ReadTemperature_NormalRange: Verifies temperature reading and conversion accuracy
Safety_CriticalHighTemperature: Tests that 90°C triggers critical high alarm (85°C threshold)
Safety_WarningTemperature: Tests warning threshold (71°C) without triggering critical
Safety_ThermalRunaway: Tests rapid temperature rise detection (165°C/min) and self-healing
Safety_SlowRise_NoRunaway: Verifies slow rise (67°C/min) doesn't trigger runaway
Safety_TemperatureRecovery: Tests that warnings clear when temperature drops
MultiSensor_IndependentOperation: Verifies left and right motor sensors operate independently
SystemStatus_Aggregation: Tests system-level statistics (average, min, max temperature)

Safety Coordinator Tests (6 tests in safety_coordinator_test.cpp)

Integration_TemperatureWarningFault: Tests that SafetyCoordinator tracks temperature warning faults from TemperatureMonitor
Integration_TemperatureCriticalFault: Tests critical temperature fault propagation to SafetyCoordinator
Integration_ThermalRunawayFault: Tests thermal runaway detection integration
Integration_TemperatureRecovery: Tests fault recovery and clearing
Integration_LowTemperatureWarning: Tests low temperature warning detection (below 5°C)
Integration_LowTemperatureCritical: Tests critical low temperature alarm (below 0°C)

Key Testing Capabilities

Time-based simulation: Tests can control Arduino millis() to simulate time passing
Temperature injection: Mock analog reader can inject specific temperature values
Trend analysis: Tests verify temperature rate-of-change calculations (°C/min)
Safety thresholds: Tests verify warning (70°C), critical (85°C), and thermal runaway (100°C/min) detection
Self-healing: Tests verify that thermal runaway clears when temperature stabilizes
Multi-sensor: Tests verify independent operation of left and right motor sensors
Integration: SafetyCoordinator tests verify proper fault tracking across modules

The tests use Google Test framework with dependency injection (mock IAnalogReader) to simulate hardware behavior without requiring actual Teensy boards.
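For readers wondering what MockAnalogReader might contain, here is a C++ sketch. It repeats the IAnalogReader interface from the post so it compiles on its own. The conversion assumes a TMP36-style sensor (0.5 V at 0°C, 10 mV per °C) and a 3.3 V ADC reference; the thread doesn’t name the sensor part, so treat those constants, setTemperature(), setRawValue(), and countsToCelsius() as hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <map>

// Same interface as in the post, repeated so this sketch compiles on its own.
class IAnalogReader {
 public:
  virtual ~IAnalogReader() = default;
  virtual uint16_t readAnalog(uint8_t pin) = 0;
  virtual void setAnalogResolution(uint8_t bits) = 0;
};

constexpr float kAdcRefVolts = 3.3f;  // assumed ADC reference voltage

// Hypothetical mock: returns whatever raw count a test injected for a pin.
class MockAnalogReader : public IAnalogReader {
 public:
  uint16_t readAnalog(uint8_t pin) override {
    auto it = values_.find(pin);
    return it == values_.end() ? uint16_t{0} : it->second;
  }
  void setAnalogResolution(uint8_t bits) override { bits_ = bits; }

  void setRawValue(uint8_t pin, uint16_t raw) { values_[pin] = raw; }

  // Store the ADC count an assumed TMP36-style sensor would produce.
  void setTemperature(uint8_t pin, float temp_c) {
    const float volts = 0.5f + 0.01f * temp_c;
    const float max_count = static_cast<float>((1u << bits_) - 1u);
    float counts = std::round(volts / kAdcRefVolts * max_count);
    if (counts < 0.0f) counts = 0.0f;            // clamp to the ADC range
    if (counts > max_count) counts = max_count;
    values_[pin] = static_cast<uint16_t>(counts);
  }

 private:
  uint8_t bits_ = 12;
  std::map<uint8_t, uint16_t> values_;
};

// The inverse conversion a monitor would apply (same assumptions).
float countsToCelsius(uint16_t counts, uint8_t bits) {
  const float volts = counts * kAdcRefVolts / static_cast<float>((1u << bits) - 1u);
  return (volts - 0.5f) * 100.0f;
}
```

A round trip through setTemperature() and countsToCelsius() loses at most half an ADC step, about 0.04°C at 12 bits with these assumptions, so tests should compare with a small tolerance rather than exact equality.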

Jeremy Williams

Jan 2, 2026, 2:19:30 PM

to hbrob...@googlegroups.com

On the surface it seems so simple. But the reality is the issue of robot safety is complex and nuanced.

An off switch that shuts all power off may seem obvious until you have someone underneath a heavy and raised arm. In that case you need the power on to keep the arm in a safe position. But what if the motors are getting overheated to the point of approaching failure?

The issue of safety could be dismissed as overthinking the problem, but harm enough humans and robotics could be set back decades. “Think of the children” political arguments can really cramp progress.

Putting bots behind cages to keep us safe just won’t work in our new robots everywhere culture to come.

How many servos and motors have to die before we can provide better ways to keep them within safe ranges of load and power?

At the risk of reinventing the wheel here, as I know others have spent considerable time on these matters, I’d suggest a hierarchy of safety goals: dangers from over/under current, shorts, collisions, sensors within range (similar to OBD-II car systems) with default values to substitute for bad readings, video of the work area to confirm clear movement zones, sensing of humans and pets to create safety fences on the fly, and then a ranking system to decide which takes priority.

The more I consider, the more squirrel holes I see.

Maybe we will need to augment ROS with SOS, a Safety Operating System. It could oversee all that ROS does.

And as ROS is super easy to implement, I’m sure SOS will also be a walk in the park.

Thanks to all who have contributed to this most interesting topic 😊

Jeremy Williams

Jeremy Williams

Jan 2, 2026, 2:39:09 PM

to hbrob...@googlegroups.com

Great work Michael!

Do you think that temperature sensing may be too slow? Is there merit in using software to monitor voltage (movement) outputs compared to actual movement?

For example, if a voltage has been supplied for x ms with less than expected movement from the motor, an assumption can be made that temperature will soon rise to unsafe levels? A caution flag could be set that will cause limits to increased voltage to preempt the heating that will soon come.

From experience, magic smoke can be produced incredibly fast, maybe before a temperature sensor has had time to heat up and report, by which point the damage has already been done.

I did dabble with this last year with some pretty bad feedback issues causing my bot to pretty much go insane. Camp may recall that short-lived demo lol. It was breakdancing on steroids. And a motor burned out just to spite me.

But I tried.

Michael Wimble

Jan 2, 2026, 8:59:00 PM

to hbrob...@googlegroups.com

I’m sensing the temperature 20 times per second. The mass of the motor case is, I’m guessing, more than enough to guarantee that temperature rises will always be caught by the sensor at that rate. The 20 readings per second are really there just to get rid of the noise in the readings, though the buffer then needs to be long enough to hold enough readings to catch the trend line. I’m not sure I have the buffer length correct yet.

I do intend to capture a discrepancy between intended motor motion and actual motion. Here’s what I think I’ll do:

* The cmd_vel is captured by my Teensy software which then computes not only left and right motor velocity, but also assigns an acceleration.

* I can capture the current velocity, capture the current time when cmd_vel comes in and then read the wheel encoders after that and see if the motor is accelerating as expected and running at the velocity as expected.

* The Teensy computes the odom to base_link transform. I could send the map to odom transform to the Teensy or on the ROS 2 side use the two transforms and compare the localization change to the cmd_vel expected change and detect that the robot isn’t moving at all as expected, like if the motors are slipping or not turning at the expected rate. That is, if the Teensy says the robot is moving but localization says it is not, or if cmd_vel says the robot should be moving but the odom to base_link transform says it is not moving as expected, then something has gone wrong. This would be an important safety check.
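The encoder half of that plan, which is also the check Jeremy suggested, could look roughly like this C++ sketch. StallDetector, its thresholds, and the deadband are assumptions, not Sigyn’s code; the grace period is there so a wheel that is still accelerating toward the commanded speed isn’t flagged.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical sketch: flag a wheel whose encoder speed stays well below the
// commanded speed (or is in the wrong direction) for longer than a grace period.
class StallDetector {
 public:
  StallDetector(float min_ratio, uint32_t grace_ms)
      : min_ratio_(min_ratio), grace_ms_(grace_ms) {}

  // commanded and measured use the same units (for example m/s).
  // Returns true while the wheel is considered stalled.
  bool update(uint32_t now_ms, float commanded, float measured) {
    const bool commanded_to_move = std::fabs(commanded) > kDeadband;
    const bool wrong_way_or_stopped = measured * commanded <= 0.0f;
    const bool too_slow = std::fabs(measured) < min_ratio_ * std::fabs(commanded);
    if (!commanded_to_move || !(wrong_way_or_stopped || too_slow)) {
      lagging_ = false;  // moving as expected, or not asked to move
      return false;
    }
    if (!lagging_) {
      lagging_ = true;  // start timing the discrepancy
      lagging_since_ms_ = now_ms;
    }
    // The grace period covers normal acceleration toward the commanded speed.
    return now_ms - lagging_since_ms_ >= grace_ms_;
  }

 private:
  static constexpr float kDeadband = 0.01f;  // ignore near-zero commands
  float min_ratio_;
  uint32_t grace_ms_;
  bool lagging_ = false;
  uint32_t lagging_since_ms_ = 0;
};
```

For example, with StallDetector(0.5f, 500), a wheel commanded to 0.3 m/s that reads under 0.15 m/s for half a second is reported as stalled, and the safety coordinator could then cap the motor output without waiting for the temperature to climb.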

My safety coordinator captures the following severity for each monitored subsystem:

  NORMAL,          ///< All systems operational
  WARNING,         ///< Minor issues detected, monitoring
  DEGRADED,        ///< Operating with reduced functionality
  EMERGENCY_STOP,  ///< Emergency stop active
  SYSTEM_SHUTDOWN  ///< Complete system shutdown required

So when one of my two working IMUs suddenly stops reporting correctly, that sensor is marked as DEGRADED.

Some of the other safety checks I’m going to implement soon include:

For the battery system.

There are 5 battery monitors: the battery itself, the 24v dc to dc converter, and also the 12, 5, and 3.3 volt dc to dc converters. Each is tagged as to whether it is a real battery or a power supply.

- If the real battery’s discharge rate is too high, a WARNING will be signaled. I will have code that will read either the rosout logs or the SD card logs on occasion to look for trends that I need to pay attention to, such as sensors failing now and then, software not running at the expected rate, and so forth. One of the analyses will be whether too many devices are sucking too much power out of the battery, implying that the robot won’t be able to run as long as I expected between charges.

- Likewise, the current and voltage from each power supply is monitored. If the current approaches the design limit, that will signal a WARNING. If the voltage or current drops to an unexpected level, that will signal an EMERGENCY_STOP.

- The code will look for the battery voltage reaching a certain low level and send a signal to the behavior tree that the robot needs to get back to the charger. It will also signal a WARNING to the safety coordinator.

- The code will compute the discharge slope and predict when the battery will fail.

- The code will signal a SYSTEM_SHUTDOWN when the battery voltage is near the critical value and, when a second, lower critical voltage is detected, it will physically shut down various subsystems and eventually disconnect the battery.

- If the 24 volt power supply, used to drive the motors, is drawing too much current or if (eventually to be tested) the dc to dc converter is overheating or if certain, unrecoverable errors are detected by the RoboClaw device, the power to the RoboClaw device will be cycled off and on.

For the IMUs.

There are two IMUs on the robot chassis and one in the OAK-D camera.

- Currently, only one of the chassis IMUs is used by the EKF filter node. In the future, if one fails, it will signal a DEGRADED condition, and the other IMU will be hot-stopped into the EKF filter. If both fail, it will signal an EMERGENCY_STOP.

- All three IMUs will be sampled to look at the gravity vector. If the robot is tilted beyond some angle, a WARNING will be signaled that the robot is tipping over unexpectedly. If it tilts further, an EMERGENCY_STOP will be signaled. Finally, if the robot seems to have fallen over, a SYSTEM_SHUTDOWN will be signaled.

- The IMU angular velocity about the z axis and the odometry z angular velocity will be compared to see if the robot is spinning unexpectedly, which will signal an EMERGENCY_STOP.

- If the chassis IMU acceleration exceeds some value, meaning the robot is speeding up or tilting too fast, an EMERGENCY_STOP will be signaled. I might use an intermediate value to signal only a WARNING.
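The tilt escalation and the yaw-rate cross-check above can be sketched in Python as follows. This assumes each accelerometer reports (ax, ay, az) in m/s² in the chassis frame with +z up, so a level robot at rest reads about (0, 0, +9.81). That estimate is only trustworthy when the robot is not accelerating hard. The angle thresholds are placeholders, and taking the median tilt across the IMUs (so one faulty unit cannot trigger a shutdown alone) is a design choice of this sketch, not something stated above.

```python
# Hypothetical sketch: thresholds and function names are illustrative
# assumptions. Readings are (ax, ay, az) in m/s^2 in the chassis frame, +z up.
import math
from statistics import median
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def tilt_degrees(accel: Vec3) -> float:
    """Angle between the measured gravity reaction vector and the chassis +z axis."""
    ax, ay, az = accel
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("zero acceleration vector (free fall or bad sensor)")
    cos_tilt = max(-1.0, min(1.0, az / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_tilt))


def tilt_level(tilt_deg: float, warn_deg: float = 10.0,
               estop_deg: float = 20.0, fallen_deg: float = 60.0) -> str:
    """Escalate from WARNING to EMERGENCY_STOP to SYSTEM_SHUTDOWN as tilt grows."""
    if tilt_deg >= fallen_deg:
        return "SYSTEM_SHUTDOWN"
    if tilt_deg >= estop_deg:
        return "EMERGENCY_STOP"
    if tilt_deg >= warn_deg:
        return "WARNING"
    return "NORMAL"


def fused_tilt_level(readings: Sequence[Vec3]) -> str:
    """Classify the median tilt across all IMUs, so one bad IMU cannot act alone."""
    return tilt_level(median([tilt_degrees(r) for r in readings]))


def spinning_unexpectedly(imu_wz: float, odom_wz: float, tol_rad_s: float = 0.3) -> bool:
    """Flag a yaw-rate mismatch (rad/s) between the IMU and wheel odometry."""
    return abs(imu_wz - odom_wz) > tol_rad_s
```

With three IMUs, two reading level and one reporting a 90-degree tilt, `fused_tilt_level` still returns NORMAL; with only two healthy IMUs, the median becomes their average, so a real deployment would want the DEGRADED handling described above as well.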

For the Time of Flight sensors.

There are 8 time of flight sensors mounted as pairs, pointing 90 degrees apart, at 4 equidistant points to look for obstacles. Each sensor, though, has a narrow cone of detection. The intention is to provide 3 rings of protection based on detected obstacle distance.

- There is a minimum obstacle distance configured into the Nav2 system. The robot should never get nearer than that distance to any obstacle. Everything beyond that distance is considered to be in the first ring of protection.

Unfortunately, the robot doesn’t have complete visibility of its surroundings. There is a LIDAR which can see several segments of the full 360 degrees at one height; the gaps come from pieces of the robot body blocking the LIDAR beam. There is an OAK-D camera mounted about 4 feet above the top plate of the robot, looking downward at an angle toward the front of the robot to find obstacles in part of the area where the robot normally moves, as well as to aid in operating the gripper mechanism. There are eight time-of-flight sensors, but they have a narrow cone of detection and low resolution. The result is that there are a lot of holes in what the robot can detect around it. And, of course, each kind of sensor has its own limitations; for example, laser-based sensors do not detect panes of glass very well.

- If the robot stays farther away from any obstacle than what the Nav2 system considers safe, then things are fine. But the robot has momentum, and the software, especially the software running on Linux, has poor guarantees about when it will even be able to read the sensors and compute the distance to any obstacle. Things can suddenly pop up in any sensor. The time-of-flight sensors might, say, suddenly see a box that fell onto the floor and was completely missed by the LIDAR. If the time-of-flight sensors detect an obstacle between two distances, that is, closer than the minimum distance expected by the Nav2 system but only a little bit closer, the sensor will signal the ROS 2 system to slow down the motors and will also signal a WARNING. That corresponds to the second ring of protection.

- If the time-of-flight sensors see an obstacle even closer than the second ring of protection, something has really gone wrong and an EMERGENCY_STOP is signaled. That corresponds to the third ring of protection.

- I also plan to compute how fast the distance to each detected obstacle is changing, report that rate, and use it together with the motors’ velocity to predict when a collision is likely. If a collision would happen sooner than one cycle of the local planner’s loop, an EMERGENCY_STOP will also be signaled, since it means the robot will collide faster than the ROS 2 software can react.
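The three rings, plus the planned time-to-collision check, can be sketched per sensor reading in Python. Assumptions: distances are in meters, the E-stop distance is smaller than the Nav2 minimum obstacle distance, and the closing speed is estimated only from the change between two successive time-of-flight readings. That range rate already includes the robot's own motion along the sensor axis; the plan above to also use the motors' velocity is not modeled here. All names and numbers are hypothetical.

```python
# Hypothetical sketch: distances in meters, times in seconds. Names and
# threshold values are illustrative assumptions, not the robot's real config.
from typing import Optional


def ring_action(distance_m: float, nav2_min_m: float, estop_m: float) -> str:
    """Classify one time-of-flight reading into the three rings of protection."""
    if not 0.0 <= estop_m < nav2_min_m:
        raise ValueError("expected 0 <= estop_m < nav2_min_m")
    if distance_m >= nav2_min_m:
        return "NORMAL"          # ring 1: outside Nav2's minimum obstacle distance
    if distance_m >= estop_m:
        return "SLOW_AND_WARN"   # ring 2: a little inside the Nav2 minimum
    return "EMERGENCY_STOP"      # ring 3: much too close


def time_to_collision(prev_m: float, now_m: float, dt_s: float) -> Optional[float]:
    """Seconds until contact if the current closing speed holds; None if not closing."""
    if dt_s <= 0.0:
        raise ValueError("dt_s must be positive")
    closing_speed = (prev_m - now_m) / dt_s  # m/s toward the obstacle
    if closing_speed <= 0.0:
        return None
    return now_m / closing_speed


def tof_action(prev_m: float, now_m: float, dt_s: float, nav2_min_m: float,
               estop_m: float, planner_hz: float) -> str:
    """Ring check, overridden by an E-stop if contact would come within one planner cycle."""
    ttc = time_to_collision(prev_m, now_m, dt_s)
    if ttc is not None and ttc < 1.0 / planner_hz:
        return "EMERGENCY_STOP"
    return ring_action(now_m, nav2_min_m, estop_m)
```

For example, with a 20 Hz local planner (a 50 ms cycle), an obstacle that jumps from 0.30 m to 0.20 m within 10 ms yields a time to collision of about 20 ms, so `tof_action` returns EMERGENCY_STOP even though 0.20 m alone would only be in the second ring. Raw time-of-flight readings are noisy, so a real version would filter the range rate before trusting it.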

Other sensors.

I’ll stop here. The safety system for the RoboClaw itself is pretty complicated. And I haven’t decided what to do about failures related to the SD card which is used to log all the low-level happenings on the Teensy processor itself. I’m also not going to describe the safety system for the gripper mechanism with its various stepper and servo motors.

James H Phelan

Jan 2, 2026, 10:33:18 PM

to hbrob...@googlegroups.com

A "robot safety module" should include:

A big red "bop to stop" button.

A key fob remote kill switch to cut power to drive motors or perhaps everything.

Not sure about arms. A graceful degradation to a resting position?

An obnoxious beep like a backing garbage truck.

But obnoxious beeps are, in fact, obnoxious. And can get tuned out.

Perhaps better to activate in a warning status.

If the robot is too quiet, a synthesized engine noise to alert bystanders? A whistled tune? Fanfare?

I prefer the voice of C3PO saying "Excuse me, coming through, make way please, ..."

Or, "Pardon the intrusion, Michael, but I believe my battery is on fire!"

"Help! I'm stuck!" "I feel dizzy".

The possibilities are endless.

Flashing yellow lights. Though I would prefer something more aesthetic.

Green lights for normal condition, yellow lights for warning, red flashing lights for emergency?

A Pimoroni LED HAT could have a pixel for each monitor. Blue inactive, teal initiating, green normal, yellow-green minor caution, yellow warning, orange near critical, red critical, red flashing emergency.

A glance would reveal the system's health.
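The one-pixel-per-monitor idea amounts to a lookup from status to color, plus a blink for the emergency state. Here is a small Python sketch of that mapping; the status names and RGB values are hypothetical choices made by eye, and the call that actually writes pixels to a particular LED HAT depends on that board's own library, so it is left out.

```python
# Hypothetical sketch: status names and RGB values are illustrative choices,
# and writing the pixels to real hardware is left to the LED board's library.
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
OFF: RGB = (0, 0, 0)

STATUS_COLORS: Dict[str, RGB] = {
    "inactive": (0, 0, 255),          # blue
    "initiating": (0, 128, 128),      # teal
    "normal": (0, 255, 0),            # green
    "minor_caution": (154, 205, 50),  # yellow-green
    "warning": (255, 255, 0),         # yellow
    "near_critical": (255, 165, 0),   # orange
    "critical": (255, 0, 0),          # steady red
    "emergency": (255, 0, 0),         # red, flashed below
}


def pixel_color(status: str, t_s: float, blink_hz: float = 1.0) -> RGB:
    """Color for one monitor's pixel at time t_s; only 'emergency' blinks."""
    color = STATUS_COLORS[status]
    if status == "emergency" and (t_s * blink_hz) % 1.0 >= 0.5:
        return OFF  # dark for the second half of each blink period
    return color


def frame(statuses: List[str], t_s: float) -> List[RGB]:
    """One pixel per monitor, in a fixed monitor order."""
    return [pixel_color(s, t_s) for s in statuses]
```

Keeping the mapping in one table makes it easy to tune colors later without touching the monitoring code that produces the statuses.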

Robotics is hard, but who says it can't be fun? And beautiful?

Wayne Gramlich

Jan 3, 2026, 1:18:11 AM

to 'James H Phelan' via HomeBrew Robotics Club, Wayne Gramlich

James: My $.02 are spliced in as comments below. -Wayne

On 1/2/26 19:33, 'James H Phelan' via HomeBrew Robotics Club wrote:

A "robot safety module" should include:

A big red "bop to stop" button.

You should not be constrained to 1 big red button. Most robots put the one red button
at the back of the robot where it cannot be reached. Also, the hardware and
software need to be able to figure out what triggered the E-stop before it can be cleared.
Seriously, consider making robot bumpers equivalent to an E-Stop. If there are one or more
lidars, it may make sense to trigger an E-Stop on any object that penetrates a safety perimeter.

By the way, any robot subsystem should be able to trigger the E-Stop.
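The two requirements here, that any subsystem can trip the E-Stop and that the cause must be known before it is cleared, suggest a latch that records every trigger source. A minimal Python sketch follows; the class name, the source names, and the rule that only the originating subsystem's "resolved" report can clear its entry are all assumptions for illustration.

```python
# Hypothetical sketch of a latched E-stop that records every trigger source.
# The class name, source names, and clear() rule are illustrative assumptions.
from typing import Dict


class EStopLatch:
    """Any subsystem may trip the latch; it clears only source by source."""

    def __init__(self) -> None:
        self._causes: Dict[str, str] = {}  # source name -> reason text

    def trigger(self, source: str, reason: str) -> None:
        """Record (or update) why a subsystem demanded a stop."""
        self._causes[source] = reason

    @property
    def active(self) -> bool:
        """Motors stay disabled while any cause remains latched."""
        return bool(self._causes)

    def causes(self) -> Dict[str, str]:
        """Copy of the outstanding causes, for logging or a status display."""
        return dict(self._causes)

    def clear(self, source: str, condition_resolved: bool) -> bool:
        """Clear one source, but only if that subsystem says the fault is gone."""
        if not condition_resolved or source not in self._causes:
            return False
        del self._causes[source]
        return True
```

If a front bumper and a lidar perimeter check both trip the latch, clearing the bumper leaves the latch active until the lidar also reports that its perimeter is clear, and `causes()` shows exactly what is still holding the robot stopped.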

A key fob remote kill switch to cut power to drive motors or perhaps everything.

Not sure about arms. A graceful degradation to a resting position?

If the arms are carrying a caustic fluid in an open container, you do not want
the container to be dropped. Locking the arms is the best option.

James
James H Phelan
"Nihil est sine ratione cur potius sit quam non sit"
Leibniz

--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.

To view this discussion visit https://groups.google.com/d/msgid/hbrobotics/7e1ab95a-f695-4418-9f81-875350c07975%40hal-pc.org.

Thomas Messerschmidt

Jan 3, 2026, 1:27:59 AM

to hbrob...@googlegroups.com, hbrob...@googlegroups.com

And perhaps a voice that says, “Danger Will Robinson!”

Thomas Messerschmidt

-

Need something prototyped, built or coded? I’ve been building prototypes for companies for 15 years. I am now incorporating generative AI into products.

Contact me directly or through LinkedIn:

https://www.linkedin.com/in/ai-robotics/

On Jan 2, 2026, at 7:33 PM, 'James H Phelan' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:

camp .

Jan 3, 2026, 2:11:17 AM

to 'James H Phelan' via HomeBrew Robotics Club, Wayne Gramlich

> Seriously, consider making robot bumpers be equivalent to E-Stop.

Bumpers are the sensor of last resort. To your point, Wayne, if you hit something with your bumper, navigation has failed. You are now in recovery mode.

- Camp

Thomas Messerschmidt

Jan 3, 2026, 3:14:57 PM

to hbrob...@googlegroups.com, Wayne Gramlich, 'James H Phelan' via HomeBrew Robotics Club

Just like humans. If you stub your toe on a table leg, your navigation has failed. You are in recovery mode (and you’re in a whole world of hurt!) Your trip to the fridge has been interrupted! At least a robot won’t need to wait for his throbbing toe to stop throbbing before continuing on. 😁

Thomas Messerschmidt

On Jan 2, 2026, at 11:11 PM, camp . <ca...@camppeavy.com> wrote:

Craig Austin

Jan 4, 2026, 4:17:24 PM

to hbrob...@googlegroups.com

IMO, any robot large enough and/or powerful enough to injure a human ought to have a human proximity sensor that stops all movement when a human approaches the danger zone.

James H Phelan

Jan 4, 2026, 4:59:24 PM

to hbrob...@googlegroups.com

Wayne et al.

If the power goes out, how do you freeze the arm?

James

James H Phelan
"Nihil est sine ratione cur potius sit quam non sit"
Leibniz

Michael Wimble

Jan 4, 2026, 6:18:28 PM

to hbrob...@googlegroups.com

My robot is a personal assistant. Its main task will be to operate around me.

On Jan 4, 2026, at 1:17 PM, Craig Austin <agu...@gmail.com> wrote:

Michael Wimble

Jan 4, 2026, 6:22:27 PM

to hbrob...@googlegroups.com

I don’t have this solved everywhere on my robot yet. My next gripper will use a worm gear, I think, to move the fingers, and that should suffice. My elevator and extender, though, use stepper motors, and when the power goes off the elevator crashes back to the floor. I suppose I could install a failsafe detent mechanism around the stepper gear.

On Jan 4, 2026, at 1:59 PM, 'James H Phelan' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:

James H Phelan

Jan 5, 2026, 12:18:17 PM

to hbrob...@googlegroups.com

I had thought, and believed I had seen it demonstrated, that if you connected the two terminals of a DC motor together, the electric force generated by turning the shaft would directly oppose the shaft turning, thus locking it.

Theoretically, in case of power failure a DPDT cross-over relay could short the motor terminals and lock it in place.

However, in testing it just now on a couple of motors, that does not seem to be true. So I asked GPT-4o to explain:

https://poe.com/s/XH4ZEk1YhpIZaUss6H0E

One motor's winding resistance is just under 1 kΩ; another's is 0.175 kΩ. Both turn when shorted.

Of course, starting from a stop, the turning rate would be very slow.

James

James H Phelan
"Nihil est sine ratione cur potius sit quam non sit"
Leibniz

Karim Virani

Jan 5, 2026, 3:04:50 PM

to hbrob...@googlegroups.com

The issue is with the word "lock". Yes, this will resist motion, and it is a good way to brake the motors from running at speed when you have neither regenerative braking nor the desire to use your battery to brake. But the feedback is lossy, and it will not lock the motor, just resist motion. It can look like a very quick stop from high speeds (unlike coasting), but it doesn't have hold-at-zero capability.
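The "resists but does not hold" point follows from the standard brushed-DC-motor model. With the terminals shorted, the back-EMF k_e·ω drives a current of -k_e·ω/R through the winding resistance R, giving a torque of -(k_t·k_e/R)·ω. That torque is proportional to speed, so it vanishes at standstill. Under a constant load, the shaft settles at the speed where braking torque balances the load, which fits the observation that a shorted motor still turns slowly. The Python sketch below assumes that model; it ignores winding inductance, friction, and gear losses, and the parameter values in it are made up. (Friction in a non-backdrivable gear train, such as many worm gears, is a separate mechanism that can hold a load.)

```python
# Hypothetical sketch of "shorted-terminal" (dynamic) braking for a brushed DC
# motor. Simplifying assumptions: winding inductance, friction, and gear
# losses are ignored; parameter values used with it are illustrative only.


def braking_torque(omega_rad_s: float, k_t: float, k_e: float, r_ohm: float) -> float:
    """Torque (N*m) produced by a shorted motor spinning at omega (rad/s).

    Back-EMF k_e*omega drives I = -k_e*omega / R through the short, and the
    torque k_t*I opposes the motion in proportion to speed.
    """
    return -(k_t * k_e / r_ohm) * omega_rad_s


def creep_speed(load_torque_nm: float, k_t: float, k_e: float, r_ohm: float) -> float:
    """Steady speed (rad/s) at which braking torque balances a constant load.

    Solving k_t*k_e*omega/R = load gives omega = load*R/(k_t*k_e): a load is
    never held at zero speed, it creeps, and it creeps faster when R is larger.
    """
    return load_torque_nm * r_ohm / (k_t * k_e)
```

With k_t = k_e = 0.05 and R = 1 Ω, a 0.01 N·m load creeps at about 4 rad/s. All else being equal, a high-resistance winding (like the roughly 1 kΩ motor mentioned above) brakes far more weakly than a low-resistance one, so shorting its terminals does little to stop it.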

Sergei Grichine

Jan 5, 2026, 8:04:17 PM

to hbrob...@googlegroups.com

Just a word of caution - the voltage generated by motors in jerky motions is not something you can easily ignore: https://groups.google.com/g/hbrobotics/c/o32oRUNHI0M/m/zDtrCNLnAwAJ

Best Regards,

-- Sergei
