Finally, a report on Sigyn's safety system


Michael Wimble

Dec 27, 2025, 3:06:21 PM
to hbrob...@googlegroups.com
I’ve been asked several times to describe the safety system I have in Sigyn. This is a work in progress and the code varies tremendously over time. It doesn’t exist in isolation, either. Safety has to be coordinated with the needs of ROS 2, particularly the navigation system and the behavior tree. I gave a couple-hour talk a few years ago about how to design and incorporate behavior trees into ROS 2, and I expect I will need to update that again soon, as I’m now using a custom behavior tree. You can find that talk in the club’s YouTube archive.

The safety system is complex. It’s not complete, but it exists. It has design goals and every iteration gets better at meeting those goals. Here is a short, 7-page PDF file giving an overview of Sigyn’s safety system as of yesterday, December 26, 2025.

SigynSafetySystemOverview_Dec2025.pdf

Sergei Grichine

Dec 27, 2025, 6:09:11 PM
to hbrob...@googlegroups.com

Interesting paper, Michael — thanks for sharing.

While my robots’ safety isn’t exactly at the top of my radar (I mean, fuses aren’t necessary when your wires are thick enough… right?), I really enjoyed reading it, and a few thoughts came to mind:

  1. There are many CPUs involved — the main computer, Teensies, RoboClaw. Any of them could fail or freeze, and there doesn’t seem to be a simple, purely hardware mechanism to power down the entire system. I keep thinking about a common “kill” rail that any module — or a designated watchdog — could pull to shut the whole beast down at the battery. The boards could also tick the watchdog for "proof of life".

  2. Regular brushed DC motors with controllers, as you mentioned, can “run away” if the control loop fails. BLDC motors seem to fail in different ways, often less destructive to their surroundings.

  3. I didn’t notice any discussion of sensor sharing between ROS-level functionality and the safety system. A proximity sensor could feed Nav2 while, in parallel, being used in real time to trigger a safety fault. Should there be sensors that live entirely outside the ROS — and even Teensy — domain?

  4. If I read it correctly, many failure decisions are communicated “upstairs” and handled at the sigyn_to_sensor_v2 level. That feels like another potential concentration point for failure.

  5. I’d assume a house bot will be in constant communication with the owner’s phone. Some issues could be handled through a mobile app — remote control to drive it to a human to remove a towel from the camera, for example. In that sense, the house itself can help (“the house is the robot” principle).

Anyway, I couldn’t resist feeding your PDF to the Iron Brain with the following prompt:

Here is an article describing an approach and implementation of a home robot's safety mechanisms. 
Please review it and make a one-page report on omissions in safety.
Base your conclusions on a "Swiss cheese model" and assume that any computer described could fail. 

Here is her merciless review (this is what I'd call "from the horse's mouth"): https://chatgpt.com/s/t_69505dee494481918899c3e289768efd

Best Regards,
-- Sergei


On Sat, Dec 27, 2025 at 2:06 PM Michael Wimble <mwi...@gmail.com> wrote:
I’ve been asked several times to describe the safety system I have in Sigyn. This is a work in progress and the code varies tremendously over time. It doesn’t exist in isolation, either. Safety has to be coordinated with the needs of ROS 2, particularly the navigation system and the behavior tree. I gave a couple-hour talk a few years ago about how to design and incorporate behavior trees into ROS 2, and I expect I will need to update that again soon, as I’m now using a custom behavior tree. You can find that talk in the club’s YouTube archive.

The safety system is complex. It’s not complete, but it exists. It has design goals and every iteration gets better at meeting those goals. Here is a short, 7-page PDF file giving an overview of Sigyn’s safety system as of yesterday, December 26, 2025.

--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/hbrobotics/07BD3411-9767-4D92-8BB7-850BF281B7B2%40gmail.com.

Michael Wimble

Dec 28, 2025, 3:39:04 AM
to hbrob...@googlegroups.com
OK, for the amusement and entertainment of Sergei, with whom I’ve had a lot of fun discussions about how useful the Iron Brains are in software development, let me temporarily enter Rant Mode, level 3.

On Dec 27, 2025, at 3:08 PM, Sergei Grichine <vital...@gmail.com> wrote:

Interesting paper, Michael — thanks for sharing.

While my robots’ safety isn’t exactly at the top of my radar (I mean, fuses aren’t necessary when your wires are thick enough… right?), I really enjoyed reading it, and a few thoughts came to mind:

  1. There are many CPUs involved — the main computer, Teensies, RoboClaw. Any of them could fail or freeze, and there doesn’t seem to be a simple, purely hardware mechanism to power down the entire system. I keep thinking about a common “kill” rail that any module — or a designated watchdog — could pull to shut the whole beast down at the battery. The boards could also tick the watchdog for "proof of life".


I haven’t added a “power down the whole system” feature yet. I’m always looking at the next most important thing to tackle, or the next worthwhile thing where I think I have the time, and that hasn’t made the list yet. I have scripts that shut down everything but the Teensy computers, which only shut down when the main power is cut. It’s easy enough to add, and earlier incarnations of the robot included relays to shut down power. The Teensy computers actually have a watchdog feature, but more on that later.

  2. Regular brushed DC motors with controllers, as you mentioned, can “run away” if the control loop fails. BLDC motors seem to fail in different ways, often less destructive to their surroundings.

Good to know. Eventually I may use BLDC motors. Each robot, so far, has been largely built on the bones of its predecessor. I’ve spent a fair amount of time dealing with the RoboClaw so I can use its interesting features, and I’ve already paid for the motors, but eventually I’ll get around to replacing them.

  3. I didn’t notice any discussion of sensor sharing between ROS-level functionality and the safety system. A proximity sensor could feed Nav2 while, in parallel, being used in real time to trigger a safety fault. Should there be sensors that live entirely outside the ROS — and even Teensy — domain?


Then I didn’t describe the system well. The safety system is almost entirely managed by the Teensy boards, with advice from the ROS system. The rings of protection are detected by the time-of-flight sensors within the Teensy hardware. The battery safety, IMU safety, temperature safety, pretty much all of it is outside of ROS. ROS gets the data, but ROS isn’t detecting most of the safety concerns. The Teensy system does pay attention to the cmd_vel from ROS so it knows what the motor velocity intention is, and it then checks whether that intention actually happens via the commands sent to the motor controllers. Later, it will cooperate with the localization system to see if cmd_vel movements disagree with what localization says. That is, if the motors are supposedly moving but localization says the robot is not changing position as expected, that will be added to the safety system. The sensor data is all sent to ROS, along with the status of the safety system and more, in case I want to write code in ROS to do more interesting things, but outside of the LIDAR and cameras, all the sensor data is first processed by the Teensy boards.
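
The cmd_vel intention check Michael describes can be sketched in a few lines. This is purely illustrative logic, not Sigyn's firmware; the class name, the 0.15 m/s tolerance, and the five-tick threshold are all made-up assumptions:

```python
# Hypothetical sketch of a commanded-vs-measured velocity plausibility
# check: if the wheels don't do what cmd_vel asked, for long enough,
# declare a safety fault. All thresholds here are illustrative.

TOLERANCE_MPS = 0.15   # allowed |commanded - measured| disagreement
MAX_BAD_TICKS = 5      # consecutive bad samples before faulting

class VelocityPlausibilityCheck:
    def __init__(self, tolerance=TOLERANCE_MPS, max_bad_ticks=MAX_BAD_TICKS):
        self.tolerance = tolerance
        self.max_bad_ticks = max_bad_ticks
        self.bad_ticks = 0

    def update(self, commanded_mps, measured_mps):
        """Call once per control tick; returns True when a fault should fire."""
        if abs(commanded_mps - measured_mps) > self.tolerance:
            self.bad_ticks += 1
        else:
            self.bad_ticks = 0          # agreement resets the counter
        return self.bad_ticks >= self.max_bad_ticks
```

The same shape would extend naturally to the localization cross-check described above: replace the encoder-derived velocity with a velocity derived from pose deltas.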

  4. If I read it correctly, many failure decisions are communicated “upstairs” and handled at the sigyn_to_sensor_v2 level. That feels like another potential concentration point for failure.
Again, I apparently didn’t describe the system well. A lot of data is sent to ROS so that I can do extra processing and analysis there; the main computer has a huge amount of computing headroom. That’s where I’ve written analysis tools that look for time skews and sensor timing variation, do statistical analysis of data from various viewpoints, and so on. But that really is just to confirm my hypotheses about whether the sensors are working as expected, so that problems with ROS navigation aren’t from the sensor system but rather from problems with configuration tuning. The sigyn_to_sensor_v2 package isn’t really making any safety decisions, nor are any ROS nodes. Well, there is some sanity checking there, but currently there is almost no pushing of safety information back down to the Teensy boards. There will be later, especially as a result of my additional behavior tree nodes. There is a mechanism implemented to push safety information down, but nothing outside of the custom boards, as of today, is part of any failure point in the safety system.

  5. I’d assume a house bot will be in constant communication with the owner’s phone. Some issues could be handled through a mobile app — remote control to drive it to a human to remove a towel from the camera, for example. In that sense, the house itself can help (“the house is the robot” principle).

I’ve written a couple of visualization packages as proofs of concept. And I used to pay for a third-party system that allowed me to send SMS messages worldwide as part of a camera security system I wrote years ago. I got all of ROS 2 except for rviz2 and gazebo to run natively on my Mac desktop and wrote an iPhone app that was able to read some amount of ROS data and display it, but it was a lot of effort for little payback, so far.

A much more productive experiment was to use the web socket API to just expose the interesting data to a generic web browser. I even used the browser’s canvas system to generate an rviz2-like display that worked on my Mac desktop, iPad and iPhone. I will definitely expand on that in the future. I’m considering using someone’s 3D headset for visualization as well. But, ultimately, my robot needs to work without my monitoring anything at all other than dealing with the emergency issues where the robot cannot proceed without my advice or intervention. That’s beyond what I’m calling the safety system, though. That’s getting into functionality that will be handled by the behavior tree.

Anyway, I couldn’t resist feeding your PDF to the Iron Brain with the following prompt:

Here is an article describing an approach and implementation of a home robot's safety mechanisms. 
Please review it and make a one-page report on omissions in safety.
Base your conclusions on a "Swiss cheese model" and assume that any computer described could fail. 

Here is her merciless review (this is what I'd call "from the horse's mouth"): https://chatgpt.com/s/t_69505dee494481918899c3e289768efd

Best Regards,
— Sergei

AI analysis is always amusing. I found the Iron Brain to be like listening to a high school sophomore trying to impress me after having read some introductory book that talked about the Swiss Cheese Model. I enjoy the analysis, but AI has, so far, not been very helpful at anything beyond superficial understanding. I could tell you tales of the weeks I’ve wasted trying to track down whether the AI analysis of my systems is meaningful, or mostly stuff that would properly fertilize my strawberry plants.

As for the analysis, here are some comments of my own. Keep in mind, as always, that I tell people all the time that I know a lot of stuff, almost half of which is correct.

“where accidents emerge when multiple imperfect defenses fail simultaneously—several critical omissions and thin layers become apparent.”
Oh boy, I can’t wait for this. My little robot, I suppose, should have multiple custom field-programmable gate arrays, analyzed using NASA’s code analysis, which requires you to design systems using two orthogonal descriptions of the code and can then perform a small set of correctness analyses. And multiple layers of voting systems will need to be designed until the desired trailing sequence of 9s in the 99.999… percent likely uptime is achieved. Well, strap in, boys, let’s see what its criticism is.

"No independent, hardwired safety interlock is described that can cut motor power without firmware participation.
Thatta boy! So the AI thinks that any hardware implementation is somehow fundamentally different from firmware. Allow me to snicker. Hardware-only designs attempting to achieve the sophisticated detection and analysis I can achieve are going to be amusing to look at. Remember that this isn’t an “ooh, something happened and I need to power off the robot” simplistic safety system. If the motor temperatures are rising quickly, my safety system is quite capable of simply limiting the acceleration or velocity of the motors and checking back later to see if the projected temperature spike will stay within safety limits. That’s a big ask for a non-firmware implementation.
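
To make that point concrete, here is a minimal sketch, in Python rather than Teensy C++, of the kind of predictive thermal derating described above. Everything in it, the 85 °C limit, the 30 s lookahead, the function name, is an illustrative assumption, not Sigyn's actual firmware:

```python
# Illustrative sketch of predictive thermal limiting: linearly extrapolate
# the motor-temperature trend and derate allowed velocity before the hard
# limit is ever reached. Thresholds are made-up assumptions.

TEMP_HARD_LIMIT_C = 85.0   # assumed shutdown threshold
LOOKAHEAD_S = 30.0         # how far ahead to project the trend

def allowed_velocity_scale(temp_now_c, temp_prev_c, dt_s,
                           limit_c=TEMP_HARD_LIMIT_C, lookahead_s=LOOKAHEAD_S):
    """Return a 0..1 velocity scale factor based on projected temperature."""
    rate_c_per_s = (temp_now_c - temp_prev_c) / dt_s
    projected_c = temp_now_c + rate_c_per_s * lookahead_s
    if projected_c <= limit_c:
        return 1.0                     # trend stays inside the envelope
    if temp_now_c >= limit_c:
        return 0.0                     # already over the hard limit: stop
    # Scale down in proportion to how far the projection overshoots.
    overshoot = projected_c - limit_c
    headroom = limit_c - temp_now_c
    return max(0.0, min(1.0, headroom / (headroom + overshoot)))
```

A firmware equivalent would run once per control tick and feed the scale factor into the velocity and acceleration limiter, then re-check on later ticks as the trend changes.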

Is some non-firmware system going to capture cmd_vel intention, independently compute expected acceleration, and check that the current velocity is within some acceptable range? Is some non-firmware solution going to look at how the localization system thinks the robot’s position has changed over time and compare that to the actual motor commands and wheel encoders to decide whether the robot is moving as expected? Not on your Nelly!

"A frozen MCU, corrupted stack, or USB lockup could silently remove multiple safety layers simultaneously.
I’m seeing that the AI couldn’t be bothered to actually look at what a Teensy 4.1 can do. It has a watchdog timer in it. I used to use it all the time to deal with software crashes, back before I got better at debugging my code. The watchdog timer stuff is still there, just commented out for now. And, I’m sorry, but how did it decide that a USB lockup would defeat the system? Almost all safety is done on-board, and inter-board communication doesn’t use USB. And how does this enlightened statement somehow get rid of the identical problem in a hardware-only solution? I can refer any AI to Ralph Hipps’ various war stories about how designs by arguably the best FPGA designers fail. Once you get beyond an over-designed resistor and capacitor, hardware solutions are just a different domain of errors and failures. MCUs have a lot of historical analysis behind them and are very reliable.

"No force, torque, or impact energy limits tied to human contact.
I take it the AI has decided that my couple-of-thousand-dollar robot can only be safe if it is coated in force sensors and object recognition sensors to detect any nearby people who might actually be impacted by the robot. As for force, torque, etc. limits, the AI, again, has arrogantly decided that actually understanding anything about basic robot software design couldn’t possibly inform its criticism. ROS has lots of packages that explicitly deal with these issues, and my Teensy package has some redundancy in ensuring that forces are kept at levels that are non-lethal not only to humans but to my one piece of expensive furniture and my many expensive floor-to-ceiling panes of glass.

"No mention of soft bumpers, compliant structures
So the AI thinks that I’ve described everything, even though I explicitly say I do not. And, again, how did it decide that a robot must have these features without knowing anything about the weight, shape, outer coatings, design goals, velocity limits, acceleration limits, or, indeed, much of anything about the robot? Again, this high school sophomore has decided that Fox News levels of fear-mongering are appropriate, rather than actually researching anything.

"No explicit distinction between “safe for hardware” and “safe for humans.”
What the actual frell?

"A robot can behave “correctly” and still injure a person.
What the hell was the point of that? Nothing in this end-of-fall-semester term paper implies that there is any other system which would make this statement meaningful in any way. This is padding out a 600-word essay with meaningless but “this makes the report sound serious” statements.

"Omission: No independent layer addressing human contact, entrapment, or startle risk.
Oh gawd, please make it stop :-)

Well, I’m going to stop here, because the next sections in the AI analysis are just poop. I particularly like the "Fire, thermal runaway, or short-circuit scenarios rely on detection rather than containment" statement. What a maroon! How could it even make such a statement when not a single bit of information in the paper suggests whether there is or is not a containment system? Which, by the way, there is.

Well, rant off then. We certainly have some fun back-and-forths in our discussions. And I really appreciate your feedback. I find your feedback thoughtful and well reasoned, and you know I often take action from it. But you also know how little I value a lot of what AI brings to the table. AI has greatly increased my productivity, but it does so because I can carefully craft a prompt and have enough experience to know how to correct or reject the mess it often makes. I find the AI to be useful when I can define a narrow, well-crafted problem and let the AI produce small improvements in my code. It seldom provides much value in anything complex, though it certainly thinks a lot of itself.

Sergei Grichine

Dec 28, 2025, 12:26:18 PM
to hbrob...@googlegroups.com

Well, it looks like our mission of providing quality entertainment to the HB Robotics masses is being achieved. Time to sell tickets? 😉

There’s a tendency to compare AI to “sophomores.” It reminds me of a well-known quote — and of a good way to make it onto an AI haters blacklist in the near future. Personally, I try to stay on the Overlords’ whitelist, as you may have noticed. Flattery works…

Note that I intentionally didn’t feed the AI my own thoughts, letting it run on a very generic prompt. You may have noticed a couple of interesting points that we (myself and AI) arrived at independently.

Anyway, discussing the AI response itself is probably counterproductive, although I find AI very helpful and mostly right (yes, that probably puts me somewhere below the “wise fool” level).

Back to the topic:

- “…I didn’t describe the system well. The safety system is almost entirely managed by the Teensy boards, with advice from the ROS system…” — that’s likely more a case of me not reading your article carefully enough; I think your description is fine.

- Same goes for my “upstairs” comment — although, depending on how the Teensies react to an uplink failure, this could still be a concern.

- I thought my passage about “a common kill rail that any module — or a designated watchdog — could pull to shut the whole beast down at the battery. The boards could also tick the watchdog for proof of life” was clear enough, but let me expand on it. I’m imagining an orthogonal (perhaps Arduino Nano–based) very simple system with limited monitoring, reporting, and control authority. While you already delegate some of this to the Teensy firmware, an independent "almost hardware" unit makes a lot of sense to me from a reliability and safety standpoint.

- I’m not sure I managed to convey my angle on the “the house is the robot” principle. What I meant is that some safety decisions are better handled by the house’s monitoring systems. In the extreme case, the fire alarm will call the firefighters when it detects magic gray smoke.
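
For what it’s worth, the supervising logic of such an “almost hardware” unit can be tiny. Here is a hypothetical sketch, with made-up board names and a made-up 500 ms deadline, shown in Python for brevity even though a Nano would run C++:

```python
# Sketch of a "proof of life" supervisor: a tiny independent MCU that
# pulls a common kill rail if any registered board misses its heartbeat
# deadline. All names and the deadline are illustrative assumptions.

HEARTBEAT_DEADLINE_S = 0.5

class KillRailWatchdog:
    def __init__(self, boards, deadline_s=HEARTBEAT_DEADLINE_S):
        self.deadline_s = deadline_s
        self.last_seen = {b: 0.0 for b in boards}

    def heartbeat(self, board, now_s):
        """Each board ticks this periodically as proof of life."""
        self.last_seen[board] = now_s

    def rail_should_trip(self, now_s):
        """True if any board has gone silent past its deadline."""
        return any(now_s - t > self.deadline_s for t in self.last_seen.values())
```

The point is the line count: a loop this small is auditable in a way the main firmware never will be, which is the whole reliability argument for keeping it orthogonal.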

Anyway, I really enjoyed the “Rant level 3” — that’s definitely what I’ll use as an AI prompt next time. The show must go on! (just don't stop making robots while watching)

Best Regards,
-- Sergei



Ralph Hipps

Dec 28, 2025, 5:54:28 PM
to HomeBrew Robotics Club
I assume the first layer of defense is the fact that it's running ROS, so the chances of it moving or going anywhere are slim, hence no real danger to anyone in the area.    =)

R.

Marcos castro

Dec 28, 2025, 6:35:10 PM
to hbrob...@googlegroups.com
Reading between the lines, it seems some of the concerns come from the RTOS failing to deliver the "emergency" situation. Does that put the discussion under "bare metal" vs. RTOS, assuming your watchdog and/or safety sensors properly raise the situation?


Marcos

On Sat, Dec 27, 2025 at 12:06 PM Michael Wimble <mwi...@gmail.com> wrote:
I’ve been asked several times to describe the safety system I have in Sigyn. This is a work in progress and the code varies tremendously over time. It doesn’t exist in isolation, either. Safety has to be coordinated with the needs of ROS 2, particularly the navigation system and the behavior tree. I gave a couple-hour talk a few years ago about how to design and incorporate behavior trees into ROS 2, and I expect I will need to update that again soon, as I’m now using a custom behavior tree. You can find that talk in the club’s YouTube archive.

The safety system is complex. It’s not complete, but it exists. It has design goals and every iteration gets better at meeting those goals. Here is a short, 7-page PDF file giving an overview of Sigyn’s safety system as of yesterday, December 26, 2025.


Sergei Grichine

Dec 29, 2025, 12:01:20 PM
to hbrob...@googlegroups.com
Marcos,

when it comes to my side of the argument, I don't see the Teensy as being any more unreliable than, for example, an Arduino Nano (supposedly more "bare metal"). 

My main concern is the amount of custom code running on the MCU. A watchdog is generally more reliable than the controller MCU it monitors, in inverse proportion to its line count. Have we somehow learned to plant fewer than 10 bugs per 100 lines of code?

BTW, if I am not mistaken, Teensy, unlike ESP32, isn't normally running scripts under a hidden RTOS. But that's probably not the point you were really making.

I asked my favorite sophomore (Greek for "wise fool") to draft a safety architecture for a home robot. Here is the Iron Brain's plan to keep our furniture safe.

I'd guess we can go deeper into these rabbit holes, but I would rather refer everyone to Michael Wimble's original PDF, as he is the only human brain here who has seriously and skillfully addressed the safety loops topic in our simple robots. It is a very worthy read.

Best Regards,
-- Sergei


Michael Wimble

Dec 29, 2025, 1:19:47 PM
to hbrob...@googlegroups.com
Interesting that Iron Butt’s first part of the plan nearly exactly describes my system. There are a couple of bits I should add, though, like adding a solid-state relay inline with my emergency stop physical switch. Thanks. I’m on to reading the other sections.


Ralph Hipps

Dec 29, 2025, 1:49:48 PM
to hbrob...@googlegroups.com
If it hasn't already been pointed out, I think it's also worth discussing the level of safety required (life or death vs scratching the furniture) versus the economics of achieving said level of safety.

There are easy, low cost ways of avoiding bumping into walls or chairs, human legs, etc., but a NASA level life or death approach with a three way voting computer system is very expensive.

It definitely depends on the problem you're trying to solve.

For the furniture/wall thing, maybe just having the robot apologize afterwards would be good enough. "Sorry about that, I'll try to do better."    =)

Thanks, Ralph


