Multi-Robot ROS Infrastructure - Phase 1: It Actually Works

Michael Wimble

Feb 17, 2026, 8:15:03 PM

to hbrob...@googlegroups.com

Alternate title: I Automated My Robot Fleet and You Can Too, if you have enough chocolate chip cookies to bribe me.

A few days ago I shared my frustrations about managing multiple computers, with or without multiple robots. TL;DR version: I was doing all of this by hand:

- Keeping CycloneDDS alive on all machines without flooding the network with discovery traffic. Sometimes I could see topics but not actually read their messages.
- Keeping all the bash aliases working. I'm constantly twiddling with them and forgetting to copy them to other machines, but I don't want every alias on every machine.
- Keeping /etc/hosts up to date as machines were added and removed.
- Keeping ~/.ssh/config up to date.

Plus a few similar issues, like remembering which packages I've manually installed somewhere and whether I need them on another machine. Every time I had to rebuild because of NVMe failures (yes, they fail), SD card failures, or experiments that cluttered up my machine like the pile of Sergei's I've-done-something-new-today e-mails, keeping everything consistent, correct, and up to date drove me crazy.

I'm adding a new machine to Sigyn's current fleet of 3 MCUs, an AI camera, and 2 CPUs: an NVIDIA Jetson Orin Nano, which is going to do most of my next generation of visual AI. I'm also about to add another Raspberry Pi dedicated to using AprilTags to dock my robot at an upcoming charging station. The similarities between Sigyn and the International Space Station are astounding!

Well... after another couple of long days of work, I'm kind of shocked by how well it works so far.

WHAT I BUILT

The system is called Sigyn2 (still not winning any creative naming awards). It's a YAML-based configuration management system designed specifically for multi-robot, multi-computer ROS fleets. Think "infrastructure as code", but actually useful for robotics. It's also a multi-repo distribution, like the Nav2 stack. Essentially, it's a one-click install of all the software you need to build one of my robots, and it then sets up every CPU to be fully configured and ready to, well, conquer the world, of course.

Here's what it manages RIGHT NOW (Phase 1 complete):

A FEW YAML FILES
- aliases, computers, cyclone_dds template, network, packages, robots.
- Automatically generates and deploys /etc/hosts to every machine
- Creates /etc/cyclone_dds on every machine, with peer definitions and the hardware port definition
- Automatically manages ~/.ssh/config with convenient SSH aliases and the appropriate magic switches defined.
- Smart duplicate removal: strips stale entries from existing configuration files when the new configuration covers them, without breaking your existing, non-duplicate entries
- Timestamped backups before every change
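A minimal Python sketch of the hosts-file piece, assuming network.yaml has already been parsed (for example with PyYAML's `yaml.safe_load`) into a dict of hostname to IP. The dict shape and the function names are hypothetical, not the actual Sigyn2 code, and marker handling is left out to focus on the duplicate-removal rule: any non-loopback line that names a host the YAML now manages is treated as stale and regenerated.

```python
"""Sketch: rebuild /etc/hosts entries from parsed YAML and drop stale duplicates.

Hypothetical input shape: {"jetson": {"ip": "192.168.1.60"}, ...}
"""


def hosts_entries(machines: dict) -> list:
    """One 'IP<TAB>hostname' line per machine, sorted for stable diffs."""
    return [f"{machines[n]['ip']}\t{n}" for n in sorted(machines)]


def is_stale(line: str, machines: dict) -> bool:
    """True if a non-loopback line maps a hostname the YAML now manages."""
    fields = line.split("#", 1)[0].split()  # ignore trailing comments
    if len(fields) < 2 or fields[0].startswith("127.") or fields[0] == "::1":
        return False  # blank, comment-only, or loopback: leave it alone
    return any(name in machines for name in fields[1:])


def merge_hosts(existing: str, machines: dict) -> str:
    """Keep unrelated lines, drop stale ones, append the fresh entries."""
    kept = [ln for ln in existing.splitlines() if not is_stale(ln, machines)]
    return "\n".join(kept + hosts_entries(machines)) + "\n"
```

Because the generated lines also name managed hosts, a second run drops and re-creates them, so running it twice yields the same file.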

It sets up secure, passwordless SSH communication between machines in a restricted manner (only certain operations are allowed). It reads the configuration of every machine defined in a YAML file and builds configurations that know about IP addresses, aliases, machines that should be known but not auto-configured, SSH setup, host definitions, and .bashrc and .bash_aliases files specialized for the purpose and content of each machine. But wait, there's more!
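One way the SSH side could be sketched in Python: an ~/.ssh/config stanza per machine, plus an authorized_keys line that pins a key to a single forced command. The machine dict, the key path, and /usr/local/bin/sigyn_deploy_shim are hypothetical. HostName, User, IdentityFile, IdentitiesOnly, and ServerAliveInterval are standard ssh_config keywords; `restrict` and `command="..."` are standard OpenSSH authorized_keys options.

```python
"""Sketch: ~/.ssh/config stanzas and a restricted authorized_keys entry.

The dict fields ('ip', 'user'), key path, and shim path are hypothetical.
"""


def ssh_config_stanza(alias: str, machine: dict) -> str:
    """One Host block so that 'ssh <alias>' needs no extra flags."""
    lines = [
        f"Host {alias}",
        f"    HostName {machine['ip']}",
        f"    User {machine['user']}",
        "    IdentityFile ~/.ssh/id_ed25519",
        "    IdentitiesOnly yes",       # offer only the listed key
        "    ServerAliveInterval 30",   # keep idle sessions from dropping
    ]
    return "\n".join(lines) + "\n"


def restricted_key_line(pubkey: str, command: str) -> str:
    """authorized_keys entry limited to one forced command.

    'restrict' disables port/agent/X11 forwarding and pty allocation;
    'command=' makes sshd run that program whatever the client asks for.
    """
    return f'restrict,command="{command}" {pubkey.strip()}'
```

With that stanza, `ssh jetson` just works. A forced-command program can read the SSH_ORIGINAL_COMMAND environment variable that sshd sets and allow only a short whitelist of operations, which is one way to get "only certain operations are allowed".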

BASH ENVIRONMENT MANAGEMENT
- Fleet-wide bash aliases deployed to ~/.bash_aliases
- Machine-specific environment files with correct ROS_DOMAIN_ID
- Each machine knows its workspace path, RMW implementation, and role
- PlatformIO build shortcuts for Teensy boards
- Automatically generates certain aliases from what it discovers in the YAML files
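A hedged sketch of a per-machine environment file and some "discovered" aliases. Field names such as `ros_domain_id`, `workspace`, `ros_distro`, and `pio_projects`, and the Jazzy default, are hypothetical. ROS_DOMAIN_ID, RMW_IMPLEMENTATION, and CYCLONEDDS_URI are the standard ROS 2 and CycloneDDS variables, and `pio run -d DIR -e ENV -t upload` is the usual PlatformIO build-and-upload invocation.

```python
"""Sketch: per-machine env file plus aliases derived from the fleet YAML.

All dict field names are hypothetical; the exported variables are standard.
"""
import shlex


def render_env(machine: dict) -> str:
    """Environment lines for one machine (sourced from ~/.bashrc)."""
    distro = machine.get("ros_distro", "jazzy")
    ws = machine["workspace"]  # left unquoted so '~' still expands
    lines = [
        "# Generated file: edit the YAML, not this file.",
        f"export ROS_DOMAIN_ID={int(machine['ros_domain_id'])}",
        f"export RMW_IMPLEMENTATION={machine.get('rmw', 'rmw_cyclonedds_cpp')}",
        "export CYCLONEDDS_URI=file:///etc/cyclonedds.xml",
        f"source /opt/ros/{distro}/setup.bash",
        f"[ -f {ws}/install/setup.bash ] && source {ws}/install/setup.bash",
    ]
    return "\n".join(lines) + "\n"


def render_aliases(fleet: dict, this_host: str) -> str:
    """An ssh alias for every *other* machine, plus PlatformIO shortcuts."""
    lines = [f"alias {n}='ssh {n}'" for n in sorted(fleet) if n != this_host]
    for env, project in sorted(fleet[this_host].get("pio_projects", {}).items()):
        cmd = f"pio run -d {shlex.quote(project)} -e {shlex.quote(env)} -t upload"
        lines.append(f"alias build_{env}={shlex.quote(cmd)}")
    return "\n".join(lines) + "\n"
```

Because the aliases are computed per machine, a board's build shortcut only appears on the computer that actually has that PlatformIO project.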

ROS2 DDS CONFIGURATION
- Machine-specific CycloneDDS network interface configuration
- Proper peer discovery across the entire fleet
- Automatically deployed to /etc/cyclonedds.xml
- Each machine gets the right interface (WiFi vs Ethernet vs whatever)
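Here is a minimal Python sketch of generating a per-machine CycloneDDS config with unicast peer discovery. The element names follow the CycloneDDS 0.10+ configuration layout (General/Interfaces/NetworkInterface, Discovery/Peers/Peer). The fleet dict shape, and the choice to turn multicast off as the way to curb discovery flooding, are assumptions for illustration.

```python
"""Sketch: per-machine CycloneDDS XML with an explicit peer list.

Hypothetical fleet shape: {"jetson": {"ip": "...", "interface": "wlan0"}}
"""
import xml.etree.ElementTree as ET


def cyclonedds_xml(this_host: str, fleet: dict) -> str:
    root = ET.Element("CycloneDDS")
    domain = ET.SubElement(root, "Domain", id="any")

    general = ET.SubElement(domain, "General")
    interfaces = ET.SubElement(general, "Interfaces")
    # Bind to the interface this machine actually uses (wlan0, eth0, ...).
    ET.SubElement(interfaces, "NetworkInterface",
                  name=fleet[this_host]["interface"])
    # No multicast: discovery goes only to the peers listed below.
    ET.SubElement(general, "AllowMulticast").text = "false"

    discovery = ET.SubElement(domain, "Discovery")
    ET.SubElement(discovery, "ParticipantIndex").text = "auto"
    peers = ET.SubElement(discovery, "Peers")
    for name in sorted(fleet):
        ET.SubElement(peers, "Peer", address=fleet[name]["ip"])

    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

With multicast off and ParticipantIndex set to auto, CycloneDDS only probes a limited number of participant ports per peer, so a machine running many ROS processes may also need a larger MaxAutoParticipantIndex.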

ROBOT & COMPUTER DEFINITIONS
- robots.yaml defines complete robot systems with their components
- computers.yaml defines hardware platforms and capabilities
- packages.yaml maps package groups to git repositories (uhm, let's pretend that's working today)
- Each robot component knows what packages it needs
- Built-in vcstool support for managing all the git repos that together make up a complete robot's code
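A rough sketch of how robot, component, and package-group data could be flattened into a vcstool .repos file. The dict shapes, field names, and the example.com-style URLs in the test are hypothetical; only the output layout (a top-level `repositories` map of path to type/url/version) follows vcstool's .repos convention. YAML is written by hand here to keep the sketch dependency-free.

```python
"""Sketch: robots -> components -> package groups -> vcstool .repos text.

Hypothetical shapes:
  robots   = {"sigyn": {"components": {"main_pc": ["nav", ...]}}}
  packages = {"nav": [{"path": "...", "url": "...", "version": "..."}]}
"""


def repos_for_robot(robot: str, robots: dict, packages: dict) -> dict:
    """Collect the unique repos needed by every component of one robot."""
    repos = {}
    for groups in robots[robot]["components"].values():
        for group in groups:
            for repo in packages[group]:
                # Same repo in several groups: keep the first entry seen.
                repos.setdefault(repo["path"], repo)
    return repos


def render_repos(repos: dict) -> str:
    """Emit the .repos layout that 'vcs import' reads."""
    lines = ["repositories:"]
    for path in sorted(repos):
        repo = repos[path]
        lines += [
            f"  {path}:",
            "    type: git",
            f"    url: {repo['url']}",
            f"    version: {repo.get('version', 'main')}",
        ]
    return "\n".join(lines) + "\n"
```

The resulting file could then be pulled into a workspace with something like `vcs import src < sigyn.repos`.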

VERSION TRACKING & CONSISTENCY
- MD5 hashing of all configuration files
- Detects when configs have changed (local vs repo)
- Fleet-wide consistency checking across all machines
- Scripts to deploy updates to the entire fleet
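A minimal sketch of MD5-based change detection and a fleet consistency report, assuming each machine can hand back a manifest of {filename: md5}. How that manifest is collected (over SSH, say) is left out, and all function names are hypothetical. MD5 is used here only to notice changes, not for security.

```python
"""Sketch: hash config files, then compare manifests across the fleet."""
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()


def manifest(config_dir: Path) -> dict:
    """Hash every YAML file in a config directory."""
    return {p.name: md5_of(p) for p in sorted(config_dir.glob("*.yaml"))}


def changed_files(applied: dict, current: dict) -> list:
    """Files added, removed, or edited between two manifests."""
    names = set(applied) | set(current)
    return sorted(n for n in names if applied.get(n) != current.get(n))


def consistency_report(reference: dict, fleet: dict) -> dict:
    """Map each out-of-sync machine to the files where it differs."""
    report = {}
    for host, machine_manifest in sorted(fleet.items()):
        diff = changed_files(reference, machine_manifest)
        if diff:
            report[host] = diff
    return report
```

An empty report means every machine matches the repo; anything else names the machine and the files to redeploy.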

THE PROBLEMS IT SOLVES

Remember all those pain points from my last email? Here's what's different now:

BEFORE: "I need to add a new machine to my network." That meant a lot of logging in to set up hosts, set up DDS peers, build authorized_keys, update ~/.ssh/config, check that the proper kind of SSH keys were set up for GitHub, make sure a bunch of things were set up if PlatformIO was used, and more.

AFTER:
- Add 5 lines to network.yaml
- Run: ./scripts/deploy_config.sh --deploy
- Done. Every machine updated. Takes a few seconds to deploy. Everyone is talking nicely to each other. No one has to wear a dunce hat. No one is sitting on a chair facing a corner.
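A minimal sketch of what a push step could look like, assuming each host has a key-based SSH alias and passwordless sudo for `cp` and `install`. The function names and the /tmp staging path are hypothetical; this is not the actual deploy_config.sh. The commands are planned first so they can be printed as a dry run before anything touches a real machine.

```python
"""Sketch: stage a file, back up the old copy, install the new one."""
import shlex
import subprocess
from datetime import datetime


def plan_push(host: str, local: str, dest: str, stamp: str) -> list:
    """Return the scp and ssh commands for one host, without running them."""
    staged = f"/tmp/{dest.strip('/').replace('/', '_')}"
    backup = f"{dest}.{stamp}.bak"
    remote = (
        # Back up only if the file exists (it may have been deleted).
        f"if [ -e {shlex.quote(dest)} ]; then "
        f"sudo cp -a {shlex.quote(dest)} {shlex.quote(backup)}; fi && "
        f"sudo install -m 644 {shlex.quote(staged)} {shlex.quote(dest)}"
    )
    return [
        ["scp", "-q", local, f"{host}:{staged}"],
        ["ssh", host, remote],
    ]


def deploy(hosts, local, dest, dry_run=True):
    """Print (dry run) or execute the plan for every host."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    for host in hosts:
        for cmd in plan_push(host, local, dest, stamp):
            if dry_run:
                print(shlex.join(cmd))
            else:
                subprocess.run(cmd, check=True)
```

For example, `deploy(["jetson", "sigyn7900"], "build/hosts", "/etc/hosts")` only prints what it would do; passing `dry_run=False` runs it and stops at the first failing command.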

THE "HOLY CRAP IT WORKS" MOMENT

I intentionally deleted /etc/cyclone_dds and removed entries from /etc/hosts on my Jetson. I wanted to see whether the scripts would reinstall the missing file and, while installing the correct entries, remove the old duplicates. I ran the deploy script and every machine was checked for consistency. The Jetson got its missing file back, and the leftover lines in its hosts file were replaced with the correct ones. Everything was fixed, and I got a report that all machines were consistent.

WHAT'S COMING NEXT (MAYBE)

I'm figuring out Phase 2, and honestly it depends on what pain points hit me next. Top candidates:

- System package tracking (capture what apt packages each machine needs)
- All the fiddly things I have to do, like turning off the <insert curse word here> UI feature that pins windows to the sides of the screen, or makes them take over the whole screen, when you drag a window to the edge. There are a handful of things I need to change whenever I set up a new machine.
- Docker integration (yes, I'm circling back to that special hell). It will absolutely be needed for the Jetson. I may also need it for the OAK-D camera, since the latest APIs apparently require ROS2/Kilted, so I may end up with a mix of deployed ROS2 releases if the API version keys match. Otherwise I may have to bump everything up to Kilted. But, hey, I've got a system that will manage all of that for me.
- Platform-specific setup (Mac Silicon documentation, because that's its own adventure). Even with Parallels for VM management and Rosetta 2 for Intel emulation, ROS2/Jazzy and Gazebo don't play nice with Apple Silicon out of the box. I've got way too many notes, only some of which are still accurate, to easily replicate what I did to make it work. And don't get me started on how LIDAR simulation on a Mac needs special handling.
- Udev rules management (deploying USB device configuration). Each machine has its own udev needs, but udev rules can often be shared between machines, sometimes with slight modifications.
- Proof of concept with actual ROS packages: I've started partitioning Sigyn so that most of it can be reused with my new squirrel-inator robot (to be named Titania). I've got a small proof of concept working already.

But I'm not in a hurry. Phase 1 is already saving me a lot of grief, and it has already found a handful of configuration errors I would never have known about otherwise.

THE TECH DETAILS (For The Curious)

- Pure Python 3 with PyYAML (no exotic dependencies)
- YAML configurations (human-readable, git-friendly)
- Marker-based updates (SIGYN markers in managed files)
- Idempotent operations (safe to run multiple times)
- Git-based single source of truth
- Works with standard ROS2 tools (vcstool, rosdep, colcon)
- Passwordless sudo for deployment (one-time setup)

It's not fancy. It's not clever. It just solves real problems.

LESSONS LEARNED

1. YAML is your friend. It's readable, it's in git, it's easy to diff.

2. Marker-based updates (like "# BEGIN SIGYN" / "# END SIGYN" in files) let you manage sections without clobbering manual edits elsewhere.

3. Backup EVERYTHING before modifying it. With timestamps. You'll need it.

4. Idempotent operations are crucial. If running a command twice does the same thing as running it once, you can be much more aggressive about automation.

5. Change detection (MD5 hashes) is surprisingly effective. Just hash the YAML files and track what's been applied.

6. Passwordless sudo for specific commands (via /etc/sudoers.d/) makes fleet deployment actually usable.

7. Don't try to solve everything at once. Phase 1 is already incredibly valuable.
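Lessons 2 through 4 (markers, backups, idempotency) fit together in a small routine. This is a minimal sketch, assuming the "# BEGIN SIGYN" / "# END SIGYN" markers and a file the script can write directly (a real /etc file would go through sudo). The function names are hypothetical.

```python
"""Sketch: marker-based, idempotent file update with a timestamped backup."""
import shutil
from datetime import datetime
from pathlib import Path

BEGIN, END = "# BEGIN SIGYN", "# END SIGYN"


def replace_block(text: str, body: list) -> str:
    """Swap (or append) the managed section; leave every other line alone."""
    lines = text.splitlines()
    stripped = [ln.strip() for ln in lines]
    if stripped.count(BEGIN) != stripped.count(END):
        raise ValueError("unbalanced SIGYN markers; refusing to edit")
    out, inside, placed = [], False, False
    for line, s in zip(lines, stripped):
        if s == BEGIN:
            inside = True
            continue
        if s == END:
            inside = False
            if not placed:
                out += [BEGIN, *body, END]  # keep the block where it was
                placed = True
            continue
        if not inside:
            out.append(line)  # manual edits outside the markers survive
    if not placed:
        out += [BEGIN, *body, END]
    return "\n".join(out) + "\n"


def update_file(path: Path, body: list) -> bool:
    """Write only if something changed; back up the old file first."""
    old = path.read_text() if path.exists() else ""
    new = replace_block(old, body)
    if new == old:
        return False  # idempotent: a second run is a no-op
    if path.exists():
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        shutil.copy2(path, path.with_name(f"{path.name}.{stamp}.bak"))
    path.write_text(new)
    return True
```

Comparing old and new content before writing is what makes repeated runs safe: nothing is backed up or rewritten unless the managed section actually changed.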

Remember: "Everything about robots is hard" (TM), but we can make some things easier.

- Mike

P.S. - Yes, there's an AI_CONTEXT.md file in the repo. When you're working with Claude or ChatGPT and come back to the project later, they can read it and immediately understand the entire context. And the AIs all keep it up to date. This has been absurdly useful. Robots + AI assistants: it's the future, probably. Well, until tomorrow.
