Alternate title: I Automated My Robot Fleet and You Can Too, if
you have enough chocolate chip cookies to bribe me.
A few days ago I shared my frustrations about managing multiple
computers, with or without multiple robots. TL;DR version: I was
manually trying to
And a small number of similar issues, like which packages I've manually added somewhere without remembering whether I need them on another machine. Every time I've had to rebuild because of NVME failures (yes, they fail), SD failures, or experiments that cluttered up my machine like the pile of Sergei's I've-done-something-new-today e-mails, I found that keeping everything consistent, correct, and up to date was driving me crazy.
I'm adding a new machine to my current fleet of 3 MCUs, an AI camera and 2 CPUs on Sigyn. It's an NVIDIA Jetson Orin Nano, which is going to do most of my next generation of visual AI. I'm also about to add another Raspberry Pi specifically to use AprilTags to dock my robot to an upcoming charging station. The similarities between Sigyn and the International Space Station are astounding!
Well... after another couple of long days of work, I'm kind of
shocked by how well it works so far.
WHAT I BUILT
The system is called Sigyn2 (still not winning any creative naming
awards). It's a YAML-based configuration management system
specifically designed for multi-robot, multi-computer ROS fleets.
Think "infrastructure as code" but actually useful for robotics.
It's also a multi-repo distribution, like the Nav2 stack. It's
really a one-click install of all the software you need to build
one of my robots, which then sets up all of the CPUs to be fully
configured and ready to, well, conquer the world, of course.
Here's what it manages RIGHT NOW (Phase 1 complete):
A FEW YAML FILES
- aliases, computers, cyclone_dds template, network, packages,
robots.
- Automatically generates and deploys /etc/hosts to every machine
- Creates /etc/cyclone_dds on every machine with peer definitions
and hardware port definitions.
- Automatically manages ~/.ssh/config with convenient SSH aliases
and the appropriate magic switches defined.
- Smart duplicate removal - removes old stuff in existing
configuration files that is covered by the new configurations but
won't break your existing, non-duplicate entries.
- Timestamped backups before every change
It sets up secure, passwordless ssh communication between machines in a restricted manner (only certain operations are allowed). It reads the definition of every machine from a YAML file and builds configurations that know about IP addresses, aliases, and machines that should be known but not auto-configured. From those it generates the ssh configuration, the host definitions, and .bashrc and .bash_aliases files that are specialized for the purpose and content of each machine. But wait, there's more!
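To make that concrete, here is a minimal sketch of how one machine list could drive both /etc/hosts lines and ~/.ssh/config stanzas. It is not the actual Sigyn2 code: the FLEET structure, the hostnames, the addresses, and the user name are all invented for illustration, and the real system would load its definitions from the YAML files rather than a Python literal.

```python
# Hypothetical fleet definition. In a real setup this would be loaded from
# YAML; every name, address, and user below is made up for illustration.
FLEET = [
    {"name": "sigyn-main", "ip": "192.168.1.10", "user": "ros",
     "aliases": ["main"]},
    {"name": "sigyn-vision", "ip": "192.168.1.11", "user": "ros",
     "aliases": ["vision", "jetson"]},
]


def hosts_lines(fleet):
    """Return one /etc/hosts line per machine: IP, name, then aliases."""
    return [" ".join([m["ip"], m["name"], *m["aliases"]]) for m in fleet]


def ssh_config(fleet):
    """Return ~/.ssh/config text with one Host block per machine."""
    blocks = []
    for m in fleet:
        patterns = " ".join([m["name"], *m["aliases"]])
        blocks.append(
            f"Host {patterns}\n"
            f"    HostName {m['ip']}\n"
            f"    User {m['user']}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print("\n".join(hosts_lines(FLEET)))
    print()
    print(ssh_config(FLEET))
```

The point of the design is that both files come from the same list, so a machine cannot end up in one and be forgotten in the other.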
BASH ENVIRONMENT MANAGEMENT
- Fleet-wide bash aliases deployed to ~/.bash_aliases
- Machine-specific environment files with correct ROS_DOMAIN_ID
- Each machine knows its workspace path, RMW implementation, and
role
- PlatformIO build shortcuts for Teensy boards
- Auto builds certain aliases by discovery from the YAML files
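As a rough sketch of what a machine-specific environment file could look like when rendered from a definition, consider the snippet below. ROS_DOMAIN_ID and RMW_IMPLEMENTATION are the standard ROS 2 environment variables; the MACHINE dictionary, its field names, and the SIGYN_WS variable are assumptions made up for this example.

```python
# Hypothetical per-machine definition; field names and values are
# assumptions for illustration, not the actual Sigyn2 schema.
MACHINE = {
    "name": "sigyn-vision",
    "role": "vision",
    "ros_domain_id": 0,
    "rmw": "rmw_cyclonedds_cpp",
    "workspace": "~/sigyn_ws",
}


def env_file(machine):
    """Render the shell lines for one machine's environment file."""
    lines = [
        f"# Environment for {machine['name']} ({machine['role']})",
        f"export ROS_DOMAIN_ID={machine['ros_domain_id']}",
        f"export RMW_IMPLEMENTATION={machine['rmw']}",
        # SIGYN_WS is a made-up variable name for the workspace path.
        f"export SIGYN_WS={machine['workspace']}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(env_file(MACHINE), end="")
```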
ROS2 DDS CONFIGURATION
- Machine-specific CycloneDDS network interface configuration
- Proper peer discovery across the entire fleet
- Automatically deployed to /etc/cyclonedds.xml
- Each machine gets the right interface (WiFi vs Ethernet vs
whatever)
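For readers who have not written one of these by hand, the sketch below builds a small CycloneDDS configuration with one network interface and a list of peers. The element and attribute names follow the Cyclone DDS XML configuration layout as I understand it (Domain, General/Interfaces/NetworkInterface, Discovery/Peers/Peer), so check them against the version you have installed. The interface name and peer addresses are invented, and this is not the Sigyn2 template itself.

```python
import xml.etree.ElementTree as ET


def cyclonedds_xml(interface, peers):
    """Build a CycloneDDS config for one interface and a list of peers."""
    root = ET.Element("CycloneDDS")
    domain = ET.SubElement(root, "Domain", {"Id": "any"})

    # Pin DDS traffic to a single network interface on this machine.
    general = ET.SubElement(domain, "General")
    interfaces = ET.SubElement(general, "Interfaces")
    ET.SubElement(interfaces, "NetworkInterface", {"name": interface})

    # List every other machine in the fleet as an explicit peer.
    discovery = ET.SubElement(domain, "Discovery")
    peers_el = ET.SubElement(discovery, "Peers")
    for address in peers:
        ET.SubElement(peers_el, "Peer", {"Address": address})

    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Interface name and addresses are made up for illustration.
    print(cyclonedds_xml("wlan0", ["192.168.1.10", "192.168.1.11"]))
```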
ROBOT & COMPUTER DEFINITIONS
- robots.yaml defines complete robot systems with their components
- computers.yaml defines hardware platforms and capabilities
- packages.yaml maps package groups to git repositories (uhm,
let's pretend that's working today)
- Each robot component knows what packages it needs
- Built-in support for vcstool multi-repo management, which pulls
all the git repos together into a complete robot assemblage of
code.
VERSION TRACKING & CONSISTENCY
- MD5 hashing of all configuration files
- Detects when configs have changed (local vs repo)
- Fleet-wide consistency checking across all machines
- Scripts to deploy updates to entire fleet
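The change detection idea is simple enough to sketch in a few lines: hash each YAML file and compare against the hash recorded the last time it was applied. The function names and the shape of the "applied" record are assumptions for this example, not the actual Sigyn2 implementation.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def changed_files(config_dir, applied):
    """Return names of *.yaml files whose MD5 differs from the record.

    `applied` maps file name -> MD5 recorded at the last deployment
    (a hypothetical record format). Files missing from it count as changed.
    """
    changed = []
    for path in sorted(Path(config_dir).glob("*.yaml")):
        if applied.get(path.name) != md5_of(path):
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # With an empty record, every YAML file in the directory is reported.
    print(changed_files(".", {}))
```

MD5 is used here only to notice that a file changed, not for security, so its known cryptographic weaknesses do not matter for this purpose.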
THE PROBLEMS IT SOLVES
Remember all those pain points from my last email? Here's what's
different now:
BEFORE: "I need to add a new machine to my network". This involved
lots of logging in to set up hosts, set up DDS peers, build
authorized_users, update the .ssh/config, check whether the
proper kind of ssh keys were set up for use with GitHub, make sure
that if PlatformIO was used a bunch of things were set up,
and more.
AFTER:
- Add 5 lines to network.yaml
- Run: ./scripts/deploy_config.sh --deploy
- Done. Every machine updated. Takes a few seconds to deploy.
Everyone is talking nicely to each other. No one has to wear a
dunce hat. No one is sitting on a chair facing a corner.
THE "HOLY CRAP IT WORKS" MOMENT
I intentionally deleted /etc/cyclone_dds and removed entries from
/etc/hosts on my Jetson. I wanted to see if the scripts would
install the missing file and remove duplicate old entries when
they installed the correct ones. I ran the deploy script and all
machines were checked for consistency. The Jetson got the missing
file, and the leftover lines in the hosts file were removed and
replaced with correct lines. Everything was fixed and I got a
report that everything was consistent across all machines.
WHAT'S COMING NEXT (MAYBE)
I'm figuring out Phase 2, and honestly it depends on what pain
points hit me next. Top candidates:
- System package tracking (capture what apt packages each machine
needs)
- All the fiddly things I have to do, like turning off the
<insert curse word here> UI feature that pins windows to the
sides of screens or makes them take over the screen if you drag a
window to the edge. There's a handful of things I need to change
when I set up a new machine.
- Docker integration (yes, I'm circling back to that special
hell). It will be absolutely needed for the Jetson. I may also
need it for the OAK-D camera as the latest APIs apparently need
ROS2/Kilted to work, so I may have a mixture of deployed ROS2
releases, if the API version keys match. Otherwise I may have to
bump everything up to Kilted. But, hey, I've got a system that
will manage all of that for me.
- Platform-specific setup (Mac Silicon documentation - because
that's its own adventure). Even using Parallels VM management and
Rosetta 2 for Intel emulation, ROS2/Jazzy and Gazebo do not, out
of the box, play nice with Apple Silicon. I've got way too many
notes, some of which are still accurate, to try to replicate what
I did to make it work. And don't get me going about how LIDAR
simulation on a Mac needs special handling.
- Udev rules management (USB device configuration deployment).
Each machine has its own udev needs, but udev rules can often be
shared between machines, sometimes with slight modifications.
- Proof of concept with actual ROS packages--I've started
partitioning Sigyn so that it can be mostly reused with my new
squirrel-inator robot (to be named Titania). I've got a small
proof of concept working already.
But I'm not in a hurry. Phase 1 is already saving me a lot of
grief. It's already found a handful of configuration errors I
wouldn't have ever known about.
THE TECH DETAILS (For The Curious)
- Pure Python 3 with PyYAML (no exotic dependencies)
- YAML configurations (human-readable, git-friendly)
- Marker-based updates (SIGYN markers in managed files)
- Idempotent operations (safe to run multiple times)
- Git-based single source of truth
- Works with standard ROS2 tools (vcstool, rosdep, colcon)
- Passwordless sudo for deployment (one-time setup)
It's not fancy. It's not clever. It just solves real problems.
LESSONS LEARNED
1. YAML is your friend. It's readable, it's in git, it's easy to
diff.
2. Marker-based updates (like "# BEGIN SIGYN" / "# END SIGYN" in
files) let you manage sections without clobbering manual edits
elsewhere.
3. Backup EVERYTHING before modifying it. With timestamps. You'll
need it.
4. Idempotent operations are crucial. If running a command twice
does the same thing as running it once, you can be much more
aggressive about automation.
5. Change detection (MD5 hashes) is surprisingly effective. Just
hash the YAML files and track what's been applied.
6. Passwordless sudo for specific commands (via /etc/sudoers.d/)
makes fleet deployment actually usable.
7. Don't try to solve everything at once. Phase 1 is already
incredibly valuable.
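Lessons 2 through 4 fit together nicely, so here is a minimal sketch that combines them: replace only the marked block, take a timestamped backup first, and do nothing at all if the file is already correct. The marker strings come from lesson 2; the function names, the backup naming scheme, and the choice to put the managed block at the end of the file are assumptions for this example, not the actual Sigyn2 code.

```python
import shutil
import time
from pathlib import Path

BEGIN, END = "# BEGIN SIGYN", "# END SIGYN"


def replace_managed_block(text, managed_lines):
    """Return text with the marked block replaced, or appended if absent.

    Lines outside the markers are kept untouched; the managed block is
    always written at the end of the file.
    """
    kept, inside = [], False
    for line in text.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    # Drop trailing blank lines so repeated runs do not accumulate them.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept + [BEGIN, *managed_lines, END]) + "\n"


def update_file(path, managed_lines):
    """Back up and rewrite path only if the result would differ."""
    path = Path(path)
    old = path.read_text() if path.exists() else ""
    new = replace_managed_block(old, managed_lines)
    if new == old:
        return False  # already up to date: running again changes nothing
    if path.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        shutil.copy2(path, path.with_name(f"{path.name}.{stamp}.bak"))
    path.write_text(new)
    return True
```

Calling update_file twice with the same lines writes once and then returns False, which is what makes it safe to run the deploy script as often as you like.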
Remember: "Everything about robots is hard" (TM), but we can make
some things easier.
- Mike
P.S. - Yes, there's an AI_CONTEXT.md file in the repo. When you're
working with Claude or ChatGPT and come back to the project later,
they can read it and immediately understand the entire context.
And the AIs all keep it up to date. This has been absurdly useful.
Robots + AI assistants: it's the future, probably. Well, until
tomorrow.