ZeroTier route generation from IPAM child subnets — PoC and discussion

Mohammed Ismail

Apr 16, 2026, 5:55:43 PM
to open...@googlegroups.com

Hi everyone,

We run a multi-tenant WiFi management platform on top of OpenWISP with a growing ZeroTier deployment. We ran into multicast scaling issues with a flat `/8` route and built a working PoC — sharing it here to see if there's interest in upstreaming something like this.

### The Problem

The ZeroTier backend currently generates a single route from the VPN's IPAM subnet:

```python
config["routes"] = [{"target": str(self.subnet), "via": ""}]
```

ZeroTier uses ARP emulation — it converts ARP broadcasts into targeted multicast groups, one per IPv4 address. This works well, but the **Multicast Recipient Limit** (default 32) means that in a large flat network, multicast delivery is capped. With a `/8` route and all members in one broadcast domain, the multicast group for any given IP can't reach all potential peers when the network exceeds the recipient limit. This causes intermittent connectivity issues as ARP responses don't reliably reach all members.

Per-server `/24` routes reduce the L2 scope per subnet. With ~250 members per `/24` instead of the entire network in one flat `/8`, the multicast groups stay within manageable bounds and ARP resolution works reliably.
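
The difference in scope is easy to quantify with Python's standard `ipaddress` module. This is only arithmetic on the address space; how many members actually end up sharing a multicast group depends on the deployment.

```python
import ipaddress

flat = ipaddress.ip_network("10.0.0.0/8")
per_server = ipaddress.ip_network("10.1.1.0/24")

# Usable host addresses exclude the network and broadcast addresses.
flat_hosts = flat.num_addresses - 2
per_server_hosts = per_server.num_addresses - 2

# Number of /24 child subnets that fit inside the parent /8.
child_capacity = 2 ** (per_server.prefixlen - flat.prefixlen)

print(flat_hosts)        # 16777214
print(per_server_hosts)  # 254
print(child_capacity)    # 65536
```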

Additionally:
- `vpn_backends.py` removes `routes` from the ZeroTier JSON schema (no UI exposure)
- `_get_repsonse()` strips `routes` and `ipAssignmentPools` from ZT API responses (no observability)

### What We Did — PoC on zerotier_service.py

We modified `_add_routes_and_ip_assignment()` to generate routes from IPAM child subnets instead of just the parent. Here's the diff:

```diff
-        Adds ZeroTier network routes
-        and IP assignmentpools through OpenWISP subnet
+        Adds ZeroTier network routes from IPAM child subnets (if any)
+        and IP assignment pools through OpenWISP subnet.
+
+        If the VPN's IPAM subnet has child subnets, generates routes for
+        both the parent (cross-subnet routing) and each child (ARP isolation).
+        Otherwise falls back to single route from parent subnet.

-        config["routes"] = [{"target": str(self.subnet), "via": ""}]
+        try:
+            from openwisp_ipam.models import Subnet
+            parent = Subnet.objects.filter(subnet=str(self.subnet)).first()
+            if parent:
+                children = list(parent.child_subnet_set.only('subnet'))
+                if children:
+                    # Parent route ensures cross-subnet reachability
+                    # Child routes (more specific) provide ARP isolation
+                    config["routes"] = [
+                        {"target": str(self.subnet), "via": ""}
+                    ] + [
+                        {"target": str(child.subnet), "via": ""}
+                        for child in children
+                    ]
+                else:
+                    config["routes"] = [{"target": str(self.subnet), "via": ""}]
+            else:
+                config["routes"] = [{"target": str(self.subnet), "via": ""}]
+        except Exception:
+            # Fallback: original behavior if anything fails
+            config["routes"] = [{"target": str(self.subnet), "via": ""}]
```

This gives us a parent `/8` route (cross-subnet reachability) plus per-server `/24` child routes.
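
For readers who want to experiment outside Django, the route-building logic in the diff can be sketched as a pure function. This is a simplified stand-in, not the OpenWISP code: `build_routes` is a hypothetical name, and it takes plain CIDR strings instead of IPAM `Subnet` objects. It also skips duplicates and children that do not fall inside the parent, which the PoC itself does not check.

```python
import ipaddress


def build_routes(parent_cidr, child_cidrs):
    """Return ZeroTier-style route dicts: parent first, then each child."""
    parent = ipaddress.ip_network(parent_cidr)
    routes = [{"target": str(parent), "via": ""}]
    seen = {parent}
    for cidr in child_cidrs:
        child = ipaddress.ip_network(cidr)
        # Skip duplicates, other IP versions, and anything outside the parent.
        if (
            child in seen
            or child.version != parent.version
            or not child.subnet_of(parent)
        ):
            continue
        seen.add(child)
        routes.append({"target": str(child), "via": ""})
    return routes


routes = build_routes("10.0.0.0/8", ["10.1.1.0/24", "10.1.2.0/24"])
for route in routes:
    print(route)
```

With an empty child list the function returns only the parent route, which mirrors the fallback branch in the diff.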

**Important caveat about L2 isolation**: The parent `/8` route with `via: ""` (directly attached) means devices still consider the entire `/8` as L2-local. For cross-subnet traffic, devices would still ARP through the flat `/8`, hitting the multicast recipient limit. In our deployment this doesn't matter — our traffic is entirely intra-/24 (hotspots only talk to their server's RADIUS at `.1` and controller at `.254`). But for deployments that need cross-subnet communication, the parent route should use a gateway (`"via": "<router_zt_ip>"`) instead of being directly attached. We kept `via: ""` for simplicity since cross-subnet ARP simply doesn't happen in our case.
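
As a sketch of the gateway variant described above, the route list could be built like this. The gateway address `10.1.1.254` is purely illustrative and `routes_with_gateway` is a hypothetical helper; we have not run this variant ourselves, so treat it as a starting point rather than a tested configuration.

```python
import ipaddress


def routes_with_gateway(parent_cidr, child_cidrs, gateway_ip):
    """Parent route goes via a gateway; child routes stay directly attached."""
    gateway = ipaddress.ip_address(gateway_ip)
    children = [ipaddress.ip_network(c) for c in child_cidrs]
    # The gateway itself must be reachable on-link, i.e. inside a child subnet.
    if not any(gateway in child for child in children):
        raise ValueError(f"gateway {gateway} is not inside any child subnet")
    parent = ipaddress.ip_network(parent_cidr)
    routes = [{"target": str(parent), "via": str(gateway)}]
    routes += [{"target": str(child), "via": ""} for child in children]
    return routes


gw_routes = routes_with_gateway(
    "10.0.0.0/8", ["10.1.1.0/24", "10.1.2.0/24"], "10.1.1.254"
)
for route in gw_routes:
    print(route)
```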

Our IPAM structure:

```
10.0.0.0/8 (master subnet — VPN points here)
├── 10.1.1.0/24 (server 1)
├── 10.1.2.0/24 (server 2)
├── 10.1.3.0/24 (server 3)
└── ... (per-server subnets)
```

Each device gets an IP in its server's `/24` child subnet via IPAM `request_ip()`. The PoC picks these up automatically — when we add a new server and create its child subnet, the next VPN save pushes the updated routes.

On our application side, we auto-create IPAM child subnets for new servers via the OpenWISP API and sync per-server IPs. All route management is handled by the modified OpenWISP code above.

**Before the PoC**, we had a 5-minute cron job re-adding `/24` routes directly to the ZeroTier controller API after every OpenWISP-triggered wipe. That worked but created a race condition window where routes were missing after each sync cycle. The PoC eliminates that entirely.

### The Limitation: Per-Device Subnet Distribution via VPN Template

The route generation PoC works great for controller-level routing. But there's a gap we couldn't solve without modifying OpenWISP further: the VPN template still distributes the master subnet to all devices.

Ideally, you'd want device group A to know "you're in 10.1.1.0/24" and device group B to know "you're in 10.1.2.0/24" — for example to configure the ZT interface mask correctly on each device. But this isn't possible today:

1. **`vpn_subnet` is system-defined and read-only.** The `vpn_subnet_{pk}` variable was added in #642/#654 to expose the VPN's CIDR to templates. In `vpn.py`, `get_context()` always sets it to the VPN's single IPAM subnet:
   ```python
   context[context_keys["vpn_subnet"]] = str(self.subnet.subnet)
   ```
   The key is `vpn_subnet_{vpn_pk}` — a system-managed variable, not a template default.

2. **VPN context overwrites group variables in the merge order.** In `config.py`, the context is built like this:
   ```python
   # Line 958: group variables applied
   context.update(self.device._get_group().get_context())
   # Line 960: VPN context applied AFTER — overwrites any group values with same key
   context.update(self.get_vpn_context())
   ```
   Even if you set `vpn_subnet_{pk}` as a group variable, the VPN context replaces it two lines later.

3. **One VPN = one subnet.** A VPN object is tied to a single IPAM subnet. There's no mechanism for the VPN template to look up which child subnet a device's IP actually belongs to.

So standalone OpenWISP can't distribute per-group or per-device child subnets through the VPN template. We work around this in our application layer (we know which subnet each device belongs to), but a pure OpenWISP deployment would be stuck with the master subnet in the template.
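
The overwrite described in point 2 is ordinary `dict.update()` behaviour and can be reproduced in isolation. The key name and values below are made up for illustration (the real key embeds the VPN's primary key), and the two dictionaries merely stand in for the group and VPN contexts.

```python
# Hypothetical key; in OpenWISP it is built from the VPN's primary key.
key = "vpn_subnet_<vpn_pk>"

group_context = {key: "10.1.1.0/24"}  # what a device group would like to set
vpn_context = {key: "10.0.0.0/8"}     # what the VPN context always provides

context = {}
context.update(group_context)
context.update(vpn_context)  # applied last, so it wins for the shared key

print(context[key])  # 10.0.0.0/8
```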

### Proposal

Two features would make this work natively in OpenWISP:

1. **Route generation from IPAM child subnets** (the PoC above) — opt-in, backward compatible. Could be a boolean on the VPN model like `zt_routes_from_children`. Should respect ZeroTier's 128-route limit (`ZT_MAX_NETWORK_ROUTES` in `ZeroTierOne.h`). For deployments needing true L2 isolation across subnets, the parent route should support a `via` gateway instead of being directly attached — this would prevent cross-subnet ARP and fully scope multicast groups to each child subnet.

2. **Per-device `vpn_child_subnet` variable** — building on the `vpn_subnet` variable from #642/#654, when a device's assigned IP falls within an IPAM child subnet, expose that child subnet as a new template variable. This would let the VPN template configure each device with its specific `/24` instead of the master `/8`.
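
The lookup behind the second proposal can be sketched with the standard `ipaddress` module. `find_child_subnet` is a hypothetical helper working on plain strings; a real implementation would query IPAM rather than take a list, and would have to decide what to expose when no child matches.

```python
import ipaddress


def find_child_subnet(device_ip, child_cidrs):
    """Return the most specific child subnet containing device_ip, or None."""
    ip = ipaddress.ip_address(device_ip)
    matches = [
        net
        for net in (ipaddress.ip_network(c) for c in child_cidrs)
        if ip in net
    ]
    if not matches:
        return None
    # Prefer the longest prefix in case child subnets are nested.
    return str(max(matches, key=lambda net: net.prefixlen))


children = ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(find_child_subnet("10.1.2.17", children))  # 10.1.2.0/24
print(find_child_subnet("10.9.0.5", children))   # None
```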

Bonus improvements (could be separate):
- **Route observability**: Re-enable `routes` in the VPN JSON schema (currently stripped in `vpn_backends.py`) and in `_get_repsonse()` — right now admins have no visibility into what routes OpenWISP pushes to ZT.
- **Route merging**: Preserve externally-added routes instead of replacing all on every sync — this changes the ownership model though, bigger discussion.
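
To make the route merging idea concrete, here is one possible sketch. It assumes routes are plain dicts with `target` and `via` keys, treats OpenWISP-generated routes as authoritative for any target they define, and uses the 128-route figure cited above as a default cap. `merge_routes` and its parameters are hypothetical, not an existing OpenWISP or ZeroTier API.

```python
def merge_routes(managed, existing, max_routes=128):
    """Combine managed routes with routes already on the controller.

    Managed routes win when both lists define the same target; any other
    existing route is kept as-is. Targets are compared as plain strings,
    so callers should normalise CIDR notation beforehand.
    """
    merged = {route["target"]: route for route in existing}
    merged.update({route["target"]: route for route in managed})
    if len(merged) > max_routes:
        raise ValueError(f"{len(merged)} routes exceed the limit of {max_routes}")
    return list(merged.values())


managed = [
    {"target": "10.0.0.0/8", "via": ""},
    {"target": "10.1.1.0/24", "via": ""},
]
existing = [
    {"target": "10.0.0.0/8", "via": "10.1.1.254"},      # overridden by managed
    {"target": "192.168.50.0/24", "via": "10.1.1.1"},   # added externally, kept
]
merged = merge_routes(managed, existing)
for route in merged:
    print(route)
```

Who owns a route when both sides define the same target is exactly the open question in the ownership discussion; the sketch simply picks one answer.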

### Context

The ZT backend was implemented during GSoC 2023 (PRs [#778](https://github.com/openwisp/openwisp-controller/pull/778), [#811](https://github.com/openwisp/openwisp-controller/pull/811)) with the assumption that one IPAM subnet = one ZT route. The `vpn_subnet` system variable was later added in [#642](https://github.com/openwisp/openwisp-controller/issues/642)/[#654](https://github.com/openwisp/openwisp-controller/pull/654) to expose the VPN CIDR to templates. Both were reasonable design choices for their scope.

We have this PoC running in production and it's been stable. Happy to contribute a proper PR if there's interest.

### Environment
- OpenWISP Controller: 1.1.1
- OpenWISP IPAM: 0.3.0
- ZeroTier: 1.14.x

Best Regards,
Muhammad.

Federico Capoano

Apr 17, 2026, 1:45:15 PM
to open...@googlegroups.com
Hi Mohammed Ismail,

Can you clarify whether you're using ZeroTier for management or for transporting traffic?

I think it's worth creating an issue, but please do not paste this text as is because it's not suited for the issue tracker. Follow the feature request template at this URL: https://github.com/openwisp/openwisp-controller/issues

Best regards
Federico Capoano
OpenWISP OÜ
Kotkapoja tn 2a-10, 10615, Harju maakond, Tallinn, Estonia
VAT: EE101989729



Mohammed Ismail

Apr 17, 2026, 2:26:23 PM
to open...@googlegroups.com
Hi Federico Capoano,

The ZeroTier network carries mainly management traffic and RADIUS traffic.
I have OpenWISP as the controller, and my portal plus FreeRADIUS on another server.
My customers do not interact with OpenWISP directly; only our API does.
So I can customize as much as I need, but I thought this might be an actual issue,
so why not put everything on the mailing list.
Sorry for the copy-paste, I am very lazy.
It actually took some time to reach this final version.

What type should I choose? Bug or something else?

Regards,
Muhammad Ismail.

Federico Capoano

Apr 17, 2026, 3:08:59 PM
to open...@googlegroups.com
Feature request is more appropriate here.

