Just thinking about this...
We use complex patterns in our topics and bindings, and can't always control what our internal users provide...
Plus there is a one-to-one correspondence between our Cassandra db keys, MQTT topics, and RabbitMQ topics/bindings, so we are constantly transforming them from one representation to another.
And we want the topics to be easily readable in the RabbitMQ management UI and elsewhere (no base64).
So we've settled on two tactics:
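- Percent encoding: our internal users already understand it, e.g. a period becomes '%2E'. Possibly ambiguous, since a literal '%2E' in the input would decode to a period unless '%' itself is also escaped (as '%25').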
- Encoding with Unicode private use code points in the Basic Multilingual Plane. Huh?
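For the percent-encoding tactic, a minimal Python sketch might look like this. The set of escaped characters (`RESERVED`) and the helper names are hypothetical; the point is that '%' is escaped along with the special characters, which removes the ambiguity of a literal '%2E' in user input.

```python
from urllib.parse import unquote

# Hypothetical set of characters to escape inside one topic segment:
# the AMQP separator and wildcards, MQTT's '+' and '/', and '%' itself.
# Escaping '%' is what makes decoding unambiguous.
RESERVED = frozenset(".*#+/%")


def pct_encode(segment: str) -> str:
    """Percent-encode only the reserved characters, leaving the rest readable."""
    return "".join(f"%{ord(c):02X}" if c in RESERVED else c for c in segment)


def pct_decode(segment: str) -> str:
    # Every '%' in an encoded segment starts an escape produced by
    # pct_encode(), so unquote() inverts it exactly.
    return unquote(segment)


def pct_to_routing_key(segments: list[str]) -> str:
    # '.' never appears unescaped inside a segment, so it is safe as the separator.
    return ".".join(pct_encode(s) for s in segments)


def pct_from_routing_key(key: str) -> list[str]:
    return [pct_decode(s) for s in key.split(".")]
```

For example, `pct_to_routing_key(["nytimes.com", "news"])` gives `"nytimes%2Ecom.news"`. `urllib.parse.quote` is not used for encoding because it never escapes '.', which is exactly the character that needs it here.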
Topics/bindings are UTF-8, so you can, for example, map a period to U+E000 and back unambiguously (as long as the input never contains that code point itself), and likewise for '+' and '#'.
It will display as a funny glyph (or just a box, depending on the font). A good intern project would be a custom font with distinctive glyphs for these code points, e.g. a bold version of the character being replaced.
There are 6,400 code points in the BMP Private Use Area (U+E000 to U+F8FF), so we also use them to represent standard phrases like 'New York Times' or 'your credit card has expired' :)
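The private-use tactic is a plain character-for-character substitution. A minimal Python sketch, assuming '.' maps to U+E000 as described above and '+' and '#' take the next code points; '*' (the AMQP single-word wildcard) is included only as an assumption, and the table and helper names are hypothetical. Input that already contains one of the stand-in code points is rejected, since that is the one case where decoding would be ambiguous.

```python
# BMP Private Use Area bounds (inclusive).
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF

# Hypothetical assignment: '.' -> U+E000, '+' -> U+E001, '#' -> U+E002,
# '*' -> U+E003 ('*' is an assumption, not from the original scheme).
SPECIAL_CHARS = ".+#*"

ENCODE_TABLE = {ch: chr(PUA_FIRST + i) for i, ch in enumerate(SPECIAL_CHARS)}
DECODE_TABLE = {pua: ch for ch, pua in ENCODE_TABLE.items()}


def pua_encode(segment: str) -> str:
    """Replace special characters with their private-use stand-ins."""
    clash = set(segment) & set(DECODE_TABLE)
    if clash:
        # A raw stand-in in the input would decode to the wrong character,
        # so refuse it rather than silently produce an ambiguous key.
        raise ValueError(f"input contains reserved code points: {sorted(clash)!r}")
    return segment.translate(str.maketrans(ENCODE_TABLE))


def pua_decode(segment: str) -> str:
    """Map private-use stand-ins back to the original characters."""
    return segment.translate(str.maketrans(DECODE_TABLE))


def pua_to_routing_key(segments: list[str]) -> str:
    # Real '.' characters inside segments have been replaced, so '.' is
    # free to act as the routing-key separator.
    return ".".join(pua_encode(s) for s in segments)


def pua_from_routing_key(key: str) -> list[str]:
    return [pua_decode(s) for s in key.split(".")]
```

One design cost to keep in mind: each substituted character takes 3 bytes in UTF-8 (U+E000 is `EE 80 80`) instead of 1, and that counts against the 255-byte limit on AMQP 0-9-1 routing keys.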
ml