Hello Kirk,
The approach is a little different in that it is designed to make logging so cheap to write and to read that you tend to write and read everything of interest (using tools to do the reading, because you will be writing more than a human can read). It is left to the reader or its filters to decide what is of interest. The aim is to log everything, to the point where you can recreate the exact state of the system (or, in a multi-threaded context, a close approximation). This is useful for recovery, monitoring and diagnosis, but it is also the basis of any downstream system which needs data from your system, i.e. it too can recreate the state of any upstream system it needs. Your logging and messaging are combined.
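To make the "recreate the state from the log" idea concrete, here is a minimal sketch. It is not the framework's API; the class name, the record layout (an int slot id followed by a long delta) and the use of a plain java.nio.ByteBuffer are all assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: an append-only binary log that a reader can replay
// to rebuild the writer's state. Names and layout are illustrative only.
final class ReplayLogSketch {
    static final int RECORD_SIZE = 4 + 8; // int slot id + long delta

    private final ByteBuffer log;

    ReplayLogSketch(int maxRecords) {
        log = ByteBuffer.allocateDirect(maxRecords * RECORD_SIZE);
    }

    // Writer side: every state change is appended as primitives, no Strings.
    void append(int slot, long delta) {
        log.putInt(slot).putLong(delta);
    }

    // Reader side: replay every record in order into a caller-supplied array.
    // duplicate() shares the bytes but has its own position, so replaying
    // does not disturb the writer.
    void replayInto(long[] state) {
        ByteBuffer reader = log.duplicate();
        reader.flip(); // limit = bytes written so far, position = 0
        while (reader.remaining() >= RECORD_SIZE) {
            int slot = reader.getInt();
            long delta = reader.getLong();
            state[slot] += delta;
        }
    }
}
```

A downstream process that replays the same records ends up with the same array contents as the writer, which is the sense in which the log doubles as the messaging layer.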
To your points directly.
1) This forces you to eat everything (or at least to check some id you can filter on, so you don't have to read the whole message). While this is worse in some ways, you can read millions of messages per second this way without creating garbage (or creating very little, depending on your needs).
2) When you write the message there is no String manipulation. If there were, I suspect the API would be simpler, but I avoid it because being gc-less is a priority for me.
3) There isn't any configuration, but you can piggyback on existing logging frameworks to turn messages on and off if you want, even if you don't use them to do the logging. Given you can write lots of messages and data at low cost, you don't worry about this so much.
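The three points can be sketched together as follows. This is only an illustration under stated assumptions, not the framework's API: the record layout ([int type][int payloadLength][payload]), the message type ids, the method names and the logger name "sketch.heartbeat" are all hypothetical. The only real library calls are java.nio.ByteBuffer and java.util.logging.Logger.isLoggable, the latter used purely as an on/off switch.

```java
import java.nio.ByteBuffer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; names and record layout are illustrative assumptions.
final class BinaryMessageSketch {
    static final int PRICE_UPDATE = 1; // hypothetical message type ids
    static final int HEARTBEAT = 2;

    // Point 3: an existing logging framework used only as a switch.
    static final Logger SWITCH = Logger.getLogger("sketch.heartbeat");

    // Point 2: the message is written as primitives, no String is built.
    static void writePrice(ByteBuffer out, int instrumentId, double price) {
        out.putInt(PRICE_UPDATE).putInt(4 + 8); // type, payload length
        out.putInt(instrumentId).putDouble(price);
    }

    static void writeHeartbeat(ByteBuffer out, long timestampMillis) {
        if (!SWITCH.isLoggable(Level.FINE)) return; // switched off: write nothing
        out.putInt(HEARTBEAT).putInt(8);
        out.putLong(timestampMillis);
    }

    // Point 1: the reader sees every record, but only checks the type id
    // and skips payloads it is not interested in without decoding them.
    static double sumPrices(ByteBuffer in) {
        double sum = 0;
        while (in.remaining() >= 8) {
            int type = in.getInt();
            int length = in.getInt();
            if (type == PRICE_UPDATE) {
                in.getInt(); // instrumentId, not needed for this reader
                sum += in.getDouble();
            } else {
                in.position(in.position() + length); // skip the payload
            }
        }
        return sum;
    }
}
```

The reader loop allocates nothing per message, which is what makes reading everything affordable; the length prefix is what lets it skip a message after looking only at its id.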
The performance numbers are for small messages, to demonstrate the overhead of the framework, which is much lower than in the previous version. I feel confident about this number because I am working on a pretty average laptop; I plan to test larger messages on a new overclocked i7-3970X with a PCI SSD, and I am sure I can get that number with longer messages. For longer messages you are dependent on the speed at which your CPU can copy fields, so there is not much more that can be improved there. (I have some ideas for raw copying of memory for objects which only contain primitives, which should be faster.)
The speed I am looking for should be comparable to the time it takes to clone the memory structure on the same hardware. If I can do that, the cost of writing and reading messages is so low that many of the constraints we place on logging to minimise the performance hit go away.
Regards,
Peter.