On Nov 23, 2009, at 9:37 PM, Valient Gough wrote:
> The technology here is a means, not an ends. The goal wasn't to make a faster logging library, the goal was to make a dynamic logging library. But to make a convincing argument that logging can be left in production code, it is easier if the unused logging is as cheap as possible.
>
> Have you used any Java loggers? A nice features is that most will allow you to set logging level based on class hierarchy. For example, I might turn on debug level for my own hierarchy - net.arg0.*, but only warning level for org.apache.*, etc. That's done at runtime and lets you fine-tune what you want to log.
>
> That's what I wanted in a C++ logger. Not static compile-time selection, but the ability to change logging level if a user passed a "--debug" option on the command line, or potentially if they connected a remote debugger. I want to be able to dynamically enable just logging statements from one file, without enabling all debug. One place that was using rlog had a remote logger for their server. They could connect to it remotely, change logging (on a file by file basis - say turn debug on just for file Foo.cpp), without having to restart the service.
>
> That sort of flexibility changes how you use logging, and the optimizations are just how you get people to try it. That said, most work I do these days is in Java or C#, so I haven't followed what's happened with C++ loggers over the last few years..
Understood. I have been a C++ programmer, then a Java programmer for years, and am now back in the C++ world. I am well aware of the log4j paradigms -- the hierarchical organization of loggers (that maps to the class hierarchy in common practice) is very convenient. I had used Apache's log4cxx port of log4j for a couple of years precisely because they'd ported that aspect of log4j well.
In my spare time I am tinkering with ideas for a logger that would also guarantee *latency* bounds even in the *enabled* state. More like an instrumentation tracer than a human-readable output logger. This explains my perf bias in the very first question.
Things of this kind are commonly attempted by offloading the actual message persistence to a separate thread. However, it has also been observed that such designs still result in high-ish overheads, due to extra thread context switching. But why does a "thread context switch" has a cost, really? Besides the traditional overheads of saving/restoring registers and scheduling maintenance, it is quite possible that these days the *larger* contribution comes from cache interference: thread A is doing something in one memory area, then gets preempted by thread B that will access *different* memory addresses and consequently will pollute the data cache with its own stuff. Then a switch back to thread A which might end up re-loading some of the cache lines it would have kept if not for B. Check out [1].
Now comes the interesting realization: if cache interference can dominate (100s of cycles) then it does not really matter whether there is an actual thread switch. What matters is the switch in memory access pattern. So, in rlog's case, for example, we can imagine this: the CPU is jamming along, doing what it's meant to be doing, then boom: it is asked to load one of the is-enabled memory vars for a conditional branch. If the var is far enough away from the working data set (like in your special cluster segment) this is likely to cause a cache miss. Then the CPU goes back to the "normal" app logic. You see what I mean? The real cost of those branches may be in those sporadic cache interferences. The steady stream of rlog is-enabled checks will maintain a those vars in the cache and will make the effective size of the cache available to the "normal" logic somewhat smaller.
This was on my mind when I posted the first time. The true cost of logging a la rlog/log4cxx style may only manifest itself when embedded in a sea of real code. And it's kind of spread around so it's hard to measure. It kind of seemed to me that the real test would be to take a realistic app, add rlog log macros, and then measure the total cycle counts for a piece of app work with those macros as they are and with them defined to noops. I wouldn't be surprised if the average of the extra cycles per log statement were a bit more than 2...
Cheers,
Vlad
[1]
http://www.cs.rochester.edu/u/cli/research/switch.pdf