Garbage-free Log4j docs preview


Remko Popma

May 17, 2016, 1:13:09 PM
to mechanical-sympathy, Log4J Developers List
Hi all,

First, my thanks to the many people who gave helpful advice and feedback on how to measure Log4j response time on this list some time ago.

We're about to start the Log4j 2.6 release.
If anyone is interested, a preview of the garbage-free logging manual page is here: http://home.apache.org/~rpopma/log4j/2.6/manual/garbagefree.html
and a preview of the updated performance page is here: http://home.apache.org/~rpopma/log4j/2.6/performance.html

Feedback welcome!

Remko

Martin Thompson

May 17, 2016, 1:33:55 PM
to mechanica...@googlegroups.com, Log4J Developers List
Hi Remko,

I'd just like to say that it is a great service to the community as a whole that someone is seriously looking at improving logging.

If you keep it up you'll be putting folks like me out of a job :)

Martin...



Benedict Elliott Smith

May 17, 2016, 1:56:18 PM
to mechanica...@googlegroups.com
Regrettably it seems impossible (or at least very annoying) to send to both lists simultaneously...

On 17 May 2016 at 18:55, Benedict Elliott Smith <bene...@apache.org> wrote:
Could I suggest that you run tests for latency and throughput effects of using this in systems with only moderate-to-low numbers of logging calls?  (i.e. the vast majority of systems!)

It's very likely that in such systems messages will not reach a velocity to keep the Dispatcher running, and so logging calls may often (or always) involve a Futex call - which is not a trivial cost.  There will almost certainly be systems out there with anti-ideal characteristics - logging just often enough that these costs materially impact throughput, but not often enough that they suddenly disappear.  

Even though a majority of systems do not need this, because it is "async" and the new hotness, and there are no advertised downsides, everyone will try to use it.  It's up to those who know better to make sure these people are informed it isn't a free lunch.  Making sure all of the caveats are reported on the advertising page would be a great start IMO.

Might I also separately suggest you consider filtering events prior to placing them on the queue for processing by the dispatcher?  I've only briefly glanced at the code, but it seems not to be the case currently.



Remko Popma

May 18, 2016, 7:34:43 PM
to mechanical-sympathy, _...@belliottsmith.com, Log4J Developers List
Benedict, thanks for your feedback!

With regard to filtering: global filters are already evaluated before the event is enqueued. With mixed Sync/Async Loggers, filters configured on the Logger and AppenderRef are also applied prior to enqueueing, but when all loggers are async these filters are applied by the background thread. We can probably improve this; thanks for the suggestion!

Your suggestion to run performance tests under lower loads is reasonable and we will look at this for a future release.
I did experiment with this a little a while back and did see larger response time pauses. Perhaps others with more experience can chime in.

My understanding of how the Disruptor works is that enqueueing the event is lock-free, so application threads doing the logging should not experience any context switches or suffer latency from Futex calls caused by logging (regardless of the workload). The background thread is another story. Under moderate to low workloads the background thread may (depending on the wait policy) fall asleep and experience delays before waking up when work arrives. However, if there are enough cores to serve all threads, I don't see how such background thread delays can impact (cause response time pauses for) the application that is doing the logging. Please correct me if my understanding is incorrect.

My thinking is that using async loggers is good for reactive applications that need to respond quickly to external events. It is especially useful if the application needs to deal with occasional bursts of work (and accompanied bursts of logging). This is where async loggers can deliver value even if the normal workload is low.

Remko


Remko Popma

May 18, 2016, 7:36:01 PM
to mechanical-sympathy, log4...@logging.apache.org
Thank you Martin, appreciated! :-)

Jerry Shea

May 18, 2016, 10:10:45 PM
to mechanical-sympathy, log4...@logging.apache.org
This is fantastic, thanks for this Remko.

Can I ask a couple of questions?
1. is log4j 2.6 designed to operate garbage-free when used as an SLF4J implementation too, or only with the native API?
2. when do you expect log4j 2.6 to be released GA i.e. ready for production use?

Thanks, Jerry

Remko Popma

May 19, 2016, 3:10:24 AM
to mechanica...@googlegroups.com, log4...@logging.apache.org
On Thursday, May 19, 2016 at 11:10:45 AM UTC+9, Jerry Shea wrote:
> This is fantastic, thanks for this Remko.
>
>
> Can I ask a couple of questions?
> 1. is log4j 2.6 designed to operate garbage-free when used as an SLF4J implementation too, or only with the native API?

The log4j-slf4j-impl binding is trying to be garbage-free, but the SLF4J API only offers up to two parameters for a parameterized message. More than that uses varargs, which creates a temporary object for the parameter array. The native Log4j 2.6 API has methods for up to ten unrolled parameters.

Another consideration is that the native Log4j 2 API lets you log Objects, not just Strings. Any Object that implements CharSequence can be logged without creating garbage.

You can also implement the log4j Message interface if you want to separate formatting from your business domain objects.

Finally, the native Log4j 2 API has support for logging Java 8 lambdas (since 2.4).
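To illustrate (a sketch only; treat the exact signatures as indicative and check the 2.6 javadoc):

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class NativeApiSketch {
    private static final Logger logger = LogManager.getLogger(NativeApiSketch.class);

    void examples(String user, String action, String resource, StringBuilder reused) {
        // Three parameters: through SLF4J this would go via the varargs
        // overload and allocate an Object[]; the native API has unrolled
        // overloads (up to ten parameters in 2.6).
        logger.debug("user={} action={} resource={}", user, action, resource);

        // Any CharSequence (e.g. a reused StringBuilder) can be logged
        // without converting it to a String first.
        logger.info(reused);

        // Java 8 lambda support (since 2.4): the argument is only computed
        // when DEBUG is actually enabled.
        logger.debug("expensive value: {}", () -> expensiveToCompute());
    }

    String expensiveToCompute() {
        return "...";
    }
}
```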

> 2. when do you expect log4j 2.6 to be released GA i.e. ready for production use?

Barring any showstoppers, I expect it will be available in a week or so.



Jerry Shea

May 19, 2016, 3:57:10 AM
to mechanica...@googlegroups.com, log4...@logging.apache.org

Ok thanks!


Remko Popma

May 19, 2016, 11:55:40 AM
to mechanica...@googlegroups.com, log4...@logging.apache.org
After looking more closely I found that the org.slf4j.spi.LocationAwareLogger::log method is currently not implemented in a garbage-free manner in the log4j-slf4j-impl binding. It creates a new message object for each call. If there is interest we can address this in a future release. Other org.slf4j.Logger methods seem okay. 

ki...@kodewerk.com

May 20, 2016, 3:43:30 AM
to mechanica...@googlegroups.com, log4...@logging.apache.org
Hi Remko,

I would have to agree with Martin. Logging in Java is in a very sad state. To date I’ve rarely run into a customer whose system wasn’t somehow negatively impacted by logging. My extreme case is a customer that was facing a 4.5 second time budget where logging accounted for 4.2 seconds (a big portion was pressure on the async appender). I will be looking with great interest at this release.

Kind regards,
Kirk Pepperdine


Benedict Elliott Smith

May 22, 2016, 5:34:13 AM
to Remko Popma, mechanical-sympathy, Log4J Developers List
Hi Remko,

I realise that I was looking at the old log4j code without knowing it, in which there is always a futex involved.

However the behaviour of the disruptor that backs the new code will depend on the wait strategy employed (and I'll note your default is a blocking strategy, for which my prior statement holds*).

Here we obviously get into the territory of people needing to understand the disruptor as well as your logging framework, but since you have a whole section labelled "Trade-offs" in which you draw attention to the improved throughput and latency profile under "Benefits," under "Drawbacks" it might be nice to mention these possible confounders.  Or at least raise a bat signal to go and understand the tradeoffs for the disruptor (which is also regrettably poorly understood).  

It might also be helpful to explicitly call out the configuration used for your benchmarks, and perhaps to run a comparison against some real software - github does not lack for applications using log4j!  Unfortunately I bet this would result in a much less sexy graph.

Finally, I really think people should (in your wiki page) explicitly be discouraged from using this part of your framework unless they know they need it.  It's a fantastic piece of work for people who do need it.  But 99.99% of people would be better off with a logger that just avoids blocking threads when one is already in the process of logging.

(*It looks like you silently ignore misconfiguration, and default to timeout wait strategy also, which is probably going to surprise some people)






Remko Popma

May 22, 2016, 9:42:50 AM
to mechanica...@googlegroups.com, log4...@logging.apache.org, _...@belliottsmith.com


On Sunday, May 22, 2016 at 6:34:13 PM UTC+9, Benedict Elliott Smith wrote:
Hi Remko,

I realise that I was looking at the old log4j code without realising it, in which there is always a futex involved.
Which class is this? Async Loggers in Log4j 2 have always been lock-free. What may be confusing is that Log4j 2 also has an AsyncAppender, which does not use the Disruptor but a JDK ArrayBlockingQueue. This appender is very similar to how Logback and Log4j 1 do asynchronous logging. AsyncAppender (in Log4j 2, Log4j 1 and Logback alike, all based on a blocking queue) is sensitive to lock contention and does not scale with more threads.
 

However the behaviour of the disruptor that backs the new code will depend on the wait strategy employed (and I'll note your default is a blocking strategy, for which my prior statement holds*).
Perhaps the key misunderstanding is here. I think I did not explain it well in my previous message. Let me try again.

Async Loggers use the Disruptor and are lock-free. This means that multiple application threads can log concurrently and are not blocked by each other. This is true for very low workloads, medium workloads and high workloads. 

Only if an application logs messages at a very high rate for a long period of time, or uses a slow appender, can the Disruptor ringbuffer fill up, and then application threads will have to wait until a slot in the ringbuffer becomes available. For the 2.6 release I have added text to the Trade-offs section (the last bullet point, about sustained rate) to explain this, based on your feedback.

Note that the Disruptor wait strategy is irrelevant to all this. Application threads are not impacted by the wait strategy.
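For anyone who wants to experiment: all loggers become asynchronous by selecting the async context selector before Log4j initializes, and the background thread's wait strategy is controlled by a system property. A sketch (check the property names and values against the docs for your version):

```java
public class AsyncLoggerSetup {
    public static void main(String[] args) {
        // Usually passed on the command line instead, e.g.
        // -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
        // Programmatic setting only works if it happens before Log4j initializes.
        System.setProperty("Log4jContextSelector",
                "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");

        // Wait strategy used by the background (consumer) thread, e.g.
        // "Block", "Sleep" or "Yield"; 2.6 defaults to a timeout-based
        // blocking wait.
        System.setProperty("AsyncLogger.WaitStrategy", "Block");

        org.apache.logging.log4j.Logger logger =
                org.apache.logging.log4j.LogManager.getLogger(AsyncLoggerSetup.class);
        logger.info("enqueued on the ring buffer, written by the background thread");
    }
}
```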


Here we obviously get into the territory of people needing to understand the disruptor as well as your logging framework, but since you have a whole section labelled "Trade-offs" in which you draw attention to the improved throughput and latency profile under "Benefits," under "Drawbacks" it might be nice to mention these possible confounders.  Or at least raise a bat signal to go and understand the tradeoffs for the disruptor (which is also regrettably poorly understood).  

It might also be helpful to explicitly call out the configuration used for your benchmarks,
Tests on the 2.6 performance page are mostly done with JMH benchmarks, whose parameters are in the javadoc.
Tests on the Async Loggers page were partly done with the new ResponseTimeTest class (params in the doc) and partly still show results from the older PerfTestDriver class. This last one has its parameters in the code; you'll need to spend some time getting familiar with it if you want to run it.

and perhaps to run a comparison against some real software - github does not lack for applications using log4j!  Unfortunately I bet this would result in a much less sexy graph.
Log4j 2 has had lock-free Async Loggers since its 2.0-beta5 release, three years ago. If there were any serious performance drawbacks of the kind you suggest, I would expect to have heard of them by now. Fortunately the opposite is true: people report very positive experiences, and people who did their own measurements reported that it makes their logging as fast as if it were switched off.


Finally, I really think people should (in your wiki page) explicitly be discouraged from using this part of your framework unless they know they need it.  It's a fantastic piece of work for people who do need it. 
Again, I think you are laboring under a misapprehension. I cannot imagine a scenario where lock-free logging would be a bad thing. If you think there is, please show evidence.
 
But 99.99% of people would be better off with a logger that just avoids blocking threads when one is already in the process of logging. 
Avoiding blocking is exactly what the Async Loggers are for. (So I really think there is some misunderstanding somewhere.)
Based on your feedback, I've updated the side nav menu and page title for the 2.6 release to clarify that Async Loggers are lock-free. I hope this will help avoid misunderstandings.
 

Benedict Elliott Smith

May 22, 2016, 10:08:37 AM
to mechanical-sympathy, Log4J Developers List
It's possible I'm missing some other aspect that insulates the application thread from the disruptor, but from your message it seems to me that you fall into the all-too-common category of people who do not fully understand the semantics of the disruptor.  Application threads (i.e. disruptor producers) very much are affected by the wait strategy.  Under normal use, if the disruptor consumer thread is employing a blocking strategy, there will be a futex call to unblock it when work is provided, i.e. whenever the queue transitions from empty to non-empty, i.e. for most messages at low-to-medium logging rates.

In this situation the enqueueing of a message is also not lock free, as while futexes can in theory be lock-free, the JVM uses a lock to synchronise access to the semaphore that controls its runnable state, so all threads attempting to wake it up concurrently will compete for this lock.
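To make the cost concrete, the producer-side path in a blocking strategy is roughly the following (a simplified paraphrase of the Disruptor's BlockingWaitStrategy, not the actual LMAX source):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified shape of a blocking wait strategy: the consumer parks on a
// Condition when the ring buffer is empty, so a publishing producer must
// take the lock and signal, i.e. the wakeup cost lands on the logging thread.
final class BlockingWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    // Consumer thread: called when no events are available.
    void consumerWait() throws InterruptedException {
        lock.lock();
        try {
            notEmpty.await(); // parks; waking it back up is a scheduler interaction
        } finally {
            lock.unlock();
        }
    }

    // Producer (application) thread: called after publishing an event.
    void signalAllWhenBlocking() {
        lock.lock();
        try {
            notEmpty.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```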



Remko Popma

May 22, 2016, 11:40:21 AM
to mechanical-sympathy, log4...@logging.apache.org, _...@belliottsmith.com


On Sunday, May 22, 2016 at 11:08:37 PM UTC+9, Benedict Elliott Smith wrote:
It's possible I'm missing some other aspect that insulates the application thread from the disruptor, but from your message it seems to me that you fall into the all-too-common category of people who do not fully understand the semantics of the disruptor.  Application threads (i.e. disruptor producers) very much are affected by the wait strategy.  Under normal use, if the disruptor consumer thread is employing a blocking strategy, there will be futex call to unblock it when work is provided.  i.e. whenever the queue transitions from empty to non-empty, i.e. for most messages with low-to-medium logging rates.

In this situation the enqueueing of a message is also not lock free, as while futexes can in theory be lock-free, the JVM uses a lock to synchronise access to the semaphore that controls its runnable state, so all threads attempting to wake it up concurrently will compete for this lock.

Well... Looks like you were right and I was wrong. 

Interesting. Apart from the dent in my pride here, this is actually good news. It means we can do better!

So in the worst-case scenario, Async Loggers/Disruptor with the blocking wait strategy give performance equivalent to AsyncAppender with an ArrayBlockingQueue. We can expect Async Loggers with the Disruptor to outperform AsyncAppender with an ArrayBlockingQueue when workloads are high or bursts of work keep the background thread busy for some time. At least this is my understanding now. Or did I miss something: are there cases where Async Loggers with the Disruptor will perform worse than AsyncAppender with an ArrayBlockingQueue?

In terms of how to improve, short-term we should probably look at using the PhasedBackoffWaitStrategy as the default, although there will always be workloads that "miss the window". The trade-off is increased CPU. We need to tread carefully here because users reporting high CPU is why we changed from the sleeping wait strategy to blocking. (The default in Log4j 2.6 will be timeout blocking wait to deal with a deadlock issue in Solaris.)
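For reference, a sketch of how that wait strategy would be constructed (the Disruptor 3.x API from memory; verify the signatures against the version you use):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.lmax.disruptor.PhasedBackoffWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

public class PhasedBackoffSketch {
    // Placeholder event type for illustration.
    static final class LogEvent {
        String message;
    }

    public static void main(String[] args) {
        // Spin for 1 ms, then yield for 1 ms, then fall back to a blocking
        // wait: trades some consumer CPU for fewer producer-side wakeups.
        PhasedBackoffWaitStrategy waitStrategy =
                PhasedBackoffWaitStrategy.withLock(1, 1, TimeUnit.MILLISECONDS);

        Disruptor<LogEvent> disruptor = new Disruptor<>(
                LogEvent::new,                    // event factory
                256 * 1024,                       // ring buffer size (power of two)
                Executors.defaultThreadFactory(), // consumer thread factory
                ProducerType.MULTI,               // many application threads publish
                waitStrategy);

        disruptor.handleEventsWith(
                (event, sequence, endOfBatch) -> System.out.println(event.message));
        disruptor.start();
    }
}
```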

Long-term we should probably explore other data structures.

Exciting times. Thanks for correcting me!




Benedict Elliott Smith

May 22, 2016, 1:47:05 PM
to mechanical-sympathy, log4...@logging.apache.org

Your pride should be saved by your gracious recognition of the error :)


However this is the core of my point: you’re clearly an intelligent, conscientious and talented developer, and yet you did not fully understand these concerns.  You are writing core infrastructure many people will deploy, and a majority will understand it much, much less well.  So the very best thing you can do is to educate them.  Doubly so if you needed educating yourself!


I would make it very clear that users should only consider asynchronous logging if they are having performance problems caused by their logging.  Only with sub-millisecond latency requirements should async be considered, and below 100 micros you are entering busy-spinning territory.  Or if huge message rates are being seen.  The best default is a blocking strategy, as this has the closest performance profile to synchronous, and will prevent the many users who deploy this without needing it from suffering any major ill-effect.


I would also do a performance comparison with a real application.  Cassandra uses log4j, and the community has an easily utilised performance tool.  You could try both DEBUG and INFO level, and put up some graphs.  This would help users make a more realistic and informed decision.


This may all seem a little disappointing: to reduce the number of people who might use your amazing new technology.  But there will be plenty who use it: both those who need it, and those who do not but don't read or heed the warnings.


Regarding improvements: In my opinion, a better algorithm for this majority (low-to-moderate traffic that is not extremely latency sensitive) is semi-synchronous.  By which I mean for the first log call in a batch to take exclusive ownership of the output and to write its message synchronously; any messages that arrive while it has ownership are passed to the owner to persist, i.e. they are written asynchronously.  To ensure any thread’s ownership is brief, it would impose a time limit, after which a dedicated consumer could be started if the queue is still not exhausted.  This would have better-than-synchronous behaviour for low-to-moderate traffic and scale gracefully to high traffic.
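A rough sketch of the hand-off (illustrative only; the time budget and names are invented, not taken from any existing framework):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of "semi-synchronous" logging: the first caller takes ownership of
// the output and writes synchronously; callers arriving meanwhile enqueue
// their message for the owner (asynchronous for them). Ownership is
// time-limited so no caller drains other threads' work for long.
final class SemiSyncLogger {
    private static final long OWNERSHIP_BUDGET_NANOS = 100_000L; // invented budget

    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean owned = new AtomicBoolean();

    void log(String message) {
        queue.add(message);
        // Try to become the owner; if someone else owns the output, our
        // message is already queued and they will write it.
        while (!queue.isEmpty() && owned.compareAndSet(false, true)) {
            boolean budgetExceeded = false;
            try {
                long deadline = System.nanoTime() + OWNERSHIP_BUDGET_NANOS;
                String next;
                while ((next = queue.poll()) != null) {
                    write(next);
                    if (System.nanoTime() - deadline > 0) {
                        budgetExceeded = true;
                        break;
                    }
                }
            } finally {
                owned.set(false);
            }
            if (budgetExceeded) {
                wakeDedicatedConsumer(); // hand the remaining backlog off
                return;
            }
            // Loop to re-check: a message may have been enqueued between our
            // last poll and the release of ownership above.
        }
    }

    private void write(String message) {
        System.out.println(message); // stand-in for the real appender
    }

    private void wakeDedicatedConsumer() {
        // A real implementation would signal a background consumer here.
    }
}
```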

Martin Thompson

May 22, 2016, 1:47:28 PM
to mechanica...@googlegroups.com, Log4J Developers List
On 22 May 2016 at 15:08, Benedict Elliott Smith <b.ellio...@gmail.com> wrote:
It's possible I'm missing some other aspect that insulates the application thread from the disruptor, but from your message it seems to me that you fall into the all-too-common category of people who do not fully understand the semantics of the disruptor.  Application threads (i.e. disruptor producers) very much are affected by the wait strategy.  Under normal use, if the disruptor consumer thread is employing a blocking strategy, there will be futex call to unblock it when work is provided.  i.e. whenever the queue transitions from empty to non-empty, i.e. for most messages with low-to-medium logging rates.

In this situation the enqueueing of a message is also not lock free, as while futexes can in theory be lock-free, the JVM uses a lock to synchronise access to the semaphore that controls its runnable state, so all threads attempting to wake it up concurrently will compete for this lock.

Do you mean condition variable rather than futex? A futex is a fast user space mutex, and a mutex provides mutual exclusion. We need to signal the consumer thread that work is available, and this requires a condition variable which is accessed under a mutex/futex. The waiting queues on the condition variables could be implemented in a number of ways; I guess you will need to check what your JVM does.

For reference, none of the Disruptor strategies are wait-free or lock-free for producers when you have multiple producers. Producers can block each other, as pretty much all multi-producer concurrent algorithms do. A few don't, but they are rare and do not necessarily have the best performance. I've made the same mistake with terminology in the past.

If we consider blocking characteristics from the producer's perspective when logging and compare the approaches: you are right that in low volume cases a wakeup action might have to be performed by the producer to signal the consumer when using the BlockingWaitStrategy. This is the same as the cost of blocking on an ArrayBlockingQueue, and it will be on the same order as the time to log to the file system page cache when writing directly. The progressive backoff strategies tend to scale better with the Disruptor because they avoid the need to signal between producers and consumers.

If we are pointing out the issues with the new approach, then it is also fair to point out the issues with the existing approach. These include the cost of writing the full buffer to disk, lock contention, blocking when the page cache dirty ratio is exceeded, and log rotation, all of which the producer thread can also experience. Then you can top this all off with the increased cost of file writes in virtualised environments, which are becoming ever more common.

Developers liberally apply logging without consideration. It is the most common cause I see of applications failing to scale with core count. I agree that all alternatives have consequences; the typical approaches have consequences too.

What do you think developers should consider when using the default logging approaches, or should they not consider them at all?

p.s. Do you have a reference for the lock-free implementation of a futex that you refer to? I'd be interested in reading that. If you mean using CAS in the fast path when uncontended, it means no OS-level lock is applied, but it is still not concurrently lock-free, as another producer would block and cause the lock to be inflated.

Benedict Elliott Smith

May 22, 2016, 2:07:44 PM
to mechanical-sympathy, log4...@logging.apache.org
I don't mean to pretend that the normal approach is universally better - the graphs and lengthy write-up of the benefits clearly demonstrate the opposite.  I'm trying to make sure there is a balanced case presented, as everyone's excitement for async will no doubt imbalance the reading of even a balanced presentation.

Really all I'm campaigning for is better education.  I made all the same mistakes as we're discussing here, not long ahead of the curve.  It pains me to see the entire developer community stumbling through the same errors independently.

Regarding lock-free futexes, I regretted typing "can in theory be lock-free".  Here I refer to the fact that, in principle, a semaphore condition (all we need here) and wait-queue do not need mutual exclusion to be implemented safely.  The main futex operations wake and wait could be implemented without a lock in this case.  I hand-waved to avoid having to consult the most current linux kernel source code (or any other OS), as it had been a while since I last looked (and I never looked widely enough to decide if in practice in the linux kernel such a property would ever be possible given other constraints).

Benedict Elliott Smith

May 22, 2016, 2:22:42 PM
to mechanical-sympathy, log4...@logging.apache.org
I guess really what I should say is that I am conflating the term futex with the interaction with the scheduler, which is almost certainly erroneous, but I'm not totally familiar with the lingo, nor all of the available primitives.

Martin Thompson

May 22, 2016, 2:26:20 PM
to mechanica...@googlegroups.com
On 22 May 2016 at 19:07, Benedict Elliott Smith <b.ellio...@gmail.com> wrote:
I don't mean to pretend that the normal approach is universally better -  the graphs and lengthy write-up of the benefits clearly demarcate the opposite.  I'm trying to make sure there is a balanced case presented, as everyone's excitement for async will no doubt imbalance the reading of even a balanced presentation.

Really all I'm campaigning for is better education.  I made all the same mistakes as we're discussing here, not long ahead of the curve.  It pains me to see the entire developer community stumbling through the same errors independently.

I think it is great that we have people trying to make advances. Some will succeed and some will fail. Hopefully we can learn from the failures, plus learn and benefit from the successes. Healthy debate can contribute. Learning is why I love this group. :-)
 

Benedict Elliott Smith

May 22, 2016, 2:34:38 PM
to mechanical-sympathy
Yes, and I don't mean to discourage that!

But the faster the stuff that's been mostly settled is widely disseminated, the faster new failures/successes can be had.


Remko Popma

May 23, 2016, 8:38:44 AM
to mechanical-sympathy, log4...@logging.apache.org


On Monday, May 23, 2016 at 2:47:05 AM UTC+9, Benedict Elliott Smith wrote:

Your pride should be saved by your gracious recognition of the error :)


However this is the core of my point: you’re clearly an intelligent, conscientious and talented developer, and yet you did not fully understand these concerns.  You are writing core infrastructure many people will deploy, and a majority will understand it much much less well.  So the very best thing you can do is to educate them.  Doubly so if you needed educating yourself!

;-) 


I would make it very clear that users should only consider asynchronous logging if they are having performance problems caused by their logging.  Only with sub-millisecond latency requirements should async be considered, and below 100 micros you are entering busy-spinning territory.  Or if huge message rates are being seen.  The best default is a blocking strategy, as this has the closest performance profile to synchronous, and will prevent the many users who deploy this without needing it from suffering any major ill-effect.

I think I missed a step here. You made it plausible that under low workload, the Disruptor may not be better than ArrayBlockingQueue. Understood.

But I don't understand why users should configure async logging only in rare cases. Synchronous logging also involves holding a lock. Log4j 2 makes an effort to release this lock as soon as possible (locking only while copying a byte array from a thread-local buffer to the I/O device), but as Martin pointed out, in addition to the lock contention, other things may cause response time pauses when writing to the I/O device.

So which is "better": holding a lock to signal a java.util.concurrent.locks.Condition, versus holding a lock to write a byte array to the I/O device? The latter sounds like it has more risk of latency spikes. So I'm not sure I agree with your assessment that async logging has more risk of major ill effects.

Under low loads the upside of Async Loggers may not be as high as the graph suggests but I don't see any real downsides either. Please let me know if I'm missing something.
 


I would also do a performance comparison with a real application.  Cassandra uses log4j, and the community has an easily utilised performance tool.  You could try both DEBUG and INFO level, and put up some graphs.  This would help users make a more realistic and informed decision.

I agree it may be interesting for users to see a case study of some sort with a real application. Can you point me to the tool you are referring to?
 


This may all seem a little disappointing: to reduce the number of people who might use your amazing new technology.  But there will be plenty, who both need it and do not and don't read or heed the warnings.

That's okay. It is garbage-free logging that is new in the 2.6 release, not the Async Loggers. The Disruptor-based Async Loggers have been used in the wild for about three years now. The response has generally been positive so I'm not panicking just yet. 
I do intend to follow up though and I take your point that under low load this technology may not give as much benefit as under high load. (Again, please correct me if there is a bigger problem and I'm missing any other downsides/risks.)
 


Regarding improvements: In my opinion, a better algorithm for this majority, of low-to-moderate traffic that is not extremely latency sensitive, is semi-synchronous.  By which I mean for the first log call in a batch to take exclusive ownership of the output and to write its message synchronously; any messages that arrive while it has ownership are passed to the owner to persist, i.e. they are written asynchronously.  To ensure any thread’s ownership is brief, it would impose a time-limit, after which a dedicated consumer could be started if the queue is still not exhausted.  This would have better than synchronous behaviour for low-to-moderate traffic and scale gracefully to high traffic.

This is really interesting. It reminds me of one of the co-operative algorithms I read about in the Art of Multiprocessor Programming (I tried to look it up but could not find it last night).

It may be an idea to enhance the Disruptor for lower workloads: I guess the question is whether it is possible to incorporate a cheap check before calling waitStrategy.signalAllWhenBlocking() in MultiProducerSequencer, and to avoid calling that method if another thread is already signalling the consumer thread.
...and it turns out this already exists and is called LiteBlockingWaitStrategy. Good stuff! Perhaps all that is needed is to combine the TimeoutBlockingWaitStrategy with this one and we can improve things nicely for lower workloads.
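The core of the trick, roughly (my paraphrase of the idea, not the LMAX source): the consumer raises a flag before parking, and producers only take the lock and signal when the flag is set, so in the common case they skip the lock entirely:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Paraphrase of the LiteBlockingWaitStrategy idea: getAndSet ensures at most
// one producer pays the lock-and-signal cost per park of the consumer.
final class LiteBlockingSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final AtomicBoolean signalNeeded = new AtomicBoolean();

    void consumerWait() throws InterruptedException {
        lock.lock();
        try {
            signalNeeded.set(true); // announce that a wakeup will be needed
            notEmpty.await();
        } finally {
            lock.unlock();
        }
    }

    void signalAllWhenBlocking() {
        // Cheap check: most calls see 'false' and return without locking.
        if (signalNeeded.getAndSet(false)) {
            lock.lock();
            try {
                notEmpty.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }
}
```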

Does anyone on this list have any experience with LiteBlockingWaitStrategy? I see it was used in projectreactor.io; I would be interested to hear how that worked, either on-list or off-list.

Mikael Ståldal

May 23, 2016, 9:15:02 AM
to Log4J Developers List, mechanical-sympathy
> But I don't understand why users should only configure async logging in rare cases? Synchronous logging also involves holding a lock. Log4j 2 makes an effort to release this lock as soon as 
> possible (locking only while copying a byte array from a thread-local buffer to the I/O device), but as Martin pointed out, in addition to the lock contention, other things may cause response time 
> pauses when writing to the I/O device.

There is also another reason for using async logging: to protect the app from errors and unpredictable response times from a network-based appender. However, for that the ArrayBlockingQueue-based AsyncAppender is sufficient.


Mikael Ståldal

May 23, 2016, 9:15:43 AM
to Log4J Developers List, mechanical-sympathy
Can the support for Intel TSX in Java 8u20 be relevant here?



Gil Tene

May 23, 2016, 10:57:56 AM
to mechanica...@googlegroups.com, log4...@logging.apache.org
I doubt that using TSX will have any positive effect on synchronous logging. Synchronous logging has actual data contention (usually on a common queue or log), so optimistic lock elision on synchronized blocks will not get you anything, as the speculation will fail most/all of the time. It only helps in cases where the synchronized blocks did not actually contend on any data.

Mikael Ståldal

May 24, 2016, 3:56:57 AM
to Log4J Developers List, mechanical-sympathy
Do we know why Console has such bad performance (compared to File)? Is this true also if you redirect STDOUT to a file?


Benedict Elliott Smith

May 24, 2016, 5:13:37 AM
to mechanical-sympathy
But "response time pauses" are only an issue if you are latency sensitive, which was one of my explicit criteria.  Most users aren't latency sensitive.

Holding a lock is not inherently expensive; in fact it is very cheap on the scale of things we are discussing.  What is expensive is interacting with the scheduler, and "writing to an IO device" (typically just writing to the page cache, so actually not very expensive - again, unless you care about outlier latency).  In one situation you do just one of these things; in the other you do both.  The synchronous behaviour is worse if the lock is highly contended, but that is one of the other criteria: there is a performance/throughput bottleneck in the logging calls.  The semi-synchronous behaviour would avoid this situation, leaving only latency sensitivity.

NB: There is another possible criterion not mentioned, of course: having spare cores, and the logging work being more expensive than the scheduler interaction.  I've not benchmarked log4j, but I'm pretty sure this is not the case for anything resembling a typical log message.

The "LiteBlockingWaitStrategy" does not address this.  In the majority of situations (i.e. a log message rate of < 1-10/ms, as a very rough bound) its behaviour would be extremely similar to any other blocking wait strategy, since at this rate you would on average expect only about one unpark to be inflight at once - with all the usual caveats of depending on overall system load etc etc.

Semi-synchronous is simply a producer transitioning to a consumer to process its own message, as waking up the dedicated consumer is costlier than doing the work itself.  When it finds work piling up, it decides the cost of waking up the consumer is no longer net negative, and does so.

I will also note that for most users these extra costs aren't particularly material either; as soon as your log rate goes down to < 0.1/ms, the extra costs start to become very small as a proportion of total work. So there's definitely no reason to panic.  But these users are still not deriving the advertised benefits from async.



Mikael Ståldal

May 24, 2016, 10:27:33 AM
to Log4J Developers List, mechanical-sympathy
Could we avoid that by using FileDescriptor.out / FileDescriptor.err instead of System.out / System.err ?
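Something like this is what I have in mind (just a sketch; whether it helps depends on where the console cost actually comes from):

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RawStdout {
    public static void main(String[] args) throws Exception {
        // Writes to the same file descriptor as System.out, but bypasses
        // PrintStream's per-call synchronization and autoflush-on-newline;
        // buffering and flushing are under the caller's control instead.
        OutputStream out = new BufferedOutputStream(
                new FileOutputStream(FileDescriptor.out), 256 * 1024);
        out.write("hello, stdout\n".getBytes(StandardCharsets.UTF_8));
        out.flush();
    }
}
```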

On Tue, May 24, 2016 at 4:09 PM, Matt Sicker <boa...@gmail.com> wrote:
All the PrintStream.write() methods have locks.
--
Matt Sicker <boa...@gmail.com>

Remko Popma

May 24, 2016, 1:22:43 PM
to mechanical-sympathy, _...@belliottsmith.com, Log4J Developers List


On Tuesday, May 24, 2016 at 6:13:37 PM UTC+9, Benedict Elliott Smith wrote:
But "response time pauses" are only an issue if you are latency sensitive, which was one of my explicit criteria.  Most users aren't latency sensitive.

Holding a lock is not inherently expensive; in fact it is very cheap on the scale of things we are discussing.  What is expensive is interacting with the scheduler, and "writing to an IO device" (typically just writing to the page cache, so actually not very expensive - again, unless you care about outlier latency).  In one situation you do just one of these things; in the other you do both.  The synchronous behaviour is worse if the lock is highly contended, but that is one of the other criteria: there is a performance/throughput bottleneck in the logging calls.  The semi-synchronous behaviour would avoid this situation, leaving only latency sensitivity.
So to summarize, you are saying that for low workload situations, in addition to synchronous and asynchronous logging, we should consider another algorithm, which is semi-synchronous. This algorithm may not be suitable for latency sensitive applications but has other desirable properties.


NB: There is another possible criteria not mentioned, of course:  Having spare cores, and the logging work being more expensive than the scheduler interaction.  I've not benchmarked log4j, but I'm pretty sure this is not the case for anything resembling a typical log message.

The "LiteBlockingWaitStrategy" does not address this.  In the majority of situations (i.e. a log message rate of < 1-10/ms, as a very rough bound) its behaviour would be extremely similar to any other blocking wait strategy, since at this rate you would on average expect only about one unpark to be inflight at once - with all the usual caveats of depending on overall system load etc etc.
This one I didn't get: I understood your previous messages to mean that your argument against async logging is that the producer thread needs to take a lock to wake up the consumer thread. My understanding is that an uncontended lock is very fast. Taking a lock is only a problem if it is contended. LiteBlockingWaitStrategy ensures the lock is not contended. While not the only way, this is certainly one way to address the concern you expressed against async logging.

Semi-synchronous may be another way to solve the problem, and it may or may not be better (we'd need more data on how it behaves in the field).


Semi-synchronous is simply a producer transitioning to a consumer to process its own message, as waking up the consumer is costlier than doing so.  When it finds work piling up, it decides the cost of waking up the consumer is no longer negative, and does so.

I will also note that for most users these extra costs aren't particularly material either; as soon as your log rate goes down to < 0.1/ms, the extra costs start to become very small as a proportion of total work. So there's definitely no reason to panic.  But these users are still not deriving the advertised benefits from async.
Ok, understood. So the main concern is that the advertised benefits may not materialize under low loads. There are no other downsides or major ill effects. Please correct me if I'm wrong. If there is a major downside I would like to document it on the Log4j web site. (But please don't wait too long, we'll start the release soon.)

 



Remko Popma

May 24, 2016, 1:30:43 PM
to mechanical-sympathy, log4...@logging.apache.org

On Tuesday, May 24, 2016 at 11:27:33 PM UTC+9, Mikael Ståldal wrote:
Could we avoid that by using FileDescriptor.out / FileDescriptor.err instead of System.out / System.err ?
On Tue, May 24, 2016 at 4:09 PM, Matt Sicker <boa...@gmail.com> wrote:
All the PrintStream.write() methods have locks.
On 24 May 2016 at 02:56, Mikael Ståldal <mikael....@magine.com> wrote:
Do we know why Console has so bad performance (compared to File)? Is this true also if you redirect STDOUT to a file?
and a preview of the updated performance page is here: http://home.apache.org/~rpopma/log4j/2.6/performance.html

I think these are really interesting questions, and I am curious if anyone on this list can share their insights.

Gil Tene

May 24, 2016, 1:49:58 PM
to mechanical-sympathy, _...@belliottsmith.com, log4...@logging.apache.org
Maybe I'm missing something: who cares what the performance of the algorithm is at low loads? Why would we want to invest in an algorithm that is more efficient only when it doesn't matter?

I can see arguments for using varying forms of waiting mechanisms (as in spinning vs. yielding vs. blocking vs. back-off hybrids), based on tradeoffs between responsiveness requirements and burned CPU time.  But not for adding a mechanism that only "wins" when things are slow and performance doesn't matter. Why take on the internal complexity and/or the external complexity in APIs and docs?

ki...@kodewerk.com

unread,
May 24, 2016, 2:00:59 PM5/24/16
to mechanica...@googlegroups.com
Hi Gil,

I have had customers that deploy applications in satellite offices where loads are lower; that has translated into higher latencies, and they cared because they were in first-past-the-line transactions. So I would say a good scale-down test is as useful for a framework as a scale-up test is.

Regards,
Kirk


Richard Warburton

unread,
May 24, 2016, 2:02:33 PM5/24/16
to mechanica...@googlegroups.com, Log4J Developers List
Hi,

Could we avoid that by using FileDescriptor.out / FileDescriptor.err instead of System.out / System.err ?

On Tue, May 24, 2016 at 4:09 PM, Matt Sicker <boa...@gmail.com> wrote:
All the PrintStream.write() methods have locks.

On 24 May 2016 at 02:56, Mikael Ståldal <mikael....@magine.com> wrote:
Do we know why Console has such bad performance (compared to File)? Is this true also if you redirect STDOUT to a file?

There are a couple of other things to consider:

 * Console writes in Java flush on every newline.
 * The terminal that is being written to might also spend time rendering text, if your benchmark actually writes to STDOUT. I've seen terminals eat 3x more CPU rendering logs than a program was using to produce them.
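
On Mikael's FileDescriptor question, a sketch of the idea (untested; buffer size invented): build your own stream around the raw descriptor with a large buffer and autoflush disabled, so PrintStream no longer flushes on every newline.

    import java.io.BufferedOutputStream;
    import java.io.FileDescriptor;
    import java.io.FileOutputStream;
    import java.io.PrintStream;

    public class BufferedStdout {
        public static void main(String[] args) {
            // Wrap the raw stdout descriptor in a large buffer, and pass
            // autoFlush=false so PrintStream does not flush on every newline.
            // PrintStream's internal locking remains, but each println is now a
            // memory copy rather than a write(2) per line.
            PrintStream fastOut = new PrintStream(
                    new BufferedOutputStream(
                            new FileOutputStream(FileDescriptor.out), 256 * 1024),
                    false);
            for (int i = 0; i < 1_000_000; i++) {
                fastOut.println("log line " + i);
            }
            fastOut.flush(); // nothing reaches the console until flushed
        }
    }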

regards,

  Richard Warburton

Benedict Elliott Smith

unread,
May 24, 2016, 2:25:00 PM5/24/16
to Remko Popma, mechanical-sympathy, Log4J Developers List
There is a "major" downside for some users - as an example, imagine a user with a single core server that performs a high rate of logging, and does so once every 100us.  This user is interested in throughput only.  There is reasonable chance of a material slow down with an async appender.  Perhaps as much as 10%; or more, dependent on kernel, vm, core count, core affinity and way in which the runnable status is transitioned - for old paravirtualised boxes running old linux kernels, we could be talking a much worse degradation.

A similar calculus can be done for many cores, but the likely impact on throughput reduces as the involved application core count rises (for a fixed application log rate), or the likely cost of lock contention increases (for a fixed per-thread log rate), both of which mitigate the downside.

The number of users facing a major downside is likely very low (and dependent on how you define "major"), but there is a range of scenarios that vary from a meaningful to an irrelevant downside.

I understood your previous messages to mean that your argument against async logging is that the producer thread needs to take a lock to wake up the consumer thread.

The problem is not taking a lock to wake anybody up; the problem is the waking up itself.  LiteBlockingWaitStrategy still has to transition the runnable state of the consumer for every log message, and this is costly with or without contention.  Semi-synchronous avoids this cost except when it is known to be helpful and can be amortized.  The overall cost to the application will be lower than for async, even in fairly high-traffic use cases.


Benedict Elliott Smith

unread,
May 24, 2016, 2:36:43 PM5/24/16
to mechanical-sympathy, Log4J Developers List
Gil, the (simple) algorithm I proposed is more general-purpose than either synchronous or asynchronous.  The only situation in which you would want to force fully async would be if you were highly latency-sensitive, and this would simply mean the algorithm skips a step - i.e. a branch or two and a boolean property would likely do it.

There would be no reason I can think of to maintain a synchronous appender, so it would likely reduce the variations of both doc and code, while improving the performance profile.

I don't really see the downside.



Greg Young

unread,
May 24, 2016, 3:15:56 PM5/24/16
to mechanica...@googlegroups.com
While I get your point, just because I log only occasionally does not mean I am under low load. Jitter may be important here.



--
Studying for the Turing test

Remko Popma

unread,
May 24, 2016, 7:43:30 PM5/24/16
to mechanical-sympathy, remko...@gmail.com, log4...@logging.apache.org, _...@belliottsmith.com


On Wednesday, May 25, 2016 at 3:25:00 AM UTC+9, Benedict Elliott Smith wrote:
There is a "major" downside for some users - as an example, imagine a user with a single core server that performs a high rate of logging, and does so once every 100us.  This user is interested in throughput only.  There is reasonable chance of a material slow down with an async appender.  Perhaps as much as 10%; or more, dependent on kernel, vm, core count, core affinity and way in which the runnable status is transitioned - for old paravirtualised boxes running old linux kernels, we could be talking a much worse degradation.
Okay. I added some text to the trade-offs section for this.

Avi Kivity

unread,
May 25, 2016, 3:19:19 AM5/25/16
to mechanica...@googlegroups.com

As an additional anecdote, we had issues with ScyllaDB incurring higher latencies under low loads, and had to patch seastar to prevent this.

Remko Popma

unread,
May 25, 2016, 10:06:28 AM5/25/16
to mechanical-sympathy


On Wednesday, May 25, 2016 at 4:19:19 PM UTC+9, Avi Kivity wrote:

As an additional anecdote, we had issues with ScyllaDB incurring higher latencies under low loads, and had to patch seastar to prevent this.

Can you explain a bit more about what caused this and how you resolved it?

Remko Popma

unread,
May 25, 2016, 10:09:48 AM5/25/16
to mechanical-sympathy, log4...@logging.apache.org
Is rendering the bottleneck? I'm just wondering what causes this. The 50x difference with writing to a file is staggering... (I haven't benchmarked File I/O when flushing on every write; it may be interesting to try.)
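
If anyone wants to try, a rough JMH sketch of that experiment (file names and buffer size invented; results will depend heavily on OS, page cache and disk):

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.PrintStream;
    import java.nio.file.Files;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.*;

    // Sketch: file writes flushed on every line vs. buffered file writes.
    @State(Scope.Thread)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public class FlushBenchmark {
        PrintStream flushing;
        PrintStream buffered;

        @Setup
        public void setup() throws IOException {
            flushing = new PrintStream(
                    new FileOutputStream(Files.createTempFile("flush", ".log").toFile()),
                    true);  // autoflush: a syscall per line
            buffered = new PrintStream(
                    new BufferedOutputStream(
                            new FileOutputStream(Files.createTempFile("buf", ".log").toFile()),
                            256 * 1024),
                    false); // a syscall per 256 KB
        }

        @Benchmark
        public void flushEveryLine() {
            flushing.println("the quick brown fox jumps over the lazy dog");
        }

        @Benchmark
        public void bufferedWrites() {
            buffered.println("the quick brown fox jumps over the lazy dog");
        }

        @TearDown
        public void tearDown() {
            flushing.close();
            buffered.close();
        }
    }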

Mikael Ståldal

unread,
May 25, 2016, 10:18:42 AM5/25/16
to Log4J Developers List, mechanical-sympathy
There are use cases for logging to standard out/err and redirecting to a file or something else than an actual console. It would be good if we could optimize so that it is not 50x slower than logging directly to a file.


ki...@kodewerk.com

unread,
May 25, 2016, 10:52:09 AM5/25/16
to mechanica...@googlegroups.com, Log4J Developers List
Hi,


On May 25, 2016, at 4:18 PM, Mikael Ståldal <mikael....@magine.com> wrote:

There are use cases for logging to standard out/err and redirecting to a file or something else than an actual console. It would be good if we could optimize so that it is not 50x slower than logging directly to a file.

There is no way to do this. As Richard said, the performance of *any* app, not just a Java app, writing directly to stdout will be significantly slower than if you write directly to a file. This is a trait of Windows as well as Linux, Unix and just about every other OS I've ever used that interacts directly with a terminal.

Kind regards
Kirk Pepperdine


Avi Kivity

unread,
May 25, 2016, 11:19:27 AM5/25/16
to mechanica...@googlegroups.com



On 05/25/2016 05:06 PM, Remko Popma wrote:


On Wednesday, May 25, 2016 at 4:19:19 PM UTC+9, Avi Kivity wrote:

As an additional anecdote, we had issues with ScyllaDB incurring higher latencies under low loads, and had to patch seastar to prevent this.

Can you explain a bit more about what caused this and how you resolved it?

Seastar (the thread-per-core async engine behind ScyllaDB) uses polling for both high throughput and low latency.  But people don't like to see smoke rising out of their servers when the machine is idle, so we introduced a sleep mode [1] that falls back to epoll_wait() when there is no work to do.

Seastar connects cores with 1p1c queues.  If you're always polling, the producer can just update the queue index (with a store-release) and continue.  But if the consumer might be asleep, then the producer has to read a variable indicating this, and it has to read it _after_ writing the queue index.  Between the write and the read you need a barrier, and a read-after-write barrier happens to be the most expensive barrier there is.
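
In Java terms the producer-side pattern looks roughly like this (a sketch using Java 9+ APIs for illustration; Seastar itself is C++):

    import java.lang.invoke.VarHandle;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.locks.LockSupport;

    // Producer side of a queue whose consumer may be asleep. The publish is a
    // cheap store-release, but the subsequent read of the sleep flag must not
    // be reordered before it, which requires a full (StoreLoad) fence, the
    // most expensive barrier there is.
    final class SleepyQueueProducer {
        final AtomicLong tail = new AtomicLong();
        final AtomicBoolean consumerAsleep = new AtomicBoolean();
        Thread consumer; // wired up elsewhere

        void publish(long newTail) {
            tail.lazySet(newTail);   // store-release: a plain store on x86
            VarHandle.fullFence();   // the expensive read-after-write barrier
            if (consumerAsleep.get()) {
                LockSupport.unpark(consumer); // stand-in for a futex wake
            }
            // Seastar's trick: leave this fence out of the producer entirely,
            // and have the consumer force an equivalent barrier on all cores
            // (via mprotect() broadcasting IPIs) just before going to sleep.
        }
    }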

Since we don't want the expensive barrier for what is mostly unneeded (under load the consumer won't be sleeping), we don't code the barrier at the producer side, but instead we inject it from the consumer into the producer at the time we are preparing to go to sleep (which is fairly rare in comparison).  We do this by calling mprotect() [2], lowering permissions on some private page, which forces the OS to broadcast IPIs to all other processors, with the side effect of executing a memory barrier instruction on those remote processors.  This is incredibly expensive, but we're idling anyway, so we don't care.

The problem happens if our threads are all going to sleep. mprotect() takes a process-wide lock, so all those threads will serialize on it.  If, in addition, the lock holder gets preempted by some auxiliary process (say ksoftirqd), then all of those threads going to sleep will get delayed by a few tens of milliseconds.  If a new request arrives at this time, it has to wait until those threads manage to get to sleep before they can be woken up again.

The fix [3] was simply to detect if multiple threads are attempting to issue that mprotect(), and to back off if that happens.

[1] https://github.com/scylladb/seastar/commits/9f9182e4cef251ffbb7bd542e507a016bb5d9b5f

[2] https://github.com/scylladb/seastar/commit/18b8aa1dc0d3fb863fd3072016ef3ee6b4357093

[3] https://github.com/scylladb/seastar/commit/2185f377e317c517e173c099222ad96546500a1a


Mikael Ståldal

unread,
May 26, 2016, 3:41:34 AM5/26/16
to Log4J Developers List, mechanical-sympathy
http://12factor.net/logs

(I don't fully agree with this, but it would be nice if we can support it in an efficient way.)

On Wed, May 25, 2016 at 6:50 PM, Matt Sicker <boa...@gmail.com> wrote:
Plus there's the fact that some people are encouraging all logging to happen to stdout (in containers) for simpler log configuration and handling. It would be nice if we can support that scenario better.
--
Matt Sicker <boa...@gmail.com>

ki...@kodewerk.com

unread,
May 26, 2016, 4:32:27 AM5/26/16
to mechanica...@googlegroups.com, Log4J Developers List
Hi,

Sorry, this might read a bit more provocative than intended. In general I applaud this effort, and to be fair I've not only picked on Log4J in the past. IM(not so)HO, all loggers, and in particular JDK logging, have been a disaster from both an API and underlying design POV.


On May 26, 2016, at 9:41 AM, Mikael Ståldal <mikael....@magine.com> wrote:


I actually see little wrong with this approach.  It is rare that I run into an application that isn't bottlenecked on logging when you combine all of the problems that come from logging. The JDK logger, because of its design, is responsible for class loader leaks. The APIs in all loggers encourage programming styles that are responsible for excessive memory pressure. Though I gripe about the obscene amount of work that logging frameworks engage in, generally it's not the logging framework that is directly responsible for memory pressure; it's what has to be done to use the logging framework that is the source of the problem.

To tackle this I've been pushing simple solutions such as the one pushed here. Loggers should only push messages directly to a sync point and do no more than that. Anything that injects extra work into a thread's workflow should be discarded. This includes activities such as decorating and/or reformatting, running through appenders, and so on. Even record ordering (if it needs to be done) should happen elsewhere. Additionally, I give the (very unpopular) advice that logging should use binary/machine-readable formats (a practice adopted by Twitter and a number of other players) in favor of eye-readable formats (which tend to be neither human nor machine readable). In messaging, of which logging is a form, information density counts, and human-readable formats tend to be exceptionally information-sparse. When you combine these techniques you can get tremendous (orders of magnitude) improvements in performance. As much as I'm a fan of the disruptor, I fail to see how building on top of it solves any of the above mentioned problems.

Kind regards,
Kirk


Remko Popma

unread,
May 26, 2016, 9:47:40 AM5/26/16
to mechanica...@googlegroups.com, log4...@logging.apache.org
Kirk,

Logging efficiently in a binary format has been on my todo list for a while.
We did some preliminary analysis here: https://issues.apache.org/jira/browse/LOG4J2-1305
Your thoughts on this effort would be much appreciated.

Remko

ki...@kodewerk.com

unread,
May 26, 2016, 11:00:44 AM5/26/16
to mechanica...@googlegroups.com
Hi Remko,

A simple way to start is just to let me send in a byte buffer instead of a string. Then don't touch the byte buffer, just drop it to disk. It's up to me to be able to read it, though if people want it eye-readable then an implementation of more or less (the pagers) that can take hints from a schema to read the binary file would be helpful. The first one of these I wrote was in Smalltalk. The data was serialized into a block that I could send a message to, and the whole thing inflated itself. That made managing tooling around schema versioning much, much easier. For it to work this way you have to keep the schema with the data, or you'll have to somehow manage schemas. To be honest, this is a problem that I push out of the framework into the supporting tooling. That requires some forethought by developers. But quite honestly I find that things tend to go much better when developers think about logging rather than just log.

Regards,
Kirk

On May 26, 2016, at 3:47 PM, Remko Popma <remko...@gmail.com> wrote:

Kirk,

Logging in binary format has been on my todo list for a while.

Remko Popma

unread,
May 29, 2016, 8:36:21 AM5/29/16
to mechanical-sympathy


On Friday, May 27, 2016 at 12:00:44 AM UTC+9, Kirk Pepperdine wrote:
Hi Remko,

A simple way to start is just to let me send in a byte buffer instead of a string.
The Log4j 2 API already lets you log any object. Don't use the SLF4J API since it only lets you log Strings.
 
Then don't touch the byte buffer, just drop it to disk.
We can add a Layout for this to Log4j (I created LOG4J2-1397 to track your request), but if you needed this right now, be aware that you can already accomplish this with a custom layout, for example something like this:
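
A minimal sketch of such a layout (the class name, plugin wiring and the assumption that the ByteBuffer arrives as the message's first parameter are all illustrative):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    import org.apache.logging.log4j.core.Layout;
    import org.apache.logging.log4j.core.LogEvent;
    import org.apache.logging.log4j.core.config.Node;
    import org.apache.logging.log4j.core.config.plugins.Plugin;
    import org.apache.logging.log4j.core.config.plugins.PluginFactory;
    import org.apache.logging.log4j.core.layout.AbstractLayout;

    // Hypothetical layout that writes the raw bytes of a logged ByteBuffer,
    // untouched, and nothing else.
    @Plugin(name = "ByteBufferLayout", category = Node.CATEGORY,
            elementType = Layout.ELEMENT_TYPE)
    public final class ByteBufferLayout extends AbstractLayout<byte[]> {

        private ByteBufferLayout() {
            super(null, null); // no header, no footer
        }

        @PluginFactory
        public static ByteBufferLayout createLayout() {
            return new ByteBufferLayout();
        }

        @Override
        public byte[] toByteArray(final LogEvent event) {
            final Object[] params = event.getMessage().getParameters();
            if (params != null && params.length > 0 && params[0] instanceof ByteBuffer) {
                final ByteBuffer buf = ((ByteBuffer) params[0]).duplicate();
                final byte[] result = new byte[buf.remaining()];
                buf.get(result); // copies; a truly garbage-free version would avoid this
                return result;
            }
            return event.getMessage().getFormattedMessage()
                    .getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public byte[] toSerializable(final LogEvent event) {
            return toByteArray(event);
        }

        @Override
        public String getContentType() {
            return "application/octet-stream";
        }
    }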

This example layout only logs the ByteBuffer though. Would you not also want a timestamp, and maybe other things like log level, logger name and other attributes?

Philippe Marschall

unread,
Jun 3, 2016, 3:42:51 AM6/3/16
to mechanical-sympathy, log4...@logging.apache.org


On Tuesday, May 17, 2016 at 7:13:09 PM UTC+2, Remko Popma wrote:
Hi all,

First, my thanks to the many people who gave helpful advice and feedback on how to measure Log4j response time on this list some time ago.

We're about to start the Log4j 2.6 release.
If anyone is interested, a preview of the garbage-free logging manual page is here: http://home.apache.org/~rpopma/log4j/2.6/manual/garbagefree.html
and a preview of the updated performance page is here: http://home.apache.org/~rpopma/log4j/2.6/performance.html

Forgive my ignorance. Maybe I'm missing something or misunderstanding something. If I want to log a custom domain object like this

LOG.info("something happened with {}", aDomainObject);

then #toString() will be called on that object, which very likely will result in allocation. If I don't want this, my domain object has to implement CharSequence. I imagine this to be quite tough to do without allocation, e.g. how can I know the value of #charAt(10)? I would have to "render" the entire object without allocation for every #charAt call.

What I would really need is an alternative to #toString() that takes an Appendable. Ideally that would be a JDK interface.

public interface Loggable {

  void logTo(Appendable log);

}

Cheers
Philippe

Remko Popma

unread,
Jun 3, 2016, 11:50:36 AM6/3/16
to mechanica...@googlegroups.com, log4...@logging.apache.org
Philippe, I completely agree. The only thing is that no such interface exists in the JDK, so Log4j 2.6 includes a custom interface:

package org.apache.logging.log4j.util;

public interface StringBuilderFormattable {
    /**
     * Writes a text representation of this object into the specified {@code StringBuilder},
     * ideally without allocating temporary objects.
     *
     * @param buffer the StringBuilder to write into
     */
    void formatTo(StringBuilder buffer);
}
 
We chose StringBuilder because the Appendable API is not rich enough.
If an Object that is logged, either standalone or as a parameter, does not implement CharSequence but does implement StringBuilderFormattable, then Log4j 2.6 will call formatTo(StringBuilder) instead of toString() on it. You can use this to render a domain object to text without creating temporary objects.
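
For example, a domain object might implement it like this (hypothetical class):

    import org.apache.logging.log4j.util.StringBuilderFormattable;

    // Hypothetical domain object: Log4j 2.6 calls formatTo() instead of
    // toString() when this is logged, so no temporary String is allocated.
    public final class Order implements StringBuilderFormattable {
        private final long id;
        private final double amount;

        public Order(long id, double amount) {
            this.id = id;
            this.amount = amount;
        }

        @Override
        public void formatTo(StringBuilder buffer) {
            buffer.append("Order[id=").append(id)
                  .append(", amount=").append(amount).append(']');
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            formatTo(sb);
            return sb.toString(); // fallback for non-garbage-free callers
        }
    }

Logging it with logger.info("processed {}", order) then renders it straight into Log4j's reusable StringBuilder.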

I hope this is useful.

Kind regards, Remko
 
Cheers
Philippe

Philippe Marschall

unread,
Jun 5, 2016, 6:39:25 AM6/5/16
to mechanical-sympathy, log4...@logging.apache.org
This is exactly what I'm looking for; I somehow missed that part at the bottom. Thank you for your work, hopefully this will get more exposure in the future.

Cheers
Philippe 

Remko Popma

unread,
Jun 5, 2016, 7:14:45 AM6/5/16
to mechanical-sympathy, log4...@logging.apache.org
My fault: this was not properly documented. I've updated the docs and we will update the site with the upcoming 2.6.1 bugfix release.
Glad this is useful, feedback and feature requests always welcome! -Remko 

Cheers
Philippe 

Anthony Maire

unread,
Jun 6, 2016, 5:16:52 AM6/6/16
to mechanical-sympathy, log4...@logging.apache.org
Hi Remko

Great job ! 
As Martin said, it's really a good thing for the high-performance community that someone is trying to improve existing logging frameworks.
The company I'm working for even started to write its own custom SLF4J implementation, and now we are evaluating whether it's better for us to finish this internal project or to write some custom filters and switch to log4j.

I think there is one possible improvement with boxed primitive types: for the moment we want to use only the SLF4J API (as a consequence, we're aiming for a low allocation rate, but not necessarily no allocation at all), so the "Unboxer" mechanism provided in log4j is not an option for us to deal with primitive types.
Would it be possible in a future release to have a special case in ParameterFormatter.appendSpecialTypes() for boxed primitive types? Currently, when using a parametrized message with both Object and primitive parameters, I haven't found an "SLF4J-compatible" way to avoid this allocation when formatting.

Kind Regards,
Anthony

Remko Popma

unread,
Jun 6, 2016, 9:48:12 AM6/6/16
to mechanical-sympathy, log4...@logging.apache.org


On Monday, June 6, 2016 at 6:16:52 PM UTC+9, Anthony Maire wrote:
Hi Remko

Great job ! 
As Martin said, it's really a good thing for the high-performance community that someone is trying to improve existing logging frameworks.
The company I'm working for even started to write it's own custom SLF4J implementation, and now we are evaluating if it's better for us to finish this internal project or to write some custom filters and switch to log4j.
Would you like to hear my totally unbiased opinion/advice? :-)
I'm not a fan of SLF4J... It means you can never use the rich feature set of the underlying logger. Like no lambdas! No CharSequences or plain domain objects, only Strings.
Not sure why people would want to use it in enterprise software. 
Maybe, well, likely, I'm biased :-) but I honestly don't see the advantage.
If the concern is the ability to change your mind later, there is always the log4j-to-slf4j adapter...
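To make the lambda point concrete, a quick sketch (expensiveSummary() is a made-up stand-in for a costly computation):

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class LambdaExample {
        private static final Logger logger = LogManager.getLogger();

        static String expensiveSummary() {
            return "..."; // imagine an expensive computation here
        }

        public static void main(String[] args) {
            // Log4j 2 API: the Supplier lambda is only invoked if DEBUG is
            // enabled, so the expensive argument is never computed otherwise.
            logger.debug("state: {}", () -> expensiveSummary());

            // The SLF4J equivalent needs an explicit guard:
            // if (log.isDebugEnabled()) { log.debug("state: {}", expensiveSummary()); }
        }
    }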
 

I think there is one possible improvement with boxed primitive types: for the moment we want to use only the SLF4J API (as a consequence, we're aiming for a low allocation rate, but not necessarily no allocation at all), so the "Unboxer" mechanism provided in log4j is not an option for us to deal with primitive types.
Would it be possible in a future release to have a special case in ParameterFormatter.appendSpecialTypes() for boxed primitive types? Currently, when using a parametrized message with both Object and primitive parameters, I haven't found an "SLF4J-compatible" way to avoid this allocation when formatting.

If I understand correctly, you're less concerned with the varargs and the auto-boxed primitives, but would like to avoid calling toString() on the objects that resulted from the auto-boxing (please correct me if I misunderstood). Is something like the below what you have in mind? Would you mind raising a feature request on the Log4j2 issue tracker for this?

private static boolean appendSpecialTypes(final Object o, final StringBuilder str) {
    ...
    } else if (o instanceof Long) {
        str.append(((Long) o).longValue());
        return true;
    } else if (o instanceof Integer) {
        str.append(((Integer) o).intValue());
        return true;
    } else if (o instanceof Double) {
        str.append(((Double) o).doubleValue());
        return true;
    } // similarly for float, short and boolean.
    ...
}


Kind Regards,
Anthony

Anthony Maire

unread,
Jun 6, 2016, 10:38:26 AM6/6/16
to mechanica...@googlegroups.com, log4...@logging.apache.org

On Monday, June 6, 2016 at 15:48:12 UTC+2, Remko Popma wrote:

Would you like to hear my totally unbiased opinion/advice? :-)
I'm not a fan of SLF4J... It means you can never use the rich feature set of the underlying logger. Like no lambdas! No CharSequences or plain domain objects, only Strings.
Not sure why people would want to use it in enterprise software. 
Maybe, well, likely, I'm biased :-) but I honestly don't see the advantage.
If the concern is the ability to change your mind later, there is always the log4j-to-slf4j adapter...
 

Using SLF4J is not a point that I'm allowed to change unfortunately :)
 

If I understand correctly, you're less concerned with the varargs and the auto-boxed primitives, but would like to avoid calling toString() on the objects that resulted from the auto-boxing (please correct me if I misunderstood). Is something like the below what you have in mind? Would you mind raising a feature request on the Log4j2 issue tracker for this?

private static boolean appendSpecialTypes(final Object o, final StringBuilder str) {
    ...
    } else if (o instanceof Long) {
        str.append(((Long) o).longValue());
        return true;
    } else if (o instanceof Integer) {
        str.append(((Integer) o).intValue());
        return true;
    } else if (o instanceof Double) {
        str.append(((Double) o).doubleValue());
        return true;
    } // similarly for float, short and boolean.
    ...
}

That's exactly what I meant. Of course I'm concerned with varargs and auto-boxing; however, since I need to use SLF4J, I don't think I really have an option to avoid these. At least I can avoid the toString() with this change. I will raise a feature request :)
 

Remko Popma

unread,
Jun 7, 2016, 7:03:26 PM6/7/16
to mechanical-sympathy


On Friday, May 27, 2016 at 12:00:44 AM UTC+9, Kirk Pepperdine wrote:
Hi Remko,

A simple way to start is just to let me send in a byte buffer instead of a string. Then don't touch the byte buffer, just drop it to disk. It's up to me to be able to read it, though if people want it eye-readable then an implementation of more or less (the pagers) that can take hints from a schema to read the binary file would be helpful. The first one of these I wrote was in Smalltalk. The data was serialized into a block that I could send a message to, and the whole thing inflated itself. That made managing tooling around schema versioning much, much easier. For it to work this way you have to keep the schema with the data, or you'll have to somehow manage schemas. To be honest, this is a problem that I push out of the framework into the supporting tooling. That requires some forethought by developers. But quite honestly I find that things tend to go much better when developers think about logging rather than just log.
Kirk, some interesting discussion has started on the Log4j ticket for your idea to directly log byte buffers. Would you mind swinging by and adding your feedback?

Regards,
Kirk

On May 26, 2016, at 3:47 PM, Remko Popma <remko...@gmail.com> wrote:

Kirk,

Logging in binary format has been on my todo list for a while.
We did some preliminary analysis here: https://issues.apache.org/jira/browse/LOG4J2-1305
Your thoughts on this effort would be much appreciated.

Remko

On Thursday, May 26, 2016 at 5:32:27 PM UTC+9, Kirk Pepperdine wrote:
Hi,

Sorry, this might read a bit more provocative than intended. In general I applaud this effort, and to be fair I've not only picked on Log4J in the past. IM(not so)HO, all loggers, and in particular JDK logging, have been a disaster from both an API and underlying design POV.


On May 26, 2016, at 9:41 AM, Mikael Ståldal <mikael....@magine.com> wrote:


I actually see little wrong with this approach.  It is rare that I run into an application that isn't bottlenecked on logging when you combine all of the problems that come from logging. The JDK logger, because of its design, is responsible for class loader leaks. The APIs in all loggers encourage programming styles that are responsible for excessive memory pressure. Though I gripe about the obscene amount of work that logging frameworks engage in, generally it's not the logging framework that is directly responsible for memory pressure; it's what has to be done to use the logging framework that is the source of the problem.

To tackle this I've been pushing simple solutions such as the one pushed here. Loggers should only push messages directly to a sync point and do no more than that. Anything that injects extra work into a thread's workflow should be discarded. This includes activities such as decorating and/or reformatting, running through appenders, and so on. Even record ordering (if it needs to be done) should happen elsewhere. Additionally, I give the (very unpopular) advice that logging should use binary/machine-readable formats (a practice adopted by Twitter and a number of other players) in favor of eye-readable formats (which tend to be neither human nor machine readable). In messaging, of which logging is a form, information density counts, and human-readable formats tend to be exceptionally information-sparse. When you combine these techniques you can get tremendous (orders of magnitude) improvements in performance. As much as I'm a fan of the disruptor, I fail to see how building on top of it solves any of the above mentioned problems.

Kind regards,
Kirk

