request logCorrelation between servers


Mark Struberg

Sep 30, 2016, 2:10:01 PM
to MicroProfile
Hi folks!

I have another topic I would love to see coordinated between different vendors, if you are interested: a logCorrelationId.

What is it? It is a unique id which is generated at the 'boundaries' of a system (e.g. a user clicks some button, triggering an AJAX call) and then gets handed over from one server to another. Whenever log output is written, this unique logCorrelationId gets printed out as well. I usually do this via the Log4j 2 MDC; SLF4J supports it in its API as well.

Whenever the first microservice invokes another box via REST (or even SOAP), it hands this id over as an HTTP header with the name X_LOG_CORRELATION_ID.
The other box then takes this logCorrelationId and reuses it. That way you can trace a request through your whole system.

I wrote a blog post on this topic earlier this year.

I've been doing this for quite a few years, but I'm sure I'm not the only one. Does Swarm or Spring Boot have a similar mechanism already? Is there something we can standardise on?

It actually doesn't matter how one implements this and which logging framework each microservice uses.
The most important thing in my opinion (or maybe even the only one) is to agree on the HTTP header parameter name!

If a request arrives with this very HTTP header -> reuse its value as the logCorrelationId for this thread. If not -> create a fresh UUID as the logCorrelationId.
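A minimal sketch of that rule, assuming SLF4J's MDC; the header and MDC key names are purely placeholders (agreeing on the real header name is exactly what this thread is about):

import java.util.UUID;

import javax.servlet.http.HttpServletRequest;

import org.slf4j.MDC;

public final class CorrelationIds {

    // Placeholder names, used here only for illustration.
    public static final String HEADER = "X_LOG_CORRELATION_ID";
    public static final String MDC_KEY = "logCorrelationId";

    private CorrelationIds() {
    }

    /**
     * Reuses the correlation id from the incoming request if present,
     * otherwise creates a fresh UUID, and binds it to the current thread
     * via the SLF4J MDC so every log line carries it.
     */
    public static String establish(HttpServletRequest request) {
        String id = request.getHeader(HEADER);
        if (id == null || id.isEmpty()) {
            id = UUID.randomUUID().toString();
        }
        MDC.put(MDC_KEY, id);
        return id;
    }

    /** Must be called at the end of the request so pooled threads don't leak stale ids. */
    public static void clear() {
        MDC.remove(MDC_KEY);
    }
}

Such a helper could be called from a servlet filter or a JAX-RS filter; a standard would mainly fix the header name, not this code.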

Of course for eventing systems we need to find another way to pass the token.

Wdyt?

LieGrue,
strub

Mark Struberg

Sep 30, 2016, 2:13:38 PM
to MicroProfile
PS: a few days ago Christian Schneider (of Apache CXF and Camel fame) read my blog post and pointed me to similar functionality in CXF which could be used for this:

Ondrej Mihályi

Sep 30, 2016, 3:07:56 PM
to MicroProfile
I second this idea; it really helps to track what is going on in a distributed system.
But I would not limit it to logging only, as the information can be used for example in monitoring tools and even inside business logic. I would rename it to X_REQUEST_CORRELATION_ID, X_ACTION_CORRELATION_ID or similar.

In messaging and asynchronous systems, this could be extended by another header which adds a message correlation id (X_MSG_CORRELATION_ID), so that when a response comes back to the requesting system, it can be matched with the initial message to continue processing. This is very important for building reactive and highly scalable systems.
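For the messaging side, JMS already carries a per-message correlation id field that could transport such a value. A rough sketch only; the class name is invented, and the connection factory and queue would normally come from injection or JNDI:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class CorrelatedRequestSender {

    private final ConnectionFactory connectionFactory;
    private final Queue requestQueue;

    public CorrelatedRequestSender(ConnectionFactory connectionFactory, Queue requestQueue) {
        this.connectionFactory = connectionFactory;
        this.requestQueue = requestQueue;
    }

    /** Sends a request tagged with the correlation id so the eventual reply can be matched to it. */
    public void send(String payload, String correlationId) throws JMSException {
        Connection connection = connectionFactory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            TextMessage message = session.createTextMessage(payload);
            message.setJMSCorrelationID(correlationId);
            session.createProducer(requestQueue).send(message);
        } finally {
            connection.close();
        }
    }
}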

-- Ondrej

On Friday, 30 September 2016 at 20:13:38 UTC+2, Mark Struberg wrote:

Ken Finnigan

Sep 30, 2016, 3:12:27 PM
to Ondrej Mihályi, MicroProfile
+1 to the idea, but I also agree it needs to be non-log-specific.

Especially if we want to integrate some kind of call tracing in the future, it would be preferable to have a single ID that's passed along.

I'd also say it doesn't necessarily need to be a boundary thing. We could say that even within a system the ID should be set when a call/request is received, if not already present.
But we could just leave that up to implementors to choose to do or not.

Ken


Mark Struberg

Sep 30, 2016, 3:27:52 PM
to MicroProfile
I don't care much about the name. I implemented this at a few customers with that name, but it wouldn't be a big problem to change it. The name was carefully chosen but is certainly not an 'industry standard' yet ;)

I chose the name because it is long and unique enough not to clash with other headers, and short enough not to cause buffers to overflow.

Btw, another nice use case:
By leveraging the logCorrelationId in the HTTP header, my colleagues and I have even been able to pin down network connection aborts due to timeouts exactly. You can grab that known HTTP header on your F5, or even capture the whole traffic with Wireshark on a fibre splitter at the boundary to an external data centre, and later follow how each request went through the whole network.
The benefit over purely network-based solutions is that you also get the whole picture of the business logic, because each and every log line of the business logic involved in the request contains the same logCorrelationId.

Werner Keil

Sep 30, 2016, 3:35:06 PM
to MicroProfile, ondrej....@gmail.com
Besides a lot of commercial closed-source security tools in this area, open-source tools for the wider field of "Event Correlation" (of which log correlation is a subset) are mentioned here.

Including jBPM/Drools, at least three of those should still be open source.

Cheers,
Werner

Mark Struberg

Sep 30, 2016, 3:42:01 PM
to MicroProfile, ondrej....@gmail.com
Hi Ken!

What about X_RQ_CORRELATION_ID or just X_RQ_CORRELATION_ID ? Ondrej, wdyt?

At a customer I also introduced an interface LogCorrelationUtil where you can pass in a principalId, a correlationId, etc. It automatically generates a fresh one if none was detected. This could make a nice MicroProfile API. We use it, for example, to propagate the correlationId to @Asynchronous calls inside the business logic.
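Purely as a sketch of what such an API could look like; the interface and method names here are invented for illustration, not an existing API:

/**
 * Hypothetical sketch of a correlation contract; the names only illustrate
 * the shape of the API, nothing like this exists in MicroProfile yet.
 */
public interface LogCorrelationUtil {

    /** Returns the correlationId bound to the current request/thread, creating a fresh one if none was detected. */
    String currentCorrelationId();

    /** Binds an externally received correlationId (e.g. from an HTTP header) to the current thread. */
    void bindCorrelationId(String correlationId);

    /** Optionally also binds the principal that triggered the request, so it shows up in the logs as well. */
    void bindPrincipalId(String principalId);

    /** Clears all bound values, e.g. at the end of the request or of an @Asynchronous invocation. */
    void clear();
}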

> I'd also say it doesn't necessarily need to be a boundary thing. We could say that even 
> within a system the ID should be set when a call/request is received, if not already present.

Well 'boundary' is the first system which doesn't get this header. 

And yet another trick:
In our process engine we set the process id as the correlationId. That way we can also follow everything that got triggered as a result of a process execution across our distributed systems. Whenever you have an incident in a process, you just type the process id (Camunda generates a nice cam-uuid) into Splunk and you get all the info about what happened because of that very process, all over the place.

Ondrej Mihályi

Sep 30, 2016, 5:44:44 PM
to MicroProfile, ondrej....@gmail.com
X_RQ_CORRELATION_ID is good; we could even shorten it to X_RQ_CORREL_ID.

Providing utility methods to create and read the id would be nice. Having it generated whenever it is not present ensures that it exists for all boundary requests.

I would even consider providing a custom JAX-RS ClientBuilder to add the correlation id automatically, and a ContainerRequestFilter to read the id from the request and store it, so that the custom builder can pick it up and forward it to other services.
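A rough sketch of those two pieces with plain JAX-RS 2.0; the header, key and class names are just placeholders:

import java.io.IOException;
import java.util.UUID;

import javax.ws.rs.client.ClientRequestContext;
import javax.ws.rs.client.ClientRequestFilter;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.ext.Provider;

import org.slf4j.MDC;

/** Server side: reads the correlation id from the incoming request (or creates one) and stores it in the MDC. */
@Provider
public class CorrelationIdContainerFilter implements ContainerRequestFilter {

    static final String HEADER = "X_RQ_CORRELATION_ID"; // placeholder name from this thread
    static final String MDC_KEY = "correlationId";

    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        String id = requestContext.getHeaderString(HEADER);
        if (id == null || id.isEmpty()) {
            id = UUID.randomUUID().toString();
        }
        MDC.put(MDC_KEY, id);
    }
}

/** Client side: forwards the stored correlation id on every outgoing JAX-RS client call. */
class CorrelationIdClientFilter implements ClientRequestFilter {

    @Override
    public void filter(ClientRequestContext requestContext) throws IOException {
        String id = MDC.get(CorrelationIdContainerFilter.MDC_KEY);
        if (id != null) {
            requestContext.getHeaders().putSingle(CorrelationIdContainerFilter.HEADER, id);
        }
    }
}

The client filter would then be registered on the ClientBuilder or Client so the id travels along automatically.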

--Ondrej

On Friday, 30 September 2016 at 21:42:01 UTC+2, Mark Struberg wrote:

Werner Keil

Oct 1, 2016, 1:41:11 PM
to MicroProfile, ondrej....@gmail.com
A neutral name like X_RQ_CORRELATION_ID or X_RQ_CORREL_ID sounds good. It doesn't restrict this to log event correlation only.

Werner

Anatole Tresch

Oct 13, 2016, 2:50:59 PM
to MicroProfile
This feature is IMO an absolute must for modern environments. I reimplemented the mechanism used at Credit Suisse a couple of years ago and also proposed it as a Java EE feature in the public EE EG session about 3 years ago. It had rather good support, so if we can clarify our ideas and have a running solution, I think this will be a well-adopted feature. It is also important for configuration, so I can ensure traceability of application configuration to a certain state of the configuration (which could be a remote one, of course).

Anatole Tresch

Oct 13, 2016, 2:54:56 PM
to MicroProfile
My question is whether a simple UUID is enough. The more generic concept (which we also implemented successfully at Credit Suisse) was a distributed execution context, of course identified by a UUID. But the mechanism could be leveraged much more by adding a Context provider, where I can access the corresponding Context based on the UUID (as a String-to-String map), which is basically equivalent to accessing a (possibly distributed) sub-configuration tree, e.g. defined as

context.UUID.*
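As a sketch only (all names invented for illustration), such a provider might look like:

import java.util.Map;

/**
 * Hypothetical distributed execution context provider, as described above.
 * The context for a given correlation UUID is exposed as a String-to-String map,
 * conceptually equivalent to the sub-configuration tree context.<UUID>.*
 */
public interface ExecutionContextProvider {

    /** Returns the (possibly remote/distributed) context bound to the given correlation UUID. */
    Map<String, String> getContext(String correlationUuid);

    /** Adds or overrides an entry in the context of the given correlation UUID. */
    void put(String correlationUuid, String key, String value);
}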


On Friday, 30 September 2016 at 21:27:52 UTC+2, Mark Struberg wrote:

Anatole Tresch

Oct 13, 2016, 2:58:49 PM
to MicroProfile
Follow up:

Use cases for which such a feature (context) has proven to be useful:

- security classifications
- use case based security and configuration
- stage support
- test and canary service execution
- cost accounting

Obviously many of these aspects can also be handled via HTTP headers. But the context API gives more flexibility by being independent of, and transparent to, the technology currently used, be it HTTP, RMI or whatever else...

James Stansell

Oct 14, 2016, 7:57:37 PM
to MicroProfile, ondrej....@gmail.com
Note that the "X-" prefix for parameters (including HTTP header names) is not necessary and is generally no longer recommended.

https://tools.ietf.org/html/rfc6648

-james.

Heiko Braun

Oct 15, 2016, 1:59:18 PM
to MicroProfile
Can you elaborate on the use cases? I don't yet see how a correlation id relates to the use cases you mention.

Gary Brown

Oct 17, 2016, 5:00:27 AM
to MicroProfile
If we are interested in logging against a common id across a set of invoked services, would it be better to consider adopting the OpenTracing standard as part of MicroProfile?

Main site: http://opentracing.io/

Java API for an individual span: https://github.com/opentracing/opentracing-java/blob/master/opentracing-api/src/main/java/io/opentracing/Span.java

So it would provide tracing of a call across distributed services, but it would also allow log messages to be associated with that trace id.
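A minimal sketch of what creating such a span looks like with the opentracing-java API linked above; the Tracer comes from whichever implementation is plugged in, the class and operation names are made up, and exact method signatures vary a bit between API versions:

import io.opentracing.Span;
import io.opentracing.Tracer;

public class OrderResource {

    private final Tracer tracer; // provided by the tracer implementation in use

    public OrderResource(Tracer tracer) {
        this.tracer = tracer;
    }

    public void placeOrder(String orderId) {
        // One span per service-local unit of work; the trace id ties spans from
        // all invoked services together, and log entries can reference that id.
        Span span = tracer.buildSpan("placeOrder").start();
        try {
            span.setTag("orderId", orderId);
            // ... business logic and downstream calls go here ...
            span.log("order accepted");
        } finally {
            span.finish();
        }
    }
}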

Regards
Gary

Ondrej Mihályi

Oct 17, 2016, 6:05:50 AM
to MicroProfile
Gary, thanks for the information about opentracing.

I think we should try to adopt any existing standard if it solves a particular use case. Especially, promoting standards that span the Java ecosystem, like opentracing, fits the idea of interoperability with other non-MicroProfile technologies, which is very important given the polyglot and heterogeneous character of most microservice architectures.

 --Ondrej

On Monday, 17 October 2016 at 11:00:27 UTC+2, Gary Brown wrote:

Werner Keil

Oct 27, 2016, 7:37:58 AM
to MicroProfile
Yep, seems Microprofile isn't the only "Cool Kid" on the schoolyard when it comes to Microservices ;-D

And the Cloud Native Computing Foundation was discussed among the options here, too. The way they do certain things may be a bit more "GitHub style" than what Eclipse requires, e.g.
>Copyright 2016 The OpenTracing Authors
may sound fine there, but unless the Eclipse Foundation really accepts something totally different now, the file headers should name the company or person who contributed, not just "MicroProfile Authors".

I last saw him at Twitter a year or so back, but Adrian Cole was the initial creator of projects like http://jclouds.apache.org/, so he did "microservices" before somebody called it that ;-)

Werner

James Carman

Nov 22, 2016, 9:20:22 AM
to MicroProfile, ondrej....@gmail.com
The "X-" prefix and the like has been deprecated, correct?  

https://tools.ietf.org/html/rfc6648

James Carman

Nov 22, 2016, 9:24:07 AM
to MicroProfile
In Microbule, we propagate a "trace id" if it's passed in as a header on the request; otherwise we generate one. We also generate a "request id" for every request, so you can bunch together log messages that are *only* for the duration of one request on one service (we've found that seeing everything from everywhere can sometimes be noisy). Both of these values are added to the MDC via SLF4J. The header names are configurable of course, so you can adapt to any environment in case folks already have existing standards.
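Not Microbule's actual code, just a sketch of that two-id idea against SLF4J's MDC, with invented header and key names:

import java.util.UUID;

import javax.servlet.http.HttpServletRequest;

import org.slf4j.MDC;

/** Illustrative only: binds a propagated trace id plus a per-request request id to the MDC. */
public final class TwoIdCorrelation {

    // Invented names; in Microbule the header names are reportedly configurable.
    static final String TRACE_HEADER = "trace-id";
    static final String TRACE_KEY = "traceId";
    static final String REQUEST_KEY = "requestId";

    private TwoIdCorrelation() {
    }

    public static void begin(HttpServletRequest request) {
        String traceId = request.getHeader(TRACE_HEADER);
        if (traceId == null) {
            traceId = UUID.randomUUID().toString();          // first hop: start a new trace
        }
        MDC.put(TRACE_KEY, traceId);                         // shared across all services in the call chain
        MDC.put(REQUEST_KEY, UUID.randomUUID().toString());  // unique to this one request on this one service
    }

    public static void end() {
        MDC.remove(TRACE_KEY);
        MDC.remove(REQUEST_KEY);
    }
}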

Erin Schnabel

Nov 29, 2016, 12:59:08 PM
to MicroProfile
+1 to opentracing.

Emily Jiang

Jan 10, 2017, 5:47:31 PM
to MicroProfile
Revisiting this thread... What is the conclusion on this? Will an API be needed for this? Both log correlation and trace correlation are interesting aspects. Who will work on this?

Emily

Praveendra Singh

Jan 11, 2017, 9:27:54 AM
to MicroProfile
I have been using the correlation id technique for quite some time. The idea is very good: it enables us to track down all the services involved in completing an operation. Apart from the corrId, we also used a serviceId as another parameter, which lets us find out where the call started. This parameter enables a given microservice to know the originator without needing access to the logs of other services.

A UUID is one way to get the corrId. Sometimes we want to store this corrId in the DB for correlating the messages emitted (via message brokers) to other services as part of the same operation/API call. What we wanted is to put the corrId back into the message payload so that it can be tracked; this helps with debugging and auditing as well.
Since we now need to store the UUID/corrId, we need an index on that column for easy search. A B+-tree index built on UUIDs doesn't perform well, so we had to optimize it a bit (https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/).
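For the storage side, one common first step is packing the textual UUID into a compact 16-byte value before indexing it; the time-byte reordering trick from the linked Percona post is not shown here, this is just the basic conversion:

import java.nio.ByteBuffer;
import java.util.UUID;

public final class UuidBytes {

    private UuidBytes() {
    }

    /** Packs a UUID into 16 bytes, e.g. for a BINARY(16) column, instead of a 36-character string. */
    public static byte[] toBytes(UUID uuid) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(uuid.getMostSignificantBits());
        buffer.putLong(uuid.getLeastSignificantBits());
        return buffer.array();
    }

    /** Reads a UUID back from its 16-byte form. */
    public static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }
}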

There are several use cases of corrID which I can think of right now:

  1. API tracking - send the corrId as part of the API request header
  2. Event tracking - send the corrId as part of the event payload
  3. Data linking - the processor of the event/API persists the corrId as part of its event-sourcing table, which provides easy linking of the data across services.
 
This is how I approached the corrId solution (a rough sketch of component 1 follows after the list).
There are three components to the corrId design:
  1. Servlet filter - reads the corrId header and pushes it to the MDC (SLF4J with Logback or Log4j providers); at the end of the filter it cleans the entry out of the MDC again.
  2. Logback/Log4j - uses the key from the MDC and prints it in all log messages.
  3. Client layer - if the service uses a client library to call other services, it uses the library's request interceptor to inject the corrId (from the MDC) into the request header.
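A rough sketch of component 1; the header and MDC key names are placeholders, and component 3 would look much like the JAX-RS client filter sketched earlier in this thread:

import java.io.IOException;
import java.util.UUID;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;

import org.slf4j.MDC;

/** Component 1: read or create the corrId, expose it via the MDC, clean up afterwards. */
@WebFilter("/*")
public class CorrIdFilter implements Filter {

    static final String HEADER = "X_RQ_CORRELATION_ID"; // placeholder, whatever name gets standardised
    static final String MDC_KEY = "corrId";

    @Override
    public void init(FilterConfig filterConfig) {
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        String corrId = ((HttpServletRequest) request).getHeader(HEADER);
        if (corrId == null || corrId.isEmpty()) {
            corrId = UUID.randomUUID().toString();
        }
        MDC.put(MDC_KEY, corrId);
        try {
            // Component 2 is plain logging configuration: a pattern containing %X{corrId}
            // prints the MDC entry on every log line written while handling this request.
            chain.doFilter(request, response);
        } finally {
            MDC.remove(MDC_KEY); // clean up so pooled threads don't leak stale ids
        }
    }

    @Override
    public void destroy() {
    }
}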
