I'm using jacocoagent tcpclient mode. Is it possible to retry the TCP connection when it loses its connection?


il2s...@gmail.com

Sep 28, 2020, 9:33:59 PM
to JaCoCo and EclEmma Users

I couldn't find a way to retry the TCP connection in "tcpclient" mode when it loses its connection.


For example,

    TCP Server   <-------  jacocoagent(tcpclient mode)

1. Connection established
2. TCP Server down
3. Connection lost
4. TCP Server restart
5. No retry

Do you have any ideas for this?

Thank you.


Marc Hoffmann

Sep 29, 2020, 12:23:24 AM
to jac...@googlegroups.com
Hi,

there is no reconnect mode, and I doubt it's a good idea to add one. As the event "TCP Server down" can happen at any point in time, there are arbitrary error conditions in the client, including data loss.

Regards,
-marc



Nathan Wray

Oct 15, 2020, 1:20:46 PM
to JaCoCo and EclEmma Users
Funnily enough, this is exactly the question I was researching, so I'm fortunate to see it asked and answered so recently.

Our use case (proposed use case) is to stand up a central server (coverage collection server) to request/receive JaCoCo data periodically and store it, probably on a rolling window basis. 

We would deploy the agents against Tomcat apps in tcpclient mode, with sessionIDs identifying which application is being instrumented. These Tomcat servers (tcpclient) would be used for automated and manual test cases, sometimes over the course of days, and the servers would be expected to be "up" without restarts for long periods of time.

Presumably the collection server could at some point be bounced or fail, at which point all of the tcpclients would be hung and need to be restarted. The idea I'd been considering as I look through the code was to add an optional "retry" setting specific to tcpclient that would cause TcpOutputClient to create a new TcpConnection on failure (possibly after some delay, which conceptually could be configurable). It seems relatively clear to me how to retrofit this. For my use case at least, I would ignore any data that happened to be in transit during a failure. I have not looked at what would happen to the visitor if the server were killed during data transmission.
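At its core it is nothing more than a connect-with-retry loop. A minimal, self-contained sketch of the pattern (using a plain java.net.Socket rather than JaCoCo's actual TcpOutputClient; the retry delay here stands in for the hypothetical "retry" setting):

import java.io.IOException;
import java.net.Socket;

// Illustration only: a generic connect-with-retry loop, not JaCoCo code.
public final class RetrySketch {

    // Keeps trying until a connection succeeds; the caller then hands the
    // socket to whatever writes the execution data.
    public static Socket connectWithRetry(String host, int port, long retryDelayMillis)
            throws InterruptedException {
        while (true) {
            try {
                return new Socket(host, port);
            } catch (IOException e) {
                // server down or not yet started: wait, then try again
                Thread.sleep(retryDelayMillis);
            }
        }
    }
}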

Marc, if I were to do this work, add the test cases, and update the documentation, would this be a pull request that would be considered? Or is this functionality that you do not consider a fit for the project?
What is the right process to surface this for consideration?

Thank you
Nathan


Marc Hoffmann

Oct 15, 2020, 2:00:03 PM
to JaCoCo and EclEmma Users
Hi Nathan,

thanks for the detailed description of your use case!

I’m still not convinced it is a good idea to add distributed/resilient behavior to the coverage agent. There are simply too many failure scenarios to test and get right. Especially in the case of JVM shutdown you don’t want to have wait/retry loops.

In your case, running the agents in TCP server mode and polling them periodically with temporary connections looks like the more robust setup.


Using the agent's Java API you can create your own distributed upload logic and package that with your application.

Regards,
-marc



   

Nathan Wray

Oct 15, 2020, 2:16:05 PM
to JaCoCo and EclEmma Users
Marc, thank you for your reply. And let me say I've been a long time fan of EclEmma. So thank you for your work.

Regarding your suggestion:
      "In your case, running the agents in TCP server mode and polling them periodically with temporary connections looks like the more robust setup."

I think you're suggesting the Tomcat app servers be run in TcpServer mode, and then polled periodically from a central Client collector. This would definitely work and solves the long-term connection issue. What appealed to me about running the many app servers as TcpClient is that I don't need prior knowledge of which servers exist at the central server. From that point of view, a new Tomcat application (using TcpClient) could be created and would connect to the collection server without any configurations needing to be updated. 
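For reference, if we do go the polling route, the collector side looks straightforward with the org.jacoco.core.tools.ExecDumpClient API; a rough sketch (the host list, port and output file names are placeholders, and org.jacoco.core would need to be on the collector's classpath):

import java.io.File;
import java.io.IOException;
import org.jacoco.core.tools.ExecDumpClient;
import org.jacoco.core.tools.ExecFileLoader;

// Illustration only: a central collector polling agents running in tcpserver mode.
public final class Collector {
    public static void main(String[] args) throws IOException {
        // placeholder addresses; in tcpserver mode each agent listens on port 6300 by default
        String[] hosts = { "tomcat-a.example.com", "tomcat-b.example.com" };
        ExecDumpClient client = new ExecDumpClient();
        client.setDump(true);   // request execution data
        client.setReset(false); // leave the agent's counters untouched
        for (String host : hosts) {
            ExecFileLoader loader = client.dump(host, 6300);
            loader.save(new File(host + ".exec"), true); // append to a per-host file
        }
    }
}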

Likewise polling the mbean interface would require a list of which addresses to visit. Not to mention our corporate policy has JMX locked down in our farm.

I'll think about this further and see which makes the most sense. Thanks again for your help.
Nathan





Marc Hoffmann

Oct 15, 2020, 2:23:15 PM
to jac...@googlegroups.com
Hi Nathan,

thanks for the kind words!

I see the advantage of the instances connecting to the central service. That’s why I was suggesting you implement that logic on your own (you can even do HTTP POST requests).

By the API I’m not talking about JMX. The API can be directly used from within Java code deployed with your application. This code could, for example, periodically retrieve JaCoCo exec data and post it to the central server:

  // IAgent and RT are org.jacoco.agent.rt.IAgent and org.jacoco.agent.rt.RT,
  // both shipped inside jacocoagent.jar
  IAgent agent = RT.getAgent();
  byte[] exec = agent.getExecutionData(true); // true = reset counters after the dump
  // POST exec to your central service

This does not depend on an established connection.
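If it helps, a minimal sketch of such periodic upload code might look like the following (the class name, collector URL and interval are placeholders, not part of JaCoCo):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Timer;
import java.util.TimerTask;
import org.jacoco.agent.rt.IAgent;
import org.jacoco.agent.rt.RT;

// Sketch only: periodically dump execution data and POST it to a central service.
public final class CoverageUploader {
    public static void start() {
        // daemon timer: never keeps the JVM from shutting down
        Timer timer = new Timer("coverage-upload", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    IAgent agent = RT.getAgent();
                    byte[] exec = agent.getExecutionData(true); // dump and reset
                    HttpURLConnection con = (HttpURLConnection) new URL(
                            "http://collector.example.com/coverage").openConnection();
                    con.setRequestMethod("POST");
                    con.setDoOutput(true);
                    try (OutputStream out = con.getOutputStream()) {
                        out.write(exec);
                    }
                    con.getResponseCode(); // send the request; ignore the result here
                } catch (Exception e) {
                    // collector down or agent not started: simply try again next time
                }
            }
        }, 60_000, 60_000); // every minute
    }
}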

Regards,
-marc



Nathan Wray

Oct 15, 2020, 3:41:16 PM
to JaCoCo and EclEmma Users
If I understand you correctly, this would be a small bit of code I'd create and package into my apps along with the JaCoCo jars. 

I thought about a similar pattern, but the drawback I see is the build dependency change and the introduction of JaCoCo-specific code and jars into production. The javaagent approach lets me instrument only the test environment, without any prerequisites I need to enforce across the different development teams, and doesn't impact production. So that's a preferred attribute of a good solution.

Thank you
Nathan



Marc Hoffmann

Oct 15, 2020, 3:49:54 PM
to JaCoCo and EclEmma Users
Correct!

That could be a separate WAR that you deploy on the test Tomcats.
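Wiring-wise such a WAR needs little more than a context listener that starts the periodic upload. A sketch, assuming Servlet 3.0+ and a CoverageUploader along the lines of the earlier snippet (both names are placeholders):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

// Sketch only: minimal wiring for a coverage-upload WAR deployed on the test Tomcats.
@WebListener
public class CoverageUploadListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        CoverageUploader.start(); // hypothetical uploader from the sketch above
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // the daemon timer dies with the JVM; cancel it here if you keep a handle to it
    }
}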


Nathan Wray

Oct 20, 2020, 11:51:48 AM
to JaCoCo and EclEmma Users
Marc, I went ahead and implemented what I had in mind for reconnect. If you have time, please take a look; I'd appreciate any feedback. We're intending to use this internally in the meantime.


Thank you
Nathan

Evgeny Mandrikov

Oct 20, 2020, 12:17:03 PM
to JaCoCo and EclEmma Users
Hi Nathan,

Like Marc, I feel that such functionality is better implemented outside of JaCoCo.
If you don't like the approach of a special WAR suggested by Marc earlier in this thread and prefer an agent, then you can implement your own agent that talks to the JaCoCo agent.

For example, after compilation

javac -cp jacoco-0.8.6/lib/jacocoagent.jar -d . Agent.java

of the following Agent.java

package example;

class Agent {
    public static void premain(String agentArgs, java.lang.instrument.Instrumentation inst) {
        System.out.println("JaCoCo version: " + org.jacoco.agent.rt.RT.getAgent().getVersion());
    }
}

and creation of jar

jar cfm agent.jar manifest.txt example

using the following manifest.txt

Premain-Class: example.Agent

execution of
java -javaagent:jacoco-0.8.6/lib/jacocoagent.jar -javaagent:agent.jar -help
will print the version of JaCoCo.
Similarly, your agent can request dumps and send them out.
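To make that last step concrete, a variant of the Agent class above could schedule dumps on a daemon timer. Sketch only: sendToCollector is a placeholder for whatever transport fits your environment, and Premain-Class in the manifest would then point to example.DumpingAgent:

package example;

import java.util.Timer;
import java.util.TimerTask;
import org.jacoco.agent.rt.RT;

class DumpingAgent {
    public static void premain(String agentArgs, java.lang.instrument.Instrumentation inst) {
        Timer timer = new Timer("jacoco-dump", true); // daemon thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    // false = do not reset counters between dumps
                    byte[] exec = RT.getAgent().getExecutionData(false);
                    sendToCollector(exec);
                } catch (Exception e) {
                    // JaCoCo agent not started yet or collector unreachable: retry next tick
                }
            }
        }, 60_000, 60_000);
    }

    private static void sendToCollector(byte[] exec) {
        // placeholder: HTTP POST, write to a shared directory, etc.
    }
}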

Regards,
Evgeny

Nathan Wray

Oct 20, 2020, 12:42:54 PM
to JaCoCo and EclEmma Users
Evgeny, respectfully, the tcpclient functionality already exists in the agent. The problem I've tried to address here is one of brittleness: the tcpclient fails if the server is not ready at startup or has any hiccups during execution. The changes here simply allow it to recover, at no real cost in complexity.

Thank you
Nathan

Nathan Wray

Oct 20, 2020, 2:01:24 PM
to jac...@googlegroups.com
I believe Marc's concerns with implementing the change were primarily around Thread and delay complexity; I hope I've addressed that in code by utilizing the TaskTimer in daemon mode. Additionally, I've restricted the reconnection to cases where an IOException has been thrown, allowing normal shutdown to happen as expected.
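For illustration, the pattern is roughly the following (a sketch of the idea, not the actual patch; names and the delay are placeholders):

import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;

// Illustration of the reconnect pattern: on IOException, schedule a single
// reconnect attempt on a daemon timer instead of blocking in a sleep loop.
public final class ReconnectSketch {

    private final Timer timer = new Timer("reconnect", true); // daemon: won't delay JVM exit
    private final long delayMillis = 3_000;

    public void onConnectionFailed(IOException cause) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                openConnection(); // try again; on another IOException we end up back here
            }
        }, delayMillis);
    }

    private void openConnection() {
        // placeholder for creating a new connection to the collection server
    }
}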

I'm not disagreeing that there are multiple other approaches (there always are). But I believe tcpclient is a very useful pattern and the best fit for my use case. Making tcpclient more robust increases its usefulness significantly.




Evgeny Mandrikov

Oct 20, 2020, 3:53:01 PM
to jac...@googlegroups.com
On Tue, Oct 20, 2020 at 6:42 PM Nathan Wray <nw...@detroitsci.com> wrote:
The changes here simply allow it to recover at no real cost in complexity.

Here is what is usually (and maybe even here) underestimated or not considered: every additional feature has at least the cost of future answers to user questions about it ;)

IMO one of the main concerns is about

On Tue, Sep 29, 2020 at 6:23 AM Marc Hoffmann <hoff...@mountainminds.com> wrote:
data loss

On Thu, Oct 15, 2020 at 7:20 PM Nathan Wray <nw...@detroitsci.com> wrote:
ignore any data that happened to be in transit during a failure

while for your use case data loss might be acceptable, this is not necessarily the case for others, and it might be really hard for us to diagnose when some users face it.

And I'm not sure it was addressed because

On Thu, Oct 15, 2020 at 8:00 PM Marc Hoffmann <hoff...@mountainminds.com> wrote:
I’m still not convinced it is a good idea to add distributed/resilient to the coverage agent. There are simply too many failure scenarios to test and get right. Especially in the case of JVM shutdown you don’t want to have wait/retry loops.

Please trust our experience - JVM shutdown hooks are not easy to get right, especially in the presence of multiple threads, thread groups, and timers.
For example, it is not obvious to me why your changes do not add any synchronization - doesn't the timer thread modify the "connection" field that the shutdown thread reads?
And I don't see any new tests, while existing ones are failing.

Moreover, I don't understand why you're asking us to review changes, despite the fact that in all responses the project maintainers aren't in favor of this idea in general and suggest using the JaCoCo APIs instead.
Not every feature has to be in core, especially if it can be implemented using already provided APIs. Why not start developing your idea as a separate project without modifications to JaCoCo, so that you'll be able to learn from user feedback about the robustness of the implementation and the approach in general? We'll be happy to mention it in https://www.jacoco.org/jacoco/trunk/doc/integrations.html

Regards,
Evgeny

Nathan Wray

Oct 20, 2020, 6:15:22 PM
to jac...@googlegroups.com
Thanks for your reply, Evgeny. I added this change because I intend to use it regardless. I was hoping I could answer some of Marc's concerns by having him review the change, but this is the build we'll be using internally.

The shutdown hook with the daemon timer should be as stable as it was with the daemon worker thread.

I wasn't aware that any of the tests had failed but I'll take a look. I said in my request that I'd be happy to add tests and documentation pending feedback, but based on what has already been said I wasn't optimistic this would get picked up.

If you consider the case where the server fails during data transfer, are you any worse off if the client tries to reconnect 3 seconds later? The server has already failed and the data is already (potentially) lost; adding retry didn't cause that. You could argue that retry makes it less obvious that only some of the data was uploaded. I'd be open to logging the exception before scheduling the reconnect if that makes any difference to you.

I can also add synchronization to shared member changes if that would satisfy your concerns, but I think I have my answer. Sincerely thank you for your feedback and for maintaining this project.

Best regards
Nathan


