Feature request: Detection of duplicate test coverage in JaCoCo


Jakub Schwan

Jun 26, 2017, 3:04:42 AM
to JaCoCo and EclEmma Users

As the number of tests grows, so does their execution time. Various tools currently provide test code coverage metrics. These metrics, however, do not help identify duplicate test coverage, which only increases test execution time without any benefit to effective code coverage. Information about duplicate coverage and the execution time of duplicate tests would be valuable for subsequent refactoring of the test code base.

JaCoCo is a very useful tool, and with this new functionality it could provide more helpful information about test coverage.

 

Possible solution:

  1. Extend the data file format to include data about

    • Caller (test invoking the piece of code e.g.: line, branch, method, class,…)

    • Time spent in the piece of code

  2. Enhance the code coverage tool to store information described in point 1

  3. Enhance the tool’s reporting capabilities to present information about

    • Duplicate code coverage

    • Time spent in running tests
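
A minimal sketch of what the duplicate-coverage report in point 3 could compute, assuming the caller data from point 1 is already available as a map from test name to the set of covered lines. The class `DuplicateCoverageSketch`, the method `findSubsumedTests` and the `File:line` keys are hypothetical names for illustration only; nothing here is existing JaCoCo API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in JaCoCo.
class DuplicateCoverageSketch {

    // Reports every test whose covered set is fully contained in the
    // covered set of some other test (a candidate for pruning).
    static List<String> findSubsumedTests(Map<String, Set<String>> coveredByTest) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> candidate : coveredByTest.entrySet()) {
            for (Map.Entry<String, Set<String>> other : coveredByTest.entrySet()) {
                if (candidate.getKey().equals(other.getKey())) {
                    continue; // do not compare a test with itself
                }
                if (other.getValue().containsAll(candidate.getValue())) {
                    result.add(candidate.getKey() + " is covered by " + other.getKey());
                    break; // one covering test is enough to flag the candidate
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> coveredByTest = new LinkedHashMap<>();
        coveredByTest.put("testAdd", Set.of("Calc.java:10", "Calc.java:11"));
        coveredByTest.put("testAddAgain", Set.of("Calc.java:10", "Calc.java:11"));
        coveredByTest.put("testDivide", Set.of("Calc.java:20"));
        findSubsumedTests(coveredByTest).forEach(System.out::println);
    }
}
```

In this example the two tests with identical coverage flag each other, while `testDivide` is not reported. Execution time per test could be attached to the same map to rank the candidates, but that part is left out of the sketch.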

I am open to discussing better solutions.

 

What I want to do:

 

I want to implement the solution and write a bachelor thesis about it at my university (Faculty of Informatics, Masaryk University, Brno, Czech Republic).

 

Expected result:

 

A functional solution and a reporting capability to present the information.




How do you like this idea?

Best regards,
Jakub Schwan

Marc R. Hoffmann

Jun 26, 2017, 5:14:18 AM
to jac...@googlegroups.com
Hi Jakub,

thanks for the proposal and the detailed description! Some remarks on your possible solution:
  • JaCoCo is a code coverage tool with low overhead that scales even to very large test sets. Collecting additional information like call traces and execution times is clearly out of scope.
  • To clearly manage expectations: Of course you can implement this based on the free JaCoCo. But unless you come up with a solution with very little additional complexity and overhead it is very unlikely that we will incorporate this into the JaCoCo main development line.
  • JaCoCo simply sees Java classes, it has no notion of e.g. JUnit test cases.
But I think there is a slightly different solution which should solve your problem (this solution is already implemented in SonarQube to some extent):
  • Implement a JUnitExecutionListener which uses the public JaCoCo runtime API.
  • The execution listener creates a separate session id and dump for every test case.
  • Implement a new Analyzer based on the JaCoCo analysis APIs which combines multiple sessions into a single report. Note that the exec files already contain dump timestamps.
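
The first two bullets can be sketched roughly as follows. This is only an illustration under stated assumptions: `CoverageAgent` is a stand-in interface that mirrors three methods of `org.jacoco.agent.rt.IAgent` (`setSessionId`, `reset`, `dump`) so that the example compiles without JaCoCo on the classpath, and `Listener` is a hypothetical class whose methods a JUnit 4 `RunListener` would call from `testStarted` and `testFinished`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class PerTestSessions {

    // Stand-in for the three methods of org.jacoco.agent.rt.IAgent used below,
    // so the sketch compiles without JaCoCo on the classpath.
    interface CoverageAgent {
        void setSessionId(String id);
        void reset();
        void dump(boolean reset) throws IOException;
    }

    // Hypothetical listener. With JUnit 4 these methods would be driven by
    // RunListener.testStarted and RunListener.testFinished.
    static class Listener {
        private final CoverageAgent agent;

        Listener(CoverageAgent agent) {
            this.agent = agent;
        }

        void testStarted(String testName) {
            agent.reset();                // drop coverage recorded before this test
            agent.setSessionId(testName); // one session id per test case
        }

        void testFinished() throws IOException {
            agent.dump(true);             // write this test's data, then reset the probes
        }
    }

    public static void main(String[] args) throws IOException {
        // A recording fake instead of the real agent, to show the call order.
        List<String> calls = new ArrayList<>();
        CoverageAgent recording = new CoverageAgent() {
            public void setSessionId(String id) { calls.add("session " + id); }
            public void reset() { calls.add("reset"); }
            public void dump(boolean reset) { calls.add("dump reset=" + reset); }
        };
        Listener listener = new Listener(recording);
        listener.testStarted("CalcTest.testAdd");
        listener.testFinished();
        System.out.println(calls);
    }
}
```

In a real setup the agent would presumably come from `org.jacoco.agent.rt.RT.getAgent()`, which requires the JVM to run with the JaCoCo agent attached. The third bullet, the analyzer that combines the per-test sessions, is not covered by this sketch.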

Cheers,
-marc
--
You received this message because you are subscribed to the Google Groups "JaCoCo and EclEmma Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jacoco+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jacoco/cdd5ecf2-a2b8-4488-8e13-4b3928fb0a6f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jakub Schwan

Jun 27, 2017, 3:11:26 AM
to JaCoCo and EclEmma Users
Hi Marc,

thanks for the response. I'm aware that JaCoCo has low overhead and scales well, and I don't want to increase that overhead. If I come up with a solution, the new functionality should be turned off by default and enabled through a new system property.

If JaCoCo sees only Java classes, then JaCoCo only knows that these classes were triggered, but nothing more, right?

And thanks for the alternative solution. I will think about it.

Regards,
Jakub

Marc Hoffmann

Jun 27, 2017, 12:18:28 PM
to jac...@googlegroups.com

Ciao Jakub,

> If JaCoCo sees only Java classes, then JaCoCo only knows that these classes were triggered, but nothing more, right?

JaCoCo only sees classes which have been loaded during execution. For each class, probes are inserted to record which instructions and branches have been executed at least once. Our documentation gives an overview of the implementation strategies:

http://www.jacoco.org/jacoco/trunk/doc/implementation.html

http://www.jacoco.org/jacoco/trunk/doc/flow.html

Regards,
-marc


jsc...@redhat.com

Sep 14, 2017, 7:49:01 AM
to JaCoCo and EclEmma Users

Hello Marc,


First, I want to apologize for the delay in my response.


I started a discussion about how to detect test duplication using JaCoCo. Maybe I didn’t describe this functionality well. The goal is to extend the coverage report with the source of each call, that is, which test invoked the covered code.

I see two major benefits. With this new functionality we would know where covered code was called from (from which tests), and we could find duplicated test scenarios more easily. This can be used to prune duplicate tests and thereby decrease the test execution time of any project using JaCoCo.



I prepared a simple demo of how to get the calling source during a JaCoCo run: https://github.com/jakubschwan/jacoco-demo


I know that this new feature changes JaCoCo's functionality and that it should not be activated by default. The new functionality will therefore be added as an extension of JaCoCo that can be turned on and off by a system property. That means that by default JaCoCo will behave the same as before.


Next steps

  1. I will start working on the API changes based on the demo and implement the functionality afterwards. To avoid introducing new bugs, every pull request will contain tests.

  2. As this enhancement, when turned on, may have an impact on performance, I will provide a performance comparison of the JaCoCo runtime before the changes, after the changes, and after the changes with the functionality turned on.


Do you agree with this plan? If you have any concerns please let me know.


Best regards

Jakub Schwan

Marc Hoffmann

Sep 14, 2017, 12:57:03 PM
to jac...@googlegroups.com

Hi Jakub,

I'm curious to see your solution or a description of the approach.

Regards,
-marc

jsc...@redhat.com

Sep 21, 2017, 8:22:41 AM
to JaCoCo and EclEmma Users

Hello Marc,


Here is my description of the approach.


  • Add an option to choose the mode of a JaCoCo run. This option could be controlled by a system property named e.g. org.jacoco.source.detection.enabled, with possible values true or false, configured in the Maven pom file. The default value should be false, in which case JaCoCo execution is the same as before.

  • Identify the test calling the source code.

    • We can get this data from the current stack trace when code is visited at the coverage counter spots (places analyzed by JaCoCo). An extension of the ClassProbeAdapter and changes in org.jacoco.core.internal are needed.

    • Verify the new functionality for all analytical methods.

  • Data storing

    • Implement a storage extension in package org.jacoco.data.core. We need to store data about where the code was called from. Saving should be fast, and duplicate calls should be ignored.

  • Display data

    • Extend the current report with data about the calling source (changes in the org.jacoco.report package).

    • Data merge: in the JaCoCo report we can see coverage from package level down to code. For a package it would be nice to display all tests, so I would like to show which tests cover each particular package.
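
To illustrate the first three points together (the on/off switch, caller detection from the stack trace, and storage that ignores duplicates), here is a minimal sketch. It is only an assumption of how this could look: the class `CallerRecordingSketch`, the `CALLERS` map and the `recordCaller` method are hypothetical, the property name is the one proposed above and not an existing JaCoCo option, and the rule "class name ends with Test" is just one possible way to recognise a test frame.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

class CallerRecordingSketch {

    // Property name taken from the proposal above; it is not an existing JaCoCo option.
    static final String ENABLED_PROPERTY = "org.jacoco.source.detection.enabled";

    // Hypothetical store: probe id -> distinct callers. The sets drop duplicate calls.
    static final Map<String, Set<String>> CALLERS = new ConcurrentHashMap<>();

    // Boolean.getBoolean yields false when the property is unset, so the default is "off".
    static boolean isEnabled() {
        return Boolean.getBoolean(ENABLED_PROPERTY);
    }

    // Would be invoked wherever a probe fires; isTestFrame decides which frame is "the test".
    static void recordCaller(String probeId, Predicate<StackTraceElement> isTestFrame) {
        if (!isEnabled()) {
            return;
        }
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            if (isTestFrame.test(frame)) {
                CALLERS.computeIfAbsent(probeId, k -> ConcurrentHashMap.newKeySet())
                        .add(frame.getClassName() + "#" + frame.getMethodName());
                return;
            }
        }
    }

    // Stand-in for instrumented production code.
    static void add() {
        recordCaller("Calc.add", f -> f.getClassName().endsWith("Test"));
    }

    // Stand-in for a test class.
    static class CalcTest {
        static void testAdd() {
            add();
        }
    }

    public static void main(String[] args) {
        System.setProperty(ENABLED_PROPERTY, "true");
        CalcTest.testAdd();
        CalcTest.testAdd(); // the second call adds nothing new to the set
        System.out.println(CALLERS);
    }
}
```

Capturing a stack trace on every visit is comparatively expensive, which is why the sketch checks the flag before doing any work.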


Best regards

Jakub

Marc Hoffmann

Sep 21, 2017, 11:38:22 AM
to jac...@googlegroups.com

Hi Jakub,

> We can get this data from the current stack trace when code is visited at the coverage counter spots (places analyzed by JaCoCo). An extension of the ClassProbeAdapter and changes in org.jacoco.core.internal are needed.

 

I don't think this is the correct place: ClassProbeAdapter is used at the time of instrumentation or analysis. It is not invoked at execution time (only indirectly, when using on-the-fly instrumentation and classes are loaded for the first time).

Execution is recorded by bytecode probes inserted by the instrumentation process; please see http://www.jacoco.org/jacoco/trunk/doc/flow.html

Regards,
-marc


michae...@gmail.com

Nov 24, 2017, 3:43:38 AM
to JaCoCo and EclEmma Users
Hi Marc, hi Jakub,

recently I started work on a tool to find duplicate tests in a large test base. The basic idea is to collect the execution paths of all tests through the system under test. Then, in an analysis phase, the execution paths are compared and duplicates determined (mostly just like Marc wrote above).

The problem with this: to find duplicates it's not enough to know which probes have been traversed at all; you need to know the exact flow through the probes. That is, a test doing first A then B is different from one doing first B then A, and a test doing A once is different from a test doing A twice, and so on.

My question: is it currently possible to get the "probe flow" out of JaCoCo? If not, do you think it fits into the concept?

Regards,
Michael

Marc Hoffmann

Nov 24, 2017, 9:59:05 AM
to jac...@googlegroups.com
Hi Michael,

> Is it currently possible to get the "probe flow" out of JaCoCo?

No, probes are simple boolean flags.

> If not, do you think it fits into the concept?

The instrumentation techniques used by JaCoCo could also be used for a
more complex runtime which records every probe execution in its
sequence. This will add significant runtime overhead, that's why this is
not in scope of the JaCoCo project.
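
The difference can be illustrated with a small sketch. It does not use any JaCoCo code: `ProbeFlowSketch`, `flags` and `sequence` are hypothetical names, and probe ids 0 and 1 stand for Michael's A and B.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProbeFlowSketch {

    // Boolean-flag style: a probe only remembers that it was hit at least once.
    static boolean[] flags(int probeCount, int... hits) {
        boolean[] probes = new boolean[probeCount];
        for (int id : hits) {
            probes[id] = true;
        }
        return probes;
    }

    // Sequence style: every hit is appended, so order and repetition survive,
    // at the cost of memory and time that grow with the number of hits.
    static List<Integer> sequence(int... hits) {
        List<Integer> trace = new ArrayList<>();
        for (int id : hits) {
            trace.add(id);
        }
        return trace;
    }

    public static void main(String[] args) {
        // Test 1 runs A then B; test 2 runs B, then A twice (A = 0, B = 1).
        System.out.println(Arrays.equals(flags(2, 0, 1), flags(2, 1, 0, 0))); // true
        System.out.println(sequence(0, 1).equals(sequence(1, 0, 0)));         // false
    }
}
```

With flags both tests look identical, while the recorded sequences differ. The sequence list grows with every probe hit, which hints at where the runtime overhead comes from.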

Regards,
-marc