[DISCUSS] Adding static code analysis into our review process


Florian Hockmann

Aug 29, 2018, 9:50:47 AM
to JanusGraph developers
JanusGraph currently uses Coverity as a static code analysis tool. While this has already led to some findings, it unfortunately cannot be used to analyse the code changes from pull requests. I wondered whether we could improve our review process with a service that analyses the changes in pull requests and comments directly on the PRs to report its findings. This should make reviews more efficient, as reviewers won't have to comment on obvious style issues, for example, and it could catch problems before they land on production branches.

I already searched for a service that we could use. In my opinion, it should fulfil these requirements:
  1. Support for the languages we currently use / might add in the near future: Java, C#, Python, JavaScript / TypeScript.
  2. Integration into code reviews in GitHub.
  3. Free for open source projects with enough scans per day (some services limit these to 5 per day, which might not be enough).
GitHub already lists services that can be used for code reviews here. Codacy seems to be the only service of those listed there that fulfils all three requirements. I wanted to try Codacy out and therefore already let it analyse forks of JanusGraph and the version of JanusGraph.Net that is currently in review:
So, what do others think about the idea to integrate a code analysis service into our review process in general? And are there any other suggestions for such a service besides Codacy?

Florian Hockmann

Mar 1, 2019, 8:08:03 AM
to JanusGraph developers
We now have Codacy activated for all repos in the JanusGraph organization. The dashboard for the main repo can be found here:


Codacy automatically checks pull requests to verify that they don't introduce new issues. By default, it only adds this information as a check, just like how Travis reports the build result. I have now also configured Codacy to add inline comments directly where a new issue is introduced in the PR, as well as a summary comment.
We can of course change that if we want less integration of Codacy in the PR process.

Codacy will likely complain about a lot of issues that we don't really care about, so, basically, false positives. I therefore suggest that we don't treat Codacy as a gatekeeper that prevents pull requests from being merged, especially in the beginning.
Instead, we should determine which checks don't make sense for us so we can disable them. So, if you spot anything Codacy complains about that it shouldn't, in your opinion, then just write here and we can remove those checks.
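For reference, and only as a sketch: Codacy can read per-repository settings from a `.codacy.yml` file in the repository root, which could be one place to disable an engine or exclude paths. The engine name and paths below are illustrative examples, not our actual configuration:

```yaml
---
# Illustrative only: engine names and paths are examples,
# not JanusGraph's actual Codacy configuration.
engines:
  duplication:
    enabled: false
exclude_paths:
  - "docs/**"
  - "**/generated/**"
```

Individual rules can also be toggled through the Codacy web UI per code pattern, which may be the easier route for one-off adjustments.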

Codacy will hopefully make the review process easier as reviewers don't have to look for issues that Codacy can already find and contributors get quicker feedback if they introduce new issues.

Committers / TSC members: If you want to be able to configure Codacy for JanusGraph, then please just send me an email and I will add you: f...@florian-hockmann.de. (I already invited some of you where I have your email addresses already.)

Oleksandr Porunov

Mar 1, 2019, 10:35:36 AM
to JanusGraph developers
Thank you Florian for integrating DCO and code analysis!

Dmitry Kovalev

Mar 14, 2019, 8:51:57 PM
to JanusGraph developers
My recent PR https://github.com/JanusGraph/janusgraph/pull/1483 must be one of the first victims of this change, and I thought I should provide some feedback from the contributor's perspective while it is fresh :)

Without heading straight into a discussion about the actual usefulness of 99.9% of the issues that this Codacy tool has found, I would like to check what people think about the way they are presented.

I think that the idea of posting each "issue" as a comment on the PR leads to this bot completely hijacking any meaningful discussion on the PR. And it is clearly unable to clean up after itself when the "issues" are addressed.

I can see that some of the "stale" "issues"/comments are deleted by "JanusGraph" (another bot?), but some remain intact even though they are actually addressed, and it seems like it duplicates the same comments with every new commit if the committer "dared" to ignore them, so it has a potential to snowball to insane levels unless it is pacified completely, and each and every "issue" is resolved to its satisfaction.

I hope that the mode where each and every "issue" it finds is posted as a comment can be switched off, otherwise IMO this bot does much more harm than good and is basically just a big nuisance. There is a link to the full report from the "check" entry, and I think this is the only thing required on the PR page.

Coming back to the usefulness of the "issues" it reports - so far I have seen 3 broad categories:
1) "issues" which are absolutely useless in terms of code quality or readability, but at least are easy to "fix"
2) "issues" which could be helpful (such as method complexity) if they were tuned to more reasonable thresholds
3) "issues" which directly contradict the intention/design, and dismiss it as something to avoid, without providing any meaningful explanation or alternative

I am sure that it is also capable of detecting actual issues such as coding errors, deficiencies and potential runtime errors, but as currently configured, they are likely to be buried under the immense amount of useless spam it generates.

Any thoughts?

Many thanks,
Dmitry

Dmitry Kovalev

Mar 14, 2019, 9:16:53 PM
to JanusGraph developers
Just to add even more colour - apparently it sends an email for each "issue" comment, so in addition to completely clogging the PR page it also has just sent 36(!) emails to me. I think this puts it firmly in the SPAM territory :)

Florian Hockmann

Mar 15, 2019, 10:29:22 AM
to JanusGraph developers
Thanks for your feedback on this, Dmitry!


I can see that some of the "stale" "issues"/comments are deleted by "JanusGraph" (another bot?), but some remain intact even though they are actually addressed, and it seems like it duplicates the same comments with every new commit if the committer "dared" to ignore them, so it has a potential to snowball to insane levels unless it is pacified completely, and each and every "issue" is resolved to its satisfaction.

Are you sure about that? Could you provide examples?
I'm only asking because I haven't seen this behaviour yet, but it may of course be possible that I just overlooked it since we just had a few PRs with Codacy enabled.

I hope that the mode where each and every "issue" it finds is posted as a comment can be switched off

Yes, we can turn that off. If others also find this annoying, then we will just turn this part off.
I think these inline comments have some value because they are easily visible to actual reviewers who don't have to repeat comments Codacy already made.

Regarding the usefulness of the issues, I think it would be good to have some concrete examples to discuss. How about you just comment on the issues Codacy complained about that you find useless? Then we can evaluate whether the rule doesn't make sense for us in general, or whether it's just a false positive because Codacy misses context but the rule can still make sense in other cases.
In general, contributors don't have to address all issues Codacy complains about. Just because such a tool makes a comment doesn't mean that anything has to be changed. It's expected that a tool like this produces false positives. That is also why we don't use Codacy to determine whether a PR can be merged or not.

I unfortunately didn't have the time to really inspect your PR, but Codacy complained, for example, about a method simply for being too long and complex.
This method has over 100 lines and arguably some complexity. I really haven't looked at the method in detail, but I'm pretty sure that, as a reviewer, I would also ask whether this method could be split into multiple smaller methods, as it's really hard to understand and therefore maintain such a long and complex method.
So, this is an example where Codacy can already comment on issues before a reviewer finds time to review the PR.
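To illustrate the kind of refactoring such a complexity warning usually nudges toward, here is a hedged, generic sketch in Python (none of these names come from the JanusGraph code base or the PR in question): a function that parses, validates and aggregates in one body is split into small single-purpose helpers, so the top-level method just composes them.

```python
# Hypothetical example, not from the actual PR: splitting one long
# method into small helpers that a reviewer (or Codacy) can follow.

def parse_record(line):
    """Turn a 'key=value' line into a (key, int value) pair, or None."""
    if "=" not in line:
        return None
    key, _, raw = line.partition("=")
    try:
        return key.strip(), int(raw)
    except ValueError:
        return None

def valid_records(lines):
    """Keep only the lines that parsed successfully."""
    return [rec for rec in map(parse_record, lines) if rec is not None]

def summarize(lines):
    """The former 'long method': now it just composes the helpers."""
    records = valid_records(lines)
    return {
        "count": len(records),
        "total": sum(value for _, value in records),
    }
```

For example, `summarize(["a=1", "b=2", "junk", "c=x"])` returns `{"count": 2, "total": 3}`; the malformed lines are filtered out in one clearly named place instead of in the middle of a 100-line method.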

Since we're short on reviewers in general, I welcome every help we can get from automation for the review process. A reviewer shouldn't have to spend time on things that we can also just automate.
But on the other hand such a tool shouldn't be annoying for contributors, so it's good that we're having this discussion.


Just to add even more colour - apparently it sends an email for each "issue" comment

You are referring to the emails GitHub sends, right? These are simply sent for every comment, and that should only happen if you either watch the whole repo or are subscribed to notifications for the pull request.
So, there isn't much we can do about these emails if we want to keep the inline comments (which is up for debate of course).

In general, it would be good to hear more opinions on this matter. Do people find the inline comments useful or would you prefer if we turned them off?

Dmitry Kovalev

Mar 16, 2019, 8:49:55 AM
to JanusGraph developers
Hi Florian,

On the examples of "stale" Codacy bot issues not being deleted/resolved automatically: please have a look at the PR I mentioned below. The ones which were manually resolved by me are fixed in the code but were left open by the bot. I've been thinking about why it would close some but not others, and one theory I have is that my force-pushing the DCO sign-off could have interfered. It could be that not being able to find the SHA it was reviewing before prevented it from detecting that an issue was fixed correctly.

For the examples of "snowballing": upon a closer look, it seems that I misinterpreted it a little bit, apologies. It has reported the exact same "issues" multiple times, including in the same file, for each occurrence, and the appearance of new comments on the PR timeline could have interleaved with my commits fixing them, leading me to believe that it was posting the same "issue" more than once. Or possibly the pushing of the same commit with a different SHA for the DCO sign-off also played a role. So this point is a little inconclusive.

To be clear, I still think that while it could work OK for small commits and bugfixes with < 3 Codacy "issues", in general posting a comment for each "issue" occurrence this bot identifies just clogs the PR discussion, and it won't help actual reviewers either, because they will have trouble even locating their own comments in the mess of bot messages.

The link to the full report it adds in the checks area is IMO absolutely enough. Reviewers can just ask authors to address all codacy issues they agree with, and then discuss those that are left. To avoid asking this manually every time, the PR template could be amended to include this proposition to authors. In this context, it would be good if a specific occurrence of an issue could be commented on, or "moved to comments" with a click; then the author can explain why they don't want to fix it, or reviewers can bring it to the author's attention. But pushing all of them as comments by default is not such a good idea IMO.

On examples of the usefulness of the issues: one example is its complaints about package-local visibility (which in my case was totally intentional, to allow testing but not "external" usage). It basically forced me to make those items public, which I don't mind really, but I equally don't understand what the issue was in the first place, or how this helps readability or anything else.

But to be honest, I am not sure we want to start a discussion of these things here, because it has the potential to become very time-consuming without a real outcome. Over time the reviewers may reach their own consensus on which "issues" are useful and which are not, and hopefully it will be possible to tune the bot accordingly.

For my PR specifically, I fixed the "issues" which were trivial, even if I didn't see any point in them, and left the rest for now. My hope is that when it is actually reviewed by a human, the reviewers will let me know which of the "issues" they want fixed in order to approve.

Many thanks,
Dmitry

Florian Hockmann

Mar 20, 2019, 4:40:42 AM
to JanusGraph developers
The link to the full report it adds in the checks area is IMO absolutely enough. Reviewers can just ask authors to address all codacy issues they agree with

But for that, the reviewer has to open the Codacy report in a separate window and then correlate which issue affects which part of the contribution while reviewing it. To be honest, I wouldn't do that as a reviewer, as it's too much manual work, which is exactly what we want less of thanks to automation. Having the comments directly in the diff view on GitHub shows them exactly in the right place while reviewing. If Codacy comments about an unused variable, for example, then I won't have to comment on that again (or even look for it).

It basically forced me to make those items public, which I don't mind really, but I equally don't understand what the issue was in the first place, or how this helps readability or anything else.

You shouldn't change anything just because a tool suggests doing so if you don't agree with the suggestion. (The same basically applies to human reviewers, except that you can ask them for their reasoning.) If you don't agree with a Codacy rule in general, then please just reply to the comment or mention it here, as we can then disable that rule if we agree that we don't want it. If Codacy is just wrong in a specific case, then we simply ignore its suggestion.
But you should really never blindly follow a suggestion from such a tool if you disagree with it.

Since Codacy seems to be causing some confusion with its suggestions, do you have an idea how we could make it clear that Codacy's comments are really just suggestions? Would it help if we added something about Codacy to CONTRIBUTING.md?

Chris Hupman

Mar 21, 2019, 4:08:38 PM
to JanusGraph developers
I'm definitely on the side of wanting as much automation for PR reviews as we can get. As for the number of emails: is it possible to configure Codacy to do a single review instead of a review comment per issue?

For documentation, we could probably just put a line in the pull request template, instead of or in addition to CONTRIBUTING.md, saying that if submitters disagree with Codacy they should please ask questions in the conversation tab.

Florian Hockmann

Mar 22, 2019, 10:42:25 AM
to JanusGraph developers
As for the number of emails: is it possible to configure Codacy to do a single review instead of a review comment per issue?

That sounds like a good idea for an improvement. I didn't find any option to configure how Codacy adds these inline comments, so I submitted that as a feature request to them. (I have no idea though what they do with these feature requests.)

And yes, the PR template is probably the best place as it already contains a hint to check if the build fails on Travis.

Chris Hupman

Mar 28, 2019, 12:45:50 PM
to JanusGraph developers
After getting over 100 emails in the past two days on the janusgraph-python PR, I must admit I am much more sympathetic to Dmitry's complaint about email spam from in-line comments. The email spam is definitely an issue on very large pull requests, but it seems very acceptable for PRs under 100 lines of code. Thanks for submitting the feature request, Florian.

Dmitry Kovalev

Mar 28, 2019, 1:03:37 PM
to janusgr...@googlegroups.com
I will bet a few pints that when it comes to actually reviewing that big PR by a human, both reviewer and author will also find that having 100 comments from a bot on the PR is as annoying as having 100 emails from that bot in their inbox :)

Cheers,
Dmitry 

PS. speaking of pints - if anyone from the community happens to be in/visit London - feel free to drop me a line, would be glad to meet 

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To post to this group, send email to janusgr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/b5fab26b-3b28-49bb-9d16-1c2094eba969%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Florian Hockmann

Mar 28, 2019, 1:14:27 PM
to JanusGraph developers
Thanks for submitting the feature request, Florian.

I got a response from them that they forwarded that request to their product team.

Not sure if everyone here is aware of this, but you can also adjust your personal notification settings on GitHub to stop getting these emails. You can either turn off email notifications completely and instead use the web notifications that are shown while you browse GitHub (via Settings > Notifications), or you can just block individual users, which also works for bot users. I did that for Codacy and also our CLA bot, and it works really well. Codacy even documents that here.

having 100 comments from a bot on the PR

While I basically agree with you here, I wouldn't use that PR as an example to decide on the usefulness of these comments, as the PR is just way too big to review efficiently in general. The 100 comments from Codacy are just another symptom of this problem in my opinion.

But I really don't want to argue forever about this here. If more contributors want these inline comments turned off than find them useful, then we will turn them off.

On Thursday, March 28, 2019 at 18:03:37 UTC+1, Dmitry Kovalev wrote:
I will bet a few pints that when it comes to actually reviewing that big PR by a human, both reviewer and author will also find that having 100 comments from a bot on the PR is as annoying as having 100 emails from that bot in their inbox :)

Cheers,
Dmitry 

PS. speaking of pints - if anyone from the community happens to be in/visit London - feel free to drop me a line, would be glad to meet 


Oleksandr Porunov

Mar 28, 2019, 1:30:20 PM
to JanusGraph developers
I find the inline comments from Codacy convenient. Of course, it would be much better if those comments were grouped into a single review rather than separate comments, but frankly, most PRs are much smaller and don't have such issues, in my humble opinion. I don't have a strong opinion on this; I am OK with either decision, turning them off or leaving them as is.

Chris Hupman

Mar 28, 2019, 2:04:31 PM
to JanusGraph developers
I'm going to go with blocking notifications from Codacy, and I'll open a good first issue to add the link about blocking to CONTRIBUTING.md. The in-line comments are helpful for review, and I think providing contributors with the info to opt out is the best option.

In response to Dmitry's last comment: if I can find an excuse to have IBM send me to London, I will gladly join you for a pint. The janusgraph-python PR already had 300+ comments and 100+ commits before Codacy, so it was already an extreme source of email spam. Codacy just nudged it over the Unicorn line.

Dmitry Kovalev

Apr 3, 2019, 11:12:09 AM
to JanusGraph developers
Apologies for bumping this thread again; I just wanted to give an update on one aspect of Codacy's behaviour which was previously inconclusive.

I have just had confirmation that the Codacy bot actually DOES report the same issue again if a commit touches the same file. See my PR after the recent commit (which touched only comment lines, no actual code changed): it went and added the same comments for all the files I touched all over again, without removing or outdating the previous ones.

It means its comments do snowball if someone dares not to fix them immediately and chooses to commit some other unrelated changes instead. Right now my PR has multiple copies of the same comment about the same lines of code, and it will probably keep adding more copies until the "issue" it sees there is "fixed". This makes the comments harder to ignore, even if there was consensus that they are just recommendations.
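For what it's worth, this kind of duplication could be measured mechanically. Below is a hedged sketch: the dicts mimic the shape of GitHub's pull-request review comments API output, but the sample data and the bot login name are made up, and a real script would fetch the comments over the API rather than hard-code them.

```python
from collections import Counter

def duplicate_counts(comments, bot_login="codacy-production"):
    """Count identical (path, line, body) comments from a given bot user.

    `comments` is a list of dicts shaped like GitHub's PR review
    comments API output. The bot login is an assumption for this
    sketch, not a confirmed fact about Codacy's actual account name.
    """
    keys = [
        (c["path"], c.get("line"), c["body"])
        for c in comments
        if c["user"]["login"] == bot_login
    ]
    return {key: n for key, n in Counter(keys).items() if n > 1}

# Made-up sample data: the same issue reported twice on the same line.
sample = [
    {"path": "A.java", "line": 10, "body": "Method too long",
     "user": {"login": "codacy-production"}},
    {"path": "A.java", "line": 10, "body": "Method too long",
     "user": {"login": "codacy-production"}},
    {"path": "A.java", "line": 3, "body": "Looks good",
     "user": {"login": "human-reviewer"}},
]
```

Here `duplicate_counts(sample)` flags the repeated "Method too long" comment while ignoring the human reviewer's note, which is roughly the snowballing described above.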


As to the size of the PR affecting the number of comments, Codacy or otherwise: I agree that mine is bigger than a typical bugfix or small feature PR. But can you suggest a meaningful way to split a (relatively) large contribution such as a new backend into multiple small PRs? I assume that "code drops" with incomplete functionality are going to be even harder in terms of finding reviewers, since they will need to be prepared to believe that "it will all make sense in the next PR"?

If you do have any ideas in general, and on how to split my PR in particular, and you think that doing so is going to help speed up the reviews (as measured in real time, and across all PRs in the same contribution), then I would be happy to look at splitting it up!

If, on the other hand, the total review time is multiplied by the number of individual PRs (purely because people just don't have time to even start looking at them), then it wouldn't bring any real time savings.

To be clear, none of this is meant as criticism, I am just trying to provide a contributor's point of view ( and also see if there is anything I can do to help speed up my own review in particular :) )

Thanks,
Dmitry

Florian Hockmann

Apr 3, 2019, 12:28:01 PM
to JanusGraph developers
As to the size of the PR affecting the number of comments, Codacy or otherwise: I agree that mine is bigger than a typical bugfix or small feature PR. But can you suggest a meaningful way to split a (relatively) large contribution such as a new backend into multiple small PRs?

I was mainly referring to the Python library with my earlier comment about the size of the PR. There, it would probably have been possible to first create a smaller PR with just a minimal level of functionality (this shouldn't be read as criticism of Debasish, just my opinion after seeing how the review exploded).
When a completely new backend is contributed, then I really don't see a way to split that into individual PRs. Sure, it would be possible to add incomplete functionality and complete it with follow-up PRs, but do we want to merge something that isn't usable yet?

The only option in your case might be to not replace the existing in-memory backend completely with your new one, but to apply incremental improvements. But I'm saying this without having looked at your PR yet, so I can't really say whether that could work at all.

BTW, don't be surprised that it takes so long to get reviews. All committers work on JanusGraph only part time or even in their spare time, and most (if not all) only know a subset of the JanusGraph code base, which makes it hard to review a new backend.
I'll try to review your PR, but I first want to concentrate on the upgrade to TinkerPop 3.4.0, as that's blocking us from releasing JanusGraph 0.4.0.

To be clear, none of this is meant as criticism, I am just trying to provide a contributor's point of view

I think that's clear, and I definitely welcome your input here. In my opinion, it wouldn't hurt in general to have more involvement from contributors in development-related discussions.