Is Criticality Score a Proxy for Popularity?


Nuthan Munaiah

Jan 22, 2021, 12:44:17 PM
to wg-securing-cr...@googlegroups.com, Chris Horn, Nuthan Munaiah
Hello,

My name is Nuthan Munaiah and I am a Senior Researcher at Secure Decisions. My colleague, Chris Horn, and I attended the Securing Critical Projects WG meeting on January 14, 2021, where the Open Source Project Criticality Score Program was mentioned. The program seems similar to reaper (paper and dataset), a project I worked on as a graduate student at RIT. In an email conversation with Abhishek Arya, we found that the Criticality Score Program implements some of the parameters that reaper uses.

When looking at the results from the Criticality Score Program, we wondered whether the criticality score is merely a proxy for popularity. As it turns out, a few comments [1,2,3] in the Hacker News discussion allude to this question. We evaluated this hypothesis by assessing the correlation between the criticality score of a repository and its popularity (quantified using GitHub stargazers). The outcome of the analysis (shown in the table below) was interesting, and we thought the group could benefit from the insights as well.

| Language   |     ρ    | Effect   |      p      | Significant |
|------------|----------|----------|-------------|-------------|
| rust       | 0.417577 | Moderate | 7.66115e-10 |     Yes     |
| ruby       | 0.404109 | Moderate |  2.9531e-09 |     Yes     |
| c#         | 0.382657 | Moderate | 2.24522e-08 |     Yes     |
| javascript | 0.368158 | Moderate | 8.16308e-08 |     Yes     |
| java       | 0.337799 | Moderate | 9.99069e-07 |     Yes     |
| c++        | 0.321293 | Moderate | 3.50299e-06 |     Yes     |
| php        | 0.287965 | Weak     | 3.55208e-05 |     Yes     |
| go         | 0.284187 | Weak     | 4.53817e-05 |     Yes     |
| c          | 0.255176 | Weak     | 0.000265666 |     Yes     |
| shell      | 0.222957 | Weak     |  0.00150682 |     Yes     |
| python     | 0.169501 | Weak     |   0.0164191 |     Yes     |

Interpretation: Yes, the criticality score of a repository is positively correlated with its popularity, but the effect is not as strong as some of the comments [1,2,3] from the Hacker News discussion seem to suggest.
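
For readers who want to reproduce this kind of check, here is a minimal sketch of a rank correlation between stars and criticality score for one language. The message does not say which coefficient ρ refers to, so Spearman's rank correlation is an assumption here. The 0.3 and 0.5 cut-offs in `effect_label` are also an assumption, chosen only because they agree with the Weak/Moderate labels in the table. The sample numbers are hypothetical, and p-values are not computed.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def effect_label(rho):
    """Map |rho| to a label; the thresholds are assumed, not from the thread."""
    magnitude = abs(rho)
    if magnitude < 0.3:
        return "Weak"
    if magnitude < 0.5:
        return "Moderate"
    return "Strong"


if __name__ == "__main__":
    # Hypothetical data for one language: stargazer counts and scores.
    stars = [100, 2500, 400, 9000, 1200]
    scores = [0.31, 0.55, 0.48, 0.62, 0.40]
    rho = spearman_rho(stars, scores)
    print(f"rho={rho:.3f} effect={effect_label(rho)}")
```

Running the same computation once per language, on the real per-language lists, would yield one row of a table like the one above. The sketch does not handle a constant input column, where the rank variance is zero.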

Thank you,
Nuthan Munaiah

References

[1] "The methodology is pretty silly. It rewards activity and popularity." https://news.ycombinator.com/item?id=25385795
[2] "I like this idea, which pops up here and there occasionally, but this particular "criticality score" appears to measure popularity, rather than criticality." https://news.ycombinator.com/item?id=25385562
[3] "I may have misread but the fatal error in the metric to me is that popularity of a project increases its criticality when it should decrease." https://news.ycombinator.com/item?id=25388443

Abhishek Arya

Jan 25, 2021, 12:41:51 PM
to wg-securing-critical-projects, Michael Scovetta, Chris Horn, Nuthan Munaiah, Nuthan Munaiah
Thanks, Nuthan, for doing this analysis. Criticality score does not use stars at all, but our sample set of 5K repos (per language) is derived from the top-starred repos. We can further remove this correlation if we run the criticality score algorithm on a bigger sample set (50-100K repos or even all :). Right now, we are severely limited by the GitHub API limits (both the rate limit and the 1K search API limit).

+cc Michael (MS), who was running into the same issues when running the scorecards/security metrics dashboard at scale. Can we get some accounts whitelisted for security research?

 


Dan Lorenc

Jan 25, 2021, 2:17:43 PM
to Abhishek Arya, wg-securing-critical-projects, Michael Scovetta, Chris Horn, Nuthan Munaiah, Nuthan Munaiah
Thanks for sharing! Would you be interested in adding the results and any methodology to the GitHub repo?

Dan Lorenc

Nuthan Munaiah

Jan 25, 2021, 3:17:34 PM
to wg-securing-critical-projects
Absolutely. Is https://github.com/ossf/criticality_score the repository you would like me to contribute to? Should I update the README.md or create a Wiki?

Dan Lorenc

Jan 25, 2021, 3:38:40 PM
to Nuthan Munaiah, wg-securing-critical-projects
Yup, adding text in that repo would be perfect. I haven't used GitHub wikis very often, but if you think that would be better than Markdown checked into the repo, that's fine with me!
