Criticality score - GitHub Top 100K repos

Serkan Holat

Feb 9, 2021, 3:26:54 PM
to wg-securing-critical-projects
Hi everyone,

I have been running the "criticality score" script against GitHub for a while and managed to reach 100K repos (from the most-starred repos down to those with 235 stars).

Here are the results:

A couple of remarks:
- I also included license and star fields. We don't use stars in calculating the score, but they are handy as a reference. As for licenses, I'm wondering whether we could use them as a parameter in the score: do projects with OSI-approved and/or more permissive licenses deserve more attention?
- I didn't use "ignored keywords", so the list contains repos with the "docs", "interview" and "tutorial" keywords. Among these, "Azure docs" is the highest-ranked repo (rank 11). I haven't looked into it yet, but it might be interesting to see how it ranks this high.
- The repos were processed between January and February. I'm planning to add a "processed on" field to record the dates more precisely.
- The script hit an error on 13 repos. I checked 5 of them manually and added them to the list. I will continue with the rest and update the list. I'll send a PR if I find something we can improve in the script.
- I also cross-checked the repos against the lists published earlier (https://commondatastorage.googleapis.com/ossf-criticality-score/index.html). I found all of them except those with fewer than 235 stars.
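A cross-check like the last one boils down to a set difference on repository URLs. The sketch below assumes both the earlier published list and the new run are CSV files with a `url` column; the helper names are hypothetical and not part of the criticality_score script.

```python
import csv
import io


def read_urls(csv_text: str) -> set[str]:
    """Collect the `url` column of a CSV, normalised for comparison."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Lower-case and drop trailing slashes so trivial differences don't count.
    return {row["url"].strip().rstrip("/").lower() for row in reader}


def missing_from_new_list(previous_csv: str, current_csv: str) -> set[str]:
    """Repos present in the earlier list but absent from the new run."""
    return read_urls(previous_csv) - read_urls(current_csv)


if __name__ == "__main__":
    prev = "name,url\na,https://github.com/a/a\nb,https://github.com/b/b\n"
    cur = "name,url,stars\na,https://github.com/A/a/,500\n"
    print(missing_from_new_list(prev, cur))
```

Each missing URL would then still need a star lookup to confirm it falls below the 235-star cut-off.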

It would be nice to run the script regularly to track changes in the list. If I find time, I'd like to continue with building an API/website that shows these changes over time (Top OSS?). So I'm curious what your plans are. I'm planning to join the next meeting this Thursday, so hopefully we can discuss these details.
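Tracking changes between periodic runs could start with comparing each repo's rank across two runs. A minimal sketch, assuming each run has been reduced to a list of repo URLs already sorted by descending `criticality_score`; `rank_changes` is a hypothetical helper:

```python
def rank_changes(old_order: list[str], new_order: list[str]) -> dict[str, int]:
    """Map each repo found in both runs to its rank movement.

    A positive value means the repo moved up; repos that appear in only
    one of the two runs are skipped.
    """
    old_rank = {repo: i for i, repo in enumerate(old_order, start=1)}
    changes: dict[str, int] = {}
    for new_rank, repo in enumerate(new_order, start=1):
        if repo in old_rank:
            changes[repo] = old_rank[repo] - new_rank
    return changes


if __name__ == "__main__":
    print(rank_changes(["a", "b", "c"], ["b", "a", "d"]))
```

New and dropped repos can be listed separately with a set difference between the two runs.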

Best,
Serkan

Abhishek Arya

Feb 10, 2021, 1:34:31 AM
to wg-securing-critical-projects, Serkan Holat
On Tue, Feb 9, 2021 at 12:26 PM Serkan Holat <serka...@gmail.com> wrote:
> Hi everyone,
>
> I have been running the "criticality score" script against GitHub for a while and managed to reach 100K repos (from the most-starred repos down to those with 235 stars).
>
> Here are the results:

Thanks for spending a month creating this list. It would be great to put it on gs://ossf-criticality-score and then link to it from https://github.com/ossf/criticality_score#public-data. Could you upload it using gsutil (I gave you object-create access) and send a PR to add a link at the end of that section?

> A couple of remarks:
> - I also included license and star fields. We don't use stars in calculating the score, but they are handy as a reference. As for licenses, I'm wondering whether we could use them as a parameter in the score: do projects with OSI-approved and/or more permissive licenses deserve more attention?

Sure, adding those fields is fine. Most licenses are very permissive, though; do you see a major change from adding a license weight? Does anyone else have thoughts on this?
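For reference, the score is a weighted average of log-scaled signals, where each signal S_i has a weight α_i and a saturation threshold T_i: C = (1 / Σ α_i) · Σ α_i · log(1 + S_i) / log(1 + max(S_i, T_i)). The sketch below implements that formula for positive weights only and adds a hypothetical 0/1 "permissive license" signal (not part of the script) to show what a license weight would do.

```python
import math


def criticality(signals: list[tuple[float, float, float]]) -> float:
    """Weighted average of log-normalised signals.

    Each signal is (value S_i, weight alpha_i, threshold T_i). A signal at or
    above its threshold contributes its full weight. This sketch assumes
    non-negative values, positive weights and positive thresholds.
    """
    total_weight = sum(weight for _, weight, _ in signals)
    acc = 0.0
    for value, weight, threshold in signals:
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        acc += weight * math.log(1 + value) / math.log(1 + max(value, threshold))
    return acc / total_weight


if __name__ == "__main__":
    base = [(10, 2.0, 100)]          # e.g. contributor count with threshold 100
    licensed = base + [(1, 1.0, 1)]  # hypothetical license signal: 1 = permissive
    unlicensed = base + [(0, 1.0, 1)]
    print(criticality(base), criticality(licensed), criticality(unlicensed))
```

Because a 0/1 signal contributes either its full weight or nothing, the order within each license group is unchanged; projects with the same base score end up separated by a constant gap of α_L / (Σ α_i + α_L), where α_L is the license weight.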
 
> - I didn't use "ignored keywords", so the list contains repos with the "docs", "interview" and "tutorial" keywords. Among these, "Azure docs" is the highest-ranked repo (rank 11). I haven't looked into it yet, but it might be interesting to see how it ranks this high.

Azure-docs churns rapidly: roughly 10K contributors and thousands of issues opened and closed. We can keep it, but repos like these never looked interesting to me in any case.
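The "ignored keywords" option can be sketched as a simple filter. This assumes a case-insensitive substring match on the repo's owner/name; the script's actual matching rule may differ, and `is_ignored` is a hypothetical helper.

```python
IGNORED_KEYWORDS = ("docs", "interview", "tutorial")


def is_ignored(repo_full_name: str, keywords: tuple[str, ...] = IGNORED_KEYWORDS) -> bool:
    """True if any ignored keyword occurs in the repo's owner/name."""
    name = repo_full_name.lower()
    return any(keyword in name for keyword in keywords)


if __name__ == "__main__":
    for repo in ("MicrosoftDocs/azure-docs", "torvalds/linux", "docsifyjs/docsify"):
        print(repo, is_ignored(repo))
```

A plain substring match over-matches: docsifyjs/docsify is a documentation tool rather than a docs repo, yet it would be filtered out too.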
 
> - The repos were processed between January and February. I'm planning to add a "processed on" field to record the dates more precisely.

Not needed. We need to scale our system to run on these periodically.
 
> - The script hit an error on 13 repos. I checked 5 of them manually and added them to the list. I will continue with the rest and update the list. I'll send a PR if I find something we can improve in the script.

Sounds good.
 
> - I also cross-checked the repos against the lists published earlier (https://commondatastorage.googleapis.com/ossf-criticality-score/index.html). I found all of them except those with fewer than 235 stars.

Great!
 

> It would be nice to run the script regularly to track changes in the list. If I find time, I'd like to continue with building an API/website that shows these changes over time (Top OSS?). So I'm curious what your plans are. I'm planning to join the next meeting this Thursday, so hopefully we can discuss these details.

We are still blocked by GitHub API quota limitations. We also want to work on https://github.com/ossf/criticality_score/issues/82 before scaling things up. Contributions welcome!
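One common way to stay within the quota, sketched here as an assumption rather than what the script does, is to read GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers and sleep until the window resets once the remaining budget runs low. This only paces requests; it does not raise the quota.

```python
import time
from typing import Mapping


def seconds_until_reset(headers: Mapping[str, str], now: float, reserve: int = 0) -> float:
    """Seconds to wait before the next API call.

    `reserve` keeps a few calls in hand. `X-RateLimit-Reset` is a UTC epoch
    timestamp in seconds. A missing `X-RateLimit-Remaining` header is treated
    as one call left.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > reserve:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_at - now)


def wait_for_quota(headers: Mapping[str, str], reserve: int = 0) -> None:
    """Block until the rate-limit window resets, if that is needed."""
    delay = seconds_until_reset(headers, time.time(), reserve)
    if delay > 0:
        time.sleep(delay)
```

With `requests`, passing `response.headers` works as-is because that mapping is case-insensitive; a plain dict needs the exact header spelling.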

 

> Best,
> Serkan


Dan Lorenc

Feb 10, 2021, 8:45:08 PM
to Abhishek Arya, wg-securing-critical-projects, Serkan Holat
Thanks for doing this, Serkan! This is awesome.

Dan Lorenc

Serkan Holat

Feb 28, 2021, 11:28:32 AM
to wg-securing-critical-projects
Hi Abhishek,

I finished working on the repositories that raised exceptions and updated my list with a few small changes.

I tried to upload the latest file to the bucket, but since I don't have delete access, I couldn't overwrite the previous file. The changes are small, so we could ignore them, but I wanted to let you know first.

Abhishek Arya

Feb 28, 2021, 12:27:45 PM
to Serkan Holat, wg-securing-critical-projects
On Sun, Feb 28, 2021 at 8:28 AM Serkan Holat <serka...@gmail.com> wrote:
> Hi Abhishek,
>
> I finished working on the repositories that raised exceptions and updated my list with a few small changes.
>
> I tried to upload the latest file to the bucket, but since I don't have delete access, I couldn't overwrite the previous file. The changes are small, so we could ignore them, but I wanted to let you know first.

I have deleted gs://ossf-criticality-score/all.csv; feel free to update it now.

Serkan Holat

Feb 28, 2021, 6:35:19 PM
to wg-securing-critical-projects