GitHub - thoppe/Federal-Github-Landscape-Analysis: Survey of all Federal GitHub organizations

John Scott

Mar 20, 2025, 1:55:19 PM
to mil...@googlegroups.com



Federal GitHub Landscape Analysis

Key takeaways

  • Open-source innovation thrives in the U.S. Federal government, with at least 775 Federal organizations hosting over 25,000 repositories.
  • Releasing code publicly increases transparency and trust in government operations, enabling a more participatory government. Over 189,000 unique users have left more than 322,000 stars (a community measure of approval).
  • A small subset of users has been critical to the Federal open-source ecosystem, each contributing to hundreds of different repositories (in some cases, nearly a thousand). Together, these top users have pushed nearly half a million commits.
  • A full archival copy of all U.S. Federal repositories on GitHub (as of January 4, 2025) has been stored for preservation and research purposes.
  • The open-source landscape is diverse but leans towards repositories focused on scientific and cybersecurity topics.
  • The most popular Federal repository is the NSA's Ghidra, a software reverse engineering framework. It ranks as the 188th most popular repository on all of GitHub.
  • Other notable repositories include NASA's build-it-yourself Mars rover, the Public Sans font, a web design framework, Obama-era White House API standards, and the multiphysics object-oriented simulation environment (MOOSE).

Project Description

As of January 2025, there is no consolidated inventory of all U.S. Federal open-source code repositories. This project surveys all Federal open-source code across GitHub.

Despite growing adoption of open-source practices, there is no single, comprehensive hub for U.S. Federal agencies to manage and track source code, from research projects to software and web development. Code.gov was launched to address this need but has faced challenges with compliance and broader adoption. The list of government organizations maintained by GitHub offers better coverage, but it mixes in non-Federal organizations and omits many smaller offices and research arms within the government.

This project seeks to expand the open-source movement in government by:

  1. Conducting a comprehensive survey of Federal open-source code on GitHub through direct identification of relevant organizations.
  2. Analyzing the ecosystem, including trends in programming languages and collaboration networks.
  3. Archiving Federal repositories to ensure continued accessibility, especially in anticipation of potential disruptions during administrative transitions.

Methodology and Reporting

With over 192.1 million GitHub users, evaluating every account is infeasible. Instead, this project focuses on organizations that associate themselves with a .gov domain in their registered email or listed URL. As of December 2024, 8,003,003 organizations were listed on GitHub. Of these, 3,203 indicated a .gov domain in at least one of the following fields: email, blog, description, company, location, or name. Of those .gov-affiliated organizations, 1,599 were U.S.-based, and 1,151 of these had at least one public repository.
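
The .gov screening step can be sketched as a field-by-field pattern match. This is a minimal sketch, not the project's code: it assumes each organization is a dict keyed by the field names above (which match GitHub's REST API organization object, along with public_repos), and the regex, function names, and sample records are hypothetical. The U.S.-based step in between is not modeled.

```python
import re

# Profile fields checked for a .gov domain (keys as in GitHub's REST API
# organization object).
GOV_FIELDS = ("email", "blog", "description", "company", "location", "name")

# ".gov" followed by a word boundary: matches "nasa.gov", "https://www.nist.gov/",
# and "www.gov.uk", but not ".government" or "govern".
GOV_PATTERN = re.compile(r"\.gov\b", re.IGNORECASE)


def mentions_gov(org: dict) -> bool:
    """Return True if any checked profile field contains a .gov domain."""
    for field in GOV_FIELDS:
        value = org.get(field) or ""  # fields may be missing or None
        if GOV_PATTERN.search(value):
            return True
    return False


def candidate_orgs(orgs: list) -> list:
    """Keep .gov-affiliated organizations with at least one public repository.

    The intermediate U.S.-based filter is not modeled here.
    """
    return [o for o in orgs if mentions_gov(o) and (o.get("public_repos") or 0) > 0]


orgs = [  # invented sample records
    {"login": "example-agency", "blog": "https://www.example.gov", "public_repos": 12},
    {"login": "civics-club", "description": "We love government", "public_repos": 3},
    {"login": "empty-office", "email": "info@office.gov", "public_repos": 0},
]
print([o["login"] for o in candidate_orgs(orgs)])  # ['example-agency']
```

Because a loose ".gov" match also catches domains such as gov.uk, a separate U.S.-based filter like the one described above is still needed.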

Organizations were further refined to U.S. Federal organizations through human curation.

Organizations were categorized by their primary ownership: an organization was included if it appeared to be government-run or identified itself as such. Organizations that returned 404 errors were likely phishing accounts set up to imitate a government agency (they often had very recent creation dates). Government research programs (e.g., MoTrPAC) were counted when the governing entity appeared to be the U.S. government.

Limitations: This project does not cover repositories hosted outside of GitHub or those not connected to a .gov organization.

Within these 775 U.S. Federal organizations, there are 25,276 repositories in total. After excluding forks and repositories without at least one GitHub star, 12,468 repositories remained.
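
The fork and star filter reduces to a simple predicate. This sketch assumes repositories are dicts with the fork and stargazers_count fields of GitHub's REST API repository object; the helper name and sample records are hypothetical.

```python
def keep_repo(repo: dict) -> bool:
    """True for repositories that are not forks and have at least one star."""
    return not repo.get("fork", False) and repo.get("stargazers_count", 0) >= 1


repos = [  # invented sample records
    {"full_name": "agency/tool", "fork": False, "stargazers_count": 42},
    {"full_name": "agency/fork-of-lib", "fork": True, "stargazers_count": 7},
    {"full_name": "agency/scratch", "fork": False, "stargazers_count": 0},
]
kept = [r["full_name"] for r in repos if keep_repo(r)]
print(kept)  # ['agency/tool']
```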

Within these repositories, there were 27,382 unique contributors. Some are clearly automated bot accounts, while others are extremely prolific human users.
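
Counting unique contributors and ranking the most prolific humans might look like the sketch below. It assumes per-repository contributor lists shaped like GitHub's "list repository contributors" response (login, type, contributions), with each login appearing at most once per repository. The bot heuristic (type "Bot" or a login ending in "[bot]") and all names and sample data are assumptions, not the project's stated rule.

```python
from collections import Counter


def is_bot(contributor: dict) -> bool:
    """Heuristic bot check (an assumption, not the project's rule)."""
    return contributor.get("type") == "Bot" or contributor["login"].endswith("[bot]")


def summarize_contributors(repo_contributors: dict, top_n: int = 20):
    """Return (unique contributor count, top humans by distinct repos, top humans by commits)."""
    everyone = set()
    repos_per_user = Counter()    # distinct repositories per login
    commits_per_user = Counter()  # summed "contributions" per login
    for contributors in repo_contributors.values():
        for c in contributors:
            everyone.add(c["login"])  # bots are included in the unique count
            if is_bot(c):
                continue
            repos_per_user[c["login"]] += 1
            commits_per_user[c["login"]] += c.get("contributions", 0)
    return len(everyone), repos_per_user.most_common(top_n), commits_per_user.most_common(top_n)


data = {  # invented sample data
    "agency/a": [{"login": "alice", "type": "User", "contributions": 120},
                 {"login": "dependabot[bot]", "type": "Bot", "contributions": 300}],
    "agency/b": [{"login": "alice", "type": "User", "contributions": 5},
                 {"login": "bob", "type": "User", "contributions": 60}],
}
unique, by_repos, by_commits = summarize_contributors(data)
print(unique)      # 3
print(by_repos)    # [('alice', 2), ('bob', 1)]
print(by_commits)  # [('alice', 125), ('bob', 60)]
```

Ranking by distinct repositories and by summed commits gives two different views of "top users," which is why they are reported separately.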

Topic categories were identified by examining the top 1,500 repositories and were then assigned to repositories using a structured GPT query.
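
One way to keep a "structured" query honest is to constrain the model to a fixed label set and reject any reply that does not parse. The sketch below does not call any model API. The prompt wording, function names, and JSON reply format are hypothetical, and the category list is copied from the table in this post, so the project's full label set may differ.

```python
import json
from typing import Optional

# Category names taken from the table in this post; the project's full
# label set may differ.
CATEGORIES = [
    "Open Source Software Development",
    "Cybersecurity and Threat Analysis",
    "Data Integration and Analytics",
    "High-Performance Computing and Simulation",
    "Space and Planetary Exploration Technologies",
    "Web and Design Standards",
    "Embedded Systems and Robotics",
    "Geospatial and Earth Observation Technologies",
    "Environmental and Energy Applications",
    "Artificial Intelligence and Machine Learning",
]


def build_prompt(full_name: str, description: str) -> str:
    """Hypothetical prompt asking the model for exactly one category as JSON."""
    options = "\n".join(f"- {c}" for c in CATEGORIES)
    return (
        f"Classify the GitHub repository {full_name} ({description}) "
        f"into exactly one of these categories:\n{options}\n"
        'Reply only with JSON of the form {"category": "<name>"}.'
    )


def parse_category(raw_reply: str) -> Optional[str]:
    """Accept a reply only if it is JSON naming one of the allowed categories."""
    try:
        category = json.loads(raw_reply).get("category")
    except (json.JSONDecodeError, AttributeError):
        return None  # malformed or non-object reply
    return category if category in CATEGORIES else None


print(parse_category('{"category": "Web and Design Standards"}'))  # Web and Design Standards
print(parse_category('{"category": "Cooking"}'))                   # None
```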

| Category | Cumulative stars |
|---|---|
| 🌟 Open Source Software Development | 232,781 |
| 🛡️ Cybersecurity and Threat Analysis | 93,242 |
| 📊 Data Integration and Analytics | 66,825 |
| 🚀 High-Performance Computing and Simulation | 47,508 |
| 🪐 Space and Planetary Exploration Technologies | 45,059 |
| 🎨 Web and Design Standards | 42,425 |
| 🤖 Embedded Systems and Robotics | 36,247 |
| 🌍 Geospatial and Earth Observation Technologies | 29,900 |
| 🌱 Environmental and Energy Applications | 24,976 |
| 🧠 Artificial Intelligence and Machine Learning | 24,069 |
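
The "cumulative stars" column is a per-category sum of star counts. This sketch assumes each repository record already carries its assigned category and a stargazers_count. The function name and sample records are hypothetical and do not reproduce the figures above.

```python
from collections import defaultdict


def cumulative_stars(repos: list) -> list:
    """Sum stargazer counts per category, highest total first."""
    totals = defaultdict(int)
    for repo in repos:
        totals[repo["category"]] += repo["stargazers_count"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


sample = [  # invented records, not project data
    {"full_name": "agency/a", "category": "Cybersecurity and Threat Analysis", "stargazers_count": 50},
    {"full_name": "agency/b", "category": "Web and Design Standards", "stargazers_count": 30},
    {"full_name": "agency/c", "category": "Cybersecurity and Threat Analysis", "stargazers_count": 5},
]
for category, stars in cumulative_stars(sample):
    print(f"{category}: {stars:,}")
# Cybersecurity and Threat Analysis: 55
# Web and Design Standards: 30
```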

Other interesting tables

  • Category counts by total repositories
  • 🌟 Top 20 repositories by stars
  • 📦 Top 20 repositories by size
  • 🔧 Top 20 users by total unique repo contributions
  • 🚀 Top 20 users by total overall repo contributions
  • ✨ Top 20 users by stargazer count
  • 📈 Total Programming Language Usage

Archival copies

All repositories have been archived via a "shallow copy." Full data is available upon request.
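
The post does not define "shallow copy." Assuming it means a git clone of only the latest commit (git clone --depth 1), the archiving loop might look like the sketch below. The function names and the archive/ directory layout are hypothetical, and dry_run defaults to True so nothing is downloaded unless asked.

```python
import subprocess
from pathlib import Path


def shallow_clone_cmd(full_name: str, dest_root: Path) -> list:
    """Build a depth-1 `git clone` command for one repository."""
    url = f"https://github.com/{full_name}.git"
    dest = dest_root / full_name  # e.g. archive/NationalSecurityAgency/ghidra
    return ["git", "clone", "--depth", "1", url, str(dest)]


def archive(full_names, dest_root: Path = Path("archive"), dry_run: bool = True) -> None:
    """Shallow-clone each repository; with dry_run=True, only print the commands."""
    for name in full_names:
        cmd = shallow_clone_cmd(name, dest_root)
        if dry_run:
            print(" ".join(cmd))
        else:
            Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)


archive(["NationalSecurityAgency/ghidra"])
# On POSIX systems, prints:
# git clone --depth 1 https://github.com/NationalSecurityAgency/ghidra.git archive/NationalSecurityAgency/ghidra
```

Dropping --depth 1 would produce a full-history ("deep") clone at a much higher storage cost.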

TO DO

  • Finish the overlap analysis with Code.gov.
  • Determine where to host the deep and shallow copies that support the archiving effort.
-------------------------------------------
John Scott