Tightening security in the sympy repo

42 views
Skip to first unread message

Oscar Benjamin

unread,
Mar 29, 2026, 3:12:16 PM (3 days ago) Mar 29
to sympy
Hi all,

Following the recent hacking of trivy and litellm I have reviewed
security arrangements in the sympy/sympy repo and yesterday I made a
bunch of changes to tighten things up. This email is just to explain
what those changes are and how it relates to the other hacks.

For the most part I don't think that SymPy developers will notice the
changes that I have made apart from the CODEOWNERs file (I will
explain below) that limits merging a PR to the GitHub Actions workflow
files so this is mostly just FYI. Those of you who also have other
public repos that you maintain should consider doing similar things
there as well.

I imagine that some people may think that their projects or github
accounts are not "important" enough to get hacked and that you don't
need to worry about these things. If so then I think that you should
reconsider that for a couple of reasons:

- With AI agents it is now feasible to try to hack much lower-value targets.
- Actually SymPy is a significant value target and if you have any
level of access to sympy/sympy then that makes you a target along with
all of your other public projects.

You can read about the trivy and litellm hacks here:
https://www.theregister.com/2026/03/24/1k_cloud_environments_infected_following/
https://www.theregister.com/2026/03/24/trivy_compromise_litellm/

The trivy repo was hacked by a bot called hackerbot-claw which also
hacked a bunch of other projects. You can read about those here:
https://www.stepsecurity.io/blog/hackerbot-claw-github-actions-exploitation

My understanding is that the trivy repo was exploited because of a
vulnerable GitHub Actions workflow. This allowed an attacker to open a
malicious pull request and use the CI run from the pull request to
exfiltrate a repository secret. That secret was a GitHub personal
access token (PAT) which presumably belonged to one of the maintainers
and gave full control of the repo.

The vulnerable actions workflow that made this possible was triggered
by the pull_request_target event rather than the usual pull_request or
push events that trigger normal CI runs (for linting, tests, etc). The
pull_request_target event runs with the target branch (e.g. the master
branch) version of the workflow file so a PR author cannot edit that
file and affect its behaviour in the CI runs before their PR is
merged. Because of this the pull_request_target workflow has a higher
permission level and has access to the repository secrets. I'm not
sure exactly what the attacker needed to do to extract the secret but
just using pull_request_target is something to be careful about.

The sympy repo previously had a workflow called comment-on-pr that was
vulnerable to exactly the same kind of issue that trivy had although
using the workflow_run event rather than pull_request_target. The
workflow_run event was used so that the workflow would have
permissions to comment on a PR but it also meant that the workflow had
full access to the secrets. Exactly how the workflow was vulnerable is
explained in the write-up from the researcher:
https://securitylab.github.com/advisories/GHSL-2024-322_Sympy/

His suggestion to fix the vulnerability was to change something subtle
in the line:
run: echo ${{ steps.pr_number_reader.outputs.pr_number }}
However I decided just to delete the workflow because that seems safer
than depending on anyone (including me who introduced the bug) getting
subtle details correct at pain of complete repo takeover.

What I realised yesterday though is that there were many old branches
in the sympy repo. Any branch in a public repo is a potential target
for malicious PRs. Some of those old branches still had the
comment-on-pr workflow although they did not have the exact
vulnerability. For safety I just deleted all old branches so that we
don't need to worry about updating them for security issues.

Some old branches were release series branches like 1.9, 1.10 etc.
Some were revert branches that come from clicking the "revert PR"
button in GitHub (we should stop using that). Note that branches in
forks are fine, the problem is that "revert PR" opens a branch in the
main repo for the PR. Some branches were just random branches that
people created at some point in time. I have deleted all branches
except for master and 1.14.

The reason that the trivy attack was so bad was that the attacker got
control of a personal access token (PAT) of a maintainer which gave
them full control. We should generally try to avoid allowing such
tokens to exist, much less being stored in an obvious place. There are
currently four tokens stored in sympy's repo and I don't believe any
of them gives that kind of access but we should probably review and
rotate them just in case. Those four tokens are for allowing sympy's
GitHub Actions CI runs to connect to CircleCI, to push the nightly
wheels and an ssh key for the sympy docs website. Pushing to PyPI uses
trusted publishing so it does not need a long-lived secret that would
give an attacker direct access to PyPI.

All sympy maintainers should check if they have created any personal
access tokens that are not yet expired and that might give access to
the sympy repo. You can find these by clicking your icon at the top
right of github's website and then "Settings", scroll down to
"Developer Settings" and check both "fine grained tokens" and "tokens
(classic)". Note that if you have put that token into another repo
then an attacker could try to attack that repo to get your PAT and
then use that to attack sympy.

I'm a bit unclear on this part but it sounds like after the trivy repo
was taken offline the maintainers tried to fix it and rotated all the
secrets but it seems that during the process the attacker was at some
point still able to use the PAT to steal one of the new secrets that
could grant access to the repo. That meant that the maintainers
thought that they had secured everything but the attacker was still
lurking with something like a PAT. This is why these things are so
dangerous and why it is just better if they just don't exist i.e. if
you can find any other way of doing something that does not need to
manage secrets.

A week or so later the attackers used the stolen secret to put
malicious code into the trivy repo. They pushed in malicious versions
of all of trivy's previous releases and then pushed new git tags
replacing all of the version tags like v1, v2 etc to point to the
malicious code. This is how litellm got hacked because their GitHub
Actions CI configuration pinned the version of trivy by git tag
something like:
uses: trivy/trivy@v5
That means that when the attackers pushed a new v5 tag pointing at the
malicious code the litellm CI jobs would pull in that malicious code
immediately on the next CI run. That malicious code extracted secrets
from the CI job and gave the attackers access to the litellm repo or
at least some secret from there.

In turn that enabled the attackers to push two malicious releases of
litellm to PyPI and then those were ultimately installed by end users
who would have all of their personal secrets extracted. The second of
these two malicious releases had an infinite recursion bug which is
how it was discovered and removed so quickly (after about 45 minutes).
I'm sure it would have taken a lot longer to discover this if not for
the bug in the attacker's code.

The litellm repo would have been fine if they had been pinning the
trivy action by git SHA something like:
uses: trivy/trivy@494b9575c70ff8f1cef56386a6b471def9bbae88 # v5
An SHA in git is a cryptographically secure hash of all of the code in
the commit and all previous commits (and author, time and other
things). What that means is that it is effectively impossible for an
attacker to create a malicious commit that has the same SHA as a
legitimate commit. This means that SHA hashes are an immutable
reference to the exact code whereas a git tag like v5 is a mutable
reference.

Yesterday I merged these four PRs to harden sympy's CI setup:

https://github.com/sympy/sympy/pull/29578
https://github.com/sympy/sympy/pull/29581
https://github.com/sympy/sympy/pull/29584
https://github.com/sympy/sympy/pull/29585

The changes are

- Split jobs that pull in unpinned dependencies (e.g. from PyPI) from
jobs that have access to secrets.
- Separate jobs that need different sets of unpinned dependencies.
- Pin all actions by SHA hash (and configure the repo to require this)
- Give each CI job only the minimum permission (e.g. read not write)
that it needs.
- Set CODEOWNERS to protect itself and also the CI configuration files.

I also changed a bunch of repo settings that are mostly invisible.
Essentially lots of things that sympy does not use were enabled and I
have disabled them.

The only change here that I would expect any sympy
contributors/maintainers to notice is that I have enabled the "require
approval from code owners" repo setting. What this means is that if a
PR touches files that are listed in CODEOWNERS then only the github
users named in CODEOWNERS have permission to approve the PR before it
can be merged. Note that this changes the meaning of the file
significantly because before it just meant that the listed names would
be notified about a PR but now the PR can only be merged if "approved"
by those people.

For now the file looks like:

/.github/CODEOWNERS @oscarbenjamin @asmeurer
/.github/workflows/ @oscarbenjamin @asmeurer
/.circleci/ @oscarbenjamin @asmeurer
/sympy/physics/biomechanics/ @moorepants
/sympy/physics/mechanics/ @moorepants
/sympy/physics/vector/ @moorepants
/sympy/*laplace* @hanspi42

The first three lines are the reason that I did this, which is to
protect the CI configuration. The other lines were left over from
before but I don't know if Jason and Hanspeter would still want to be
marked as code owners given the changed meaning of the file now so we
should review that.

I only put Aaron and myself there for the actions workflows but that
is just because I wanted to make sure there were at least two names
listed there. I don't mean that as an exclusion to other maintainers
and in fact I think we do need some more names there so that others
can approve PRs. If anyone wants to add themselves there then just
open a PR to change the file.

What I do want though is that if anyone is going to be code owner for
the CI config files then I want them to acknowledge some understanding
of the security implications. That does not mean that you need to
understand everything but just to be able to think a bit about
security and distinguish between "this is obviously not any security
problem" vs "actually I am not 100% sure so won't merge without a
second opinion".

There are some other things that we could do to make things more
secure such as pinning versions of things that are installed from PyPI
or requiring reviews on all PRs. Requiring review on all PRs would
disallow self-merging and changes what buttons you have to click to
merge a PR so I didn't want to change that without discussion. It
would also be good to enable some kind of security scanning (although
ironically that is what litellm was using trivy for). Another thing
that we could do is require two-factor authentication for all sympy
members but that might be disruptive.

Also in other news there was recently a malicious package uploaded to
PyPI called sympy-dev:
https://vulners.com/osv/OSV:MAL-2026-450
It is apparently common now for attackers to try to identify package
names that are common AI hallucinations and then to upload malware at
those names. That package was apparently discovered and removed by
PyPI admins without notifying anyone here. It shows though that there
are people who would want to hack sympy if they could.

Also for anyone interested this is a similar NumPy discussion about
improving security:
https://github.com/numpy/numpy/issues/29178
They are apparently planning to have a complete separate repo for
publishing to PyPI.

--
Oscar

gu...@uwosh.edu

unread,
Mar 29, 2026, 4:33:57 PM (3 days ago) Mar 29
to sympy
Oscar,

First, thank you for your efforts to maintain the security of Sympy. It is important work.
Second, thank you for the warnings of things to be aware of for those who maintain other repos.

This is quite scary. I hope that maintaining the security of the sympy repo does not become too burdensome.

Jonathan

Aaron Meurer

unread,
Mar 29, 2026, 4:45:26 PM (3 days ago) Mar 29
to sy...@googlegroups.com
Should this issue be closed? I think it related to some similar things. 

If having branches in the repo is a vulnerability we should probably revisit this issue about disabling branch creation 
https://github.com/sympy/sympy/issues/23471. It's really annoying that GitHub puts branches on the main repo to revert a PR. 

We've had a policy for a while that peoples push access should be revoked if they haven't pushed for a year for security purposes. We haven't really enforced it but I think we should consider doing so. If someone has their access revoked and they come back they can get it again. It's just to limit the number of people who can potentially be hacked and push to the repo. 

I believe GitHub has already implemented a policy a while ago requiring anyone with push access to high impact repositories (including sympy) to have security keys. So I don't think we need to worry about that. 

Aaron Meurer 


--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/sympy/CAHVvXxTsf_YAJKQfi4KVd_SJy_Tx8fOy2Fub2iy-q6g0DMDBVw%40mail.gmail.com.

Oscar Benjamin

unread,
Mar 29, 2026, 7:24:51 PM (2 days ago) Mar 29
to sy...@googlegroups.com
On Sun, 29 Mar 2026 at 21:45, Aaron Meurer <asme...@gmail.com> wrote:
>
> Should this issue be closed? I think it related to some similar things.
> https://github.com/sympy/sympy/issues/27202

I forgot about that issue. I'll reply about the other things later but
just to note for now that I ran zizmor again just now and it found
about 100 things of which 2 were classed as "high".

One of those was a false positive complaining about the use of
pull_request_target in checkconflict.yml but that one is fine. That
particular workflow is limited enough that it is okay to use
pull_request_target provided no one adds more stuff in the workflow
(there is a comment explaining this).

The other high complaint was an exploitable vulnerability that I just fixed in
https://github.com/sympy/sympy/pull/29601

The issue there was the code had

run: |
gh extension install actions/gh-actions-cache
gh actions-cache delete ${{ env.DURATIONS_CACHE_NAME }} \
-R ${{ github.repository }} \
-B ${{ github.head_ref }} \

However github.head_ref is the branch name for the pull request. An
attacker can create a weird branch name for shell injection here. It
is also happening in a job that has the actions:write permission set
unlike most other jobs. I'm not sure what exactly an attacker can do
with that.

There have been reports of attacks using the branch name in the past
to push to PyPI:
https://lwn.net/Articles/1001215/

I've fixed that quickly on the master and 1.14 branches. I think that
shows that we should be running zizmor in CI. Even if zizmor has false
positives, if it can find a real vulnerability that quickly and we
aren't even checking it then an attacker can use it to identify the
vulnerability trivially.

--
Oscar
Reply all
Reply to author
Forward
0 new messages