AI generated pull requests


Oscar Benjamin

Oct 25, 2025, 6:46:10 PM
to sympy
Hi all,

I am increasingly seeing pull requests in the SymPy repo that were
written by AI, e.g. something like Claude Code or ChatGPT. I don't
think that any of these PRs are written by actual AI bots, but rather
that they are "written" by contributors who are using AI tooling.

There are two separate categories:

- Some contributors are making reasonable changes to the code and then
using LLMs to write things like the PR description or comments on
issues.
- Some contributors are basically just vibe coding by having an LLM
write all the code for them and then opening PRs usually with very
obvious problems.

In the first case some people use LLMs to write things like PR
descriptions because English is not their first language. I can
understand this, and I think it is definitely possible to do this with
LLMs in a way that is fine, but it needs to amount to using them like
Google Translate rather than asking them to write the text. The
problems are that:

- LLM summaries for something like a PR are too verbose and include
lots of irrelevant information making it harder to see what the actual
point is.
- LLMs often include information that is just false such as "fixes
issue #12345" when the issue is not fixed.

I think some people are doing this in a way that is not good and I
would prefer for them to just write in broken English or use Google
Translate or something but I don't see this as a major problem.

For the vibe coding case I think that there is a real problem. Many
SymPy contributors are novices at programming and are nowhere near
experienced enough to be able to turn vibe coding into outputs that
can be included in the codebase. This means that there are just spammy
PRs with false claims about what they do, like "fixes X" or "10x
faster", where the code has not even been lightly tested and clearly
does not work or possibly does not even do anything.

I think what has happened is that the combination of user-friendly
editors with easy git/GitHub integration and LLM agent plugins has
brought us to the point where there are pretty much no technical
barriers preventing someone from opening up gibberish spam PRs while
having no real idea what they are doing.

Really this is just inexperienced people using the tools badly, which
is not new. Low quality spammy PRs are not new either. There are some
significant differences though:

- I think that the number of low quality PRs is going to explode. It
was already bad last year in the run-up to GSoC (January to March
time) and I think it will be much worse this year.
- I don't think that it is reasonable to give meaningful feedback on
PRs where this happens because the contributor has not spent any time
studying the code that they are changing and any feedback is just
going to be fed into an LLM.

I'm not sure what we can do about this so for now I am regularly
closing low quality PRs without much feedback but some contributors
will just go on to open up new PRs. The "anyone can submit a PR"
model has been under threat for some time but I worry that the whole idea is
going to become unsustainable.

In the context of the Russia-Ukraine war I have often seen references
to the "cost-exchange problem". This refers to the fact that while
both sides have a lot of anti-air defence capability they can still be
overrun by cheap drones because million dollar interceptor missiles
are just too expensive to be used against any large number of incoming
thousand dollar drones. The solution there would be to have some kind
of cheap interceptor like an automatic AA gun that can take out many
cheap drones efficiently even if much less effective against fancier
targets like enemy planes.

The first time I heard about ChatGPT was when I got an email from
StackOverflow saying that any use of ChatGPT was banned. Looking into
it, the reason given was that it was just too easy to generate
superficially reasonable text that was low quality spam and then too
much effort for real humans to filter that spam out manually. In other
words, bad/incorrect answers were nothing new but large numbers of
inexperienced people using ChatGPT had ruined the cost-exchange ratio
of filtering them out.

I think in the case of SymPy pull requests there is an analogous
"effort-exchange problem". The effort PR reviewers put in to help with
PRs is not reasonable if the author of the PR is not putting in a lot
more effort themselves because there are many times more people trying
to author PRs than review them. I don't think that it can be
sustainable in the face of this spam to review PRs in the same way as
if they had been written by humans who are at least trying to
understand what they are doing (and therefore learning from feedback).
Even just closing PRs and not giving any feedback needs to become more
efficient somehow.

We need some sort of clear guidance or policy on the use of AI that
sets out clear expectations like "you still need to understand the code".
I think we will also need to ban people for spam if they are doing
things like opening AI-generated PRs without even testing the code.
The hype that is spun by AI companies probably has many novice
programmers believing that it actually is reasonable to behave like
this but it really is not and that needs to be clearly stated
somewhere. I don't think any of this is malicious but I think that it
has the potential to become very harmful to open source projects.

The situation right now is not so bad but if you project forwards a
bit to when the repo gets a lot busier after Christmas I think this is
going to be a big problem and I think it will only get worse in future
years as well.

It is very unfortunate that right now AI is being used in all the
wrong places. It can do a student's homework because it knows the
answers to all the standard homework problems, but it can't do the more
complicated, more realistic things, and then students haven't learned
anything from doing their homework. In the context of SymPy it would
be so much more useful to have AI doing other things like reviewing
the code, finding bugs, etc rather than helping novices to get a PR
merged without actually investing the time to learn anything from the
process.

--
Oscar

gu...@uwosh.edu

Oct 25, 2025, 9:06:55 PM
to sympy
Here's a brainstorming idea for how to implement something to address your valid concerns.
How about the following policy?
No review of a pull request will occur unless it meets certain minimum requirements:
1) It passes all pre-existing tests;
2) It includes test coverage for all new code;
3) It includes tests covering any bug fixes.

I can see how to implement #1 automatically. Could #2 be implemented using one of the coverage testing tools? My experience with those is limited (a very rough sketch of one way it might be checked is below, after the list). It also would require some work to make sure new tests cover all changed code. I think this would clear out a lot of the very low quality code that doesn't work or does nothing. However, I see a couple of problems as well:
1) What happens if the bug fix is to an erroneous test?
2) This does not address low quality descriptions of the PR and its goals.
3) People who are just learning the code base will need a way to get help on running and fixing issues with testing. I think contributors might have to be in the position of asking for help on this list with issues of that sort or maybe there should be a specific venue for that.
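
To make #2 a bit more concrete, here is the very rough sketch I mentioned above. This is only an illustration and I have not tested it against SymPy's actual CI setup: it assumes a coverage.xml produced by something like `pytest --cov=sympy --cov-report=xml`, and the file paths and diff parsing would probably need adjusting in practice.

    import re
    import subprocess
    import xml.etree.ElementTree as ET

    def changed_lines(base="origin/master"):
        # Map each changed .py file to the set of added/modified line numbers,
        # parsed from a zero-context git diff against the base branch.
        diff = subprocess.run(
            ["git", "diff", "-U0", base, "--", "*.py"],
            capture_output=True, text=True, check=True,
        ).stdout
        files, current = {}, None
        for line in diff.splitlines():
            if line.startswith("+++ b/"):
                current = line[len("+++ b/"):]
                files[current] = set()
            elif line.startswith("@@") and current is not None:
                m = re.search(r"\+(\d+)(?:,(\d+))?", line)
                start, count = int(m.group(1)), int(m.group(2) or 1)
                files[current].update(range(start, start + count))
        return files

    def covered_lines(xml_path="coverage.xml"):
        # Map each file in the Cobertura-style report to the lines with hits > 0.
        covered = {}
        for cls in ET.parse(xml_path).iter("class"):
            hits = {int(ln.get("number")) for ln in cls.iter("line")
                    if int(ln.get("hits")) > 0}
            covered.setdefault(cls.get("filename"), set()).update(hits)
        return covered

    if __name__ == "__main__":
        cov = covered_lines()
        ok = True
        for fname, lines in changed_lines().items():
            # NOTE: paths in coverage.xml may be relative to the package root,
            # so they may need normalising before comparing with git's paths.
            missing = sorted(lines - cov.get(fname, set()))
            if missing:
                ok = False
                print(f"{fname}: changed lines with no test coverage: {missing}")
        if ok:
            print("All changed lines are covered.")

There may also be existing tools that do more or less exactly this (I think diff-cover is one), which would be better than hand-rolling it.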

Just some ideas to help start the ball rolling.

Jonathan

Jason Moore

Oct 26, 2025, 2:16:04 AM
to sy...@googlegroups.com
Hi Oscar,

Thanks for raising this. I agree, this problem will grow and it is not good. I think we should have a policy about LLM-generated contributions. It would be nice if a SymPEP was drafted for one.

Having a standard way to reject spam PRs would be helpful. Being able to close a PR and add a label that triggers sympybot to leave a comment saying "This PR does not meet SymPy's quality standards for AI generated code and comments, see policy <link>" could work. It still requires manual steps from reviewers.
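
Just to illustrate what the sympybot side might look like, here is a rough sketch (the label name, policy link and token handling are placeholders, not anything we have agreed on; the HTTP calls are the standard GitHub issue comment/close endpoints):

    import os
    import requests

    API = "https://api.github.com/repos/sympy/sympy"
    HEADERS = {
        # Hypothetical bot credential; sympybot's real auth may differ.
        "Authorization": f"Bearer {os.environ['SYMPYBOT_TOKEN']}",
        "Accept": "application/vnd.github+json",
    }

    # Placeholder label and canned comment, pending an actual policy.
    SPAM_LABEL = "ai-policy-violation"
    COMMENT = ("This PR does not meet SymPy's quality standards for AI "
               "generated code and comments, see policy <link>.")

    def handle_labeled(pr_number: int, label: str) -> None:
        """If a reviewer applies the label, post the canned comment and close the PR."""
        if label != SPAM_LABEL:
            return
        # PRs count as issues for commenting/closing in the GitHub REST API.
        requests.post(f"{API}/issues/{pr_number}/comments",
                      headers=HEADERS, json={"body": COMMENT}).raise_for_status()
        requests.patch(f"{API}/issues/{pr_number}",
                       headers=HEADERS, json={"state": "closed"}).raise_for_status()

In practice this would probably be wired up as a GitHub Actions workflow reacting to the "labeled" event, but the idea is the same.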

I also share the general concern expressed by some in the scipy ecosystem here:


which is that LLMs universally violate copyright licenses of open source code. If this is true, then PRs with LLM generated code are polluting SymPy's codebase with copyright violations.

Jason



Oscar Benjamin

Oct 26, 2025, 3:24:40 PM
to sy...@googlegroups.com
On Sun, 26 Oct 2025 at 01:06, 'gu...@uwosh.edu' via sympy
<sy...@googlegroups.com> wrote:
>
> Here's a brainstorming idea for how to implement something to address your valid concerns.
> How about the following policy?
> No review of a pull request will occur unless it meets certain minimum requirements:
> 1) It passes all pre-existing tests;
> 2) It includes test coverage for all new code;
> 3) It includes tests covering any bug fixes.

I think that what will happen is that the author will pass these
instructions to the LLM agent and then the agent will generate some
code that superficially resembles meeting these criteria. Then the PR
description will have a bunch of emoji-filled bullet points
redundantly stating that it meets those criteria.

I'm not going to point to the specific PR but I closed one where the
description had a statement in it like

"You can run the tests with `pytest sympy/foo/bar`"

That is literally an instruction from the LLM to the user for how they
can test the generated code and if you actually run the test command
it clearly shows that the code doesn't work. It was still submitted in
that form as a spam PR though.

Of course the tests in CI did not pass and it is not hard to see the
problem in that case, but other cases can be more subtle than this. It
is not hard to generate code that passes all existing tests, includes
coverage etc. while still being gibberish, and this is really the
problem with using LLMs. There isn't any substitute in this situation
for actual humans doing real thinking.

--
Oscar

Oscar Benjamin

Oct 26, 2025, 3:30:05 PM
to sy...@googlegroups.com
Yes, copyright is a big problem. I don't think I would say that
LLMs universally violate copyright, e.g. if used for autocompleting an
obvious line of code or many other tasks. There are certain basic
things like `x += 1` that cannot reasonably be considered to be under
copyright even if they do appear in much code. Clearly, though, an LLM
can produce a large body of code where the only meaningful
interpretation is that the code has been "copied" from one or two
publicly available codebases.

The main difficulty I think with having a policy about the use of LLMs
is that unless it begins by saying "no LLMs" then it somehow needs to
begin by acknowledging what a reasonable use can be, which means
confronting the copyright issue up front.

Aaron Meurer

Oct 30, 2025, 2:08:12 PM
to sy...@googlegroups.com
I like the Ghostty policy, which is that AI coding assistance is
allowed, but it must be disclosed:
https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md#ai-assistance-notice.
It should also be our policy that the person submitting the code is
ultimately responsible for it, regardless of what tools were used to
create it.

I think it would be a mistake to ban AI usage entirely because AI can
be very useful if used properly, i.e., you review the code it writes
before submitting it.

For me the copyright question doesn't really pass the smell test, at
least for the majority of the use-cases where I would use AI in SymPy.
For example, if I use AI to generate some fix for some part of SymPy,
say the polynomials module, then where would that fix have "come from"
for it to be a copyright violation? Where else in the world is there
code that looks like the SymPy polynomials module? Most code in SymPy
is unique to SymPy. The only place it could have possibly come
from is SymPy itself, but if SymPy already had it then the code
wouldn't be needed in the first place (and anyways that wouldn't be a
copyright violation). I think there's a misconception that LLMs can
only generate text that they've already seen before, and if you
believe that misconception then it would be easy to believe that
everything generated by an LLM is a copyright violation. But this is
something that is very easily seen to not be true if you spend any
amount of time using coding tools.

As for PR descriptions, I agree those should always be hand-written.
But that's always been a battle, even before AI. And similarly almost
no one writes real commit messages anymore.

Aaron Meurer

Jason Moore

Oct 30, 2025, 2:50:55 PM
to sy...@googlegroups.com
I don't think it is a terrible idea to simply have a "no LLMs" policy at this point in time. We can always change it in the future as things get clearer. People will still use them in their LLM-enhanced editors, of course, and we can never detect the basic uses of the tools. But if people submit large chunks of text and code that have hallmarks of full generation from an LLM, then we can reject them and point to the policy.

As for the smell test and misconceptions about what an LLM can produce, this may depend on whether you think only a literal copy of something violates copyright or whether a derivative of something also violates copyright. I think the essential question lies in whether the code an LLM produces is a derivative of copyrighted code. There are many past court cases ruling that derivatives are copyright violations in the US, and the OSS licenses almost all state that derivatives fall under the license. I doubt the LLM could produce a fix to the polynomials module if the only training data was the polynomials module. An LLM relies entirely on training on a vast corpus of works and generating code from all of that large body. Now, is that output then a derivative of one, some, or all of the training data? That is to be determined by those that rule on laws (hopefully). Given that we have spent about 40 years collectively trying to protect open source code with copyright licenses, it seems terribly wrong that if you can make the source you copy from large enough then you no longer have to abide by the licenses.

Paul Ivanov and Matthew Brett have done a good job explaining this nuance here: https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md

My personal opinion is that the LLMs should honor the licenses of the training set and if they did, then all is good. I have no idea how they can solve that from a technical perspective, but the companies are simply ignoring copyright and claiming they are above such laws and that all that they do is fair use. We plebes do not get that same ruling.

Jason


Francesco Bonazzi

Nov 5, 2025, 5:50:23 PM
to sympy
Maybe it should be made mandatory to disclose any usage of LLMs when opening PRs.

Banning usage of LLMs completely is a bit extreme, but it may be necessary if vibe spammers keep flooding GitHub with useless PRs.
 