AI generated pull requests


Oscar Benjamin

Oct 25, 2025, 6:46:10 PM
to sympy
Hi all,

I am increasingly seeing pull requests in the SymPy repo that were
written by AI, e.g. Claude Code or ChatGPT. I don't think that any of
these PRs are written by actual AI bots but rather that they are
"written" by contributors who are using AI tooling.

There are two separate categories:

- Some contributors are making reasonable changes to the code and then
using LLMs to write things like the PR description or comments on
issues.
- Some contributors are basically just vibe coding by having an LLM
write all the code for them and then opening PRs, usually with very
obvious problems.

In the first case some people use LLMs to write things like PR
descriptions because English is not their first language. I can
understand this, and I think it is definitely possible to do this with
LLMs in a way that is fine, but it needs to amount to using them like
Google Translate rather than asking them to write the text. The
problems are that:

- LLM summaries for something like a PR are too verbose and include
lots of irrelevant information making it harder to see what the actual
point is.
- LLMs often include information that is just false such as "fixes
issue #12345" when the issue is not fixed.

I think some people are doing this in a way that is not good and I
would prefer for them to just write in broken English or use Google
Translate or something but I don't see this as a major problem.

For the vibe coding case I think that there is a real problem. Many
SymPy contributors are novices at programming and are nowhere near
experienced enough to be able to turn vibe coding into outputs that
can be included in the codebase. This means that there are just spammy
PRs with false claims about what they do, like "fixes X" or "10x
faster", where the code has not even been lightly tested and clearly
does not work, or possibly does not do anything at all.

I think what has happened is that the combination of user-friendly
editors with easy git/GitHub integration and LLM agent plugins has
brought us to the point where there are pretty much no technical
barriers preventing someone from opening up gibberish spam PRs while
having no real idea what they are doing.

Really this is just inexperienced people using the tools badly which
is not new. Low quality spammy PRs are not new either. There are some
significant differences though:

- I think that the number of low quality PRs is going to explode. It
was already bad last year in the run up to GSOC (January to March
time) and I think it will be much worse this year.
- I don't think that it is reasonable to give meaningful feedback on
PRs where this happens because the contributor has not spent any time
studying the code that they are changing and any feedback is just
going to be fed into an LLM.

I'm not sure what we can do about this so for now I am regularly
closing low quality PRs without much feedback but some contributors
will just go on to open up new PRs. The "anyone can submit a PR"
model has been under threat for some time but I worry that the whole
idea is going to become unsustainable.

In the context of the Russia-Ukraine war I have often seen references
to the "cost-exchange problem". This refers to the fact that while
both sides have a lot of anti-air defence capability they can still be
overrun by cheap drones because million dollar interceptor missiles
are just too expensive to be used against any large number of incoming
thousand dollar drones. The solution there would be to have some kind
of cheap interceptor like an automatic AA gun that can take out many
cheap drones efficiently even if much less effective against fancier
targets like enemy planes.

The first time I heard about ChatGPT was when I got an email from
StackOverflow saying that any use of ChatGPT was banned. Looking into
it the reason given was that it was just too easy to generate
superficially reasonable text that was low quality spam and then too
much effort for real humans to filter that spam out manually. In other
words bad/incorrect answers were nothing new but large numbers of
inexperienced people using ChatGPT had ruined the cost-exchange ratio
of filtering them out.

I think in the case of SymPy pull requests there is an analogous
"effort-exchange problem". The effort PR reviewers put in to help with
PRs is not reasonable if the author of the PR is not putting in a lot
more effort themselves because there are many times more people trying
to author PRs than review them. I don't think that it can be
sustainable in the face of this spam to review PRs in the same way as
if they had been written by humans who are at least trying to
understand what they are doing (and therefore learning from feedback).
Even just closing PRs and not giving any feedback needs to become more
efficient somehow.

We need some sort of clear guidance or policy on the use of AI that
sets clear expectations like "you still need to understand the code".
I think we will also need to ban people for spam if they are doing
things like opening AI-generated PRs without even testing the code.
The hype that is spun by AI companies probably has many novice
programmers believing that it actually is reasonable to behave like
this but it really is not and that needs to be clearly stated
somewhere. I don't think any of this is malicious but I think that it
has the potential to become very harmful to open source projects.

The situation right now is not so bad but if you project forwards a
bit to when the repo gets a lot busier after Christmas I think this is
going to be a big problem and I think it will only get worse in future
years as well.

It is very unfortunate that right now AI is being used in all the
wrong places. It can do a student's homework because it knows the
answers to all the standard homework problems, but it can't do the
more complicated, more realistic things, and then students haven't
learned anything from doing their homework. In the context of SymPy it would
be so much more useful to have AI doing other things like reviewing
the code, finding bugs, etc rather than helping novices to get a PR
merged without actually investing the time to learn anything from the
process.

--
Oscar

gu...@uwosh.edu

Oct 25, 2025, 9:06:55 PM
to sympy
Here's a brainstorming idea for how to implement something to address your valid concerns.
How about the following policy?
No review of a pull request will occur unless it meets certain minimum requirements:
1) It passes all pre-existing tests;
2) It includes test coverage for all new code;
3) It includes tests covering any bug fixes.

I can see how to implement #1 automatically. Could #2 be implemented using one of the coverage testing tools? My experience with those is limited. It also would require some work to make sure new tests cover all changed code (see the sketch after the list below). I think this would clear out a lot of the very low quality code that doesn't work or does nothing. However, I see a few problems as well:
1) What happens if the bug fix is to an erroneous test?
2) This does not address low quality descriptions of the PR and its goals.
3) People who are just learning the code base will need a way to get help on running and fixing issues with testing. I think contributors might have to be in the position of asking for help on this list with issues of that sort or maybe there should be a specific venue for that.
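
On implementing #2: one possible shape, assuming CI can install extra tools, is to combine pytest-cov with diff-cover so that the job fails when lines changed by the PR are not exercised by any test. A rough sketch only (untested; exact flags would need checking):

    # Run the suite and record coverage as XML (pytest-cov).
    pip install pytest-cov diff-cover
    pytest sympy/ --cov=sympy --cov-report=xml
    # Fail unless every line changed relative to master is covered.
    diff-cover coverage.xml --compare-branch=origin/master --fail-under=100

This would not guarantee that the new tests are meaningful, only that they touch the changed lines.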

Just some ideas to help start the ball rolling.

Jonathan

Jason Moore

Oct 26, 2025, 2:16:04 AM
to sy...@googlegroups.com
Hi Oscar,

Thanks for raising this. I agree, this problem will grow and it is not good. I think we should have a policy about LLM generated contributions. It would be nice if a SymPEP was drafted for one.

Having a standard way to reject spam PRs would be helpful. If we could close a PR and add a label to trigger sympybot to leave a comment that says "This PR does not meet SymPy's quality standards for AI generated code and comments, see policy <link>", that could cover most cases. It still requires manual steps from reviewers.
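
For example, something like this GitHub Actions workflow could post the comment when a reviewer adds a label; this is only a sketch, and the label name and wording are placeholders:

    name: ai-policy-comment
    on:
      pull_request_target:
        types: [labeled]
    jobs:
      comment:
        if: github.event.label.name == 'ai-policy'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/github-script@v7
            with:
              script: |
                await github.rest.issues.createComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: context.issue.number,
                  body: "This PR does not meet SymPy's quality standards for AI generated code and comments, see policy <link>.",
                });

So the manual part reduces to closing the PR and adding the label.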

I also share the general concern expressed by some in the scipy ecosystem here:


which is that LLMs universally violate copyright licenses of open source code. If this is true, then PRs with LLM generated code are polluting SymPy's codebase with copyright violations.

Jason



Oscar Benjamin

Oct 26, 2025, 3:24:40 PM
to sy...@googlegroups.com
On Sun, 26 Oct 2025 at 01:06, 'gu...@uwosh.edu' via sympy
<sy...@googlegroups.com> wrote:
>
> Here's a brainstorming idea for how to implement something to address your valid concerns.
> How about the following policy?
> No review of a pull request will occur unless it meets certain minimum requirements:
> 1) It passes all pre-existing tests;
> 2) It includes test coverage for all new code;
> 3) It includes tests covering any bug fixes.

I think that what will happen is that the author will pass these
instructions to the LLM agent and then the agent will generate some
code that superficially resembles meeting these criteria. Then the PR
description will have a bunch of emoji-filled bullet points
redundantly stating that it meets those criteria.

I'm not going to point to the specific PR but I closed one where the
description had a statement in it like

"You can run the tests with `pytest sympy/foo/bar`"

That is literally an instruction from the LLM to the user for how they
can test the generated code and if you actually run the test command
it clearly shows that the code doesn't work. It was still submitted in
that form as a spam PR though.

Of course the tests in CI did not pass and it is not hard to see the
problem in that case but other cases can be more subtle than this. It
is not hard to generate code that passes all existing tests, includes
coverage etc while still being gibberish and this is really the
problem with using LLMs. There isn't any substitute in this situation
for actual humans doing real thinking.

--
Oscar

Oscar Benjamin

unread,
Oct 26, 2025, 3:30:05 PMOct 26
to sy...@googlegroups.com
Yes, the copyright is a big problem. I don't think I would say that
LLMs universally violate copyright e.g. if used for autocompleting an
obvious line of code or many other tasks. There are certain basic
things like x += 1 that cannot reasonably be considered to be under
copyright even if they do appear in much code. Clearly though an LLM
can produce a large body of code where the only meaningful
interpretation is that the code has been "copied" from one or two
publicly available codebases.

The main difficulty I think with having a policy about the use of LLMs
is that unless it begins by saying "no LLMs" then it somehow needs to
begin by acknowledging what a reasonable use can be which means
confronting the copyright issue up front.

Aaron Meurer

Oct 30, 2025, 2:08:12 PM
to sy...@googlegroups.com
I like the Ghostty policy, which is that AI coding assistance is
allowed, but it must be disclosed
https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md#ai-assistance-notice.
It should also be our policy that the person submitting the code is
ultimately responsible for it, regardless of what tools were used to
create it.

I think it would be a mistake to ban AI usage entirely because AI can
be very useful if used properly, i.e., you review the code it writes
before submitting it.

For me the copyright question doesn't really pass the smell test, at
least for the majority of the use-cases where I would use AI in SymPy.
For example, if I use AI to generate some fix for some part of SymPy,
say the polynomials module, then where would that fix have "come from"
for it to be a copyright violation? Where else in the world is there
code that looks like the SymPy polynomials module? Most code in SymPy
is very unique to SymPy. The only place it could have possibly come
from is SymPy itself, but if SymPy already had it then the code
wouldn't be needed in the first place (and anyways that wouldn't be a
copyright violation). I think there's a misconception that LLMs can
only generate text that they've already seen before, and if you
believe that misconception then it would be easy to believe that
everything generated by an LLM is a copyright violation. But this is
something that is very easily seen to not be true if you spend any
amount of time using coding tools.

As for PR descriptions, I agree those should always be hand-written.
But that's always been a battle, even before AI. And similarly almost
no one writes real commit messages anymore.

Aaron Meurer

Jason Moore

Oct 30, 2025, 2:50:55 PM
to sy...@googlegroups.com
I don't think it is a terrible idea to simply have a "no LLMs" policy at this point in time. We can always change it in the future as things get clearer. People will still use them in their LLM enhanced editors, of course, and we can never detect the basic uses of the tools. But if people submit large chunks of text and code that have the hallmarks of full generation from an LLM, then we can reject them and point to the policy.

As for the smell test and misconceptions about what an LLM can produce, this may depend on whether you think only a literal copy of something violates copyright, or whether a derivative of something does as well. I think the essential question lies in whether the code an LLM produces is a derivative of copyrighted code. There are many past court cases ruling that derivatives are copyright violations in the US, and almost all OSS licenses state that derivatives fall under the license.

I doubt an LLM could produce a fix to the polynomials module if the only training data was the polynomials module. An LLM relies entirely on training on a vast corpus of works and generates code from all of that large body. Is that output then a derivative of one, some, or all of the training data? That is to be determined by those who rule on laws (hopefully). Given that we have spent about 40 years collectively trying to protect open source code with copyright licenses, it seems terribly wrong that making your copying source large enough would mean you no longer have to abide by the licenses.

Paul Ivanov and Matthew Brett have done a good job explaining this nuance here: https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md

My personal opinion is that the LLMs should honor the licenses of the training set and if they did, then all is good. I have no idea how they can solve that from a technical perspective, but the companies are simply ignoring copyright and claiming they are above such laws and that all that they do is fair use. We plebes do not get that same ruling.

Jason


Francesco Bonazzi

Nov 5, 2025, 5:50:23 PM
to sympy
Maybe it should be made mandatory to disclose any usage of LLMs when opening PRs.

Banning usage of LLMs completely is a bit extreme, but it may be necessary if vibe spammers keep flooding GitHub with useless PRs.

Oscar Benjamin

Nov 12, 2025, 9:01:46 AM
to sy...@googlegroups.com
On Wed, 5 Nov 2025 at 22:50, Francesco Bonazzi <franz....@gmail.com> wrote:
>
> Maybe it should be made mandatory to disclose any usage of LLM when opening PRs.

There needs to be at the very least a policy that bans using LLMs
without saying that they were used. There should be checkboxes when
opening a PR:

[x] I have or have not used an LLM
[x] I have checked the code from the LLM
[x] I have tested the code myself
[x] I do/don't understand all of the code that was generated by the LLM.
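
GitHub shows checkboxes like these automatically if they are added to the repo's pull request template (.github/PULL_REQUEST_TEMPLATE.md). A sketch only; the exact wording would need to be agreed:

    - [ ] I used an LLM/AI tool while preparing this PR (state which one and how)
    - [ ] I have read and checked all of the code myself
    - [ ] I have run the tests locally myself
    - [ ] I understand all of the code in this PR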

> Banning usage of LLM completely is a bit extreme, but it may be necessary if vibe spammers keep flooding github with useless PRs.

Using LLMs to generate comments and PR descriptions should just be
banned outright, with an exception for Google Translate type use only.
Obviously this cannot be enforced but the messaging needs to be clear:
don't dump LLM output as if it is a comment from yourself.

As for use of LLMs for writing code I don't necessarily object to
simple autocomplete type LLMs for convenience but I think if you look
at typical PRs and contributors right now they really should not be
using LLMs at all. The LLMs only seem to help people to make
time-wasting vibe-code PRs where the "author" has no understanding of
the code they are submitting and has not even done basic testing.

--
Oscar

Daiki Takahashi

Nov 13, 2025, 7:03:52 AM
to sympy
Let me make this clear upfront: all of my posts on GitHub, including this one, rely on translation by an LLM.

I believe it would be reasonable to explicitly state in the policy that spam-like PRs and PRs relying heavily on LLMs are prohibited.
Along with that, the policy should also clarify that such PRs may be proactively closed without prior notice,
and that there should be a clear process for appealing an incorrect closure.

To reduce the review burden, one possible approach would be to require all PRs to undergo an initial review by Copilot before human review.
However, I am not sure how capable Copilot actually is.

On Wednesday, 12 November 2025 at 23:01:46 UTC+9, Oscar wrote:

Oscar Benjamin

Nov 13, 2025, 8:36:30 AM
to sy...@googlegroups.com
On Thu, 13 Nov 2025 at 12:03, Daiki Takahashi <har...@gmail.com> wrote:
>
> Let me make this clear upfront: all of my posts on GitHub, including this one, rely on translation by an LLM.
>
> I believe it would be reasonable to explicitly state in the policy that spam-like PRs and PRs relying heavily on LLMs are prohibited.

It is very difficult to define what is meant by "spam-like" and I
doubt that someone submitting a PR would understand this in the same
way as reviewers would.

There are different ways of using LLMs and the way that you use them
is absolutely fine. The way that many novice contributors use them is
not useful at all though and at least right now is harmful to sympy
development. I'm not sure how to define the difference between those
in a policy though.

> Along with that, the policy should also clarify that such PRs may be proactively closed without prior notice,
> and that there should be a clear process for appealing an incorrect closure.

Realistically I think in some cases this is the only option. Just
deciding to close them is still a burden though.

> To reduce the review burden, one possible approach would be to require all PRs to undergo an initial review by Copilot before human review.
> However, I am not sure how capable Copilot actually is.

I don't know about Copilot specifically and actually there are many
things called "copilot". I have used "GitHub Copilot" which is an
editor plugin for autocomplete but now there is a "Copilot" button on
the GitHub website that is something different (more like ChatGPT).
Does anyone have any experience of using that?

I see better potential in using AI to help out with reviewing PRs than
having people use AI to write the PRs. Many PRs need quite simple
feedback like "this should have tests. Please add a test in file f"
that could easily be handled by AI (and probably in a more patient,
friendly and helpful way than feedback from human reviewers such as
myself).

Somewhere someone suggested using CodeRabbit which I have seen on some
other repos. I haven't seen it produce anything useful but supposedly
it gets better if you "teach" it.

--
Oscar

Daiki Takahashi

Nov 14, 2025, 9:24:50 AM
to sympy
I created a PR and tried having GitHub Copilot review it as a test.
We can request a Copilot review directly from the PR page.
However, it requires Premium requests, so not everyone can use this feature.

I'm not sure how useful it really is, but it’s certainly convenient to try.


document:
-- 
haru-44

On Thursday, 13 November 2025 at 22:36:30 UTC+9, Oscar wrote:

Francesco Bonazzi

Nov 14, 2025, 2:15:51 PM
to sympy
Let's remember that LLMs may write copyrighted material. There is some risk associated with copy-pasting from an LLM output into SymPy code.

Furthermore, what practical algorithmic improvements can an LLM do to SymPy? Can an LLM finish the implementation of the Risch algorithm? I doubt it.

On Friday, November 14, 2025 at 3:24:50 p.m. UTC+1 har...@gmail.com wrote:
However, it requires Premium requests, so not everyone can use this feature.

 Most of these AI-assisted tools are designed to take money from developers. I would strongly advise against paying for these services.

LLMs look good at first because most questions had answers in their training set; as soon as you ask an LLM to do anything non-standard, or just to fix existing code in a way that is not trivial, they fail miserably.

Aaron Meurer

Nov 15, 2025, 1:06:56 PM
to sy...@googlegroups.com
On Fri, Nov 14, 2025 at 12:15 PM Francesco Bonazzi
<franz....@gmail.com> wrote:
>
> Let's remember that LLMs may write copyrighted material. There is some risk associated with copy-pasting from an LLM output into SymPy code.
>
> Furthermore, what practical algorithmic improvements can an LLM do to SymPy? Can an LLM finish the implementation of the Risch algorithm? I doubt it.

"Finishing the Risch algorithm" is an enormous task. Of course an LLM
cannot one-shot that, and a human couldn't do it in one PR either. But
I have actually been using Claude Code to do some improvements to the
Risch algorithm and it's been working. So the statement that an LLM
cannot help with algorithmic improvements in SymPy is false.

>
> On Friday, November 14, 2025 at 3:24:50 p.m. UTC+1 har...@gmail.com wrote:
>
> However, it requires Premium requests, so not everyone can use this feature.
>
>
> Most of these AI-assisted tools are designed to take money from developers. I would strongly advise against paying for these services.

I can't speak to the specific tool being mentioned here, but the best
LLMs right now do require you to pay for them. If a tool is good and
actually improves developer quality of life, we shouldn't be afraid to
pay for it (that applies even outside of AI tools).

FWIW, when it comes to code review, my suggestion would be to use a
local LLM tool like claude code or codex to do the review for you. It
wouldn't be completely automated, but that would give you the best
results. I also agree that writing down the sorts of things you're
looking for in a SymPy review somewhere in the context is going to
make it work better. I would start by having an agent analyze recent
reviews (say, the 50 most recent PR reviews by Oscar), and use that to
construct a Markdown document listing the sorts of things that a good
reviewer is looking for in a review of a SymPy pull request.
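
A minimal way to collect that corpus could be something like the following, using the GitHub CLI; the reviewer login here is an assumption used for illustration:

    # Dump all inline PR review comments on sympy/sympy by one reviewer
    # into a file for an agent to analyze.
    gh api --paginate repos/sympy/sympy/pulls/comments \
      --jq '.[] | select(.user.login == "oscarbenjamin") | .body' \
      > review-comments.txt

That endpoint only returns inline review comments, not top-level review summaries, but it would be a starting point.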

>
> LLMs look good at first because most questions had answers in their training set; as soon as you ask an LLM to do anything non-standard, or just to fix existing code in a way that is not trivial, they fail miserably.

This was true three years ago with GPT-3 but it is not true anymore. I
encourage you to try using GPT-5 Codex or Claude 4.5 Sonnet, ideally
using a modern tool like Codex CLI, Claude Code, or Cursor. These
models are very good and can reason about problems they've never seen
before. They still have holes and you still have to check everything
they do, but you can't just assume that something isn't going to work
without trying it.

Even if you have moral qualms against AI (which I personally do not
share), you shouldn't let those give you the wrong impression about
the capabilities of these tools, especially the best-in-class models
like Claude.

Aaron Meurer


Oscar Benjamin

Nov 15, 2025, 2:25:03 PM
to sy...@googlegroups.com
On Sat, 15 Nov 2025 at 18:06, Aaron Meurer <asme...@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 12:15 PM Francesco Bonazzi
> <franz....@gmail.com> wrote:
> >
> > Let's remember that LLMs may write copyrighted material. There is some risk associated with copy-pasting from an LLM output into SymPy code.
> >
> > Furthermore, what practical algorithmic improvements can an LLM do to SymPy? Can an LLM finish the implementation of the Risch algorithm? I doubt it.
>
> "Finishing the Risch algorithm" is an enormous task. Of course an LLM
> cannot one-shot that, and a human couldn't do it in one PR either. But
> I have actually been using Claude Code to do some improvements to the
> Risch algorithm and it's been working. So the statement that an LLM
> cannot help with algorithmic improvements in SymPy is false.

There is a huge difference between someone who knows what they are
doing, knows the codebase, understands the algorithm etc using an LLM
and reviewing the results as compared to a typical new SymPy
contributor using an LLM to write the code for them. If you are
someone who could write or review the code without using an LLM then
using an LLM and checking the results is reasonable.

By the way your most recent PR is here Aaron:
https://github.com/sympy/sympy/pull/28464
The PR looks good to me apart from the one bit where you said "Claude
generated this fix for the geometry test failure. It would be good to
review". I reviewed it and decided that it looks like a hacky fix and
showed a counterexample of the type that would break that code. The
LLM output cannot be trusted and does not substitute for humans
investigating and writing the code.

From my perspective as a reviewer "this is LLM code. It would be good
to review" is asking reviewers to do what would usually be expected to
be done by the author of the PR in the first instance. That is only a
small example but it shows the broader problem that I think LLMs will
cause in sympy by shifting greater burden onto reviewers while making
it easier for authors to generate more and more PRs to review.

If you are new to programming in general and new to a particular
codebase and then use an LLM to generate code that you don't
understand then the results are not going to be good. There have
always been PRs where the author clearly does not understand the
codebase well or does not understand all of the implications of the
particular choices made in the code. Now though there are PRs where
the author has no idea why the LLM wrote any of the code that it did
and has not even done the most basic of testing and can only respond
to feedback by copying it into an LLM.

What I think people submitting these PRs right now don't realise is
that when I see the LLM-generated PR description or comments, or the
LLM-generated code that they probably don't understand, it removes all
motivation to review any of their PRs and removes the trust that I
would usually extend as the benefit of the doubt. I am mentally
blacklisting contributors based on what I consider acceptable even if
there is not an agreed general policy.

I refuse to review PRs from newer contributors if this is the way that
it is going to happen so each of these needs to be reviewed by someone
else or can join the ever growing pile of unreviewed PRs.

> > On Friday, November 14, 2025 at 3:24:50 p.m. UTC+1 har...@gmail.com wrote:
> >
> > However, it requires Premium requests, so not everyone can use this feature.
> >
> >
> > Most of these AI-assisted tools are designed to take money from developers. I would strongly advise against paying for these services.
>
> I can't speak to the specific tool being mentioned here, but the best
> LLMs right now do require you to pay for them. If a tool is good and
> actually improves developer quality of life, we shouldn't be afraid to
> pay for it (that applies even outside of AI tools).
>
> FWIW, when it comes to code review, my suggestion would be to use a
> local LLM tool like claude code or codex to do the review for you. It
> wouldn't be completely automated, but that would give you the best
> results. I also agree that writing down the sorts of things you're
> looking for in a SymPy review somewhere in the context is going to
> make it work better. I would start by having an agent analyze recent
> reviews (say, the 50 most recent PR reviews by Oscar), and use that to
> construct a Markdown document listing the sorts of things that a good
> reviewer is looking for in a review of a SymPy pull request.

I was thinking more that an LLM bot on GitHub could handle basic
things like how to write the .mailmap file, how to add a test or run
the tests, fixing trailing whitespace, interpreting CI output and so
on. It would also be good to have some tooling to identify older
related PRs or issues and say "maybe this fixes gh-1234 so a test
could be added for that" and other things of that nature.

I don't actually want an LLM to review the real code changes but I can
see the value in having LLMs help with some of the tedious back and
forth so that a contributor gets rapid help and when a human reviewer
gets to it the PR is more likely in a state that is ready to merge.

--
Oscar

Daiki Takahashi

Nov 16, 2025, 12:25:48 AM
to sympy
> I was thinking more that an LLM bot on GitHub could handle basic
> things like how to write the .mailmap file or how to add a test or run
> the tests, fix trailing whitespace, interpret CI output and so on.

I was thinking along these lines as well. Thanks to the current automated checks in CI,
reviewers no longer need to spend time thinking about things that don't require human judgment.
For the time being, I'd like to see LLMs further expand that area -- handling more of the tasks that
can be automated so reviewers can focus on the parts that really matter.

-- 
haru-44

On Sunday, 16 November 2025 at 4:25:03 UTC+9, Oscar wrote:

Francesco Bonazzi

Nov 16, 2025, 4:21:15 AM
to sympy
The best way to use LLMs is as smart lookups. They have been trained on many books containing details of known algorithms, so LLMs can be used to get clues from that knowledge which would otherwise require a lot of time dedicated to reading a book and finding the correct paragraph explaining what you need.

I have tried using LLMs for coding; the main problems I see are:
  1. they keep hallucinating, quite often making up non-existent APIs or sometimes even utterly wrong code;
  2. LLMs apparently have no clue about understanding and continuing existing code: each time, they rewrite a lot of stuff.
Querying an LLM for suggestions should be seen as an evolution of looking things up on Stack Overflow. However, care should be taken because, unlike answers on Stack Overflow, which are human-verified, LLM answers may be just wrong.

If LLMs are used the right way, they may help with being more productive. Unfortunately, my fear is that many developers will start using them without proper supervision. Average number of lines of code written will very likely increase, but so will the number of bugs.

Oscar Benjamin

Dec 1, 2025, 12:48:23 PM
to sy...@googlegroups.com
On Sun, 16 Nov 2025 at 09:21, Francesco Bonazzi <franz....@gmail.com> wrote:
>
> If LLMs are used the right way, they may help with being more productive. Unfortunately, my fear is that many developers will start using them without proper supervision. Average number of lines of code written will very likely increase, but so will the number of bugs.

I think that for a novice SymPy contributor the only right way to use
LLMs can be something like to ask questions about the codebase. Using
them to write the code just means skipping the thinking process that
would be a prerequisite for being able to supervise the LLMs in
writing and checking the code properly.

So far I have avoided pointing at individual pull requests in this
discussion but this one jumped out at me just now:

https://github.com/sympy/sympy/pull/28681

It was opened 6 hours ago by an entirely new contributor and in under
an hour grew to 800 lines of new code.

The author of the PR has also opened a PR in their own repo using the
same branch and over there you can see it being reviewed by
coderabbit:

https://github.com/Cprakhar/sympy/pull/1

There is a comment from coderabbit there that says "Here are the
copyable unit test edits:" and then shows hundreds of lines of unit
test code that seem to have been copied into the PR.

I don't know whether the code in the PR is reasonable. It looks very
LLM-style verbose/duplicative but besides that I don't want to review
it in any detail if the author hasn't spent time doing that
themselves.

Oscar

Francesco Bonazzi

Dec 8, 2025, 6:15:55 PM
to sympy
I fear that AI bots will start opening PRs soon (or maybe they are already doing it). AI can impersonate human conversation pretty well. The purpose of such bots is to use human feedback just to collect data.

Oscar Benjamin

Dec 9, 2025, 5:58:08 PM
to sy...@googlegroups.com
On Mon, 8 Dec 2025 at 23:15, Francesco Bonazzi <franz....@gmail.com> wrote:
>
> I fear that AI bots will start opening PRs soon (or maybe they are already doing it). AI can impersonate human conversation pretty well. The purpose of such bots is to use human feedback just to collect data.

I am actually getting emails roughly once a week right now from AI
companies offering to pay me to review AI generated PRs but I have not
replied to any of them.

I don't think that we are seeing AI bots though. It is just humans
using AI tools sometimes in a reasonable way but more often badly.

We absolutely need to have a policy about this that insists that use
of AI to write the code needs to be disclosed. A policy should clearly
state that it is not acceptable to submit AI generated code if it is
not code that you understand yourself and should explain why this is
bad and what you should do instead.

Regardless of whether the policy is enforceable I think people need to
see a clear statement of what is a reasonable way of going about
things. Honestly I don't blame people for thinking that having an AI
just write all the code is the modern way with all the hype around
this.

Right now the majority of PRs opened are from people who have used
some AI tool to write the code. They have trusted the code in
deference to the AI's seemingly superior capabilities and knowledge
and just launched it into a PR.

The end result is that most PRs now are unchecked LLM output. It is a
waste of time to review these as long as the author thinks that
submitting unchecked LLM output is reasonable because any review
comments are just typed into the LLM and the LLM even writes their
comments in reply.

If we were talking about this in the context of software developers
working in a company together then I think that there could be all
sorts of ways of managing this. In the context of an open source
project, having loads of people appearing from nowhere and spewing LLM
output into PRs is unmanageable.

--
Oscar

Anand Bansal

Dec 16, 2025, 6:37:26 AM
to sympy
The problem of AI generated code is happening all across open source. One case that I am close to is https://github.com/paradigmxyz/reth. What I think they are doing is just adding very strict CI for everything, and even then reviewers have to put in so much effort.

Arka Saha

Dec 16, 2025, 12:48:15 PM
to sympy
I read this discussion on AI bots opening PRs and indeed it's an issue, and it will probably be a major issue in the next few months. But again, if the generated code, even from an AI bot, performs well, benefits the organisation and solves the issue, then I don't see why we should demotivate such cases. Also yes, achieving this would need proper fine tuning of the LLM, training it to make proper comments, PRs, branch names etc.
And also yes, sometimes an adversary might misuse such a bot; that's what we need to prevent.
Again, opinions might vary, and yeah, I would always prefer to write my own code lol and not use any bot to make PRs. I prefer getting my hands dirty.

Oscar Benjamin

Dec 16, 2025, 1:47:55 PM
to sy...@googlegroups.com
On Tue, 16 Dec 2025 at 17:48, Arka Saha <i.am.ar...@gmail.com> wrote:
>
> I read this discussion on AI bots opening PRs and indeed it's an issue, and it will probably be a major issue in the next few months. But again, if the generated code, even from an AI bot, performs well, benefits the organisation and solves the issue, then I don't see why we should demotivate such cases.

It isn't good code though and it doesn't benefit the organisation or
solve any issues.

What we are seeing is really just spam, with many more pull requests
of much lower general quality. Even if some of them are good, the
review process is overloaded trying to separate the good ones from the
rest.

The problem is that AI is enabling novices to generate low effort PRs
much more easily at the same time as making it harder to review those
PRs because superficially they look good but actually everything about
them is wrong in ways that are hard to predict or understand without
close attention.

Because the AI can write the code and open the pull request and answer
all of the questions about how to do all of those steps, people think
it is acceptable to do that without having done basic things like:

- Reading any of the code (before or after the changes).
- Thinking about any of the code or changes themselves.
- Knowing what changes are even in the PR that they have submitted.
- Knowing how to test changes to the code or how to run the test suite.

Previously it was not really possible to get to the point of having a
PR that passes CI checks without spending some time doing these
things. Now it is possible to skip all of those steps and then produce
a garbage PR that superficially looks reasonable while actually being
entirely wrong.

People at this level really are not benefitting from the use of AI. If
they learned how to do things without AI then they might become
capable of using the AI to produce something good in future.

--
Oscar

Sham S

Dec 17, 2025, 8:28:42 AM
to sy...@googlegroups.com
I agree with Oscar on AI generated PRs. It is not only SymPy: many other open source projects are experiencing this issue. I see most such PRs closed without merging, as they are verbose and often contain false claims about what they do, or haven't passed basic unit tests. I think having a clear policy on disclosure and understanding of the code is the only way to manage this moving forward.




Peter Stahlecker

Dec 17, 2025, 10:44:38 AM
to sympy
I have been following this discussion for a while.
What I do not understand is this: why would anybody want to push a PR which he does not understand?
This seems to take out all the fun.


Francesco Bonazzi

Dec 17, 2025, 12:53:28 PM
to sympy
On Wednesday, December 17, 2025 at 4:44:38 p.m. UTC+1 peter.st...@gmail.com wrote:

What I do not understand is this: why would anybody want to push a PR which he does not understand?
This seems to take out all the fun.

This is a good question indeed. I have some theories:
  1. People opening PRs using AI seem to also chat using AI. I suspect some of them might be full AI bots. Why? Maybe someone is testing a product and looking at GitHub as a way to collect human feedback to further train their model.
  2. Many seem to have an overdecorated GitHub account, often with links to their LinkedIn or other networking sites. In this case, they may simply be trying to bolster their GitHub account by getting a lot of code merged into major projects, in order to look like fruitful developers. I suspect that in some cases this may help in getting job contracts if the recruiters aren't careful enough.
These are my suspicions. Both cases are bad for our community, and these kinds of people are doing a lot of damage to open source communities.
 

Peter Stahlecker

Dec 17, 2025, 2:23:07 PM
to sympy
Your point 2 makes eminent sense to me, more so since over 95% of these PRs seem to come from India, where such statistics are considered to be important. (I was in India over 100 times in my job as a salesman before I retired.)
My worry is that key people (you, Oscar, Jason, others) get tired of these AI PRs and stop taking care of SymPy, which would be the end of SymPy.
NB: I am too old and too ignorant to ever push a PR to SymPy, but I enjoy the community a lot.

Oscar Benjamin

Dec 17, 2025, 2:53:01 PM
to sy...@googlegroups.com
On Wed, 17 Dec 2025 at 17:53, Francesco Bonazzi <franz....@gmail.com> wrote:
>
> On Wednesday, December 17, 2025 at 4:44:38 p.m. UTC+1 peter.st...@gmail.com wrote:
>
> What I do not understand is this: why would anybody want to push a PR which he does not understand?
> This seems to take out all the fun.
>
> This is a good question indeed. I have some theories:
>
> People opening PRs using AI seem to also chat using AI. I suspect some of them might be full AI bots. Why? Maybe someone is testing a product and looking at GitHub as a way to collect human feedback to further train their model.
> Many seem to have an overdecorated GitHub account, often with links to their LinkedIn or other networking sites. In this case, they may simply be trying to bolster their GitHub account by getting a lot of code merged into major projects, in order to look like fruitful developers. I suspect that in some cases this may help in getting job contracts if the recruiters aren't careful enough.

I don't think that these are AI bots. They are humans who are using AI
for everything including writing the code and writing comments and
things.

The reason for doing this is the Google Summer of Code (GSOC)
programme. SymPy enters that programme every year and a few people
(usually students) will do projects where they get paid by Google to
work on something in SymPy. This is what they want on their CV.
SymPy's rules are that someone has to have a PR merged to be
considered for GSOC, so every year at this time large numbers of people
turn up and start opening PRs, many of which are low quality.

This is not anything new but the difference now is that it is much
easier to open a PR as I said above:

> I think what has happened is that the combination of user-friendly
> editors with easy git/GitHub integration and LLM agent plugins has
> brought us to the point where there are pretty much no technical
> barriers preventing someone from opening up gibberish spam PRs while
> having no real idea what they are doing.

Previously there were some barriers, like you couldn't edit the code
without first looking at the code, or you had to figure out git, etc.
These barriers had two effects:

- They would filter out many PRs before they even existed.
- They required greater effort, so the person opening the PR was
forced to think more about what they were doing, resulting in a better
PR.

Removing those barriers means having more PRs of a lower quality. AI
in particular makes it possible to reduce the time taken to generate a
low quality PR massively and the effect of this is obvious if you look
at how quickly some people are opening multiple PRs in succession.

--
Oscar

Oscar Benjamin

Dec 17, 2025, 3:26:27 PM
to sy...@googlegroups.com
On Wed, 17 Dec 2025 at 19:52, Oscar Benjamin <oscar.j....@gmail.com> wrote:
>
> I don't think that these are AI bots. They are humans who are using AI
> for everything including writing the code and writing comments and
> things.
>
> The reason for doing this is the google summer of code (GSOC)
> programme.

Maybe there should be something about this in the GSOC rules/guidance.
The reality is that when it comes to ranking candidates for GSOC,
spammy AI stuff is going to go in the bin, so if someone is opening AI
PRs now in the hope of getting accepted for GSOC then I think they
have misunderstood the situation. We should probably make that clear
at the outset to anyone thinking of applying.

--
Oscar

Francesco Bonazzi

Dec 19, 2025, 3:49 AM
to sympy
On Wednesday, December 17, 2025 at 8:53:01 p.m. UTC+1 Oscar wrote:
I don't think that these are AI bots. They are humans who are using AI
for everything including writing the code and writing comments and
things.

Let's do a simple test. Instead of commenting on these PRs by typing text in, let's just attach an image containing the comment. This is no problem for human beings, but I expect AI bots to fail to understand the comment, unless they are connected to OCR or use vision-language models.
 
The reason for doing this is the Google Summer of Code (GSOC)
programme. SymPy enters that programme every year and a few people
(usually students) will do projects where they get paid by Google to
work on something in SymPy. This is what they want on their CV.
SymPy's rules are that someone has to have a PR merged to be
considered for GSOC, so every year at this time large numbers of people
turn up and start opening PRs, many of which are low quality.


SymPy isn't the only project that's being spammed by AI-generated PRs. Apparently this problem is quite common.

Let's keep an eye on the preventive measures that other projects are taking.

Ralf Schlatterbeck

Dec 19, 2025, 4:00 AM
to sy...@googlegroups.com
On Fri, Dec 19, 2025 at 12:49:36AM -0800, Francesco Bonazzi wrote:
>
> Let's do a simple test. Instead of commenting these PRs by typing text in,
> let's just attach an image containing the comment. This is no problem for
> human beings, but I expect AI-bots to fail in understanding the comment,
> unless they are connected with an OCR or use vision-language models.

I've successfully uploaded photographed pages from a book describing
counterpoint (music) rules and asked Claude (the AI from anthropic.com)
to write code for a project. So OCR is not a test for an AI these days.
Many AIs have OCR built in.

Ralf
--
Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16
Open Source Consulting www: www.runtux.com
Reichergasse 131, A-3411 Weidling email: off...@runtux.com