AI generated pull requests


Oscar Benjamin

Oct 25, 2025, 6:46:10 PM
to sympy
Hi all,

I am increasingly seeing pull requests in the SymPy repo that were
written by AI, e.g. by something like Claude Code or ChatGPT. I don't
think that any of these PRs are written by autonomous AI bots, but
rather that they are "written" by contributors who are using AI
tooling.

There are two separate categories:

- Some contributors are making reasonable changes to the code and then
using LLMs to write things like the PR description or comments on
issues.
- Some contributors are basically just vibe coding: having an LLM
write all the code for them and then opening PRs, usually with very
obvious problems.

In the first case, some people use LLMs to write things like PR
descriptions because English is not their first language. I can
understand this, and I think it is definitely possible to do this with
LLMs in a way that is fine, but it needs to amount to using them like
Google Translate rather than asking them to write the text. The
problems are that:

- LLM summaries for something like a PR are too verbose and include
lots of irrelevant information, making it harder to see what the
actual point is.
- LLMs often include information that is simply false, such as "fixes
issue #12345" when the issue is not fixed.

I think some people are doing this in a way that is not good, and I
would prefer that they just write in broken English or use Google
Translate, but I don't see this as a major problem.

For the vibe coding case I think that there is a real problem. Many
SymPy contributors are novices at programming and are nowhere near
experienced enough to turn vibe coding into outputs that can be
included in the codebase. The result is spammy PRs with false claims
about what they do, like "fixes X" or "10x faster", where the code has
not been even lightly tested and clearly does not work, or possibly
does not do anything at all.

I think what has happened is that the combination of user-friendly
editors with easy git/GitHub integration and LLM agent plugins has
brought us to the point where there are pretty much no technical
barriers preventing someone from opening up gibberish spam PRs while
having no real idea what they are doing.

Really this is just inexperienced people using the tools badly which
is not new. Low quality spammy PRs are not new either. There are some
significant differences though:

- I think that the number of low quality PRs is going to explode. It
was already bad last year in the run-up to GSoC (January to March) and
I think it will be much worse this year.
- I don't think that it is reasonable to give meaningful feedback on
PRs where this happens because the contributor has not spent any time
studying the code that they are changing and any feedback is just
going to be fed into an LLM.

I'm not sure what we can do about this so for now I am regularly
closing low quality PRs without much feedback but some contributors
will just go on to open new PRs. The "anyone can submit a PR" model
has been under threat for some time, but I worry that the whole idea
is going to become unsustainable.

In the context of the Russia-Ukraine war I have often seen references
to the "cost-exchange problem". This refers to the fact that while
both sides have a lot of anti-air defence capability they can still be
overrun by cheap drones because million dollar interceptor missiles
are just too expensive to be used against any large number of incoming
thousand dollar drones. The solution there would be to have some kind
of cheap interceptor like an automatic AA gun that can take out many
cheap drones efficiently even if much less effective against fancier
targets like enemy planes.

The first time I heard about ChatGPT was when I got an email from
Stack Overflow saying that any use of ChatGPT was banned. Looking into
it, the reason given was that it was just too easy to generate
superficially reasonable text that was low quality spam, and then too
much effort for real humans to filter that spam out manually. In other
words, bad/incorrect answers were nothing new, but large numbers of
inexperienced people using ChatGPT had ruined the cost-exchange ratio
of filtering them out.

I think in the case of SymPy pull requests there is an analogous
"effort-exchange problem". The effort reviewers put into helping with
a PR is not reasonable unless its author puts in far more effort
themselves, because there are many times more people trying to author
PRs than to review them. I don't think that it can be
sustainable in the face of this spam to review PRs in the same way as
if they had been written by humans who are at least trying to
understand what they are doing (and therefore learning from feedback).
Even just closing PRs and not giving any feedback needs to become more
efficient somehow.

We need some sort of clear guidance or policy on the use of AI that
sets out clear expectations like "you still need to understand the
code". I think we will also need to ban people for spam if they do
things like opening AI-generated PRs without even testing the code.
The hype spun by AI companies probably has many novice programmers
believing that it actually is reasonable to behave like this, but it
really is not, and that needs to be clearly stated somewhere. I don't
think any of this is malicious, but I think it has the potential to
become very harmful to open source projects.

The situation right now is not so bad but if you project forwards a
bit to when the repo gets a lot busier after Christmas I think this is
going to be a big problem and I think it will only get worse in future
years as well.

It is very unfortunate that right now AI is being used in all the
wrong places. It can do a student's homework, because it knows the
answers to all the standard homework problems, but it can't do the
more complicated, more realistic things, and so students learn nothing
from doing their homework. In the context of SymPy it would be so much
more useful to have AI doing other things, like reviewing code and
finding bugs, rather than helping novices to get a PR merged without
actually investing the time to learn anything from the process.

--
Oscar

gu...@uwosh.edu

Oct 25, 2025, 9:06:55 PM
to sympy
Here's a brainstorming idea for how to implement something to address your valid concerns.
How about the following policy?
No review of a pull request will occur unless it meets certain minimum requirements:
1) It passes all pre-existing tests;
2) It includes test coverage for all new code;
3) It includes tests covering any bug fixes.

I can see how to implement #1 automatically. Could #2 be implemented using one of the coverage tools? My experience with those is limited. It would also require some work to make sure new tests cover all changed code. I think this would clear out a lot of the very low quality code that doesn't work or does nothing. However, I see a few problems as well:
1) What happens if the bug fix is to an erroneous test?
2) This does not address low quality descriptions of the PR and its goals.
3) People who are just learning the code base will need a way to get help running and fixing issues with testing. Contributors might have to ask for help on this list with issues of that sort, or maybe there should be a specific venue for that.
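Requirement #2 could plausibly be wired up with existing coverage tooling. A rough sketch (the workflow name, base branch, and 100% threshold here are assumptions, not existing SymPy CI) using coverage.py together with diff-cover, which fails the job when lines changed by the PR are not executed by the test suite:

```yaml
# Hypothetical workflow: fail the PR if changed lines lack test coverage.
name: diff-coverage
on: pull_request
jobs:
  diff-coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # diff-cover needs the base branch history
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install coverage diff-cover pytest
      # Run the test suite under coverage (in practice one would want to
      # restrict this to tests near the changed files to keep runtime sane)
      - run: coverage run -m pytest sympy/
      - run: coverage xml
      # Fails unless 100% of the changed lines are executed by the tests
      - run: diff-cover coverage.xml --compare-branch=origin/master --fail-under=100
```

Because diff-cover only looks at lines touched by the diff, it sidesteps having to raise coverage of untouched legacy code; a fix to an erroneous test (problem #1 above) would still need a human override.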

Just some ideas to help start the ball rolling.

Jonathan

Jason Moore

Oct 26, 2025, 2:16:04 AM
to sy...@googlegroups.com
Hi Oscar,

Thanks for raising this. I agree: this problem will grow and it is not good. I think we should have a policy on LLM generated contributions. It would be nice if a SymPEP were drafted for one.

Having a standard way to reject spam PRs would be helpful. If we could close a PR and add a label to trigger sympybot to leave a comment saying "This PR does not meet SymPy's quality standards for AI generated code and comments, see policy <link>", that would help. It still requires manual steps from reviewers.
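As a sketch only (the "llm-spam" label name and the workflow itself are assumptions, and the policy link stays a placeholder), a GitHub Actions job could post that comment and close the PR whenever a reviewer applies the label:

```yaml
# Hypothetical workflow: applying an "llm-spam" label closes the PR
# with a pointer to the policy.
name: close-llm-spam
on:
  pull_request_target:
    types: [labeled]
jobs:
  close:
    if: github.event.label.name == 'llm-spam'
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // Leave the standard comment, then close the PR
            await github.rest.issues.createComment({
              ...context.repo,
              issue_number: context.issue.number,
              body: "This PR does not meet SymPy's quality standards for AI generated code and comments, see policy <link>",
            });
            await github.rest.issues.update({
              ...context.repo,
              issue_number: context.issue.number,
              state: "closed",
            });
```

This keeps the human judgment (a reviewer applying the label) but automates the repetitive comment-and-close step.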

I also share the general concern expressed by some in the scipy ecosystem here:


which is that LLMs universally violate copyright licenses of open source code. If this is true, then PRs with LLM generated code are polluting SymPy's codebase with copyright violations.

Jason


--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/sympy/CAHVvXxQ1ntG0EWBGihrXErLhGuABHH7Kt5RmGJvp9bHcqaC5%3DQ%40mail.gmail.com.

Jason Moore

Oct 26, 2025, 2:27:53 AM
to sy...@googlegroups.com

Oscar Benjamin

Oct 26, 2025, 3:24:40 PM
to sy...@googlegroups.com
On Sun, 26 Oct 2025 at 01:06, 'gu...@uwosh.edu' via sympy
<sy...@googlegroups.com> wrote:
>
> Here's a brainstorming idea for how to implement something to address your valid concerns.
> How about the following policy?
> No review of a pull request will occur unless it meets certain minimum requirements:
> 1) It passes all pre-existing tests;
> 2) It includes test coverage for all new code;
> 3) It includes tests covering any bug fixes.

I think that what will happen is that the author will pass these
instructions to the LLM agent and then the agent will generate some
code that superficially resembles meeting these criteria. Then the PR
description will have a bunch of emoji-filled bullet points
redundantly stating that it meets those criteria.

I'm not going to point to the specific PR but I closed one where the
description had a statement in it like

"You can run the tests with `pytest sympy/foo/bar`"

That is literally an instruction from the LLM to the user for how they
can test the generated code and if you actually run the test command
it clearly shows that the code doesn't work. It was still submitted in
that form as a spam PR though.

Of course the tests in CI did not pass, and it is not hard to see the
problem in that case, but other cases can be more subtle. It is not
hard to generate code that passes all existing tests, includes
coverage, etc., while still being gibberish, and this is really the
problem with using LLMs. There is no substitute in this situation for
actual humans doing real thinking.

--
Oscar

Oscar Benjamin

Oct 26, 2025, 3:30:05 PM
to sy...@googlegroups.com
Yes, copyright is a big problem. I don't think I would say that LLMs
universally violate copyright, e.g. if used for autocompleting an
obvious line of code or for many other tasks. There are certain basic
things like x += 1 that cannot reasonably be considered to be under
copyright even if they do appear in much code. Clearly, though, an LLM
can produce a large body of code where the only meaningful
interpretation is that the code has been "copied" from one or two
publicly available codebases.

The main difficulty with having a policy about the use of LLMs is
that, unless it begins by saying "no LLMs", it somehow needs to start
by acknowledging what reasonable use can be, which means confronting
the copyright issue up front.

Aaron Meurer

Oct 30, 2025, 2:08:12 PM
to sy...@googlegroups.com
I like the Ghostty policy, which is that AI coding assistance is
allowed, but it must be disclosed
https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md#ai-assistance-notice.
It should also be our policy that the person submitting the code is
ultimately responsible for it, regardless of what tools were used to
create it.

I think it would be a mistake to ban AI usage entirely because AI can
be very useful if used properly, i.e., you review the code it writes
before submitting it.

For me the copyright question doesn't really pass the smell test, at
least for the majority of the use-cases where I would use AI in SymPy.
For example, if I use AI to generate some fix for some part of SymPy,
say the polynomials module, then where would that fix have "come from"
for it to be a copyright violation? Where else in the world is there
code that looks like the SymPy polynomials module? Most code in SymPy
is highly specific to SymPy. The only place it could have possibly
come from is SymPy itself, but if SymPy already had it then the code
wouldn't be needed in the first place (and anyway that wouldn't be a
copyright violation).

I think there's a misconception that LLMs can only generate text that
they've already seen before, and if you believe that misconception
then it is easy to believe that everything generated by an LLM is a
copyright violation. But this is something that is very easily seen to
be untrue if you spend any amount of time using coding tools.

As for PR descriptions, I agree those should always be hand-written.
But that's always been a battle, even before AI. And similarly almost
no one writes real commit messages anymore.

Aaron Meurer

Jason Moore

Oct 30, 2025, 2:50:55 PM
to sy...@googlegroups.com
I don't think it is a terrible idea to simply have a "no LLMs" policy at this point in time. We can always change it in the future as things get clearer. People will still use them in their LLM-enhanced editors, of course, and we can never detect the basic uses of the tools. But if people submit large chunks of text and code that have the hallmarks of full generation from an LLM, then we can reject them and point to the policy.

As for the smell test and misconceptions about what an LLM can produce, this may depend on whether you think only a literal copy of something violates copyright, or whether a derivative does too. I think the essential question is whether the code an LLM produces is a derivative of copyrighted code.

There are many past court cases ruling that derivatives are copyright violations in the US, and the OSS licenses almost all state that derivatives fall under the license. I doubt an LLM could produce a fix to the polynomials module if the only training data were the polynomials module. An LLM relies entirely on training on a vast corpus of works and generates code from all of that large body. Is that output then a derivative of one, some, or all of the training data? That is to be determined by those that rule on laws (hopefully).

Given that we have spent about 40 years collectively trying to protect open source code with copyright licenses, it seems terribly wrong that if you make your copying source large enough, you no longer have to abide by the licenses.

Paul Ivanov and Matthew Brett have done a good job explaining this nuance here: https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md

My personal opinion is that the LLMs should honor the licenses of the training set and if they did, then all is good. I have no idea how they can solve that from a technical perspective, but the companies are simply ignoring copyright and claiming they are above such laws and that all that they do is fair use. We plebes do not get that same ruling.

Jason


Francesco Bonazzi

Nov 5, 2025, 5:50:23 PM
to sympy
Maybe it should be made mandatory to disclose any usage of LLMs when opening PRs.

Banning usage of LLMs completely is a bit extreme, but it may be necessary if vibe spammers keep flooding GitHub with useless PRs.

Oscar Benjamin

Nov 12, 2025, 9:01:46 AM
to sy...@googlegroups.com
On Wed, 5 Nov 2025 at 22:50, Francesco Bonazzi <franz....@gmail.com> wrote:
>
> Maybe it should be made mandatory to disclose any usage of LLM when opening PRs.

There needs to be, at the very least, a policy that bans using LLMs
without disclosing that they were used. There should be checkboxes
when opening a PR:

[x] I have or have not used an LLM
[x] I have checked the code from the LLM
[x] I have tested the code myself
[x] I do/don't understand all of the code that was generated by the LLM.
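One way to surface these (a sketch only; the file path and the exact wording are assumptions) would be a disclosure section in a hypothetical `.github/PULL_REQUEST_TEMPLATE.md`, which GitHub renders as clickable checkboxes on every new PR:

```markdown
<!-- Hypothetical AI disclosure section for the PR template -->
#### AI / LLM disclosure

- [ ] I used an LLM or other AI tool for this PR (code, tests, or description)
- [ ] I have reviewed and understand all of the code in this PR
- [ ] I have run the relevant tests myself
```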

> Banning usage of LLM completely is a bit extreme, but it may be necessary if vibe spammers keep flooding github with useless PRs.

Using LLMs to generate comments and PR descriptions should just be
banned outright, with an exception for Google Translate type use only.
Obviously this cannot be enforced, but the messaging needs to be
clear: don't dump LLM output as if it were a comment from yourself.

As for using LLMs to write code, I don't necessarily object to simple
autocomplete-type LLMs for convenience, but I think if you look at
typical PRs and contributors right now they really should not be using
LLMs at all. The LLMs only seem to help people make time-wasting
vibe-coded PRs where the "author" has no understanding of the code
they are submitting and has not even done basic testing.

--
Oscar

Daiki Takahashi

Nov 13, 2025, 7:03:52 AM
to sympy
Let me make this clear upfront: all of my posts on GitHub, including this one, rely on translation by an LLM.

I believe it would be reasonable to explicitly state in the policy that spam-like PRs and PRs relying heavily on LLMs are prohibited.
Along with that, the policy should also clarify that such PRs may be proactively closed without prior notice,
and that there should be a clear process for appealing an incorrect closure.

To reduce the review burden, one possible approach would be to require all PRs to undergo an initial review by Copilot before human review.
However, I am not sure how capable Copilot actually is.


Oscar Benjamin

Nov 13, 2025, 8:36:30 AM
to sy...@googlegroups.com
On Thu, 13 Nov 2025 at 12:03, Daiki Takahashi <har...@gmail.com> wrote:
>
> Let me make this clear upfront: all of my posts on GitHub, including this one, rely on translation by an LLM.
>
> I believe it would be reasonable to explicitly state in the policy that spam-like PRs and PRs relying heavily on LLMs are prohibited.

It is very difficult to define what is meant by "spam-like" and I
doubt that someone submitting a PR would understand this in the same
way as reviewers would.

There are different ways of using LLMs, and the way that you use them
is absolutely fine. The way that many novice contributors use them is
not useful at all, though, and at least right now it is harmful to
SymPy development. I'm not sure how to define the difference between
those in a policy, though.

> Along with that, the policy should also clarify that such PRs may be proactively closed without prior notice,
> and that there should be a clear process for appealing an incorrect closure.

Realistically I think in some cases this is the only option. Just
deciding to close them is still a burden though.

> To reduce the review burden, one possible approach would be to require all PRs to undergo an initial review by Copilot before human review.
> However, I am not sure how capable Copilot actually is.

I don't know about Copilot specifically, and actually there are many
things called "Copilot". I have used "GitHub Copilot", which is an
editor plugin for autocomplete, but now there is a "Copilot" button on
the GitHub website that is something different (more like ChatGPT).
Does anyone have any experience of using that?

I see more potential in using AI to help with reviewing PRs than in
having people use AI to write the PRs. Many PRs need quite simple
feedback, like "this should have tests; please add a test in file f",
that could easily be handled by AI (and probably in a more patient,
friendly and helpful way than feedback from human reviewers such as
myself).

Somewhere someone suggested using CodeRabbit which I have seen on some
other repos. I haven't seen it produce anything useful but supposedly
it gets better if you "teach" it.

--
Oscar

Daiki Takahashi

Nov 14, 2025, 9:24:50 AM
to sympy
I created a PR and tried having GitHub Copilot review it as a test.
We can request a Copilot review directly from the PR page.
However, it requires Premium requests, so not everyone can use this feature.

I'm not sure how useful it really is, but it’s certainly convenient to try.


document:
-- 
haru-44


Francesco Bonazzi

Nov 14, 2025, 2:15:51 PM
to sympy
Let's remember that LLMs may reproduce copyrighted material. There is some risk associated with copy-pasting LLM output into SymPy code.

Furthermore, what practical algorithmic improvements can an LLM make to SymPy? Can an LLM finish the implementation of the Risch algorithm? I doubt it.

On Friday, November 14, 2025 at 3:24:50 p.m. UTC+1 har...@gmail.com wrote:
> However, it requires Premium requests, so not everyone can use this feature.

Most of these AI-assisted tools are designed to take money from developers. I would strongly advise against paying for these services.

LLMs look good at first because most questions had answers in their training set; as soon as you ask an LLM to do anything non-standard, or just to fix existing code in a way that is not trivial, they fail miserably.

Aaron Meurer

Nov 15, 2025, 1:06:56 PM
to sy...@googlegroups.com
On Fri, Nov 14, 2025 at 12:15 PM Francesco Bonazzi
<franz....@gmail.com> wrote:
>
> Let's remember that LLMs may write copyrighted material. There is some risk associated with copy-pasting from an LLM output into SymPy code.
>
> Furthermore, what practical algorithmic improvements can an LLM do to SymPy? Can an LLM finish the implementation of the Risch algorithm? I doubt it.

"Finishing the Risch algorithm" is an enormous task. Of course an LLM
cannot one-shot that, and a human couldn't do it in one PR either. But
I have actually been using Claude Code to do some improvements to the
Risch algorithm and it's been working. So the statement that an LLM
cannot help with algorithmic improvements in SymPy is false.

>
> On Friday, November 14, 2025 at 3:24:50 p.m. UTC+1 har...@gmail.com wrote:
>
> However, it requires Premium requests, so not everyone can use this feature.
>
>
> Most of these AI-assisted tools are designed to take money from developers. I would strongly advise against paying for these services.

I can't speak to the specific tool being mentioned here, but the best
LLMs right now do require you to pay for them. If a tool is good and
actually improves developer quality of life, we shouldn't be afraid to
pay for it (that applies even outside of AI tools).

FWIW, when it comes to code review, my suggestion would be to use a
local LLM tool like Claude Code or Codex to do the review for you. It
wouldn't be completely automated, but it would give you the best
results. I also agree that writing down the sorts of things you look
for in a SymPy review somewhere in the context will make it work
better. I would start by having an agent analyze recent reviews (say,
the 50 most recent PR reviews by Oscar) and use that to construct a
Markdown document listing the sorts of things that a good reviewer
looks for in a review of a SymPy pull request.

>
> LLMs look good at first because most questions had answers in their training set, as soon as you ask an LLM to do anything non-standard or just fix existing code in a way that is not trivial, they fail miserably.

This was true three years ago with GPT-3, but it is not true anymore.
I encourage you to try GPT-5 Codex or Claude Sonnet 4.5, ideally using
a modern tool like the Codex CLI, Claude Code, or Cursor. These models
are very good and can reason about problems they've never seen before.
They still have holes and you still have to check everything they do,
but you can't just assume that something isn't going to work without
trying it.

Even if you have moral qualms against AI (which I personally do not
share), you shouldn't let those give you the wrong impression about
the capabilities of these tools, especially the best-in-class models
like Claude.

Aaron Meurer


Oscar Benjamin

Nov 15, 2025, 2:25:03 PM
to sy...@googlegroups.com
On Sat, 15 Nov 2025 at 18:06, Aaron Meurer <asme...@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 12:15 PM Francesco Bonazzi
> <franz....@gmail.com> wrote:
> >
> > Let's remember that LLMs may write copyrighted material. There is some risk associated with copy-pasting from an LLM output into SymPy code.
> >
> > Furthermore, what practical algorithmic improvements can an LLM do to SymPy? Can an LLM finish the implementation of the Risch algorithm? I doubt it.
>
> "Finishing the Risch algorithm" is an enormous task. Of course an LLM
> cannot one-shot that, and a human couldn't do it in one PR either. But
> I have actually been using Claude Code to do some improvements to the
> Risch algorithm and it's been working. So the statement that an LLM
> cannot help with algorithmic improvements in SymPy is false.

There is a huge difference between someone who knows what they are
doing, knows the codebase, understands the algorithm etc using an LLM
and reviewing the results as compared to a typical new SymPy
contributor using an LLM to write the code for them. If you are
someone who could write or review the code without using an LLM then
using an LLM and checking the results is reasonable.

By the way your most recent PR is here Aaron:
https://github.com/sympy/sympy/pull/28464
The PR looks good to me apart from the one bit where you said "Claude
generated this fix for the geometry test failure. It would be good to
review". I reviewed it and decided that it looks like a hacky fix and
showed a counterexample of the type that would break that code. The
LLM output cannot be trusted and does not substitute for humans
investigating and writing the code.

From my perspective as a reviewer "this is LLM code. It would be good
to review" is asking reviewers to do what would usually be expected to
be done by the author of the PR in the first instance. That is only a
small example but it shows the broader problem that I think LLMs will
cause in sympy by shifting greater burden onto reviewers while making
it easier for authors to generate more and more PRs to review.

If you are new to programming in general and new to a particular
codebase and then use an LLM to generate code that you don't
understand then the results are not going to be good. There have
always been PRs where the author clearly does not understand the
codebase well or does not understand all of the implications of the
particular choices made in the code. Now though there are PRs where
the author has no idea why the LLM wrote any of the code that it did
and has not even done the most basic of testing and can only respond
to feedback by copying it into an LLM.

What I think people submitting these PRs right now don't realise is
that when I see the LLM-generated PR description or comments, or the
LLM-generated code that they probably don't understand, it removes all
motivation to review any of their PRs and removes any trust or benefit
of the doubt that I might otherwise give them. I am mentally
blacklisting contributors based on what I consider acceptable even if
there is no agreed general policy.

I refuse to review PRs from newer contributors if this is the way that
it is going to happen so each of these needs to be reviewed by someone
else or can join the ever growing pile of unreviewed PRs.

> > On Friday, November 14, 2025 at 3:24:50 p.m. UTC+1 har...@gmail.com wrote:
> >
> > However, it requires Premium requests, so not everyone can use this feature.
> >
> >
> > Most of these AI-assisted tools are designed to take money from developers. I would strongly advise against paying for these services.
>
> I can't speak to the specific tool being mentioned here, but the best
> LLMs right now do require you to pay for them. If a tool is good and
> actually improves developer quality of life, we shouldn't be afraid to
> pay for it (that applies even outside of AI tools).
>
> FWIW, when it comes to code review, my suggestion would be to use a
> local LLM tool like claude code or codex to do the review for you. It
> wouldn't be completely automated, but that would give you the best
> results. I also agree that writing down the sorts of things you're
> looking for in a SymPy review somewhere in the context is going to
> make it work better. I would start by having an agent analyze recent
> reviews (say, the 50 most recent PR reviews by Oscar), and use that to
> construct a Markdown document listing the sorts of things that a good
> reviewer is looking for in a review of a SymPy pull request.

I was thinking more that an LLM bot on GitHub could handle basic
things like how to write the .mailmap file, how to add a test or run
the tests, fixing trailing whitespace, interpreting CI output, and so
on. It would also be good to have some tooling to identify older
related PRs or issues and say "maybe this fixes gh-1234, so a test
could be added for that", and other things of that nature.

I don't actually want an LLM to review the real code changes but I can
see the value in having LLMs help with some of the tedious back and
forth so that a contributor gets rapid help and when a human reviewer
gets to it the PR is more likely in a state that is ready to merge.

--
Oscar

Daiki Takahashi

Nov 16, 2025, 12:25:48 AM
to sympy
> I was thinking more that an LLM bot on GitHub could handle basic
> things like how to write the .mailmap file or how to add a test or run
> the tests, fix trailing whitespace, interpret CI output and so on.

I was thinking along these lines as well. Thanks to the current automated checks in CI,
reviewers no longer need to spend time thinking about things that don't require human judgment.
For the time being, I'd like to see LLMs further expand that area -- handling more of the tasks that
can be automated so reviewers can focus on the parts that really matter.

-- 
haru-44


Francesco Bonazzi

Nov 16, 2025, 4:21:15 AM
to sympy
The best way to use LLMs is as smart lookups. They have been trained on many books containing the details of known algorithms, so LLMs can be used to get clues from that knowledge which would otherwise require a lot of time spent reading and finding the correct paragraph in a book.

I have tried using LLMs for coding; the main problems I see are:
  1. they keep hallucinating, quite often making up non-existent APIs, or sometimes producing utterly wrong code;
  2. LLMs apparently have no clue about understanding and continuing existing code: each time they rewrite a lot of stuff.

Querying an LLM for suggestions should be seen as an evolution of looking things up on Stack Overflow. However, care should be taken because, unlike answers on Stack Overflow, which are human-verified, LLM answers may simply be wrong.

If LLMs are used the right way, they may help people be more productive. Unfortunately, my fear is that many developers will start using them without proper supervision. The average number of lines of code written will very likely increase, but so will the number of bugs.