On Fri, 6 Mar 2026 at 08:58, <
peter.st...@gmail.com> wrote:
>
> I participated in that video conference yesterday.
>
> There were about 85 participants, I assume mostly mentors. Stephanie from Google knew a number of them by name.
Thanks, I wanted to join that meeting but was busy with work at the time.
For the benefit of others on the mailing list: this was a meeting
related to the GSOC programme, attended by GSOC admins/mentors and by
Stephanie, who runs GSOC at Google. The purpose of the meeting was to
discuss the problem that many GSOC-affiliated projects have been
having with large numbers of often AI-assisted PRs/candidates.
> All complained about the onslaught of low quality PRs generated by, or at least with the help of, AI.
>
> Stephanie was quite strict about AI: Do not accept it.
>
> None of the participants so far has had a good PR generated by AI.
I have seen good PRs that were generated by AI. The difficulty is
that when someone uses AI and produces something good, it is often
not possible to tell that they used AI at all; when the result is bad,
you can see all the AI flaws. Using AI in a good way means fixing all
of those problems before you show the work to anyone else.
> One idea was to limit the number of PRs for GSoC. Maybe a ‘one shot policy’ would force applicants to think long and hard before submitting a PR.
I'm not sure if a one-shot policy is the right approach. What I am
sure of is that we need to change something about the basic model of
sympy development. The idea that anyone can open a PR for anything and
maintainers then try to review all the PRs has been unsustainable for
sympy for a long time, and is now just absurdly broken. There need to
be some limits on who can open PRs, how many they can open and what
they are for, and probably some limits on issues as well.
I now think that talking about AI so much perhaps misses the real
problem in the context of sympy development. There are some issues
that are unique to AI usage, but fundamentally the problem is just
numbers. Various factors, not only AI but also things like
user-friendly editors with git integration, the many online guides
and so on, have reduced the barrier to opening a PR so much that
large numbers of people can do it easily. Couple that with a strong
motivating factor like GSOC and we end up with too many people and
too many PRs for us to handle.
The economics of open-to-all open-source collaboration depend on there
being some barriers of effort and technical capability that prevent
most people from being able to reach the point of opening a PR in the
first place. It is entirely understandable that people would try to
reduce those barriers but that undermines the foundations of the
effort-based economy that makes open collaboration work.
If there were only small numbers of these AI PRs then it would be
reasonable to have a discussion with each person, and some of those
discussions could steer things in a more productive direction. The
problem is the numbers, though: it isn't possible to have a
discussion with each person. We see this with all the GSOC
announcements here on the mailing list: people post their ideas and
questions and then no one answers them, because there are just too
many for anyone to respond to without being very selective.
We cannot simply tell people not to use AI, because they will do it
anyway. That is a fact that we have to accept about the future. I
found this paper interesting:
https://arxiv.org/abs/2601.20245
The paper is from two Anthropic (an AI company) employees. They ran a
randomised controlled trial: they recruited professional programmers,
divided them into an AI group and a no-AI group, and asked them to
complete a programming task with or without AI. After the task they
asked the participants questions to see how well they could answer
them.
The results/conclusions of the study are all fairly obvious, so the
only point of interest there is that it comes from an AI company and
yet they did not find any meaningful benefit from using AI in their
metrics. What I found interesting is the paper's description of the
pilot studies that preceded the main study.
In the first pilot study they recruited 39 people, split them into AI
and no-AI groups and asked them to do a task. They estimated that one
third of the people in the no-AI group used AI even though the whole
point of the study was to see what would happen if they spent 30
minutes doing a task *without* using AI. For the second pilot study
they tried to improve the instructions to make it clearer that "if you
are in the no-AI group then don't use AI because that is the whole
point" but then they estimated that one quarter of the people in the
no-AI group still used AI. For the other pilots and the main study
they used a different software platform that could do full screen
recording to limit use of AI, although I guess it is still unclear
whether people used AI on other devices.
There was no material incentive in the study to cheat and use AI. The
participants were going to be paid $150 for their time and the process
would take 1 hour either way. There was no reward for being faster or
slower or for producing better or worse code. It wasn't an exam or a
job interview. The purpose of the study was just to measure what
happens if people do a task with or without AI but (at least) a third
of the people in the no-AI group decided that they would just use AI
anyway even though that was the exact thing that they were being paid
not to do.
I think what this means is that it is simply impossible to tell
people not to use AI. What still surprises me, though, is the extent
of the dishonesty: a third of participants would straight up lie,
even though doing so falsifies the results and defeats the entire
purpose of the study they are being paid to participate in.
--
Oscar