On Sun, Jan 26, 2025 at 1:40 AM Jason Moore <moore...@gmail.com> wrote:
>
> Hi,
>
> I was browsing Paul Ivanov's blog today and came across this article:
>
> https://pirsquared.org/blog/numfocus-concerns.html
>
> We are part of NUMFOCUS, so I'd say it is important to at least be aware of this. I do not have an opinion yet myself, but wanted to share.

I love Paul, but I think that blog post is mostly FUD and these
concerns about the 501c6 are not something to be worried about. I had
many discussions with various people about this and related issues at
the NumFOCUS summit last year and I'm confident that everything is OK.
The 501c6 is more or less just a way for NumFOCUS to raise more money,
as it makes it easier for some types of organizations to give. But the
whole thing is being set up so that it does not affect the
relationship with the projects (like SymPy). Unfortunately the summit
was several months ago so I don't remember all the details, but maybe
some more details have been posted publicly since then. But the
biggest high level takeaway I had from the summit is that NumFOCUS
really does care about the open source projects and has their best
interests as community run projects at heart, and also that it is
probably the only fiscal sponsorship organization that fits that
description (i.e., moving away from NumFOCUS would be a bad idea).

If you're still concerned, I would suggest emailing Andy Terrel (or
maybe we can get him to respond here). He is on the NumFOCUS board and
is also a contributor to SymPy from a long time ago.

I personally don't think LLM outputs violate OSS licenses. The closest
something might come to being an issue is if an LLM generated a
significant block of code that is verbatim copied from something else.
That's not only unlikely in general due to the way LLMs work, but it's
unlikely for SymPy because most code that would be written for SymPy
is not something that would already have appeared somewhere else.

At any rate, the ship has basically sailed on this. I would expect a
large fraction of SymPy contributors already make use of LLMs in some
form or other, whether it's using code completion from something like
GitHub Copilot or prompting a tool like ChatGPT or Cursor to help
refactor or write a function. Frankly, if you're not using LLMs at all
to help you code, you should, because they are very useful tools.

Looking at some other projects, scikit-image added a "no AI
contributions" policy and ended up having to remove it
(https://github.com/scikit-image/scikit-image/pull/7429). scikit-learn
has a policy disallowing completely automated contributions, i.e.,
contributions that have no human in the loop
(https://github.com/scikit-learn/scikit-learn/blob/main/doc/developers/contributing.rst#automated-contributions-policy).
I think that's a good policy, but I don't know if it's something
we need to write down unless it starts to become an issue (has it?).

There's also, separately, the question of the quality of LLM generated
code. I think that we need to use the GitHub review process we have
always been using to ensure the SymPy code remains high quality
regardless of its source. This means the usual things: good, thorough
tests that check for correctness, readable code, avoiding various
antipatterns, etc. LLM generated code won't always fit these
parameters, especially if not prompted correctly.

I think the biggest concern here is contributors (especially newer
contributors) contributing code that exclusively comes from an LLM
without any thought from the contributor themselves. This is
especially likely from potential GSoC applicants. This we should
disallow, because LLMs are not good enough to do this right now, and
in the case of a GSoC applicant, it tells us nothing about their
coding ability. Basically, any contributor to SymPy should be
responsible for all the code they contribute. This makes it harder to
evaluate GSoC applicants in particular, but that's unfortunately the
world we live in, and we just need to learn how to evaluate people
better. (I'm happy to discuss ideas for this. Should we do video call
interviews with top GSoC applicants?)
Aaron Meurer
>
> Jason
>
> moorepants.info
> +01 530-601-9791