GitHub Copilot investigation

James Salsman

Oct 18, 2022, 9:55:54 AM
to licenses-m...@creativecommons.org
Please see: https://githubcopilotinvestigation.com/

>... former GitHub CEO Nat Friedman claimed during the Copilot technical preview that “training [machine-learning] systems on public data is fair use”.
>
> Well—is it? The answer isn’t a matter of opinion; it’s a matter of law. Naturally, Microsoft, OpenAI, and other researchers have been promoting the fair-use argument. Nat Friedman further asserted that there is “jurisprudence” on fair use that is “broadly relied upon by the machine[-]learning community”. But Software Freedom Conservancy disagreed, and pressed Microsoft for evidence to support its position. According to SFC director Bradley Kuhn—
>
> "[W]e inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions … They provided none."
>
> Why couldn’t Microsoft produce any legal authority for its position? Because SFC is correct: there isn’t any. Though some courts have considered related issues, there is no US case squarely resolving the fair-use ramifications of AI training.
>
> Furthermore, cases that turn on fair use balance multiple factors. Even if a court ultimately rules that certain kinds of AI training are fair use—which seems possible—it may also rule out others. As of today, we have no idea where Copilot or Codex sits on that spectrum. Neither does Microsoft nor OpenAI.

James Salsman

Nov 11, 2022, 6:29:21 PM
to licenses-m...@creativecommons.org
The GitHub Copilot class action suit got filed; see the same link from
October below. Microsoft's competitors seem to share a similar willful
dismissal of licensing restrictions, e.g., "It currently is based on
open-source large language models trained on public data." --
https://docs.replit.com/ghostwriter/faq

I hope everyone can make it at least virtually to Aaron Swartz Day at
the Internet Archive. They still have tickets at
https://www.eventbrite.com/e/aaron-swartz-day-and-international-hackathon-tickets-453532256187
Kat Walsh and Lisa Rein are speaking about the past and future of CC
on Sunday at noon Pacific:
https://www.aaronswartzday.org/asd-2022-livestream/
(The organizers never got back to me on my talk proposal about how the
English Wikipedia's Economics article has never had any mention of
inequality, but I'm satisfied with their roster.)

Similarly for https://creativecommons.org/2022/10/05/join-us-to-celebrate-20-years-of-creative-commons/
next Thursday in San Francisco and online.

I see that CC is hiring fundraising staff, so if there's anything I or
anyone else can do to help as a volunteer, please let us know. I
recently started following
https://eval.ai/web/challenges/challenge-page/1866/overview and it
occurred to me that CC could use the same system to crowdsource banner
ad text for their site with the same evaluation and leaderboard
system. Would anyone like to collaborate on support for that?

Best regards,
Jim Salsman

On Tue, Oct 18, 2022 at 6:55 AM James Salsman <jsal...@gmail.com> wrote:
>
> Please see: https://githubcopilotinvestigation.com/
>
> >... former GitHub CEO Nat Friedman claimed during the Copilot technical preview that “training [machine-learning] systems on public data is fair use”.

James Salsman

Nov 11, 2022, 9:14:09 PM
to licenses-m...@creativecommons.org
P.S. I am hosting a virtual hackathon this weekend for anyone who
wants to work on fundraising banner crowdsourcing:
https://www.aaronswartzday.org/projects-to-hack-on-asd-2022/ whether
you can attend in person or not.

James Salsman

Nov 11, 2022, 9:15:02 PM
to licenses-m...@creativecommons.org