The IPKat has received and is pleased to host the following post by Katfriend Georgia Jenkins (University of Liverpool) on the recent Anthropic settlement. Here’s what Georgia writes:
What is the value of a pirated book copied to train LLMs? Apparently, only USD 3,000
by Georgia Jenkins
[Image: Anthropic's Claude ...]
As seasons change, it strikes this Katfriend as a good time to reflect on what she has coined “AI copyright summer”. In the UK we saw
Baroness Kidron's powerful speech on AI and creative industries, and more recently,
70+ artists demanding that Prime Minister Keir Starmer better protect copyright works against the threat of AI. While in 2024
brat summer, alongside a
viral dance, influenced brand marketing and even a presidential campaign, AI copyright summer has been more chaotic. By the end of August 2025 there were
40+ copyright-related claims against AI companies in the US, one of which is
Bartz v. Anthropic.
In August 2024, authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson commenced copyright infringement proceedings against Anthropic, the AI company behind the AI assistant
Claude. The case hinged on Anthropic’s unauthorised use of pirated and purchased copies of books to create a central library for the purpose of training the large language models (LLMs) that underlie Claude. The library comprised both ‘traditional’ copies and versions called ‘data mixes’, which
optimise training data and improve LLM performance.
At the height of the summer, Anthropic moved for summary judgment, citing fair use in relation to the pirated and purchased copies used for training LLMs and for creating a permanent library.
Training copies
The authors’ first argument turned on the similarity of training LLMs to the human creative process. They argued that the purpose of training was to memorise their works’ creative elements or, put differently, to teach the model to read and write: in short, an inherently human process that ‘should’ fall outside the first fair use factor, the purpose and character of the use. In stark contrast, Judge Alsup found that training is a ‘quintessentially transformative’ use,
stating that: “The technology at issue was amongst the most transformative many of us will see in our lifetimes.”
Even if the use was transformative, the authors argued that Anthropic engaged in extensive copying that was not strictly necessary. And while entire books were copied, Judge Alsup found that the training copies differed from the works’ ordinary use (e.g. reading). Additionally, although Anthropic demonstrated that it could have used a smaller set of books, in terms of output no portion of the works was exposed to the public. And though the process generates potentially competing works and could foreclose future licensing opportunities for authors, Judge Alsup noted that shielding authors from competition is not a purpose of copyright.
The central library
Anthropic’s library comprised digital copies of lawfully acquired print books, pirated digital books, and copies of each created through data mixing:
1. Purchased library copies
Here the authors complained that Anthropic “destructively” changed the format of their books from print to digital. However, as Anthropic destroyed the print versions, no new copies were created, and the process also eased storage and enabled searchability. This echoed cases like
Google Books,
Sony and
Napster, which affirm digitisation as falling outside the remit of the copyright holder’s interest, at least in certain specific cases. Nor was there any issue with the amount taken, as format shifting required the whole work.
2. Pirated library copies
Unsurprisingly, the pirated copies did not benefit from Anthropic’s argument that they had future potential to train LLMs. Somewhat confusingly, Anthropic itself hinted as much,
stating that:
You can’t just bless yourself by saying I have a research purpose and, therefore, go and take any textbook you want. That would destroy the academic publishing market if that were the case.
It is unlikely that these copies could be saved by fair use, particularly when it remains uncertain whether they would ever be used to train LLMs. Further, Anthropic’s aim, in the words of Judge Alsup, to acquire “all the books in the world” and keep them even if it decided not to make copies for training, directly impacts the works’ value and would likely destroy the “entire publishing market”.
A (brat-ish) outcome
The case therefore hung on the pirated copies and on copies that had not (yet?) been used to train LLMs. The former were not transformative, and the latter remained inconclusive for lack of evidence. Both would require a trial.
But in a twist worthy of being deemed “AI copyright autumn”, Anthropic soon agreed to pay at least USD 1.5 billion plus interest to the authors (now a class action). With approximately 500,000 works at issue, that amounts to USD 3,000 per work. Anthropic has also agreed to destroy the pirated datasets and, in exchange, avoids litigation relating to conduct up to 25 August 2025.
Not one to be left out, Judge Alsup
commented that he was “disappointed that counsel left important questions to be answered in the future”, postponing approval of the class settlement and ordering the parties to address 34 questions (
here and
here) relating to it. Many of the questions centre upon unpacking the approach to the settlement, particularly multiple-claim scenarios (authors and/or publishers) and the potential
“gamesmanship” of the process.
The parties’
joint response swayed Judge Alsup, as two weeks later he
reportedly approved the settlement.
Described as “the largest copyright recovery of all time”, the settlement allows anyone who believes that Anthropic may have downloaded their books from the pirated sources to register for the class action
here. However, potential class members must have had their book downloaded before August 2022, the book must have an ISBN or ASIN, and it must have been registered with the US Copyright Office before it was downloaded.
Some have already
speculated about what authors will eventually receive, after legal fees, given the narrow qualifying criteria and multiple-claim scenarios:
[T]raditionally published authors might see around $1,000-1,500 per book. Self-published authors who own their rights would keep more. Academic authors or others who signed away their rights might get nothing.
While this saga underlines the importance of licensing as a departure point from sticky copyright questions, one can’t help but think we have been launched into more chaos. It is worth highlighting that Anthropic is classed as a startup whose valuation has steadily increased following Claude’s release in March 2023 and which is backed by Amazon. It also
raised USD 13 billion in funding at a USD 183 billion post-money valuation while nutting out the details of the settlement.
Bartz marks a turning point for copyright in a post-AI world but, for Anthropic, alongside its supporters and competitors, perhaps this is simply the cost of doing business.
Some have quoted former Google CEO Eric Schmidt’s comment last year that:
[I]f your product takes off, you “hire a whole bunch of lawyers to go clean the mess up,” because “if nobody uses your product, it doesn’t matter that you stole all the content.”
But, for this Katfriend, Mark Zuckerberg’s exhortation to “move fast and break things” seems more appropriate. Only here the thing that has been broken is the social and cultural value of human creativity, priced at USD 3,000 per work (and only for copying pirated books).