The IPKat has received and is pleased to host the following post by Katfriend Georgia Jenkins (University of Liverpool) on the recent Anthropic settlement. Here’s what Georgia writes:
What is the value of a pirated book copied to train LLMs? Apparently, only USD 3,000
by Georgia Jenkins
[Image: Anthropic's Claude ...]
As seasons change, it strikes this Katfriend as a good time to reflect on what she has coined “AI copyright summer”. In the UK we saw
Baroness Kidron's powerful speech on AI and creative industries, and more recently,
70+ artists demanding that Prime Minister Keir Starmer better protect copyright works against the threat of AI. While in 2024
brat summer, alongside a
viral dance, influenced brand marketing and even a presidential campaign, AI copyright summer has been more chaotic. By the end of August 2025 there were
40+ copyright-related claims against AI companies in the US, one of which is
Bartz v. Anthropic.
In August 2024, authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson commenced copyright infringement proceedings against Anthropic, the AI company behind the AI assistant
Claude. The case hinged on Anthropic’s unauthorised use of pirated and purchased copies of books to create a central library for the purpose of training the large language models (LLMs) that underlie Claude. The library comprised both ‘traditional’ copies and versions called ‘data mixes’, which
optimise training data and improve LLM performance.
At the height of the summer, Anthropic moved for summary judgment, citing fair use in relation to the pirated and purchased copies used for training LLMs and for creating a permanent library.
Training copies
The authors’ first argument turned on the similarity of training LLMs to the human creative process. They argued that the purpose of training was to memorise their works’ creative elements or, put differently, to teach the model to read and write: in short, an inherently human process that ‘should’ fall outside the first fair use factor, the purpose and character of the use. In stark contrast, Judge Alsup found that training is a ‘quintessentially transformative’ use,
stating that: “The technology at issue was amongst the most transformative many of us will see in our lifetimes.”
Even if the use was transformative, the authors argued that Anthropic engaged in extensive copying that was not strictly necessary. And while entire books were copied, Judge Alsup found that the training copies differed from the works’ ordinary use (e.g. reading). Additionally, although Anthropic demonstrated that it could have used a smaller set of books, in terms of output no portion of the works was exposed to the public. And though the process generates potentially competing works and could foreclose future licensing opportunities for authors, Judge Alsup noted that shielding authors from competition is not a purpose of copyright.
The central library
Anthropic’s library comprised digital copies of lawfully acquired print books, pirated digital books, and copies of each created through data mixing:
1. Purchased library copies
Here the authors complained that Anthropic “destructively” changed the format of their books from print to digital. However, as Anthropic destroyed the print versions, no new copies were created, and the process also eased storage and enabled searchability. This echoed cases like
Google Books,
Sony and
Napster, which affirm digitisation as falling outside the remit of the copyright holder’s interest, at least in certain specific cases. Nor was there any issue with the amount taken, as format shifting required the whole work.
2. Pirated library copies
Unsurprisingly, the pirated copies did not benefit from Anthropic’s argument that they had future potential to train LLMs. Somewhat confusingly, Anthropic itself hinted as much,
stating that:
You can’t just bless yourself by saying I have a research purpose and, therefore, go and take any textbook you want. That would destroy the academic publishing market if that were the case.
It is unlikely that these copies could be saved by fair use, particularly when it remains uncertain whether they would ever be used to train LLMs. Further, Anthropic’s aim, in the words of Judge Alsup, to acquire “all the books in the world” and keep them even if it decided not to make copies for training, directly impacts the works’ value and would likely destroy the “entire publishing market”.
A (brat-ish) outcome
The case therefore hung on the pirated copies and on copies that had not (yet?) been used to train LLMs. The former were not transformative, and the latter remained inconclusive for lack of evidence. Both would require a trial.
But in a twist worthy of being deemed “AI copyright autumn”, Anthropic soon agreed to pay at least USD 1.5 billion plus interest to the authors (now a class action). With approximately 500,000 works at issue, that amounts to USD 3,000 per work. Anthropic has also agreed to destroy the pirated datasets and, in exchange, avoids litigation relating to conduct up to 25 August 2025.
Not one to be left out, Judge Alsup
commented that he was “disappointed that counsel left important questions to be answered in the future”, postponing approval of the class settlement and ordering the parties to address 34 questions (
here and
here) relating to it. Many of the questions centre upon unpacking the approach to the settlement, particularly multiple-claim scenarios (authors and/or publishers) and the potential
“gamesmanship” of the process.
The parties’
joint response swayed Judge Alsup, as two weeks later he
reportedly approved the settlement.
Described as “the largest copyright recovery of all time”, the settlement allows anyone who believes that Anthropic may have downloaded their books from the pirated sources to register for the class action
here. However, potential class members must have had their book downloaded before August 2022, the book must have an ISBN or ASIN, and it must have been registered with the US Copyright Office before it was downloaded.
Some have already
speculated about what authors will eventually receive, after legal fees, given the narrow qualifying criteria and multiple-claim scenarios:
[T]raditionally published authors might see around $1,000-1,500 per book. Self-published authors who own their rights would keep more. Academic authors or others who signed away their rights might get nothing.
While this saga underlines the importance of licensing as a departure point from sticky copyright questions, one can’t help but think we have been launched into more chaos. It is worth highlighting that Anthropic is classed as a startup whose valuation has steadily increased following Claude’s release in March 2023 and which is backed by Amazon. It also
raised USD 13 billion in funding at a USD 183 billion post-money valuation while nutting out the details of the settlement.
Bartz marks a turning point for copyright in a post-AI world but, for Anthropic, alongside its supporters and competitors, perhaps this is simply the cost of doing business.
Some have quoted former Google CEO Eric Schmidt’s comment last year that:
[I]f your product takes off, you “hire a whole bunch of lawyers to go clean the mess up,” because “if nobody uses your product, it doesn’t matter that you stole all the content.”
But, for this Katfriend, Mark Zuckerberg’s exhortation to “move fast and break things” seems more appropriate. Only here the thing that has been broken is the social and cultural value of human creativity, priced at USD 3,000 per work (and only for copying pirated books).