Recent headlines are inundated with AI licensing agreements. Most relate to the use of copyright works to train AI models (IPKat), in parallel with copyright infringement lawsuits. However, a new focus has emerged: downstream AI service providers that build upon foundation models. They use LLMs in a range of scenarios: fine-tuning them for targeted applications, connecting them to external data sources to generate responses through retrieval-augmented generation (RAG), allowing them to act as authorised agents that automate tasks, integrating them with AI interfaces in user-facing products, or embedding their features into existing software services.
‘advanced AI-powered search engine and answer engine’. Trained on multiple LLMs (e.g. GPT-4, Claude, and Llama), it integrates ‘real-time web searching to provide precise, well-sourced, and up-to-date responses to user queries’ through RAG. The product of ‘
‘with citations’. As an
its efficiency for ‘planning trips, researching purchases, managing job searches, eliminating to-do lists and the online chores that span endless tabs’.
Does connecting a hybrid LLM to the internet infringe copyright law? Perplexity AI maintains that: ‘it doesn’t train large foundation models (like OpenAI or Anthropic) but instead uses available data to answer questions in real time, citing sources […]’. However, lawsuits commenced by Reddit, Dow Jones and NYP Holdings, and Encyclopaedia Britannica and Merriam-Webster in the US, alongside Nikkei and Asahi Shimbun (major newspaper publishers) in Japan, indicate that the very boundaries of the internet are being redrawn (again). This potentially extends to the UK, where the BBC has threatened legal action against Perplexity AI.
Generally, these cases centre on allegations that Perplexity AI unlawfully accessed copyright works for use in its RAG database. Yet the claimants differ in how they frame the technical acts as infringing uses. Reddit alleges that ‘Perplexity AI’s business model is to effectively take Reddit’s content from Google search results, feed them into a third party’s LLM, and call it a new product’. Its focus is on RAG requiring unlawful circumvention of technological protection measures (TPMs) designed to prevent scraping, all without a licence and despite Perplexity AI’s $20 billion valuation. The complaint details that Reddit caught Perplexity AI ‘red-handed’ by creating a ‘test-post’, accessible only through Google search results, whose content appeared in Perplexity AI’s responses to users hours later.
Dow Jones argues, inter alia, that the RAG database comprises unauthorised reproductions and that Perplexity's output contains full or partial reproductions. Encyclopaedia Britannica and Merriam-Webster claim, inter alia, copyright infringement relating to scraping, and unlawful reproduction at the input and output stage.
All claimants focus on the harm Perplexity AI inflicts on their revenue because users are encouraged to ‘skip the links’, eliminating traffic to the source. In response, Perplexity AI points towards its ‘cited source’ feature, alongside referral traffic, as a ‘market differentiator of its answer engine’ that legitimises the RAG system. It contends that responses not only include source citations, but that these citations are ‘clickable’ and drive traffic to the source.
Perplexity AI, across all lawsuits, is described as a bad faith actor. Reddit leans on Cloudflare’s assessment that ‘Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives’ akin to a ‘North Korean hacker’; not a ‘reputable AI company’. Ironically, on a subreddit thread, Perplexity AI replies that ‘this is a sad example of what happens when public data becomes a big part of a public company’s business model’ and deems the lawsuit ‘the opposite of an open internet’. It also accuses Cloudflare’s systems of being ‘fundamentally inadequate for distinguishing between legitimate AI assistant and actual threats’, noting that ‘[i]f you can’t tell a helpful digital assistant from a malicious scraper, then you probably shouldn’t be making decisions about what constitutes legitimate web traffic’.
Perplexity AI’s self-characterisation as the ‘open web’s saviour’ is not lost on Reddit, which emphasises its own role as a ‘steward of its users’ communities discussions, and authentic human discourse’ that Perplexity AI’s use allegedly puts at risk. More broadly, this Kat sees a discussion that tests the very boundaries of the copyright system. Agentic AI systems remind her of the challenges addressed in Meltwater, where the CJEU found that end-user browsing does not require a licence. This entrenched the use of ‘walled gardens’ (e.g. paywalls, the robots.txt protocol, and subscriptions) by rightsholders to control access to their content and data, and saw the introduction of the press publishers' right (article 15 of the DSM) in the EU. The latter is to be addressed in Like Company v Google, C-250/25, a CJEU referral relating to Google’s AI chatbot, Gemini, training LLMs and generating similar output (IPKat here).
Licensing deals
Behind the scenes, an increasingly complicated web of agreements is again shifting the internet's copyright-related power asymmetries. Perhaps the most notable is Getty Images and Perplexity AI's multi-year licensing agreement to ‘boost AI-powered search visuals’, where Getty:
[P]rovide[s] visuals to Perplexity through an API integration that will allow the AI platform to pull licensed images directly from its vast image library, giving users access to premium content with proper attribution.
Perplexity will include image credits and source links for generated answers. This follows Getty’s largely unsuccessful first-instance claim against Stability AI in the UK (IPKat here). Perplexity has also implemented a ‘Publisher's Program’ where revenues are allegedly shared with TIME, Der Spiegel, and Fortune, amongst others. Perplexity has pledged that the ad-revenue share will be a ‘double digit’ percentage. It views this as a way to avoid ‘cannibalizing publishers or competing with them’ and to ensure that ‘there are these vibrant and diverse business models and revenue streams’. A similar approach is taken by OpenAI, which has signed licensing agreements with publishers including News Corp (Dow Jones), The Financial Times, and Axel Springer. Indeed, Dow Jones, whose case will proceed to trial despite Perplexity AI’s motion to dismiss, has remarked that they:
‘[A]pplaud principled companies like OpenAI, which understands that integrity and creativity are essential if we are to realise the potential of Artificial Intelligence’, in contrast to Perplexity AI’s ‘content kleptocracy’.
Comment
The emerging picture is incredibly hazy. All sides portray themselves as integral protectors of the internet; however, they also forge new online revenues dependent on the very openness they allegedly protect: from Perplexity AI’s ad monetisation and subscription model, where users pay for the summarisation of the very data they publish on Reddit, to Reddit launching ‘Interactive Ads’ that let ‘brands build custom, interactive ad experiences directly for Reddit’s 100,000+ communities – inviting redditors to play, participate, and explore directly within the ad itself’.
Linked to the enshittification of the internet, the monetisation of ‘public data’ leaves one clear loser: us. And from an IP perspective, the release of the Perplexity Patent Research Agent may trigger alarm over the commercialisation of openly accessible patent data, given the significance of public disclosure and knowledge sharing within the patent system.
As a ‘first-mover’ in downstream AI use, Perplexity AI is in a unique position to shape the future of copyright and data accessibility. And while this Kat enjoys the vivid allegories, the Perplexity AI litigation and licensing agreements remove key public policy issues from wider public discourse. They also limit our understanding of the broader socio-cultural impact of generative AI and of potential new ways forward that perhaps exist beyond the boundaries of copyright law entirely.
Four Seasons in One Day holds a special place in this Kat’s heart. Crowded House sings of sunshine on black clouds, the feeling of multitudes coexisting; that one can experience four seasons in one day. While they are hesitant to predict the weather, it seems that Perplexity AI has steered its course through multiple storms and that it does indeed pay to make (bold) predictions.