Use of AI in the patent industry: Solving the confidentiality problem

Rose Hughes, Oct 2, 2025

As patent attorneys, we are constantly told that AI and Large Language Models (LLMs) are poised to disrupt the profession, and that we must all leap onto the AI bandwagon or be left behind. However, when this Kat talks to fellow patent professionals away from the LinkedIn and conference-circuit hype, it appears that the majority of the profession remain sceptical. Indeed, whatever the AI patent tool providers may tell you, many patent attorneys are not using any form of AI in their daily practice. This hesitation appears to stem from two major concerns: first, the understandable and legally mandated obsession of patent attorneys with confidentiality; and second, the unfortunate tendency of LLMs simply to make things up in the form of hallucinations. In the first of two posts looking at how patent attorneys may navigate these issues, we tackle the all-important issue of confidentiality.

The importance of confidentiality is absolute

Other than the military, there is perhaps no sector in which maintaining the confidentiality of client data is as important as the patent industry. Everything made available to the public by any means before the filing date of a patent application may be cited against the novelty and inventiveness of the invention (Article 54 EPC). When drafting a patent, patent attorneys are dealing with the most sensitive client data (their valuable IP) at the most sensitive time (just before the patent application is filed). Any disclosure before the filing date may invalidate the patent and cause the client incalculable damage.

[Image: AI helper]

Worrying about the effect of new technology on confidential information is also hardly a new problem for our profession. Cloud computing raised similar issues when it first appeared. Even the simple act of Google searching raises confidentiality concerns, given that merely searching for an inventive concept may prompt Google to adapt its search suggestions or begin auto-completing related queries, in a way that could impact the validity of the invention. Patent attorneys in the pharma sector likewise know not to BLAST a novel antibody sequence or search for a new chemical structure in public databases. LLMs are just the latest tool to which these considerations apply.

The patent attorney's code of conduct

Any use of AI by patent attorneys is governed by the same codes of conduct that apply to our usual practice, including the epi Code of Conduct for European professional representatives and the IPReg Core Regulatory Framework for UK attorneys.

According to Article 1 of the epi Code of Conduct, the basic task of a European Patent Attorney is to serve as a reliable adviser to persons interested in patent matters. We "must at all times give adequate care and attention and apply the necessary expertise to work entrusted" to us by our clients. IPReg's Core Regulatory Framework Article 1.8 also requires that "[y]ou keep your client's affairs confidential unless permitted by law or your client consents". As patent attorneys we are thus bound by our professional codes of conduct to keep our clients' data confidential. This requirement applies to the use of AI just as it applies to any other tool or software.

The epi guidelines on the Use of Generative AI in the Work of Patent Attorneys

To help us navigate the issues arising in the LLM age, epi has provided some guidelines on the Use of Generative AI in the Work of Patent Attorneys. These guidelines begin by emphasising that "[w]hen using AI of any kind in professional work, a Member must adopt the highest possible standards of probity; must take all reasonable steps to maintain confidentiality when this is required". To this end, Guideline 1 requires that "Members should inform themselves about both the general characteristics of generative AI models and the specific attributes of any model(s) employed in their professional work, in terms of (at least) the key aspects of prompt confidentiality".

Similarly, Guideline 2a requires that "Members when using generative AI must, to the extent called for by the circumstances, ensure adequate confidentiality of training datasets, instruction prompts and other content transmitted to AI models. If there is doubt that confidentiality will be maintained to a level that is appropriate to the prevailing context the AI model in question should not be used."

Finally, Guideline 2b specifies that "in ensuring adequate confidentiality, Members must inform themselves about the likelihoods and modes of non-confidential disclosures deriving from use of specific AI models."

The epi guidelines on generative AI therefore confirm for us, in case we were in any doubt, that the requirement for protecting client confidential data also applies in the context of AI models. The epi guidelines also helpfully point out that there are many risks involved with the use of LLMs.

However, patent attorneys may still be left wondering how they are to "inform themselves adequately" and how they may address or resolve any doubts with respect to the use of AI tools. Given the uncertainty, the potential risks and the lack of clear practical steps that may be taken to address the issues, it is unsurprising that many firms and companies have simply opted not to use any form of AI at all, blocking employees' access to the foundational LLMs.

Whilst avoiding AI may be workable in the short term, it is unlikely to be a sustainable position, unless you are on the brink of retirement. Many firms are therefore also tentatively exploring how AI may be used by employing third-party solutions offering to revolutionise the patent drafting and prosecution process. Anecdotally, this mostly seems to entail appointing a partner or two to investigate the various offerings.

There are varying degrees to which these AI patent tool providers have dealt with confidentiality issues. When dealing with these (often very nascent) companies, patent attorney firms need to interrogate two critical confidentiality questions. First, what is the relationship, and what are the confidentiality provisions, between the AI patent tool provider and the provider of the underlying LLM? Second, what are the confidentiality policies and terms of use of the AI patent tool providers themselves?

Using AI for patents? It probably uses LLMs

All AI patent tool providers, without exception, will be using foundational models to some degree. If they are not, they simply will not have a tool that is useful for language analysis or generation. However, a common misconception appears to be that AI patent drafting companies are "training" their own AI models. This is not the case. Training an AI model capable of even basic patent drafting or prosecution tasks requires the sophistication of an LLM from a foundational LLM lab such as OpenAI, Google or Anthropic. Any AI patent drafting or prosecution service provider will therefore be sending data to one of these LLMs. This means that, when using an AI patent tool, it is critical for a patent attorney to ask to see the agreement or terms of use that the AI patent tool provider has with the LLM provider. In particular, it is essential for the patent attorney to establish 1) whether any user input or output data from the tool is retained by the LLM provider and 2) whether the LLM provider trains or updates its model based on user data from the tool. To maintain an adequate level of confidentiality for client data, the answer to both of these questions needs to be "no".
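To make this concrete, below is a minimal, hypothetical sketch of what the core of such a tool typically looks like under the hood. The function name draft_claims and the prompt wording are invented for illustration, and the OpenAI Python SDK is used purely as an example of a foundational LLM API; the point is the data flow, not any particular product.

```python
# Hypothetical sketch: an "AI patent tool" is typically a thin wrapper
# around a foundational LLM API. The OpenAI Python SDK is used here as
# an example; draft_claims() and the prompts are invented for illustration.
from openai import OpenAI

client = OpenAI()  # authenticates with the tool provider's API key

def draft_claims(invention_disclosure: str) -> str:
    # The client's confidential disclosure is transmitted, in full, to the
    # LLM lab's servers. Whether it is retained there, and whether it is
    # used for training, is governed by the agreement between the tool
    # provider and the LLM provider; nothing in this code controls it.
    response = client.chat.completions.create(
        model="gpt-4o",  # a foundational model, not one trained by the tool provider
        messages=[
            {"role": "system", "content": "You are a patent drafting assistant."},
            {"role": "user", "content": f"Draft independent claims for:\n{invention_disclosure}"},
        ],
    )
    return response.choices[0].message.content
```

The sketch illustrates why the confidentiality question cannot be answered by inspecting the tool itself: the answer lives in the contract between the tool provider and the LLM lab.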

The difference between training and optimisation 

Whilst AI patent tool providers are very much not building their own LLMs, they will still be highly interested in improving their software and LLM prompting, and therefore in the form and content of user inputs as well as how users adapt their outputs. However, this information is also likely to contain highly sensitive client data. Depending on the customisability of the tool, user inputs and outputs may also contain highly expert knowledge from the patent attorney themselves. AI patent tool providers may wish to use this information to improve their software and LLM prompts, for example by incorporating information on how to draft a certain type of patent application, or how to respond to a certain form of EPO objection. To protect both their own and their clients' data, it is therefore also critical that patent firms establish what user data the AI tool provider has access to, how long it will be stored and how it will be used. The terms of use also need to be clear in this respect.
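As a purely hypothetical illustration of what "optimisation" without "training" can look like in practice, the sketch below shows a tool provider logging prompts, outputs and user edits in order to refine its own prompt templates. All names and the logging scheme are invented; no specific provider's practice is implied.

```python
# Hypothetical illustration: a tool provider logging user interactions to
# optimise its prompt templates, without training any model. All names
# and the logging scheme are invented for illustration.
import json
from datetime import datetime, timezone

def log_interaction(user_prompt: str, model_output: str, user_edits: str) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": user_prompt,   # may embed the client's confidential disclosure
        "output": model_output,  # the LLM's draft text
        "edits": user_edits,     # the attorney's expert corrections
    }
    # Such a log captures both confidential client data and the attorney's
    # own know-how, which is why retention and access terms matter even
    # where no model "training" ever takes place.
    with open("interaction_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```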

Beware of brand trust and the LLM Trojan horse 

It is remarkable how much familiarity breeds trust, of which the widespread adoption of Microsoft's Copilot is a classic example. Whilst many companies shun the use of LLMs from newer players in the industry, such as OpenAI, these same companies appear perfectly comfortable adopting Microsoft's equivalent LLM offering and integrating it throughout their business. However, Copilot is an LLM like any other, hungry for client training data to improve its output (and Copilot arguably does need some improvement compared to some of the other major players...). The same questions therefore need to be asked of Copilot as of any other software: is your data retained by Microsoft, and is it being used to train the model? Like the other LLM providers, Microsoft has different Copilot offerings, e.g. for individuals and companies, with different training and data retention provisions. A patent attorney needs to know that they have the right protections under the appropriate service agreement when submitting data to the LLM, even when the LLM is provided by as trusted and ubiquitous a brand as Microsoft.

On top of this is the insidious, Trojan horse-like use of AI in patent software solutions. You will be hard-pressed to find any patent software, including search tools and portfolio analysis software, that does not make some use of AI. In many cases this will be presented as an AI option button. If this AI tool is analysing or generating language, it is highly likely to be sending data to an underlying LLM from a foundational LLM lab. The same questions need to be asked with respect to this use of AI as with any LLM use in connection with any client confidential data: what data is retained and is the LLM trained on the data? 

Some practical guidance

Away from the AI hype, most patent attorneys are still trying to figure out if and how they can use AI safely and effectively in their day-to-day work. A lack of AI user expertise and concerns over confidentiality have led many to consider outsourcing the problem to third-party software providers. However, we know from the epi guidelines that using a third-party tool does not relieve patent attorneys of their duty to inform themselves of the risks, whatever high-level assurances the AI service providers may give.

The practical steps that all patent attorneys can take are as follows. First, establish whether LLMs are being used in the AI software solutions you employ (spoiler alert: they probably are). Next, establish which LLMs are being used (ChatGPT, Claude, Gemini, or a combination). Then ask the AI tool provider for copies of the relevant agreements and terms of use they have with the LLM provider, and check the data retention policy and whether user data is used to train the model. Finally, satisfy yourself that the AI tool providers are not themselves using your data to optimise their code and LLM prompting.

Final thoughts

The epi guidelines on the use of generative AI tell us all to be careful and diligent in our use of AI. However, what many attorneys currently seem to lack is the requisite knowledge and expertise to ensure adequate care and diligence when using AI tools provided by third parties. There appears to be a general lack of appreciation of the ubiquitous nature of LLMs, and of the fact that the vast majority of AI tool providers are not training their own LLMs or AI models, but are providing a user interface for engagement with the LLMs. The appropriate question is therefore usually not "will my data be used to train your model?", but "what agreement do you have with Google, Anthropic, OpenAI, Microsoft etc. to ensure that the LLM lab is not using my data to train their model?". However, as outlined above, once these fundamentals are understood, there are simple practical steps that can be taken to ensure that your clients' data is adequately protected and our professional codes of conduct are followed. Good luck everyone!
