Patents.google.com -- machine translation questions...

WARREN SMITH

May 11, 2026, 11:18:35 AM

to honyaku

It is somewhat unusual to find myself on the consumer side of translation, rather than acting as the translator.

I am currently handling a portfolio of approximately 30 recently published Chinese patent applications that need to be filed in the United States. As part of an initial assessment, I conducted a controlled comparison using one application. I reviewed the machine translation available on patents.google.com and compared it against the source Chinese text. I then provided both the original text and the machine translation, along with the PDF of the source document, to ChatGPT for evaluation.

The results were mixed. On the one hand, the machine translation was generally serviceable and, with attorney-level edits, could form a workable basis for direct U.S. filing. However, I observed several issues that are difficult to justify given the maturity of current translation systems. For example, a numerical value of 1.035 in the source text appeared as 1.065 in the English translation. There was also at least one instance of a fully omitted sentence. These are not the types of errors one would expect in a relatively controlled technical domain.

ChatGPT performed well as a secondary review tool. It identified not only the numerical discrepancy and omission, but also flagged internal inconsistencies in the source material itself, such as a mismatch between “nitrogen” in the abstract and “argon” in the claims.
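The numerical discrepancy described above (1.035 becoming 1.065) is the kind of error that can also be screened for mechanically before any LLM review. The following is a minimal sketch, not a description of how ChatGPT or Google Patents works: the function name `numeric_mismatches`, the regular expression, and the sample sentences are all invented for illustration, and it assumes the source writes its values with digits rather than Chinese numerals.

```python
import re
from collections import Counter

# Matches integers and decimals such as 1.035 (illustrative pattern only;
# it does not handle Chinese numerals or thousands separators).
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def numeric_mismatches(source_text, translated_text):
    """Return numbers that occur more often in one text than in the other."""
    src = Counter(NUMBER.findall(source_text))
    tgt = Counter(NUMBER.findall(translated_text))
    missing = sorted((src - tgt).elements())     # in source, absent from translation
    unexpected = sorted((tgt - src).elements())  # in translation, absent from source
    return missing, unexpected

# Invented sample sentences, not taken from any actual application.
source = "所述比值为1.035，温度为200℃。"
translation = "The ratio is 1.065 and the temperature is 200 °C."

missing, unexpected = numeric_mismatches(source, translation)
print("missing from translation:", missing)
print("not in source:", unexpected)
```

A check like this only flags candidates for a human to look at. It would not catch an omitted sentence that contains no numbers.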

What was more surprising was the variability across the portfolio. A second application, similar in subject matter and filed within a comparable timeframe, had a machine translation of significantly lower quality, with numerous substantive errors. This raises a practical concern: why is translation quality inconsistent across closely related documents, even when the underlying technology and filing dates are similar?

As a side observation, and for context, the economic transition from translation to patent practice has been notable. Practicing patent law has not approached the income level I previously achieved as a translator. In addition, while my Japanese language skills remain useful, they have not proven to be as strong a differentiator in my current role as one might expect. The most valuable carryover has been the familiarity with patent drafting and structure developed through years of translation work.

Warren

Herman

May 11, 2026, 1:14:57 PM

to hon...@googlegroups.com

On 2026-05-11 08:18, 'WARREN SMITH' via Honyaku E<>J translation list wrote:
> It is somewhat unusual to find myself on the consumer side of
> translation, rather than acting as the translator.
> I am currently handling a portfolio of approximately 30 recently
> published Chinese patent applications that need to be filed in the
> United States. As part of an initial assessment, I conducted a
> controlled comparison using one application. I reviewed the machine
> translation available on patents.google.com and compared it against the
> source Chinese text. I then provided both the original text and the
> machine translation, along with the PDF of the source document, to
> ChatGPT for evaluation.
> The results were mixed. On the one hand, the machine translation was
> generally serviceable and, with attorney-level edits, could form a
> workable basis for direct U.S. filing. However, I observed several
> issues that are difficult to justify given the maturity of current
> translation systems. For example, a numerical value of 1.035 in the
> source text appeared as 1.065 in the English translation. There was also
> at least one instance of a fully omitted sentence. These are not the
> types of errors one would expect in a relatively controlled technical
> domain.

I would say, these are not the types of errors one would expect given a
modernist conception of AI (e.g. the Star Trek computer).

However, LLMs operate on post-postmodern principles, under which
unexpected behavior can be completely expected, and in the case of LLMs
is to be expected (even though it is unexpected).

I have actually noted recently an increase in cases of omitted text and
untranslated source appearing in the translation output of several
different LLM/MT systems, which I don't recall seeing in the past at all.

> ChatGPT performed well as a secondary review tool. It identified not
> only the numerical discrepancy and omission, but also flagged internal
> inconsistencies in the source material itself, such as a mismatch
> between “nitrogen” in the abstract and “argon” in the claims.
> What was more surprising was the variability across the portfolio. A
> second application, similar in subject matter and filed within a
> comparable timeframe, had a machine translation of significantly lower
> quality, with numerous substantive errors. This raises a practical
> concern: why is translation quality inconsistent across closely related
> documents, even when the underlying technology and filing dates are
> similar?

The most obvious explanation would be that different models were used
for the different translations, but even the same model can produce
highly divergent results.

An LLM has no object model of translation and always solves just one
problem: given the input (including user's prompt and LLM's own
previously generated output), what is the most likely next token (or set
of n tokens)? So on the whole, it behaves as a chaotic system, i.e.
potentially highly sensitive to some slight difference in the starting
conditions.
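To make the next-token point concrete, here is a toy sketch of temperature sampling. It assumes a made-up three-token vocabulary and made-up scores, and says nothing about the decoding settings any real translation service uses. It only shows that a fixed set of scores can yield different tokens from run to run once sampling is involved.

```python
import math
import random

def sample_next(logits, temperature, rng):
    """Sample one token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical candidates for the next token, with hypothetical scores.
vocab = ["1.035", "1.065", "1.03"]
logits = [2.0, 1.5, 0.5]

# Same scores, different random seeds: the chosen token can differ.
for seed in range(5):
    rng = random.Random(seed)
    print(seed, vocab[sample_next(logits, temperature=1.0, rng=rng)])

# At a very low temperature the highest-scoring token is chosen every time.
print(vocab[sample_next(logits, temperature=0.01, rng=random.Random(0))])
```

Because each sampled token is appended to the input for the next step, one early divergence of this kind can change everything generated after it.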

Herman

Rene

May 12, 2026, 2:45:23 AM

to hon...@googlegroups.com

On Tue, May 12, 2026 at 12:18 AM 'WARREN SMITH' via Honyaku E<>J translation list <hon...@googlegroups.com> wrote:

> What was more surprising was the variability across the portfolio. A second application, similar in subject matter and filed within a comparable timeframe, had a machine translation of significantly lower quality, with numerous substantive errors. This raises a practical concern: why is translation quality inconsistent across closely related documents, even when the underlying technology and filing dates are similar?

Is that not because these gigantic neural networks, the details of which even their creators do not understand, function more like a human brain than a traditional computer program which follows set algorithms?

It seems that with AI you are in effect handing over the source to a bunch of unknown human translators...

Rene von Rentzell, Tokyo

Herman

May 12, 2026, 3:40:42 AM

to hon...@googlegroups.com

On 2026-05-11 23:43, Rene wrote:
> On Tue, May 12, 2026 at 12:18 AM 'WARREN SMITH' via Honyaku E<>J
> translation list <hon...@googlegroups.com> wrote:
>
> What was more surprising was the variability across the portfolio. A
> second application, similar in subject matter and filed within a
> comparable timeframe, had a machine translation of significantly
> lower quality, with numerous substantive errors. This raises a
> practical concern: why is translation quality inconsistent across
> closely related documents, even when the underlying technology and
> filing dates are similar?
>
>
> Is that not because these gigantic neural networks, the details of which
> even their creators do not understand, function more like a human brain
> than a traditional computer program which follows set algorithms?
> It seems that with AI you are in effect handing over the source to a
> bunch of unknown human translators...
>

In this context, the human brain is probably more like a traditional
computer program. It would be highly unusual to find that the same human
translator working at around the same time on several similar documents
would produce translations of notably different quality, wouldn't it?

Herman

Rene

May 12, 2026, 4:32:35 AM

to hon...@googlegroups.com

On Tue, May 12, 2026 at 4:40 PM 'Herman' via Honyaku E<>J translation list <hon...@googlegroups.com> wrote:

> In this context, the human brain is probably more like a traditional
> computer program. It would be highly unusual to find that the same human
> translator working at around the same time on several similar documents
> would produce translations of notably different quality, wouldn't it?

Agree, but how do you know you can look at a giant server farm as ONE brain? One query might be handled by one part of the giant network, and another by another. Consider that these things are probably dealing with millions of things at the same time. So it would be more like tossing a couple of documents into an anonymous pool of unknown translators.

Rene von Rentzell, Tokyo

Herman

May 12, 2026, 6:37:49 AM

to hon...@googlegroups.com

On the assumption that the exact physico-chemical state of the brain of
a single translator varies more from moment to moment in a functionally
relevant sense than do the individual CPUs or other hardware making up a
server farm.

Herman

Warren Smith

May 12, 2026, 9:05:11 AM

to hon...@googlegroups.com

> It seems that with AI you are in effect handing over the source to a bunch of unknown human translators...

That's the thing... with the proper instruction I trust the AI more than I do an unknown human translator. If I could be assured that I would have someone like Herman or Bill or Rene to do my translations, I would favor humans, but I don't know what I am getting with Chinese translation from a translator I don't know. I would much rather use a committee of several AIs checking each other's work than use an unknown human translator.

Warren


Tom Gally

May 12, 2026, 9:33:34 AM

to hon...@googlegroups.com

I haven’t used LLMs to translate technical documents like patents, but for the formal speeches that I translate occasionally, I find that even the latest models occasionally make mistakes with numbers, omit phrases or even entire sentences, etc., especially if I set them up to respond quickly, i.e., without extended reasoning.

When I have LLMs check translations, they do catch errors but not completely or consistently. Different models will catch different mistakes (as well as what are, in my opinion, nonmistakes), and even the same model will identify different problems when given the same translation to check in a new conversation. So for high-stakes translations, I run the checks multiple times with multiple models, correcting the mistakes found after each run before moving on to the next. It is an asymptotic process: the translations get closer and closer to perfection but never quite reach it. I have never produced a perfect translation myself in any case, and the translations I create together with these imperfect LLMs are better than those I could do on my own.
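The multi-run, multi-model checking loop described above might be sketched roughly as follows. This is an outline under stated assumptions, not the actual pipeline: the checkers here are plain stand-in functions rather than calls to any real model or to OpenRouter, and the names `review_rounds`, `apply_fix`, `checker_numbers`, and `checker_repeats` are hypothetical.

```python
def review_rounds(translation, checkers, apply_fix, max_rounds=3):
    """Run every checker in turn, fixing reported issues before moving on."""
    history = []
    for round_no in range(1, max_rounds + 1):
        found_any = False
        for name, check in checkers.items():
            for issue in check(translation):
                translation = apply_fix(translation, issue)
                history.append((round_no, name, issue))
                found_any = True
        if not found_any:
            break  # a clean round; this does not prove the text is error-free
    return translation, history

# Stand-in checkers: each returns a list of (wrong, right) pairs.
# A real checker would wrap a model call and parse its reply.
def checker_numbers(text):
    return [("1.065", "1.035")] if "1.065" in text else []

def checker_repeats(text):
    return [("the the", "the")] if "the the" in text else []

def apply_fix(text, issue):
    wrong, right = issue
    return text.replace(wrong, right)

draft = "The ratio is 1.065 and the the gas is argon."
checkers = {"numbers": checker_numbers, "repeats": checker_repeats}
final, history = review_rounds(draft, checkers, apply_fix)
print(final)
for entry in history:
    print(entry)
```

In practice each fix would be reviewed by a person before being applied, since checkers also report things that are not mistakes.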

As I described here back in January, I have Claude Code do most of the translation and checking process automatically using multiple models through OpenRouter (a “committee of several AIs,” to use Warren’s apt phrase). About a month ago, I also started having Claude build a knowledge wiki for itself about the translations I do using Andrej Karpathy’s framework (https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f). The speech-translation season is at a lull right now, so I haven’t had a chance yet to check whether providing this knowledge wiki to the AI committee will actually lead to better translations.

If anyone wants to see a similar AI-produced knowledge wiki, the one being produced by Claude using Karpathy’s framework for my dictionary site is here: https://github.com/tkgally/je-dict-1/blob/main/planning/wiki/index.md

I have similar wikis going for some other private projects, including a university class I teach. They help me organize my thoughts a bit if nothing else.

Tom Gally

Herman

May 12, 2026, 6:50:24 PM

to hon...@googlegroups.com

On 2026-05-12 06:04, Warren Smith wrote:
> It seems that with AI you are in effect handing over the source to a
> bunch of unknown human translators...
>
> That's the thing... with the proper instruction I trust the AI more than
> I do an unknown human translator. If I could be assured that I would have
> someone like Herman or Bill or Rene to do my translations, I would favor
> humans, but I don't know what I am getting with Chinese translation from
> a translator I don't know. I would much rather use a committee of
> several AIs checking each other's work than use an unknown human
> translator.
>
>

That is a reasonable approach. I would note, though, that varying the
instructions would probably be much more effective in principle than
simply varying the model, because you are in effect working with a hash
function here, and the optimal solution -- one that avoids convergence
on shared hallucinations -- would be the one obtained by maximally
scattering your keys within the relevant region of the search space,
given that the model is not going to exhaustively probe every matching
entry in its data table (and given also that the different models are
more or less the same algorithms operating on more or less the same
training data).

But here is a thought experiment that may be of interest:

If a committee of LLMs is engaged by Party A to produce a translation
that will be submitted as evidence in litigation, and a committee of the
same LLMs is engaged by Party B to find a way of impeaching the
translation furnished by A, who will tend to win?

Herman Kahn

Rene

May 13, 2026, 2:52:35 AM

to hon...@googlegroups.com

On Wed, May 13, 2026 at 7:50 AM 'Herman' via Honyaku E<>J translation list <hon...@googlegroups.com> wrote:

> (and given also that the different models are
> more or less the same algorithms operating on more or less the same
> training data).

But do these things really have "algorithms"? The way I understand it, they operate by comparing "weights" over widely distributed data sets. And their creators have no way to understand how they do that in detail.

Roman Yampolskiy has some interesting conversations about that.

> But here is a thought experiment that may be of interest:
> If a committee of LLMs is engaged by Party A to produce a translation
> that will be submitted as evidence in litigation, and a committee of the
> same LLMs is engaged by Party B to find a way of impeaching the
> translation furnished by A, who will tend to win?

Indeed. Maybe you can try something like that out on Molthub?

---

Rene von Rentzell, Tokyo

Herman

May 13, 2026, 7:20:02 AM

to hon...@googlegroups.com

On 2026-05-12 23:42, Rene wrote:
> On Wed, May 13, 2026 at 7:50 AM 'Herman' via Honyaku E<>J translation
> list <hon...@googlegroups.com> wrote:
>
> (and given also that the different models are
> more or less the same algorithms operating on more or less the same
> training data).
>
>
> But do these things really have "algorithms"? The way I understand it,
> they operate by comparing "weights" over widely distributed data sets.
> And their creators have no way to understand how they do that in detail.
> Roman Yampolskiy has some interesting conversations about that.
>
>

Yes, each of them has a specific mathematical algorithm - an extension
of the method of least squares developed by Gauss, the idea of which is
that a complex function can be approximated by a series of simple
functions.

These simple functions include multiplying the input by constants
("weights"), the values of which are established through iterative
adjustment ("training") based on an algorithm developed by Leibniz
(chain rule).

And then later, some new input is multiplied by those adjusted weights,
etc. to produce a new output.
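As a toy illustration of the steps just described, here is a least-squares fit with a single weight and a bias, adjusted iteratively using gradients obtained via the chain rule. It is a sketch only: real LLMs have billions of weights and nonlinear layers, and the function name `train`, the learning rate, and the data are all chosen for this example.

```python
def train(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y  # prediction minus target
            # Chain rule: d(err**2)/dw = 2 * err * x, d(err**2)/db = 2 * err
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        # "Training": nudge each constant against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data lying on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train(xs, ys)
print("learned weight and bias:", w, b)

# Later, a new input is multiplied by the adjusted weight to give an output.
print("prediction for x = 4:", w * 4.0 + b)
```

On this data the learned values should come out very close to 2 and 1. Nothing in the procedure says what those constants "mean"; it only adjusts them to reduce the error.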

So the creator specifies a function that based on mathematical
principles is believed to be capable of converting the expected range of
inputs to the expected range of outputs, but in terms of what exactly
any given one of the billions of such weights or constants "means" or
why it should have one value versus another, I guess it could be said
that the creators have no way of understanding that.

Does that mean we are not dealing with an algorithm here? I don't think
so, because by that logic, I would be forced to say that circumference =
2πr is not an algorithm/function/formula, because I don't understand, or
nobody understands, in detail, what the 9 in 3.14159 really means and
why it isn't 8 instead (not to mention the infinite number of other
digits).

Herman Kahn
