John,
Thanks for that. I had tested the previous version of DeepSeek in December with J-E translation and with prompts in Japanese, but it wasn't very good; in fact, it often mixed Chinese with Japanese. The R1 release that has been attracting so much attention over the past two weeks was better in my tests with nontranslation tasks but still not as good as the latest versions of ChatGPT or Claude. (It is interesting to read R1’s reasoning traces, though.)
Just now, I did a side-by-side test of translation with DeepSeek R1 and the just-released ChatGPT o3-mini. I ran the test on Perplexity Pro, which hosts the R1 model in the US (and thus doesn’t have some of the Chinese censorship of political topics) and which has just added o3-mini as well. The text was a speech I translated a month ago from Japanese to English, preceded by a long prompt specifying the speech’s purpose and audience and the sort of style I wanted.
An initial comparison of the outputs suggested that, while the R1 translation didn't seem bad, o3-mini produced a writing style closer to what I had asked for in the prompt: smoother and more natural.
But then I noticed that the output length was 5,855 characters for R1, 9,052 characters for o3-mini, and 11,021 for my own polished version. Comparing the three translations side-by-side with the original, I discovered that R1 had omitted entire paragraphs toward the end of the speech, and that o3-mini had switched to a strange abbreviated style (using slashes instead of “and” between noun phrases, for example) toward the end as well. The vanilla versions of ChatGPT, Claude, and Gemini that I ran the same prompt and text through a month ago had had none of those problems.
Maybe R1 is indeed “insanely good” for Chinese translation. For Japanese translation, at least, it looks like the American nonreasoning models are better.
In other AI news, the start-up Sakana AI in Tokyo yesterday released a small LLM optimized for Japanese that you can download and run offline from your browser cache:
I did a couple of J-E translation tests on it just now, with the inference performed locally in Chrome on my M1 Mac mini. It was slow but not unusably so. The quality of the translation wasn't very good, though; it was similar to Google Translate from a few years ago. That's not surprising considering how small the model is.
I haven’t tried other locally hosted models recently, but the new small open-weight models from DeepSeek, Qwen, Mistral, and others are reported to be pretty good for coding. Meta is apparently getting ready to release new versions of Llama as well. At some point I will download and try the latest versions with J-E translation.
Nearly all of the translation I do these days is for documents that will soon be published on the web, so confidentiality is not a big consideration and I can use cloud-based LLMs. Translators who cannot use cloud-hosted models because of confidentiality concerns might want to try out the smaller, distilled models that can be run locally.
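If anyone wants to experiment along those lines, here is a rough sketch of what running one of the distilled R1 checkpoints locally might look like with the Hugging Face transformers library. The model name is just one of the distilled checkpoints DeepSeek published (substitute whatever size your hardware can handle), and the sample sentence is simply the famous opening of Sōseki's "Wagahai wa neko de aru":

# A rough sketch of local J-E translation with Hugging Face transformers.
# The checkpoint name below is an assumption for illustration -- pick a
# model and size that fit your machine.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # assumed checkpoint
)

prompt = (
    "Translate the following Japanese text into natural English.\n\n"
    "吾輩は猫である。名前はまだ無い。"
)
result = pipe(prompt, max_new_tokens=512)
print(result[0]["generated_text"])

The first run downloads the weights, but after that the model is cached and the inference itself happens entirely on your own machine, which is the point for confidentiality-sensitive work.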
Tom Gally