Emacs as a translator's tool

Marcin Borkowski

May 29, 2020, 1:55:39 AM
to Help Gnu Emacs mailing list
Hi all,

does anyone here perform translations within Emacs? Do you know of any
tools facilitating that? There exist a few CATs, or Computer Aided
Translation systems, but - AFAIK - they are all proprietary and closed
source. Emacs seems capable of implementing at least a simple CAT, but
I could not find any existing solutions for that. (I skimmed through
the answers here:
https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
but did not find anything useful.)

The first thing I would need is a way to highlight the "currently
translated sentence" in the other window, where I would keep the
original text, with an easy way to highlight the next/previous one -
this seems very easy to do, but did anyone actually code anything like
this?

TIA,

--
Marcin Borkowski
http://mbork.pl

stardiviner

May 29, 2020, 2:22:09 AM
to Marcin Borkowski, help-gn...@gnu.org
If you're using Org Mode, I'm writing an extension based on Org Babel
and google-translate that generates translated results.

Here is my ob-translate repo: https://github.com/stardiviner/ob-translate

Also, some links which might be helpful for you:

- https://github.com/atykhonov/google-translate
- http://github.com/juergenhoetzel/babel
- https://github.com/liShiZhensPi/baidu-translate
- https://github.com/jcs-elpa/define-it
--
[ stardiviner ]
I try to make every word tell the meaning that I want to express.

Blog: https://stardiviner.github.io/
IRC(freenode): stardiviner, Matrix: stardiviner
GPG: F09F650D7D674819892591401B5DF1C95AE89AC3


Marcin Borkowski

May 29, 2020, 2:35:58 AM
to Help Gnu Emacs mailing list

On 2020-05-29, at 07:55, Marcin Borkowski <mb...@mbork.pl> wrote:

> Hi all,
>
> does anyone here perform translations within Emacs? Do you know of any
> tools facilitating that? There exist a few CATs, or Computer Aided
> Translation systems, but - AFAIK - they are all proprietary and closed
> source. Emacs seems capable of implementing at least a simple CAT, but
> I could not find any existing solutions for that. (I skimmed through
> the answers here:
> https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
> but did not find anything useful.)
>
> The first thing I would need is a way to highlight the "currently
> translated sentence" in the other window, where I would keep the
> original text, with an easy way to highlight the next/previous one -
> this seems very easy to do, but did anyone actually code anything like
> this?

OK, so I assumed nobody did it, so here's my take. Probably not
extremely well-done, but I just coded it in 15 minutes, so there you go.
Comments welcome.

--8<---------------cut here---------------start------------->8---
(defgroup ecat nil
  "Helpers for translating text in Emacs."
  :group 'convenience)

(defface ecat-highlight-face '((t :background "#e7ede7"))
  "Face for highlighting the currently translated sentence."
  :group 'ecat)

(defvar ecat-sentence-overlay nil
  "The overlay to highlight the currently translated sentence.")

(defun ecat-highlight-this-sentence ()
  "Highlight the sentence at point using an overlay."
  (interactive)
  ;; The variable is nil before the first call, so guard the deletion.
  (when (overlayp ecat-sentence-overlay)
    (delete-overlay ecat-sentence-overlay))
  (save-excursion
    (let ((sentence-end (progn (forward-sentence)
                               (point)))
          (sentence-beginning (progn (backward-sentence)
                                     (point))))
      (setq ecat-sentence-overlay
            (make-overlay sentence-beginning sentence-end))))
  (overlay-put ecat-sentence-overlay 'face 'ecat-highlight-face))

(defun ecat-highlight-next-sentence ()
  "Move the highlight to the next sentence."
  (interactive)
  ;; Work in the buffer that holds the overlay (the original text).
  (with-current-buffer (overlay-buffer ecat-sentence-overlay)
    (save-excursion
      (goto-char (overlay-end ecat-sentence-overlay))
      (let ((sentence-end (progn (forward-sentence)
                                 (point)))
            (sentence-beginning (progn (backward-sentence)
                                       (point))))
        (move-overlay ecat-sentence-overlay
                      sentence-beginning sentence-end)))))

(defun ecat-highlight-previous-sentence ()
  "Move the highlight to the previous sentence."
  (interactive)
  (with-current-buffer (overlay-buffer ecat-sentence-overlay)
    (save-excursion
      (goto-char (overlay-start ecat-sentence-overlay))
      (let ((sentence-beginning (progn (backward-sentence)
                                       (point)))
            (sentence-end (progn (forward-sentence)
                                 (point))))
        (move-overlay ecat-sentence-overlay
                      sentence-beginning sentence-end)))))

(defun ecat-disable-sentence-highlighting ()
  "Disable sentence highlighting."
  (interactive)
  (when (overlayp ecat-sentence-overlay)
    (delete-overlay ecat-sentence-overlay)))
--8<---------------cut here---------------end--------------->8---

Best,

MENGUAL Jean-Philippe

May 29, 2020, 2:39:40 AM
to help-gn...@gnu.org
Hi,

In Emacs I mainly use po-mode (gettext-el). I still have the
problem I described here, i.e. I would love the "Last-translator" field
in PO files to be kept up to date automatically with my info. Apart from
that, which I have to do manually, I like how it works. You also need
the po-wrap function in your .emacs, so that the file stays within 80
columns on screen.

Regards



Jean-Christophe Helary

May 29, 2020, 2:57:21 AM
to Marcin Borkowski, Help Gnu Emacs mailing list
Marcin,

> On May 29, 2020, at 14:55, Marcin Borkowski <mb...@mbork.pl> wrote:
>
> Hi all,
>
> does anyone here perform translations within Emacs?

Yes, sometimes.

> Do you know of any
> tools facilitating that? There exist a few CATs, or Computer Aided
> Translation systems, but - AFAIK - they are all proprietary and closed
> source.

No. OmegaT is very much GPL and is listed in the Free Software Directory. It is Java-based and has recently shifted from Oracle's JDK to AdoptOpenJDK 11.

> Emacs seems capable of implementing at least a simple CAT, but
> I could not find any existing solutions for that. (I skimmed through
> the answers here:
> https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
> but did not find anything useful.)

As Jean-Philippe mentions, po-mode exists, even if it is limited in scope.


--
Jean-Christophe Helary @brandelune
http://mac4translators.blogspot.com


Jean-Christophe Helary

May 29, 2020, 4:03:41 AM
to Marcin Borkowski, Help Gnu Emacs mailing list


> On May 29, 2020, at 15:57, Jean-Christophe Helary <jean.christ...@traduction-libre.org> wrote:
>
> Marcin,
>
>> On May 29, 2020, at 14:55, Marcin Borkowski <mb...@mbork.pl> wrote:
>>
>> Hi all,
>>
>> does anyone here perform translations within Emacs?
>
> Yes, sometimes.
>
>> Do you know of any
>> tools facilitating that? There exist a few CATs, or Computer Aided
>> Translation systems, but - AFAIK - they are all proprietary and closed
>> source.
>
> No. OmegaT is very much GPL and is listed in the Free Software directory. Java based and has recently shifted from using Oracle to AdoptOpenJDK 11.

In fact, the reason why I came (back) to emacs in the first place is, OmegaT...

I love OmegaT. I created the user support list in 2004 and I've been involved with it since 2002.

But I thought that instead of having a translation memory system to which editor functions were added, maybe having a text editor to which translation memory matching was added would be more efficient. That was my pipe dream then.

So, all that happened in 2003-2004 with the big Common Lisp revival, when Peter Seibel published Practical Common Lisp, when Slime was all over the place, when Bill Clementson had his amazing blog on what could be done with emacs / common lisp and slime, etc.

And I thought to myself that since emacs was a lisp environment, why not see what it's all about ? (There was also a Mac application, Alpaca I think, that was basically a text editor with CL inside).

Notice that in 15 years I have not made 1 inch of progress (or maybe just one, I can understand what goes wrong in my init file :). But at least I'm still around and I like it :)

OmegaT has evolved so much now that it has become one of the mainstream CAT tools (even if its "market share" is not high at all). It is used in the EU Translation Bureau. It serves in universities to teach students basic CAT concepts, and it also works well with the Okapi Framework tools, which are also Java-based. It also has a very friendly *multilingual* community available in pretty much all time zones.

Emanuel Berg

May 29, 2020, 4:14:33 AM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> OK, so I assumed nobody did it, so here's my take.
> Probably not extremely well-done, but I just coded
> it in 15 minutes, so there you go.
> Comments welcome.

Byte-compile is your first stop for code comments:

Warning: defface for ‘ecat-highlight-face’ fails
to specify containing group

Warning: Use ‘with-current-buffer’ rather than
save-excursion+set-buffer

> (defun ecat-highlight-this-sentence () [...]
> (defun ecat-highlight-next-sentence () [...]
> (defun ecat-highlight-previous-sentence ()

Can't you do ecat-highlight-next-sentence and
ecat-highlight-previous-sentence by just moving point
to the next sentence and then do
ecat-highlight-this-sentence? Feels more natural...
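
A minimal sketch of the suggestion above: one helper highlights the sentence around point, and the next/previous commands only move to an edge of the current highlight before calling it. The `my-ecat-` names and the use of the built-in `highlight` face are assumptions made for illustration; they are not part of the posted code.

```elisp
;; Sketch only: all `my-ecat-' names are hypothetical.
(defvar my-ecat-overlay nil
  "Overlay marking the sentence currently being translated.")

(defun my-ecat-highlight-sentence-at-point ()
  "Put the highlight overlay on the sentence around point."
  (interactive)
  (save-excursion
    (let* ((end (progn (forward-sentence) (point)))
           (beg (progn (backward-sentence) (point))))
      (if (overlayp my-ecat-overlay)
          ;; Reuse the overlay, moving it into this buffer if needed.
          (move-overlay my-ecat-overlay beg end (current-buffer))
        (setq my-ecat-overlay (make-overlay beg end))
        (overlay-put my-ecat-overlay 'face 'highlight)))))

(defun my-ecat-highlight-next-sentence ()
  "Move the highlight to the sentence after the highlighted one."
  (interactive)
  (with-current-buffer (overlay-buffer my-ecat-overlay)
    (save-excursion
      ;; From the end of the highlight, the helper finds the next sentence.
      (goto-char (overlay-end my-ecat-overlay))
      (my-ecat-highlight-sentence-at-point))))

(defun my-ecat-highlight-previous-sentence ()
  "Move the highlight to the sentence before the highlighted one."
  (interactive)
  (with-current-buffer (overlay-buffer my-ecat-overlay)
    (save-excursion
      (goto-char (overlay-start my-ecat-overlay))
      (backward-sentence)
      (my-ecat-highlight-sentence-at-point))))
```

Because `forward-sentence` stops at the end of a sentence, the "next" command needs no extra motion: starting from the end of the highlight, the helper lands on the following sentence.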

Anyway, what other features do the proprietary
CATs have?

I always thought translation was just a matter of
reading one thing and then typing what it means,
looking up the occasional word or phrase for the
idiomatic equivalent.

Some idiomatic phrases are pitfalls tho. For example
the English "more or less" looks like the Swedish
"mer eller mindre" (which means "correct but with
room for fine details") but the way native speakers
use it seems to be more (?) "både och" which means
discussion can go both (disparate) ways and BOTH
are correct!

So perhaps one could have a list of these "trap
phrases" so when they turn up in the text, they are
highlighted to indicate "watch out! we are not just
piling words here!"
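
The "trap phrase" idea could be sketched roughly as follows, assuming the phrases live in a plain Lisp list. The variable and function names, the sample phrases, and the choice of the built-in `warning` face are all illustrative assumptions.

```elisp
;; Sketch only: the names and the sample phrases are hypothetical.
(defvar my-trap-phrases '("more or less" "by and large")
  "Phrases that are easy to mistranslate word for word.")

(defun my-highlight-trap-phrases ()
  "Overlay every trap phrase in the current buffer; return the match count."
  (interactive)
  ;; Drop overlays left by an earlier run before adding new ones.
  (remove-overlays (point-min) (point-max) 'my-trap t)
  (let ((regexp (regexp-opt my-trap-phrases))
        (case-fold-search t)
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (let ((ov (make-overlay (match-beginning 0) (match-end 0))))
          (overlay-put ov 'my-trap t)
          (overlay-put ov 'face 'warning))
        (setq count (1+ count))))
    count))
```

This simple version misses a phrase that is broken across two lines. For interactive use, hi-lock's `highlight-phrase` offers something similar without any custom code.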

Who'd compile that list is another matter...

Good idea BTW :)

--
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal


Emanuel Berg

May 29, 2020, 4:22:41 AM
to help-gn...@gnu.org
MENGUAL Jean-Philippe wrote:

> I mainly use, in Emacs, the po-mode (gettext-el).
> I still have the problem I described here, i.e.
> I would love the "Last-translator" to be up-to-date
> automatically with my info in the PO files, but
> except this, requiring me to do things manually,
> I like how it works. Also you need to have in your
> .emacs the po-wrap function, to ensure the file
> stays on the screen 80 columns.

Say what? :)

Yuri Khan

May 29, 2020, 4:29:26 AM
to Emanuel Berg, help-gnu-emacs
On Fri, 29 May 2020 at 15:14, Emanuel Berg via Users list for the GNU
Emacs text editor <help-gn...@gnu.org> wrote:

> Anyway, what other features do the proprietary
> CATs have?
>
> I always thought translation was just a matter of
> reading one thing and then typing what it means,
> looking up the occasional word or phrase for the
> idiomatic equivalent.

I have not used any professional CATs, but one important function is
having a vocabulary (also called translation memory).

Imagine translating a novel. When a new character is introduced, you
have to decide how his/her name is translated and spelled. You need to
record it so that you’re consistent. Same goes for any names, not just
of people.

If the translation is a joint effort, that vocabulary needs to be
shared so that the whole team calls characters the same names.

Marcin Borkowski

May 29, 2020, 4:29:41 AM
to MENGUAL Jean-Philippe, help-gn...@gnu.org

On 2020-05-29, at 08:39, MENGUAL Jean-Philippe <mengual...@free.fr> wrote:

> Hi,
>
> I mainly use, in Emacs, the po-mode (gettext-el). I still have the
> problem I described here, i.e. I would love the "Last-translator" to
> be up-to-date automatically with my info in the PO files, but except
> this, requiring me to do things manually, I like how it works. Also
> you need to have in your .emacs the po-wrap function, to ensure the
> file stays on the screen 80 columns.

Thanks for the response. Do I get it correctly that you mean
translating software, so mainly short, possibly unrelated pieces of
text? If so, this seems pretty different to translating a paper or
a book...

Emanuel Berg

May 29, 2020, 4:30:08 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> OmegaT is very much GPL and is listed in the Free
> Software directory. Java based and has recently
> shifted from using Oracle to AdoptOpenJDK 11.

Indeed, it's in the Debian repos:

$ aptitude show omegat
[...]
Description: Computer Assisted Translation (CAT) tool
OmegaT's main features are
* multiple source texts handling, retaining
complex folder hierarchies
* fuzzy matching with other segments in the
source file(s) or TMX files from previous
projects
* easy glossary terms management
* flexible regex-based sentence segmenting
(using an SRX-like method)
* powerful regex-based searches along with the
facility to apply a filter to display search
results in the editor
* ability to batch process documents from the
command line
* extended project statistics
* easy-to-understand documentation and tutorial
* plugin architecture with separate Lucene
stemmer (recognition of inflected forms) and
LanguageTool (style and grammar checker)
plugins
* integration with Hunspell for spelling
checking
* simple API to access source/target/selection
textual data

OmegaT supports 24 formats, including
documentation formats such as OpenDocument, Open
XML (MS Office 2007), DocBook and (x)HTML, and
also localization formats such as Java
properties and PO files. An Okapi plugin can
further extend the supported formats, for
example to include TTX (TradosTag).
Homepage: https://www.omegat.org

Heh, Java people: "ability to batch process documents
from the command line" :)

Emanuel Berg

May 29, 2020, 4:35:08 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> In fact, the reason why I came (back) to emacs in
> the first place is, OmegaT...
>
> I love OmegaT. I created the user support list in
> 2004 and I've been involved with it since 2002.

Be sure to add it to Gmane!

https://gmane.io

> But I thought that instead of having a translation
> memory system in which editor functions were added,
> maybe having a text editor to which translation
> memory matching was added would be more efficient.
> That was my pipe dream then.

What's a translation memory system/matching? :O

Emanuel Berg

May 29, 2020, 4:40:10 AM
to help-gn...@gnu.org
Yuri Khan wrote:

>> Anyway, what other features do the proprietary
>> CATs have? I always thought translation was just
>> a matter of reading one thing and then typing what
>> it means, looking up the occasional word or phrase
>> for the idiomatic equivalent.
>
> I have not used any professional CATs, but one
> important function is having a vocabulary (also
> called translation memory).
>
> Imagine translating a novel. When a new character
> is introduced, you have to decide how his/her name
> is translated and spelled. You need to record it so
> that you’re consistent. Same goes for any names,
> not just of people.

Wait, don't tell me... let's use... A TEXT FILE?!?

Rivendell Vattnadal
Shire Fylke
Strider Vidstige
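
As a rough sketch, such a text file can be read into an alist and queried. The code assumes one entry per line with the two columns separated by a tab, so that multi-word terms can contain spaces; that file layout and all `my-glossary-` names are assumptions, not something specified in the thread.

```elisp
;; Sketch only: the tab-separated layout and all names are assumptions.
(defun my-glossary-parse (text)
  "Parse TEXT, one \"SOURCE<TAB>TARGET\" pair per line, into an alist."
  (let (entries)
    (dolist (line (split-string text "\n" t) (nreverse entries))
      (let ((fields (split-string line "\t" t)))
        ;; Silently skip lines that do not have exactly two columns.
        (when (= (length fields) 2)
          (push (cons (car fields) (cadr fields)) entries))))))

(defun my-glossary-load (file)
  "Read the glossary stored in FILE and return it as an alist."
  (with-temp-buffer
    (insert-file-contents file)
    (my-glossary-parse (buffer-string))))

(defun my-glossary-lookup (term glossary)
  "Return the recorded translation of TERM in GLOSSARY, or nil."
  (cdr (assoc term glossary)))
```

For example, `(my-glossary-lookup "Shire" (my-glossary-load "lotr-swedish.txt"))` would return the recorded translation, provided a file of that name exists in the assumed layout.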

> If the translation is a joint effort, that
> vocabulary needs to be shared [...]

The Internet. Good enough for government work :)

Jean-Christophe Helary

May 29, 2020, 4:41:44 AM
to Emanuel Berg, Help Gnu Emacs mailing list


> On May 29, 2020, at 17:14, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> I always thought translation was just a matter of
> reading one thing and then typing what it means,
> looking up the occasional word or phrase for the
> idiomatic equivalent.

It is. But *computer aided translation* tools make that easier by putting all the translation resources (glossaries, legacy translations, dictionaries, searches, autocompletion) into one translation "IDE" that helps the translator not lose time on repetitive tasks.

Emanuel Berg

May 29, 2020, 4:45:08 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

>> I always thought translation was just a matter of
>> reading one thing and then typing what it means,
>> looking up the occasional word or phrase for the
>> idiomatic equivalent.
>
> It is. But *computer aided translation* tools make
> that easier by putting all the translation
> resources (glossaries, legacy translations,
> dictionaries, searches, autocompletion) into one
> translation "IDE" that helps the translator not
> lose time on repetitive tasks.

... which are?

Jean-Christophe Helary

May 29, 2020, 5:28:17 AM
to Emanuel Berg, Help Gnu Emacs mailing list


> On May 29, 2020, at 17:43, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>>> I always thought translation was just a matter of
>>> reading one thing and then typing what it means,
>>> looking up the occasional word or phrase for the
>>> idiomatic equivalent.
>>
>> It is. But *computer aided translation* tools make
>> that easier by putting all the translation
>> resources (glossaries, legacy translations,
>> dictionaries, searches, autocompletion) into one
>> translation "IDE" that helps the translator not
>> lose time on repetitive tasks.
>
> ... which are?

Typing text :)

If there is ONE repetitive task in translation, it is typing text.

So anything that is already registered and which can be semi-automatically entered is a godsend.

For example, you translate a sentence half of which is already registered as a legacy translation. The translation memory engine finds the match in the background (no need for you to search for it) and presents the corresponding translation (that's called "fuzzy matching"). You hit a shortcut and boom, you have half the sentence translated. Now, the other half contains glossary terms that are in a TSV file (or equivalent). Here again the search has happened in the background, and you are presented with a choice of terms that you can enter with a keybinding. You just need to type the semantic "glue" between the terms.

Et voilà. The "IDE" did all the searches in the background when you started working on a given segment and autocompletion or keybindings give you easy access to what you need to enter.
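
A toy version of the fuzzy-matching step, assuming the translation memory is an alist of (SOURCE . TARGET) pairs and scoring candidates by Levenshtein distance with the built-in `string-distance` (available from Emacs 27). The `my-tm-` names and the 0.6 threshold are arbitrary assumptions; real CAT tools typically use more elaborate, token-based scoring.

```elisp
;; Sketch only: names, data layout and threshold are assumptions.
(defun my-tm-similarity (a b)
  "Return a similarity between 0.0 and 1.0 for strings A and B."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1.0
      ;; Normalize the edit distance by the longer string's length.
      (- 1.0 (/ (float (string-distance a b)) longest)))))

(defun my-tm-best-match (segment memory &optional threshold)
  "Return (SCORE SOURCE . TARGET) for the best MEMORY entry, or nil.
MEMORY is an alist of (SOURCE . TARGET) pairs.  Entries scoring
below THRESHOLD (default 0.6) against SEGMENT are ignored."
  (let ((threshold (or threshold 0.6))
        best)
    (dolist (entry memory best)
      (let ((score (my-tm-similarity segment (car entry))))
        (when (and (>= score threshold)
                   (or (null best) (> score (car best))))
          (setq best (cons score entry)))))))
```

A command bound to a key could then insert `(cddr (my-tm-best-match ...))` at point, which corresponds to the "hit a shortcut" step described above.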

to...@tuxteam.de

May 29, 2020, 5:59:19 AM
to help-gn...@gnu.org
On Fri, May 29, 2020 at 10:35:09AM +0200, Emanuel Berg via Users list for the GNU Emacs text editor wrote:

[...]

> Wait, don't tell me... let's use... A TEXT FILE?!?
>
> Rivendell Vattnadal
> Shire Fylke
> Strider Vidstige

How do you capture context?

Cheers
-- t

Emanuel Berg

May 29, 2020, 6:40:30 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> For ex. You translate a sentence in which half is
> already registered as a legacy translation.
> The translation memory engine finds the match in
> the background (no need for you to search it) and
> presents its corresponding translation (that's
> called "fuzzy matching"). You hit a shortcut and
> boom, you have half the sentence translated. Now,
> the other half contains glossary terms that are in
> a TSV file (or equivalent), here again the search
> has happened in the background and you are
> presented with a choice of terms that you can enter
> with a keybinding. You just need to type the
> semantic "glue" between the terms.

OK. I personally don't like that but that's in Emacs,
for sure. abbrev-mode and more advanced yasnippet and
many, many other solutions and packs.
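
For the abbrev route, a small illustration: glossary terms become abbrevs that expand as you type. The table name, the three-letter abbreviations, and the enabling command are made up for this sketch; `define-abbrev-table`, `local-abbrev-table`, and `abbrev-mode` are the standard built-ins.

```elisp
;; Sketch only: the table name and the abbreviations are hypothetical.
(define-abbrev-table 'my-lotr-abbrev-table
  '(("riv" "Vattnadal")
    ("shi" "Fylke")
    ("str" "Vidstige"))
  "Abbrevs expanding to agreed translations of LOTR names.")

(defun my-lotr-abbrevs-enable ()
  "Use the LOTR abbrev table in the current buffer."
  (interactive)
  (setq local-abbrev-table my-lotr-abbrev-table)
  (abbrev-mode 1))
```

After `M-x my-lotr-abbrevs-enable`, typing `riv` followed by a space or punctuation inserts the recorded translation. Unlike the fuzzy matching described in the quoted message, the translator has to remember the abbreviations.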

Emanuel Berg

May 29, 2020, 6:45:09 AM
to help-gn...@gnu.org
tomas wrote:

>> Wait, don't tell me... let's use... A TEXT FILE?!?
>>
>> Rivendell Vattnadal
>> Shire Fylke
>> Strider Vidstige
>
> How do you capture context?

I don't know, a header?

"LOTR characters and place names"

Maybe the file name can also be something
descriptive, e.g. lotr-swedish.txt

to...@tuxteam.de

May 29, 2020, 7:34:49 AM
to help-gn...@gnu.org
On Fri, May 29, 2020 at 12:44:18PM +0200, Emanuel Berg via Users list for the GNU Emacs text editor wrote:
> tomas wrote:
>
> >> Wait, don't tell me... let's use... A TEXT FILE?!?
> >>
> >> Rivendell Vattnadal
> >> Shire Fylke
> >> Strider Vidstige
> >
> > How do you capture context?
>
> I don't know, a header?
>
> "LOTR characters and place names"
>
> Maybe the file name can also be something
> descriptive, e.g. lotr-swedish.txt

"time flies like an arrow and fruit flies like banana"

(1) Translate this phrase into Swedish
(2a) How would you characterize the context of the first 'like'
above? Of the second one?
(2b) Write a short Emacs Lisp program which can distinguish
between both
(3) Do the same as (2a) and (2b) for both terms 'flies'

Enjoy ;-P

(Human languages are... interesting)

-- tomás

Emanuel Berg

May 29, 2020, 7:51:20 AM
to help-gn...@gnu.org
tomas wrote:

> "time flies like an arrow and fruit flies like
> banana"
> (1) Translate this phrase into Swedish

That's impossible in a meaningful way if all of that is
to be included, as in Swedish time PASSES BY (går); it
doesn't fly. So maybe "Tiden springer iväg som
långdistanslöpare", as time can also "run away"...

Obviously, not every expression can be translated
while still translating every word literally!

Emacs can't do that, CATs can't, no one can in a way
that makes sense.

Here, we care about what CAT _can_ do. Keeping a list
of LOTR characters in a text file, sending it by mail
to fellow translators in ~/.mailrc, and expanding
abbrevs and snippet templates are all things Emacs can
already do, sorry.

> (2a) How would you characterize the context of the
> first 'like' above? Of the second one?

That's right, because I didn't know translating
fiction, non-fiction, and manuals involved solving
linguistic-logical puzzles from Chomsky's "Best Of"
multimedia CD-ROM...

Try this:

https://en.wikipedia.org/wiki/Context-sensitive_language

Jean-Christophe Helary

May 29, 2020, 8:02:16 AM
to Emanuel Berg, Help Gnu Emacs mailing list
>> (2a) How would you characterize the context of the
>> first 'like' above? Of the second one?
>
> That's right, because I didn't know translating
> fiction, non-fiction, and manuals involved solving
> linguistic-logical puzzles from Chomsky's "Best Of"
> multimedia CD-ROM...

Indeed :) It's not like translation theory waited for discussions on help-gnu-emacs. Plus, DeepL does a very good job at translating the boring stuff.

Takesi Ayanokoji

May 29, 2020, 10:28:27 AM
to Marcin Borkowski, Help Gnu Emacs mailing list
Hi, Marcin

> does anyone here perform translations within Emacs?

I am translating Emacs' info manuals using Emacs' po-mode and the Po4a program.

--- "Why Po4a?" starts here.

I use the PO format as translation memory because it is the major format
for *nix i18n nowadays.

The PO format is used by gettext's tool-chain programs, but some tools in
gettext are limited to translating messages in various program sources.

These tools process source files that include messages, converting them
to translation memories and reflecting translations back.

For this reason, I use the Po4a tools for generating PO files from files
written in various formats, and vice versa.

--- "Why Po4a?" ends here.

> The first thing I would need is a way to highlight the "currently
> translated sentence" in the other window, where I would keep the
> original text, with an easy way to highlight the next/previous

In po-mode, the "currently processed msgid" string is highlighted.

A msgid is a string before translation.
The range of these strings (sentence, paragraph, page, ...) is defined by
Po4a when extracting them from the original document files, and Po4a seems
to define a msgid as a paragraph.

For example, below is 'Anti news' from the Emacs manual before and after
translating.

before:
https://raw.githubusercontent.com/ayatakesi/emacs-24.5-doc-emacs/d826fbbb960688a04f46cec4e8e0131d2c39e218/anti.texi.po

after:
https://raw.githubusercontent.com/ayatakesi/emacs-24.5-doc-emacs/b0506307007fc3d36e8168c2f84bd125e97484fe/anti.texi.po

And in po4a, there are many commands that operate on a msgid/msgstr.

Best,

Eric Abrahamsen

May 29, 2020, 1:39:41 PM
to Yuri Khan, Emanuel Berg, help-gnu-emacs
Yuri Khan <yuri....@gmail.com> writes:

> On Fri, 29 May 2020 at 15:14, Emanuel Berg via Users list for the GNU
> Emacs text editor <help-gn...@gnu.org> wrote:
>
>> Anyway, what other features do the proprietary
>> CATs have?
>>
>> I always thought translation was just a matter of
>> reading one thing and then typing what it means,
>> looking up the occasional word or phrase for the
>> idiomatic equivalent.
>
> I have not used any professional CATs, but one important function is
> having a vocabulary (also called translation memory).
>
> Imagine translating a novel. When a new character is introduced, you
> have to decide how his/her name is translated and spelled. You need to
> record it so that you’re consistent. Same goes for any names, not just
> of people.
>
> If the translation is a joint effort, that vocabulary needs to be
> shared so that the whole team calls characters the same names.

I'm a translator, primarily of fiction, and do all of it in Emacs,
specifically in Org mode.

I've thought many times over the years about what I would really want an
Emacs-based translation environment to provide for me. I don't do
technical translation, so there's not a whole lot of value in
sentence-by-sentence correspondences. But as Yuri mentions it can be
very useful to keep track of how you've translated certain names, or
certain important terms, in different places throughout the text.
Basically I would want two things:

1. A way to keep track of location correspondences between the source
text and translated text. CAT tools split the text up by sentence, but
that's not very useful for fiction (particularly Chinese->English
translation) because there's rarely a one-to-one correspondence.
There /is/ a more reliable correspondence between paragraphs, though,
and I'd like to know which paragraph equals which. The point would
mostly be to find my place again when I start translating at the
beginning of the day, and to implement a more useful follow-mode. I
imagined this would happen when the mode was turned on: it would run
down the file and insert markers that would be used to find
correspondences. Special characters could be inserted into the file
to indicate that two paragraphs should be joined, or one paragraph
split.
2. Link terms in the translation to a glossary pulled from the original.
This would be character names, places, special terms, etc. They might
not always be translated the same way, but I need to know how I've
handled them earlier in the document. Glossary terms would be
highlighted in the source text, and when you came to the equivalent
spot in the translation, you'd use a command like
insert-translation-term that would prompt for the translation,
offering completion on earlier translations, and then insert that
term into the translated text with a link to the original in the
glossary. There would also be two multi-occur commands: one that
prompted for a translation and showed all the places in the source
text where it came from, and another that did the opposite: prompted
for an original glossary term and showed all the places in the
translation where it was translated.
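
The second item might start from something like the sketch below: a table remembers how each source term has been translated so far, and an insertion command offers those earlier translations as completions. Every name here is hypothetical, the "link" back to the glossary is reduced to a text property, and plain `occur` on the current buffer stands in for the two multi-occur commands.

```elisp
;; Sketch only: all names are hypothetical, and linking is simplified.
(defvar my-term-translations (make-hash-table :test #'equal)
  "Map from a source term to the translations used for it so far.")

(defun my-term-record (term translation)
  "Remember that TERM was translated as TRANSLATION.
Return the list of all translations recorded for TERM."
  (let ((known (gethash term my-term-translations)))
    (unless (member translation known)
      (puthash term (cons translation known) my-term-translations))
    (gethash term my-term-translations)))

(defun my-insert-translation-term (term)
  "Prompt for a translation of TERM, completing on earlier ones.
Insert it at point, tagged with the source term as a text property."
  (interactive "sSource term: ")
  (let ((translation (completing-read
                      (format "Translation of %s: " term)
                      (gethash term my-term-translations))))
    (my-term-record term translation)
    (insert (propertize translation 'my-source-term term))))

(defun my-term-occur (string)
  "List the lines of the current buffer that contain STRING."
  (interactive "sShow occurrences of: ")
  (occur (regexp-quote string)))
```

Text properties are not saved with the file, so a persistent version would need some explicit markup or a side file; the thread does not settle that question.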


Anyway, that's what I've been thinking about. Almost no code so far,
though!

Eric

Jean-Christophe Helary

May 29, 2020, 1:58:21 PM
to Eric Abrahamsen, Yuri Khan, help-gnu-emacs, Emanuel Berg


> On May 30, 2020, at 2:39, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:

> I've thought many times over the years about what I would really want an
> Emacs-based translation environment to provide for me. I don't do
> technical translation, so there's not a whole lot of value in
> sentence-by-sentence correspondences.

Most translation tools I know (or I've used professionally) rely on a segmentation scheme set by the user. If the user wants paragraph based segmentation, so be it. What people call "sentence" segmentation is actually a regex based system that takes into account various signs in the source language.
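
To illustrate regex-based segmentation, here is a small sketch that splits a string wherever a boundary regexp matches. The default regexp (sentence punctuation followed by whitespace) and the function name are assumptions; actual tools ship language-specific rule sets with exceptions for abbreviations, which this does not attempt.

```elisp
;; Sketch only: the function name and default regexp are assumptions.
(defun my-segment-text (text &optional boundary-regexp)
  "Split TEXT into a list of segments at BOUNDARY-REGEXP.
If the regexp has a first group, text matched by that group (for
example the closing punctuation) is kept at the end of the segment.
The regexp must not match the empty string."
  (let ((regexp (or boundary-regexp "\\([.?!]\\)[ \n]+"))
        (start 0)
        segments)
    (while (string-match regexp text start)
      (push (substring text start (or (match-end 1) (match-beginning 0)))
            segments)
      (setq start (match-end 0)))
    ;; Whatever follows the last boundary is the final segment.
    (when (< start (length text))
      (push (substring text start) segments))
    (nreverse segments)))
```

Passing `"\n\n+"` as the second argument gives paragraph-based segmentation, which is the user-chosen scheme mentioned above.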

> But as Yuri mentions it can be
> very useful to keep track of how you've translated certain names, or
> certain important terms, in different places throughout the text.
> Basically I would want two things:
>
> 1. A way to keep track of location correspondences between the source
> text and translated text. CAT tools split the text up by sentence,

(not true, see above)

> but
> that's not very useful for fiction (particularly Chinese->English
> translation) because there's rarely a one-to-one correspondence.
> There /is/ a more reliable correspondence between paragraphs, though,
> and I'd like to know which paragraph equals which. The point would
> mostly be to find my place again when I start translating at the
> beginning of the day, and to implement a more useful follow-mode.

I'm not sure I understand what you mean. What's the difficulty that you are facing ?

> I
> imagined this would happen when the mode was turned on: it would run
> down the file and insert markers that would be used to find
> correspondences. Special characters could be inserted into the file
> to indicate that two paragraphs should be joined, or one paragraph
> split.

What would be the use of such a marking ?

> 2. Link terms in the translation to a glossary pulled from the original.
> This would be character names, places, special terms, etc. They might
> not always be translated the same way, but I need to know how I've
> handled them earlier in the document. Glossary terms would be
> highlighted in the source text, and when you came to the equivalent
> spot in the translation, you'd use a command like
> insert-translation-term that would prompt for the translation,
> offering completion on earlier translations, and then insert that
> term into the translated text with a link to the original in the
> glossary. There would also be two multi-occur commands: one that
> prompted for a translation and showed all the places in the source
> text where it came from, and another that did the opposite: prompted
> for an original glossary term and showed all the places in the
> translation where it was translated.

Very nice ideas.

Eric Abrahamsen

May 29, 2020, 2:23:05 PM5/29/20
to Jean-Christophe Helary, help-gnu-emacs, Emanuel Berg, Yuri Khan
Jean-Christophe Helary <jean.christ...@traduction-libre.org>
writes:

>> On May 30, 2020, at 2:39, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:
>
>> I've thought many times over the years about what I would really want an
>> Emacs-based translation environment to provide for me. I don't do
>> technical translation, so there's not a whole lot of value in
>> sentence-by-sentence correspondences.
>
> Most translation tools I know (or I've used professionally) rely on a
> segmentation scheme set by the user. If the user wants paragraph based
> segmentation, so be it. What people call "sentence" segmentation is
> actually a regex based system that takes into account various signs in
> the source language.

Okay, that's good to know. I guess I would just set it to split by
paragraph, but would also like manual control in some cases.

>> But as Yuri mentions it can be
>> very useful to keep track of how you've translated certain names, or
>> certain important terms, in different places throughout the text.
>> Basically I would want two things:
>>
>> 1. A way to keep track of location correspondences between the source
>> text and translated text. CAT tool split the text up by sentence,
>
> (not true, see above)
>
>> but
>> that's not very useful for fiction (particularly Chinese->English
>> translation) because there's rarely a one-to-one correspondence.
>> There /is/ a more reliable correspondence between paragraphs, though,
>> and I'd like to know which paragraph equals which. The point would
>> mostly be to find my place again when I start translating at the
>> beginning of the day, and to implement a more useful follow-mode.
>
> I'm not sure I understand what you mean. What's the difficulty that you are facing ?
>
>> I
>> imagined this would happen when the mode was turned on: it would run
>> down the file and insert markers that would be used to find
>> correspondences. Special characters could be inserted into the file
>> to indicate that two paragraphs should be joined, or one paragraph
>> split.
>
> What would be the use of such a marking ?

A follow-mode, as I mentioned above. And just finding my place. I do my
translation in two sibling Org sub-trees, original and translation,
displayed in two side-by-side windows. I don't want to mess with
two-column-mode or anything like that. I want to be able to go to the
bottom of the translation, run a command, and have the second window
display the corresponding original. If I realize I've done something
wrong a couple of chapters previously, and I skip back up to that
location in the translation, I want to run the same command to display
the corresponding spot in the original.

>> 2. Link terms in the translation to a glossary pulled from the original.
>> This would be character names, places, special terms, etc. They might
>> not always be translated the same way, but I need to know how I've
>> handled them earlier in the document. Glossary terms would be
>> highlighted in the source text, and when you came to the equivalent
>> spot in the translation, you'd use a command like
>> insert-translation-term that would prompt for the translation,
>> offering completion on earlier translations, and then insert that
>> term into the translated text with a link to the original in the
>> glossary. There would also be two multi-occur commands: one that
>> prompted for a translation and showed all the places in the source
>> text where it came from, and another that did the opposite: prompted
>> for an original glossary term and showed all the places in the
>> translation where it was translated.
>
> Very nice ideas.

Maybe this will inspire me to write some code! The nice thing about the
glossary is that it wouldn't have to just be vocabulary. You could just
as easily use it for "every time the car crash is referenced", or
something like that. You'd just have to manually mark the passage in the
original, rather than automated marking by text search.
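
A minimal sketch of the insert-translation-term idea, in Emacs Lisp. All `my-cat-' names are hypothetical. A text property stands in for the "link to the original in the glossary"; text properties are not saved with the file, so a real implementation would need something persistent.

```elisp
;; Hypothetical names; a sketch, not an existing package.
(defvar my-cat-term-translations (make-hash-table :test #'equal)
  "Map each source term to the list of translations used so far.")

(defun my-cat-record-translation (term translation)
  "Remember that TERM was rendered as TRANSLATION; return all renderings."
  (let ((known (gethash term my-cat-term-translations)))
    (unless (member translation known)
      (puthash term (cons translation known) my-cat-term-translations))
    (gethash term my-cat-term-translations)))

(defun my-cat-insert-translation-term (term)
  "Prompt for a translation of TERM, offering earlier ones, and insert it.
The inserted text carries a `my-cat-term' property pointing back to TERM."
  (interactive "sSource term: ")
  (let ((translation
         (completing-read (format "Translation of %s: " term)
                          (gethash term my-cat-term-translations))))
    (my-cat-record-translation term translation)
    (insert (propertize translation 'my-cat-term term))))
```

The same table could back the two multi-occur commands: one direction looks up a term's translations, the other scans the table for a given translation.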

Emanuel Berg

May 29, 2020, 6:48:41 PM5/29/20
to help-gn...@gnu.org
Giovanni Bono wrote:

> ;; -*- mode: emacs-lisp; lexical-binding: t; -*-
> [...]

:O

Certainly not the way I'm used to seeing Emacs Lisp...

But as they say, styles make fights!

Gene

May 29, 2020, 7:17:11 PM5/29/20
to
On Friday, May 29, 2020 at 1:55:39 AM UTC-4, Marcin Borkowski wrote:
> Hi all,
>
> does anyone here perform translations within Emacs? Do you know of any
> tools facilitating that?

Funny you should ask. I just overcame a years-long obstacle in my personal attempts to exploit org-mode's ability to evaluate code blocks.

Though most people don't think of so-called `natural' language as code (even though it is no less `artificial' or `man made' than so-called `artificial' languages such as programming languages, math notations, and such), one of the languages org-mode permits in code blocks is `translate', which allows Google Translate to be used to translate the natural-language `code'.

To use `translate' as a type of code between the #+begin_src translate and #+end_src book ends, a few details must be addressed:

(package-activate 'ob-translate) ; make the installed package loadable

AND

(org-babel-do-load-languages
 'org-babel-load-languages
 '(;; (eshell . t)     ; commented out here to indicate that translate
   ;; (emacs-lisp . t) ; is the only active ingredient
   ;; Use the bare language name: Org prepends "ob-" itself when loading.
   (translate . t)))

From this juncture one should be ready to experiment.

Here's an experiment I conducted:

#+begin_src translate
acta deos numquam mortalia fallunt
#+end_src

#+RESULTS:
: Mortal deeds never deceive the gods



I hope this helps those who only need relatively simple, unsophisticated translation, or who lack sophisticated programming abilities ... perhaps both.

Cheers

Emanuel Berg

May 29, 2020, 9:34:10 PM5/29/20
to help-gn...@gnu.org
Can't we compile a list of what the commercial CATs
offer? M Helary and Mr Abrahamsen?

I'll read thru this thread tomorrow (today)
God willing but I don't understand everything, in
particular examples would be nice to get the exact
meaning of the desired functionality...

With examples we can also see if Emacs already can do
it. And if not: Elisp contest :)

Some features are probably silly, we don't have to
list or do them, or everything in the CATs, just what
really makes sense and is useful on an
every-day basis.

When we are done, we put it in the wiki or in a pack.

We can't have that Emacs doesn't have a firm grip on
this issue. Because translation is a very common task
with text!

Also, let's compile a list of what Emacs already has
to this end. It doesn't matter if some of that stuff
already appears somewhere else, modularity is
our friend.

Pepp pepp :)

Jean-Christophe Helary

May 29, 2020, 11:12:23 PM5/29/20
to Emanuel Berg, help-gn...@gnu.org


> On May 30, 2020, at 10:33, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Can't we compile a list of what the commercial CATs
> offer? M Helary and Mr Abrahamsen?

x commercial → ○ professional, if you don't mind :)
OmegaT is very much a professional tool and certainly not a "commercial" one.


My idea, based on 20 years of practice but otherwise not very technically informed, is the following:


1) CAT tools extract translatable contents from various file formats into an easy-to-handle format, and put the translated contents back into the original format. That way the translator does not have to worry *too much* about the idiosyncrasies of the original format.

→ File filters are a core part of a CAT tool *but* as was suggested in the thread it is possible to rely on an external filter that will output contents in a standard localization "intermediate" format (current "industry" standards are PO and XLIFF). Such filters provide export and import functions so that the translated files are converted back to the original format.

File filters can also accept rules for not outputting non-translatable text (the current standard is ITS)

The PO format can be handled by po4a (perl), translate-toolkit (python) and the Okapi Framework tools (java).
XLIFF has the Okapi Framework, OpenXLIFF (electron/node) and the translate-toolkit. All are top-notch pro-grade free software and, in the case of Okapi and OpenXLIFF, have been developed by people who have participated in the standardization process (XLIFF/TMX/SRX/ITS/TBX, etc...)

→ emacs could rely on such external filters and only specialize in one "intermediate" format. The po-mode already does that for PO files.


2) Once the text is extracted, it needs to be segmented. Basic "no" segmentation usually means paragraph based segmentation. Paragraphs are defined differently depending on the original format (1, or 2 line breaks for a text file, a block tag for XML-based formats, etc.).
Fine-grained segmentation is obtained by using a set of native language based regex that includes break rules and no-break rules. A simple example is break after a "period followed by a space" but don't break after "Mr. " for English.

→ File filters usually handle the segmentation part based on user specifications. Once the file is segmented into the intermediate format, it is not structurally trivial to "split" or "merge" segments because the tool needs to remember what will go back into the original file structure.

→ emacs could rely on the external filters to handle the segmentation.
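
To make the break/no-break idea concrete, here is a minimal Emacs Lisp sketch. The function and variable names are hypothetical, and the two toy rules only cover the English example above; real tools use much larger per-language rule sets (SRX).

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

;; Hypothetical names; the rules are toy examples for English.
(defvar my-cat-break-regexp "[.?!][ \t\n]+"
  "Break rule: sentence-ending punctuation followed by whitespace.")

(defvar my-cat-no-break-regexp "\\b\\(?:Mr\\|Mrs\\|Dr\\)\\.[ \t\n]+\\'"
  "No-break rule, matched against the text up to a candidate break.")

(defun my-cat-segment (text)
  "Split TEXT into a list of segments using the two rules above."
  (let ((case-fold-search nil)
        (start 0) (pos 0) (segments '()))
    (while (string-match my-cat-break-regexp text pos)
      (setq pos (match-end 0))
      ;; Only cut here when no exception matches the text before the cut.
      (unless (string-match-p my-cat-no-break-regexp
                              (substring text start pos))
        (push (string-trim (substring text start pos)) segments)
        (setq start pos)))
    ;; Whatever is left after the last break is the final segment.
    (let ((rest (string-trim (substring text start))))
      (unless (string= rest "")
        (push rest segments)))
    (nreverse segments)))

;; (my-cat-segment "Mr. Smith arrived. He sat down. Why?")
;; => ("Mr. Smith arrived." "He sat down." "Why?")
```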


3) The real strength of a CAT tool shows where it helps the translator handle all the resources needed in the translation. Let me list potential resources:

- Legacy translations, called "translation memories" (TM), usually in multilingual "aligned" files where a given segment has equivalents in various languages. Translated PO files are used as TMs, the XML standard is TMX.

- Glossaries, usually in a similar but simpler format, sometimes only TSV, sometimes CSV, the XML-based standard is TBX.

- Internal translations, which are produced by the translator while translating. Each translated segment adding to the project "memory".

- Dictionaries are a more global form of glossaries, usually monolingual, format varies.

- external files, either local documents, or web documents, in various formats, usually monolingual (otherwise they'd be aligned and used as TMs)

→ each resource format needs a way to be parsed, memorized, fetched, recycled efficiently during the translation
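
As one small instance of "parsed, memorized, fetched", here is a hedged Emacs Lisp sketch for the simplest resource, a two-column TSV glossary. The function names are hypothetical and the matching is a plain case-sensitive substring search; TBX or TMX would need a real XML parser.

```elisp
;; Hypothetical names; a sketch under the assumptions stated above.
(defun my-cat-parse-glossary (tsv)
  "Parse TSV, a string of \"source<TAB>target\" lines, into a hash table."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (line (split-string tsv "\n" t))
      (let ((fields (split-string line "\t")))
        (when (>= (length fields) 2)
          ;; A term may have several translations; keep them all.
          (push (nth 1 fields) (gethash (nth 0 fields) table)))))
    table))

(defun my-cat-glossary-hits (segment glossary)
  "Return (TERM . TRANSLATIONS) for every GLOSSARY term found in SEGMENT."
  (let ((case-fold-search nil)
        (hits '()))
    (maphash (lambda (term translations)
               (when (string-match-p (regexp-quote term) segment)
                 (push (cons term translations) hits)))
             glossary)
    hits))
```

When the translator enters a segment, calling `my-cat-glossary-hits' on it gives the terms to display next to the fuzzy matches.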


4) Usually the process is the following:

- the translator "enters" a segment
- the tool displays "matches" from the resources that relatively closely correspond to the segment contents
- the translator inserts or modifies the matches
- when no matches are produced the translator enters a translation from scratch
- the translator can add glossary items to the project glossary
- the new translation is added to the "internal" memory set
- the translator moves to the next segment


5) The matching is usually some sort of Levenshtein distance-based algorithm. The "tokens" that are used in the "distance" calculation are usually produced by native language based tokenizers (the Lucene tokenizers are quite popular)

The better the match, the more efficient the tool is at helping the translator recycle resources. The matching process/quality is where tools profoundly differ (OmegaT is generally considered to have excellent quality matches, sometimes better than expensive commercial tools).

Some tools propose "context" matches where the previous and next segments are also taken into account, some tools propose "subsegment" matches where even if a whole segment won't match significant subparts can, etc.

The matching process must sometimes apply to extremely big resources (like many million lines of multilingual TMs in the case of the EU legal corpora) and must thus be able to handle the data quickly regardless of the set size.
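
A rough sketch of distance-based matching in Emacs Lisp, assuming Emacs 27 or later for the built-in `string-distance'. It is a simplification: it compares characters rather than language-aware tokens, and it scans the whole memory linearly, which would not scale to the corpus sizes mentioned above. The `my-cat-' names are hypothetical.

```elisp
;; Hypothetical helper names; `string-distance' is built in since Emacs 27.
(defun my-cat-similarity (a b)
  "Return a similarity score between 0.0 and 1.0 for strings A and B."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1.0
      (- 1.0 (/ (float (string-distance a b)) longest)))))

(defun my-cat-best-matches (segment memory &optional threshold)
  "Return entries of MEMORY whose source resembles SEGMENT, best first.
MEMORY is a list of (SOURCE . TARGET) pairs.  Entries scoring below
THRESHOLD (default 0.6) are dropped.  Each result is (SCORE SOURCE . TARGET)."
  (let ((threshold (or threshold 0.6))
        (hits '()))
    (dolist (entry memory)
      (let ((score (my-cat-similarity segment (car entry))))
        (when (>= score threshold)
          (push (cons score entry) hits))))
    (sort hits (lambda (x y) (> (car x) (car y))))))
```

With a memory containing "Click the red button", the segment "Click the big red button" differs by four inserted characters out of 24, so it scores about 0.83 and would be offered as a match.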


6) Goodies that are time savers include:

- history based autocompletion
- glossary/TM/dictionary based autocompletion
- MT services access
- shortcuts that auto insert predefined text chunks
- spell-checking/grammar checking
- QA checks against glossary terms, completeness/length of the translation, integrity of the format structure, numbers used, etc. (QA checks are also available as external processes in some of the solutions mentioned above, or related solutions.)


> I'll read thru this thread tomorrow (today)
> God willing but I don't understand everything, in
> particular examples would be nice to get the exact
> meaning of the desired functionality...

Go ahead if you have questions.

> With examples we can also see if Emacs already can do
> it. And if not: Elisp contest :)

:)

> Some features are probably silly, we don't have to
> list or do them, or everything in the CATs, just what
> really makes sense and is useful on an every-day basis.

A lot of the heavy-duty tasks can be handled by external processes.

> When we are done, we put it in the wiki or in a pack.
>
> We can't have that Emacs doesn't have a firm grip on
> this issue. Because translation is a very common task
> with text!
>
> Also, let's compile a list of what Emacs already has
> to this end. It doesn't matter if some of that stuff
> already appears somewhere else, modularity is
> our friend.

:)

Jean-Christophe Helary

May 29, 2020, 11:20:31 PM5/29/20
to Eric Abrahamsen, help-gnu-emacs, Emanuel Berg, Yuri Khan


> On May 30, 2020, at 3:22, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:
>
>>> I
>>> imagined this would happen when the mode was turned on: it would run
>>> down the file and insert markers that would be used to find
>>> correspondences. Special characters could be inserted into the file
>>> to indicate that two paragraphs should be joined, or one paragraph
>>> split.
>>
>> What would be the use of such a marking ?
>
> A follow-mode, as I mentioned above.

Is such a mode in emacs ?

> And just finding my place. I do my
> translation in two sibling Org sub-trees, original and translation,
> displayed in two side-by-side windows. I don't want to mess with
> two-column-mode or anything like that.

:) I'm not sure anybody uses that anymore. But it must have been big when it started because it got the F2 key assigned by default...

> I want to be able to go to the
> bottom of the translation, run a command, and have the second window
> display the corresponding original. If I realize I've done something
> wrong a couple of chapters previously, and I skip back up to that
> location in the translation, I want to run the same command to display
> the corresponding spot in the original.

I seem to remember a long discussion about bookmarks here or on devel a while ago. Did you consider that ?

Giovanni Bono

May 30, 2020, 2:21:04 AM5/30/20
to help-gn...@gnu.org
Emanuel Berg via Users list for the GNU Emacs text editor
<help-gn...@gnu.org> writes:

> Giovanni Bono wrote:
>
>> ;; -*- mode: emacs-lisp; lexical-binding: t; -*-
>> [...]
>
> :O
>
> Certainly not the way I'm used to seeing Emacs Lisp...
>
> But as they say, styles make fights!

If you are referring to variable scoping, I think it makes no actual
difference to the code I posted. I was probably just experimenting,

Giovanni


Eli Zaretskii

May 30, 2020, 2:25:00 AM5/30/20
to help-gn...@gnu.org
> From: Jean-Christophe Helary <jean.christ...@traduction-libre.org>
> Date: Sat, 30 May 2020 12:20:14 +0900
> Cc: help-gnu-emacs <help-gn...@gnu.org>, Emanuel Berg <moase...@zoho.eu>,
> Yuri Khan <yuri....@gmail.com>
>
> > A follow-mode, as I mentioned above.
>
> Is such a mode in emacs ?

Yes, it is. Type "C-h f follow-mode RET".

Jean-Christophe Helary

May 30, 2020, 2:36:22 AM5/30/20
to Eli Zaretskii, help-gn...@gnu.org
Ok, big confusion with the packages. I searched in list-packages for a "follow-mode" and obviously did not find anything...

Eric Abrahamsen

May 30, 2020, 12:46:21 PM5/30/20
to Jean-Christophe Helary, help-gnu-emacs, Emanuel Berg, Yuri Khan
Jean-Christophe Helary <jean.christ...@traduction-libre.org>
writes:

>> On May 30, 2020, at 3:22, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:
>>
>>>> I
>>>> imagined this would happen when the mode was turned on: it would run
>>>> down the file and insert markers that would be used to find
>>>> correspondences. Special characters could be inserted into the file
>>>> to indicate that two paragraphs should be joined, or one paragraph
>>>> split.
>>>
>>> What would be the use of such a marking ?

[...]

>> I want to be able to go to the
>> bottom of the translation, run a command, and have the second window
>> display the corresponding original. If I realize I've done something
>> wrong a couple of chapters previously, and I skip back up to that
>> location in the translation, I want to run the same command to display
>> the corresponding spot in the original.
>
> I seem to remember a long discussion about bookmarks here or on devel
> a while ago. Did you consider that ?

The only part of this code I ever actually wrote used bookmarks to save
where I was at the end of the work day. But usually you just save one
bookmark per file, indicating "where you are" in the file. That's a
different concern than splitting the two texts into segments, and
recording correspondences between segments in the texts. If you
segmented a whole novel by sentences, and then saved a bookmark per
sentence, I'm sure it would cause something to catch on fire.

At first I thought I'd run through the text when the mode was turned on,
insert a whole bunch of markers, then keep a list of marker-pairs. That
seemed like it would be hard to keep properly in sync, though, so now
I'm thinking of running through the text and actually inserting
separator characters, perhaps #x1f, either making them invisible or
putting some other nice display on them. That makes it easier to sync,
and has the advantage that it persists to disk and you only have to do
the major parsing once. Then strip them out during export.
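
A minimal sketch of the separator approach, with several assumptions: paragraphs are separated by blank lines, the two texts live in two separate buffers, and segment N of one corresponds to segment N of the other. All `my-cat-' names are hypothetical, and hiding the separators (invisibility or a display property) is left out.

```elisp
;; Hypothetical names; a sketch under the assumptions stated above.
(defconst my-cat-separator "\x1f"
  "Unit Separator character marking a segment boundary.")

(defun my-cat-insert-separators ()
  "Insert a separator at the end of each blank-line-delimited paragraph.
Paragraphs that already end in a separator are left alone."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "[^\n\x1f]\\(\n\\)\n+" nil t)
      (save-excursion
        (goto-char (match-beginning 1))
        (insert my-cat-separator)))))

(defun my-cat-segment-index ()
  "Return the zero-based index of the segment containing point."
  (save-excursion
    (let ((end (point)) (n 0))
      (goto-char (point-min))
      (while (search-forward my-cat-separator end t)
        (setq n (1+ n)))
      n)))

(defun my-cat-goto-segment (n)
  "Move point to the start of segment number N (zero-based)."
  (goto-char (point-min))
  (when (> n 0)
    (search-forward my-cat-separator nil t n)
    (skip-chars-forward "\n")))

(defun my-cat-show-corresponding (other-buffer)
  "Display in OTHER-BUFFER the segment matching the one at point."
  (interactive "bOther buffer: ")
  (let ((n (my-cat-segment-index)))
    (with-selected-window (display-buffer other-buffer)
      (my-cat-goto-segment n)
      (recenter 0))))

(defun my-cat-strip-separators (text)
  "Return TEXT with all segment separators removed, e.g. for export."
  (replace-regexp-in-string (regexp-quote my-cat-separator) "" text t t))
```

Joining or splitting paragraphs then amounts to deleting or inserting a single separator character by hand in one of the two texts.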

Anyway, still experimenting...

Marcin Borkowski

May 31, 2020, 1:10:30 AM5/31/20
to Emanuel Berg, help-gn...@gnu.org

On 2020-05-29, at 10:14, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Can't you do ecat-highlight-next-sentence and
> ecat-highlight-previous-sentence by just moving point
> to the next sentence and then do
> ecat-highlight-this-sentence? Feels more natural...

That would completely defeat the purpose. All CAT stuff (as many people
have already said) is about efficiency. One of the main points of my (very
simple) code is that I do not have to move point anywhere.
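
For readers who want to try the idea, here is a rough sketch of how such commands could look. It is not the code referred to above; the `my-cat-' names are hypothetical, and it relies on the standard `forward-sentence' (so `sentence-end-double-space' affects what counts as a sentence). The highlight is an overlay in the source buffer, so point in the translation buffer never moves.

```elisp
;; Hypothetical names; a sketch, not the ecat code discussed in the thread.
(defvar my-cat-source-buffer nil
  "Buffer holding the original text (set this before use).")

(defvar my-cat-overlay nil
  "Overlay marking the sentence currently being translated.")

(defun my-cat-highlight-from (pos)
  "Highlight the sentence starting at or after POS in the source buffer."
  (with-current-buffer my-cat-source-buffer
    (save-excursion
      (goto-char pos)
      (skip-chars-forward " \t\n")
      (let ((beg (point)))
        (forward-sentence 1)
        (if (overlayp my-cat-overlay)
            (move-overlay my-cat-overlay beg (point) (current-buffer))
          (setq my-cat-overlay (make-overlay beg (point)))
          (overlay-put my-cat-overlay 'face 'highlight))
        ;; Scroll the source window, if visible, without selecting it.
        (let ((win (get-buffer-window (current-buffer))))
          (when win (set-window-point win beg)))))))

(defun my-cat-highlight-next-sentence ()
  "Move the highlight to the next sentence of the source text."
  (interactive)
  (my-cat-highlight-from
   (if (overlayp my-cat-overlay) (overlay-end my-cat-overlay) 1)))

(defun my-cat-highlight-previous-sentence ()
  "Move the highlight to the previous sentence of the source text."
  (interactive)
  (when (overlayp my-cat-overlay)
    (my-cat-highlight-from
     (with-current-buffer my-cat-source-buffer
       (save-excursion
         (goto-char (overlay-start my-cat-overlay))
         (backward-sentence 1)
         (point))))))
```

Binding the two commands to keys in the translation buffer gives next/previous highlighting with no window switching.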

> Anyway, what other features do the proprietary
> CATs have?
>
> I always thought translation was just a matter of
> reading one thing and then typing what it means,
> looking up the occasional word or phrase for the
> idiomatic equivalent.

Well, you already got your answers, but let me stress that one of the
important points is extracting text from some strange formats and
putting the translation back into it. Think Word documents with complex
formatting, or HTML with many tags/attributes. If you are to translate
things like

<p class="important-instruction">Click the <span
class="dancing-elephants">big red button</span> to launch the nuke</p>

and all the markup has to be there in the translation, you really don't
want to type it by hand - it's time-consuming and error-prone.
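
One possible way to spare the translator the markup, sketched in Emacs Lisp: swap every tag for a short numbered placeholder before translating and swap them back afterwards. The function names and the {N} placeholder syntax are assumptions made for this sketch, and the regexp is a crude tag matcher rather than a real HTML parser.

```elisp
;; Hypothetical names and placeholder syntax; a sketch only.
(defun my-cat-protect-tags (html)
  "Replace each tag in HTML with a numbered placeholder.
Return (TEXT . TAGS), where TAGS is the list of original tags in order."
  (let ((tags '()) (n 0))
    (cons (replace-regexp-in-string
           "<[^>]+>"
           (lambda (tag)
             (push tag tags)
             (format "{%d}" (setq n (1+ n))))
           html t t)
          (nreverse tags))))

(defun my-cat-restore-tags (text tags)
  "Put TAGS back into TEXT in place of their {N} placeholders.
Placeholders with no corresponding tag are left untouched."
  (replace-regexp-in-string
   "{\\([0-9]+\\)}"
   (lambda (placeholder)
     (let ((n (string-to-number (substring placeholder 1 -1))))
       (if (<= 1 n (length tags))
           (nth (1- n) tags)
         placeholder)))
   text t t))
```

For the example above, the translator would see "{1}Click the {2}big red button{3} to launch the nuke{4}" and could reorder the placeholders freely in the translation before the tags are restored.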

> Some idiomatic phrases are pitfalls tho. For example
> the English "more or less" looks like the Swedish
> "mer eller mindre" (which means "correct but with
> room for fine details") but the way native speakers
> use it seems to be more (?) "både och" which means
> discussion can go both (disparate) ways and BOTH
> are correct!
>
> So perhaps one could have a list of these "trap
> phrases" so when they turn up in the text, they are
> highlighted to indicate "watch out! we are not just
> piling words here!"
>
> Who'd compile that list is another matter...

I guess this is a very minor problem...

Best,

--
Marcin Borkowski
http://mbork.pl

Marcin Borkowski

May 31, 2020, 1:15:11 AM5/31/20
to Emanuel Berg, help-gn...@gnu.org

On 2020-05-29, at 13:51, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Here, we care about what CAT _can_ do. Keep a list of
> LOTR characters in a text file, send it by mail to
> fellow translators in ~/.mailrc, and expand abbrevs
> and snippet templates Emacs can already do, sorry.

Extremely inefficient, error-prone and irritating. Again: CATs are
about automating repetitive things.

How would you expect people to know which mail contains which (version
of) the file? How would you expect people to know which file (out of
several dozen or more) to look at to find something? How would you
expect people to remember the names of the snippets? Etc., etc., etc.

Marcin Borkowski

May 31, 2020, 1:18:07 AM5/31/20
to Giovanni Bono, help-gn...@gnu.org

On 2020-05-29, at 17:02, Giovanni Bono <giovan...@unimi.it> wrote:

> Marcin Borkowski <mb...@mbork.pl> writes:
>
>> Hi all,
>>
>> does anyone here perform translations within Emacs? Do you know of any
>> tools facilitating that? There exist a few CATs, or Computer Aided
>> Translation systems, but - AFAIK - they are all proprietary and closed
>> source. Emacs seems capable of implementing at least a simple CAT, but
>> I could not find any existing solutions for that. (I skimmed through
>> the answers here:
>> https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
>> but did not find anything useful.)
>>
>> The first thing I would need is a way to highlight the "currently
>> translated sentence" in the other window, where I would keep the
>> original text, with an easy way to highlight the next/previous one -
>> this seems very easy to do, but did anyone actually code anything like
>> this?
>>
>> TIA,
>
> hello Marcin,
>
> I translated a few books, a few years ago, using Emacs as a simple CAT.
> Here is a screenshot of the last iteration:
>
>
> On the left there are three windows with translated, current, and next
> sentences from the source text. Central windows are for translated and
> current sentences, and the bottom central window is for current word.
> The right window is for statistics, and (not shown here) Wordnet
> (/usr/bin/wn) lookup.
>
> The idea is to have some words (in bold in the screenshot) that are
> controlled, so that while translating them you can keep track of all
> other occurrences and prior translations. So every word in the source
> material needs to be indexed and referenced to a (possibly empty) word in
> the ongoing translation.
>
> Work happens in the very central frame, where words are presented
> untranslated at first, and you can move them around or substitute them
> with prior or new (including empty) translations. After a while, it
> gets fast.
>
> I am attaching the code. Most of it is a painful and messy treatment of
> the publisher markup, and all of it is intended for personal use and for
> the particular book I was translating. But maybe you can adapt some of
> it to your needs. Regards,

Thanks a lot! This looks pretty impressive. If only I had time to
analyze your code ATM...

I'll look into it one day, though!

Giovanni Bono

May 31, 2020, 2:58:42 AM5/31/20
to help-gn...@gnu.org
Sure! If you want to try it instead, I could send you the data (the
manuscript) off list. Then it is just a matter of loading the file (it
works on a recent Emacs), typing ‘M-x gi/roth/startup’ and trying the
keybindings (commented in the code). For that you would need a large
enough frame (240 columns x 60 rows), because unfortunately the window
splitting is hardcoded. Regards,

Giovanni


Steinar Bang

Jun 1, 2020, 4:26:31 AM6/1/20
to help-gn...@gnu.org
>>>>> Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org>:

(so those are the names of the Swedish translation, interesting)

> Rivendell Vattnadal

(can't remember)
("Kløvendal" according to google)

> Shire Fylke

Syssel (which is a little ironic considering the Swedish translation...)

> Strider Vidstige

Vidvandre

(I've always rather preferred the Norwegian version of Isengard: Jarnagard)


Emanuel Berg

Jun 1, 2020, 4:33:18 AM6/1/20
to help-gn...@gnu.org
Steinar Bang wrote:

> (so those are the names of the Swedish translation,
> interesting)

Right, there are to my knowledge two translations of
LOTR and they are not similar in details. This is
from the first one, by Åke Ohlmarks. He was
a controversial figure and so was his translation, but
many people like it.

>> Rivendell Vattnadal
>
> (can't remember) ("Kløvendal" according to google)
>
>> Shire Fylke
>
> Syssel (which is a little ironic considering the
> Swedish translation...)
>
>> Strider Vidstige
>
> Vidvandre
>
> (I've always rather preferred the Norwegian version
> of Isengard: Jarnagard)

Heh, cool :)

Emanuel Berg

Jun 1, 2020, 4:39:49 AM6/1/20
to help-gn...@gnu.org
Giovanni Bono wrote:

>> Giovanni Bono wrote:
>>
>>> ;; -*- mode: emacs-lisp; lexical-binding: t; -*-
>>> [...]
>>
>> :O
>>
>> Certainly not the way I'm used to seeing Emacs
>> Lisp...
>>
>> But as they say, styles make fights!
>
> If you are referring to variable scoping, I think
> it makes no actual difference to the code I posted.
> I was probably just experimenting,

No no, just the whole style... it is just everything
at once I guess. You are skilled and used to writing
code like that, no doubt.

Emanuel Berg

Jun 1, 2020, 4:51:39 AM6/1/20
to help-gn...@gnu.org
Marcin Borkowski wrote:

>> Can't you do ecat-highlight-next-sentence and
>> ecat-highlight-previous-sentence by just moving
>> point to the next sentence and then do
>> ecat-highlight-this-sentence? Feels more
>> natural...
>
> That would completely defeat the purpose. All CAT
> stuff (as many people have already said) is about
> efficiency. One of the main points of my (very
> simple) code is that I do not have to move
> point anywhere.

Because it's inefficient? ... you are a fast
translator :)

but ... OK.

Only it still looks weird with the same code two
extra times.

> Well, you already got your answers, but let me
> stress that one of the important points is
> extracting text from some strange formats and
> putting the translation back into it. Think Word
> documents with complex formatting, or HTML with
> many tags/attributes. If you are to translate
> things like
>
> <p class="important-instruction">Click the <span
> class="dancing-elephants">big red button</span> to
> launch the nuke</p>
>
> and all the markup has to be there in the
> translation, you really don't want to type it by
> hand - it's time-consuming and error-prone.

Well, that's rather a task for a parser, converting
from one format to another... very mechanical
and easy.

>> Some idiomatic phrases are pitfalls tho.
>> For example the English "more or less" looks like
>> the Swedish "mer eller mindre" (which means
>> "correct but with room for fine details") but the
>> way native speakers use it seems to be more (?)
>> "både och" which means discussion can go both
>> (disparate) ways and BOTH are correct! So perhaps
>> one could have a list of these "trap phrases" so
>> when they turn up in the text, they are
>> highlighted to indicate "watch out! we are not
>> just piling words here!" Who'd compile that list
>> is another matter...
>
> I guess this is a very minor problem...

It depends how many there are... should be systemized
just as everything unusual is in any trade.
Easy thing to do with huge gain and possible to build
on and extend...
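
A minimal sketch of what such a "trap phrase" check could look like in Emacs Lisp. The names are hypothetical and the list holds only the one example from this thread; compiling the real list is the hard part.

```elisp
;; Hypothetical names; the phrase list is just the example from the thread.
(defvar my-cat-trap-phrases '("more or less")
  "Phrases that should make the translator stop and think.")

(defun my-cat-find-trap-phrases (text)
  "Return a list of (PHRASE . POSITION) for each trap phrase in TEXT."
  (when my-cat-trap-phrases
    (let ((case-fold-search t)
          (regexp (regexp-opt my-cat-trap-phrases))
          (pos 0)
          (hits '()))
      (while (string-match regexp text pos)
        (push (cons (match-string 0 text) (match-beginning 0)) hits)
        (setq pos (match-end 0)))
      (nreverse hits))))

;; (my-cat-find-trap-phrases "It is more or less done, More or less.")
;; => (("more or less" . 6) ("More or less" . 25))
```

In a buffer, the same regexp could be handed to `highlight-regexp' to get the visual warning.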

Emanuel Berg

Jun 1, 2020, 4:55:12 AM6/1/20
to help-gn...@gnu.org
Marcin Borkowski wrote:

> How would you expect people to know which mail
> contains which (version of) the file? How would you
> expect people to know which file (out of several
> dozen or more) to look at to find something?
> How would you expect people to remember the names
> of the snippets? Etc., etc., etc.

I know, it is soo difficult! Sometimes I wonder how
I even manage to use my computer...