Emacs as a translator's tool

Marcin Borkowski

May 29, 2020, 1:55:39 AM
to Help Gnu Emacs mailing list
Hi all,

does anyone here perform translations within Emacs? Do you know of any
tools facilitating that? There exist a few CATs, or Computer Aided
Translation systems, but - AFAIK - they are all proprietary and closed
source. Emacs seems capable of implementing at least a simple CAT, but
I could not find any existing solutions for that. (I skimmed through
the answers here:
https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
but did not find anything useful.)

The first thing I would need is a way to highlight the "currently
translated sentence" in the other window, where I would keep the
original text, with an easy way to highlight the next/previous one -
this seems very easy to do, but did anyone actually code anything like
this?

TIA,

--
Marcin Borkowski
http://mbork.pl

stardiviner

May 29, 2020, 2:22:09 AM
to Marcin Borkowski, help-gn...@gnu.org
If you're using Org Mode: I'm trying to write an extension based on Org
Babel and google-translate that generates translated results.

Here is my ob-translate repo: https://github.com/stardiviner/ob-translate

Also, some links which might be helpful for you.

- https://github.com/atykhonov/google-translate
- http://github.com/juergenhoetzel/babel
- https://github.com/liShiZhensPi/baidu-translate
- https://github.com/jcs-elpa/define-it
--
[ stardiviner ]
I try to make every word tell the meaning that I want to express.

Blog: https://stardiviner.github.io/
IRC(freenode): stardiviner, Matrix: stardiviner
GPG: F09F650D7D674819892591401B5DF1C95AE89AC3


Marcin Borkowski

May 29, 2020, 2:35:58 AM
to Help Gnu Emacs mailing list

On 2020-05-29, at 07:55, Marcin Borkowski <mb...@mbork.pl> wrote:

> Hi all,
>
> does anyone here perform translations within Emacs? Do you know of any
> tools facilitating that? There exist a few CATs, or Computer Aided
> Translation systems, but - AFAIK - they are all proprietary and closed
> source. Emacs seems capable of implementing at least a simple CAT, but
> I could not find any existing solutions for that. (I skimmed through
> the answers here:
> https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
> but did not find anything useful.)
>
> The first thing I would need is a way to highlight the "currently
> translated sentence" in the other window, where I would keep the
> original text, with an easy way to highlight the next/previous one -
> this seems very easy to do, but did anyone actually code anything like
> this?

OK, so I assumed nobody did it, so here's my take. Probably not
extremely well-done, but I just coded it in 15 minutes, so there you go.
Comments welcome.

--8<---------------cut here---------------start------------->8---
(defface ecat-highlight-face '((t :background "#e7ede7"))
  "Face for highlighting the currently translated sentence.")

(defvar ecat-sentence-overlay nil
  "The overlay to highlight the currently translated sentence.")

(defun ecat-highlight-this-sentence ()
  "Highlight the sentence at point using an overlay."
  (interactive)
  ;; The variable is nil before the first call, and `delete-overlay'
  ;; signals an error on nil, so guard the call.
  (when (overlayp ecat-sentence-overlay)
    (delete-overlay ecat-sentence-overlay))
  (save-excursion
    ;; `let' evaluates its bindings in order: move to the end of the
    ;; sentence first, then back to its beginning.
    (let ((sentence-end (progn (forward-sentence)
                               (point)))
          (sentence-beginning (progn (backward-sentence)
                                     (point))))
      (setq ecat-sentence-overlay
            (make-overlay sentence-beginning sentence-end))))
  (overlay-put ecat-sentence-overlay 'face 'ecat-highlight-face))

(defun ecat-highlight-next-sentence ()
  "Move the highlight to the next sentence."
  (interactive)
  (save-excursion
    (set-buffer (overlay-buffer ecat-sentence-overlay))
    (goto-char (overlay-end ecat-sentence-overlay))
    (let ((sentence-end (progn (forward-sentence)
                               (point)))
          (sentence-beginning (progn (backward-sentence)
                                     (point))))
      (move-overlay ecat-sentence-overlay sentence-beginning sentence-end))))

(defun ecat-highlight-previous-sentence ()
  "Move the highlight to the previous sentence."
  (interactive)
  (save-excursion
    (set-buffer (overlay-buffer ecat-sentence-overlay))
    (goto-char (overlay-start ecat-sentence-overlay))
    (let ((sentence-beginning (progn (backward-sentence)
                                     (point)))
          (sentence-end (progn (forward-sentence)
                               (point))))
      (move-overlay ecat-sentence-overlay sentence-beginning sentence-end))))

(defun ecat-disable-sentence-highlighting ()
  "Disable sentence highlighting."
  (interactive)
  ;; Same guard as above: nothing to delete if highlighting never started.
  (when (overlayp ecat-sentence-overlay)
    (delete-overlay ecat-sentence-overlay)))
--8<---------------cut here---------------end--------------->8---

Best,

MENGUAL Jean-Philippe

May 29, 2020, 2:39:40 AM
to help-gn...@gnu.org
Hi,

I mainly use, in Emacs, the po-mode (gettext-el). I still have the
problem I described here, i.e. I would love the "Last-translator" to be
up-to-date automatically with my info in the PO files, but except this,
requiring me to do things manually, I like how it works. Also you need
to have in your .emacs the po-wrap function, to ensure the file stays on
the screen 80 columns.

Regards

Jean-Christophe Helary

May 29, 2020, 2:57:21 AM
to Marcin Borkowski, Help Gnu Emacs mailing list
Marcin,

> On May 29, 2020, at 14:55, Marcin Borkowski <mb...@mbork.pl> wrote:
>
> Hi all,
>
> does anyone here perform translations within Emacs?

Yes, sometimes.

> Do you know of any
> tools facilitating that? There exist a few CATs, or Computer Aided
> Translation systems, but - AFAIK - they are all proprietary and closed
> source.

No. OmegaT is very much GPL and is listed in the Free Software directory. It is Java-based and has recently shifted from Oracle's Java to AdoptOpenJDK 11.

> Emacs seems capable of implementing at least a simple CAT, but
> I could not find any existing solutions for that. (I skimmed through
> the answers here:
> https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
> but did not find anything useful.)

As Jean-Philippe mentions, po-mode exists, even if it is limited in scope.


--
Jean-Christophe Helary @brandelune
http://mac4translators.blogspot.com


Jean-Christophe Helary

May 29, 2020, 4:03:41 AM
to Marcin Borkowski, Help Gnu Emacs mailing list


> On May 29, 2020, at 15:57, Jean-Christophe Helary <jean.christ...@traduction-libre.org> wrote:
>
> Marcin,
>
>> On May 29, 2020, at 14:55, Marcin Borkowski <mb...@mbork.pl> wrote:
>>
>> Hi all,
>>
>> does anyone here perform translations within Emacs?
>
> Yes, sometimes.
>
>> Do you know of any
>> tools facilitating that? There exist a few CATs, or Computer Aided
>> Translation systems, but - AFAIK - they are all proprietary and closed
>> source.
>
> No. OmegaT is very much GPL and is listed in the Free Software directory. Java based and has recently shifted from using Oracle to AdoptOpenJDK 11.

In fact, the reason why I came (back) to emacs in the first place is, OmegaT...

I love OmegaT. I created the user support list in 2004 and I've been involved with it since 2002.

But I thought that instead of having a translation memory system to which editor functions were added, maybe having a text editor to which translation memory matching was added would be more efficient. That was my pipe dream then.

So, all that happened in 2003-2004 with the big Common Lisp revival, when Peter Seibel published Practical Common Lisp, when Slime was all over the place, when Bill Clementson had his amazing blog on what could be done with emacs / common lisp and slime, etc.

And I thought to myself that since emacs was a lisp environment, why not see what it's all about ? (There was also a Mac application, Alpaca I think, that was basically a text editor with CL inside).

Notice that in 15 years I have not made 1 inch of progress (or maybe just one, I can understand what goes wrong in my init file :). But at least I'm still around and I like it :)

OmegaT has evolved so much now that it has become one of the mainstream CAT tools (even if "market share" is not high at all). It is used in the EU Translation Bureau. It serves in universities to teach students basic CAT concepts, and it also works well with the Okapi Framework tools, which are also Java-based. It also has a very friendly *multilingual* community available in pretty much all time zones.

Emanuel Berg

May 29, 2020, 4:14:33 AM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> OK, so I assumed nobody did it, so here's my take.
> Probably not extremely well-done, but I just coded
> it in 15 minutes, so there you go.
> Comments welcome.

Byte-compile is your first stop for code comments:

Warning: defface for ‘ecat-highlight-face’ fails
to specify containing group

Warning: Use ‘with-current-buffer’ rather than
save-excursion+set-buffer

> (defun ecat-highlight-this-sentence () [...]
> (defun ecat-highlight-next-sentence () [...]
> (defun ecat-highlight-previous-sentence ()

Can't you do ecat-highlight-next-sentence and
ecat-highlight-previous-sentence by just moving point
to the next sentence and then do
ecat-highlight-this-sentence? Feels more natural...
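
A minimal sketch of that suggestion: one helper highlights the
sentence around a given position, and "next" only decides where that
position is. All `my-ecat-' names are hypothetical and not part of the
code posted above; the built-in `highlight' face stands in for the
custom one.

```elisp
;; Sketch only: the `my-ecat-' names are illustrative assumptions.
(defvar my-ecat-overlay nil
  "Overlay marking the sentence currently being translated.")

(defun my-ecat-highlight-sentence-at (pos)
  "Highlight the sentence containing POS in the current buffer."
  (save-excursion
    (goto-char pos)
    ;; Go to the end of the sentence first, then back to its start.
    (let* ((end (progn (forward-sentence) (point)))
           (beg (progn (backward-sentence) (point))))
      (if (overlayp my-ecat-overlay)
          (move-overlay my-ecat-overlay beg end (current-buffer))
        (setq my-ecat-overlay (make-overlay beg end)))
      (overlay-put my-ecat-overlay 'face 'highlight))))

(defun my-ecat-highlight-next-sentence ()
  "Move the highlight one sentence forward."
  (interactive)
  (with-current-buffer (overlay-buffer my-ecat-overlay)
    (my-ecat-highlight-sentence-at
     (save-excursion
       ;; Skip the whitespace after the current sentence so that the
       ;; position lies inside the following one.
       (goto-char (overlay-end my-ecat-overlay))
       (skip-chars-forward " \t\n")
       (point)))))
```

Note that sentence boundaries follow `sentence-end-double-space',
which by default expects two spaces after the period.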

Anyway, what other features do the proprietary
CATs have?

I always thought translation was just a matter of
reading one thing and then typing what it means,
looking up the occasional word or phrase for the
idiomatic equivalent.

Some idiomatic phrases are pitfalls tho. For example
the English "more or less" looks like the Swedish
"mer eller mindre" (which means "correct but with
room for fine details") but the way native speakers
use it seems to be more (?) "både och" which means
discussion can go both (disparate) ways and BOTH
are correct!

So perhaps one could have a list of these "trap
phrases" so when they turn up in the text, they are
highlighted to indicate "watch out! we are not just
piling words here!"

Who'd compile that list is another matter...
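
Assuming such a list exists, flagging its entries takes only a few
lines. This is a rough sketch, not an existing package: the variable
and function names are made up, and only "more or less" comes from the
example above; the second phrase is a placeholder.

```elisp
;; Sketch only: names and the second phrase are illustrative.
(defvar my-trap-phrases '("more or less" "by and large")
  "Phrases whose literal translation is likely to be misleading.")

(defun my-trap-phrase-highlight ()
  "Put a warning overlay on every trap phrase in the current buffer.
Return the number of occurrences found."
  (interactive)
  ;; Drop overlays from a previous run so they do not pile up.
  (remove-overlays (point-min) (point-max) 'my-trap-phrase t)
  (let ((regexp (regexp-opt my-trap-phrases))
        (case-fold-search t)
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (let ((ov (make-overlay (match-beginning 0) (match-end 0))))
          ;; Tag the overlay so that only ours are removed later.
          (overlay-put ov 'my-trap-phrase t)
          (overlay-put ov 'face 'warning))
        (setq count (1+ count))))
    count))
```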

Good idea BTW :)

--
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal


Emanuel Berg

May 29, 2020, 4:22:41 AM
to help-gn...@gnu.org
MENGUAL Jean-Philippe wrote:

> I mainly use, in Emacs, the po-mode (gettext-el).
> I still have the problem I described here, i.e.
> I would love the "Last-translator" to be up-to-date
> automatically with my info in the PO files, but
> except this, requiring me to do things manually,
> I like how it works. Also you need to have in your
> .emacs the po-wrap function, to ensure the file
> stays on the screen 80 columns.

Say what? :)

Yuri Khan

May 29, 2020, 4:29:26 AM
to Emanuel Berg, help-gnu-emacs
On Fri, 29 May 2020 at 15:14, Emanuel Berg via Users list for the GNU
Emacs text editor <help-gn...@gnu.org> wrote:

> Anyway, what other features do the proprietary
> CATs have?
>
> I always thought translation was just a matter of
> reading one thing and then typing what it means,
> looking up the occasional word or phrase for the
> idiomatic equivalent.

I have not used any professional CATs, but one important function is
having a vocabulary (also called translation memory).

Imagine translating a novel. When a new character is introduced, you
have to decide how his/her name is translated and spelled. You need to
record it so that you’re consistent. Same goes for any names, not just
of people.

If the translation is a joint effort, that vocabulary needs to be
shared so that the whole team calls characters the same names.

Marcin Borkowski

May 29, 2020, 4:29:41 AM
to MENGUAL Jean-Philippe, help-gn...@gnu.org

On 2020-05-29, at 08:39, MENGUAL Jean-Philippe <mengual...@free.fr> wrote:

> Hi,
>
> I mainly use, in Emacs, the po-mode (gettext-el). I still have the
> problem I described here, i.e. I would love the "Last-translator" to
> be up-to-date automatically with my info in the PO files, but except
> this, requiring me to do things manually, I like how it works. Also
> you need to have in your .emacs the po-wrap function, to ensure the
> file stays on the screen 80 columns.

Thanks for the response. Do I get it correctly that you mean
translating software, so mainly short, possibly unrelated pieces of
text? If so, this seems pretty different to translating a paper or
a book...

Emanuel Berg

May 29, 2020, 4:30:08 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> OmegaT is very much GPL and is listed in the Free
> Software directory. Java based and has recently
> shifted from using Oracle to AdoptOpenJDK 11.

Indeed, it's in the Debian repos:

$ aptitude show omegat
[...]
Description: Computer Assisted Translation (CAT) tool
OmegaT's main features are
* multiple source texts handling, retaining
complex folder hierarchies
* fuzzy matching with other segments in the
source file(s) or TMX files from previous
projects
* easy glossary terms management
* flexible regex-based sentence segmenting
(using an SRX-like method)
* powerful regex-based searches along with the
facility to apply a filter to display search
results in the editor
* ability to batch process documents from the
command line
* extended project statistics
* easy-to-understand documentation and tutorial
* plugin architecture with separate Lucene
stemmer (recognition of inflected forms) and
LanguageTool (style and grammar checker)
plugins
* integration with Hunspell for spelling
checking
* simple API to access source/target/selection
textual data

OmegaT supports 24 formats, including
documentation formats such as OpenDocument, Open
XML (MS Office 2007), DocBook and (x)HTML, and
also localization formats such as Java
properties and PO files. An Okapi plugin can
further extend the supported formats, for
example to include TTX (TradosTag).
Homepage: https://www.omegat.org

Heh, Java people: "ability to batch process documents
from the command line" :)

Emanuel Berg

May 29, 2020, 4:35:08 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> In fact, the reason why I came (back) to emacs in
> the first place is, OmegaT...
>
> I love OmegaT. I created the user support list in
> 2004 and I've been involved with it since 2002.

Be sure to add it to Gmane!

https://gmane.io

> But I thought that instead of having a translation
> memory system in which editor functions were added,
> maybe having a text editor to which translation
> memory matching was added would be more efficient.
> That was my pipe dream them.

What's a translation memory system/matching? :O

Emanuel Berg

May 29, 2020, 4:40:10 AM
to help-gn...@gnu.org
Yuri Khan wrote:

>> Anyway, what other features do the proprietary
>> CATs have? I always thought translation was just
>> a matter of reading one thing and then typing what
>> it means, looking up the occasional word or phrase
>> for the idiomatic equivalent.
>
> I have not used any professional CATs, but one
> important function is having a vocabulary (also
> called translation memory).
>
> Imagine translating a novel. When a new character
> is introduced, you have to decide how his/her name
> is translated and spelled. You need to record it so
> that you’re consistent. Same goes for any names,
> not just of people.

Wait, don't tell me... let's use... A TEXT FILE?!?

Rivendell Vattnadal
Shire Fylke
Strider Vidstige

> If the translation is a joint effort, that
> vocabulary needs to be shared [...]

The Internet. Good enough for government work :)
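
For what it's worth, reading such a file from Lisp is short. The
sketch below assumes one tab-separated "source<TAB>target" pair per
line; that format and the `my-glossary-' names are assumptions, not
anything agreed on in this thread.

```elisp
;; Sketch only: assumes a tab-separated glossary, one pair per line.
(defun my-glossary-parse (text)
  "Parse TEXT, one \"SOURCE<TAB>TARGET\" pair per line, into a hash table."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (line (split-string text "\n" t))
      (let ((fields (split-string line "\t")))
        ;; Skip malformed lines instead of signalling an error.
        (when (= (length fields) 2)
          (puthash (car fields) (cadr fields) table))))
    table))

(defun my-glossary-lookup (term table)
  "Return the recorded translation of TERM in TABLE, or nil."
  (gethash term table))
```

A file on disk could be fed to `my-glossary-parse' through
`insert-file-contents' and `buffer-string' in a temporary buffer.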

Jean-Christophe Helary

May 29, 2020, 4:41:44 AM
to Emanuel Berg, Help Gnu Emacs mailing list


> On May 29, 2020, at 17:14, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> I always thought translation was just a matter of
> reading one thing and then typing what it means,
> looking up the occasional word or phrase for the
> idiomatic equivalent.

It is. But *computer aided translation* tools make that easier by putting all the translation resources (glossaries, legacy translations, dictionaries, searches, autocompletion) into one translation "IDE" that helps the translator not lose time on repetitive tasks.

Emanuel Berg

May 29, 2020, 4:45:08 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

>> I always thought translation was just a matter of
>> reading one thing and then typing what it means,
>> looking up the occasional word or phrase for the
>> idiomatic equivalent.
>
> It is. But *computer aided translation* tools make
> that easier by putting all the translation
> ressources (glossaries, legacy translations,
> dictionaries, searches, autocompletion) into one
> translation "IDE" that helps the translator not
> lose time on repetitive tasks.

... which are?

Jean-Christophe Helary

May 29, 2020, 5:28:17 AM
to Emanuel Berg, Help Gnu Emacs mailing list


> On May 29, 2020, at 17:43, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>>> I always thought translation was just a matter of
>>> reading one thing and then typing what it means,
>>> looking up the occasional word or phrase for the
>>> idiomatic equivalent.
>>
>> It is. But *computer aided translation* tools make
>> that easier by putting all the translation
>> ressources (glossaries, legacy translations,
>> dictionaries, searches, autocompletion) into one
>> translation "IDE" that helps the translator not
>> lose time on repetitive tasks.
>
> ... which are?

Typing text :)

If there is ONE repetitive task in translation, it is typing text.

So anything that is already registered and which can be semi-automatically entered is a godsend.

For example, you translate a sentence, half of which is already registered as a legacy translation. The translation memory engine finds the match in the background (no need for you to search for it) and presents the corresponding translation (that's called "fuzzy matching"). You hit a shortcut and boom, you have half the sentence translated. Now, the other half contains glossary terms that are in a TSV file (or equivalent). Here again the search has happened in the background, and you are presented with a choice of terms that you can enter with a keybinding. You just need to type the semantic "glue" between the terms.

Et voilà. The "IDE" did all the searches in the background when you started working on a given segment and autocompletion or keybindings give you easy access to what you need to enter.
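
To make the idea concrete, here is a rough sketch of fuzzy matching
against a translation memory held as an alist of (SOURCE . TARGET)
strings. It scores candidates by character-level Levenshtein distance
using `string-distance' (available since Emacs 27). This is an
illustration only: it is not how OmegaT computes its matches, and the
`my-tm-' names and the 0.6 default threshold are arbitrary assumptions.

```elisp
;; Sketch only: character-level similarity, not OmegaT's algorithm.
(defun my-tm-similarity (a b)
  "Return a similarity score between 0.0 and 1.0 for strings A and B."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1.0
      ;; The edit distance is at most the length of the longer string.
      (- 1.0 (/ (float (string-distance a b)) longest)))))

(defun my-tm-best-match (segment memory &optional threshold)
  "Return the (SOURCE . TARGET) pair in MEMORY closest to SEGMENT.
MEMORY is an alist of source and target strings.  Return nil when
no entry reaches THRESHOLD, which defaults to 0.6."
  (let ((threshold (or threshold 0.6))
        best best-score)
    (dolist (entry memory)
      (let ((score (my-tm-similarity segment (car entry))))
        (when (and (>= score threshold)
                   (or (null best-score) (> score best-score)))
          (setq best entry
                best-score score))))
    best))
```

A real engine would also precompute an index so that the search stays
fast on large memories; the linear scan here is only meant to show the
principle.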

to...@tuxteam.de

May 29, 2020, 5:59:19 AM
to help-gn...@gnu.org
On Fri, May 29, 2020 at 10:35:09AM +0200, Emanuel Berg via Users list for the GNU Emacs text editor wrote:

[...]

> Wait, don't tell me... let's use... A TEXT FILE?!?
>
> Rivendell Vattnadal
> Shire Fylke
> Strider Vidstige

How do you capture context?

Cheers
-- t

Emanuel Berg

May 29, 2020, 6:40:30 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> For ex. You translate a sentence in which half is
> already registered as a legacy translation.
> The translation memory engine finds the match in
> the background (no need for you to search it) and
> presents its corresponding translation (that's
> called "fuzzy matching"). You hit a shortcut and
> boom, you have half the sentence translated. Now,
> the other half contains glossary terms that are in
> a TSV file (or equivalent), here again the search
> has happened in the background and you are
> presented with a choice of terms that you can enter
> with a keybinding. You just need to type the
> semantic "glue" between the terms.

OK. I personally don't like that but that's in Emacs,
for sure. abbrev-mode and more advanced yasnippet and
many, many other solutions and packs.

Emanuel Berg

May 29, 2020, 6:45:09 AM
to help-gn...@gnu.org
tomas wrote:

>> Wait, don't tell me... let's use... A TEXT FILE?!?
>>
>> Rivendell Vattnadal
>> Shire Fylke
>> Strider Vidstige
>
> How do you capture context?

I don't know, a header?

"LOTR characters and place names"

Maybe the file name can also be something
descriptive, e.g. lotr-swedish.txt

to...@tuxteam.de

May 29, 2020, 7:34:49 AM
to help-gn...@gnu.org
On Fri, May 29, 2020 at 12:44:18PM +0200, Emanuel Berg via Users list for the GNU Emacs text editor wrote:
> tomas wrote:
>
> >> Wait, don't tell me... let's use... A TEXT FILE?!?
> >>
> >> Rivendell Vattnadal
> >> Shire Fylke
> >> Strider Vidstige
> >
> > How do you capture context?
>
> I don't know, a header?
>
> "LOTR characters and place names"
>
> Maybe the file name can also be something
> descriptive, e.g. lotr-swedish.txt

"time flies like an arrow and fruit flies like banana"

(1) Translate this phrase into Swedish
(2a) How would you characterize the context of the first 'like'
above? Of the second one?
(2b) Write a short Emacs Lisp program which can distinguish
between both
(3) Do the same as (2a) and (2b) for both terms 'flies'

Enjoy ;-P

(Human languages are... interesting)

-- tomás

Emanuel Berg

May 29, 2020, 7:51:20 AM
to help-gn...@gnu.org
tomas wrote:

> "time flies like an arrow and fruit flies like
> banana"
> (1) Translate this phrase into Swedish

That's impossible to do in a meaningful way if every
word is to be carried over, since in Swedish time
PASSES BY (går), it doesn't fly. So maybe "Tiden
springer iväg som långdistanslöpare", as time here can
also "run away"...

Obviously, not every expression can be translated
while still having every word rendered as its exact
translation!

Emacs can't do that, CATs can't, no one can in a way
that makes sense.

Here, we care about what CAT _can_ do. Keeping a list
of LOTR characters in a text file, sending it by mail
to fellow translators in ~/.mailrc, and expanding
abbrevs and snippet templates: Emacs can already do
all that, sorry.

> (2a) How would you characterize the context of the
> first 'like' above? Of the second one?

That's right, because I didn't know translating
fiction, non-fiction, and manuals involved solving
linguistic-logical puzzles from Chomsky's "Best Of"
multimedia CD-ROM...

Try this:

https://en.wikipedia.org/wiki/Context-sensitive_language

Jean-Christophe Helary

May 29, 2020, 8:02:16 AM
to Emanuel Berg, Help Gnu Emacs mailing list
>> (2a) How would you characterize the context of the
>> first 'like' above? Of the second one?
>
> That's right, because I didn't know translating
> fiction, non-fiction, and manuals involved solving
> linguistic-logical puzzles from Chomsky's "Best Of"
> multimedia CD-ROM...

Indeed :) It's not like translation theory waited for discussions on help-gnu-emacs. Plus, DeepL does a very good job at translating the boring stuff.

Takesi Ayanokoji

May 29, 2020, 10:28:27 AM
to Marcin Borkowski, Help Gnu Emacs mailing list
Hi, Marcin

> does anyone here perform translations within Emacs?

I am translating Emacs' info manuals using Emacs' po-mode and Po4a program.

--- "Why Po4a?" starts here.

I use the PO format as a translation memory because it is the major format
for *nix i18n nowadays.

The PO format is used by the programs in the gettext tool chain, but some
gettext tools are limited to translating messages in program sources.

These tools process source files that contain messages, converting them to
and from translation memories.

For this reason, I use the Po4a tools to generate PO files from files
written in various formats, and vice versa.

--- "Why Po4a?" ends here.

> The first thing I would need is a way to highlight the "currently
> translated sentence" in the other window, where I would keep the
> original text, with an easy way to highlight the next/previous

In po-mode, the msgid currently being processed is highlighted.

A msgid is a string before translation.
The extent of these strings (sentence, paragraph, page, ...) is defined by
Po4a when it extracts them from the original document files, and Po4a seems
to define a msgid as a paragraph.

For example, below is 'Anti news' from the Emacs manual before and after
translation.

before:
https://raw.githubusercontent.com/ayatakesi/emacs-24.5-doc-emacs/d826fbbb960688a04f46cec4e8e0131d2c39e218/anti.texi.po

after:
https://raw.githubusercontent.com/ayatakesi/emacs-24.5-doc-emacs/b0506307007fc3d36e8168c2f84bd125e97484fe/anti.texi.po

And in po4a, there are many commands that operate on a msgid/msgstr.

Best,

Eric Abrahamsen

May 29, 2020, 1:39:41 PM
to Yuri Khan, Emanuel Berg, help-gnu-emacs
Yuri Khan <yuri....@gmail.com> writes:

> On Fri, 29 May 2020 at 15:14, Emanuel Berg via Users list for the GNU
> Emacs text editor <help-gn...@gnu.org> wrote:
>
>> Anyway, what other features do the proprietary
>> CATs have?
>>
>> I always thought translation was just a matter of
>> reading one thing and then typing what it means,
>> looking up the occasional word or phrase for the
>> idiomatic equivalent.
>
> I have not used any professional CATs, but one important function is
> having a vocabulary (also called translation memory).
>
> Imagine translating a novel. When a new character is introduced, you
> have to decide how his/her name is translated and spelled. You need to
> record it so that you’re consistent. Same goes for any names, not just
> of people.
>
> If the translation is a joint effort, that vocabulary needs to be
> shared so that the whole team calls characters the same names.

I'm a translator, primarily of fiction, and do all of it in Emacs,
specifically in Org mode.

I've thought many times over the years about what I would really want an
Emacs-based translation environment to provide for me. I don't do
technical translation, so there's not a whole lot of value in
sentence-by-sentence correspondences. But as Yuri mentions it can be
very useful to keep track of how you've translated certain names, or
certain important terms, in different places throughout the text.
Basically I would want two things:

1. A way to keep track of location correspondences between the source
text and translated text. CAT tools split the text up by sentence, but
that's not very useful for fiction (particularly Chinese->English
translation) because there's rarely a one-to-one correspondence.
There /is/ a more reliable correspondence between paragraphs, though,
and I'd like to know which paragraph equals which. The point would
mostly be to find my place again when I start translating at the
beginning of the day, and to implement a more useful follow-mode. I
imagined this would happen when the mode was turned on: it would run
down the file and insert markers that would be used to find
correspondences. Special characters could be inserted into the file
to indicate that two paragraphs should be joined, or one paragraph
split.
2. Link terms in the translation to a glossary pulled from the original.
This would be character names, places, special terms, etc. They might
not always be translated the same way, but I need to know how I've
handled them earlier in the document. Glossary terms would be
highlighted in the source text, and when you came to the equivalent
spot in the translation, you'd use a command like
insert-translation-term that would prompt for the translation,
offering completion on earlier translations, and then insert that
term into the translated text with a link to the original in the
glossary. There would also be two multi-occur commands: one that
prompted for a translation and showed all the places in the source
text where it came from, and another that did the opposite: prompted
for an original glossary term and showed all the places in the
translation where it was translated.
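
A first approximation of the second item might look like the sketch
below. It covers only the "prompt with completion on earlier
translations, then insert" part; the link back to the glossary and the
two multi-occur commands are left out. Apart from the command name
borrowed from the description above (with a `my-' prefix), every name
here is a hypothetical placeholder.

```elisp
;; Sketch only: no glossary links and no multi-occur commands yet.
(defvar my-term-translations (make-hash-table :test #'equal)
  "Map each source term to the list of translations used so far.")

(defun my-term-record (term translation)
  "Remember that TERM was rendered as TRANSLATION.
Return the updated list of translations known for TERM."
  (let ((known (gethash term my-term-translations)))
    (unless (member translation known)
      (push translation known))
    (puthash term known my-term-translations)))

(defun my-insert-translation-term (term)
  "Prompt for a translation of TERM and insert it at point.
Earlier translations of TERM are offered as completions, but any
new string is accepted as well."
  (interactive "sSource term: ")
  (let ((translation (completing-read
                      (format "Translation of %s: " term)
                      (gethash term my-term-translations))))
    (my-term-record term translation)
    (insert translation)))
```

The table lives only in memory here; persisting it alongside the
document (for instance in an Org property or a separate file) would be
the obvious next step.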


Anyway, that's what I've been thinking about. Almost no code so far,
though!

Eric

Jean-Christophe Helary

May 29, 2020, 1:58:21 PM
to Eric Abrahamsen, Yuri Khan, help-gnu-emacs, Emanuel Berg


> On May 30, 2020, at 2:39, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:

> I've thought many times over the years about what I would really want an
> Emacs-based translation environment to provide for me. I don't do
> technical translation, so there's not a whole lot of value in
> sentence-by-sentence correspondences.

Most translation tools I know (or I've used professionally) rely on a segmentation scheme set by the user. If the user wants paragraph based segmentation, so be it. What people call "sentence" segmentation is actually a regex based system that takes into account various signs in the source language.
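
To illustrate what "regex based" means here, this is a minimal sketch
in which the segmentation scheme is just a regexp handed in by the
user. The function name and both example regexps are assumptions made
for illustration; real tools use richer rule sets (exceptions for
abbreviations and so on).

```elisp
;; Sketch only: one boundary regexp instead of a full rule set.
(require 'subr-x)                       ; for `string-trim'

(defun my-segment-text (text boundary-regexp)
  "Split TEXT into segments, each ending where BOUNDARY-REGEXP matches.
Surrounding whitespace is trimmed and empty segments are dropped."
  (let ((start 0)
        segments)
    ;; Stop on an empty match, which would otherwise loop forever.
    (while (and (string-match boundary-regexp text start)
                (> (match-end 0) start))
      (push (string-trim (substring text start (match-end 0))) segments)
      (setq start (match-end 0)))
    ;; Whatever follows the last boundary is the final segment.
    (when (< start (length text))
      (push (string-trim (substring text start)) segments))
    (nreverse (delete "" segments))))
```

With this, `(my-segment-text text "[.?!][ \n]+")' gives a crude
sentence segmentation, and `(my-segment-text text "\n[ \t]*\n+")'
segments by paragraph instead.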

> But as Yuri mentions it can be
> very useful to keep track of how you've translated certain names, or
> certain important terms, in different places throughout the text.
> Basically I would want two things:
>
> 1. A way to keep track of location correspondences between the source
> text and translated text. CAT tool split the text up by sentence,

(not true, see above)

> but
> that's not very useful for fiction (particularly Chinese->English
> translation) because there's rarely a one-to-one correspondence.
> There /is/ a more reliable correspondence between paragraphs, though,
> and I'd like to know which paragraph equals which. The point would
> mostly be to find my place again when I start translating at the
> beginning of the day, and to implement a more useful follow-mode.

I'm not sure I understand what you mean. What's the difficulty that you are facing ?

> I
> imagined this would happen when the mode was turned on: it would run
> down the file and insert markers that would be used to find
> correspondences. Special characters could be inserted into the file
> to indicate that two paragraphs should be joined, or one paragraph
> split.

What would be the use of such a marking ?

> 2. Link terms in the translation to a glossary pulled from the original.
> This would be character names, places, special terms, etc. They might
> not always be translated the same way, but I need to know how I've
> handled them earlier in the document. Glossary terms would be
> highlighted in the source text, and when you came to the equivalent
> spot in the translation, you'd use a command like
> insert-translation-term that would prompt for the translation,
> offering completion on earlier translations, and then insert that
> term into the translated text with a link to the original in the
> glossary. There would also be two multi-occur commands: one that
> prompted for a translation and showed all the places in the source
> text where it came from, and another that did the opposite: prompted
> for an original glossary term and showed all the places in the
> translation where it was translated.

Very nice ideas.

Eric Abrahamsen

May 29, 2020, 2:23:05 PM
to Jean-Christophe Helary, help-gnu-emacs, Emanuel Berg, Yuri Khan
Jean-Christophe Helary <jean.christ...@traduction-libre.org>
writes:

>> On May 30, 2020, at 2:39, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:
>
>> I've thought many times over the years about what I would really want an
>> Emacs-based translation environment to provide for me. I don't do
>> technical translation, so there's not a whole lot of value in
>> sentence-by-sentence correspondences.
>
> Most translation tools I know (or I've used professionally) rely on a
> segmentation scheme set by the user. If the user wants paragraph based
> segmentation, so be it. What people call "sentence" segmentation is
> actually a regex based system that takes into account various signs in
> the source language.

Okay, that's good to know. I guess I would just set it to split by
paragraph, but would also like manual control in some cases.

>> But as Yuri mentions it can be
>> very useful to keep track of how you've translated certain names, or
>> certain important terms, in different places throughout the text.
>> Basically I would want two things:
>>
>> 1. A way to keep track of location correspondences between the source
>> text and translated text. CAT tool split the text up by sentence,
>
> (not true, see above)
>
>> but
>> that's not very useful for fiction (particularly Chinese->English
>> translation) because there's rarely a one-to-one correspondence.
>> There /is/ a more reliable correspondence between paragraphs, though,
>> and I'd like to know which paragraph equals which. The point would
>> mostly be to find my place again when I start translating at the
>> beginning of the day, and to implement a more useful follow-mode.
>
> I'm not sure I understand what you mean. What's the difficulty that you are facing ?
>
>> I
>> imagined this would happen when the mode was turned on: it would run
>> down the file and insert markers that would be used to find
>> correspondences. Special characters could be inserted into the file
>> to indicate that two paragraphs should be joined, or one paragraph
>> split.
>
> What would be the use of such a marking ?

A follow-mode, as I mentioned above. And just finding my place. I do my
translation in two sibling Org sub-trees, original and translation,
displayed in two side-by-side windows. I don't want to mess with
two-column-mode or anything like that. I want to be able to go to the
bottom of the translation, run a command, and have the second window
display the corresponding original. If I realize I've done something
wrong a couple of chapters previously, and I skip back up to that
location in the translation, I want to run the same command to display
the corresponding spot in the original.

>> 2. Link terms in the translation to a glossary pulled from the original.
>> This would be character names, places, special terms, etc. They might
>> not always be translated the same way, but I need to know how I've
>> handled them earlier in the document. Glossary terms would be
>> highlighted in the source text, and when you came to the equivalent
>> spot in the translation, you'd use a command like
>> insert-translation-term that would prompt for the translation,
>> offering completion on earlier translations, and then insert that
>> term into the translated text with a link to the original in the
>> glossary. There would also be two multi-occur commands: one that
>> prompted for a translation and showed all the places in the source
>> text where it came from, and another that did the opposite: prompted
>> for an original glossary term and showed all the places in the
>> translation where it was translated.
>
> Very nice ideas.

Maybe this will inspire me to write some code! The nice thing about the
glossary is that it wouldn't have to just be vocabulary. You could just
as easily use it for "every time the car crash is referenced", or
something like that. You'd just have to manually mark the passage in the
original, rather than automated marking by text search.
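
A minimal sketch of the completion part, assuming the glossary is just a
hash table kept in memory. The names my-cat-glossary,
my-cat-record-translation and my-cat-insert-translation-term are
hypothetical, and the link back to the original is left out:

```elisp
;; Hypothetical sketch: an in-memory glossary with completion on
;; earlier translations.  Nothing here comes from an existing package.
(defvar my-cat-glossary (make-hash-table :test #'equal)
  "Map each source term to the list of translations used so far.")

(defun my-cat-record-translation (term translation)
  "Remember that TERM was translated as TRANSLATION."
  (let ((known (gethash term my-cat-glossary)))
    (unless (member translation known)
      (puthash term (cons translation known) my-cat-glossary))))

(defun my-cat-insert-translation-term (term)
  "Prompt for a translation of TERM, offering earlier ones, and insert it."
  (interactive "sSource term: ")
  (let ((translation (completing-read
                      (format "Translate \"%s\" as: " term)
                      (gethash term my-cat-glossary))))
    (my-cat-record-translation term translation)
    (insert translation)))
```

Since completing-read is called without requiring a match, a new
translation can be typed in freely and is then offered the next time the
same term comes up.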

Emanuel Berg

May 29, 2020, 6:48:41 PM
to help-gn...@gnu.org
Giovanni Bono wrote:

> ;; -*- mode: emacs-lisp; lexical-binding: t; -*-
> [...]

:O

Certainly not the way I'm used to see Emacs Lisp...

But as they say, styles make fights!

Gene

May 29, 2020, 7:17:11 PM
to
On Friday, May 29, 2020 at 1:55:39 AM UTC-4, Marcin Borkowski wrote:
> Hi all,
>
> does anyone here perform translations within Emacs? Do you know of any
> tools facilitating that?

Funny you should ask. I just overcame a years-long obstacle with my personal attempts to exploit org-mode's ability to evaluate code blocks.

Though most people don't consider so-called `natural' language to be code (even though it is no less `artificial' or `man made' than so-called `artificial' languages, such as programming languages, math notations, and such), org-mode's assortment of notations permitted in code blocks includes `translate', which allows Google Translate to be used to translate the natural language `code'.

To use `translate' as a type of code between the #+begin_src translate and #+end_src book ends, a few details must be addressed:

(package-activate 'ob-translate)

AND

(org-babel-do-load-languages
 'org-babel-load-languages
 '(;; (eshell . t)     ; commented out here to indicate that ob-translate is the active ingredient
   ;; (emacs-lisp . t) ; only ob-translate is the active ingredient
   (translate . t)))   ; the language is named `translate'; this loads ob-translate

From this juncture one should be ready to experiment.

Here's an experiment I conducted:

#+begin_src translate
acta deos numquam mortalia fallunt
#+end_src

#+RESULTS:
: Mortal deeds never deceive the gods



I hope this helps those perhaps only interested-in or requiring relatively simple, unsophisticated translation or perhaps lacking sophisticated programming abilities ... perhaps both.

Cheers

Emanuel Berg

May 29, 2020, 9:34:10 PM
to help-gn...@gnu.org
Can't we compile a list of what the commercial CATs
offer? M Helary and Mr Abrahamsen?

I'll read thru this thread tomorrow (today)
God willing but I don't understand everything, in
particular examples would be nice to get the exact
meaning of the desired functionality...

With examples we can also see if Emacs already can do
it. And if not: Elisp contest :)

Some features are probably silly, we don't have to
list or do them, or everything in the CATs, just what
really makes sense and is useful on an
every-day basis.

When we are done, we put it in the wiki or in a pack.

We can't have that Emacs doesn't have a firm grip on
this issue. Because translation is a very common task
with text!

Also, let's compile a list of what Emacs already has
to this end. It doesn't matter if some of that stuff
already appears somewhere else, modularity is
our friend.

Pepp pepp :)

Jean-Christophe Helary

May 29, 2020, 11:12:23 PM
to Emanuel Berg, help-gn...@gnu.org


> On May 30, 2020, at 10:33, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Can't we compile a list of what the commercial CATs
> offer? M Helary and Mr Abrahamsen?

x commercial → ○ professional, if you don't mind :)
OmegaT is very much a professional tool and certainly not a "commercial" one.


Here is my view, based on 20 years of practice but otherwise not very technically informed:


1) CAT tools extract translatable contents from various file formats into an easy-to-handle format, and put the translated contents back into the original format. That way the translator does not have to worry *too much* about the idiosyncrasies of the original format.

→ File filters are a core part of a CAT tool *but* as was suggested in the thread it is possible to rely on an external filter that will output contents in a standard localization "intermediate" format (current "industry" standards are PO and XLIFF). Such filters provide export and import functions so that the translated files are converted back to the original format.

File filters can also accept rules for not outputting non-translatable text (the current standard is ITS)

The PO format can be handled by po4a (perl), translate-toolkit (python) and the Okapi Framework tools (java).
XLIFF has the Okapi Framework, OpenXLIFF (electron/node) and the translate-toolkit. All are top-notch pro-grade free software and in the case of Okapi and OpenXLIFF have been developed by people who have participated in the standardization process (XLIFF/TMX/SRX/ITS/TBX, etc.)

→ emacs could rely on such external filters and only specialize in one "intermediate" format. The po-mode already does that for PO files.


2) Once the text is extracted, it needs to be segmented. Basic "no" segmentation usually means paragraph based segmentation. Paragraphs are defined differently depending on the original format (1, or 2 line breaks for a text file, a block tag for XML-based formats, etc.).
Fine-grained segmentation is obtained by using a set of native language based regex that includes break rules and no-break rules. A simple example is break after a "period followed by a space" but don't break after "Mr. " for English.

→ File filters usually handle the segmentation part based on user specifications. Once the file is segmented into the intermediate format, it is not structurally trivial to "split" or "merge" segments because the tool needs to remember what will go back into the original file structure.

→ emacs could rely on the external filters to handle the segmentation.
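
To make the break / no-break idea concrete, here is a rough sketch. It is not how any of the tools above implement it. The names my-cat-no-break-abbrevs and my-cat-segment are made up, and the abbreviation list is only an assumption for English:

```elisp
;; Hypothetical sketch of rule-based segmentation: break after
;; sentence-final punctuation followed by spaces, unless the text so
;; far ends in a known abbreviation.
(require 'seq)

(defvar my-cat-no-break-abbrevs '("Mr." "Mrs." "Dr.")
  "Abbreviations after which a segment must not end (assumed list).")

(defun my-cat-segment (text)
  "Split TEXT into a list of segments using one break rule and one no-break rule."
  (let ((start 0) (pos 0) (segments '()))
    (while (string-match "[.!?] +" text pos)
      (let* ((end (match-end 0))
             ;; The candidate segment ends right after the punctuation.
             (candidate (substring text start (1+ (match-beginning 0)))))
        ;; No-break rule: skip this break if the candidate ends in an
        ;; abbreviation such as "Mr.".
        (unless (seq-some (lambda (abbrev) (string-suffix-p abbrev candidate))
                          my-cat-no-break-abbrevs)
          (push candidate segments)
          (setq start end))
        (setq pos end)))
    ;; Whatever is left after the last break is the final segment.
    (when (< start (length text))
      (push (substring text start) segments))
    (nreverse segments)))
```

With this, (my-cat-segment "Mr. Smith left. He came back.") gives two segments, "Mr. Smith left." and "He came back.". Real rule sets are much longer and differ per language, which is why delegating this to an external filter is attractive.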


3) The real strength of a CAT tool shows where it helps the translator handle all the resources needed in the translation. Let me list potential resources:

- Legacy translations, called "translation memories" (TM), usually in multilingual "aligned" files where a given segment has equivalents in various languages. Translated PO files are used as TMs, the XML standard is TMX.

- Glossaries, usually in a similar but simpler format, sometimes only TSV, sometimes CSV, the XML-based standard is TBX.

- Internal translations, which are produced by the translator while translating. Each translated segment adding to the project "memory".

- Dictionaries are a more global form of glossaries, usually monolingual, format varies.

- external files, either local documents, or web documents, in various formats, usually monolingual (otherwise they'd be aligned and used as TMs)

→ each resource format needs a way to be parsed, memorized, fetched, recycled efficiently during the translation


4) Usually the process is the following:

- the translator "enters" a segment
- the tool displays "matches" from the resources that relatively closely correspond to the segment contents
- the translator inserts or modifies the matches
- when no matches are produced the translator enters a translation from scratch
- the translator can add glossary items to the project glossary
- the new translation is added to the "internal" memory set
- the translator moves to the next segment


5) The matching is usually some sort of Levenshtein distance-based algorithm. The "tokens" that are used in the "distance" calculation are usually produced by native language based tokenizers (the Lucene tokenizers are quite popular)

The better the match, the more efficient the tool is at helping the translator recycle resources. The matching process/quality is where tools profoundly differ (OmegaT is generally considered to have excellent quality matches, sometimes better than expensive commercial tools).

Some tools propose "context" matches where the previous and next segments are also taken into account, some tools propose "subsegment" matches where even if a whole segment won't match significant subparts can, etc.

The matching process must sometimes apply to extremely big resources (like many million lines of multilingual TMs in the case of the EU legal corpora) and must thus be able to handle the data quickly regardless of the set size.
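
As a very reduced illustration of distance-based matching, the sketch below uses the built-in string-distance function (Levenshtein distance, available since Emacs 27). It compares characters rather than language-aware tokens, so it is cruder than what real tools do, and it scans the whole memory linearly, which would not scale to the corpora mentioned above. The names my-cat-similarity and my-cat-best-match are hypothetical:

```elisp
;; Hypothetical sketch of fuzzy matching against a translation memory,
;; using character-level Levenshtein distance.
(defun my-cat-similarity (a b)
  "Return a similarity score between 0.0 and 1.0 for strings A and B."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1.0
      (- 1.0 (/ (float (string-distance a b)) longest)))))

(defun my-cat-best-match (segment memory)
  "Return the (SOURCE . TARGET) pair in MEMORY whose SOURCE is closest to SEGMENT.
MEMORY is an alist of (SOURCE . TARGET) strings.  Return nil if it is empty."
  (let ((best nil) (best-score -1.0))
    (dolist (entry memory best)
      (let ((score (my-cat-similarity segment (car entry))))
        (when (> score best-score)
          (setq best entry
                best-score score))))))
```

A tool would normally show every entry above some threshold (say 70%) rather than only the single best one, so that the translator can pick and adapt.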


6) Goodies that are time savers include:

- history based autocompletion
- glossary/TM/dictionary based autocompletion
- MT services access
- shortcuts that auto insert predefined text chunks
- spell-checking/grammar checking
- QA checks against glossary terms, completeness/length of the translation, integrity of the format structure, numbers used, etc. (QA checks are also available as external processes in some of the solutions mentioned above, or related solutions.)
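
One of the simplest QA checks, the one on numbers, could look roughly like the sketch below. It only compares plain digit sequences and knows nothing about locale-specific number formats or numbers spelled out in words. The names my-cat-numbers and my-cat-numbers-match-p are made up:

```elisp
;; Hypothetical sketch of a "numbers used" QA check on one segment.
(defun my-cat-numbers (text)
  "Return the sorted list of digit sequences found in TEXT."
  (let ((pos 0) (found '()))
    (while (string-match "[0-9]+" text pos)
      (push (match-string 0 text) found)
      (setq pos (match-end 0)))
    (sort found #'string<)))

(defun my-cat-numbers-match-p (source target)
  "Return non-nil if SOURCE and TARGET contain the same digit sequences."
  (equal (my-cat-numbers source) (my-cat-numbers target)))
```

A segment where this returns nil is not necessarily wrong, it is only worth a second look.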


> I'll read thru this thread tomorrow (today)
> God willing but I don't understand everything, in
> particular examples would be nice to get the exact
> meaning of the desired functionality...

Go ahead if you have questions.

> With examples we can also see if Emacs already can do
> it. And if not: Elisp contest :)

:)

> Some features are probably silly, we don't have to
> list or do them, or everything in the CATs, just what
> really makes sense and is useful on an every-day basis.

A lot of the heavy-duty tasks can be handled by external processes.

> When we are done, we put it in the wiki or in a pack.
>
> We can't have that Emacs doesn't have a firm grip on
> this issue. Because translation is a very common task
> with text!
>
> Also, let's compile a list of what Emacs already has
> to this end. It doesn't matter if some of that stuff
> already appears somewhere else, modularity is
> our friend.

:)

Jean-Christophe Helary

May 29, 2020, 11:20:31 PM
to Eric Abrahamsen, help-gnu-emacs, Emanuel Berg, Yuri Khan


> On May 30, 2020, at 3:22, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:
>
>>> I
>>> imagined this would happen when the mode was turned on: it would run
>>> down the file and insert markers that would be used to find
>>> correspondences. Special characters could be inserted into the file
>>> to indicate that two paragraphs should be joined, or one paragraph
>>> split.
>>
>> What would be the use of such a marking ?
>
> A follow-mode, as I mentioned above.

Is such a mode in emacs ?

> And just finding my place. I do my
> translation in two sibling Org sub-trees, original and translation,
> displayed in two side-by-side windows. I don't want to mess with
> two-column-mode or anything like that.

:) I'm not sure anybody uses that anymore. But it must have been big when it started because it got the F2 key assigned by default...

> I want to be able to go to the
> bottom of the translation, run a command, and have the second window
> display the corresponding original. If I realize I've done something
> wrong a couple of chapters previously, and I skip back up to that
> location in the translation, I want to run the same command to display
> the corresponding spot in the original.

I seem to remember a long discussion about bookmarks here or on devel a while ago. Did you consider that ?

Giovanni Bono

May 30, 2020, 2:21:04 AM
to help-gn...@gnu.org
Emanuel Berg via Users list for the GNU Emacs text editor
<help-gn...@gnu.org> writes:

> Giovanni Bono wrote:
>
>> ;; -*- mode: emacs-lisp; lexical-binding: t; -*-
>> [...]
>
> :O
>
> Certainly not the way I'm used to see Emacs Lisp...
>
> But as they say, styles make fights!

If you are referring to variable scoping, I think it makes no actual
difference to the code I posted. I was probably just experimenting,

Giovanni


Eli Zaretskii

May 30, 2020, 2:25:00 AM
to help-gn...@gnu.org
> From: Jean-Christophe Helary <jean.christ...@traduction-libre.org>
> Date: Sat, 30 May 2020 12:20:14 +0900
> Cc: help-gnu-emacs <help-gn...@gnu.org>, Emanuel Berg <moase...@zoho.eu>,
> Yuri Khan <yuri....@gmail.com>
>
> > A follow-mode, as I mentioned above.
>
> Is such a mode in emacs ?

Yes, it is. Type "C-h f follow-mode RET".

Jean-Christophe Helary

May 30, 2020, 2:36:22 AM
to Eli Zaretskii, help-gn...@gnu.org
Ok, big confusion with the packages. I searched in list-packages for a "follow-mode" and obviously did not find anything...

Eric Abrahamsen

May 30, 2020, 12:46:21 PM
to Jean-Christophe Helary, help-gnu-emacs, Emanuel Berg, Yuri Khan
Jean-Christophe Helary <jean.christ...@traduction-libre.org>
writes:

>> On May 30, 2020, at 3:22, Eric Abrahamsen <er...@ericabrahamsen.net> wrote:
>>
>>>> I
>>>> imagined this would happen when the mode was turned on: it would run
>>>> down the file and insert markers that would be used to find
>>>> correspondences. Special characters could be inserted into the file
>>>> to indicate that two paragraphs should be joined, or one paragraph
>>>> split.
>>>
>>> What would be the use of such a marking ?

[...]

>> I want to be able to go to the
>> bottom of the translation, run a command, and have the second window
>> display the corresponding original. If I realize I've done something
>> wrong a couple of chapters previously, and I skip back up to that
>> location in the translation, I want to run the same command to display
>> the corresponding spot in the original.
>
> I seem to remember a long discussion about bookmarks here or on devel
> a while ago. Did you consider that ?

The only part of this code I ever actually wrote used bookmarks to save
where I was at the end of the work day. But usually you just save one
bookmark per file, indicating "where you are" in the file. That's a
different concern than splitting the two texts into segments, and
recording correspondences between segments in the texts. If you
segmented a whole novel by sentences, and then saved a bookmark per
sentence, I'm sure it would cause something to catch on fire.

At first I thought I'd run through the text when the mode was turned on,
insert a whole bunch of markers, then keep a list of marker-pairs. That
seemed like it would be hard to keep properly in sync, though, so now
I'm thinking of running through the text and actually inserting
separator characters, perhaps #x1f, either making them invisible or
putting some other nice display on them. That makes it easier to sync,
and has the advantage that it persists to disk and you only have to do
the major parsing once. Then strip them out during export.
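
A rough sketch of how the lookup could work with separator characters,
assuming for simplicity that original and translation live in two
separate buffers (not two Org subtrees) and that both contain the same
number of #x1f separators. All my-cat-* names are hypothetical:

```elisp
;; Hypothetical sketch: find the corresponding segment by counting
;; separator characters before point.
(defconst my-cat-separator (string #x1f)
  "Character assumed to mark segment boundaries in both texts.")

(defun my-cat-segment-index ()
  "Return how many separators precede point in the current buffer."
  (save-excursion
    (let ((limit (point)) (count 0))
      (goto-char (point-min))
      (while (search-forward my-cat-separator limit t)
        (setq count (1+ count)))
      count)))

(defun my-cat-goto-segment (index)
  "Move point to the start of segment number INDEX (zero-based)."
  (goto-char (point-min))
  (when (> index 0)
    (search-forward my-cat-separator nil t index)))

(defun my-cat-show-original (source-buffer)
  "Show in another window the segment of SOURCE-BUFFER matching point."
  (interactive "bSource buffer: ")
  (let ((index (my-cat-segment-index)))
    (with-selected-window (display-buffer source-buffer)
      (my-cat-goto-segment index)
      (recenter 0))))
```

Counting from the top of the buffer on every call is wasteful for a
whole novel, but it needs no bookkeeping and so cannot get out of sync.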

Anyway, still experimenting...

Marcin Borkowski

May 31, 2020, 1:10:30 AM
to Emanuel Berg, help-gn...@gnu.org

On 2020-05-29, at 10:14, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Can't you do ecat-highlight-next-sentence and
> ecat-highlight-previous-sentence by just moving point
> to the next sentence and then do
> ecat-highlight-this-sentence? Feels more natural...

That would completely defeat the purpose. All CAT stuff (as many people
already told) is about efficiency. One of the main points of my (very
simple) code is that I do not have to move point anywhere.
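
To illustrate the idea (this is a reduced sketch, not the code I
actually use), a command along these lines moves a highlight through
the source buffer while point in the translation window stays where it
is. The names my-cat-overlay and my-cat-highlight-next-sentence are
made up for this example:

```elisp
;; Hypothetical sketch: highlight the next sentence of the source
;; buffer with an overlay, without touching point in the selected window.
(defvar-local my-cat-overlay nil
  "Overlay marking the sentence currently being translated.")

(defun my-cat-highlight-next-sentence (source-buffer)
  "Move the highlight in SOURCE-BUFFER to the next sentence."
  (interactive "bSource buffer: ")
  (with-current-buffer source-buffer
    (save-excursion
      ;; Start after the previous highlight, or at the top the first time.
      (goto-char (if my-cat-overlay
                     (overlay-end my-cat-overlay)
                   (point-min)))
      (skip-chars-forward " \t\n")
      (let ((beg (point)))
        (forward-sentence)
        (if my-cat-overlay
            (move-overlay my-cat-overlay beg (point))
          (setq my-cat-overlay (make-overlay beg (point)))
          (overlay-put my-cat-overlay 'face 'highlight))))))
```

Sentence boundaries follow whatever forward-sentence considers a
sentence (see sentence-end-double-space), and the sketch does not
scroll the other window to keep the highlight visible.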

> Anyway, what other features do the proprietary
> CATs have?
>
> I always thought translation was just a matter of
> reading one thing and then typing what it means,
> looking up the occasional word or phrase for the
> idiomatic equivalent.

Well, you already got your answers, but let me stress that one of the
important points is extracting text from some strange formats and
putting the translation back into it. Think Word documents with complex
formatting, or HTML with many tags/attributes. If you are to translate
things like

<p class="important-instruction">Click the <span
class="dancing-elephants">big red button</span> to launch the nuke</p>

and all the markup has to be there in the translation, you really don't
want to type it by hand - it's time-consuming and error-prone.
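
One common way to deal with this, sketched here under simplifying
assumptions, is to swap every tag for a short numbered placeholder
before translating and to put the tags back afterwards. The function
names are hypothetical, and the regexp is far too naive for real-world
HTML (comments, "<" inside attributes, and so on):

```elisp
;; Hypothetical sketch: protect inline markup with numbered placeholders.
(defun my-cat-protect-tags (html)
  "Replace each tag in HTML with a placeholder like {1}, {2}, ...
Return a cons (TEXT . TAGS), where TAGS lists the original tags in order."
  (let ((tags '()) (n 0))
    (cons (replace-regexp-in-string
           "<[^>]+>"
           (lambda (tag)
             (push tag tags)
             (setq n (1+ n))
             (format "{%d}" n))
           html t t)
          (nreverse tags))))

(defun my-cat-restore-tags (text tags)
  "Put TAGS back into TEXT in place of the placeholders {1}, {2}, ..."
  (replace-regexp-in-string
   "{\\([0-9]+\\)}"
   (lambda (placeholder)
     (let ((index (string-to-number (substring placeholder 1 -1))))
       ;; Leave unknown placeholders untouched.
       (or (and (> index 0) (nth (1- index) tags))
           placeholder)))
   text t t))
```

The translator then only has to keep {1} ... {4} in sensible places,
and can even reorder them when the target language needs a different
word order.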

> Some idiomatic phrases are pitfalls tho. For example
> the English "more or less" looks like the Swedish
> "mer eller mindre" (which means "correct but with
> room for fine details") but the way native speakers
> use it seems to be more (?) "både och" which means
> discussion can go both (disparate) ways and BOTH
> are correct!
>
> So perhaps one could have a list of these "trap
> phrases" so when they turn up in the text, they are
> highlighted to indicate "watch out! we are not just
> piling words here!"
>
> Who'd compile that list is another matter...

I guess this is a very minor problem...

Best,

--
Marcin Borkowski
http://mbork.pl

Marcin Borkowski

May 31, 2020, 1:15:11 AM
to Emanuel Berg, help-gn...@gnu.org

On 2020-05-29, at 13:51, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Here, we care about what CAT _can_ do. Keep a list of
> LOTR characters in a text file, send it by mail to
> fellow translators in ~/.mailrc, and expand abbrevs
> and snippet templates Emacs can already do, sorry.

Extremely inefficient, error-prone and irritating. Again: CATs are
about automating repetitive things.

How would you expect people to know which mail contains which (version
of) the file? How would you expect people to know which file (out of
several dozen or more) to look at to find something? How would you
expect people to remember the names of the snippets? Etc., etc., etc.

Marcin Borkowski

May 31, 2020, 1:18:07 AM
to Giovanni Bono, help-gn...@gnu.org

On 2020-05-29, at 17:02, Giovanni Bono <giovan...@unimi.it> wrote:

> Marcin Borkowski <mb...@mbork.pl> writes:
>
>> Hi all,
>>
>> does anyone here perform translations within Emacs? Do you know of any
>> tools facilitating that? There exist a few CATs, or Computer Aided
>> Translation systems, but - AFAIK - they are all proprietary and closed
>> source. Emacs seems capable of implementing at least a simple CAT, but
>> I could not find any existing solutions for that. (I skimmed through
>> the answers here:
>> https://www.reddit.com/r/emacs/comments/a35bs2/emacs_for_translations/,
>> but did not find anything useful.)
>>
>> The first thing I would need is a way to highlight the "currently
>> translated sentence" in the other window, where I would keep the
>> original text, with an easy way to highlight the next/previous one -
>> this seems very easy to do, but did anyone actually code anything like
>> this?
>>
>> TIA,
>
> hello Marcin,
>
> I translated a few books, a few years ago, using Emacs as a simple CAT.
> Here is a screenshot of the last iteration:
>
>
> On the left there are three windows with translated, current, an next
> sentences from the source text. Central windows are for translated and
> current sentences, and the bottom central window is for current word.
> The right window is for statistics, and (not shown here) Wordnet
> (/usr/bin/wn) lookup.
>
> The idea is to have some words (in bold in the screenshot) that are
> controlled, so that while translating them you can keep track of all
> other occurrences and prior translations. So every word in the source
> material need to be indexed and referenced to a (possibly empty) word in
> the ongoing translation.
>
> Work happens in the very central frame, where words are presented
> untranslated at first, and you can move them around or substitute them
> with prior or new (including empty) translations. After a while, it
> gets fast.
>
> I am attaching the code. Most of it is a painful and messy treatment of
> the publisher markup, and all of it is intended for personal use and for
> the particular book I was translating. But maybe you can adapt some of
> it to your needs. Regards,

Thanks a lot! This looks pretty impressive. If only I had time to
analyze your code ATM...

I'll look into it one day, though!

Giovanni Bono

May 31, 2020, 2:58:42 AM
to help-gn...@gnu.org
Sure! If you want to try it instead, I could send you the data (the
manuscript) off list. Then it is just a matter of loading the file (it
works on a recent Emacs), typing ‘M-x gi/roth/startup’ and trying the
keybindings (commented in the code). For that you would need a large
enough frame (240 columns x 60 rows), because unfortunately window
splitting is hardcoded. Regards,

Giovanni


Steinar Bang

Jun 1, 2020, 4:26:31 AM
to help-gn...@gnu.org
>>>>> Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org>:

(so those are the names of the Swedish translation, interesting)

> Rivendell Vattnadal

(can't remember)
("Kløvendal" according to google)

> Shire Fylke

Syssel (which is a little ironic considering the Swedish translation...)

> Strider Vidstige

Vidvandre

(I've always rather preferred the Norwegian version of Isengard: Jarnagard)


Emanuel Berg

Jun 1, 2020, 4:33:18 AM
to help-gn...@gnu.org
Steinar Bang wrote:

> (so those are the names of the Swedish translation,
> interesting)

Right, there are to my knowledge two translations of
LOTR and they are not similar in details. This is
from the first one, by Åke Ohlmarks. He was
a controversial figure and so was his translation, but
many people like it.

>> Rivendell Vattnadal
>
> (can't remember) ("Kløvendal" according to google)
>
>> Shire Fylke
>
> Syssel (which is a little ironic considering the
> Swedish translation...)
>
>> Strider Vidstige
>
> Vidvandre
>
> (I've always rather preferred the Norwegian version
> of Isengard: Jarnagard)

Heh, cool :)

Emanuel Berg

Jun 1, 2020, 4:39:49 AM
to help-gn...@gnu.org
Giovanni Bono wrote:

>> Giovanni Bono wrote:
>>
>>> ;; -*- mode: emacs-lisp; lexical-binding: t; -*-
>>> [...]
>>
>> :O
>>
>> Certainly not the way I'm used to see Emacs
>> Lisp...
>>
>> But as they say, styles make fights!
>
> If you are referring to variable scoping, I think
> it makes no actual difference to the code I posted.
> I was probably just experimenting,

No no, just the whole style... it is just everything
at once I guess. You are skilled and used to write
code like that, no doubt.

Emanuel Berg

Jun 1, 2020, 4:51:39 AM
to help-gn...@gnu.org
Marcin Borkowski wrote:

>> Can't you do ecat-highlight-next-sentence and
>> ecat-highlight-previous-sentence by just moving
>> point to the next sentence and then do
>> ecat-highlight-this-sentence? Feels more
>> natural...
>
> That would completely defeat the purpose. All CAT
> stuff (as many people already told) is about
> efficiency. One of the main points of my (very
> simple) code is that I do not have to move
> point anywhere.

Because it's inefficient? ... you are a fast
translator :)

but ... OK.

Only it still looks weird with the same code two
extra times.

> Well, you already got your answers, but let me
> stress that one of the important points is
> extracting text from some strange formats and
> putting the translation back into it. Think Word
> documents with complex formatting, or HTML with
> many tags/attributes. If you are to translate
> things like
>
> <p class="important-instruction">Click the <span
> class="dancing-elephants">big red button</span> to
> launch the nuke</p>
>
> and all the markup has to be there in the
> translation, you really don't want to type it by
> hand - it's time-consuming and error-prone.

Well, that's a task for a parser rather to convert
between one format to another... very mechanical
and easy.

>> Some idiomatic phrases are pitfalls tho.
>> For example the English "more or less" looks like
>> the Swedish "mer eller mindre" (which means
>> "correct but with room for fine details") but the
>> way native speakers use it seems to be more (?)
>> "både och" which means discussion can go both
>> (disparate) ways and BOTH are correct! So perhaps
>> one could have a list of these "trap phrases" so
>> when they turn up in the text, they are
>> highlighted to indicate "watch out! we are not
>> just piling words here!" Who'd compile that list
>> is another matter...
>
> I guess this is a very minor problem...

It depends how many there are... should be systemized
just as everything unusual is in any trade.
Easy thing to do with huge gain and possible to build
on and extend...

Emanuel Berg

Jun 1, 2020, 4:55:12 AM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> How would you expect people to know which mail
> contains which (version of) the file? How would you
> expect people to know which file (out of several
> dozen or more) to look at to find something?
> How would you expect people to remember the names
> of the snippets? Etc., etc., etc.

I know, it is soo difficult! Sometimes I wonder how
I even manage to use my computer...

Marcin Borkowski

Jun 4, 2020, 3:47:58 PM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-01, at 10:51, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Marcin Borkowski wrote:
>
>>> Can't you do ecat-highlight-next-sentence and
>>> ecat-highlight-previous-sentence by just moving
>>> point to the next sentence and then do
>>> ecat-highlight-this-sentence? Feels more
>>> natural...
>>
>> That would completely defeat the purpose. All CAT
>> stuff (as many people already told) is about
>> efficiency. One of the main points of my (very
>> simple) code is that I do not have to move
>> point anywhere.
>
> Because it's inefficient? ... you are a fast
> translator :)

It's not the question of speed, but of distractions. Annoying workflow
full of distractions (like moving the point in the other window
manually) is very inefficient.

> but ... OK.
>
> Only it still looks weird with the same code two
> extra times.
>
>> Well, you already got your answers, but let me
>> stress that one of the important points is
>> extracting text from some strange formats and
>> putting the translation back into it. Think Word
>> documents with complex formatting, or HTML with
>> many tags/attributes. If you are to translate
>> things like
>>
>> <p class="important-instruction">Click the <span
>> class="dancing-elephants">big red button</span> to
>> launch the nuke</p>
>>
>> and all the markup has to be there in the
>> translation, you really don't want to type it by
>> hand - it's time-consuming and error-prone.
>
> Well, that's a task for a parser rather to convert
> between one format to another... very mechanical
> and easy.

Have you ever done a real-world translation like this...?

>
>>> Some idiomatic phrases are pitfalls tho.
>>> For example the English "more or less" looks like
>>> the Swedish "mer eller mindre" (which means
>>> "correct but with room for fine details") but the
>>> way native speakers use it seems to be more (?)
>>> "både och" which means discussion can go both
>>> (disparate) ways and BOTH are correct! So perhaps
>>> one could have a list of these "trap phrases" so
>>> when they turn up in the text, they are
>>> highlighted to indicate "watch out! we are not
>>> just piling words here!" Who'd compile that list
>>> is another matter...
>>
>> I guess this is a very minor problem...
>
> It depends how many there are... should be systemized
> just as everything unusual is in any trade.
> Easy thing to do with huge gain and possible to build
> on and extend...

As I said - you're right, but this is definitely not a major issue.

Emanuel Berg

Jun 4, 2020, 8:43:25 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> It's not the question of speed, but of
> distractions. Annoying workflow full of
> distractions (like moving the point in the other
> window manually) is very inefficient.

Aha, _manually_. No, of course you can't have that.
I thought you meant moving point in general.

But I still don't understand why to have virtually
the same code thrice...

>> Well, that's a task for a parser rather to convert
>> between one format to another... very mechanical
>> and easy.
>
> Have you ever done a real-world translation like
> this...?

Every time I ever compiled source code for example.
Also browsing the web involves the browser parsing
HTML and CSS. It happens all the time when using
a computer.

>> It depends how many there are... should be
>> systemized just as everything unusual is in any
>> trade. Easy thing to do with huge gain and
>> possible to build on and extend...
>
> As I said - you're right, but this is definitely
> not a major issue.

... idiomatic translations of phrases isn't
a major issue?

Jean-Christophe Helary

Jun 4, 2020, 8:49:42 PM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 5, 2020, at 9:43, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
>>> It depends how many there are... should be
>>> systemized just as everything unusual is in any
>>> trade. Easy thing to do with huge gain and
>>> possible to build on and extend...
>>
>> As I said - you're right, but this is definitely
>> not a major issue.
>
> ... idiomatic translations of phrases isn't
> a major issue?

No, because that is exactly what the translator is here for.

What we are discussing here is how the computer is helping the translator for all the other parts that require efficiency.

Emanuel Berg

Jun 4, 2020, 9:08:39 PM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

>> ... idiomatic translations of phrases isn't
>> a major issue?
>
> No, because that is exactly what the translator is
> here for.

OK, so translators are not allowed to use tools?

What about dictionaries, are you still allowed to
use them?

> What we are discussing here is how the computer is
> helping the translator for all the other parts that
> require efficiency.

I see, for other parts, it is allowed!

Jean-Christophe Helary

Jun 4, 2020, 10:58:24 PM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 5, 2020, at 10:08, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>>> ... idiomatic translations of phrases isn't
>>> a major issue?
>>
>> No, because that is exactly what the translator is
>> here for.
>
> OK, so translators are not allowed to use tools?
>
> What about dictionaries, are you still allowed to
> use them?
>
>> What we are discussing here is how the computer is
>> helping the translator for all the other parts that
>> require efficiency.
>
> I see, for other parts, it is allowed!


I'm not sure I understand where you're drifting.

Idiomatic translations can be helped with tools, but as we see with MT, even the best tools still fail to produce correct idiomatic translations.

Which is the reason why I wrote "that is exactly what the translator is here for", meaning that the translator is here to ensure that the translation is idiomatic.

The tool can help the translator to produce a string that the translator will validate as idiomatic.

Emanuel Berg

Jun 4, 2020, 11:15:33 PM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> I'm not sure I understand where you're drifting.
>
> Idiomatic translations can be helped with tools,
> but as we see with MT, even the best tools still
> fail to produce correct idiomatic translations.
>
> Which is the reason why I wrote "that is exactly
> what the translator is here for", meaning that the
> translator is here to ensure that the translation
> is idiomatic.
>
> The tool can help the translator to produce
> a string that the translator will validate
> as idiomatic.

A file with common pitfalls and idiomatic
translations can help both machine AND human
translators find the idiomatic translation, or, if
there isn't one really, what people would say
instead...

"Good enough for government work" - there is no such
expression in Swedish - "good enough for the intended
purpose" - "tillräckligt bra för det avsedda
ändamålet" (sounds extremely dorky, no one would ever
say that) - "det duger" - ?

"Have one on the house" - there is no ... - etc etc -
"vi kan bjucka på den"

???

Jean-Christophe Helary

Jun 5, 2020, 12:50:31 AM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 5, 2020, at 12:15, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> I'm not sure I understand where you're drifting.
>>
>> Idiomatic translations can be helped with tools,
>> but as we see with MT, even the best tools still
>> fail to produce correct idiomatic translations.
>>
>> Which is the reason why I wrote "that is exactly
>> what the translator is here for", meaning that the
>> translator is here to ensure that the translation
>> is idiomatic.
>>
>> The tool can help the translator to produce
>> a string that the translator will validate
>> as idiomatic.
>
> A file with common pitfalls and idiomatic
> translations can help both machine AND human
> translators find the idiomatic translation, or, if
> there isn't one really, what people would say
> instead...

Indeed, and that is just a special case of what we call "contextual matches" where the source text can match a number of translations but only a few are correct in the context of the few previous or following segments.

So in terms of CAT functionality, such data would be included in a translation memory, or a glossary file with comments, etc.

Emanuel Berg

Jun 5, 2020, 2:43:14 AM
to help-gn...@gnu.org, Ingemar Holmgren
Jean-Christophe Helary wrote:

> x commercial → professional, if you don't mind :)
> OmegaT is very much a professional tool and
> certainly not a "commercial" one. [...]

OK, so this is what it is all about, it is having
a database of translations and then an algorithm that
searches that.

Other than that, one needs a pretty simple interface
that applies all that (DB + search), with the
sentences in the source language as indata - the
outdata is the translation suggestions found in
the DB.

And if there are no hits the translator types the
translation and then this in turn gets inserted into
the database...

Sounds very boring, virtually using a click-and-play
application, I would much rather just read and type.
But I suppose that isn't efficient enough this day
and age?

But as for this discussion how to do it, it suddenly
becomes a question of using Emacs (the interface) to
use OmegaT's resources (DB + search)...

But count me out...

Jean-Christophe Helary

Jun 5, 2020, 3:35:41 AM
to Emanuel Berg, help-gn...@gnu.org, Ingemar Holmgren


> On Jun 5, 2020, at 15:43, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> x commercial → professional, if you don't mind :)
>> OmegaT is very much a professional tool and
>> certainly not a "commercial" one. [...]
>
> OK, so this is what it is all about, it is having
> a database of translations and then an algorithm that
> searches that.

That's correct. A lot of tools do that. Most of them don't give very good results, because the developers don't care much.

> Other than that, one needs a pretty simple interface
> that applies all that (DB + search), with the
> sentences in the source language as indata - the
> outdata is the translation suggestions found in
> the DB.

That's correct.

> And if there are no hits the translator types the
> translation and then this in turn gets inserted into
> the database...

That's correct too.

> Sounds very boring, virtually using a click-and-play
> application, I would much rather just read and type.
> But I suppose that isn't efficient enough this day
> and age?

? I don't know what you are talking about.

> But as for this discussion how to do it, it suddenly
> becomes a question of using Emacs (the interface) to
> use OmegaT's resources (DB + search)...

No.

You asked for a description of a translator's workflow. I gave you one, pretty detailed, and not dependent on any tool.

Emacs can work with databases (SQLite, for example) and have the matching engine written in Emacs Lisp.

> But count me out...

I never counted on you to do whatever. You asked for a workflow, I gave you one.
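
The database-plus-search loop described in this message can be sketched in a few lines of Emacs Lisp. This is only a rough illustration under stated assumptions: every `tm-` name is hypothetical, the "database" is a plain in-memory alist rather than SQLite, and the matching is a simple edit-distance ratio built on `string-distance' (available from Emacs 27), not the algorithm of OmegaT or any other CAT tool.

```elisp
;; Hypothetical sketch of a translation memory; all tm-* names are invented.

(defvar tm-memory nil
  "Translation memory: alist of (SOURCE . TARGET) strings.")

(defun tm-similarity (a b)
  "Return a similarity score between 0.0 and 1.0 for strings A and B.
Uses the Levenshtein distance computed by `string-distance'."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1.0
      (- 1.0 (/ (float (string-distance a b)) longest)))))

(defun tm-suggestions (segment &optional threshold)
  "Return entries of `tm-memory' similar to SEGMENT, best match first.
Each element is (SCORE SOURCE TARGET).  THRESHOLD defaults to 0.7."
  (let ((threshold (or threshold 0.7))
        (hits nil))
    (dolist (entry tm-memory)
      (let ((score (tm-similarity segment (car entry))))
        (when (>= score threshold)
          (push (list score (car entry) (cdr entry)) hits))))
    (sort hits (lambda (x y) (> (car x) (car y))))))

(defun tm-store (source target)
  "Record that SOURCE was translated as TARGET, replacing any old entry."
  (setf (alist-get source tm-memory nil nil #'equal) target))
```

When `tm-suggestions' returns nil, the translator types the translation and the result goes back into the memory through `tm-store', which is the "no hits" branch of the workflow.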

Emanuel Berg

Jun 5, 2020, 3:55:17 AM
to help-gn...@gnu.org, Ingemar Holmgren
Jean-Christophe Helary wrote:

>> But as for this discussion how to do it, it
>> suddenly becomes a question of using Emacs (the
>> interface) to use OmegaT's resources (DB +
>> search)...
>
> No.
>
> You asked for a description of a translator's
> workflow. I gave you one, pretty detailed, and not
> dependent on any tool.
>
> Emacs can work with databases (SQLite, for example)
> and have the matching engine written in Emacs Lisp.

Right, but I suspect when people want to use Emacs
rather than OmegaT, it is because they use Emacs as
their editor and common interface to the whole
system, and they are used to it, and Emacs is used to
them (thru config/extension) - I think that this is
the entry point, not that anyone is dissatisfied with
the search/DB capabilities of OmegaT...

So if OmegaT already does the search well and comes
with a database (with data), and if all that is FOSS
it makes more sense to bring it over as it is, and
make the interface which would be an Emacs major mode
or a bunch of different modes - for starters, one
with the "segments", and one with suggestions and
keys:

a: suggestion 1
s: suggestion 2
d: .. 3

and more advanced keys like `e a' = insert suggestion
a only edit it first and so on...
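
A minimal sketch of the key-per-suggestion idea, assuming the suggestions arrive as a list of strings. All `pick-` names are hypothetical; `read-char-choice', `seq-position' and `seq-take' are standard Emacs functions. The "edit first" variant is left out.

```elisp
;; Hypothetical sketch: choose a suggestion with a home-row key.
(require 'seq)

(defconst pick-keys '(?a ?s ?d ?f)
  "Keys offered for the first, second, ... suggestion.")

(defun pick-label (suggestions)
  "Return a prompt string pairing each of SUGGESTIONS with its key."
  (let ((keys pick-keys)
        (lines nil))
    (while (and keys suggestions)
      (push (format "%c: %s" (pop keys) (pop suggestions)) lines))
    (mapconcat #'identity (nreverse lines) "\n")))

(defun pick-for-key (key suggestions)
  "Return the element of SUGGESTIONS selected by KEY, or nil."
  (let ((index (seq-position pick-keys key)))
    (and index (nth index suggestions))))

(defun pick-insert (suggestions)
  "Ask which of SUGGESTIONS to use and insert it at point."
  (unless suggestions
    (user-error "No suggestions to choose from"))
  (let* ((shown (seq-take suggestions (length pick-keys)))
         (keys (seq-take pick-keys (length shown)))
         (key (read-char-choice (concat (pick-label shown) "\n") keys)))
    (insert (pick-for-key key shown))))
```

Keeping the key lookup in `pick-for-key' separate from the prompt means the mapping can be checked without any keyboard input.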

But even with an Emacs keyboard-only interface, while
an IMMENSE improvement, it still doesn't sound like
fun :( Reading, thinking, and typing, all at once,
almost, that sounds like fun. But getting a selection
from a database and picking the best alternative,
that just doesn't appeal to me.

Jean-Christophe Helary

Jun 5, 2020, 4:14:36 AM
to Emanuel Berg, Help Gnu Emacs mailing list, Ingemar Holmgren


> On Jun 5, 2020, at 16:55, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Reading, thinking, and typing, all at once,
> almost, that sounds like fun. But getting a selection
> from a database and picking the best alternative,
> that just doesn't appeal to me.

Well, do what you feel is best for you! As a translator who pays my bills with my fingers, I don't have much of a choice.

Marcin Borkowski

Jun 5, 2020, 6:46:56 AM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-05, at 05:15, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Jean-Christophe Helary wrote:
>
>> I'm not sure I understand where you're drifting.
>>
>> Idiomatic translations can be helped with tools,
>> but as we see with MT, even the best tools still
>> fail to produce correct idiomatic translations.
>>
>> Which is the reason why I wrote "that is exactly
>> what the translator is here for", meaning that the
>> translator is here to ensure that the translation
>> is idiomatic.
>>
>> The tool can help the translator to produce
>> a string that the translator will validate
>> as idiomatic.
>
> A file with common pitfalls and idiomatic
> translations can help both machine AND human
> translators find the idiomatic translation, or, if
> there isn't one really, what people would say
> instead...

The point I am trying to make (and I guess Jean-Christophe, too) is that
most of the time, a professional translator won't _need_ such a file,
since knowing "common pitfalls and idiomatic translations" by heart is
one of the most important parts of their profession anyway. What you
suggest is like giving a C programmer a cheat sheet with the syntax of
`if', `for' and `while'.

Emanuel Berg

Jun 5, 2020, 8:12:11 AM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> The point I am trying to make (and I guess
> Jean-Christophe, too) is that most of the time,
> a professional translator won't _need_ such a file,
> since knowing "common pitfalls and idiomatic
> translations" by heart is one of the most important
> parts of their profession anyway.

OK, change that from a file with "common pitfalls and
idiomatic translations" to a file with EVERYTHING and
you get what the pros use.

That makes more sense: while the pros know the common
pitfalls by heart as you say, not even they can be
expected to know everything!

Marcin Borkowski

Jun 5, 2020, 1:20:18 PM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-05, at 02:43, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> But I still don't understand why to have virtually
> the same code thrice...

If you find a good way to refactor it so that it is more DRY, I'd love
to see it. That code was written as a PoC quickly and without a lot of
thought.

>>> Well, that's a task for a parser rather to convert
>>> between one format to another... very mechanical
>>> and easy.
>>
>> Have you ever done a real-world translation like
>> this...?
>
> Every time I ever compiled source code for example.
> Also browsing the web involves the browser parsing
> HTML and CSS. It happens all the time when using
> a computer.

Obviously, I meant "translation" as in "translation between two natural
languages, done by a human being". I'd bet you did not, otherwise you'd
never say that it is "mechanical and easy".

Marcin Borkowski

Jun 5, 2020, 1:32:58 PM
to Emanuel Berg, help-gn...@gnu.org, Ingemar Holmgren

On 2020-06-05, at 09:55, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> But even with an Emacs keyboard-only interface, while
> an IMMENSE improvement, it still doesn't sound like
> fun :( Reading, thinking, and typing, all at once,
> almost, that sounds like fun. But getting a selection
> from a database and picking the best alternative,
> that just doesn't appeal to me.

Well, it is not a question of "fun". It's a question of a job that needs
to be done.

Even taking that into account, the job of a translator can be a nice
one. I sometimes got a huge satisfaction when I managed to find a nice
translation of some wordplay, for instance.

And remember that "translation" is a very general term. Even though
I'm not a professional translator, I did a few of them in my life.
Translating:

- a scientific paper
- an RPG manual
- subtitles for a movie
- an interview for a journal

are all _very_ different tasks. In the first case, the repetitiveness
(and hence opportunity to use CAT's memory) is huge. It is smaller in
the second one, but then the main challenge is (a) coming up with good
translations of certain terms, and (b) being consistent with them,
especially if half the way through you need to modify your translations.
(I don't know if professional CATs would help me with the changes of my
"glossary"...)

In the last two ones, the repetitiveness is almost non-existent (though
my humble attempt at simplifying my job with Emacs - the one with
highlighting sentences - helps quite a bit).

Yuri Khan

Jun 5, 2020, 1:47:48 PM
to Marcin Borkowski, Emanuel Berg, help-gnu-emacs, Ingemar Holmgren
On Sat, 6 Jun 2020 at 00:32, Marcin Borkowski <mb...@mbork.pl> wrote:
> Translating:
>
> - a scientific paper
> - an RPG manual
> - subtitles for a movie
> - an interview for a journal
>
> are all _very_ different tasks.
> In the last two ones, the repetitiveness is almost non-existent

Hey, you don’t say that. Single-shot closed-plot movies maybe, but as
soon as a sequel/prequel/sidequel is made, you get into translation
continuity story. Same with TV shows.

Jean-Christophe Helary

Jun 5, 2020, 7:24:16 PM
to Marcin Borkowski, Emanuel Berg, Help Gnu Emacs mailing list, Ingemar Holmgren


> On Jun 6, 2020, at 2:32, Marcin Borkowski <mb...@mbork.pl> wrote:
>
> And remember that "translation" is a very general term. Even though
> I'm not a professional translator, I did a few of them in my life.
> Translating:
>
> - a scientific paper
> - an RPG manual
> - subtitles for a movie
> - an interview for a journal
>
> are all _very_ different tasks. In the first case, the repetitiveness
> (and hence opportunity to use CAT's memory) is huge. It is smaller in
> the second one, but then the main challenge is (a) coming up with good
> translations of certain terms, and (b) being consistent with them,
> especially if half the way through you need to modify your translations.
> (I don't know if professional CATs would help me with the changes of my
> "glossary"...)

I can only talk about OmegaT, but you can (and I do) search in source for a term and in target for another, filter the result and edit at will, or do simple replacements. Since the glossary is a text file you can modify it at will, you can do automatic checks to see if your translation respects the glossary and here again edit at will, etc.

> In the last two ones, the repetitiveness is almost non-existent (though
> my humble attempt at simplifying my job with Emacs - the one with
> highlighting sentences - helps quite a bit).

Indeed. Repetitiveness is one aspect of what CATs help with. Not missing a sentence is another one. I always use a CAT when I translate, regardless of the characteristics of the document.

Marcin Borkowski

Jun 6, 2020, 4:01:13 AM
to Yuri Khan, Emanuel Berg, help-gnu-emacs, Ingemar Holmgren

On 2020-06-05, at 19:47, Yuri Khan <yuri....@gmail.com> wrote:

> On Sat, 6 Jun 2020 at 00:32, Marcin Borkowski <mb...@mbork.pl> wrote:
>> Translating:
>>
>> - a scientific paper
>> - an RPG manual
>> - subtitles for a movie
>> - an interview for a journal
>>
>> are all _very_ different tasks.
>> In the last two ones, the repetitiveness is almost non-existent
>
> Hey, you don’t say that. Single-shot closed-plot movies maybe, but as
> soon as a sequel/prequel/sidequel is made, you get into translation
> continuity story. Same with TV shows.

Fair enough. I only translated subtitles for one movie w/o a sequel, so
consistency was not a big problem (but I admit, there were a few names
of places etc., so the repetitiveness was there, too.)

Emanuel Berg

Jun 7, 2020, 5:16:07 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> If you find a good way to refactor it so that it is
> more DRY, I'd love to see it.

;;; -*- lexical-binding: t -*-
;;;
;;; this file:
;;; http://user.it.uu.se/~embe8573/emacs-init/incal-ecat.el
;;; https://dataswamp.org/~incal/emacs-init/incal-ecat.el

(defvar sentence-overlay nil)

(defun remove-highlight ()
  (interactive)
  (when (overlayp sentence-overlay)
    (delete-overlay sentence-overlay) ))

(defun highlight-sentence ()
  (interactive)
  (remove-highlight)
  (let ((overlay (make-overlay (progn (forward-sentence) (point))
                               (progn (backward-sentence) (point)) )))
    (overlay-put overlay 'face 'font-lock-function-name-face)
    (setq sentence-overlay overlay) ))

(defun highlight-sentence-next ()
  (interactive)
  (forward-sentence)
  (highlight-sentence) )

(defun highlight-sentence-prev ()
  (interactive)
  (backward-sentence)
  (highlight-sentence) )

Eric Abrahamsen

Jun 7, 2020, 5:37:38 PM
to help-gn...@gnu.org
Emanuel Berg via Users list for the GNU Emacs text editor
<help-gn...@gnu.org> writes:

> Marcin Borkowski wrote:
>
>> If you find a good way to refactor it so that it is
>> more DRY, I'd love to see it.
>
> ;;; -*- lexical-binding: t -*-
> ;;;
> ;;; this file:
> ;;; http://user.it.uu.se/~embe8573/emacs-init/incal-ecat.el
> ;;; https://dataswamp.org/~incal/emacs-init/incal-ecat.el
>
> (defvar sentence-overlay nil)
>
> (defun remove-highlight ()
>   (interactive)
>   (when (overlayp sentence-overlay)
>     (delete-overlay sentence-overlay) ))
>
> (defun highlight-sentence ()
>   (interactive)
>   (remove-highlight)
>   (let ((overlay (make-overlay (progn (forward-sentence) (point))
>                                (progn (backward-sentence) (point)) )))
>     (overlay-put overlay 'face 'font-lock-function-name-face)
>     (setq sentence-overlay overlay) ))

You don't need to delete the overlay and make it again each time, just
use `move-overlay'. You can also move an overlay that's been deleted,
essentially recreating it, so long as you still have a reference to it.

How do I know? Because this thread finally motivated me to finish the
translation environment package I've been thinking of for a while, and I
just wrote pretty much the same code a couple of days ago :)
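
A minimal sketch of the `move-overlay' point above, using hypothetical `demo-` names: the overlay is created once and afterwards only moved, and a deleted overlay is revived by the same call because it keeps its properties as long as something still refers to it.

```elisp
;; Hypothetical sketch: one reusable overlay instead of delete-and-recreate.

(defvar demo-overlay nil
  "The single overlay reused for every highlight.")

(defun demo-show-overlay (beg end)
  "Put the reusable overlay on BEG..END in the current buffer."
  (unless (overlayp demo-overlay)
    (setq demo-overlay (make-overlay beg end))
    (overlay-put demo-overlay 'face 'highlight))
  ;; Passing the buffer explicitly re-attaches a deleted overlay,
  ;; or moves the overlay over from another buffer.
  (move-overlay demo-overlay beg end (current-buffer)))

(defun demo-hide-overlay ()
  "Detach the overlay from its buffer without losing the object."
  (when (overlayp demo-overlay)
    (delete-overlay demo-overlay)))
```

After `demo-hide-overlay', `overlay-buffer' returns nil for the overlay, but its `face' property is still there, so the next `demo-show-overlay' brings the same highlight back.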

Emanuel Berg

Jun 7, 2020, 5:55:28 PM
to help-gn...@gnu.org
Eric Abrahamsen wrote:

> You don't need to delete the overlay and make it
> again each time, just use `move-overlay'.

Alrighty, how about this? Only highlight-sentence has
changed...

;;; -*- lexical-binding: t -*-
;;;
;;; this file:
;;; http://user.it.uu.se/~embe8573/emacs-init/incal-ecat.el
;;; https://dataswamp.org/~incal/emacs-init/incal-ecat.el

(defvar sentence-overlay nil)

(defun remove-highlight ()
  (interactive)
  (when (overlayp sentence-overlay)
    (delete-overlay sentence-overlay) ))

(defun highlight-sentence ()
  (interactive)
  (let ((beg (progn (forward-sentence) (point)))
        (end (progn (backward-sentence) (point))) )
    (if (overlayp sentence-overlay)
        (move-overlay sentence-overlay beg end)
      (let ((overlay (make-overlay beg end)))
        (overlay-put overlay 'face 'font-lock-function-name-face)
        (setq sentence-overlay overlay) ))))

(defun highlight-sentence-next ()
  (interactive)
  (forward-sentence)
  (highlight-sentence) )

(defun highlight-sentence-prev ()
  (interactive)
  (backward-sentence)
  (highlight-sentence) )

> How do I know? Because this thread finally
> motivated me to finish the translation environment
> package I've been thinking of for a while, and
> I just wrote pretty much the same code a couple of
> days ago :)

Well... OK :)

Eric Abrahamsen

Jun 7, 2020, 6:24:59 PM
to help-gn...@gnu.org
Emanuel Berg via Users list for the GNU Emacs text editor
<help-gn...@gnu.org> writes:

Looks good to me!

>> How do I know? Because this thread finally
>> motivated me to finish the translation environment
>> package I've been thinking of for a while, and
>> I just wrote pretty much the same code a couple of
>> days ago :)
>
> Well... OK :)

Maybe some merging can be done when everyone's finished their version...

Emanuel Berg

Jun 7, 2020, 6:30:17 PM
to help-gn...@gnu.org
Eric Abrahamsen wrote:

> Maybe some merging can be done when everyone's
> finished their version...

Like I said, the interface will be the easy part.
The database and algorithm will be worse...

Marcin Borkowski

Jun 8, 2020, 2:05:13 AM
to Emanuel Berg, help-gn...@gnu.org
Thanks, but this won't work in this form, as you don't move the point to
the overlay before. But the idea LGTM. As I said, I wrote my code in
a hurry and spent more time thinking how to wrap it in a minor mode than
how to make the commands themselves better.

Best,

On 2020-06-07, at 23:15, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Marcin Borkowski wrote:
>
>> If you find a good way to refactor it so that it is
>> more DRY, I'd love to see it.
>
> ;;; -*- lexical-binding: t -*-
> ;;;
> ;;; this file:
> ;;; http://user.it.uu.se/~embe8573/emacs-init/incal-ecat.el
> ;;; https://dataswamp.org/~incal/emacs-init/incal-ecat.el
>
> (defvar sentence-overlay nil)
>
> (defun remove-highlight ()
>   (interactive)
>   (when (overlayp sentence-overlay)
>     (delete-overlay sentence-overlay) ))
>
> (defun highlight-sentence ()
>   (interactive)
>   (remove-highlight)
>   (let ((overlay (make-overlay (progn (forward-sentence) (point))
>                                (progn (backward-sentence) (point)) )))
>     (overlay-put overlay 'face 'font-lock-function-name-face)
>     (setq sentence-overlay overlay) ))
>
> (defun highlight-sentence-next ()
>   (interactive)
>   (forward-sentence)
>   (highlight-sentence) )
>
> (defun highlight-sentence-prev ()
>   (interactive)
>   (backward-sentence)
>   (highlight-sentence) )


Emanuel Berg

Jun 8, 2020, 2:31:11 AM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> Thanks, but this won't work in this form

It works, check out this dump:
<https://dataswamp.org/~incal/pimgs/cat.png>

> as you don't move the point to the overlay before.

... not sure I'm following, but there is only one
overlay, point is moved both ways with
`forward-sentence' and `backward-sentence', the
overlay is moved with `move-overlay'. point is always
at the beginning of a sentence.

Emanuel Berg

Jun 8, 2020, 7:59:28 AM
to help-gn...@gnu.org
> It works, check out this dump:
> <https://dataswamp.org/~incal/pimgs/cat.png>

One thing tho, M. Helary said something about the
chopping up of the input into segments, my intuition
tells me they are shorter (the input more
segmentized) than what you get with
`forward-sentence' and `backward-sentence'. (My
intuition also tells me backward-sentence is
(forward-sentence -1) ...)

Maybe `sentence-end' already has been configured
somewhere to get the most restrictive definition,
i.e., here, with the purpose of getting the shortest
possible segments that still make sense...

Emanuel Berg

Jun 8, 2020, 8:15:26 AM
to help-gn...@gnu.org
> One thing tho, M. Helary said something about the
> chopping up of the input into segments, my
> intuition tells me they are shorter (the input more
> segmentized) than what you get with
> `forward-sentence' and `backward-sentence'. (My
> intuition also tells me backward-sentence is
> (forward-sentence -1) ...)
>
> Maybe `sentence-end' already has been configured
> somewhere to get the most restrictive definition,
> i.e., here, with the purpose of getting the
> shortest possible segments that still make sense...

Unless... unless the DB is really fine-tuned
already. Then we should do our own segmentation
rules, we should get the exact same as they (OmegaT
or whoever has it) use...

Jean-Christophe Helary

Jun 8, 2020, 8:35:11 AM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 8, 2020, at 21:15, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
>> One thing tho, M. Helary said something about the
>> chopping up of the input into segments, my
>> intuition tells me they are shorter (the input more
>> segmentized) than what you get with
>> `forward-sentence' and `backward-sentence'. (My
>> intuition also tells me backward-sentence is
>> (forward-sentence -1) ...)
>>
>> Maybe `sentence-end' already has been configured
>> somewhere to get the most restrictive definition,
>> i.e., here, with the purpose of getting the
>> shortest possible segments that still make sense...
>
> Unless... unless the DB is really fine-tuned
> already. Then we should do our own segmentation
> rules, we should get the exact same as they (OmegaT
> or whoever has it) use...

CAT segmentation rules are defined by the SRX standard. They are basically a set of cascading regex rules (break/don't break).

It is possible to fine-tune a translation by modifying a rule set before or during the translation.
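
To give an idea of what "cascading break/don't break rules" means in practice, here is a rough Emacs Lisp sketch. It is not an SRX parser: the `seg-` names and the two example rules are invented for illustration, and real SRX rule sets are XML files with many more rules per language. Each rule has a regexp for the text before a candidate position and one for the text after it, and the first rule that matches decides.

```elisp
;; Hypothetical sketch of first-match-wins segmentation rules.

(defvar seg-rules
  '((nil "\\b\\(?:Mr\\|Dr\\|e\\.g\\|i\\.e\\)\\." "[ \t]")   ; exception: don't break
    (t   "[.?!]+"                                "[ \t\n]+")) ; default: break
  "List of rules (BREAK BEFORE-REGEXP AFTER-REGEXP); the first match wins.")

(defun seg-break-here-p ()
  "Return non-nil if the first rule matching at point says to break."
  (catch 'decided
    (dolist (rule seg-rules)
      (let ((break (nth 0 rule))
            (before (nth 1 rule))
            (after (nth 2 rule)))
        (when (and (looking-back before (line-beginning-position))
                   (looking-at after))
          (throw 'decided break))))
    nil))

(defun seg-split (text)
  "Split TEXT into a list of segments according to `seg-rules'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((start (point))
          (segments nil))
      (while (not (eobp))
        (if (and (> (point) start) (seg-break-here-p))
            (progn
              (push (buffer-substring start (point)) segments)
              ;; Drop the whitespace between two segments.
              (skip-chars-forward " \t\n")
              (setq start (point)))
          (forward-char 1)))
      (when (< start (point-max))
        (push (buffer-substring start (point-max)) segments))
      (nreverse segments))))
```

Because the exception rule comes first, "Dr. Smith arrived. He sat down." is cut after "arrived." but not after "Dr."; reordering or editing `seg-rules' is the kind of fine-tuning mentioned above.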

Marcin Borkowski

Jun 8, 2020, 3:45:34 PM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-08, at 08:30, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Marcin Borkowski wrote:
>
>> Thanks, but this won't work in this form
>
> It works, check out this dump:
> <https://dataswamp.org/~incal/pimgs/cat.png>
>
>> as you don't move the point to the overlay before.
>
> ... not sure I'm following, but there is only one
> overlay, point is moved both ways with
> `forward-sentence' and `backward-sentence', the
> overlay is moved with `move-overlay'. point is always
> at the beginning of a sentence.

Well, it works, but it does not do what I would want it to do. (Well,
I haven't tested it, but I am pretty sure, since the code is quite
simple.)

What I want is this: I have the source in, say, buffer A, and I perform
the translation in, say, buffer B. The overlay is in buffer A, and
I want to move it to the next sentence (in buffer A) _without moving the
point away from buffer B_.

My code does exactly that (even if inefficiently).

Best,

Emanuel Berg

Jun 8, 2020, 4:44:17 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> What I want is this: I have the source in, say,
> buffer A, and I perform the translation in, say,
> buffer B. The overlay is in buffer A, and I want to
> move it to the next sentence (in buffer A) _without
> moving the point away from buffer B_.

Well, technically, there is a point in every buffer
(and window, so there can be several for one buffer
even), but I understand what you mean:

;;; -*- lexical-binding: t -*-
;;;
;;; this file:
;;; http://user.it.uu.se/~embe8573/emacs-init/incal-ecat.el
;;; https://dataswamp.org/~incal/emacs-init/incal-ecat.el

(defvar sentence-overlay nil)

(defvar source-buffer nil)

(defun set-source-buffer ()
  (interactive)
  (setq source-buffer (current-buffer)) )

(defun remove-highlight ()
  (interactive)
  (when (overlayp sentence-overlay)
    (delete-overlay sentence-overlay) ))

(defun highlight-sentence ()
  (let ((beg (progn (forward-sentence) (point)))
        (end (progn (forward-sentence -1) (point))) )
    (if (overlayp sentence-overlay)
        (move-overlay sentence-overlay beg end)
      (let ((overlay (make-overlay beg end)))
        (overlay-put overlay 'face 'font-lock-comment-face)
        (setq sentence-overlay overlay) ))))

(defun highlight-sentence-move (next)
  (if (bufferp source-buffer)
      (with-current-buffer source-buffer
        (forward-sentence (if next 1 -1))
        (highlight-sentence) )
    (error "source-buffer not set") ))

(defun highlight-sentence-next ()
  (interactive)
  (highlight-sentence-move t) )

(defun highlight-sentence-prev ()
  (interactive)
  (highlight-sentence-move nil) )

Marcin Borkowski

Jun 9, 2020, 3:33:23 PM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-08, at 22:44, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Marcin Borkowski wrote:
>
>> What I want is this: I have the source in, say,
>> buffer A, and I perform the translation in, say,
>> buffer B. The overlay is in buffer A, and I want to
>> move it to the next sentence (in buffer A) _without
>> moving the point away from buffer B_.
>
> Well, technically, there is a point in every buffer
> (and window, so there can be several for one buffer
> even), but I understand what you mean:

Well, if that code works for you, then great. But it has two issues
from my POV.

1. It is inefficient, in the sense that every overlay belongs to some
buffer. No need to keep the variable `source-buffer'.

2. Your code seems to assume that the point in the source buffer lies
within the highlighted sentence, no? This need not be true, since

3. the source buffer may be (in my use-case) the same as the destination
buffer, i.e., I sometimes keep both the English and Polish (let's say)
versions in the same file. My code covers that case as well as two
separate buffers.

BTW, I like the trick with negative one as argument to
`forward-sentence'. I didn't think about it, I guess that may be
exactly the missing piece I looked for when I suspected my code has too
much repetition. Thanks!
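
The three points above can be sketched roughly as follows. This is not the code either of us posted; the `cat-` names are hypothetical. The source buffer is taken from the overlay itself via `overlay-buffer', movement starts from `overlay-start' rather than from point, and `save-excursion' leaves point alone, so it also works when source and translation share one buffer. Sentence boundaries follow the usual `forward-sentence' rules, which by default expect two spaces after a period.

```elisp
;; Hypothetical sketch: move the highlight without relying on point.

(defvar cat-overlay nil
  "Overlay marking the sentence currently being translated.")

(defun cat-highlight-at (pos &optional buffer)
  "Highlight the sentence at POS in BUFFER (default: current buffer)."
  (with-current-buffer (or buffer (current-buffer))
    (save-excursion
      (goto-char pos)
      (let* ((end (progn (forward-sentence 1) (point)))
             (beg (progn (forward-sentence -1) (point))))
        (if (overlayp cat-overlay)
            (move-overlay cat-overlay beg end (current-buffer))
          (setq cat-overlay (make-overlay beg end))
          (overlay-put cat-overlay 'face 'highlight))))))

(defun cat-highlight-move (n)
  "Move the highlight N sentences forward, or backward if N is negative."
  (let ((buffer (and (overlayp cat-overlay)
                     (overlay-buffer cat-overlay))))
    (unless buffer
      (user-error "No active sentence highlight"))
    (with-current-buffer buffer
      (save-excursion
        ;; Start from the overlay, not from wherever point happens to be.
        (goto-char (overlay-start cat-overlay))
        (forward-sentence n)
        (cat-highlight-at (point))))))
```

Calling `(cat-highlight-at (point-min))' once in the source buffer sets things up; after that `(cat-highlight-move 1)' and `(cat-highlight-move -1)' can be bound to keys and used from any buffer.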

Best,
mb
Marcin Borkowski
http://mbork.pl

Emanuel Berg

Jun 9, 2020, 4:54:58 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> 1. It is inefficient, in the sense that every
> overlay belongs to some buffer. No need to keep the
> variable `source-buffer'.

I don't understand?

> 2. Your code seems to assume that the point in the
> source buffer lies within the highlighted sentence,
> no? This need not be true, since
>
> 3. the source buffer may be (in my use-case) the
> same as the destination buffer, i.e., I sometimes
> keep both the English and Polish (let's say)
> versions in the same file. My code covers that case
> as well as two separate buffers.

Right, but I think it's a good idea to keep them
apart. And then have different modes...

Still, one can do that as well... with a variable
instead of point to keep track of the overlay.
Maybe editing screws it up. It's just better to have
different buffers for different purposes, then people
can also rearrange stuff visually more easily, and
many other advantages...

Emanuel Berg

Jun 9, 2020, 5:03:30 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> 2. Your code seems to assume that the point in the
> source buffer lies within the highlighted sentence,
> no? This need not be true

It doesn't assume it but that seems to be what's
happening in part...

When you go to another sentence point moves, but then
you can move point around and the overlay is
still there.

Again I think one could do it easily with a variable
instead of point but again what's the problem with
point following the currently highlighted sentence?

Doesn't that even make sense?

Emanuel Berg

Jun 9, 2020, 5:25:24 PM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> CAT segmentation rules are defined by the SRX
> standard. They are basically a set of cascading
> regex rules (break/don't break).

OK? Do (sentence-end) to get, maybe,

"\\([.?!…‽][]\"'”’)}]*\\($\\|[  ]\\)\\|[。.?!]+\\)[  \n]*"

Because "all paragraph boundaries also end sentences" ...

paragraph-separate is a variable defined in
‘paragraphs.el’. Its value is "[ ]*$"

This variable is safe as a file local variable if
its value satisfies the predicate ‘stringp’.

Documentation:
Regexp for beginning of a line that separates
paragraphs. If you change this, you may have to
change ‘paragraph-start’ also.

So we'll just change it to comply with the SRX
standard :)
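
A minimal sketch of how the sentence definition drives segmentation in practice. It is not code from this thread: the helper name `ecat-sketch-segment-string' is made up, and it assumes sentences are separated by a single space, which is why `sentence-end-double-space' is bound to nil.

```elisp
(defun ecat-sketch-segment-string (text)
  "Return the sentences in TEXT as a list of strings.
Hypothetical helper: it relies on `forward-sentence', and therefore
on whatever `sentence-end' and friends say a sentence is."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Assumption: one space between sentences, not two.
    (let ((sentence-end-double-space nil)
          (segments '()))
      ;; Skip leading whitespace, then take one sentence per iteration.
      (while (progn (skip-chars-forward " \t\n")
                    (not (eobp)))
        (let ((beg (point)))
          (forward-sentence 1)
          (push (buffer-substring-no-properties beg (point)) segments)))
      (nreverse segments))))
```

Evaluating (ecat-sketch-segment-string "Hello world. How are you? Fine.") should give the three sentences as separate strings. Let-binding `sentence-end' or `sentence-end-base' around the loop in the same way would be one place to plug in different break rules.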

> It is possible to fine-tune a translation by
> modifying a rule set ...

It is possible but with Emacs, even more so :)

Jean-Christophe Helary

Jun 9, 2020, 7:29:05 PM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 10, 2020, at 6:13, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> CAT segmentation rules are defined by the SRX
>> standard. They are basically a set of cascading
>> regex rules (break/don't break).
>
> Documentation:
> Regexp for beginning of a line that separates
> paragraphs. If you change this, you may have to
> change ‘paragraph-start’ also.
>
> So we'll just change it to comply with the SRX
> standard :)

Sure. You may want to take a look at the standard.

Emanuel Berg

Jun 9, 2020, 8:12:37 PM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> Sure. You may want to take a look at the standard.

I think you know it better...

Emanuel Berg

Jun 9, 2020, 9:02:59 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> Well, if that code works for you, then great.
> But it has two issues from my POV.
>
> 1. It is inefficient, in the sense that every
> overlay belongs to some buffer. No need to keep the
> variable `source-buffer'.

Ah, now I understand what you mean. Well, I don't know
if it really makes it more efficient to extract it
every time (depends on the definition of efficiency), but
maybe the code gets prettier that way, so OK:

;;; -*- lexical-binding: t -*-
;;;
;;; this file:
;;; http://user.it.uu.se/~embe8573/emacs-init/incal-ecat.el
;;; https://dataswamp.org/~incal/emacs-init/incal-ecat.el

(defvar sentence-overlay nil)

(defun remove-highlight ()
  (interactive)
  (when (overlayp sentence-overlay)
    (delete-overlay sentence-overlay) ))

(defun highlight-sentence ()
  (interactive)
  ;; go to the end of the sentence, then back to its beginning;
  ;; point is left at the beginning
  (let ((end (progn (forward-sentence)    (point)))
        (beg (progn (forward-sentence -1) (point))) )
    (if (overlayp sentence-overlay)
        (move-overlay sentence-overlay beg end)
      (let ((overlay (make-overlay beg end)))
        (overlay-put overlay 'face 'font-lock-comment-face)
        (setq sentence-overlay overlay) ))))
(defalias 'hs-init #'highlight-sentence)

(defun highlight-sentence-move (next)
  ;; the overlay knows its buffer, so no `source-buffer' variable;
  ;; `overlay-buffer' is nil once the overlay has been deleted
  (if (and (overlayp sentence-overlay)
           (overlay-buffer sentence-overlay) )
      (with-current-buffer (overlay-buffer sentence-overlay)
        (forward-sentence (if next 1 -1))
        (highlight-sentence) )
    (highlight-sentence) ))

(defun highlight-sentence-next ()
  (interactive)
  (highlight-sentence-move t) )

(defun highlight-sentence-prev ()
  (interactive)
  (highlight-sentence-move nil) )

> BTW, I like the trick with negative one as argument
> to `forward-sentence'.

Right, it's cool :)

Emanuel Berg

Jun 9, 2020, 9:40:10 PM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> Sure. You may want to take a look at the standard.

Do you have the actual set of rules?

I found this quote:

SRX make use of the ICU Regular Expression
syntax,^[3] but not all programming languages
support all ICU expressions, making implementing
SRX in some languages difficult or impossible.
Java is an example of this. [1]

Heh, poor Java, well if I had the rules I'm pretty
confident we can implement them in one form or
another...

And I found a list

* Pangolin is a free open-source SRX editor.
<https://github.com/davidmason/Pangolin>

* Ratel is a free open-source and cross-platform
application to create and maintain SRX 2.0 files [...]
<http://okapiframework.org/wiki/index.php?title=Ratel>

* SRXEditor is a free open source cross-platform
editor of segmentation rules by Maxprograms,
designed to use Segmentation Rules eXchange (SRX) 2.0.
<http://www.maxprograms.com/products/srxeditor.html> [1]

None of that is in the Debian repos as far as I
can see...

No mention of SRX in the [M]ELPAs and no (?) good
Google hits for Emacs and SRX.

Is it this file: [2] Then why the archive link? (from
the SRXEditor page, see URL above) Is this standard
obsoleted or unofficial, perhaps?


[1] https://en.wikipedia.org/wiki/SRX_Segmentation_Rules_eXchage_LISA_OSCAR_XML_based_Standard
[2] http://web.archive.org/web/20090709131535/http://www.lisa.org/fileadmin/standards/srx20.html

Emanuel Berg

Jun 9, 2020, 10:03:27 PM
to help-gn...@gnu.org
Re: point, one could have one pair of functions
that just move the overlay in terms of the overlay,
next/prev sentence simple as that.

Needs a recenter in terms of the overlay as well...

Then one could have another function that sets the
overlay to where point is (i.e., init/reset)...
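
A rough sketch of that split, under stated assumptions: all `ecat-sketch-' names are hypothetical, the face is just `highlight', and the recenter step is left out. The next/prev commands work purely from the overlay's own bounds and buffer and leave point alone; the reset command sets the overlay from point.

```elisp
(defvar ecat-sketch-overlay nil
  "Overlay on the current source sentence (hypothetical variable).")

(defun ecat-sketch-highlight-at (pos)
  "Put the overlay on the sentence that starts at or contains POS.
Works in the current buffer and leaves point where it was."
  (save-excursion
    (goto-char pos)
    ;; End of the sentence first, then back to its beginning.
    (let* ((end (progn (forward-sentence 1) (point)))
           (beg (progn (forward-sentence -1) (point))))
      (if (overlayp ecat-sketch-overlay)
          (move-overlay ecat-sketch-overlay beg end (current-buffer))
        (setq ecat-sketch-overlay (make-overlay beg end))
        (overlay-put ecat-sketch-overlay 'face 'highlight)))))

(defun ecat-sketch-reset ()
  "Set the overlay to the sentence at point (init/reset)."
  (interactive)
  (ecat-sketch-highlight-at (point)))

(defun ecat-sketch--step (forward)
  "Move the overlay one sentence, forward if FORWARD is non-nil."
  (let ((ov ecat-sketch-overlay))
    (unless (and (overlayp ov) (overlay-buffer ov))
      (user-error "No sentence is highlighted yet"))
    (with-current-buffer (overlay-buffer ov)
      (save-excursion
        (if forward
            ;; Start of the next sentence: just past the overlay.
            (progn (goto-char (overlay-end ov))
                   (skip-chars-forward " \t\n"))
          ;; Start of the previous sentence.
          (goto-char (overlay-start ov))
          (forward-sentence -1))
        (ecat-sketch-highlight-at (point))))))

(defun ecat-sketch-next ()
  "Move the overlay to the next sentence without moving point."
  (interactive)
  (ecat-sketch--step t))

(defun ecat-sketch-prev ()
  "Move the overlay to the previous sentence without moving point."
  (interactive)
  (ecat-sketch--step nil))
```

Because only the overlay is consulted, this should behave the same whether the source text lives in another buffer or in the buffer being edited.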

Jean-Christophe Helary

Jun 9, 2020, 11:22:56 PM
to Emanuel Berg, Help Gnu Emacs mailing list


> On Jun 10, 2020, at 9:12, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> Sure. You may want to take a look at the standard. (SRX)
>
> I think you know it better...

Just a bit. What matters here is that if you want to have a "compatible" thing, thinking about standards is important. If it does not matter (and I don't think it matters much in Marcin's application), then whatever is practical is fine.

Jean-Christophe Helary

Jun 9, 2020, 11:29:14 PM
to Emanuel Berg, Help Gnu Emacs mailing list
> On Jun 10, 2020, at 10:39, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> Sure. You may want to take a look at the standard.
>
> Do you have the actual set of rules?
>
> I found this quote:
>
> SRX make use of the ICU Regular Expression
> syntax,^[3] but not all programming languages
> support all ICU expressions, making implementing
> SRX in some languages difficult or impossible.
> Java is an example of this. [1]
>
> Heh, poor Java, well if I had the rules I'm pretty
> confident we can implement them in one form or
> another...
>
> And I found a list
>
> * Pangolin is a free open-source SRX editor.
> <https://github.com/davidmason/Pangolin>

Interesting. Web based and based on Ratel (below)

> * Ratel is a free open-source and cross-platform
> application to create and maintain SRX 2.0 files [...]
> <http://okapiframework.org/wiki/index.php?title=Ratel>

The Okapi Framework should really be in Debian :) Some of its main contributors were editors for related standards.

> * SRXEditor is a free open source cross-platform
> editor of segmentation rules by Maxprograms,
> designed to use Segmentation Rules eXchange (SRX) 2.0.
> <http://www.maxprograms.com/products/srxeditor.html> [1]

Maxprograms has been releasing its code as FOSS for a little while, only charging for the installers. The main developer was editor for a few related standards.

I mentioned both Okapi and Maxprograms in the mail where I wrote about the processes.

> None of that is in the Debian repos what I can
> see...
>
> No mention of SRX in the [M]ELPAs and no (?) good
> Google hits for Emacs and SRX.

Not really surprising.

> Is it this file: [2] Then why the archive link? (from
> the SRXEditor page, see URL above) Is this standard
> obsoleted or unofficial, perhaps?

SRX is not obsolete or unofficial. The archive link is there because LISA, which published the standard, was disbanded in 2011.

https://en.wikipedia.org/wiki/Localization_Industry_Standards_Association

Jean-Christophe
> --
> underground experts united
> http://user.it.uu.se/~embe8573
> https://dataswamp.org/~incal
>
>

Emanuel Berg

Jun 9, 2020, 11:48:03 PM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> SRX is not obsolete or unofficial, LISA has been
> disbanded, in 2011.

OK, but where are the actual rules?

Jean-Christophe Helary

Jun 9, 2020, 11:57:31 PM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 10, 2020, at 12:47, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> SRX is not obsolete or unofficial, LISA has been
>> disbanded, in 2011.
>
> OK, but where are the actual rules?

SRX is a standard that defines how to create rules, it is not a set of rules. Sorry if that was not clear.

The SRX DTD is here, for example:
https://github.com/rmraya/SRXEditor/blob/master/catalog/srx/srx.dtd

The SRX schema for version 2.1 is
https://github.com/rmraya/SRXEditor/blob/master/catalog/srx/srx21.xsd


The default SRX rules for Maxprograms' OpenXLIFF are:
https://github.com/rmraya/OpenXLIFF/blob/master/srx/default.srx
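
To make the "cascading break/don't break rules" idea concrete, here is a small sketch. It is not SRX itself: real SRX files are XML and use ICU regular expressions, while this uses a Lisp list and Emacs regexps, and the two rules as well as all `ecat-sketch-' names are invented for illustration. As in SRX, each rule has a "before" and an "after" pattern, and the first rule that matches at a position decides.

```elisp
(require 'seq)

(defvar ecat-sketch-rules
  '(;; (BREAK BEFORE-REGEXP AFTER-REGEXP); earlier rules win.
    (nil "\\b\\(?:e\\.g\\|i\\.e\\|Mr\\|Dr\\)\\." "[ \t]") ; exception: no break
    (t   "[.?!]"                                 "[ \t\n]")) ; break
  "Made-up segmentation rules in the spirit of SRX.")

(defun ecat-sketch-rule-at-point ()
  "Return the first rule whose BEFORE and AFTER patterns match at point."
  (seq-find (lambda (rule)
              (and (looking-back (nth 1 rule) (line-beginning-position))
                   (looking-at-p (nth 2 rule))))
            ecat-sketch-rules))

(defun ecat-sketch-srx-segment (text)
  "Split TEXT into segments according to `ecat-sketch-rules'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search nil)
          (beg (point-min))
          (segments '()))
      (while (not (eobp))
        (forward-char 1)
        (let ((rule (ecat-sketch-rule-at-point)))
          ;; Only a matching rule with BREAK non-nil ends a segment.
          (when (and rule (car rule))
            (push (buffer-substring-no-properties beg (point)) segments)
            (skip-chars-forward " \t\n")
            (setq beg (point)))))
      ;; Whatever is left after the last break is the final segment.
      (when (< beg (point-max))
        (push (buffer-substring-no-properties beg (point-max)) segments))
      (nreverse segments))))
```

With these two rules, "See e.g. the manual. It helps." should come out as two segments, because the exception rule for "e.g." is checked before the general break rule.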

Emanuel Berg

Jun 10, 2020, 12:13:35 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> SRX is a standard that defines how to create rules,
> it is not a set of rules. Sorry if that was
> not clear.

Please post the definition of a segment. If there
isn't any one definition we might as well just use
the Emacs default sentence definition already posted
in the thread, case closed.

Jean-Christophe Helary

Jun 10, 2020, 2:20:15 AM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 10, 2020, at 13:13, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> SRX is a standard that defines how to create rules,
>> it is not a set of rules. Sorry if that was
>> not clear.
>
> Please post the definition of a segment.

You define the segment the way you want.

SRX is a definition format.

> If there
> isn't any onee definition we might as well just use
> the Emacs default sentence definition already posted
> in the thread, case closed.

There never was a case in the first place :)

Emacs's default sentence definition can be expressed as an SRX rule set. SRX is useful for eXchanging information, but the format you use internally is irrelevant.

Emanuel Berg

Jun 10, 2020, 7:49:43 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

> Emacs default sentence definition can be expressed
> as a SRX rules set. SRX is useful for eXchanging
> information. But the format you use internally
> is irrelevant.

The Emacs sentence format and associated functions
certainly make sense to use, but with tweaked
settings to get, in particular, shorter segments,
which I think would benefit both searching the DB and
getting better results.

For example this is one sentence by the default
rules:

Both of these are library functions that do a lot
under the hood, but if they don't meet your needs
(or you just want to experiment and learn) you can
also use system calls directly.

But, would you get a good translation suggestion out
of all of that?

Or is it better to simplify in terms of the computer,
and increase the interactivity/human checking by
feeding

1. Both of these are library functions
2. that do a lot under the hood
3. but if they don't meet your needs
4. or you just want to experiment and learn
5. you can also use system calls directly

?
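
A deliberately naive sketch of such a split, cutting only at punctuation (commas, semicolons, colons, parentheses). The function name is made up. It cannot separate "Both of these are library functions" from "that do a lot under the hood", since that cut needs grammatical knowledge and not just a regexp.

```elisp
(defun ecat-sketch-subsegments (sentence)
  "Split SENTENCE at commas, semicolons, colons and parentheses.
Hypothetical helper; empty pieces are dropped and whitespace trimmed."
  (split-string sentence "[,;:()]+" t "[ \t\n]+"))
```

On the example sentence above this yields four pieces, not the five listed, which shows how far plain punctuation gets you.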

Anyway, if there isn't anything let's drop this, what
remains is the DB of translation suggestions, and the
algorithm to search and quantify, so e.g., for (1),
one would get, if one translates into Swedish

segment: Both of these are library functions hit
----------------------------------------------------------
suggestion 1 [a]: Båda två är biblioteksfunktioner 90%
suggestion 2 [s]: Båda två är bibliotekets funktioner 7%
... ... ...
suggestion n [~]: Den här översättningen suger 1%

Then one would hit [a] to insert suggestion 1!

So yes, where do you get the database?

to...@tuxteam.de

Jun 10, 2020, 9:35:18 AM
to help-gn...@gnu.org
On Tue, Jun 09, 2020 at 10:54:44PM +0200, Emanuel Berg via Users list for the GNU Emacs text editor wrote:
> Marcin Borkowski wrote:
>
> > 1. It is inefficient, in the sense that every
> > overlay belongs to some buffer. No need to keep the
> > variable `source-buffer'.
>
> I don't understand?

The overlay already "knows" to which buffer it belongs. See
function (overlay-buffer OVERLAY).
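
A small illustration of the point, with a made-up helper name: given only the overlay, the text it covers can be fetched from any buffer, so no separate variable for the source buffer is needed.

```elisp
(defun ecat-sketch-overlay-text (overlay)
  "Return the text covered by OVERLAY, whichever buffer it lives in.
Return nil if OVERLAY has been deleted and belongs to no buffer."
  (let ((buffer (overlay-buffer overlay)))
    (when buffer
      (with-current-buffer buffer
        (buffer-substring-no-properties (overlay-start overlay)
                                        (overlay-end overlay))))))
```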

Cheers
-- t

to...@tuxteam.de

Jun 10, 2020, 9:38:16 AM
to help-gn...@gnu.org
Uh -- I see downthread you already got it. Sorry for chiming in late
to the party.

Cheers
-- t

Jean-Christophe Helary

Jun 10, 2020, 9:39:53 AM
to Emanuel Berg, help-gn...@gnu.org


> On Jun 10, 2020, at 20:49, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:
>
> Jean-Christophe Helary wrote:
>
>> Emacs default sentence definition can be expressed
>> as a SRX rules set. SRX is useful for eXchanging
>> information. But the format you use internally
>> is irrelevant.
>
> The Emacs sentence format and associated functions
> certainly makes sense to use but with tweaked
> settings to get in particular shorter segments
> I think would benefit both searching the DB and
> getting better results.

Sure, but it's not trivial to find "natural" subsegments with the tools at hand.

> For example this is one sentence by the default
> rules:
>
> Both of these are library functions that do a lot
> under the hood, but if they don't meet your needs
> (or you just want to experiment and learn) you can
> also use system calls directly.
>
> But, would you get a good translation suggestion out
> of all of that?

It depends on what's been translated already, I guess.

> Or is it better to simplify in terms of the computer,
> and increase the interactivity/human checking by
> feeding
>
> 1. Both of these are library functions
> 2. that do a lot under the hood
> 3. but if they don't meet your needs
> 4. or you just want to experiment and learn
> 5. you can also use system calls directly
>
> ?

I honestly have no idea how complex matching algorithms work to produce subsegment matches.

> Anyway, if there isn't anything let's drop this, what
> remains is the DB of translation suggestions, and the
> algorithm to search and quantify, so e.g., for (1),
> one would get, if one translates into Swedish
>
> segment: Both of these are library functions hit
> ----------------------------------------------------------
> suggestion 1 [a]: Båda två är biblioteksfunktioner 90%
> suggestion 2 [s]: Båda två är bibliotekets funktioner 7%
> ... ... ...
> suggestion n [~]: Den här översättningen suger 1%
>
> Then one would hit [a] to insert suggestion 1!
>
> So yes, where do you get the database?

In most cases, it's something the translator builds from her own translations.

And then, there is the matching algorithm that produces the most relevant match.
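
As a rough idea of what such matching could look like, here is a sketch that scores stored segments by edit distance. Everything in it is an assumption: the names, the alist format, the sample entries and the 0.5 threshold are made up, and real CAT tools use more refined measures. It relies on `string-distance', which is available from Emacs 27 on.

```elisp
(defvar ecat-sketch-memory
  '(("Both of these are library functions" . "Båda två är biblioteksfunktioner")
    ("System calls" . "Systemanrop"))
  "Made-up sample translation memory: (SOURCE . TARGET) pairs.")

(defun ecat-sketch-similarity (a b)
  "Return a score between 0.0 and 1.0 for how alike strings A and B are.
The score is 1 minus the Levenshtein distance divided by the longer length."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1.0
      (- 1.0 (/ (float (string-distance a b)) longest)))))

(defun ecat-sketch-suggestions (segment &optional threshold)
  "Return a list of (SCORE SOURCE TARGET) for SEGMENT, best match first.
Entries scoring below THRESHOLD (default 0.5) are left out."
  (let ((threshold (or threshold 0.5))
        (hits '()))
    (dolist (entry ecat-sketch-memory)
      (let ((score (ecat-sketch-similarity segment (car entry))))
        (when (>= score threshold)
          (push (list score (car entry) (cdr entry)) hits))))
    (sort hits (lambda (x y) (> (car x) (car y))))))
```

An exact repetition scores 1.0, and a segment that differs by one character still scores high, which is the kind of "fuzzy" hit a percentage column would show.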

Emanuel Berg

Jun 10, 2020, 11:33:35 AM
to help-gn...@gnu.org
Jean-Christophe Helary wrote:

>> The Emacs sentence format and associated functions
>> certainly makes sense to use but with tweaked
>> settings to get in particular shorter segments
>> I think would benefit both searching the DB and
>> getting better results.
>
> Sure, but it's not trivial to find "natural"
> subsegments with the tools at hand.

That's why I hoped someone already did it :(

Well, obviously someone did!

It isn't trivial, no, and regexps will only get you
so far. Here one should get proper parsing IMO.

But let's postpone this for now...

>> So yes, where do you get the database?
>
> In most of the cases, it's something the translator
> build from her own translations.

Really? That's cool, then you can have your own style
from day 1, and the more you do it, the stronger you
get...

Any suggestions as to the format of the database?

Just because a database can be as simple as a text
file [1] it doesn't mean it should be, always.


[1] https://dataswamp.org/~incal/bike/TIRE
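
One possible middle ground between a hand-edited text file and a real database, sketched under assumptions: keep the memory in a hash table and write its printed representation to a file, which `read' can turn back into a hash table. The function names are hypothetical and nothing here is a format any CAT tool uses.

```elisp
(defun ecat-sketch-save-memory (table file)
  "Write hash table TABLE to FILE in its printed representation."
  (let ((coding-system-for-write 'utf-8)
        ;; Make sure long tables are printed in full.
        (print-length nil)
        (print-level nil))
    (with-temp-file file
      (prin1 table (current-buffer)))))

(defun ecat-sketch-load-memory (file)
  "Read a hash table back from FILE written by `ecat-sketch-save-memory'."
  (let ((coding-system-for-read 'utf-8))
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      (read (current-buffer)))))
```

The file stays plain text, so it can still be inspected or version-controlled, while lookups of exact repetitions go through `gethash'.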

Marcin Borkowski

Jun 10, 2020, 4:57:11 PM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-09, at 22:54, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Marcin Borkowski wrote:
>
>> 1. It is inefficient, in the sense that every
>> overlay belongs to some buffer. No need to keep the
>> variable `source-buffer'.
>
> I don't understand?

Check out the function `overlay-buffer'.

>> 2. Your code seems to assume that the point in the
>> source buffer lies within the highlighted sentence,
>> no? This need not be true, since
>>
>> 3. the source buffer may be (in my use-case) the
>> same as the destination buffer, i.e., I sometimes
>> keep both the English and Polish (let's say)
>> versions in the same file. My code covers that case
>> as well as two separate buffers.
>
> Right, but I think its a good idea to keep them
> apart. And then have different modes...

It depends. How about preparing a LaTeX file with two language versions
typeset side-by-side? (There are other use-cases where having both
language versions in the same file makes sense.)

> Still, one can do that as well... with a variable
> instead of point to keep track of it the overlay.
> Maybe editing screws it up. Its just better to have
> different buffers for different purposes, then people
> can also rearrange stuff visually more easily, and
> many other advantages...

But my solution, in which I only use the overlay itself (no point, no
markers, no variables pointing to buffers) seems to cover all such
cases, and in quite an elegant way.

Best,

Emanuel Berg

Jun 10, 2020, 5:28:36 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

>> Right, but I think its a good idea to keep them
>> apart. And then have different modes...
>
> It depends. How about preapring a LaTeX file with two
> language versions typeset side-by-side?

Better to use the translation tool somewhere else,
when it's done one can do whatever with the material
including typesetting. LaTeX mode and markup will
just complicate everything.

Do it all in one buffer in general: DNC

>> Still, one can do that as well... with a variable
>> instead of point to keep track of it the overlay.
>> Maybe editing screws it up. Its just better to
>> have different buffers for different purposes,
>> then people can also rearrange stuff visually more
>> easily, and many other advantages...
>
> But my solution, in which I only use the overlay
> itself (no point, no markers, no variables pointing
> to buffers) seems to cover all such cases, and in
> quite an elegant way.

There is only one variable and that holds the
overlay. Moving point I consider a good thing, only
perhaps one should keep two sets of functions, to
make it more clear what happens and when, and make it
more versatile.

As for your solution, I have only seen one version,
the first you posted here...

Emanuel Berg

Jun 10, 2020, 5:30:09 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> Check out the function `overlay-buffer'.

You mean like this:

Marcin Borkowski

Jun 12, 2020, 3:46:56 PM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-10, at 23:28, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Marcin Borkowski wrote:
>
>>> Right, but I think its a good idea to keep them
>>> apart. And then have different modes...
>>
>> It depends. How about preapring a LaTeX file with two
>> language versions typeset side-by-side?
>
> Better to use the translation tool somewhere else,
> when its done one can do whatever with the material
> including typesetting. LaTeX mode and markup will
> just complicate everything.

Well, to each his own.

> Do it all in one buffer in general: DNC
>
>>> Still, one can do that as well... with a variable
>>> instead of point to keep track of it the overlay.
>>> Maybe editing screws it up. Its just better to
>>> have different buffers for different purposes,
>>> then people can also rearrange stuff visually more
>>> easily, and many other advantages...
>>
>> But my solution, in which I only use the overlay
>> itself (no point, no markers, no variables pointing
>> to buffers) seems to cover all such cases, and in
>> quite an elegant way.
>
> There is only one variable and that holds the
> overlay. Moving point I consider a good thing, only
> perhaps one should keep two sets of functions, to
> make it more clear what happens and when, and make it
> more versatile.

It may be the case that moving point is a good idea, too, yes.

> As for your solution, I have only seen one version,
> the first you posted here...

Yes, I mean that one. Also, the new version here:
https://github.com/mbork/emacs-cat.

Emanuel Berg

Jun 12, 2020, 6:02:31 PM
to help-gn...@gnu.org
Marcin Borkowski wrote:

> It may be the case that moving point is a good
> idea, too, yes.

I'm gonna drop this project for now, and maybe pick
it up if I ever get the opportunity/a specific reason
to use it.

The main problem is the segments/sentences are too
long so inserting them into a database and hoping for
them to appear the exact same way and thus be a help
against repetition is vain. But if one could solve
that, with some sort of grammar parser to get
subsegments out of the sentences, then I think one
could do quite an efficient Emacs CAT with not that
much of an effort...

> Yes, I mean that one. Also, the new version here:
> https://github.com/mbork/emacs-cat

Byte compiler warnings:

emacs-cat.el:
In toplevel form:
emacs-cat.el:35:1:Warning: defface for
‘emacs-cat-highlight-face’ fails to specify
containing group

In emacs-cat-highlight-next-sentence:
emacs-cat.el:57:4:Warning: Use ‘with-current-buffer’
rather than save-excursion+set-buffer

In emacs-cat-highlight-previous-sentence:
emacs-cat.el:66:4:Warning: Use ‘with-current-buffer’
rather than save-excursion+set-buffer

emacs-cat.el:78:7:Warning: assignment to free
variable ‘emacs-cat-basic-map’

emacs-cat.el:80:13:Warning: reference to free
variable ‘emacs-cat-basic-map’
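
For reference, these are the usual ways to address warnings of this kind. This is only a sketch: the face and keymap names are taken from the warnings, but the group name, the docstrings, the face definition and the helper function are guesses and not the code in emacs-cat.el.

```elisp
;; "fails to specify containing group": give the face a :group.
(defgroup emacs-cat nil
  "Computer-aided translation helpers (group name is an assumption)."
  :group 'text)

(defface emacs-cat-highlight-face
  '((t :inherit highlight))
  "Face for the sentence currently being translated."
  :group 'emacs-cat)

;; "assignment/reference to free variable": declare the variable
;; with `defvar' before it is assigned or used.
(defvar emacs-cat-basic-map (make-sparse-keymap)
  "Keymap for the basic commands.")

;; "Use `with-current-buffer' rather than save-excursion+set-buffer".
;; Note that `with-current-buffer' restores the current buffer only,
;; not point in BUFFER, so the movement below persists.
(defun ecat-sketch-forward-sentence-in (buffer)
  "Move forward one sentence in BUFFER and return the new position.
Hypothetical helper, only here to show the construct."
  (with-current-buffer buffer
    (forward-sentence 1)
    (point)))
```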

Marcin Borkowski

Jun 12, 2020, 6:23:55 PM
to Emanuel Berg, help-gn...@gnu.org

On 2020-06-13, at 00:02, Emanuel Berg via Users list for the GNU Emacs text editor <help-gn...@gnu.org> wrote:

> Marcin Borkowski wrote:
>
>> It may be the case that moving point is a good
>> idea, too, yes.
>
> I'm gonna drop this project for now, and maybe pick
> it up if I even get the opportunity/a specific reason
> to use it.
>
> The main problem is the segments/sentences are too
> long so inserting them into a database and hoping for
> them to appear the exact same way and thus be a help
> against repetition is vain. But if one could solve
> that, with some sort of grammar parser to get
> subsegments out of the sentences, then I think one
> could do quite an efficient Emacs CAT with not that
> much of an effort...
>
>> Yes, I mean that one. Also, the new version here:
>> https://github.com/mbork/emacs-cat
>
> Byte compiler warnings:
>
> emacs-cat.el:
> In toplevel form:
> emacs-cat.el:35:1:Warning: defface for
> ‘emacs-cat-highlight-face’ fails to specify
> containing group

Easy to fix, will do.

> In emacs-cat-highlight-next-sentence:
> emacs-cat.el:57:4:Warning: Use ‘with-current-buffer’
> rather than save-excursion+set-buffer

I know about this one, and it looks like a bug in the compiler (I'll
expand on it later).

> emacs-cat.el:78:7:Warning: assignment to free
> variable ‘emacs-cat-basic-map’
>
> emacs-cat.el:80:13:Warning: reference to free
> variable ‘emacs-cat-basic-map’

Easy to fix, too.

Best,