Hi, Ignacio,
I'm glad that you like this work. We had the same difficulties you had when we started using Moses (so big, in fact, that the idea of MMM was born ;-) ). You will need a good number of aligned segments (in our case 14 million, though we already got very good results with just 6 million). Since Moses uses "phrases" (that is, arbitrary and sometimes very short sequences of words, frequently just one or two, that have nothing to do with our notion of grammar), I would not be too particular in my corpus choices (though, if you want it to translate reasonably, you'll have to find a way to include representative vocabulary, and I'd also choose text styles suited to the task at hand). Some languages are much more forgiving than others (first and foremost English, but also the Romance languages). Others that are morphologically richer (DE, FI and so on) might not give results as good.
I agree with your impression that the synergy between MT and TM is important.
Thanks for your input,
João Rosas
On 11/10/10 11:11, Ignacio Garcia wrote:
Jost Zetzsche publishes a very well-read Tool Kit for professional
translators dealing with technical matters. In its latest issue, the
178th, he referred to MMM:
+++
Achim Ruopp brought to my attention Moses for Mere Mortals, a
specially packaged and presumably much, much easier-to-use version of
the open-source statistical MT Moses monster. So in theory, it's a
free MT engine that is easy enough for the freelance translator or
smallish LSP to use and feed with their own TM data and come out with
some kind of results for specialized MT purposes. Again, I have no
experience with it and would love some feedback for that as well.
+++
I've pasted below the feedback I sent to him responding to that:
+++
Hi Jost
Since you asked…
I’ve been using Moses for Mere Mortals these last few weeks – not for
translating, though, but for research/training-related reasons.
I tried to install Moses first from the step-by-step guide in
statmt.org and ran into all sorts of problems (the guide is good, but
my Linux literacy is limited). With much less effort I was able to put
Moses together following the quite clear MMM guides, and gathering
some available data from the statmt.org site. And it works, although
not yet as well as Google Translate.
MMM also has some Windows add-ins that let you export Moses output
into TMX, and import TMX into the format Moses reads so, in theory,
yes, it can be easily integrated into our TenT workflows.
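
[Editor's illustration, not part of the original message.] To make the TMX import step concrete: Moses training reads two plain-text files, one per language, with one segment per line and matching line numbers. The sketch below is not the MMM add-in; it only shows the general idea using Python's standard library. The function names, the exact-match handling of language codes, and the choice to drop the content of inline formatting elements are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg):
    """Text of a <seg>, skipping the content of inline elements (assumed to be
    formatting codes) and collapsing whitespace so it fits on one line."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for translation units having both languages.

    Language codes are compared exactly, ignoring case (an assumption:
    'en' will not match 'en-US').
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = seg_text(seg)
        src = segs.get(src_lang.lower())
        tgt = segs.get(tgt_lang.lower())
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def write_parallel(pairs, src_path, tgt_path):
    """Write the pairs as two line-aligned UTF-8 files."""
    with open(src_path, "w", encoding="utf-8") as fs, \
            open(tgt_path, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")
```

With a TMX file loaded into a string, `write_parallel(tmx_to_pairs(text, "en", "pt"), "corpus.en", "corpus.pt")` would produce the two aligned files; the file names here are hypothetical.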
Would it be really useful, right now, for freelance translators?
Perhaps for those who have collected over the years (or are able to
access) BIG memories closely related to the type of texts they still
translate. For me and for the few freelancers I’m in contact with,
that’s probably not the case.
But even if you had the memories, my guess now is that it will take
too many hours to fine-tune the system(s) – finding out what the right
proportion of “general” versus “in-domain” data should be – to get
results that are significantly better than those of the free online
systems.
Having said that, I loved MMM and I’m sure the future of technical
translation will end up integrating TM (ok, TenTs) and MT in that way.
But for freelancers (as against corporations and big LSPs) to
benefit, clean “general” and “domain-tagged” monolingual and bilingual
Hi João,
I couldn't agree with you more. In my opinion, the three most important aspects of a training corpus, in order of priority, are: 1) matching the data to the purpose, 2) preparing the data for training, and 3) growing the data to an effective size.
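
[Editor's illustration, not part of the original message.] As a rough picture of what "preparing the data for training" can involve, the sketch below filters aligned segment pairs before they reach a statistical MT trainer. It is not the DoMY or MMM code; the function name `clean_pairs` and the thresholds (80 tokens, a 9:1 length ratio) are assumptions chosen only for illustration, and real pipelines usually add tokenization and casing steps as well.

```python
def clean_pairs(pairs, max_tokens=80, max_ratio=9.0):
    """Drop pairs that are empty, too long, badly length-mismatched, or repeated.

    Tokens are counted by splitting on whitespace, which is only a rough
    stand-in for a real tokenizer.
    """
    kept = []
    seen = set()
    for src, tgt in pairs:
        # Normalize whitespace so each segment stays on a single line.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # one side is empty
        if n_src > max_tokens or n_tgt > max_tokens:
            continue  # very long segments tend to align poorly
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths differ too much to be a likely translation
        if (src, tgt) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

The order of the checks does not affect the result; the thresholds are the part worth tuning per language pair and text type.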
We're preparing to announce the BETA release of the Do Moses Yourself (DoMY) product. With today's updates (v 1.31), DoMY covers almost all of the Linux functionality in MMM. Ubuntu PPAs not only automate installation but also let users update to the newest version automatically via the Update Manager. Shell scripts train new models and support all LM types (including KenLM and SRILM, if manually installed outside the PPA). The Recaser can use any LM type. DoMY still lacks scripts to move models to different hosts.
DoMY also includes an embedded version of Corpus Filtergraph. It prepares training data and runs the translation process. Eventually, it will replace all shell scripts. Our W <above>