what will happen if I choose "fr" (French) for all LTR target languages and "ar" (Arabic) for all RTL target languages?


handika dwi

Jul 29, 2021, 8:41:37 AM
to okapi-users
Is there any caveat to this?

handika dwi

Jul 29, 2021, 9:14:00 AM
to okapi-users
Will this screw up the merged file if the contents are not actually in the declared target language?
For example:

<target xml:lang="ar"><mrk mid="0" mtype="seg"><g id="1">האם ניתן להסיר את תגי ה- g שמסביב?</g>.</mrk><mrk mid="1" mtype="seg"><g id="2">האם ניתן להסיר את תגי ה- g שמסביב?</g></mrk></target>

Notice that xml:lang is "ar" but the contents are Hebrew.




Chase Tingley

Jul 29, 2021, 8:23:16 PM
to handika dwi, okapi-users
Okapi doesn't attempt to validate the script against the specified language. (Your Arabic document could be quoting something in Hebrew, after all.) However, when merging, make sure the language on the trans-unit/target matches the target language of the merge process itself (i.e., the -tl flag in tikal, or the target language set in the Rainbow UI). Otherwise, your translation may not be picked up.
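
For anyone who wants to check this before merging, here is a minimal sketch in Python (not part of Okapi) that lists the targets whose xml:lang differs from the language you plan to merge with. The function name `mismatched_targets` and the sample XLIFF are made up for illustration, and the plain case-insensitive comparison is an assumption; Okapi's own locale matching may behave differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def mismatched_targets(xliff_text, merge_lang):
    """Return (trans-unit id, declared lang) for every target whose
    xml:lang differs from merge_lang (case-insensitive comparison)."""
    root = ET.fromstring(xliff_text)
    problems = []
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        for child in unit:
            if local_name(child.tag) != "target":
                continue
            declared = child.get(XML_LANG)
            if declared is not None and declared.lower() != merge_lang.lower():
                problems.append((unit.get("id"), declared))
    return problems


# Hypothetical sample file: unit 2 is tagged "he" while we merge as "ar".
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" source-language="en" target-language="ar" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source xml:lang="en">Hello</source>
        <target xml:lang="ar">שלום</target>
      </trans-unit>
      <trans-unit id="2">
        <source xml:lang="en">Bye</source>
        <target xml:lang="he">להתראות</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

print(mismatched_targets(SAMPLE, "ar"))  # [('2', 'he')]
```

Note that unit 1 passes the check even though its content is Hebrew: like Okapi, the sketch only looks at the declared language, not at the script of the text.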

--
You received this message because you are subscribed to the Google Groups "okapi-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/okapi-users/50e1e5d5-e49f-4b03-9901-cd73218f81ddn%40googlegroups.com.

Mihai Nita

Jul 29, 2021, 11:03:46 PM
to handika dwi, okapi-users
A mismatch in the source can be more problematic because it will affect segmentation and word count.
And it's bad precisely because it does not "explode": you just end up overpaying (your loss) or underpaying (stealing from) your translators.

A bit less of a problem in target, although it will probably affect validation.

But why would you do that?

Mihai



handika dwi

Jul 29, 2021, 11:16:16 PM
to okapi-users
@mihai No, it's not in the source segments. It's in the target segments, specifically when the merge happens.

Mihai Nita

Jul 29, 2021, 11:22:04 PM
to handika dwi, okapi-users
Well, the answer does not change.

Nothing breaks, as Chase said, at least not immediately, but at some point "someone will get hurt" :-)

I mean, what happens in TMs, if you mix all German / Greek / Japanese / Russian and pretend it is French?
What happens with glossaries?

Why do it, to begin with?

Mihai

handika dwi

Jul 29, 2021, 11:24:59 PM
to okapi-users
"I mean, what happens in TMs, if you mix all German / Greek / Japanese / Russian and pretend it is French?"
would you elaborate this?

Mihai Nita

Jul 29, 2021, 11:27:41 PM
to handika dwi, okapi-users
BTW, other stuff that can go wrong: Japanese and Chinese don't use spaces between words.
And they don't use spaces between sentences either.
But the spaces between sentences are usually left out of the segments, and are removed in a separate cleanup step.
I don't know for sure that it will break something, but it might.
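
To make the "no spaces" point concrete, here is a tiny Python illustration. It is not how Okapi counts words; `naive_word_count` is a made-up helper that applies a space-based rule (sensible for French or English) to text in a language that does not use spaces.

```python
def naive_word_count(text):
    # Whitespace splitting: a rough approximation for space-delimited
    # languages, and meaningless for Japanese or Chinese.
    return len(text.split())


english = "Can the surrounding g tags be removed?"
japanese = "これは日本語の文です。"  # "This is a Japanese sentence."

print(naive_word_count(english))   # 7
print(naive_word_count(japanese))  # 1
```

Any tool that picks its counting or segmentation rules from the declared language will apply the wrong rules when the tag does not match the content.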

Why risk misusing things knowingly? Nobody tried to know for sure what happens, because it sounds like a bad idea.
So "maybe nothing will break, but nobody knows for sure, so you are on your own" :-)

Mihai

Mihai Nita

Jul 29, 2021, 11:43:22 PM
to handika dwi, okapi-users
> would you elaborate this?

Most translation memory systems store all language combinations together in one table, something like this:

id     original_id    source_locale   target_locale   source_string    target_string
1      btnCancel      en              ro              Cancel           Abandonează
2      btnCancel      en              fr              Cancel           Annuler
3      btnCancel      en              ja              Cancel           キャンセル
4      btnCancel      en              de              Cancel           Abbrechen

If you tag all target languages with the same ID (French), the TM database will look like the one above,
except that the "target_locale" column will have the same value, "fr", in every row.

So now when you get a new file to translate and want to leverage an existing translation, the (simplified) TM logic is:
find the entry where the source == "Cancel", the source_locale == "en", and the target_locale == "fr"
(well, that's the gist, some also look at the ID, context, etc, but let's go with the simplified version)

And it will find 4 entries. Which one do you use?

Worse, even the request to the TM is a lie: you say "give me an existing French translation" when in reality you want a Russian one.
Because the target locale is not tagged properly, you ask for the wrong one, and you get an even worse result, because the stored entries were not tagged properly either.
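
The simplified lookup can be sketched with an in-memory SQLite table in Python. The schema mirrors the example table above; `build_tm`, `lookup`, and the `force_target_locale` parameter are hypothetical names for this illustration and do not correspond to any real TM product.

```python
import sqlite3

# Rows from the example table: (original_id, source_locale,
# target_locale, source_string, target_string).
ROWS = [
    ("btnCancel", "en", "ro", "Cancel", "Abandonează"),
    ("btnCancel", "en", "fr", "Cancel", "Annuler"),
    ("btnCancel", "en", "ja", "Cancel", "キャンセル"),
    ("btnCancel", "en", "de", "Cancel", "Abbrechen"),
]


def build_tm(rows, force_target_locale=None):
    """Create a toy TM; optionally tag every row with one target locale."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tm (id INTEGER PRIMARY KEY, original_id TEXT, "
        "source_locale TEXT, target_locale TEXT, "
        "source_string TEXT, target_string TEXT)"
    )
    for original_id, src_loc, tgt_loc, src, tgt in rows:
        conn.execute(
            "INSERT INTO tm (original_id, source_locale, target_locale, "
            "source_string, target_string) VALUES (?, ?, ?, ?, ?)",
            (original_id, src_loc, force_target_locale or tgt_loc, src, tgt),
        )
    return conn


def lookup(conn, source, source_locale, target_locale):
    """Simplified exact-match leverage: filter on text and both locales."""
    cur = conn.execute(
        "SELECT target_string FROM tm WHERE source_string = ? "
        "AND source_locale = ? AND target_locale = ? ORDER BY id",
        (source, source_locale, target_locale),
    )
    return [row[0] for row in cur]


# Properly tagged TM: exactly one French candidate.
print(lookup(build_tm(ROWS), "Cancel", "en", "fr"))
# ['Annuler']

# Everything tagged "fr": four candidates, three of them not French.
print(lookup(build_tm(ROWS, force_target_locale="fr"), "Cancel", "en", "fr"))
# ['Abandonează', 'Annuler', 'キャンセル', 'Abbrechen']
```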

Mihai



handika dwi

Jul 29, 2021, 11:59:47 PM
to okapi-users
Our TM system is not like that. We use Okapi only to extract and merge documents.

Mihai Nita

Jul 30, 2021, 3:17:31 PM
to handika dwi, okapi-users
Of course, only you know how you intend to use this, in what environment, and why.
So I think nobody in this group can say with confidence that there are no caveats.

My personal take (not representing others, or "Okapi" as a project):
- This "feels wrong"
- There is no guarantee that things don't break, now or later
- I don't know why someone would want to do this.
  You did not explain the use case, why do this, etc, just "I want to do it"
- There are caveats, some of them unknown, and I would not do it
- You are on your own if something goes wrong (well, pretty much the Apache License v2 sections 7-9 :-)

In general I encourage people to explain the real problem they are trying to solve,
instead of working through a solution, getting stuck half-way, and asking about the blocker, without context.

It is a well-known pattern when asking for support (https://en.wikipedia.org/wiki/XY_problem), and
    "This leads to enormous amounts of wasted time and energy,
    both on the part of people asking for help, and on the part of those providing help"
    (quote from https://xyproblem.info/)

Regards,
Mihai

