Pensieve TM not overwriting old translations as expected

Pablo Gomez

Sep 23, 2021, 5:08:50 AM
to okapi...@googlegroups.com

Hi,

 

I am having a problem with Pensieve TM; perhaps I am doing something wrong.

 

It seems that Pensieve TM keeps old versions of the translated <tu>s although I run the import with -over. For example, when I export the TM to TMX I get many sections like this:

 

[…]

<tu tuid="autoID4">

<tuv xml:lang="en"><seg>Disclaimer</seg></tuv>

<tuv xml:lang="nl"><seg>Disclaimer</seg></tuv>

</tu>

[…]

<tu tuid="autoID11">

<tuv xml:lang="en"><seg>Disclaimer</seg></tuv>

<tuv xml:lang="nl"><seg>Aansprakelijkheid</seg></tuv>

</tu>

[…]

<tu tuid="autoID22">

<tuv xml:lang="en"><seg>Disclaimer</seg></tuv>

<tuv xml:lang="nl"><seg>Aansprakelijkheid</seg></tuv>

</tu>

[…]

 

It seems that, although I am using this command:

 

tikal.bat -sl EN -tl NL -ie UTF8 -oe UTF8 -fc okf_tmx -imp standard_sections\copyright\copyright gen_OT_20210923-level2.tmx -over

 

and the referenced TMX file includes a single “Disclaimer” translation:

 

    <tu>

      <tuv xml:lang="en">

        <seg>Disclaimer</seg>

      </tuv>

      <tuv xml:lang="nl" changeid="PGO" changedate="20210923T084514Z">

        <seg>Aansprakelijkheid</seg>

      </tuv>

    </tu>

 

the pre-existing <tu>s in Pensieve are not overwritten. This is what the manual states:

 

-over

If this option is set and an entry to import has the same source text as one or more entries in the existing TM, all the existing entries will be replaced by the new one. Do not set this option if you want to allow different translations for the same source.
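
To make my expectation concrete, here is a toy model of the behaviour that paragraph describes. It is only a sketch of the documented semantics, not Pensieve's actual code: the `import_entries` function and the list-of-tuples representation of the TM are made up for illustration.

```python
def import_entries(tm, new_entries, overwrite=False):
    """Toy model of a TM import; tm and new_entries are lists of (source, target)."""
    result = list(tm)
    for source, target in new_entries:
        if overwrite:
            # Drop every existing entry that has the same source text.
            result = [(s, t) for (s, t) in result if s != source]
        result.append((source, target))
    return result


# Hypothetical TM content, mirroring the "Disclaimer" entries above.
existing = [
    ("Disclaimer", "Disclaimer"),
    ("Disclaimer", "Aansprakelijkheid"),
]
incoming = [("Disclaimer", "Aansprakelijkheid")]

print(import_entries(existing, incoming, overwrite=True))
# [('Disclaimer', 'Aansprakelijkheid')]
```

With -over I would expect a single “Disclaimer” entry after the import, as in the printed result, not the three I see in the export.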

 

An additional problem is that I cannot rebuild the Pensieve TM from scratch each time I want to update it: it includes translations for several languages, which would be lost, and also entries in the same language that must not be replaced.

 

 

Thanks for your help,

Pablo Gomez

Yves Savourel

Sep 24, 2021, 8:11:37 AM
to okapi...@googlegroups.com

Hi Pablo,

 

It seems to work, as far as I can tell.

Here is my test:

 

Import a TMX file from scratch; one of the source entries has two translations.

 

-------------------------------------------------------------------------------

Okapi Tikal - Localization Toolset

Version: 2.1.42.0-SNAPSHOT

-------------------------------------------------------------------------------

Importing to Pensieve TM

Source language: en-US

Target language: de-DE

Default input encoding: windows-1252

Filter configuration: okf_tmx

Output: testtm.pentm

Input: /C:/Dev/Okapi/dev/trunk/deployment/maven/dist_win32-x86_64/test1-de.tmx

Done in 1.546s

 

We make sure those entries are there:

 

C:\Dev\Okapi\dev\trunk\deployment\maven\dist_win32-x86_64>tikal -q "entry 2" -pen testtm.pentm -sl en-us -tl de-de

-------------------------------------------------------------------------------

Okapi Tikal - Localization Toolset

Version: 2.1.42.0-SNAPSHOT

-------------------------------------------------------------------------------

= From net.sf.okapi.connectors.pensieve.PensieveTMConnector (en-US->de-DE)

  Threshold=95, Maximum hits=25

score: 100, origin: 'testtm.pentm'

  Source: "entry 2"

  Target: "entry trad1 DE"

score: 100, origin: 'testtm.pentm'

  Source: "entry 2"

  Target: "entry 2 trad2 DE"

 

Then we import a new file in which one entry has the same source but a different translation, and we use -over:

 

C:\Dev\Okapi\dev\trunk\deployment\maven\dist_win32-x86_64>tikal -imp testtm test1-de-new.tmx -sl en-us -tl de-de -over

-------------------------------------------------------------------------------

Okapi Tikal - Localization Toolset

Version: 2.1.42.0-SNAPSHOT

-------------------------------------------------------------------------------

Importing to Pensieve TM

Source language: en-US

Target language: de-DE

Default input encoding: windows-1252

Filter configuration: okf_tmx

Output: testtm.pentm

Input: /C:/Dev/Okapi/dev/trunk/deployment/maven/dist_win32-x86_64/test1-de-new.tmx

Done in 1.435s

 

Then we check the result: only the new translation remains.

 

C:\Dev\Okapi\dev\trunk\deployment\maven\dist_win32-x86_64>tikal -q "entry 2" -pen testtm.pentm -sl en-us -tl de-de

-------------------------------------------------------------------------------

Okapi Tikal - Localization Toolset

Version: 2.1.42.0-SNAPSHOT

-------------------------------------------------------------------------------

 

= From net.sf.okapi.connectors.pensieve.PensieveTMConnector (en-US->de-DE)

  Threshold=95, Maximum hits=25

score: 100, origin: 'testtm.pentm'

  Source: "entry 2"

  Target: "entry trad1 v2 DE"

 

Also, if I recall correctly (I may be wrong though), the TM is just for one language pair, so trying to store several target languages is likely not going to work.
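
If that is right, one way to organize things might be one TM directory per target language. Here is a rough sketch that only builds the command lines, using the options already shown in this thread. The `build_import_command` helper, the directory naming scheme and the TMX file names are all made up for illustration.

```python
def build_import_command(tm_dir, tmx_file, src_lang, trg_lang, overwrite=True):
    """Build a Tikal import command as an argument list (nothing is executed here)."""
    cmd = ["tikal", "-imp", tm_dir, tmx_file, "-sl", src_lang, "-tl", trg_lang]
    if overwrite:
        cmd.append("-over")
    return cmd


# Hypothetical layout: one Pensieve TM directory per target language.
for trg in ["nl", "de"]:
    cmd = build_import_command(f"copyright_en_{trg}", f"update_{trg}.tmx", "en", trg)
    print(" ".join(cmd))
```

Each printed line could then be run separately, so an update for one target language never touches the TM of another.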

 

I hope this helps.

 

Cheers,

-yves

Pablo Gomez

Sep 24, 2021, 9:19:19 AM
to Yves Savourel, okapi...@googlegroups.com

Hi Yves,

 

Thanks a lot for the time (and intelligence!) you have invested in these tests.

 

Apparently these are the same tests I ran, but with different results. So I am going to repeat mine, double-checking what you did and where the differences are.

 

In fact, I have just written a script that locates and filters out the repeated entries in the exported TMX, and it finds many, so for some reason they are there. By repeated I mean entries with identical source segments.
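
A minimal sketch of that kind of filter, not my exact script. It assumes that later <tu> elements in the export are the newer ones, so it keeps the last <tu> for each source text, and it compares plain segment text only (the text of any inline codes is simply concatenated). The function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(tu, lang):
    """Return the plain text of the <seg> for the given language, or None."""
    for tuv in tu.findall("tuv"):
        if tuv.get(XML_LANG, "").lower() == lang.lower():
            seg = tuv.find("seg")
            if seg is not None:
                return "".join(seg.itertext())
    return None


def drop_repeated_sources(body, src_lang):
    """Keep only the last <tu> per source text; return how many were removed."""
    tus = body.findall("tu")
    last_index = {}
    for i, tu in enumerate(tus):
        last_index[seg_text(tu, src_lang)] = i
    removed = 0
    for i, tu in enumerate(tus):
        key = seg_text(tu, src_lang)
        # Entries without a source segment are left untouched.
        if key is not None and last_index[key] != i:
            body.remove(tu)
            removed += 1
    return removed


sample = """<tmx version="1.4"><header/><body>
<tu tuid="autoID4"><tuv xml:lang="en"><seg>Disclaimer</seg></tuv><tuv xml:lang="nl"><seg>Disclaimer</seg></tuv></tu>
<tu tuid="autoID11"><tuv xml:lang="en"><seg>Disclaimer</seg></tuv><tuv xml:lang="nl"><seg>Aansprakelijkheid</seg></tuv></tu>
</body></tmx>"""

root = ET.fromstring(sample)
body = root.find("body")
print(drop_repeated_sources(body, "en"))               # 1
print([tu.get("tuid") for tu in body.findall("tu")])   # ['autoID11']
```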

 

I’d much prefer not to rely on the script and to have Tikal/Pensieve do all the work.

 

And yes, you are right, each Pensieve TM is meant for a single source/target language pair. Damn! :)

 

Thanks again,

Pablo G
