Hi,
I am experiencing a problem with Pensieve TM; perhaps I am doing something wrong.
It seems that Pensieve TM is storing old versions of the translated <tu>s although I am running the import with -over. As an example, when I export the TM to TMX I get many sections like this:
[…]
<tu tuid="autoID4">
<tuv xml:lang="en"><seg>Disclaimer</seg></tuv>
<tuv xml:lang="nl"><seg>Disclaimer</seg></tuv>
</tu>
[…]
<tu tuid="autoID11">
<tuv xml:lang="en"><seg>Disclaimer</seg></tuv>
<tuv xml:lang="nl"><seg>Aansprakelijkheid</seg></tuv>
</tu>
[…]
<tu tuid="autoID22">
<tuv xml:lang="en"><seg>Disclaimer</seg></tuv>
<tuv xml:lang="nl"><seg>Aansprakelijkheid</seg></tuv>
</tu>
[…]
I am using this command:
tikal.bat -sl EN -tl NL -ie UTF8 -oe UTF8 -fc okf_tmx -imp standard_sections\copyright\copyright gen_OT_20210923-level2.tmx -over
The TMX it refers to includes a single “Disclaimer” translation:
<tu>
<tuv xml:lang="en">
<seg>Disclaimer</seg>
</tuv>
<tuv xml:lang="nl" changeid="PGO" changedate="20210923T084514Z">
<seg>Aansprakelijkheid</seg>
</tuv>
</tu>
Yet the preexisting <tu>s in Pensieve are not overwritten. This is what the manual states:
-over: If this option is set and an entry to import has the same source text as one or more entries in the existing TM, all the existing entries will be replaced by the new one. Do not set this option if you want to allow different translations for the same source.
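To make the expectation concrete, here is a small Python model of the behaviour the manual describes. It is only a sketch of the documented semantics, not Pensieve's actual implementation: the function name `import_entries` and the dict-based TM are hypothetical and used purely for illustration.

```python
from collections import defaultdict


def import_entries(tm, new_entries, overwrite=False):
    """Add (source, target) pairs to tm, a dict mapping source -> list of targets.

    Hypothetical model of the documented -over behaviour, not Pensieve code.
    """
    for source, target in new_entries:
        if overwrite:
            # Documented -over behaviour: every existing entry with the
            # same source text is replaced by the new one.
            tm[source] = [target]
        elif target not in tm[source]:
            # Without -over, different translations of one source coexist.
            tm[source].append(target)
    return tm


tm = defaultdict(list)
# Earlier imports left two translations for the same source.
import_entries(tm, [("Disclaimer", "Disclaimer"),
                    ("Disclaimer", "Aansprakelijkheid")])
before = list(tm["Disclaimer"])
# Re-import with overwrite=True: only the new translation should remain.
import_entries(tm, [("Disclaimer", "Aansprakelijkheid")], overwrite=True)
after = list(tm["Disclaimer"])
print(before, "->", after)
```

Under this reading, an export after the -over import should contain exactly one <tu> for "Disclaimer".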
An additional problem is that I cannot rebuild the Pensieve TM from scratch each time I want to update it: it includes translations for several languages, which would be lost, and also entries in the same language which are not to be overwritten.
Thanks for your help,
Pablo Gomez
Hi Pablo,
It seems to work, as far as I can tell.
Here is my test:
Import a TMX from scratch, where one of the source entries has two translations.
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.1.42.0-SNAPSHOT
-------------------------------------------------------------------------------
Importing to Pensieve TM
Source language: en-US
Target language: de-DE
Default input encoding: windows-1252
Filter configuration: okf_tmx
Output: testtm.pentm
Input: /C:/Dev/Okapi/dev/trunk/deployment/maven/dist_win32-x86_64/test1-de.tmx
Done in 1.546s
We make sure those entries are there:
C:\Dev\Okapi\dev\trunk\deployment\maven\dist_win32-x86_64>tikal -q "entry 2" -pen testtm.pentm -sl en-us -tl de-de
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.1.42.0-SNAPSHOT
-------------------------------------------------------------------------------
= From net.sf.okapi.connectors.pensieve.PensieveTMConnector (en-US->de-DE)
Threshold=95, Maximum hits=25
score: 100, origin: 'testtm.pentm'
Source: "entry 2"
Target: "entry trad1 DE"
score: 100, origin: 'testtm.pentm'
Source: "entry 2"
Target: "entry 2 trad2 DE"
Then we import a new file, with one source entry identical but another translation, and we use -over:
C:\Dev\Okapi\dev\trunk\deployment\maven\dist_win32-x86_64>tikal -imp testtm test1-de-new.tmx -sl en-us -tl de-de -over
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.1.42.0-SNAPSHOT
-------------------------------------------------------------------------------
Importing to Pensieve TM
Source language: en-US
Target language: de-DE
Default input encoding: windows-1252
Filter configuration: okf_tmx
Output: testtm.pentm
Input: /C:/Dev/Okapi/dev/trunk/deployment/maven/dist_win32-x86_64/test1-de-new.tmx
Done in 1.435s
Then we check the result: only the new translation remains.
C:\Dev\Okapi\dev\trunk\deployment\maven\dist_win32-x86_64>tikal -q "entry 2" -pen testtm.pentm -sl en-us -tl de-de
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.1.42.0-SNAPSHOT
-------------------------------------------------------------------------------
= From net.sf.okapi.connectors.pensieve.PensieveTMConnector (en-US->de-DE)
Threshold=95, Maximum hits=25
score: 100, origin: 'testtm.pentm'
Source: "entry 2"
Target: "entry trad1 v2 DE"
Also, if I recall correctly (I may be wrong though), a Pensieve TM holds just one language pair, so trying to store several target languages in it is likely not going to work.
I hope this helps.
Cheers,
-yves
--
You received this message because you are subscribed to the Google Groups "okapi-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/okapi-users/187179f39f2a43c6a4b27c28296e7d15%40let.be.
Hi Yves,
Thanks a lot for the time (and intelligence!) you have invested in these tests.
Apparently these are the same tests I ran, but with different results. So I am going to repeat mine, double-checking what you did and where the differences are.
In fact, I have just made a script for locating and filtering out the repeated entries from the output, and it finds many, so for some reason they are there. Of course I am talking about identical source segments.
I’d much prefer not to use the script and have Tikal/Pensieve do all the work.
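For anyone who wants to run the same check, a script of this kind could look roughly like the sketch below. It is an illustration rather than the script mentioned above: the function name `find_repeated_sources`, the prefix-based language matching and the inline sample data are all assumptions, and it uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def find_repeated_sources(tmx_text, source_lang="en"):
    """Return {source text: [(tuid, [target texts]), ...]} for every
    source segment that appears in more than one <tu>."""
    root = ET.fromstring(tmx_text)
    units_by_source = defaultdict(list)
    for tu in root.iter("tu"):
        source, targets = None, []
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            # Simple prefix match, so "en" also matches "en-US".
            if tuv.get(XML_LANG, "").lower().startswith(source_lang.lower()):
                source = text
            else:
                targets.append(text)
        if source is not None:
            units_by_source[source].append((tu.get("tuid"), targets))
    return {s: units for s, units in units_by_source.items() if len(units) > 1}


# Sample data modelled on the export shown earlier in the thread.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu tuid="autoID4"><tuv xml:lang="en"><seg>Disclaimer</seg></tuv>
<tuv xml:lang="nl"><seg>Disclaimer</seg></tuv></tu>
<tu tuid="autoID11"><tuv xml:lang="en"><seg>Disclaimer</seg></tuv>
<tuv xml:lang="nl"><seg>Aansprakelijkheid</seg></tuv></tu>
<tu tuid="autoID12"><tuv xml:lang="en"><seg>Copyright</seg></tuv>
<tuv xml:lang="nl"><seg>Auteursrecht</seg></tuv></tu>
<tu tuid="autoID22"><tuv xml:lang="en"><seg>Disclaimer</seg></tuv>
<tuv xml:lang="nl"><seg>Aansprakelijkheid</seg></tuv></tu>
</body></tmx>"""

repeated = find_repeated_sources(SAMPLE)
for source, units in repeated.items():
    print(source, units)
```

To run it on a real export, read the file and pass its contents in, or swap `ET.fromstring(tmx_text)` for `ET.parse(path).getroot()`.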
And yes, you are right, each Pensieve TM is to be used for a single source/target language pair. Damn! :)
Thanks again,
Pablo G