Leverage changes

Yves Savourel

Nov 7, 2010, 9:39:12 AM

to okapi...@googlegroups.com

I had an action item from the last meeting to summarize the changes I made in the leveraging-related classes.

Here is a list of the main changes I made after Jim's changes:

--- in BaseConnector.leverage(): added default match types: EXACT for a score of 100, FUZZY for scores from 99 down to 1, and left 0 as UNKNOWN.
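To make the default mapping concrete, here is a minimal sketch of the rule described above. The enum and class names are hypothetical stand-ins, not the actual Okapi types, and treating any score outside 0-100 as UNKNOWN is an assumption on my part.

```java
// Hypothetical stand-in for the real match-type enumeration.
enum SketchMatchType { EXACT, FUZZY, UNKNOWN }

final class DefaultMatchTypeSketch {
    private DefaultMatchTypeSketch() {}

    // Maps a score to a default match type:
    // 100 -> EXACT, 99..1 -> FUZZY, 0 -> UNKNOWN.
    // Assumption: scores outside 0..100 are also treated as UNKNOWN.
    static SketchMatchType forScore(int score) {
        if (score == 100) {
            return SketchMatchType.EXACT;
        }
        if (score >= 1 && score <= 99) {
            return SketchMatchType.FUZZY;
        }
        return SketchMatchType.UNKNOWN;
    }
}
```

A connector that already knows a more precise type would presumably set it itself; the sketch only covers the fallback.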

--- changed:
QueryManager.leverage(TextUnit tu, Boolean fillTarget)
To:
QueryManager.leverage(TextUnit tu, int thresholdToFill, boolean downgradeIdenticalBestMatches)

thresholdToFill: This allows filling the target based on a threshold. Use a value >100 to fill nothing. This way, getting annotations and filling the target with the best match can be tuned independently.
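As an illustration of how such a threshold can decouple annotating from filling, here is a small sketch. The class and method names are hypothetical, and the inclusive comparison (fill when the best score is at or above the threshold) is an assumption, not something confirmed in the code.

```java
final class FillThresholdSketch {
    private FillThresholdSketch() {}

    // Decides whether the best match should be copied into the target.
    // Assumption: the comparison is inclusive, and scores never exceed 100,
    // so any threshold above 100 means "annotate only, never fill".
    static boolean shouldFillTarget(int bestScore, int thresholdToFill) {
        return bestScore >= thresholdToFill;
    }
}
```

With this rule, a threshold of 100 fills only from exact matches, 75 also fills from good fuzzy matches, and 101 leaves the target alone while the annotations are still attached.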

downgradeIdenticalBestMatches: This allows changing to 99 the score of identical best exact matches. I moved it several times. At first I put it in BaseConnector.leverage(), but that did not address cross-connector situations.
The idea is to avoid having several identical matches with different translations all be exact, because in some components an exact match triggers automated behavior, and when we get several translations that behavior should not be triggered. Maybe this should be moved outside the QM.leverage() method, with the caller just calling QM.downgradeIdenticalBestMatches() or a helper method, but having the option there may help the caller think about that case.
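The downgrade idea can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchMatch and DowngradeSketch are hypothetical classes (not the Okapi ones), the list is assumed to hold the matches found for one source text across all connectors, and "different translations" is reduced to a plain string comparison of the targets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified match record; not the Okapi class.
final class SketchMatch {
    final String target;
    int score;

    SketchMatch(String target, int score) {
        this.target = target;
        this.score = score;
    }
}

final class DowngradeSketch {
    private DowngradeSketch() {}

    // If several matches score 100 but their targets differ,
    // lower all of them to 99 so none is treated as exact.
    static void downgradeIdenticalBestMatches(List<SketchMatch> matches) {
        // Collect the matches currently scored as exact.
        List<SketchMatch> exact = new ArrayList<>();
        for (SketchMatch m : matches) {
            if (m.score == 100) {
                exact.add(m);
            }
        }
        // A single exact match (or none) is unambiguous: nothing to do.
        if (exact.size() < 2) {
            return;
        }
        // Check whether at least two exact matches propose different targets.
        boolean targetsDiffer = false;
        for (SketchMatch m : exact) {
            if (!m.target.equals(exact.get(0).target)) {
                targetsDiffer = true;
                break;
            }
        }
        if (!targetsDiffer) {
            return;
        }
        // Ambiguous: treat all of them like fuzzy matches.
        for (SketchMatch m : exact) {
            m.score = 99;
        }
    }
}
```

Running it on the combined list, rather than per connector, is what covers the cross-connector case mentioned above.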

--- in QueryManager.leverage() I've also changed the target loop to always process both the text container and the segments instead of either one. The reason is that a custom IQuery.leverage() could attach annotations to both, and we were skipping the segments if we had an annotation on the container.

--- I've added all the new options to the LeveragingStep UI so they can be set by the user if needed.

--- I've removed all references to ScoresAnnotation and ScoreInfo, and tried to replace them with calls to the AltTranslationsAnnotation. So there are changes in the TMXWriter, XLIFFWriter, and even in the GenericSkeletonWriter.

Rainbow's "Create Translation Package" utility is not completely working yet, as dropping ScoresAnnotation required many changes, especially in how the ancillary TMX files are generated. I'll keep testing/fixing that.

--- I've updated in QM the internal counters used to show how many entries were leveraged as exact or fuzzy matches, but obviously I expect those to go away at some point, when the word-count/scope reporting is stable and can replace them. I have questions about that actually: counting the matches in the alternate translation annotation and counting the best ones are different things. But that's another topic.

--- QualityManager.adjustNewFragment() is not used anymore and has been removed. We now use TextUnitUtil.adjustTargetCodes() everywhere. I've moved the adjustNewFragment() unit tests to TextUnitUtil. But we still have to compare both methods: they are almost the same, but we want to make sure adjustTargetCodes() handles the cases adjustNewFragment() did. I'll try to do this asap. So far all the cases I've seen seem OK.

That's about it
-ys

Jim Hargrave

Nov 8, 2010, 12:12:28 PM

to okapi...@googlegroups.com

Holy cow! Didn't know this was such a pain....see comment below on
downgrade method...

On Sun, Nov 7, 2010 at 7:39 AM, Yves Savourel <yv...@opentag.com> wrote:
> downgradeIdenticalBestMatches: This allows to change to 99 the score of identical best exact matches. I changed where it was several times. At first I've put it in the baseConnector.leverage() but that was not addressing cross-connector situations.
> The idea is to avoid having several identical matches that have different translation be exact, because in some component an exact match triggers some automated behavior and when we get several translations those behaviors should not be triggered. Maybe executing this should be moved outside the QM.leverage() method, by just calling a QM.downgradeIdenticalBestMatches(), or a helper method, but having the option there may help the caller to think about that case.
>

Is this called before removeDuplicates? Personally I think the
creation date should also be used in these cases - removeDuplicates
will remove all but the latest duplicate matches. But some TMs
probably don't store date information. We just need to be careful when
and where the downgrade method is called - so I agree we should move
the method outside just like removeDuplicates so that implementers can
use whatever logic seems best for their connector.

But then we have to agree on the global logic when many connectors are
used in the QM - worth a discussion sometime down the road.

Yves Savourel

Nov 8, 2010, 12:35:38 PM

to okapi...@googlegroups.com

>> downgradeIdenticalBestMatches: This allows to change to 99
>> ... the score of identical best exact matches.
>
> Is this called before removeDuplicates? Personally I think
> the creation date should also be used in these cases -
> removeDuplicates will remove all but the latest duplicate
> matches. But some TMs probably don't store date information.

Exactly: not all TMs have date info.

But it goes beyond that: in many cases we allow different translations for the same source. So we end up with several best matches, and when there is no automated context we have to let the user choose, and therefore treat those matches like fuzzies. (Quite a few of our translators still use Trados RTF files, where we can't have several matches, just one exact or fuzzy.)

> ... But then we have to agree on the global logic when
> many connectors are used in the QM - worth a discussion
> sometime down the road.

Yep.

BTW: I'll run some more tests and fixes here and there and release a snapshot today. Hopefully we can get a few end-users to play with some of the functionality that was changed in leveraging and catch remaining issues.

-ys
