Issue with sub and extraId in XliffFilter...


jim

Nov 15, 2021, 2:30:37 PM
to Group: okapi-devel, Chase Tingley
I'm making code alignment during merge stricter by matching only on id
and tag type. That means the filters have to be consistent in creating
code ids in bilingual content. I already fixed an error with TMX, but
the XLIFF filter has an issue when there are subflows inside a code.

For these codes the filter uses extraId, a large number that is
decremented each time it is used. There is no check across source and
target, so the same code gets different ids on each side (integers off
by one). This causes the merge to fail.
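
To make the off-by-one concrete, here is a minimal sketch. It is not the actual XliffFilter code: the class name, the starting value and the nextExtraId helper are made up for illustration. The only thing taken from the description above is a single large counter that is decremented on each use, with no coordination between source and target.

```java
public class ExtraIdSketch {
    // Hypothetical stand-in for the filter's counter; the real starting value may differ.
    private int extraId = Integer.MAX_VALUE;

    // Hands out the current value, then decrements it, with no memory of earlier codes.
    public int nextExtraId() {
        return extraId--;
    }

    public static void main(String[] args) {
        ExtraIdSketch filter = new ExtraIdSketch();
        int sourceCodeId = filter.nextExtraId(); // code with a sub, seen in the source
        int targetCodeId = filter.nextExtraId(); // the same code, seen in the target
        System.out.println(sourceCodeId - targetCodeId); // 1
    }
}
```

With matching on id and tag type only, these two codes no longer pair up even though they are the same code.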

I think Patrick Huy originally added this code. I can modify it to make
sure an id generated in the source is the same in the target (probably
using some hash value on the sub content). Does this sound ok?

A general heads up: be careful with code ids, as they are expected to
be consistent after filtering. This is normally only an issue for
bilingual formats, if present, but it can also be a problem for
monolingual filters if codes reuse the same id.

cheers,

Jim

yves.s...@gmail.com

Nov 15, 2021, 5:00:13 PM
to okapi...@googlegroups.com, Chase Tingley
If we use a hash value, we should have our own util method for it instead of relying on the runtime's string/int hashing: it's not always the same implementation across VMs.
-ys

jim

Nov 16, 2021, 1:54:39 PM
to okapi...@googlegroups.com, yves.s...@gmail.com, Chase Tingley
I added this to StringUtil. We should use it any time we need to keep an id value stable across JVM implementations.

/**
 * JVM-independent hashCode implementation,
 * used to generate numeric ids from strings.
 * Some collisions are expected but should be rare for longer strings.
 *
 * @param s the string to hash; may be null
 * @return an integer calculated from the given string, or
 *         Integer.MAX_VALUE if the string is null or empty
 */
public static int hashCode(String s) {
   if (s == null || s.isEmpty()) {
      // a 0 id may conflict with other Code ids
      // so we use the MAX value
      return Integer.MAX_VALUE;
   }

   int h = 0;
   for (int i = 0; i < s.length(); i++) {
      h = 31 * h + s.charAt(i);
   }
   return h;
}
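
A quick sanity check of the behavior, as a sketch only. HashIdSketch is a made-up class that carries its own copy of the same arithmetic so the example runs without Okapi on the classpath, and the sub string is invented.

```java
public class HashIdSketch {
    // Local copy of the arithmetic above, so this example compiles on its own.
    public static int hashCode(String s) {
        if (s == null || s.isEmpty()) {
            return Integer.MAX_VALUE; // avoid handing out a 0 id
        }
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String sub = "<sub>alt text</sub>"; // hypothetical sub content
        // The same string always yields the same id, whichever side it is read from.
        System.out.println(hashCode(sub) == hashCode(sub)); // true
        System.out.println(hashCode("ab"));  // 3105
        System.out.println(hashCode(""));    // 2147483647
        System.out.println(hashCode(null));  // 2147483647
    }
}
```

One thing to keep in mind: the ids only line up if the string being hashed is identical in source and target. If the sub text itself is translated, the hash will differ, so the choice of what goes into the hash matters.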