Starting from the "PDF ligature copy-paste" problem, I got sucked into playing with ICU general transformations and am now writing a short notebook of fun things you can do with them. While trying to explain global and local filters, I created three different, rather nonsensical compound IDs:
1. "[[:Uppercase_Letter:]] any-remove; any-upper"
2. "any-null; [[:Uppercase_Letter:]] any-remove; any-upper"
3. "[[:Uppercase_Letter:]]; any-remove; any-upper"
From my reading of the documentation [1, 2], I expected cases 1 and 2 to be equivalent. Instead, my test showed that 1 and 3 produce the same output:
Input: "This is a test. 123ABC!"
1. "his is a test. 123!"
2. "HIS IS A TEST. 123!"
3. "his is a test. 123!"
I believe the output of the second version is the intended output of compound ID 1, that is, the filter should apply only to any-remove and not to any-upper. Is my reading correct?
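To make the two readings concrete, here is a minimal pure-Python sketch. It does not call ICU at all. It only models a filter as "apply the step to matching characters, leave the rest alone", one character at a time, which is a simplification of how ICU really processes filtered text. The helper names (filtered, chain, global_reading, local_reading) are made up for illustration and are not ICU APIs.

```python
import unicodedata

SAMPLE = "This is a test. 123ABC!"


def is_uppercase_letter(ch):
    # Approximates [[:Uppercase_Letter:]] as General_Category = Lu.
    return unicodedata.category(ch) == "Lu"


def filtered(transform, predicate):
    # Apply `transform` only to characters accepted by `predicate`.
    def apply(text):
        return "".join(transform(ch) if predicate(ch) else ch for ch in text)
    return apply


def chain(*steps):
    # Run the steps left to right, like the parts of a compound ID.
    def apply(text):
        for step in steps:
            text = step(text)
        return text
    return apply


def remove(ch):
    return ""  # stand-in for any-remove


def upper(ch):
    return ch.upper()  # stand-in for any-upper


def everything(ch):
    return True  # no filter


# Reading A: the filter is global, so it restricts every step of the chain.
global_reading = chain(
    filtered(remove, is_uppercase_letter),
    filtered(upper, is_uppercase_letter),
)

# Reading B: the filter is local to any-remove; any-upper is unfiltered.
local_reading = chain(
    filtered(remove, is_uppercase_letter),
    filtered(upper, everything),
)

print(global_reading(SAMPLE))  # his is a test. 123!
print(local_reading(SAMPLE))   # HIS IS A TEST. 123!
```

In this model, reading A reproduces what I observed for IDs 1 and 3, and reading B reproduces what I observed for ID 2.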
From my (rather rushed and absolutely uninformed) look into the codebase, my intuition is that inside TransliteratorIDParser::parseCompoundID [3], the call to TransliteratorIDParser::parseGlobalFilter [4] succeeds even if the filter is actually local. Or is the code working as intended, and the documentation is missing something?
Please excuse the lack of a pull request; I have not written any C++ in a long time, and ICU as a codebase is a bit overwhelming...
(Also: Atlassian, rather rudely, was unable to log me in correctly as far as the unicode-org.atlassian.net domain is concerned, so I could not file an issue there...)
Have a nice day