Hi ICU Support,
I have a question about a difference in character conversion behavior between ICU versions 3.2 and 7.8 when converting DBCS data from EBCDIC to code page 943.
Input encoding: ibm-16684_P110-2003
Output encoding: ibm-943_P130-1999
Original source data (GRAPHIC(10)):
Hex '5440 4040 4040 4040 4040 4040 4040 4040' (Length: 20)
(Note: Hex '5440' represents a garbled/invalid EBCDIC character.)
Converted data:
ICU 3.2:
Hex 'FCFC 4080 4080 4080 4080 4080 4080 4080' (Length: 20)
ICU 7.8:
Hex 'FCFC 4080 4080 4080 4080 4080 4080 4080 FCFC' (Length: 22)
In both versions, the invalid EBCDIC character ('5440') is replaced with the DBCS substitution character ('FCFC'). In ICU 7.8, however, an additional 'FCFC' appears at the end, so the output is 2 bytes longer (22 instead of 20).
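To make the difference concrete, here is a minimal Python sketch that compares the two converted outputs exactly as quoted above. It does not call ICU and does not reproduce the conversion; the helper names parse_hex and extra_tail are only illustrative. Note that it operates on the hex bytes as written in this message, which may be abbreviated relative to the reported lengths.

```python
# Converted output exactly as quoted in this message (not re-generated with ICU).
OUT_ICU_3_2 = "FCFC 4080 4080 4080 4080 4080 4080 4080"
OUT_ICU_7_8 = "FCFC 4080 4080 4080 4080 4080 4080 4080 FCFC"


def parse_hex(dump: str) -> bytes:
    """Turn a space-separated hex dump into raw bytes."""
    return bytes.fromhex(dump.replace(" ", ""))


def extra_tail(old: bytes, new: bytes) -> bytes:
    """Return the bytes that `new` appends after `old`.

    Raises ValueError if `new` does not start with `old`, i.e. if the
    two outputs differ somewhere other than at the end.
    """
    if not new.startswith(old):
        raise ValueError("outputs differ before the end")
    return new[len(old):]


old = parse_hex(OUT_ICU_3_2)
new = parse_hex(OUT_ICU_7_8)
tail = extra_tail(old, new)

print("ICU 3.2 bytes quoted:", len(old))
print("ICU 7.8 bytes quoted:", len(new))
print("extra trailing bytes:", tail.hex().upper())
```

On the quoted data, the ICU 7.8 output is the ICU 3.2 output followed by one extra 'FCFC' pair, and the two outputs match everywhere else.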
Could you please confirm the following:
1. Has the specification or behavior of ICU changed between versions 3.2 and 7.8 regarding this type of conversion?
2. Is the behavior observed in ICU 7.8 expected?
Best regards,
Issei Ikejiri