Length decreased due to accented characters when I use UTF-8 encoding

Mohamed Rassaa

Jan 14, 2016, 2:05:21 PM

to legstar-user

Hi all,

I have an issue involving LegStar and UTF-8 encoding:

I have an input file (an XML file) encoded in UTF-8, and I want to generate an output file (a TXT file), also encoded in UTF-8, using the Java code generated by LegStar.

This works except for field values that contain accented characters. In that case, the output is shifted by one position for each accented character, even though both files use the same encoding. The fields come out shorter than expected because of the accented characters.

Here is an example:

Input field: "Marie-José"

Output field (must be 38 characters): "Marie-José " (37 characters, not 38)

My COBOL copybook:

05 XILAXACN-PRN PIC X(38)

I have the same problem with other fields whenever they contain accented characters.

I checked the Java code generated by LegStar: the lengths are OK.

Have you seen this kind of problem before?

Thank you for your help, and sorry for my bad English.

Mohamed

Fady

Jan 17, 2016, 8:11:31 AM

to legstar-user

Hello Mohamed,

I'm not sure this is the right answer, but é (e with an acute accent) is encoded as 2 bytes in UTF-8 (0xC3 0xA9). So even if you are seeing 37 characters, there might actually be 38 bytes produced.
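
One way to check this is to compare the character count with the UTF-8 byte count in plain Java. The sketch below is only an illustration: the class `Utf8LengthDemo` and the helper `padToByteWidth` are hypothetical names, not part of LegStar, and padding by byte count is an assumption about what happens, not confirmed LegStar behavior.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    public static int charLength(String s) {
        return s.length();
    }

    // Number of bytes once the string is encoded as UTF-8.
    public static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: append spaces until the UTF-8 byte length
    // reaches the requested width (a space is always 1 byte in UTF-8).
    public static String padToByteWidth(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        for (int bytes = utf8ByteLength(s); bytes < width; bytes++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is é; the escape avoids depending on the source file encoding.
        String name = "Marie-Jos\u00e9";
        String padded = padToByteWidth(name, 38); // width of PIC X(38)

        System.out.println("characters in input:  " + charLength(name));
        System.out.println("UTF-8 bytes in input: " + utf8ByteLength(name));
        System.out.println("characters in output: " + charLength(padded));
        System.out.println("UTF-8 bytes in output: " + utf8ByteLength(padded));
    }
}
```

If the output field is filled to 38 bytes rather than 38 characters, a value with one accented character ends up one character short, which would match what you are seeing. Counting the bytes of the field in your output file should tell you whether that is the case.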