Length decreased due to accented characters when I use UTF-8 encoding


Mohamed Rassaa

Jan 14, 2016, 2:05:21 PM
to legstar-user
Hi all,

I have an issue with LegStar and UTF-8 encoding:
I have a UTF-8 encoded input file (XML) and I want to generate an output file (TXT), also UTF-8 encoded, using the Java code generated by LegStar.
This works except for field values that contain accented characters. In that case the output is shifted by one position per accented character, even though both files use the same encoding. The field length is reduced by one character for each accented character.
Here is an example :
Input field : "Marie-José"
Output Field (must be 38 characters) : "Marie-José                           " (37 characters, not 38)
My COBOL copybook:
05 XILAXACN-PRN                               PIC X(38)

I have the same problem with the other fields whenever they contain accented characters.
I checked the Java code generated by LegStar: the lengths are correct.

Have you seen this kind of problem before?

Thank you for your help, and sorry for my bad English.

Mohamed

Fady

Jan 17, 2016, 8:11:31 AM
to legstar-user
Hello Mohamed,

I'm not sure this is the right answer, but e with an acute accent (é) is encoded as 2 bytes in UTF-8 (0xC3 0xA9). So even though you are seeing 37 characters, 38 bytes may actually have been produced.
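
One way to check this is to compare the character count with the UTF-8 byte count of the field value. The sketch below is not LegStar code; the class `Utf8LengthDemo` and its helper methods are hypothetical names used only for illustration, and it assumes the 38 in PIC X(38) is a length in bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Appends spaces until the string holds the given number of characters. */
    static String padToChars(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is e with acute accent; the escape avoids source-encoding issues.
        String name = "Marie-Jos\u00e9";
        // Reproduce the 37-character value seen in the output file.
        String field = padToChars(name, 37);
        System.out.println("characters:  " + field.length());        // 37
        System.out.println("UTF-8 bytes: " + utf8ByteLength(field)); // 38
    }
}
```

If the byte count of the output field is 38, the field is complete in bytes and only looks one character short when viewed as text.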

Fady