Input codepage selection

Martin Ward

Sep 29, 2022, 11:06:13 AM
to z390

Summary: z390 seems to ignore the CODEPAGE option for input files.

I recently updated to Java 19 from Oracle:

https://download.oracle.com/java/19/latest/jdk-19_linux-x64_bin.tar.gz

One of my assembler source files has the character 0xA6 for a broken
vertical bar, which is a valid character for the ISO-8859-1 encoding.
This should come out as X'6A' in the listing (EBCDIC encoding IBM1047
for a vertical bar). Instead, with the new version of Java I get
the three character sequence 0xEF, 0xBF, 0xBD which is the UTF-8
"replacement character" used when a character is not recognised.
This is what Java gives when a character 0xA6 is encountered
when reading a UTF-8 character stream.
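
The byte values above can be reproduced outside z390. The following is a minimal sketch using only the standard Java library (the class and method names are illustrative, not part of z390): it decodes the single byte 0xA6 once as ISO-8859-1 and once as UTF-8, then prints the UTF-8 encoding of each result.

```java
import java.nio.charset.StandardCharsets;

class BrokenBarDecode {
    // Decode as ISO-8859-1: each byte maps to the code point of the same value.
    static String asLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decode as UTF-8: malformed input is replaced with U+FFFD.
    static String asUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Hex dump of the UTF-8 encoding of a string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xA6 };           // broken vertical bar in ISO-8859-1
        System.out.println(utf8Hex(asLatin1(raw))); // C2 A6
        System.out.println(utf8Hex(asUtf8(raw)));   // EF BF BD
    }
}
```

A lone 0xA6 is a UTF-8 continuation byte with no lead byte, so a UTF-8 decoder cannot map it to a character and substitutes U+FFFD.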

I don't think z390 is honouring the CODEPAGE option for the input.

I have a file TEST.MLC which just declares a single vertical
bar character using DC (see attachment).

With the old version of Java, if I run this command:

java -classpath z390.jar -Xrs mz390 TEST 'CODEPAGE(ISO-8859-1+IBM1047+LIST)'

I get X'8D' in the listing, not the expected X'6A'

If I set the environment variable LC_CTYPE to en_GB.ISO-8859-1
then it works (I get X'6A' in the listing) but when I use
the new version of Java it stops working.

The TEST.ERR file with the old version of Java and the environment
variable set says:

MZ390 Default ascii Charset codepage is - ISO-8859-1
MZ390 Selected ascii Charset codepage is - ISO-8859-1

without the environment variable setting I get:

MZ390 Default ascii Charset codepage is - UTF-8
MZ390 Selected ascii Charset codepage is - ISO-8859-1

and I get this result with the new version of Java regardless
of the setting of LC_CTYPE.

Note: If I run this command:

java -classpath z390.jar -Xrs mz390 TEST 'CODEPAGE(UTF-8+IBM1047+LIST)'

I get this result in TEST.ERR:

MZ390 Default ascii Charset codepage is - ISO-8859-1
MZ390 Selected ascii Charset codepage is - UTF-8

This is saying that I have selected codepage UTF-8,
but the listing still contains X'6A' for the vertical bar.

Conclusion: It seems that z390 is using the default codepage
for input files, regardless of the setting of the CODEPAGE option
and regardless of what it says in the ERR file.

With older versions of Java I could change the default codepage
by setting the environment variable LC_CTYPE to the required codepage,
but the latest version of Java does not seem to honour this variable.
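
A likely explanation for the change: since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, independent of the locale environment, and the JVM option -Dfile.encoding=COMPAT restores the older locale-derived default. The sketch below (class name illustrative) prints what a given JVM reports, which can be compared between the old and new Java installations. The native.encoding property exists from JDK 17 onwards and prints as null on older versions.

```java
import java.nio.charset.Charset;

class DefaultCharsetReport {
    // Summarise the charset settings the running JVM has picked up.
    static String report() {
        return "default charset = " + Charset.defaultCharset().name()
            + ", file.encoding = " + System.getProperty("file.encoding")
            + ", native.encoding = " + System.getProperty("native.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running this with and without LC_CTYPE set, and with and without -Dfile.encoding=COMPAT, should show whether the default seen by FileReader follows the environment. Whether COMPAT is an acceptable workaround for z390 has not been tested here.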

--
Martin

Dr Martin Ward | Email: mar...@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
TEST.zip

Martin Ward

Sep 29, 2022, 11:43:13 AM
to z390

Line 2026 in src/mz390.java appears to be where files are opened:

mac_file_buff[cur_mac_file] = new BufferedReader(
    new FileReader(mac_file[cur_mac_file]));

This does not set the input encoding.

I found that this would work:

mac_file_buff[cur_mac_file] = new BufferedReader(
    new InputStreamReader(
        new FileInputStream(mac_file[cur_mac_file]),
        StandardCharsets.ISO_8859_1));

I added a couple of extra imports to make it work:

import java.nio.charset.StandardCharsets;
import java.io.*;

Note that this "hard wires" the input encoding to ISO_8859_1

I don't know enough Java to be able to set it to the selected encoding.

John Ganci

Oct 1, 2022, 10:09:07 AM
to Martin Ward, z390
Martin,

If you look in tz390.java, line 366, you will see that the default z/OS compatible CODEPAGE value is "CODEPAGE(ISO-8859-1+IBM1047)". During initialization, tz390 extracts the ASCII and EBCDIC values into fields unless NOCODEPAGE was specified for assembly, in which case the fields will be their initial values - an empty string. You can adjust the update you made to mz390.java line 2026 as follows:

if (!tz390.ascii_charset_name.equals(""))
{
    // CODEPAGE in effect: decode the source with the selected ASCII charset
    mac_file_buff[cur_mac_file] = new BufferedReader(
        new InputStreamReader(
            new FileInputStream(mac_file[cur_mac_file]),
            tz390.ascii_charset_name));
}
else
{
    // NOCODEPAGE: keep the original behaviour (JVM default charset)
    mac_file_buff[cur_mac_file] = new BufferedReader(
        new FileReader(mac_file[cur_mac_file]));
}

The tz390.ascii_charset_name will be "ISO-8859-1" unless CODEPAGE is overridden at assembly time.
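
The same pattern can be tried in isolation. This is a minimal standalone sketch, not z390 code: the class SourceReaderSketch and its methods are hypothetical, and the charset name is passed in as a plain string where z390 would use tz390.ascii_charset_name.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;

class SourceReaderSketch {
    // Open a text file with the named charset, or with the JVM default
    // charset when no name is supplied (the NOCODEPAGE case).
    static BufferedReader open(File file, String charsetName) throws IOException {
        if (charsetName == null || charsetName.isEmpty()) {
            return new BufferedReader(new FileReader(file));
        }
        // Throws UnsupportedEncodingException (an IOException) for unknown names.
        return new BufferedReader(
            new InputStreamReader(new FileInputStream(file), charsetName));
    }

    // Convenience helper: read only the first line, then close the file.
    static String firstLine(File file, String charsetName) throws IOException {
        try (BufferedReader in = open(file, charsetName)) {
            return in.readLine();
        }
    }
}
```

With a file containing the byte 0xA6, firstLine(file, "ISO-8859-1") returns the broken vertical bar (U+00A6), while firstLine(file, "UTF-8") returns the replacement character (U+FFFD).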

I tried this with your TEST.MLC. Seems to work - the TEST.PRN output shows X'6A' on the left and the broken vertical bar in the C'value'.

Don or Abe might want to comment.

Regards,

John Ganci

Martin Ward

Oct 1, 2022, 10:58:28 AM
to John Ganci, z390
On 01/10/2022 15:08, John Ganci wrote:
> I tried this with your TEST.MLC. Seems to work - the TEST.PRN output
> shows X'6A' on the left and the broken vertical bar in the C"value'.

In my tests it seems to work and also causes z390 to finally
take account of the CODEPAGE command line option.

Martin Ward

Oct 1, 2022, 11:11:29 AM
to John Ganci, z390
On 01/10/2022 15:58, Martin Ward wrote:
> On 01/10/2022 15:08, John Ganci wrote:
>> I tried this with your TEST.MLC. Seems to work - the TEST.PRN output
>> shows X'6A' on the left and the broken vertical bar in the C"value'.
>
> In my tests it seems to work and also causes z390 to finally
> take account of the CODEPAGE command line option.

I spoke too soon: I just noticed that with java-19 the listing
has changed from a single character to two characters:

000000 6A (1/1)1 TEST DC C'¦'

The characters are 0xc2 and 0xa6

We need to ensure that the listing output file uses the same ASCII
codepage as we are using in the file read.
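
The two bytes 0xc2 0xa6 are the UTF-8 encoding of U+00A6, which is what a writer using a UTF-8 default charset produces for the broken vertical bar. The following is a minimal sketch of writing a listing line with an explicit charset. It is not z390 code, how z390 actually opens the PRN file has not been checked here, and the class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

class ListingWriterSketch {
    // Write one line to the file, encoding characters with the named
    // charset instead of the JVM default.
    static void writeLine(File file, String charsetName, String line) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), charsetName))) {
            out.write(line);
            out.newLine();
        }
    }
}
```

Writing "\u00A6" with "ISO-8859-1" puts the single byte 0xA6 in the file, and writing it with "UTF-8" puts 0xC2 0xA6 there, so using the same charset name for input and output would keep the listing byte-for-byte consistent with the source.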

John Ganci

Oct 1, 2022, 11:12:16 PM
to Martin Ward, z390, z390development
That's good news.

This response is also going to the z390development group.

To those on z390development:

Please read through this email sequence. It looks like Martin has found a problem where z390 ignores the CODEPAGE values when reading text files. Martin's fix, slightly enhanced to remove the "hardwired" ASCII charset name and allow no ASCII charset name to be supplied (the original mz390.java line 2026 is used for this case), seems to satisfy Martin's concerns. The fix replaces one line in mz390.java (line 2026 in current mz390.java):

current:

    mac_file_buff[cur_mac_file] = new BufferedReader(new FileReader(mac_file[cur_mac_file]));

new:

    if (!tz390.ascii_charset_name.equals(""))
    {
        mac_file_buff[cur_mac_file] = new BufferedReader(
            new InputStreamReader(
                new FileInputStream(mac_file[cur_mac_file]),
                tz390.ascii_charset_name));
    }
    else
    {
        mac_file_buff[cur_mac_file] = new BufferedReader(
            new FileReader(mac_file[cur_mac_file]));
    }

Imports of java.io.FileInputStream and java.io.InputStreamReader were added at top of mz390.java source.

Assemblies of Martin's TEST.MLC source (run from the z390 directory)

    bash/asm <path to test directory>/TEST
    bash/asm <path to test directory>/TEST 'codepage(ISO-8859-1+IBM1047+LIST)'
    bash/asm <path to test directory>/TEST 'codepage(UTF-8+IBM1047+LIST)'
    bash/asm <path to test directory>/TEST nocodepage

The first two show the following in TEST.PRN

Assembler Listing
000000 6A                                     (1/1)1 TEST     DC    C'¦'

The X'6A' and broken vertical bar as expected; the second invocation also correctly shows CODEPAGE information in TEST.ERR.

The last two show the following in TEST.PRN:

Assembler Listing
000000 8D                                     (1/1)1 TEST     DC    C'�'

The X'8D' and unprintable character in the DC statement are as expected. However, the "nocodepage" run resulted in an assembly error:

$ bash/asm <path to test directory>/TEST nocodepage
11:48:44 TEST      MZ390 START USING z390 v1.8.1 ON J2SE 1.8.0_201 10/01/22
MZ390E error 138         (1/1)1 invalid ascii source line 1 in <path to test directory>/TEST.MLC
11:48:44 TEST      MZ390 ENDED   RC=12 SEC= 0 MEM(MB)= 74 IO=31
asm ERROR: mz390 rc=12; see errors in mz390 generated <path to test directory>/TEST.BAL/ERR/PRN file(s) and on console
$

This also happens with the original mz390.java. The error message is issued when the method tz390.verify_ascii_source(String temp_line) returns false. That method checks every character (as an integer) in the argument temp_line against an array named ascii_table, and returns false for a character that is (1) not 9 (tab), (2) not '.', and (3) has '.' as its ascii_table entry. The variable ascii_table has a default value (most entries are '.') which is overridden (only one entry is '.') if CODEPAGE is used. The X'8D' value has '.' for the default ascii_table entry, so false is returned, resulting in the error message.
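
The check described above can be sketched as follows. This is a reconstruction from the description only, not the actual tz390 source. The class name, the contents of the default table (printable ASCII maps to itself, everything else to '.') and the handling of characters beyond the table are all assumptions made for illustration.

```java
class VerifyAsciiSketch {
    // Hypothetical default table: printable ASCII maps to itself,
    // every other entry is '.' (an assumption, not the z390 table).
    static String defaultTable() {
        StringBuilder sb = new StringBuilder(256);
        for (int i = 0; i < 256; i++) {
            sb.append(i >= 0x20 && i <= 0x7E ? (char) i : '.');
        }
        return sb.toString();
    }

    // Returns false as soon as a character is not a tab, not '.',
    // and has '.' as its table entry.
    static boolean verify(String line, String asciiTable) {
        for (int i = 0; i < line.length(); i++) {
            int c = line.charAt(i);
            if (c >= asciiTable.length()) {
                return false; // assumption: characters outside the table are invalid
            }
            if (c != 9 && c != '.' && asciiTable.charAt(c) == '.') {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions a line containing the character with value 0x8D is rejected with the default table, and accepted once that table entry is something other than '.', which matches the difference reported between the NOCODEPAGE and CODEPAGE runs.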

This may be a bug. Further research is required.

Note: All regression tests run by build.sh were also run with the modified mz390.java. No test failed.

All future discussion seems more appropriate for just the z390development group, but the discussion can continue here in the z390 group if so desired.

Comments? Suggestions?

Regards,

John Ganci