Running USS with locale other than 1047

Lindy Mayfield

Oct 12, 2016, 1:27:34 PM

Hello,

I read in places where IBM gives both pros and cons to running USS with a code page other than 1047, if for example your 3270 terminal emulator is set to Danish. I also find instructions such as this, though I'm not sure if this is all that is necessary to switch USS encoding:

http://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.bpxb200/danish.htm

I'm curious how common it is for people to system wide set USS to be in a code page other than 1047 which matches their 3270 emulator?

If so, is this statement true for code points that don't match? The shell script will fail if it has $ or curly braces or brackets, etc., characters outside of that cp.

What happens if I use putty or similar ssh client and then edit a shell script using vi? Would that be the same as using ISHELL to edit the same shell script, and my 3270 emulator is set to 1143?

Apologies if my questions aren't clear, as this topic isn't very clear to me at the moment. Hopefully someone with experience with this will help me understand it better.

Kind regards,
Lindy Mayfield

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to list...@listserv.ua.edu with the message: INFO IBM-MAIN

Paul Gilmartin

Oct 12, 2016, 4:28:07 PM

On Wed, 12 Oct 2016 17:26:47 +0000, Lindy Mayfield wrote:

>Hello,
>
>I read in places where IBM gives both pros and cons to running USS with a code page other than 1047, if for example your 3270 terminal emulator is set to Danish. I also find instructions such as this, though I'm not sure if this is all that is necessary to switch USS encoding:
>
>http://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.bpxb200/danish.htm
>
>I'm curious how common it is for people to system wide set USS to be in a code page other than 1047 which matches their 3270 emulator?
>

Eek!:

o Underreaching. This ought to be as simple as the users' issuing the locale command
to choose their preferred character sets.

o The instructions seem to be for a system-wide setting. What if you have a far-flung
user base with one user preferring Danish (277) and another preferring Polish (870)?

>If so, is this statement true for code points that don't match? The shell script will fail if it has $ or curly braces or brackets, etc., characters outside of that cp.
>
>What happens if I use putty or similar ssh client and then edit a shell script using vi? Would that be the same as using ISHELL to edit the same shell script, and my 3270 emulator is set to 1143?
>

Do tagging and autoconversion help? Or is autoconversion limited to 819<-->1047?
(or is that actually 819<-->[site's configured locale]?)

>Apologies if my questions aren't clear, as this topic isn't very clear to me at the moment. Hopefully someone with experience with this will help me understand it better.
>

ISPF Edit under ISPF 3.17 seems to deal splendidly with UTF-8 (at least when the
terminal code page is 1047 and all the UTF-8 characters are displayable). But I
tried setting my terminal to Russian (880) and only Latin characters display
correctly. Can't experiment much further. My x3270 claims to support Finnish (278),
Icelandic (871) and Norwegian (277), but not Danish. Wait! the URL you cited calls
Danish 277? Are they the same?

Should some of this be on ISPF-L or MVS-OE?

-- gil

R.S.

Oct 12, 2016, 5:37:04 PM

On 2016-10-12 at 19:26, Lindy Mayfield wrote:

> Hello,
>
> I read in places where IBM gives both pros and cons to running USS with a code page other than 1047, if for example your 3270 terminal emulator is set to Danish. I also find instructions such as this, though I'm not sure if this is all that is necessary to switch USS encoding:
>
> http://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.bpxb200/danish.htm
>
> I'm curious how common it is for people to system wide set USS to be in a code page other than 1047 which matches their 3270 emulator?
>
> If so, is this statement true for code points that don't match? The shell script will fail if it has $ or curly braces or brackets, etc., characters outside of that cp.
>
> What happens if I use putty or similar ssh client and then edit a shell script using vi? Would that be the same as using ISHELL to edit the same shell script, and my 3270 emulator is set to 1143?
>
> Apologies if my questions aren't clear, as this topic isn't very clear to me at the moment. Hopefully someone with experience with this will help me understand it better.

Well, I always use a non-US code page on my 3270 emulator. Reason: I'm a
Pole, I speak Polish, and I sometimes write Polish national characters. ;-)
Usually I use CP 870, but there is also a similar code page with the euro sign.
For C programming I used to use CP 1047 [brackets].
However, we don't set anything in USS. It's not needed for writing or
reading text with Polish characters.

BTW: the Polish code page is a little bit tricky even on the MVS side. Take the REXX
concatenation sign, usually called pipe: ||
On a Polish 3270 I see an exclamation mark, so I have to code !!
Note that the exclamation mark is not part of the Polish character set; it's just
a punctuation mark, as in other languages (maybe with the exception of Spain).
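
As a rough illustration of the pipe-versus-exclamation problem, the sketch below lists which shell-significant characters sit at different byte values in two EBCDIC code pages. It uses Python's cp037 and cp500 codecs purely as stand-ins, because as far as I know the Python standard library has no codec for IBM-1047 or IBM-870. The character list and the function name are illustrative assumptions, not anything from IBM.

```python
# Shell-significant characters to check (an illustrative, non-exhaustive list).
SHELL_CHARS = "$|!#[]{}\\^~`"


def differing_chars(codec_a: str, codec_b: str, chars: str = SHELL_CHARS) -> dict:
    """Map each character to (byte in codec_a, byte in codec_b) where the two differ."""
    diffs = {}
    for ch in chars:
        a = ch.encode(codec_a)[0]
        b = ch.encode(codec_b)[0]
        if a != b:
            diffs[ch] = (a, b)
    return diffs


# cp037 and cp500 are stand-ins only; they are NOT IBM-1047 or IBM-870.
diffs = differing_chars("cp037", "cp500")
for ch, (a, b) in diffs.items():
    print(f"{ch!r}: cp037 x'{a:02X}'  cp500 x'{b:02X}'")
```

A script written with one of these code pages and read with the other would see different characters wherever the table reports a difference, which is the failure mode asked about at the top of the thread.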

--
Radoslaw Skorupka
Lodz, Poland

---
This e-mail may contain legally privileged information of the Bank and is intended solely for business use of the addressee. This e-mail may only be received by the addressee and may not be disclosed to any third parties. If you are not the intended addressee of this e-mail or the employee authorized to forward it to the addressee, be advised that any dissemination, copying, distribution or any other similar activity is legally prohibited and may be punishable. If you received this e-mail by mistake please advise the sender immediately by using the reply facility in your e-mail software and delete permanently this e-mail including any copies of it either printed or saved to hard drive.

mBank S.A., with its registered office in Warsaw, ul. Senatorska 18, 00-950 Warszawa, www.mBank.pl, e-mail: kon...@mBank.pl
District Court for the Capital City of Warsaw, 12th Commercial Division of the National Court Register, KRS register no. 0000025237, NIP: 526-021-50-88. As of 01.01.2016 the share capital of mBank S.A. (fully paid up) amounts to PLN 168,955,696.

Paul Gilmartin

Oct 12, 2016, 6:30:11 PM

On 2016-10-12 15:20, R.S. wrote:
>
> Well, I always use a non-US code page on my 3270 emulator. Reason: I'm a Pole, I speak Polish, and I sometimes write Polish national characters. ;-) Usually I use CP 870, but there is also a similar code page with the euro sign. For C programming I used to use CP 1047 [brackets].
> However, we don't set anything in USS. It's not needed for writing or reading text with Polish characters.
>
> BTW: the Polish code page is a little bit tricky even on the MVS side. Take the REXX concatenation sign, usually called pipe: ||
> On a Polish 3270 I see an exclamation mark, so I have to code !!
> Note that the exclamation mark is not part of the Polish character set; it's just a punctuation mark, as in other languages (maybe with the exception of Spain).
>

Does this look right? (I hope LISTSERV allows Unicode):
******************************************************* Top of Data ******************
-CAUTION- Data contains invalid (non-display) characters. Use command
===> FIND P'.' to position cursor to these
Host: IBM-1047 output: from_IBM-870
       0  16  32  48  64  80  96 112 128 144 160 176 192 208 224 240
       0  10  20  30  40  50  60  70  80  90  a0  b0  c0  d0  e0  f0

 0 0                       &   -   ˇ   ˘   °   ą   ·   {   }   \   0
 1 1                       é   /   É   a   j   ~   Ą   A   J   ÷   1
 2 2                   â   ę   Â   Ę   b   k   s   ż   B   K   S   2
 3 3                   ä   ë   Ä   Ë   c   l   t   Ţ   C   L   T   3
 4 4                   ţ   ů   ˝   Ů   d   m   u   Ż   D   M   U   4
 5 5                   á   í   Á   Í   e   n   v   §   E   N   V   5
 6 6                   ă   î   Ă   Î   f   o   w   ž   F   O   W   6
 7 7                   č   ľ   Č   Ľ   g   p   x   ź   G   P   X   7
 8 8                   ç   ĺ   Ç   Ĺ   h   q   y   Ž   H   Q   Y   8
 9 9                   ć   ß   Ć   `   i   r   z   Ź   I   R   Z   9
10 a                   [   ]   |   :   ś   ł   Ś   Ł       Ě   ď   Ď
11 b                   .   $   ,   #   ň   ń   Ň   Ń   ô   ű   Ô   Ű
12 c                   <   *   %   @   đ   š   Đ   Š   ö   ü   Ö   Ü
13 d                   (   )   _   '   ý   ¸   Ý   ¨   ŕ   ť   Ŕ   Ť
14 e                   +   ;   >   =   ř   ˛   Ř   ´   ó   ú   Ó   Ú
15 f                   !   ^   ?   "   ş   ¤   Ş   ×   ő   ě   Ő
****************************************************** Bottom of Data ****************
┌──────────────────────────────────────────────────────┐
│ CHARS '.' - not found on any lines (cols 1 to 255). │
└──────────────────────────────────────────────────────┘
The "invalid ... characters" message is spurious. It was fixed by PTF for CP 1047.
It appears it must be fixed individually for CP 870 and all additional code pages.

This was a UTF-8 UNIX file, tagged UTF-8 and containing the Unicode values of
the Polish alphabet. The behavior is less likely to occur with a Classic data set.

What's x'CA'?

-- gil

Lindy Mayfield

Oct 12, 2016, 7:27:44 PM

"Eek" pretty much sums it all up. Even moving between latin1, latin9 and utf-8 is a huge eek. They have their own problems, too.

ISPF-L, no; MVS-OE list, yes. But since it's often a system-wide setting that IBM may or may not recommend, that's why I chose IBM-Main first to ask. It affects the entire machine, not just the OMVS guys.

I thought about x-posting, but everyone always says, "sorry for x-posting, but...". Seems like a bad thing, so I thought to get the big picture from here first then go to the OMVS list for specifics.

Kind regards,
Lindy

R.S.

Oct 13, 2016, 5:55:52 AM

On 2016-10-13 at 00:29, Paul Gilmartin wrote:

x'CA' looks like minus, but the "proper" minus is x'60'.
x'4F' is an exclamation mark, not the "pipe".

BTW: a set of Polish characters:
ąćęłńóśżźĄĆĘŁŃÓŚŻŹ
A4599C8BBB67BBEABB
092ABEA27192ABEA49

The layout is the vertical hex "syntax" from ISPF Edit: the two digits under each character form its code point, so Ą is x'B1', etc.
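
For anyone who wants the whole list rather than reading the columns by eye, here is a minimal sketch that pairs the two hex rows above into one byte per character. The three strings are copied from this message; the variable names are just illustrative.

```python
# The three rows as posted: characters, high hex digits, low hex digits.
chars = "ąćęłńóśżźĄĆĘŁŃÓŚŻŹ"
high = "A4599C8BBB67BBEABB"
low = "092ABEA27192ABEA49"

# Read each column top to bottom: high digit then low digit gives the byte value.
code_points = {c: int(h + l, 16) for c, h, l in zip(chars, high, low)}

for c, value in code_points.items():
    print(f"{c} = x'{value:02X}'")
```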

--
Radoslaw Skorupka
Lodz, Poland

(that's actually Łódź)


Paul Gilmartin

Oct 13, 2016, 4:24:49 PM

On Wed, 12 Oct 2016 23:27:28 +0000, Lindy Mayfield <Lindy.M...@SAS.COM> wrote:
>
> ISPF-L, no; MVS-OE list, yes. But since it's often a system-wide setting that IBM may or may not recommend, that's why I chose IBM-Main first to ask. It affects the entire machine, not just the OMVS guys.
>

This motivates several questions:

o On editing a Classic file or a new or untagged UNIX file,
should the programmer be allowed to select a CCSID? The
only options at present are UTF-8 and ASCII (I assume this
means ISO8859-1).

o Should the default be the CCSID of the terminal?

o Should it be possible to change the CCSID of the Edit
session in progress?

o When a new UNIX file is saved, should it automatically be
tagged with the CCSID of the Edit session?

o On a COPY of a file with a tagged CCSID different from the
Edit session, what should happen? Should the COPY dialog
allow the programmer to specify or override the file's CCSID?

Edit does a very good job of dealing with a file CCSID different
from the terminal's CCSID, for example 1208 and 1047. But if I
change a character to (nondisplayable) x'00'; HEX OFF; FIND P'.';
Edit places the cursor at an incorrect location and tells me
"x'20' found." That's a (displayable) ASCII blank that is present
at that incorrect cursor location.

UTF-8 introduces several varieties of "nondisplayable":

o A valid UTF-8/Unicode representation of a character not
in the terminal's repertoire.

o A valid UTF-8 representation of an unassigned Unicode point.
(How badly do we actually need a brontosaurus emoji!?)

o An overlong UTF-8 code.
https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings

o An undecipherable UTF-8 code.

Should the diagnostics differentiate among these?
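
One way such a differentiation might look, as a sketch only and not a description of what ISPF Edit does: the function below sorts a single encoded sequence into the four cases listed above. The category names, the helper `_is_overlong`, and the use of Python's cp500 codec as a stand-in for "the terminal's repertoire" are all assumptions made for illustration.

```python
import unicodedata


def _is_overlong(seq: bytes) -> bool:
    """True if seq is structurally a UTF-8 sequence but longer than its value needs."""
    if not seq:
        return False
    lead = seq[0]
    if 0xC0 <= lead <= 0xDF:
        n, value, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        n, value, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF7:
        n, value, minimum = 4, lead & 0x07, 0x10000
    else:
        return False
    # Every trailing byte must have the form 10xxxxxx.
    if len(seq) != n or any(b & 0xC0 != 0x80 for b in seq[1:]):
        return False
    for b in seq[1:]:
        value = (value << 6) | (b & 0x3F)
    return value < minimum


def classify(seq: bytes, terminal_codec: str = "cp500") -> str:
    """Classify the bytes of one would-be UTF-8 character."""
    try:
        text = seq.decode("utf-8")  # strict decoding rejects overlong and malformed input
    except UnicodeDecodeError:
        return "overlong" if _is_overlong(seq) else "undecipherable"
    if len(text) != 1:
        raise ValueError("expected the encoding of exactly one character")
    if unicodedata.category(text) == "Cn":
        return "unassigned"
    try:
        text.encode(terminal_codec)
    except UnicodeEncodeError:
        return "not in terminal repertoire"
    return "in terminal repertoire"


print(classify(b"\xc0\xaf"))             # an overlong encoding of "/"
print(classify(b"\xff"))                 # never valid in UTF-8
print(classify("ą".encode("utf-8")))     # valid, but absent from the stand-in codec
```

The "unassigned" result depends on the Unicode database shipped with the Python version in use, so it can change between releases.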

> -----Original Message-----
> From: Paul Gilmartin
> Sent: keskiviikkona 12. lokakuuta 2016 23.28
>

Hmmm... You asked about Danish, but your Mail Agent seems to
be speaking Finnish.

-- gil

Rick Troth

Oct 14, 2016, 4:24:35 PM

Some history, and some hope.

On 10/13/16 16:24, Paul Gilmartin wrote:
> Hmmm... You asked about Danish,
> but your Mail Agent seems to be speaking Finnish.

:-)

The advantage in the non-EBCDIC* world is that the lower half of 8-bit
space is rather more consistent. And that space is where we have some
serious trouble on this side of the line (pipe symbol versus
exclamation, square brackets, curly braces).

Years ago, Edwin Hart (then at JHU APL) and others worked through SHARE
to normalize EBCDIC into a code page which could be translated to/from
non-EBCDIC* consistently and reliably. We've discussed it in the
lists/fora, perhaps this particular list/forum, even recently. (I've
slept since then.) The result of the SHARE effort was what some call
"Code Page 37 version 2". IBM never fully took up the customer-produced
code page, but they did listen and they gave us CP 1047.

Outside of IBM, most have an affinity for a _one-to-one reversible
mapping_ which treats the EBCDIC side as CP37V2 and the non-EBCDIC* side
as ISO-8859-1. This doesn't help the Poles, I suppose. (It would have
been nice if IBM had a Polish code page which could use the /same
translate table/ and match-up with a Polish non-EBCDIC code page.)

Witness Dignus: aside from newline (see below) their default
/translation is the same/ as that gleaned from this two-decades-old
SHARE effort. Nice work. Good job.

CP 1047 is the best we have, if we are to live in the world IBM has
created for us.
(And some people accept the "CP1047" tag even though they're really
talking CP37V2.)
Sadly, CP 1047 doesn't help the Poles (nor the Danes, nor the Finns).
But now it appears we can change locale. Fabulous!

Thankfully locale variables (LANG, LC_CTYPE, et al) are indicated using
an even smaller subset of EBCDIC than those code points which map from
"low order non-EBCDIC".

There is still the problem that a stream of bytes might not be
recognized. Tagging files with charset ABC or code page 123 is clumsy at
best.

*Here's hope:*

Newline is always non-printable whether EBCDIC or non-EBCDIC*.
Given a stream of bytes of unknown meaning (but reasonably expecting
"plain text") one can trigger on 0x15 and be reasonably sure the
preceding is EBCDIC or trigger on 0x0A and be reasonably sure the
preceding is not. (And one can strip off or append 0x0D as needed.)

If the content is a shell script, locale variables can be recognized and
respected.
XML, HTML, and source code can trivially include reliable cues to the
proper locale for rendering.

Again, for a byte stream text file, look for EBCDIC "NL" newline or look
for non-EBCDIC "LF" linefeed. EBCDIC NL will never appear in non-EBCDIC
printable plain text. Non-EBCDIC LF will never appear in EBCDIC
printable plain text. It's a good test.
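
The test described above can be sketched in a few lines. This is a minimal illustration under the stated assumption that the input is plain text; the function name and the three return values are made up for the example.

```python
EBCDIC_NL = 0x15  # EBCDIC "newline"
ASCII_LF = 0x0A   # non-EBCDIC "linefeed"


def guess_family(data: bytes) -> str:
    """Guess whether plain-text bytes are EBCDIC or not from their line terminator."""
    has_nl = EBCDIC_NL in data
    has_lf = ASCII_LF in data
    if has_nl and not has_lf:
        return "ebcdic"
    if has_lf and not has_nl:
        return "non-ebcdic"
    # Neither terminator, or both: the heuristic cannot decide.
    return "unknown"


print(guess_family(b"\xc8\x89\x15"))  # x'C8' x'89' followed by EBCDIC NL
print(guess_family(b"Hi\n"))
```

Binary data, or text with no line terminator at all, falls into the "unknown" case, so a caller would still need a fallback such as a file tag.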

This is where even Dignus doesn't quite get it: They translate EBCDIC
0x15 to non-EBCDIC 0x0A. (Actual non-EBCDIC for "newline" is 0x85.) But
their table only helps with the above test, and _makes sense_ for cases
where someone did an un-measured translation. So I can't fault them.

Once the result of the EBCDIC (or not) check is known, one can apply
locale and "convert" appropriately. i.e., beyond the cramped walls of
8-bit space.

-- R; <><

* I say non-EBCDIC here because "ASCII" has baggage for many. Y'all know
what I mean.

Paul Gilmartin

Oct 14, 2016, 6:50:52 PM

On Fri, 14 Oct 2016 16:24:18 -0400, Rick Troth wrote:
>
>The advantage in the non-EBCDIC* world is that the lower half of 8-bit
>space is rather more consistent. And that space is where we have some
>serious trouble on this side of the line (pipe symbol versus
>exclamation, square brackets, curly braces).
>

At least they should have stabilized the graphemes in the s/360 PrincOps.
But that might still leave conflicts among caret, logical not and pipe, leaky
pipe.

>... The result of the SHARE effort was what some call
>"Code Page 37 version 2". IBM never fully took up the customer-produced
>code page, but they did listen and they gave us CP 1047.
>

Feels as if CP37V2 fell victim to pernicious NIH. I suspect IBM still doesn't
have a CCSID matching CP37V2.

>CP 1047 is the best we have, if we are to live in the world IBM has
>created for us.
>

Ref. Elon Musk.

>There is still the problem that a stream of bytes might not be
>recognized. Tagging files with charset ABC or code page 123 is clumsy at
>best.
>

"It's the best we have," but not available for Classic data sets.
The non-EBCDIC world gets along nicely with no tagging and a
presumption of UTF-8.

>This is where even Dignus doesn't quite get it: They translate EBCDIC
>0x15 to non-EBCDIC 0x0A. (Actual non-EBCDIC for "newline" is 0x85.) But
>their table only helps with the above test, and _makes sense_ for cases
>where someone did an un-measured translation. So I can't fault them.
>

CMS Pipelines (perhaps other CMS utilities) use 0x25 instead of 0x15.
There's some very Bad History behind all this.

>Once the result of the EBCDIC (or not) check is known, one can apply
>locale and "convert" appropriately. i.e., beyond the cramped walls of
>8-bit space.
>

But one must somehow know locale to differentiate among ISO-8859-x
and UTF-8 and the far greater number of EBCDIC CCSIDs.

>* I say non-EBCDIC here because "ASCII" has baggage for many. Y'all know
>what I mean.
>

Yes, but he hasn't been active on these lists for several months.

Answering my question earlier in this thread, I used ISPF 3.17
with an IBM-1047 terminal to view a UTF-8 file containing the 1047
character matrix. Displayed splendidly (Yaaay!) Then I used the
ISPF Edit Copy command to append another copy of the same file
(same tags, of course). It appears garbled. They could have done
better. If the active file is UTF-8 (pretty universal) and the copied
file is fully tagged, Copy might be expected either to convert it also
to UTF-8 or copy it in literally. Neither seems to have happened.

-- gil

I suppose I can mail my test data off-list to anyone interested.

Rick Troth

Oct 20, 2016, 1:17:20 PM

On 10/14/16 18:50, Paul Gilmartin wrote:
> CMS Pipelines (perhaps other CMS utilities) use 0x25 instead of 0x15.
> There's some very Bad History behind all this.

So you're saying even Hartmann makes mistakes? Shock! :-)

0x25 is EBCDIC "linefeed".
Sadly, that's printable if misinterpreted as ASCII. The value of 0x15
and 0x0A is they're both non-printable in both worlds.

0x15 is EBCDIC "newline".
Even prior to USS, certain EBCDIC systems used newline (meaning 0x15)
where record boundaries were unavailable or not effective.

0x0A is referred to as newline in Unix land, but is officially
"linefeed", so would map to EBCDIC 0x25. Certain pedantic sticklers for
doco (against the grain of actual /usage/) ... cough ... IBM ... cough
... would insist on translating EBCDIC 0x25 to/from ASCII 0x0A. Bad
History indeed!

ASCII has a "newline" at 0x85. (aka NEL) Historically ASCII was a 7-bit
animal, so there is no 0x85 (or "was"), so Unix used linefeed as its
end-of-line marker, called it "newline", and made its contribution to
the Bad History.

Precedent:
EBCDIC systems use 0x15 to indicate end-of-line.
ASCII systems use 0x0A to indicate end-of-line.
(I hear the voice of David Warner as the MCP. Time to start quoting old
movies?)

>> >Once the result of the EBCDIC (or not) check is known, one can apply
>> >locale and "convert" appropriately. i.e., beyond the cramped walls of
>> >8-bit space.
> But one must somehow know locale to differentiate among ISO-8859-x
> and UTF-8 and the far greater number of EBCDIC CCSIDs.

True dat.
"Oh, an African swallow, may-be."
And if we're going to handle code pages or CCSIDs or what not then we're
going to have musical translate tables. Tropical zone? Temperate zone?
This sucks.

"Just not a European swallow, that's all I'm talkin about."

Problem with translate tables is they're not migratory.

-- R; <><

Paul Gilmartin

Oct 20, 2016, 1:43:33 PM

On Thu, 20 Oct 2016 13:17:07 -0400, Rick Troth wrote:

>On 10/14/16 18:50, Paul Gilmartin wrote:
>> CMS Pipelines (perhaps other CMS utilities) use 0x25 instead of 0x15.
>> There's some very Bad History behind all this.
>
>So you're saying even Hartmann makes mistakes? Shock! :-)
>

Rather, I'd call him a "pedantic stickler[] for doco".

>0x25 is EBCDIC "linefeed".
>Sadly, that's printable if misinterpreted as ASCII. The value of 0x15
>and 0x0A is they're both non-printable in both worlds.
>

Many ASCII editors (and, I believe, Info-Zip) make decisions on statistics.
Any UTF-8 file is valid ISO8859-1. Most ISO8859-1 files are not valid
UTF-8. Files containing no characters >=0x80 are probably ASCII,
etc.
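
A minimal sketch of that kind of statistical guess follows. It is not taken from any particular editor or from Info-Zip; the function name and labels are illustrative, and a real tool would weigh more evidence than this.

```python
def guess_encoding(data: bytes) -> str:
    """Guess among ASCII, UTF-8 and a generic 8-bit set using the rules above."""
    if all(b < 0x80 for b in data):
        return "ascii"  # no bytes >= 0x80 at all
    try:
        data.decode("utf-8")  # strict decode: most 8-bit text fails here
        return "utf-8"
    except UnicodeDecodeError:
        return "iso8859-1 or another 8-bit set"


print(guess_encoding(b"plain text"))
print(guess_encoding("Łódź".encode("utf-8")))
print(guess_encoding("Łódź".encode("iso8859-2")))
```

Note that the last branch cannot tell ISO8859-1 from ISO8859-2 or any other single-byte set, which is the "one must somehow know locale" problem raised earlier in the thread.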

>0x0A is referred to as newline in Unix land, but is officially
>"linefeed", so would map to EBCDIC 0x25. Certain pedantic sticklers for
>doco (against the grain of actual /usage/) ... cough ... IBM ... cough
>... would insist on translating EBCDIC 0x25 to/from ASCII 0x0A. Bad
>History indeed!
>

The burden of that Bad History impelled IBM to transgress its own
specs when implementing iconv.

Linux iconv insists ... I would call its developers not so much "pedantic
sticklers" as merely naive adherents to the doco. IBM could alleviate
this by defining a code page, call it 1047x, with LF at 0x15 and NL at
0x25. But pedantic sticklers of another ilk declare, "Curst be he that
moves my [control characters]."
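
The effect of such a "1047x" can be approximated by swapping the two control bytes before decoding. The sketch below does that with Python's cp037 codec, used here only as a stand-in because I am not aware of a cp1047 codec in the Python standard library. The function name is hypothetical.

```python
# Swap EBCDIC NL (0x15) and LF (0x25) so that 0x15 ends up as Unix "\n".
SWAP_NL_LF = bytes.maketrans(b"\x15\x25", b"\x25\x15")


def ebcdic_to_text(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC text, treating 0x15 as the end-of-line marker."""
    return data.translate(SWAP_NL_LF).decode(codec)


line = b"\xc8\x89\x15"               # x'C8' x'89' then EBCDIC NL

print(repr(line.decode("cp037")))    # by-the-doco mapping: 0x15 becomes U+0085 (NEL)
print(repr(ebcdic_to_text(line)))    # usage-based mapping: 0x15 becomes "\n"
```

The first print shows the by-the-doco behaviour complained about above; the second shows the result most Unix tools would actually want.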

And IBM could have evaded all this and many other conflicts if IBM
had chosen to make OpenEdition ASCII-based instead of EBCDIC.

-- gil