FTP distributed system EBCDIC encoded file

Frank Swarbrick

Jul 27, 2021, 6:02:05 PM

We have a vendor that is providing a file that is EBCDIC (IBM-1140) encoded, but also includes an NL record/line terminator. The source system is NOT a mainframe system. I'm trying to figure out how to FTP the file to the mainframe and have it treat NL as, well, NL; i.e. a record terminator. Binary mode (no SITE options) doesn't work because it stores the NL characters. ASCII mode (no SITE options) doesn't work, I believe because it still expects the CRLF delimiter. I tried specifying "SITE TYPE E" (EBCDIC) and that also does not eliminate the NL delimiter.

Any thoughts? We're seeing if the vendor can just not use a delimiter at all, but no luck yet.

Note: They can create it in UTF-8, but they are including the UTF-8 Byte Order Mark (BOM). I am able to get z/OS to strip the BOM, but I have to specify the transmission as being "multi-byte", so the destination has to be VB. Which we can deal with, but we'd prefer FB as that is how we have it from the old vendor.

FYI, here are the 3 "SITE" commands mentioned in the note above:
encoding=mbcs
mbdataconn=(ibm-1140,utf-8)
UnicodeFileSystemBOM=never
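
[Editor's sketch] One off-platform alternative to the SITE options above would be to pre-process the vendor's UTF-8 file and then send the result in binary to an FB data set. The following is a minimal Python sketch, not a tested procedure. It assumes the lines end in LF or CRLF, that every character has an IBM-1140 equivalent, and that the target LRECL is known. The function name and the default LRECL of 80 are hypothetical.

```python
def utf8_to_fb_ebcdic(data: bytes, lrecl: int = 80) -> bytes:
    """Turn (optionally BOM-prefixed) UTF-8 text into blank-padded IBM-1140 records."""
    # The "utf-8-sig" codec drops a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing line terminator does not start a new record
    out = bytearray()
    for line in lines:
        rec = line.encode("cp1140")  # raises UnicodeEncodeError if unmappable
        if len(rec) > lrecl:
            raise ValueError(f"record of {len(rec)} bytes exceeds LRECL {lrecl}")
        out += rec.ljust(lrecl, b"\x40")  # x'40' is the EBCDIC blank
    return bytes(out)
```

The output contains no delimiters at all, so a binary transfer into a data set with a matching LRECL should need no further record handling.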

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to list...@listserv.ua.edu with the message: INFO IBM-MAIN

Phil Smith III

Jul 27, 2021, 6:48:01 PM

What Charles said. If you have Pipelines, this is trivial.

...phsiii

Paul Gilmartin

Jul 27, 2021, 7:05:25 PM

On Tue, 27 Jul 2021 17:01:56 -0500, Frank Swarbrick wrote:

>We have a vendor that is providing a file that is EBCDIC (IBM-1140) encoded, but also includes an NL record/line terminator. The source system is NOT a mainframe system. I'm trying to figure out how to FTP the file to the mainframe and have it treat NL as, well, NL; i.e. a record terminator. Binary mode (no SITE options) doesn't work because it stores the NL characters. ASCII mode (no SITE options) doesn't work, I believe because it still expects the CRLF delimiter. I tried specifying "SITE TYPE E" (EBCDIC) and that also does not eliminate the NL delimiter.
>
>Any thoughts? We're seeing if the vendor can just not use a delimiter at all, but no luck yet.
>

Doesn't z/OS use NL as its line separator? Verify/refute this with:
echo 'foo
bar' | od -tx1

I'd expect you to see:
0000000 86 96 96 15 82 81 99 15
0000010

where the x'15' is the NL. I expect transfer in binary to preserve the NL and simply work.
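
[Editor's sketch] The same byte sequence can be built off-platform for comparison. This small Python sketch assumes Python's `cp1140` codec as a stand-in for IBM-1140. The NL byte is appended by hand because that codec pairs x'15' with U+0085 (NEL), not with `\n`. The `hexdump` helper is hypothetical.

```python
NL = b"\x15"  # EBCDIC NL, the newline z/OS UNIX writes

def hexdump(data: bytes) -> str:
    """Render bytes as space-separated two-digit hex, like od -tx1 without offsets."""
    return " ".join(f"{b:02x}" for b in data)

# "foo" and "bar", each followed by NL, as echo would write them on z/OS UNIX.
sample = "foo".encode("cp1140") + NL + "bar".encode("cp1140") + NL
print(hexdump(sample))  # prints: 86 96 96 15 82 81 99 15
```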

>Note: They can create it in UTF-8, but they are including the UTF-8 Byte Order Mark (BOM). I am able to get z/OS to strip the BOM, but I have to specify the transmission as being "multi-byte", so the destination has to be VB. Which we can deal with, but we'd prefer FB as that is how we have it from the old vendor.
>

Use of a BOM with UTF-8 is generally deprecated.

-- gil

Pommier, Rex

Jul 27, 2021, 7:21:40 PM

Frank,

Have you tried FTPing it as binary into a Unix file instead of a dataset? I would think the Unix file system would happily accept the newline delimiter.

Rex


Seymour J Metz

Jul 28, 2021, 7:36:11 AM

Sometimes it's outright prohibited, e.g., RFC 8259: "Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error."
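
[Editor's sketch] The "MAY ignore" side of that rule could look like the following on the receiving end. This is a minimal Python sketch using only the standard library, and the function name is hypothetical.

```python
import codecs
import json

def parse_json_lenient(payload: bytes):
    """Parse UTF-8 JSON, skipping a leading BOM instead of treating it as an error."""
    if payload.startswith(codecs.BOM_UTF8):
        payload = payload[len(codecs.BOM_UTF8):]
    return json.loads(payload.decode("utf-8"))
```

A sender that follows the RFC never emits the BOM, so the check only matters for tolerant receivers.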

Further, the IETF is moving in the direction of protocols in which UTF-8 is mandatory, and RFC 3629, section 6, "Byte order mark (BOM)", states:

In the meantime, the uncertainty unfortunately remains and may affect
Internet protocols. Protocol specifications MAY restrict usage of
U+FEFF as a signature in order to reduce or eliminate the potential
ill effects of this uncertainty. In the interest of striking a
balance between the advantages (reduction of uncertainty) and
drawbacks (loss of the signature function) of such restrictions, it
is useful to distinguish a few cases:

o A protocol SHOULD forbid use of U+FEFF as a signature for those
textual protocol elements that the protocol mandates to be always
UTF-8, the signature function being totally useless in those
cases.

o A protocol SHOULD also forbid use of U+FEFF as a signature for
those textual protocol elements for which the protocol provides
character encoding identification mechanisms, when it is expected
that implementations of the protocol will be in a position to
always use the mechanisms properly. This will be the case when
the protocol elements are maintained tightly under the control of
the implementation from the time of their creation to the time of
their (properly labeled) transmission.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List [IBM-...@LISTSERV.UA.EDU] on behalf of Paul Gilmartin [0000000433f0781...@LISTSERV.UA.EDU]
Sent: Tuesday, July 27, 2021 7:05 PM
To: IBM-...@LISTSERV.UA.EDU
Subject: Re: FTP distributed system EBCDIC encoded file

Seymour J Metz

Jul 28, 2021, 8:29:36 AM

To start, there is a difference between how certain code points are defined and how various operating systems use them.

For historical reasons, Unix misused the Line Feed (LF) character as a logical new line instead of using the more appropriate 2-character CRLF; other systems, e.g., PC-DOS, used CRLF. While ASCII has no new line character, EBCDIC and Unicode each have one. Unix System Services uses NL for a logical new line, so if you binary FTP to a Unix file and tag it as EBCDIC then everything should be good to go.

Does anybody know whether Unix System Services uses LF or NEL as a logical new line for files tagged as UTF-8?

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________

From: IBM Mainframe Discussion List [IBM-...@LISTSERV.UA.EDU] on behalf of Frank Swarbrick [frank.s...@OUTLOOK.COM]
Sent: Tuesday, July 27, 2021 6:01 PM
To: IBM-...@LISTSERV.UA.EDU
Subject: FTP distributed system EBCDIC encoded file

Roger Bolan

Jul 28, 2021, 9:10:04 AM

If you can get the original file with the NL to your mainframe with purely
binary transfers, there are a couple of things that can help.
On a z/VM CMS system I wrote a Rexx exec to do it like this:
/*rexx*/
Parse arg fn ft fm .
Address command
/* splits on 0x15 */
'PIPE <' fn ft fm '| DEBLOCK LINEND | >' fn ft 'A'

On TSO starting from a dataset I used
/* REXX COMMAND TO FIX UNSPLIT HFS FILES */
/* Roger Bolan May 18, 2010 */
Trace C
ADDRESS tso
PARSE ARG olddsn
if olddsn = '?' then call explain
stripdsn = Strip(olddsn,'B',"'")
"OPUT '"||stripdsn||"' '/tmp/"||stripdsn||"'"
"OGET '/tmp/"||stripdsn||"' '"||stripdsn||"'" "TEXT"
"OSHELL rm /tmp/"||stripdsn
return RC
explain:
say 'Use from ISPF 3.4'
say 'Example format: %split / '
say 'use / to enter the quoted fully qualified name from ISPF 3.4'
exit 0

Try that.
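
[Editor's sketch] Where neither Pipelines nor OPUT/OGET is at hand, the same split on x'15' can be done before the transfer. This minimal Python sketch assumes the file is pure EBCDIC text, since binary or packed fields could legitimately contain x'15'. It also assumes a known target LRECL. The function name is hypothetical.

```python
def deblock_nl(data: bytes, lrecl: int) -> list:
    """Split an EBCDIC byte stream on NL (x'15') into blank-padded fixed-length records."""
    records = data.split(b"\x15")
    if records and records[-1] == b"":
        records.pop()  # a trailing NL does not start a new record
    for rec in records:
        if len(rec) > lrecl:
            raise ValueError(f"record of {len(rec)} bytes exceeds LRECL {lrecl}")
    # Pad with x'40', the EBCDIC blank.
    return [rec.ljust(lrecl, b"\x40") for rec in records]
```

Joining the returned records with `b"".join(...)` gives a delimiter-free stream suitable for a binary transfer into an FB data set.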

Regards,
--Roger

On Tue, Jul 27, 2021, 4:02 PM Frank Swarbrick <frank.s...@outlook.com>
wrote:

Paul Gilmartin

Jul 28, 2021, 9:44:46 AM

On Wed, 28 Jul 2021 12:29:22 +0000, Seymour J Metz wrote:
> ...
>For historical reasons, Unix misused the Line Feed (LF) character as a logical new line ...
>
For similar bad historical reasons, z/OS iconv (and UNICODE services generally?)
mistranslates ASCII* LF<->NL EBCDIC, causing compatibility problems.
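
[Editor's sketch] The difference is easy to see off-platform. Python's `cp1140` codec pairs LF (U+000A) with x'25' and NEL (U+0085) with x'15', whereas the z/OS behavior described above pairs LF with x'15'. The sketch below imitates the z/OS-style pairing by swapping the two characters around the codec call. The function names are hypothetical, and no claim is made that this reproduces every z/OS conversion table.

```python
# Swap LF and NEL so that, after cp1140 encoding, LF lands on x'15' and NEL on x'25'.
_SWAP = str.maketrans({"\n": "\x85", "\x85": "\n"})

def encode_zos_style(text: str) -> bytes:
    """Encode to IBM-1140 with LF mapped to EBCDIC NL (x'15')."""
    return text.translate(_SWAP).encode("cp1140")

def decode_zos_style(data: bytes) -> str:
    """Decode IBM-1140 with EBCDIC NL (x'15') mapped to LF."""
    return data.decode("cp1140").translate(_SWAP)
```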

>Does anybody know whether Unix System Services uses LF or NEL as a logical new line for files tagged as UTF-8?
>

I'd expect the answer to be the same as for files tagged ISO-8859-x: UNIX or DOS-think.

Should 819 differ from 1252 in that respect?

-- gil

Seymour J Metz

Jul 28, 2021, 10:02:27 AM

The traditional separator was CRLF, but the Multics developers decided to use a single character, and Unix followed suit; I don't know why they didn't choose RS ('1E'X) which, IMHO, would have been a much more sensible choice. Maybe they wanted to support multi-line messages?

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________

From: IBM Mainframe Discussion List [IBM-...@LISTSERV.UA.EDU] on behalf of Paul Gilmartin [0000000433f0781...@LISTSERV.UA.EDU]
Sent: Wednesday, July 28, 2021 9:44 AM
To: IBM-...@LISTSERV.UA.EDU
Subject: Re: FTP distributed system EBCDIC encoded file

Paul Gilmartin

Jul 28, 2021, 11:22:23 AM

On Wed, 28 Jul 2021 14:02:14 +0000, Seymour J Metz wrote:

>The traditional separator was CRLF, but the Multics developers decided to use a single character, and Unix followed suit; I don't know why they didn't choose RS ('1E'X) which, IMHO, would have been a much more sensible choice.
>

I heartily agree. CRLF is device-bound thinking, akin to Machine Carriage
Control. Rendering should be the responsibility of the device driver.

And Classic MacOS chose CR because that's the code generated by ENTER on
some programmer's keyboard.

>Maybe they wanted to support multi-line messages?
>

Why couldn't RS have served that function, even as LF operates in UNIX?:
682 $ printf 'first line\nsecond line.\n'
first line
second line.
683 $

Seymour J Metz

Jul 28, 2021, 11:44:29 AM

Agree with what? I consider CRLF to be the best choice, given the limitations of ASCII, with RS reasonable if you don't want to support multi-line records.

The Multics developers chose not to use CR because that would have prevented overprinting.

If you use RS to separate lines then you can't use it to separate groups of lines. There's no equivalent to

foo <CRLF> bar <CRLF> baz <RS> Tom <CRLF> Dick <CRLF> Harry

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List [IBM-...@LISTSERV.UA.EDU] on behalf of Paul Gilmartin [0000000433f0781...@LISTSERV.UA.EDU]
Sent: Wednesday, July 28, 2021 11:22 AM
To: IBM-...@LISTSERV.UA.EDU
Subject: Re: FTP distributed system EBCDIC encoded file

Paul Gilmartin

Jul 28, 2021, 12:10:35 PM

On Wed, 28 Jul 2021 15:44:16 +0000, Seymour J Metz wrote:
>
>The Multics developers chose not to use CR because that would have prevented overprinting.
>
>If you use RS to separate lines then you can't use it to separate groups of lines. There's no equivalent to
>
> foo <CRLF> bar <CRLF> baz <RS> Tom <CRLF> Dick <CRLF> Harry
>

How about, then:
foo <RS> bar <RS> baz <GS> Tom <RS> Dick <RS> Harry

... choosing arbitrarily among:

ASCII code 28 = FS ( File separator )
ASCII code 29 = GS ( Group separator )
ASCII code 30 = RS ( Record separator )
ASCII code 31 = US ( Unit separator )

... but that's venturing into markup language issues. How many nesting
levels should be supported? "There are only three nice numbers:
zero, one, and 'as many as you like'." (Source obscure)
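
[Editor's sketch] The nesting idea can be shown with the four ASCII separators listed above. This minimal Python sketch splits on one separator per nesting level, outermost first. The function name is hypothetical.

```python
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

def split_nested(text, separators):
    """Split text on each separator in turn, outermost level first."""
    if not separators:
        return text
    head, *rest = separators
    return [split_nested(part, rest) for part in text.split(head)]

# Two groups of three records each, as in the example above.
data = RS.join(["foo", "bar", "baz"]) + GS + RS.join(["Tom", "Dick", "Harry"])
groups = split_nested(data, [GS, RS])
# groups == [["foo", "bar", "baz"], ["Tom", "Dick", "Harry"]]
```

The depth is capped by the number of separators passed in, which is the limit on nesting levels raised in this exchange.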

Seymour J Metz

Jul 28, 2021, 1:54:54 PM

Is a unit limited to a single line? Choosing any of those separators loses functionality, unlike using CRLF. Of course, with Unicode there's NEL, but that ship has already sailed.

The number of nesting levels is limited by the number of separator characters defined, which is why I wish that ARPAnet had opted for binary protocols way back when instead of character delimited protocols. That, too, is not likely to change.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List [IBM-...@LISTSERV.UA.EDU] on behalf of Paul Gilmartin [0000000433f0781...@LISTSERV.UA.EDU]
Sent: Wednesday, July 28, 2021 12:10 PM
To: IBM-...@LISTSERV.UA.EDU
Subject: Re: FTP distributed system EBCDIC encoded file