[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang

Kai Peter Nacke via llvm-dev

unread,

Jun 10, 2020, 3:11:22 PM6/10/20

to llvm...@lists.llvm.org

As part of IBM’s ongoing efforts to improve the z/OS ecosystem, our
current plans involve adding support for the z/OS platform to LLVM and
Clang. Our goal is to have a viable C and C++ LLVM compiler and runtime
library that generates code for, and runs on z/OS.

Long term, we expect to have a compiler and library that supports the
platform more fully. We intend to support the native character encoding
(EBCDIC), file system - z/OS UNIX System Services (z/OS UNIX) files and
datasets, addressing modes (31-bit and 64-bit, and possibly 24-bit),
different floating and fixed point formats, systems programming
capabilities, language co-processors, and generating code output in the
z/OS object file formats (GOFF and XOBJ).

The tentative plan is to initially support the z/OS Unix System Services
(z/OS UNIX) interface with EBCDIC and ASCII 64-bit code generation. In
particular, our intent would be to focus on:
1) C++ standard library support. We would be making changes to the library
so it can work on z/OS. We would need some design discussions with the
community for issues such as character encodings.
2) EBCDIC source file input. We would need reviews at the Clang level when
dealing with reading source files and dealing with multiple code pages.
3) GOFF object file output. We would need reviews in LLVM to add a new
object file output format.
Our plans include setting up z/OS build bots and we will update the list
when we have them ready to go.

To begin, we plan to add patches that would:
- Set the new triple for z/OS
- Make changes to the build recipes and tools (cmake, etc.) as needed to
allow building for z/OS
Following that, we intend to start on the focus areas listed above:

1) Add patches to enable building and using the C++ standard library on
z/OS. In particular, issues dealing with EBCDIC would need to be
addressed. We would need to have functions in the headers (e.g. iostream)
that work on ASCII encoded strings, and functions that work on EBCDIC
encoded strings. These would need to work with the underlying system C
library (e.g. printf) that provides the actual functionality. For example,
currently, the z/OS C library has (at least) two sets of functions (ASCII
and EBCDIC versions). The one used by the application is selected at
compile time during the system header file processing which selects the
correct function via mapping the programmer function name (e.g. printf)
into one that the application will link to (e.g. __printf for EBCDIC and
\174\174A00118 for ASCII). We would also add patches to disable
functionality when on z/OS where there is no support for the
functionality. For example, thread specific locales would be disabled when
in a non-POSIX mode.
Our intent is that follow on patches would incrementally add support in
tandem with the compiler for features that require it and for other z/OS
specifics such as various floating/fixed point formats.

2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
input source files. This would be done at the file open time to allow the
rest of Clang to operate as if the source was UTF-8 and so require no
changes downstream. Feedback on this plan is welcome from the Clang
community.
Our intent is that later patches would handle execution character set
differences. Collaboration with the community here would be useful in
areas such as adding in exec-charset and library selection options and
strategies.
Our intent is also to make changes to support any platform issues,
processing native C header files, and idiosyncrasies on z/OS such as
having no native strnlen function. We would update test tooling to handle
character encoding issues as needed. Further design discussion will take
place on the Clang mailing list.

3) Add patches to LLVM that will stub out GOFF object binary generation.
We would not be generating assembly (HLASM in z/OS), and instead only
generate the binary object directly for the initial round of changes.
Assembly generation would follow in later stages once we have a working
compiler on z/OS. Feedback on this plan of direction is appreciated.
Our intent is that patches that incrementally add support for GOFF
object generation such as code sections and records would follow. The next
steps after support for the object file format would be handling the z/OS
XPLINK calling convention. This would involve changes to both Clang and
LLVM and we intend to follow the same style of functional component
responsibility as is done for other platforms calling conventions. If we
believe deviations from this is necessary, we plan on notifying the
community and ensuring the reasons behind the deviations are valid and
accepted.

Any feedback or comments are welcome.

Notice: IBM’s statements regarding its plans, directions, and intent are
subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our
general product direction and it should not be relied on in making a
purchasing decision. The information mentioned regarding potential future
products is not a commitment, promise, or legal obligation to deliver any
material, code or functionality. Information about potential future
products may not be incorporated into any contract. The development,
release, and timing of any future features or functionality described for
our products remains at our sole discretion.

Best regards,
Kai Nacke
IT Architect

IBM Deutschland GmbH
Vorsitzender des Aufsichtsrats: Sebastian Krause
Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert
Janzen, Markus Koerner, Christian Noll, Nicole Reimer
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Eli Friedman via llvm-dev

unread,

Jun 10, 2020, 3:30:42 PM6/10/20

to Kai Peter Nacke, llvm...@lists.llvm.org

I'm not that familiar with the C++ library bits, but looks fine at first glance.

For EBCDIC source files, what you've outlined matches what we've discussed previously when the topic has come up on cfe-dev.

For GOFF object generation, we want to make sure that we can write reasonable tests for patches at each stage. If there's no assembly output, we have to verify GOFF binary output. And probably the only reasonable way to do that is to first teach llvm-objdump to understand GOFF.

-Eli

Hubert Tong via llvm-dev

unread,

Jun 10, 2020, 5:52:27 PM6/10/20

to Kai Peter Nacke, llvm-dev

On Wed, Jun 10, 2020 at 3:11 PM Kai Peter Nacke via llvm-dev <llvm...@lists.llvm.org> wrote:

2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
input source files. This would be done at the file open time to allow the
rest of Clang to operate as if the source was UTF-8 and so require no
changes downstream. Feedback on this plan is welcome from the Clang
community.

Is there a statement that can be made with respect to accepting UTF-8 encoded source files in a z/OS hosted environment or is it implied that it works with no changes (and there are no changes that will break this functionality)?

Also, would these changes enable the consumption of non-UTF-8 encoded source files on Clang as hosted on other platforms?

Fangrui Song via llvm-dev

unread,

Jun 10, 2020, 6:20:31 PM6/10/20

to Hubert Tong, llvm-dev

An 2018 thread has some discussions about non-UTF-8 encoding.
http://lists.llvm.org/pipermail/cfe-dev/2018-January/056745.html

Which case does z/OS want to support?

Fangrui Song via llvm-dev

unread,

Jun 10, 2020, 7:33:15 PM6/10/20

to Kai Peter Nacke, llvm...@lists.llvm.org

On 2020-06-10, Kai Peter Nacke via llvm-dev wrote:
>As part of IBM’s ongoing efforts to improve the z/OS ecosystem, our
>current plans involve adding support for the z/OS platform to LLVM and
>Clang. Our goal is to have a viable C and C++ LLVM compiler and runtime
>library that generates code for, and runs on z/OS.
>
>Long term, we expect to have a compiler and library that supports the
>platform more fully. We intend to support the native character encoding
>(EBCDIC), file system - z/OS UNIX System Services (z/OS UNIX) files and
>datasets, addressing modes (31-bit and 64-bit, and possibly 24-bit),
>different floating and fixed point formats, systems programming
>capabilities, language co-processors, and generating code output in the
>z/OS object file formats (GOFF and XOBJ).
>
>The tentative plan is to initially support the z/OS Unix System Services
>(z/OS UNIX) interface with EBCDIC and ASCII 64-bit code generation. In
>particular, our intent would be to focus on:
>1) C++ standard library support. We would be making changes to the library
>so it can work on z/OS. We would need some design discussions with the
>community for issues such as character encodings.

I am a bit worried about introducing another encoding but there are
other subthreads discussing this matter.

>2) EBCDIC source file input. We would need reviews at the Clang level when
>dealing with reading source files and dealing with multiple code pages.
>3) GOFF object file output. We would need reviews in LLVM to add a new
>object file output format.
>Our plans include setting up z/OS build bots and we will update the list
>when we have them ready to go.

I think most contributors don't have access and probably don't have any
desire to buy z/OS devices if their changes somehow break z/OS. When a
commit caused problems on z/OS, is it the responsibility of the z/OS
community to fix:) ?

If the patch author wants to be nice and fix the issues, are z/OS
developers willing to provide VM access or other resources if such a
scenario arises? Are the binary formats experimental like experimental
targets? (Someone may disagree with experimental targets, but I believe
a new binary format has much larger impact than a new target and it
seems we will have two more)

As someone who cares a lot on the lowlevel stuff, I have more questions
regarding this bullet point.

A MCGOFFStreamer inheriting MCObjectStreamer and possibly also a
MCXOBJStreamer? Do you expect GOFF and XOBJ will fit well in the current
MC framework and will not cause too much burden for developers who are
refactoring generic MC code?

Will z/OS reuse binary utilities (llvm-ar llvm-cov llvm-cxxfilt llvm-nm
llvm-objcopy llvm-objdump llvm-readobj llvm-size llvm-strings llvm-strip
llvm-symbolizer)? Are their any utilities where z/OS's counterparts
diverge enough (from commonplace platforms) that creating new utilities
may be better than reusing existing ones?

Kai Peter Nacke via llvm-dev

unread,

Jun 11, 2020, 5:24:00 AM6/11/20

to Eli Friedman, llvm...@lists.llvm.org

Eli Friedman <efri...@quicinc.com> wrote on 10.06.2020 21:30:26:

> From: Eli Friedman <efri...@quicinc.com>
> To: Kai Peter Nacke <kai....@de.ibm.com>, "llvm-
> d...@lists.llvm.org" <llvm...@lists.llvm.org>
> Date: 10.06.2020 21:30
> Subject: [EXTERNAL] RE: [llvm-dev] RFC: Adding support for the z/OS
> platform to LLVM and clang
>

> I'm not that familiar with the C++ library bits, but looks fine at
> first glance.
>
> For EBCDIC source files, what you've outlined matches what we've
> discussed previously when the topic has come up on cfe-dev.
>
> For GOFF object generation, we want to make sure that we can write
> reasonable tests for patches at each stage. If there's no assembly
> output, we have to verify GOFF binary output. And probably the only
> reasonable way to do that is to first teach llvm-objdump to understand
GOFF.

Yes, for sure. Our idea is to use llvm-readobj and llvm-objdump for tests.

Kai Peter Nacke via llvm-dev

unread,

Jun 11, 2020, 7:32:18 AM6/11/20

to Fangrui Song, llvm...@lists.llvm.org

Fangrui Song <mas...@google.com> wrote on 11.06.2020 01:32:55:

> >Our plans include setting up z/OS build bots and we will update the
list
> >when we have them ready to go.
>
> I think most contributors don't have access and probably don't have any
> desire to buy z/OS devices if their changes somehow break z/OS. When a
> commit caused problems on z/OS, is it the responsibility of the z/OS
> community to fix:) ?
>
> If the patch author wants to be nice and fix the issues, are z/OS
> developers willing to provide VM access or other resources if such a
> scenario arises? Are the binary formats experimental like experimental
> targets? (Someone may disagree with experimental targets, but I believe
> a new binary format has much larger impact than a new target and it
> seems we will have two more)

I'm not sure if I really understand your concern. Sorry for that.
I assume that no LLVM developer has access to all supported hardware, so
in
this respect z/OS is no different than any other target. Following the
developer
policy (compiling the code and running the tests) should catch most
problems here. Of course, changing z/OS specific source code is a
different
story. I expect that this is mostly limited to the Support library and
should
pose no general problem.

The intention is to add support for GOFF format first. So far, we just
implemented
the required MC interfaces and do not use something like an "experimental
binary
format". As long as GOFF is not requested through the triple, the new
format is not
used and therefore should not affect other platforms.
But I'm sure you have a specific scenario in mind. May be can elaborate a
bit on it?

> > Our intent is that patches that incrementally add support for GOFF
> >object generation such as code sections and records would follow. The
next
> >steps after support for the object file format would be handling the
z/OS
> >XPLINK calling convention. This would involve changes to both Clang and
> >LLVM and we intend to follow the same style of functional component
> >responsibility as is done for other platforms calling conventions. If
we
> >believe deviations from this is necessary, we plan on notifying the
> >community and ensuring the reasons behind the deviations are valid and
> >accepted.
>
> As someone who cares a lot on the lowlevel stuff, I have more questions
> regarding this bullet point.
>
> A MCGOFFStreamer inheriting MCObjectStreamer and possibly also a
> MCXOBJStreamer? Do you expect GOFF and XOBJ will fit well in the current
> MC framework and will not cause too much burden for developers who are
> refactoring generic MC code?

We'll begin with GOFF first. This fits well with the current
MCObjectStreamer
hierarchy and should therefore not cause a burden when refactoring MC
code.

> Will z/OS reuse binary utilities (llvm-ar llvm-cov llvm-cxxfilt llvm-nm
> llvm-objcopy llvm-objdump llvm-readobj llvm-size llvm-strings llvm-strip
> llvm-symbolizer)? Are their any utilities where z/OS's counterparts
> diverge enough (from commonplace platforms) that creating new utilities
> may be better than reusing existing ones?

We reuse at least the binary utilities which are required for testing,
e.g.
llvm-readobj and llvm-objdump are very valuable here. The z/OS UNIX System
Services are a certified UNIX, so the commonplace utilities have all a
familar interface. Of course, there are differences in detail.

Thanks for your feedback!

Kai Peter Nacke via llvm-dev

unread,

Jun 11, 2020, 12:08:13 PM6/11/20

to Hubert Tong, llvm-dev

Hubert Tong <hubert.rein...@gmail.com> wrote on 10.06.2020
23:51:54:

> From: Hubert Tong <hubert.rein...@gmail.com>
> To: Kai Peter Nacke <kai....@de.ibm.com>

> Cc: llvm-dev <llvm...@lists.llvm.org>
> Date: 10.06.2020 23:52
> Subject: [EXTERNAL] Re: [llvm-dev] RFC: Adding support for the z/OS
> platform to LLVM and clang

The intention is to use the auto-conversion feature from the
language environment. Currently, this platform feature does not
handle conversions of multi-byte encodings, so at this time
consumption of UTF-8 encoded source files is not possible.
For the same reason, this does not enable the consumption of
non-UTF-8 encoded source files on other platforms.

Corentin via llvm-dev

unread,

Jun 11, 2020, 4:14:02 PM6/11/20

to llvm...@lists.llvm.org

Hello.

> 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded

input source files. This would be done at the file open time to allow the 
rest of Clang to operate as if the source was UTF-8 and so require no 
changes downstream. Feedback on this plan is welcome from the Clang 
community.

Would it be correct to assume that this EBCDIC -> UTF-8 mapping would be as prescribed by
UTF-EBCDIC / IBM CDRA, notably for the control characters that do not map exactly?
Notably, if the execution encoding is EBCDIC, is '0x06' equivalent to '0086', etc?

The question "Is Unicode sufficient to represent all characters present in the input source without using the Private Use Area?" is one that

is relevant to both Clang and the C/C++ standard. ( I do hope that it is the case!)

Thanks,

Corentin

Hubert Tong via llvm-dev

unread,

Jun 11, 2020, 4:53:43 PM6/11/20

to Kai Peter Nacke, llvm-dev

If the internal representation is still UTF-8, consuming UTF-8 should involve not converting. It is sounding like the internal representation has been changed to ISO-8859-1 in order to support characters outside those in US-ASCII. If it is indeed internally fixed to ISO-8859-1, then the question of future support for non-Latin (e.g., Greek or Cyrillic) scripts arises. It may be a better tradeoff to leave the internal representation as UTF-8 and restrict the support to the US-ASCII subset for now.

For the same reason, this does not enable the consumption of
non-UTF-8 encoded source files on other platforms.

Thanks Kai for clarifying. I think this direction leads to some questions around testing.

The auto-conversion feature makes use of some filesystem-specific features such as filetags that indicate the associated coded character set. In terms of the testing environment on a z/OS system under USS, will there be documentation or scripts available for establishing the necessary file properties on the local tree? It also sounds like there would be some tests that are specific to z/OS-hosted builds that test the conversion facilities.

Also, if the platform feature does not handle conversions of multi-byte encodings, I am wondering if alternative mechanisms (such as iconv) have been investigated. I suppose there is an issue over how source positions are determined; however, I do not see how an extension of the autoconversion facility would avoid the said issue.

Corentin via llvm-dev

unread,

Jun 12, 2020, 12:09:55 AM6/12/20

to llvm...@lists.llvm.org

Hello.

> 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded

input source files. This would be done at the file open time to allow the 
rest of Clang to operate as if the source was UTF-8 and so require no 
changes downstream. Feedback on this plan is welcome from the Clang 
community.

Would it be correct to assume that this EBCDIC -> UTF-8 mapping would be as prescribed by

UTF-EBCDIC / IBM CDRA, notably for the control characters that do not map exactly?
Notably, if the execution encoding is EBCDIC, is '0x06' equivalent to '0086', etc?

Louis Dionne via llvm-dev

unread,

Jun 12, 2020, 10:10:35 AM6/12/20

to Kai Peter Nacke, llvm...@lists.llvm.org

I must admit I am wary of adding even more complexity to the libc++ headers. We have a big problem of configuration explosion in libc++, and that will unfortunately not help. If there are ways to support EDBCDIC non-intrusively, that would be better. In all cases, let's talk. Ideally, we could have an exploratory chat where someone from your side can show me what you're trying to achieve before we go down the route of having a full patch posted to Phab.

> We would also add patches to disable
> functionality when on z/OS where there is no support for the
> functionality. For example, thread specific locales would be disabled when
> in a non-POSIX mode.
> Our intent is that follow on patches would incrementally add support in
> tandem with the compiler for features that require it and for other z/OS
> specifics such as various floating/fixed point formats.

At the risk of sounding grumpy, this scares me too. Conditionally removing parts of libc++ is tricky. It basically creates a different "compile-time code path" through #ifdefs every time we do that. Those are a source of bugs and a maintenance burden, and I've been trying to remove as many as I can. For example, you'll notice I'm fairly aggressive about removing workarounds for old untested Apple platforms. It's much easier when the underlying system just provides what the library needs.

But in all cases, let's have a chat to see the extent of the changes you need and we can go from there. Also, I would suggest setting up build bots for libc++ on z/OS from the very start -- one can only claim to support a system if it's tested continuously, so patches will be received with a lot more enthusiasm if they are tested somehow.

If you're on the CppLang Slack, feel free to drop me a line and we can have an informal chat.

Cheers,
Louis

Kai Peter Nacke via llvm-dev

unread,

Jun 16, 2020, 8:51:13 AM6/16/20

to Corentin, llvm...@lists.llvm.org

The current goal is to make only minimal changes to the frontend to enable
reading of EBCDIC encoded files. For this, we use the auto-conversion
service of z/OS UNIX System Services (
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm
), together with file tagging and setting the CCSID for the program and
for opened files.. The auto-conversion service supports round-trip
conversion between EBCDIC and Enhanced ASCII. With it, boot strapping with
EBCDIC source files is possible.
Of course, more complete UTF-8 support is a valid implementation
alternative.

Kai Peter Nacke via llvm-dev

unread,

Jun 16, 2020, 9:17:04 AM6/16/20

to Hubert Tong, llvm-dev

Hubert Tong <hubert.rein...@gmail.com> wrote on 11.06.2020
22:53:14:

> The intention is to use the auto-conversion feature from the
> language environment. Currently, this platform feature does not
> handle conversions of multi-byte encodings, so at this time
> consumption of UTF-8 encoded source files is not possible.
> If the internal representation is still UTF-8, consuming UTF-8
> should involve not converting. It is sounding like the internal
> representation has been changed to ISO-8859-1 in order to support
> characters outside those in US-ASCII. If it is indeed internally
> fixed to ISO-8859-1, then the question of future support for non-
> Latin (e.g., Greek or Cyrillic) scripts arises. It may be a better
> tradeoff to leave the internal representation as UTF-8 and restrict
> the support to the US-ASCII subset for now.

The intention is to initially restrict the support to the US-ASCII subset.
This enables compiling with EBCDIC-encoding files and does not exclude
further development for true UTF-8 support.

> For the same reason, this does not enable the consumption of
> non-UTF-8 encoded source files on other platforms.

Yes, because a platform-specific feature is used, it does not enable
reading of non-UTF-8 encoded files on other platforms.

> Thanks Kai for clarifying. I think this direction leads to some
> questions around testing.
>
> The auto-conversion feature makes use of some filesystem-specific
> features such as filetags that indicate the associated coded
> character set. In terms of the testing environment on a z/OS system
> under USS, will there be documentation or scripts available for
> establishing the necessary file properties on the local tree? It
> also sounds like there would be some tests that are specific to z/
> OS-hosted builds that test the conversion facilities.

With a git clone under z/OS USS, the files get automatically tagged as
Latin-1, requiring no further setup.
We also have some tests which tests the text conversion. Of course, this
only runs on z/OS USS due to the use of the conversion service.

> Also, if the platform feature does not handle conversions of multi-
> byte encodings, I am wondering if alternative mechanisms (such as
> iconv) have been investigated. I suppose there is an issue over how
> source positions are determined; however, I do not see how an
> extension of the autoconversion facility would avoid the said issue.

We have not yet investigated alternative mechanisms for converting file
data. The first striking complexity is where to do the conversion. With
the source locations identified, other conversion approaches are
imaginable. Of course, converting on the fly poses some challenges, like
the one you mentioned.

Tom Honermann via llvm-dev

unread,

Jun 16, 2020, 10:53:53 AM6/16/20

to Kai Peter Nacke, Corentin, llvm...@lists.llvm.org

> -----Original Message-----
> From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Kai Peter Nacke
> via llvm-dev

> https://urldefense.com/v3/__https://www.ibm.com/support/knowledgecenter/
> SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm__;!!A4F2R9G_pg!NKRnU
> eS37wLNWpYN6Yvhm9SzZwujyMlnpbFJyHV5Z8-M6-aucp0zxwXGxSZ7EKlr$

> ), together with file tagging and setting the CCSID for the program and for
> opened files.. The auto-conversion service supports round-trip conversion
> between EBCDIC and Enhanced ASCII. With it, boot strapping with EBCDIC
> source files is possible.
> Of course, more complete UTF-8 support is a valid implementation alternative.

Other good references:
- The 'ctag' utility
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.bpxa500/chtag.htm
- File tagging overview
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/cbc1p273.htm

Kai, would use of auto conversion require that users set the _BPXK_AUTOCVT, _BPXK_CCSIDS, and/or _BPXK_PCCSID environment variables? Or do you envision having the clang driver set them before invocation of the compiler? If the latter, that would imply that users (and tests) are responsible for setting them for direct 'clang -cc1' invocations.

Here is another possible direction to consider that would provide a more portable facility. Clang has interfaces for overriding file contents with a memory buffer; see the overrideFileContents() overloads in SourceManager. It should be straight forward to, when loading a file, make a determination as to whether a conversion is needed (e.g., consider file tags, environment variables, command line options, etc...) and, if needed, transcode the file contents and register the resulting buffer as an override. This would be useful for implementation of -finput-charset and would benefit deployments in Microsoft environments that have source files in ISO-8859 encodings.

Tom.

Kai Peter Nacke via llvm-dev

unread,

Jun 16, 2020, 11:18:38 AM6/16/20

to Tom Honermann, llvm...@lists.llvm.org

Tom Honermann <Thomas.H...@synopsys.com> wrote on 16.06.2020
16:53:33:

Hi Tom,
the current approach is to enable auto conversion only if _BPX_AUTOCVT is
set to ON. If the variable is not set, then all input files are treated as
EBCDIC. The rational behind is that we do not want to outsmart the user.
So there is no problem with direct `clang -cc1` invocations. It's a good
hint that we need to describe this setup somewhere.

> Here is another possible direction to consider that would provide a
> more portable facility. Clang has interfaces for overriding file
> contents with a memory buffer; see the overrideFileContents()
> overloads in SourceManager. It should be straight forward to, when
> loading a file, make a determination as to whether a conversion is
> needed (e.g., consider file tags, environment variables, command
> line options, etc...) and, if needed, transcode the file contents
> and register the resulting buffer as an override. This would be
> useful for implementation of -finput-charset and would benefit
> deployments in Microsoft environments that have source files in
> ISO-8859 encodings.

That's a good hint. I'll definitely have a look at it, as it sounds that
it could solve some problems/complexity. A separate solution would then
still be required for LLVM.

> Tom.

Best regards,
Kai Nacke
IT Architect

IBM Deutschland GmbH
Vorsitzender des Aufsichtsrats: Sebastian Krause
Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert
Janzen, Markus Koerner, Christian Noll, Nicole Reimer
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940

_______________________________________________

Tom Honermann via llvm-dev

unread,

Jun 16, 2020, 1:09:39 PM6/16/20

to Kai Peter Nacke, llvm...@lists.llvm.org

> -----Original Message-----
> From: Kai Peter Nacke <kai....@de.ibm.com>
> Sent: Tuesday, June 16, 2020 11:17 AM
> To: Tom Honermann <thon...@synopsys.com>
> Cc: Corentin <corenti...@gmail.com>; llvm...@lists.llvm.org
> Subject: RE: [llvm-dev] RFC: Adding support for the z/OS platform to LLVM and
> clang
>

> > https://urldefense.com/v3/__https://www.ibm.com/support/knowledgecente
> >
> r/SSLTBW_2.3.0/__;!!A4F2R9G_pg!KV1im4SvVFKKMIvutwguN6maqCZttB7_zG_i
> 0QW
> > ZFauUVe6IKXYm6CeMjYXbWNyQ6SO-TOs$

> > com.ibm.zos.v2r3.bpxa500/chtag.htm
> > - File tagging overview
> >

> > https://urldefense.com/v3/__https://www.ibm.com/support/knowledgecente
> >
> r/en/SSLTBW_2.3.0/__;!!A4F2R9G_pg!KV1im4SvVFKKMIvutwguN6maqCZttB7_z
> G_i
> > 0QWZFauUVe6IKXYm6CeMjYXbWNyQ2CwjL08$

> > com.ibm.zos.v2r3.cbcpx01/cbc1p273.htm
> >
> > Kai, would use of auto conversion require that users set the
> > _BPXK_AUTOCVT, _BPXK_CCSIDS, and/or _BPXK_PCCSID environment
> > variables? Or do you envision having the clang driver set them before
> > invocation of the compiler? If the latter, that would imply that
> > users (and tests) are responsible for setting them for direct 'clang
> > -cc1' invocations.
>
> Hi Tom,
> the current approach is to enable auto conversion only if _BPX_AUTOCVT is set
> to ON. If the variable is not set, then all input files are treated as EBCDIC. The
> rational behind is that we do not want to outsmart the user.
> So there is no problem with direct `clang -cc1` invocations. It's a good hint that
> we need to describe this setup somewhere.

That seems reasonable. How would you handle _BPX_AUTOCVT being set to ALL?

(
For anyone following along, the difference between ON and ALL is described at https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcpx01/setenv.htm#setenv:
> When _BPXK_AUTOCVT is ON, automatic conversion can only take place between IBM-1047 and ISO8859-1 code sets. Other CCSID pairs are not supported for automatic text conversion. To request automatic conversion for any CCSID pairs that Unicode service supports, set _BPXK_AUTOCVT to ALL.
)

Tom.

Kai Peter Nacke via llvm-dev

unread,

Jun 17, 2020, 8:06:25 AM6/17/20

to Tom Honermann, llvm...@lists.llvm.org

Tom Honermann <Thomas.H...@synopsys.com> wrote on 16.06.2020

19:09:18:

> From: Tom Honermann <Thomas.H...@synopsys.com>
> To: Kai Peter Nacke <kai....@de.ibm.com>

That's a bit more complicated. For reading files, I can imagine the
following approach:
- the application is still using the ASCII execution mode (to link against
the ASCII version of the library)
- on each file handle, the program CCSID is set to UTF-8 (1208)
auto-conversion on the file is turned on if
- _BPX_AUTOCVT set to ALL
- file is untagged (assuming EBCDIC 1047) or file tag is not 1208
Writing text files would need a default encoding. Using UTF-8 (1208) would
makes sense.

This is really a "rough" first thought. I gave it a quick try, and it
failed. Most likely I overlooked something.

Best regards,
Kai Nacke
IT Architect

IBM Deutschland GmbH
Vorsitzender des Aufsichtsrats: Sebastian Krause
Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert
Janzen, Markus Koerner, Christian Noll, Nicole Reimer
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940

_______________________________________________

Kai Peter Nacke via llvm-dev

unread,

Jun 18, 2020, 8:02:36 AM6/18/20

to Kai Peter Nacke, llvm...@lists.llvm.org

Thanks for the feedback!

As summary,
- some concerns about added complexity and testability were expressed
- details of the encoding conversion in the clang frontend were discussed

For these kinds of discussion, it's best to have a look at the source, so
I created the first Phabricator review: https://reviews.llvm.org/D82081

Thanks again!

Best regards,
Kai Nacke
IT Architect

IBM Deutschland GmbH
Vorsitzender des Aufsichtsrats: Sebastian Krause
Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert
Janzen, Markus Koerner, Christian Noll, Nicole Reimer
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940

"llvm-dev" <llvm-dev...@lists.llvm.org> wrote on 10.06.2020 21:11:05:

> https://urldefense.proofpoint.com/v2/url?
>
u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwIGaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=43FMMTMN1rMQYLfzcfWYI9JmFbjyCLLZVkpxUNJkDuQ&m=yMqyEWU-
>
Y3yQdOvsfp9HBkSwlDeLE5A9ZydkH_73SZk&s=ymE6YLFHNgGyUlKieWzvyvY5QV4ycTojjrma8Uz_rhs&e=

Reply all

Reply to author

Forward