To EBCDIC platform developers


Zoltan Herczeg

Sep 24, 2024, 5:45:23 AM
to PCRE2 discussion list
Dear EBCDIC platform developers,

A lot of improvements to character class support have landed in
the PCRE2 codebase recently. Unfortunately, they have not been tested on
EBCDIC platforms. Furthermore, 16/32-bit character support for
EBCDIC is missing. If you have some free time, please try the newest
code:

https://github.com/PCRE2Project/pcre2

I can also help to make it work on the 16/32 bit library, although I
can only make speculative fixes.

Regards,
Zoltan

Alan Lehotsky

Mar 16, 2025, 1:47:40 PM
to PCRE2 discussion list
First off - I've upgraded from PCRE to PCRE2 - the new APIs are awesome.  I'm running on both Linux and z/OS, and have a problem in the EBCDIC environment.

So this is probably something I've done wrong - but it's not obvious to me...
My z/OS config.h has

#define BSR_ANYCRLF 1
#define EBCDIC 1
#define EBCDIC_NL25 1
#define NEWLINE_DEFAULT 5

But '\n' is not being handled as a line break for an EBCDIC string such as
     "foo\nbar\nfoo"

and an EBCDIC pattern
    "(?m)^foo"

When I use the pcre2_substitute() API to try to replace the match with "XXX", it only matches the first occurrence, so I end up with
"XXX\nbar\nfoo"

If I use \r as the line delimiter, I get the expected result.   Prefixing the pattern with (*NL) doesn't help.  But (*ANY) does treat \n and \r as line delimiters.  So I am going to change NEWLINE_DEFAULT to 4.

Ze'ev Atlas

Mar 16, 2025, 1:55:12 PM
to Alan Lehotsky, PCRE2 Discussion List
Hi Alan
Did you ever handle it correctly in the old PCRE?
I have created a fork for EBCDIC and maintained it up to the current version.  I do not remember exactly what was done with \n, but I can look into it for you and advise, if you are interested.

Ze'ev Atlas


--
You received this message because you are subscribed to the Google Groups "PCRE2 discussion list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pcre2-dev+...@googlegroups.com
.
To view this discussion visit https://groups.google.com/d/msgid/pcre2-dev/3d4d12c1-95aa-4984-abb1-6ad29e19e599n%40googlegroups.com
.

Nicholas Wilson

Mar 17, 2025, 8:58:47 AM
to PCRE2 discussion list
Hi!

It's great to hear that you're using the EBCDIC support in PCRE2.

Firstly, your problem with '\n' is probably due to using EBCDIC_NL25. That's an unusual choice - normally NL is 0x15 in most EBCDIC codepages (the option to use 0x25 for NL is there for compatibility with less-used codepages). What EBCDIC are you using? Perhaps IBM-1047?

Setting NEWLINE_DEFAULT to either 4 or 5 will accept an NL character as a newline - you just need to make sure that PCRE2's encoding of "NL" matches what's in the data you're processing.

Secondly: PCRE2's EBCDIC support was really quite buggy until recently! Beware!

All releases up to and including 10.45 have quite a few known bugs.

The good news is: the nightly builds ("10.46-DEV" prerelease) have fixed these bugs, and the test suite is now fully-passing on EBCDIC.

To download a nightly build, you can try out the "Distribution release" link at the bottom of this page: https://github.com/PCRE2Project/pcre2/actions/runs/13845780047

The EBCDIC fixes in the latest code were due to help from very kind users in the "z/OS enthusiasts" Discord channel.

I have also gained SSH access to a z/OS VM on real Z hardware, and we are using this now for nightly testing, to verify that the code builds and that the test suite passes.

I have not yet updated the build instructions for z/OS, since the improved support is brand new and hasn't been shipped in an official release yet.

This is how I build on z/OS, using "gtar" and "gmake" from the zopen toolset, and using the XLC compiler:

            gtar xzf pcre2-build.tar.gz -C pcre2-build;
            cd pcre2-build;
            chtag -R -tc ISO8859-1 .;
            MAKE=gmake CC=xlc ./configure --enable-ebcdic --disable-unicode;
            gmake;
            gmake check

Ze'ev (who has also replied above) maintains an alternative way of supporting z/OS, but I have not tested it personally.

I hope this helps,

Nick

Nicholas Wilson

Mar 17, 2025, 9:28:48 AM
to PCRE2 discussion list
Additionally, I should add that future releases of PCRE2 (including 10.46-DEV nightly builds) support EBCDIC on Linux & Windows.

This was only added for testing purposes, so that we can maintain and test the EBCDIC support without needing access to an expensive z/OS system.

However, some users might (perhaps?) want to process EBCDIC data on z/Linux, in which case this support may be of interest.

Nick

Alan Lehotsky

Mar 17, 2025, 12:37:33 PM
to Ze'ev Atlas, PCRE2 Discussion List
We're set on the code page - we've used PCRE for over 15 years in both ASCII/UTF-8 and EBCDIC environments.

As to the \n matching: this is a test case that's been around for most of that time, and it was working until we upgraded to PCRE2.

I was thinking that maybe having EBCDIC_NL25 defined ends up confusing the #define NEWLINE_DEFAULT 5 setting. I'm rebuilding with a default of 4, which should work correctly.

I shared this on the off chance that someone else has seen it or might benefit from the workaround of using the ANY default - although it's unlikely to be a very common failure mode.

Thanks for your efforts in maintaining the EBCDIC branch...
-- Al


Ze'ev Atlas

Mar 17, 2025, 12:37:38 PM
to Alan Lehotsky, PCRE2 Discussion List
Oh, and one more issue: you have to be careful about the code page that you are using.

The geniuses at IBM swapped the caret ^ and the logical-not sign ¬ (U+005E and U+00AC) between code pages, which is crucial when using regular expressions.

Ze'ev Atlas
201-801-0378
201-805-0286 (cell)


Ze'ev Atlas

Mar 17, 2025, 12:37:45 PM
to Alan Lehotsky, PCRE2 Discussion List, Igor Todorovski
You may also contact Igor, copied here, as he is maintaining a more modern, and presumably working, version.

Ze'ev Atlas
201-801-0378
201-805-0286 (cell)

Nicholas Wilson

Mar 17, 2025, 12:41:43 PM
to PCRE2 discussion list
>  I was thinking maybe that the fact that we have the EBCDIC_NL25 ends up confusing the #defin NEWLINE_DEFAULT 5 setting. I'm rebuilding with a default of 4, which should work correctly.

That shouldn't be the case (unless Ze'ev has fiddled with the code in his fork).

In the stock PCRE2, which supports EBCDIC and z/OS, a NEWLINE_DEFAULT of either 4 or 5 will match the same bytes for the NL character.

You should set EBCDIC_NL25 depending on whether the NL character has byte 0x15 or byte 0x25 in your EBCDIC codepage. Apart from that, PCRE2 supports any/all EBCDIC codepages at compile-time (whichever is being used by the compiler). But at runtime, it supports *ONE* single codepage, namely, the one that was in use by the compiler at build-time.

Note that Igor's version of PCRE2 (via zopen) only supports ASCII/ISO-8859-1/Unicode, not EBCDIC.

Nick

Ze'ev Atlas

Mar 22, 2025, 10:57:30 PM
to PCRE2 discussion list, Nicholas Wilson
Hi all,
A few clarifications:
1. My 'fork' is a subset: it is meant only for 8-bit use in the environment of classic z/OS, and is to be used with LE languages, mainly COBOL.  As such, it is compiled using the IBM C compiler and NOT with gcc.
2. I started with code page 037, and in that one NL was definitely x15.
3. I am debating whether I should continue maintaining this less-than-popular fork, but if I do, I will also switch to NL25.
4. I never fiddled with the code itself.  Phil Hazel was very kind and provided me with a very few hooks, mainly in DFTABLES, GREP and TEST.  The hooks are triggered by the NATIVE_ZOS macro, and allow the code to #include some stuff (not available outside of my 'fork').  In that way we kept my code recognized by the main code, but totally external at the same time.
5. Historically, I almost single-handedly got Phil to handle some oddities of EBCDIC, after I asked for an additional hook.
6. The most important developments in my 'fork', besides providing LE languages with access to PCRE2, are:
6.1 An interface to TSO/Rexx
6.2 Allowing GREP to look into a PDS library as if it were a directory

Ze'ev Atlas



Nicholas Wilson

Mar 24, 2025, 7:46:06 AM
to PCRE2 discussion list
Thank you Ze'ev, and Alan.

1. I haven't tested with GCC on z/OS either, only with IBM's xlc compiler. I don't know how common it is for users to use alternative compilers? I only have time and energy to maintain support for one compiler I think.
I'm not exactly sure what "classic z/OS" means, but I have access to a standard IBM base system of z/OS, which is fully UNIX-compliant (POSIX/SUSv3). It runs a Bourne /bin/sh (not Bash though), it runs the Autoconf script ./configure very nicely without needing any GNU tools installed, and it can compile and run the official PCRE2 tarball very nicely just with IBM-provided closed-source tools, including IBM's xlc compiler.

3. Yes, in the IBM 037 codepage NL is 0x15. It is also 0x15 in the (popular?) IBM 1047 codepage, which is the default on the system I was given access to. I don't think 0x25 should be the default - unless you have evidence.
The question from Alan Lehotsky was about using 0x25. Alan never said what his codepage was, but he reported that "\n is not being handled as a line break" when using "#define  EBCDIC_NL25 1". So whatever his codepage is, he wants NL to be 0x15 I think (the PCRE2 default), and he'd mistakenly set it to 0x25. I think PCRE2's default is right, and Alan had changed from the default and (unsurprisingly) this didn't work.

4. Ze'ev - I'm really curious. Did you ever run the PCRE2 test suite for your fork? There are several quite critical bugs in the EBCDIC support for PCRE2 10.45, which I discovered after the 10.45 release, while doing testing. I strongly suspect that previous releases are very buggy too, since the PCRE2 maintainers have never done any EBCDIC testing themselves.
I have a suspicion that perhaps, no-one has ever run the test suite on an EBCDIC machine? Or at least, not on any recent release.
Could it be that when you did your PCRE2 10.44 release on CBT Tape, that you didn't run the test suite?
I fixed about 6 EBCDIC bugs, three of which I would classify as "major errors in pattern matching".

I basically can't recommend that anyone use PCRE2 version 10.45 (or 10.44 and earlier) with EBCDIC. The nightly releases of 10.46-DEV, however, are fixed, and the full test suite now passes in EBCDIC mode on real z/OS hardware.

Thank you Ze'ev for offering your port for so many years. But, you're not under any obligation to support it any longer than you want to; it's not something we would request you to do.

> 4. I never fiddled with the code itself.  Phil Hazel was very kind and provided me with a very few hooks, mainly in DFTABLES, GREP and TEST.  The hooks are triggered by the NATIVE_ZOS macro, and allow the code to #include some stuff (not available outside of my 'fork').  In that way we kept my code, recognized by the main code, but totally external in the same time.

Now that PCRE2 is in Git, not Subversion, it is much easier to maintain forks of the code. I am considering removing those hooks from the upstream code. Those NATIVE_ZOS hooks could be maintained in the fork, in the same place as the pcrzoscs.h header itself, using the same "git merge" that you (presumably) use to sync with PCRE2 releases. Would you prefer to keep the NATIVE_ZOS hooks in the upstream code, or would you be happy to keep those in your fork, especially as they can't even compile unless used as part of your fork?

All the best,
Nick

Alan Lehotsky

Mar 24, 2025, 9:51:30 AM
to Nicholas Wilson, PCRE2 discussion list
Re NL as \x25 - that's a typo on my part - we use \x15 aka \o25... and the IBM 1047 codepage as the default ebcdic charset.


Nicholas Wilson

Mar 24, 2025, 10:02:04 AM
to PCRE2 discussion list
Right - but your original post said you were compiling with "#define EBCDIC_NL25 1", which means, "use NL as 0x25 = TRUE".

My strong guess is that's why you were encountering your original problem with "\n" in patterns not matching a 0x15 NL.
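
If that guess is right, a config.h along the following lines would be the fix for data whose NL byte is 0x15 (for example IBM-1047). This is only a sketch built from the four settings quoted in the original post; a real config.h contains many more entries, and the only change shown is that EBCDIC_NL25 is no longer defined.

```c
/* Sketch of the relevant config.h lines for an EBCDIC build
   where NL is 0x15 in the data being processed. */
#define BSR_ANYCRLF 1
#define EBCDIC 1
/* EBCDIC_NL25 is deliberately NOT defined, so NL stays at 0x15. */
#define NEWLINE_DEFAULT 5
```

With the NL byte matching the data, NEWLINE_DEFAULT can stay at 5 as originally configured.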

I hope this helps.

Best wishes,
Nick

Philip Hazel

Mar 24, 2025, 11:02:00 AM
to Nicholas Wilson, PCRE2 discussion list
I suspect that by "classic z/OS" Ze'ev means the basic OS without the Unix compliant features, that is, the direct descendant of the MVT and MVS operating systems we were running on IBM mainframes here in Cambridge in the 70's and 80's and into the 90's. Ze'ev mentions PDS ("partitioned dataset"), which is a feature of these OS. It is a way of packing a number of small files ("datasets" in IBM-speak) into one larger one, invented because IBM files could not be smaller than one disk track, and newer drives had bigger and bigger tracks.

Regards,
Philip


Ze'ev Atlas

Mar 24, 2025, 1:50:12 PM
to Nicholas Wilson, Philip Hazel, PCRE2 discussion list
Thank you Phil
Phil is correct, and I intended to answer later in detail.
That brings me to a request!  It really won't be easier for me to maintain a fork on GitHub, as my development is NOT a fork in the common meaning.  I download the final version and convert it to z/OS.

Therefore, I ask you: please do not take away my hooks; please leave them as they are.

Thank You

Ze'ev Atlas



Nicholas Wilson

Mar 25, 2025, 1:18:30 PM
to PCRE2 discussion list
Hi Ze'ev,

Could you not click the "fork" button in the GitHub UI, and grab your PCRE2 tarball from your fork? Then you could continue with your current workflow unchanged, by simply building your tarball with whatever z/OS hooks you require.

Nick

Ze'ev Atlas

Mar 25, 2025, 3:31:11 PM
to PCRE2 discussion list, Nicholas Wilson
Let me please explain again in detail:

Please read my lips: I do not have a fork, I do not maintain anything in PCRE itself!  I take the final copy of the latest release, as tested and released to the public, and use that as the basis for my new parallel release, without looking back at my previous release at all.

Therefore, if you take away my hooks, you will force me to get your new release, compare it to the previous one, reinstate my hooks (hopefully without damaging anything), and then start.  Phil added these hooks in order to keep my version totally separate, to prevent any cross-pollination, and to allow me to use the final public copy of the latest release as the basis for my new parallel release, as mentioned before.

Remember, classic z/OS does not look like Linux or Windows.  It is a beast unto itself.  Did I ever mention, for example, that there is actually nothing like a newline built into that system? (Well, there is, as an afterthought, but the main languages like COBOL are not even aware of it! :)

Please keep my hooks, and thus keep our products totally separate, as they should be.

Ze'ev Atlas
201-801-0378
201-805-0286 (cell)


Nicholas Wilson

Mar 25, 2025, 4:20:27 PM
to PCRE2 discussion list
Hi Ze'ev,

I am listening carefully. I think I understand your workflow well as you are describing it.

You are currently doing "something" to build your release from the public tarball, including adding in your "pcrzosfs.h" header. This header is currently referenced in our upstream PCRE2 code - although we don't distribute it, and your hooks won't compile without your own build scripts.

I am suggesting that instead of taking our public tarball, you build your own tarball from the same git checkout, using "git merge" to keep your hooks in sync. This would let you use your current workflow without any substantial change.

But very importantly, I'd like to ask again: have you actually run the tests in your EBCDIC environment? Did the tests pass before you made your 10.44 release? I believe that the EBCDIC code has been broken in quite serious ways for quite a long time. This is why I was recommending to Alan that he *not* use 10.44 or 10.45 or prior releases on EBCDIC platforms without confirming this.

Nick

Ze'ev Atlas

Mar 25, 2025, 5:45:50 PM
to PCRE2 discussion list, Nicholas Wilson
I will answer fully.

1. You are trying to teach an old dog some new tricks, like using git and merge.  Oh well, I guess I will have to do that if you insist.  Do I have to install something on my Windows machine?

2. The "something" that I am doing is running through a series of Perl scripts on Windows and Rexx scripts via JCL jobs on z/OS.  Those scripts transform the C programs mechanically (without really changing them) into something that compiles on z/OS.  From that compilation/link process, I produce a 'link time' dependency list, and recompile/link cleanly while taking care of the dependencies.  (I cannot tolerate circular dependencies!  So whenever they do occur, I ask the maintainer nicely to please eradicate them :)

3. I've developed a subset of tests that must work.  I had attributed failures to the differences between EBCDIC and the rest of the world, and obviously I ignore tests that have to do with newline and other things that are foreign to z/OS.  I am looking forward to 10.46, which may reduce the number of tests that I ignore.  But there will always be differences, because EBCDIC is different from the rest of the world.

Ze'ev Atlas


Philip Hazel

Mar 26, 2025, 5:27:45 AM
to PCRE2 discussion list
Forgive me for jumping in again, but it seems to me as if there's still some misunderstanding here. The world Ze'ev inhabits is very alien compared with any Unix or even Windows. I know because I spent a couple of decades in that world. It's not only EBCDIC vs ASCII but many other concepts aren't the same (files are not streams of bytes, for example; they come in records, and in various different formats). If I recall correctly, it took Ze'ev many hours/days of work to figure out how to get PCRE to build and run under native z/OS because its compiling/linking systems are also different. I was somewhat impressed when Ze'ev managed to get it all to work. Given that there are already some Windows and VMS special bits of code in pcre2grep and pcre2test, it seems perfectly sensible to have some z/OS bits as well. This means that Ze'ev doesn't have to learn how to use Git and GitHub (something I, as a latecomer and "old dog", am still not very comfortable with). So this is a plea for retaining the status quo and not complicating Ze'ev's life.

Philip 

Nicholas Wilson

Mar 26, 2025, 6:32:27 AM
to PCRE2 discussion list
Thank you Ze'ev and Philip.

I understand - I was merely asking whether Ze'ev would be able to use Git to manage his "hooks".

I can hear that's not straightforward for him, so I'm happy to keep the hooks. I had already decided that before I saw Philip's message.

I didn't mean to cause offense, and I'm glad I was able to learn about your porting process. Thank you for your patience Ze'ev!

Ze'ev's port is indeed impressive - a z/OS user on the IBM z/OS Discord channel commented that it seemed of good quality and contained many useful bindings.

IBM does now distribute their own build of PCRE2 free-of-charge and free of restrictions as part of the z/OS base system (not in one of their numerous add-ons, which are all exorbitantly priced). I have corresponded with the IBM employees maintaining this. I suspect in future most z/OS users of PCRE2 will switch to using that, or else download the PCRE2 tarball and build it themselves, now that the Autoconf script has been updated to work on z/OS.

This doesn't in any way diminish the work that Ze'ev has done. These alternatives simply weren't available when Ze'ev began his port.

Thank you Ze'ev for explaining your testing process. That is very reassuring. I will find details of the EBCDIC bugs I have fixed since the 10.45 release was finalized.

All the best,
Nick

Nicholas Wilson

Mar 26, 2025, 7:50:50 AM
to PCRE2 discussion list
For the record, here is a list of the EBCDIC bugs that will be fixed in 10.46:
  • Backslash escape issues: \» treated as an escape for '}', and others. The backslash handling hardcoded some codepage assumptions, which are wrong for both IBM 037 and IBM 1047 (and others). It is now codepage-agnostic (correct for any EBCDIC page).
  • Bug in backslash escape handling in pcre2_substitute (fairly minor impact)
  • Completely wrong matching of [[:alpha:]], [[:alnum:]], and [[:space:]] POSIX classes. The ASCII code values of "_" and vertical whitespace were accidentally hardcoded. Somewhat serious matching error: this pattern syntax simply didn't match the correct characters on EBCDIC.
  • pcre2_convert() function: seriously broken on EBCDIC. Only values < 128 were whitelisted, but in EBCDIC letters & digits are > 128. Minor, as this function is rarely used.
  • \h at start of pattern broken: value 0xA0 was hardcoded in the start-of-pattern optimisation (NBSP in ASCII, µ in EBCDIC), so it would not match the correct characters.
  • Various other even more minor issues
To be honest, it's not a terrible list of bugs, especially given that we were not (previously) able to run the test suite on EBCDIC builds prior to release.

All of these were test failures in the PCRE2 test suite. I have done no additional testing myself, beyond fixing bugs until the test suite passes.

Nick
