new line in EBCDIC

Ze'ev Atlas

28 Dec 2022, 18:58:31
to PCRE2 Discussion List
Hi Phil
I have begun to look at the concept of newline in EBCDIC. We treat x'15' as NL (= NEL), as code page 1047 specifies:

PCRE2 version 10.42 2022-12-11                         
Compiled with                                          
  EBCDIC code support: LF is 0x15                      
  EBCDIC code page IBM1047 or similar                  
  8-bit support                                        
  No Unicode support                                   
  No just-in-time compiler support                     
  Default newline sequence is LF                       
  \R matches all Unicode newlines                      
  \C is supported                                      
  Internal link size = 2                               
  Parentheses nest limit = 250                         
  Default heap limit = 20000000 kibibytes              
  Default match limit = 10000000                       
  Default depth limit = 10000000                       
  pcre2test has neither libreadline nor libedit support

The test does recognize x'15' as $ but not as \n. Any idea? The text contains x'15' immediately after the string 'newlines'.

/(.*)$/            
    test newlines  
 0: test newlines  
 1: test newlines  
inside             
 0: inside         
 1: inside         
                   
/(.*)\n/           
    test newlines  
No match           
inside             
No match           

Thank you


Ze'ev Atlas


Ze'ev Atlas

28 Dec 2022, 21:57:25
to PCRE2 Discussion List
I am attaching my input and output files.
Notice that when I replaced x'15' with \n, the results were somewhat better. Also note that the files have been converted to ASCII, so you will have to trust me about the x'15'.

Ze'ev Atlas



--
You received this message because you are subscribed to the Google Groups "PCRE2 discussion list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pcre2-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pcre2-dev/13990021.2239362.1672271907003%40mail.yahoo.com.
TESTOTE1.txt
TESTINE1.txt

Philip Hazel

29 Dec 2022, 12:23:39
to Ze'ev Atlas, PCRE2 Discussion List
It should work, of course. Looking at the code, I can't immediately see anything wrong, but of course I cannot test in an EBCDIC environment. One thing you could do is to run the test with the -d option. On Linux, I get this:

/(.*)\n/
------------------------------------------------------------------
  0  15 Bra
  3   7 CBra 1
  8     Any*
 10   7 Ket
 13     \x0a
 15  15 Ket
 18     End
------------------------------------------------------------------

Note the explicit \x0a (ASCII newline) that is shown. In an EBCDIC world that should show as \x15. If it does not, we will have to find out why not. 

Regards,
Philip

Ze'ev Atlas

29 Dec 2022, 14:17:42
to Philip Hazel, PCRE2 Discussion List
I tried -d and I do get the \x15, yet it is not recognized correctly; it is recognized when I use \n instead (see attached).
So far I have not been able to find out why.

Thank you

Ze'ev Atlas

testote01.txt

Ze'ev Atlas

29 Dec 2022, 22:22:41
to Philip Hazel, PCRE2 Discussion List
I have to take my complaint back!

I ran the same thing on Windows (using pcre2test version PCRE 10.39) and got exactly the same results, apart from slight differences where EBCDIC vs. ASCII shows through. My main issue is that the z/OS environment does NOT normally have a newline character, but when I force one in, I get the same results. For those who are interested: z/OS native files are delimited by record length, not by a separator character. Normally, text files consist of fixed-length rows (usually 80 bytes long) or variable-length rows with a header that defines the record length (and thus limits it to a maximum of 32K).

I will explain it in my documentation and perhaps create some functionality to add a newline character for those who must have it :)

Ze'ev Atlas
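
[Editorial sketch] The record layouts described above suggest what such "add a newline" functionality might look like. The following is only a minimal illustration, not code from the z/OS port: the function names are hypothetical, x'40' is taken to be the EBCDIC space used for padding, and the variable-length case assumes a 4-byte header whose first two bytes hold the big-endian record length, header included.

```python
EBCDIC_NL = b"\x15"     # newline byte reported by this build ("LF is 0x15")
EBCDIC_SPACE = b"\x40"  # assumed padding byte for fixed-length records


def fixed_to_lines(data: bytes, lrecl: int = 80) -> bytes:
    """Split fixed-length records, drop trailing padding, end each with x'15'."""
    out = bytearray()
    for start in range(0, len(data), lrecl):
        record = data[start:start + lrecl]
        out += record.rstrip(EBCDIC_SPACE) + EBCDIC_NL
    return bytes(out)


def variable_to_lines(data: bytes) -> bytes:
    """Walk length-prefixed records and end each with x'15'.

    Assumption: each record starts with a 4-byte header; bytes 0-1 are the
    big-endian record length (header included), bytes 2-3 are ignored here.
    """
    out = bytearray()
    pos = 0
    while pos + 4 <= len(data):
        length = int.from_bytes(data[pos:pos + 2], "big")
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        out += data[pos + 4:pos + length] + EBCDIC_NL
        pos += length
    return bytes(out)
```

With the subject converted this way, a pattern such as /(.*)\n/ has an actual x'15' byte to match against, which the native record formats never supply on their own.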



Philip Hazel

30 Dec 2022, 06:58:29
to Ze'ev Atlas, PCRE2 Discussion List
I'm not sure I entirely understand what you are saying about the Windows version. On Linux it behaves exactly as I would expect:

PCRE2 version 10.42 2022-12-11
/(.*)\n/
abcd
No match
abcd\n
 0: abcd\x0a
 1: abcd

Regards,
Philip
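
[Editorial sketch] The behaviour shown above comes down to the difference between `$`, which can match at the end of the subject whether or not a newline is present, and a literal `\n`, which needs a real newline character in the subject. The snippet below reproduces that distinction using Python's `re` module as a stand-in; it is not PCRE2, it runs in ASCII/Unicode rather than EBCDIC, and the helper name `groups_or_none` is made up for illustration.

```python
import re


def groups_or_none(pattern: str, subject: str):
    """Return (whole match, group 1) for the first match, or None."""
    m = re.search(pattern, subject)
    return None if m is None else (m.group(0), m.group(1))


# `$` succeeds at the end of the subject even with no newline present.
print(groups_or_none(r"(.*)$", "abcd"))

# A literal \n fails unless the subject really contains a newline.
print(groups_or_none(r"(.*)\n", "abcd"))
print(groups_or_none(r"(.*)\n", "abcd\n"))
```

A subject read from a record-oriented file has no trailing newline, so it behaves like the first two cases.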

Ze'ev Atlas

30 Dec 2022, 07:02:14
to philip...@gmail.com, PCRE2 Discussion List
The Windows version behaves exactly like the Linux version. I get exactly the same results on Windows and z/OS, except for the EBCDIC vs. ASCII difference. I will try to explain this particular difference later.