
Blanks, REXX, and portability...

Scott Ophof

Aug 25, 1992, 11:21:29 PM

Question: When are blanks not only the space character?
Answer: In any case in Unix.

In "The REXX Language" (Mike Cowlishaw) and most other REXX
publications meant for CMS and equivs, the character known as
"space" and the concept of "blanks" are used interchangeably.

For CMS-REXX (and analogous implementations), this poses no
problems. The space character is the *only* character defined
as a "blank".
(Note that I'm *not* talking about non-printable characters,
and the space char *is* a printable character!)

Under Unix however, the tab character (and some others) are
considered "blanks", though it's called "whitespace" there.
At least some REXX implementations for Unix recognize more than
the space char as whitespace. And REXX on the PC recognizes at
least the space char and the tab char as whitespace/blank...

My point?
I would hate to have to port to CMS any REXX program written for
Unix (or PC); to have a program fail due to something like this
would not be very easy to debug...

My suggestion?
In the interest of increasing the chance of successful porting, to
request the ANSI-REXX committee to define that the *only* blank/
whitespace recognized in standard REXX is the SPACE character (ASCII
hex-20, EBCDIC hex-40).

Your comments? :-)


Regards.
$$/

Anders Christensen

Aug 26, 1992, 4:56:20 AM

> Question: When are blanks not only the space character?
> Answer: In any case in Unix.

Answer is incorrect .... Correct answer (for any ASCII-based system):

In some cases, blanks are tabs and in some cases blanks are spaces, and
it is sometimes hard to predict it in advance, and it might differ
from machine to machine, and from login-session to login-session.

> In "The REXX Language" (Mike Cowlishaw) and most other REXX
> publications meant for CMS and equivs, the character known as
> "space" and the concept of "blanks" are used interchangeably.

TRL always uses the terms 'blank' and 'blanks', I believe?

> For CMS-REXX (and analogous implementations), this poses no
> problems. The space character is the *only* character defined
> as a "blank".
> (Note that I'm *not* talking about non-printable characters,
> and the space char *is* a printable character!)

Actually, this is an EBCDIC vs ASCII conflict rather than a CMS vs
the-rest-of-the-world one (is there a difference, btw? :-)

In ASCII, the following characters are often considered 'whitespace',
listed in decreasing order of 'whitespaceness' (codes in decimal)

ascii 32 - space
ascii 9 - HT (horizontal tab)
ascii 10 - LF (line feed)
ascii 13 - CR (carriage return)
ascii 12 - NP (new page, or FF - formfeed)
ascii 11 - VT (vertical tab)

There might be even more. And worse, in some modes, I think characters
with codes above 128 are space characters, like the hard space (a space
that cannot be divided between lines). In particular, the HT is
considered whitespace, since it is conceptually a number of compressed
space characters (customarily 2-8).
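As a minimal illustration, the Python sketch below encodes the six codes listed above. These happen to be exactly the characters for which ANSI C's isspace() is true in the "C" locale. The table and helper names are hypothetical. The hard space (code 160 in ISO 8859-1) is deliberately left out, because whether it counts as whitespace depends on the mode or locale.

```python
# The ASCII whitespace codes listed above (decimal), in the order given.
ASCII_WHITESPACE = {
    32: "SP (space)",
    9: "HT (horizontal tab)",
    10: "LF (line feed)",
    13: "CR (carriage return)",
    12: "FF (form feed / new page)",
    11: "VT (vertical tab)",
}

def is_ascii_whitespace(ch: str) -> bool:
    """True if ch is one of the six codes above (the C-locale isspace() set)."""
    return ord(ch) in ASCII_WHITESPACE

def describe_whitespace(text: str) -> list[str]:
    """Name every whitespace character found in text, in order of appearance."""
    return [ASCII_WHITESPACE[ord(c)] for c in text if is_ascii_whitespace(c)]

print(describe_whitespace("jorgens ttyp0\t10:51\r\n"))
# ['SP (space)', 'HT (horizontal tab)', 'CR (carriage return)', 'LF (line feed)']
print(is_ascii_whitespace("\xa0"))  # False: the hard space is outside this set
```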

> Under Unix however, the tab character (and some others) are
> considered "blanks", though it's called "whitespace" there.
> At least some REXX implementations for Unix recognize more than
> the space char as whitespace. And REXX on the PC recognizes at
> least the space char and the tab char as whitespace/blank...

Well ... try this if you are using Unix:

who | od -a

Now, are all the whitespace characters spaces, or are there any tabs (ht)
mixed into the output? *Many* of the Unix commands use tab and other
whitespace characters in their output. Reason: to save characters. If
you are using a 300 baud modem line, compressing 8 spaces to a tab is a
Major Advancement of Civilization.
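If od is not at hand, a rough stand-in takes only a few lines. The hypothetical Python helper below imitates only the short names that od -a prints for whitespace characters (sp, ht, nl, vt, ff, cr). It does not reproduce od's offsets or column layout. The sample who-style line, with a tab before the host name, is assumed rather than captured from a real system.

```python
# Short names that od -a uses for the whitespace characters.
OD_NAMES = {" ": "sp", "\t": "ht", "\n": "nl", "\v": "vt", "\f": "ff", "\r": "cr"}

def od_like(text: str) -> list[str]:
    """Return one token per character, naming whitespace the way od -a does."""
    return [OD_NAMES.get(c, c) for c in text]

def whitespace_kinds(text: str) -> set[str]:
    """Which kinds of whitespace occur in text (e.g. {'sp', 'ht'})."""
    return {OD_NAMES[c] for c in text if c in OD_NAMES}

sample = "anders   ttyp3   Aug 26 09:05\t(129.241.36.3:0.0)\n"  # assumed who line
print(" ".join(od_like("09:05\t(129")))   # 0 9 : 0 5 ht ( 1 2 9
print(sorted(whitespace_kinds(sample)))    # ['ht', 'nl', 'sp']
```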

> My point?
> I would hate to have to port to CMS any REXX program written for
> Unix (or PC); to have a program fail due to something like this
> would not be very easy to debug...

I think you would hate even more to port a rexx program from one unix
machine to another unix machine; and have the program fail due to one
of the machines being more intelligent about compressing spaces to
tabs. And it is far more probable that you would do that than port
between CMS and Unix! And this is not just a Unix problem; it is more
or less an ASCII problem.

If the ANSI REXX committee requires that blank have one specific
character code within each character set, then IMHO the committee has
made Rexx harder to move *from* EBCDIC (i.e. IBM mainframes), not
eased the spreading of Rexx to the rest of the computing community.
Most machines use multiple characters as blanks, and Rexx should not
be limited to just those machines which have One Blank character.

By the way ... I really can't see the problem. Unix generates tabs
and spaces as whitespace, and the Unix Rexx interpreter interprets both
as blanks. No problem!

You port it to CMS: CMS generates spaces as whitespace, the CMS
interpreter interprets only space as blanks. No problem.

You ftp a file from Unix to CMS. You don't use binary mode, since the
ASCII code would be unreadable on an EBCDIC machine anyway. So you use
text mode, and your CMS machine translates the text to EBCDIC,
including translating the tabs to spaces. No problem.

You ftp a file from CMS to Unix. For the same reason as above, you use
text mode, and all the CMS spaces become Unix spaces. No problem.

Suppose your Unix code contains parse patterns like '09'x to match a
tab, and you take your program over to CMS. But if you make assumptions about the
glyphs of the characters, you're in trouble anyway. In fact, if your
Unix Rexx interpreter interpreted tab as a whitespace character, you
probably wouldn't have had to parse on the '09'x pattern in the first
place. No problem.

Where is the problem, Scott?

(Oops, I seem to have assumed that all Unix machines are ASCII, which
is probably not correct. In the list above, read "CMS" as EBCDIC-based
and "Unix" as ASCII-based; that is closer to what I meant.)

> My suggestion?
> In the interest of increasing the chance of successful porting, to
> request the ANSI-REXX committee to define that the *only* blank/
> whitespace recognized in standard REXX is the SPACE character (ASCII
> hex-20, EBCDIC hex-40).

And there is indeed also a very good chance that this suggestion (if
accepted) will make any rexx interpreter for Unix rather useless, or
at best, just redline the astonishing factor (see end of posting for
an example).

Instead, perhaps the ANSI REXX committee should look over the shoulder
of the ANSI C committee, and how they solved the problem of
whitespace, and their definition of the isspace() function.

If the ANSI REXX committee determines that one particular character in
ASCII and one particular character in EBCDIC is to be considered the
Only True Blank, it might even have consequences for using Rexx with
national characters (locales) (yes, in some systems, blanks may even
depend on the language chosen!).

Section one of TRL states that programming in Rexx can "... involve the
use of two character sets": one used for the Rexx script (source code),
and one used by the interpreter during execution (data). If you want to
define that a Rexx script may only include certain characters as blanks
when the source code is parsed (except in quotes and comments), that's
fine. However, if you suggest that only certain characters should be
recognized by the interpreter as blanks during execution (as data), then
I fear Real Trouble.

I also want to question whether appointing specific character codes as
the Only True Blank is in the spirit of TRL. To quote section 1:
"[...] this book uses characters to convey meaning and not to imply a
specific character code [...] At no time is REXX concerned with the
glyph (actual appearance) of a character."

The way I read this, a Blank is whatever character is commonly used as a
Blank in the operating system you are running. Consequently, if your
operating system has more than one blank character, all of these are
interpreted as blanks. However, for the default pad character in the
built-in functions, it would be appropriate to require that one single
character is used consistently. But I still think it should be beyond
the definition of Rexx as a language to specify which character that
is in the various character sets.

> Your comments? :-)

Please, by all means, standardize what a blank is, but *please* don't
standardize it in such a way that it becomes impossible to use a true
standard Rexx interpreter on some platforms.

My suggestion?
In the interest of increasing the chance of successful porting of the
Rexx standard itself from EBCDIC to ASCII based systems, to request
that the ANSI-REXX committee explicitly allows the common whitespace
characters of the host operating system to be interpreted as 'blank'
characters, and that the definition of exactly what is blank, is
implementation-dependent and system-dependent.

-anders

So, here is the example I promised. The output from the command 'who' on
this machine (Ultrix) currently is:

> jorgens ttyp0 Aug 24 10:51 (129.241.27.23:0.)
> anders ttyp3 Aug 26 09:05 (129.241.36.3:0.0)

Suppose the address syntax made it possible to push something on the
stack, then the following Rexx program ought to work:

address unix to queue 'who'
do queued()
parse pull user . . . time node
say user 'is logged in from' node 'at time' time'!'
end

Simple, isn't it? It is just that it only works on some machines, and
even then only sometimes. Why would such a program write out:

jorgens is logged in from at time 10:51 (129.241.27.23:0.)!
anders is logged in from at time 09:05 (129.241.36.3:0.0)!

The answer is that there is a tab between the time and the hostname.
Whether this Rexx script works depends on the machine type, the terminal
type, the mood of your system operator and a lot of other conditions.
Refusing to treat the tab as a blank does not *help* the user in this
situation; in fact, it only confuses.
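The failure can be reproduced outside Rexx. The Python sketch below is a simplified, hypothetical model of word parsing for a template like `user . . . time node`. Every name except the last takes one word, and '.' discards its word. The last name takes the remainder with leading blanks stripped; real PARSE has further rules that this model ignores. The who line is assumed to contain a tab between the time and the host, as described above.

```python
def parse_words(line: str, template: list[str], blanks: set[str]) -> dict[str, str]:
    """Simplified Rexx-style word parsing.

    Each template name but the last receives one word; '.' is a placeholder
    whose word is discarded; the last name receives the rest of the line with
    leading blanks stripped. 'blanks' decides which characters separate words.
    """
    result: dict[str, str] = {}
    rest = line
    for i, name in enumerate(template):
        rest = rest.lstrip("".join(blanks))
        if i == len(template) - 1:
            value, rest = rest, ""
        else:
            end = 0
            while end < len(rest) and rest[end] not in blanks:
                end += 1
            value, rest = rest[:end], rest[end:]
        if name != ".":
            result[name] = value
    return result

TEMPLATE = ["user", ".", ".", ".", "time", "node"]
line = "jorgens  ttyp0   Aug 24 10:51\t(129.241.27.23:0.)"  # assumed: tab before host

for blanks in ({" "}, {" ", "\t"}):
    v = parse_words(line, TEMPLATE, blanks)
    print(v["user"], "is logged in from", v["node"], "at time", v["time"] + "!")
```

If only the space counts as a blank, time swallows both the tab and the host name, leaving node empty. That is the output shown above. Adding the tab to the blank set gives the intended result.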

Dave Gomberg

Aug 26, 1992, 11:31:32 AM

The notion of using tabs to represent multiple blanks dates back to the
IBM 2741. Now if there is a truly obsolete device on this planet, that
is it. But U**X continues to try to perpetrate the notion that this is
a good idea.

Please don't be confused by the issue of disk file compression. Certainly
nobody wants to fill a disk with long strings of blanks, and on the disk,
compression (with tabs or whatever scheme) is useful. But to have codes
other than '20'x in running text when a blank is meant is, to me,
obsolete, and worse yet, confusing and stupid. There is no NEED for lots
of different blank characters. That C recognizes them shows the difference between C
and REXX.

Barry Puryear (SSS/VM)

Aug 26, 1992, 10:44:29 AM

The question of "what is a blank" is deeper than the definition of REXX.
Several years ago, I had the duty to port some Fortran source from a UNIX
system to CMS. As it happened, the Fortran source was loaded with tab
characters. The method we used to transfer those files did not convert
the tabs to spaces, but to x'05', as I recall. Of course, those source
files would not compile on the first try. I had to convert the tabs to
the correct number of blanks via XEDIT and compile again.
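The Python sketch below shows why a straight code-page translation turns a tab into x'05' rather than into spaces: x'05' is the EBCDIC horizontal-tab code. It uses Python's cp037 codec, one common US EBCDIC code page; which page a given CMS system actually used is an assumption. Expanding tabs to fixed stops before translating, as was done by hand in XEDIT, avoids the problem. The 8-column tab stop is also an assumption.

```python
def to_ebcdic(text: str, expand: bool, tabsize: int = 8) -> bytes:
    """Translate text to EBCDIC (code page 037), optionally expanding tabs first."""
    if expand:
        text = text.expandtabs(tabsize)  # each tab becomes spaces up to the next stop
    return text.encode("cp037")

src = "\tX = 1"  # assumed source line that starts with a tab
print(to_ebcdic(src, expand=False)[:1].hex())  # 05: the tab survives as EBCDIC HT
print(to_ebcdic(src, expand=True)[:8].hex())   # 4040404040404040: eight EBCDIC blanks
```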

I think that the best way to resolve the "what does REXX call a blank"
question is for CMS (and other environments that use EBCDIC) to start
treating tab characters more reasonably, rather than updating language
definitions and changing individual language processors.

Jerry Campbell

Aug 26, 1992, 11:55:01 AM

In article AA0...@SERVER.uwindsor.ca, Scott Ophof <op...@SERVER.UWINDSOR.CA> () writes:
>Question: When are blanks not only the space character?
>Answer: In any case in Unix.
>
> ...... stuff removed .....

>
>My point?
>I would hate to have to port to CMS any REXX program written for
>Unix (or PC); to have a program fail due to something like this
>would not be very easy to debug...
>
>My suggestion?
>In the interest of increasing the chance of successful porting, to
>request the ANSI-REXX committee to define that the *only* blank/
>whitespace recognized in standard REXX is the SPACE character (ASCII
>hex-20, EBCDIC hex-40).
>
>Your comments? :-)
>
>
>Regards.
>$$/

Just an observation from someone who's done a LOT of Rexx programming
under CMS and not a whole lot under Unix. The commonly accepted "style"
(at least in my shop) for constructing data streams to feed to Rexx, or
for interpreting system information (msgs, cmd output, what not), often
leads to code such as this:

Parse Var sometin 1 parm1 +4 5 parm2 +4 ....

All very byte-position oriented. The programmer expects to interpret
things in terms of "card columns". Many programs are written with that
kind of dependency on CMS. Unix seems to require a different
conceptualization of data streams.
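For readers more used to word parsing, the positional template above amounts to fixed slicing. It starts at column 1 and takes 4 characters into parm1, then restarts at column 5 and takes 4 into parm2. The Python sketch below is a hypothetical equivalent, and the sample records are invented. It also shows that column-based parsing is only reliable when any tabs in the data have been expanded first, because one raw tab shifts every later column.

```python
def parse_columns(record: str, fields: list[tuple[str, int, int]]) -> dict[str, str]:
    """Positional parsing: each field is (name, 1-based start column, length)."""
    return {name: record[start - 1:start - 1 + length] for name, start, length in fields}

# Roughly mirrors: Parse Var record 1 parm1 +4 5 parm2 +4
FIELDS = [("parm1", 1, 4), ("parm2", 5, 4)]

print(parse_columns("ABCDEFGH", FIELDS))                # {'parm1': 'ABCD', 'parm2': 'EFGH'}
print(parse_columns("AB\tEFGH", FIELDS))                # raw tab: columns are shifted
print(parse_columns("AB\tEFGH".expandtabs(4), FIELDS))  # tab expanded to a 4-column stop
```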

---
Jerry Campbell reply to: zjl...@hou.amoco.com
Amoco Corp. ISD SSS/Graphics
Houston, Tx. 713/556-7036

Otto Stolz

Aug 26, 1992, 2:41:09 AM

On Wed, 26 Aug 92 08:31:32 PDT Dave Gomberg said:
> [...] There is no NEED for lots of different blank characters.

May I humbly object:

In June, ISO DIS 10646-1 "Universal Character Code" was approved as an
international standard, which will presumably be published in early
1993. Major vendors have expressed their intention to implement this
standard, which is meant as a possible replacement for ASCII (and the
other national variants of ISO 646-1983 "ISO 7-bit Character Set for
Information Interchange"), the ISO 8859 series "8-bit single-byte coded
character sets", and a lot of other standard or proprietary character
codes. This standard will comprise (besides, of course, the good old
ASCII tabs) the following space variants:
Space
Non-breaking Space
Ideographic Space
n-Space
m-Space
3-per-m-space
4-per-m-space
6-per-m-space
figure space
punctuation space
thin space
hair space
zero-width space
Obviously, there *are* people seeing a need ... :-)
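For reference, the Python sketch below lists the code points that carry these variants in the character set as eventually published (Unicode / ISO 10646). Pairing the names above with these code points is an assumption based on today's Unicode names. Python's unicodedata module is used only to print the official names. Most of these characters are whitespace for Python's str.isspace(), but the zero-width space is not.

```python
import unicodedata

# Space variants from the list above, by code point (assumed mapping).
SPACE_VARIANTS = [
    0x0020, 0x00A0, 0x3000, 0x2002, 0x2003, 0x2004, 0x2005,
    0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x200B,
]

for cp in SPACE_VARIANTS:
    ch = chr(cp)
    # Official Unicode name, and whether Python itself treats it as whitespace.
    print(f"U+{cp:04X}  {unicodedata.name(ch):<22} isspace={ch.isspace()}")
```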

Apparently, the trend goes towards more sophistication in word processing
and finer control over printing devices. Another trend is towards
reconciling electronic data processing and typesetting. Generally
speaking, our programming languages must cease to presuppose the
(obsolete!) typewriter-based notion of "a character is a byte is a
writing position". I guess you are *not* advocating that REXX should
become extinct in the forthcoming EDP world :-)

REXX can cope well with any character code, if it is defined and
implemented consistently. Regarding white space, this would involve
1. that the term "white space" (or an equivalent one) be defined in
the forthcoming standard in a code-independent way,
2. that all language features recognising words, or depending in other
ways on the meaning of white space, be identified, and that the
standard require consistent implementation of these features, e.g.
- tokenizing of the REXX source program, and of the INTERPRET
statement's operand (cf. note 1, below),
- variables, and dots, in parsing templates,
- WORD, WORDINDEX, WORDPOS, SUBWORD, and WORDS, functions,
- STRIP (cf. note 2), and SPACE, functions,
- weak character comparison (so-called "normal" comparative operators
applied to non-decimal operands),
- padding default in COMPARE function (cf. note 2),
- DATATYPE function,
- white space in operands of numeric operations (including functions)
(Warning: This list may not be exhaustive);
3. that the term "blank character" (or an equivalent one) be defined in
the standard as one definite character belonging to the constituents
of white space (cf. item 1, supra);
4. that all language features generating white space (including the
defaults for pad characters) be identified, and that the standard
require consistent implementation of these, viz. generation of
blank characters, e.g. by
- concatenating terms with one blank in between (expressed by
white space in lieu of an operator),
- default padding character in SPACE, CENTER, LEFT, RIGHT, INSERT,
OVERLAY, SUBSTR, and TRANSLATE, functions,
- padding of the shorter operand in weak character comparisons,
- the FORMAT function
(Warning: This list may not be exhaustive);
5. that standard-conforming implementations be required to implement
the recognition of white space in a way conforming
- to all possible sources for REXX source programs,
- to all possible sources for input to REXX programs
(cf. note 1);
6. that standard-conforming implementations be required to implement
the blank character in a way conforming
- to all possible environments REXX programs may address,
- to all possible sinks for output from REXX programs
(cf. note 1).

In a nutshell: REXX language features should be as permissive as
possible when accepting white space, and as predictable
as possible when generating it. To achieve this goal, the
standard should replace the notions of "blank", "blanks"
or "blank characters" with "white space" whenever
characters are inspected, and replace them with "blank
character" or "blank characters" whenever characters are
generated.
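The principle fits in a few lines. In the hypothetical Python model below, word recognition accepts any character from a white-space set, while generation (as in a SPACE-style function) always emits the one blank character. The white-space set is assumed here to be the C-locale set plus the ISO 8859-1 hard space. The names WHITE_SPACE, BLANK, words and space are illustrative only, not proposed standard text.

```python
WHITE_SPACE = set(" \t\n\v\f\r\xa0")  # assumed: characters *recognized* as white space
BLANK = " "                            # the one blank character that is *generated*

def words(s: str) -> list[str]:
    """Split s into words, treating any WHITE_SPACE character as a separator."""
    out, current = [], []
    for ch in s:
        if ch in WHITE_SPACE:
            if current:
                out.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        out.append("".join(current))
    return out

def space(s: str, n: int = 1) -> str:
    """Like Rexx SPACE(): rejoin the words with exactly n blank characters."""
    return (BLANK * n).join(words(s))

print(space("\tone \xa0two\v\fthree  "))  # one two three
```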

Note 1: Another recent contribution to REXXLIST stated that REXX source
code, and REXX operands, might be represented in different
character codes (perhaps including different notions of white
space). A cursory scan through TRL did not reveal any support
for this statement.

REXX source code and REXX operands are tightly coupled (actually
tighter than in any other programming language I am aware of) by
several language features, e.g.
- literal strings,
- INTERPRET statement,
- SYMBOL, and VALUE, functions
- SOURCELINE function,
- VALUE sub-keywords of ADDRESS, SIGNAL, and TRACE statements,
- ADDRESS, and TRACE, functions.

To me, this tight coupling suggests that source program and
operands should ideally be expressed in the same character
code. If the standard does not require this, it must give
precise and simple rules for how each one of these language
features shall handle the discrepancies, while trying to
minimize the astonishing factor.

For less consistent systems, the REXX implementation will somehow
have to level out the irregularities (regarding character codes,
particularly white space). The standard may choose to provide
suitable OPTIONS operands to assist in this regard.

Note 2. By default, the STRIP function should remove leading and/or
trailing white space (rather than blank characters). By default,
the COMPARE function should ignore white space in the excessive
part of the longer operand (in other words: when no pad character
is specified, COMPARE should return 0, iff the longer operand
consists of an exact copy of the shorter operand, followed by any
amount of white space, and it should return the position of the
1st non-white character in the excessive part of the longer
operand, iff the latter consists of an exact copy of the shorter
operand followed by anything but white space).

Note that these defaults cannot be specified explicitly via the
currently valid interface. This idiosyncrasy could be removed
by an additional (yet minor, and upwards-compatible) language
extension:
- For the STRIP function, the standard could allow an arbitrary
string rather than a single character as its 3rd argument;
the meaning would be to remove sequences of any characters
specified.
- For the COMPARE function, the standard could allow an arbitrary
string rather than a single character as its 3rd argument;
the meaning would be to ignore, in the excessive part of the
longer operand, sequences of any characters specified.
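A minimal Python sketch of the two proposed extensions follows. strip() takes a string of characters to remove, which is what Python's own str.strip(chars) already does. compare() treats any character from the given string as ignorable in the excess part of the longer operand. The function names, argument order, the 'B'/'L'/'T' options and the default white-space string loosely follow Rexx but are assumptions, not standard text. Positions are 1-based, as in Rexx.

```python
WHITE = " \t\n\v\f\r"  # assumed default white-space set for this sketch

def strip(s: str, option: str = "B", chars: str = WHITE) -> str:
    """Remove leading ('L'), trailing ('T') or both ('B') runs of any of chars."""
    if option == "L":
        return s.lstrip(chars)
    if option == "T":
        return s.rstrip(chars)
    return s.strip(chars)

def compare(a: str, b: str, pad: str = WHITE) -> int:
    """Return 0 if a and b match, where the excess part of the longer operand
    may consist only of characters from pad; otherwise return the 1-based
    position of the first mismatch."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    for i, ch in enumerate(longer):
        if i < len(shorter):
            if ch != shorter[i]:
                return i + 1
        elif ch not in pad:
            return i + 1
    return 0

print(strip("\t  data \t"))          # data
print(compare("abc", "abc \t "))     # 0
print(compare("abc", "abc \tx"))     # 6
```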

Note that there is no need to define a canonic form for white space in
operands, as there would be REXX functions to accomplish any desired
transformations, if the above items 1 to 6 became standard. Particularly,
stretches of white space could be easily transformed to single blanks
(to allow for a sensible comparison) by applying the SPACE function.
Note also that this function has already the suggestive name of SPACE
rather than BLANK :-)

Best wishes,
Otto Stolz <RZO...@DKNKURZ1.Bitnet>
<RZO...@nyx.uni-konstanz.de>

Anders Christensen

Aug 26, 1992, 8:40:39 PM

I completely agree with Otto Stolz, both on the description of the
current situation, and on what should be done. His list of checkpoints
is very timely and should be addressed in the coming ANSI standard.
However, on one point I disagree:

In article <REXXLIST%92082621432914@DEARN> Otto Stolz <RZO...@DKNKURZ1.BITNET> writes:

> Note 1: Another recent contribution to REXXLIST stated that REXX source
> code, and REXX operands, might be represented in different
> character codes (perhaps including different notions of white
> space). A cursory scan through TRL did not reveal any support
> for this statemnt.

To quote TRL:

"Programming in the REXX language can be considered to involve the
use of two character sets. The first is used for expressing the
REXX program itself, and is the relatively small set of characters
described in the next section. The second character set is the set
of characters that can be used as data by a particular
implementation of a REXX language processor. This character set may
be limited in size (often to a limit of 256 different characters,
which have a convenient 8-bit representation), or it may be much
larger. Usually, most or all of these characters are also allowed
within a REXX program, but only within commentary or immediate
(literal) data."
TRL, 2nd ed.
Part 2, section 1, page 18.
First paragraph of "Character Sets",

Of course, the term "two character sets" does not mean that the Rexx
script is written in EBCDIC and the data is in ASCII. Rather, it means
that the characters allowed in a rexx script are a subset of the
characters allowed to be handled as data by the interpreter. For
instance, ISO 646 (7-bit ASCII) can be allowed in the rexx script
(except for comments and strings), while ISO 8859-1 is the character
set that forms the data.

Consequently, Rexx may allow a smaller set of whitespace characters for
use in the Rexx script source code (except in comments and strings),
while it may recognize a much broader set of whitespace characters in
the data. The total character set that can be used in Rexx scripts
(including comments and strings) can be smaller than the character set
handled by the interpreter. End-of-line characters (on the machines
that use them) are allowed in data, but explicitly not allowed in
strings in a Rexx script.

For instance, Tab (ASCII 9) and Space (ASCII 32) could be interpreted
as whitespace in Rexx source code, while Tab, Space and Non-breaking
space (code 160 in ISO 8859-1) could be interpreted as whitespace in data.
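The example above can be written down directly. In the hypothetical Python sketch below, the tokenizer for program source recognizes only Tab and Space. Word splitting on data also accepts the ISO 8859-1 non-breaking space (code 160). The set names and helpers are assumptions for illustration. The shared splitter maps every blank in the given set to a space and then splits on the space alone.

```python
SOURCE_WHITESPACE = {"\t", " "}        # assumed: blanks allowed in Rexx source
DATA_WHITESPACE = {"\t", " ", "\xa0"}  # assumed: blanks recognized in data

def split_on(text: str, blanks: set[str]) -> list[str]:
    """Map every character in blanks to a space, then split on the space only."""
    table = {ord(c): " " for c in blanks}
    return [w for w in text.translate(table).split(" ") if w]

def source_tokens(line: str) -> list[str]:
    """Crude source tokenizer: only SOURCE_WHITESPACE separates tokens."""
    return split_on(line, SOURCE_WHITESPACE)

def data_words(value: str) -> list[str]:
    """Word splitting on data: the broader DATA_WHITESPACE set applies."""
    return split_on(value, DATA_WHITESPACE)

text = "10\xa0km"            # a data value containing a hard space
print(source_tokens(text))  # ['10\xa0km']: not a separator in source
print(data_words(text))     # ['10', 'km']
```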

Btw, I don't remember saying anything about source code and data (or
operands, as you put it) having to be represented in different
character codes. What I did say (or at least what I meant) was that
source code and data are drawn from two character sets.

> To me, this tight coupling suggests that source program and
> operands should ideally be expressed in the same character
> code. [...]

It is, except that the source program can only use a subset of the
character codes that the data can use.

-anders

Scott Ophof

Aug 26, 1992, 8:54:12 PM

On Wed, 26 Aug 1992 08:56:20 GMT Anders Christensen said:
>In article <920826032...@SERVER.uwindsor.ca> op...@SERVER.UWINDSOR.CA
>(Scott Ophof) writes:
>>...

>it is sometimes hard to predict it in advance, and it might differ
>from machine to machine, and from login-session to login-session.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is a tough byte to swallow...

>>...


>TRL always uses the terms 'blank' and 'blanks', I believe?

Oops! My apologies both to the list/group and Mike; I was still
thinking in terms of the 1st edition. A quick run-through of the
2nd edition shows it to be more consistent (British understatement)
in using "blank"/ "blanks", as Anders remarks.

But I still don't see any need for more than one character being a
blank, i.e. the character that can always be used as a separator of
words.
*The* strong point of REXX is (imho) that it can so easily be used
in operations on human languages. One qualification: I restrict
myself to the character set of the Western world (and its national
variants) due to lack of knowledge re languages using pictograms
and such.

Thus the EBCDIC vs. ASCII issue becomes rather unimportant, as does
CMS vs. rest-of-world.
Catering to envirs where characters are effectively abused because
of antiquated needs to save space also seems to me to *not* be The
Way To Go In The Future.
Hitting the TAB-key (etc.) eases the work of the typist. OK.
But the data in the file should have the correct number of blanks.
At the application level programmers should *not* need to concern
themselves with the disposition of data in a (disk) file, as Dave
Gomberg implies.

...
>I think you would hate even more to port a rexx program from one unix
>machine to another unix machine; and have the program fail due to one

>...

(Urgh) :-( No, I *won't* make a nasty remark about UN*X... (grin)

In going from one system to another, imho *no* translations like
TAB-to-SPACEs should take place. How can one parse on tabs when
they have been converted to spaces in the ftp / copy / whatever
process?

IMHO it's high time (computer) envirs put more effort into
standardizing *across the board*, putting behind them all the
antiquated stuff and petty local differences.
(I know, it's idealistic...)


>> My suggestion?
>> In the interest of increasing the chance of successful porting, to
>> request the ANSI-REXX committee to define that the *only* blank/
>> whitespace recognized in standard REXX is the SPACE character (ASCII
>> hex-20, EBCDIC hex-40).

>And there is indeed also a very good chance that this suggestion (if
>accepted) will make any rexx interpreter for Unix rather useless, or
>at best, just redline the astonishing factor (see end of posting for
>an example).

Note that I said "standard REXX".
Whether specific implementations decide to extend their definition
for local purposes, I don't really care. My concern is only for
portability *regardless* of system. And on all those I know (of),
SPACE is the only character they all recognize as word delimiter.


>Section one of TRL states that Rexx "... involve the use of two
>character sets." One used for the rexx script (source code), and one
>used by the interpret under execution (data). If you want to define
>that a rexx script may only include certain characters as blanks under
>parsing of the source code (except in quotes and comments), that's
>fine. However, if you suggests that only certain characters should be
>recognized by the interpreter as blanks under execution (as data), then
>I fear Real Trouble.

So what is it that needs to be done to get all these envirs to end
their (*OUR*!!) petty squabbles over minor differences and start
working *together*? :-)
(Though not relevant to REXX, I do want to say that I'm getting sick
& tired of opsys wars, Internet/BITnet/EARN bickering, the
egotistical/greedy side of patenting/copyrighting, ad nauseam...)

Please, ANSI-REXX committee, keep up the good work to ensure a
decent *overall* standard!


>> Your comments? :-)
>Please, by all means, standardize what a blank is, but *please*, don't
>standardize it in such a way, that it makes it impossible to use the a
>true standard Rexx interpreter on some platforms.

>My suggestion?


>In the interest of increasing the chance of successful porting of the
>Rexx standard itself from EBCDIC to ASCII based systems, to request
>that the ANSI-REXX committee explicitly allows the common whitespace
>characters of the host operating system to be interpreted as 'blank'
>characters, and that the definition of exactly what is blank, is
>implementation-dependent and system-dependent.

I think we effectively *agree* on the basics of the issue, Anders.
May I respectfully ask both the implementors and the users sections
of the ANSI-REXX committee, ex officio, for their views?


Regards.
$$/

Dave Gomberg

Aug 26, 1992, 9:10:54 PM

There may be a need to control print formatting, but the way to achieve
that is not the addition of the 127-to-an-em space. The way to do that
is PCL or PostScript or TeX or some other tool designed to solve the
problem. You don't screw up the character set just to try to control
output.

Pat Mueller

Aug 26, 1992, 10:40:05 PM

If it's useful to be able to handle it BOTH ways, make it a
value of the OPTIONS statement. IBM Rexx implementations already
have OPTIONS keywords (ETMODE and EXMODE) for DBCS enablement.

Patrick Mueller
pmu...@vnet.ibm.com
Programming Systems, IBM Cary, NC

Anders Christensen

Aug 27, 1992, 2:30:35 AM

> But I still don't see any need for more than one character being a
> blank, ie. the character that can always be used as separator of
> words.

Please reread the example at the end of my previous posting, where I
showed one example of why Unix needs to interpret Tab as a blank. I
don't think it is a matter of what is needed; it is rather a matter of
what already exists. All (true) ASCII machines use Tab and Space as
whitespace, although the use is most common on Unix machines.

> Thus the EBCDIC vs. ASCII issue becomes rather unimportant, [...]

I don't think so. ASCII uses multiple whitespace characters (Tab and
Space), while EBCDIC uses one. TRL does not explicitly address the
problem, but implicitly uses only one space character. This gives room
for implementors on ASCII machines to interpret "blank" as they need.
But if the standard includes a requirement that Rexx should *only*
have one space character, then running a true standard Rexx interpreter
on some machines will become very difficult.

> Catering to envirs where characters are effectively abused because
> of antiquated needs to save space also seems to me to *not* be The
> Way To Go In The Future.

Sorry, but the future seems to be moving away from EBCDIC and One
space, towards large ASCII-based character sets (32-bit) and Many
spaces. Note, this is _not_ done to save space (32-bit characters
don't exactly decrease the need for disk and memory).

> Hitting the TAB-key (etc.) eases the work of the typist. OK.
> But the data in the file should have the correct number of blanks.
> At the application level programmers should *not* need to concern
> themselves with the disposition of data in a (disk) file, as Dave
> Gomberg implies.

You want to make Rexx data closer to human languages? Fine, then
remove the concept of a space character altogether. After all, the
space character is a computer kludge to deal with grouping of
characters. No human (Latin-based) language uses a space letter; they
use graphical letters separated by open space. The space is not a
letter, it is just emptiness.

> In going from one system to another, imho *no* translations like
> TAB-to-SPACEs should take place. How can one parse on tabs when
> they have been converted to spaces in the ftp / copy / whatever
> process?

When moving from an ASCII to an EBCDIC system (or vice versa), you
_must_ translate the contents. The clue is that *you* don't parse on
Tab, the *computer* (i.e. the Rexx interpreter) parses on Tab,
depending on the common customs of the host operating system. So
instead of forcing you, as a user to have to take care of the Tabs,
the computer (i.e. the Rexx interpreter) should handle it. The
interpreter can only do that if it knows what is whitespace and what
is not whitespace for your particular system.

You have not yet described a specific situation where the
interpretation of Space and Tab as whitespace on ASCII machines would
create problems on either ASCII or EBCDIC machines. I still assert
that this will not create any problems.

> Note that I said "standard REXX".
> Whether specific implementations decide to extend their definition
> for local purposes, I don't really care. My concern is only for
> portability *regardless* of system. And on all those I know (of),
> SPACE is the only character they all recognize as word delimiter.

That is a contradiction. If an implementation has 'local' extensions,
it is not standard anymore. If you really want a Rexx standard that
has any chances to become widespread, you should wish for a standard
that only defines what most people agree on, and leave the rest an
open question. If the standard is too strict, few people will use it.
The world is full of unused standards ...

As an implementor, my attitude is that I want to make the interpreter
compatible with ANSI REXX, but if that standard is totally
impossible for those platforms I work on, then I have to make an
extension or non-standard behavior for the features that don't work.
And having added _one_ non-standard feature, the magic of ANSI REXX is
gone, and it is much easier (psychologically speaking) to add the next
extension ... etc

> So what is it that needs to be done to get all these envirs to end
> their (*OUR*!!) petty squabbles over minor differences and start
> working *together*? :-)

I don't want to reform the rest of the world. I just want to write a
nice Rexx interpreter. In particular, I don't want to write a Rexx
interpreter that doesn't work, just because it is "standard".

> (Though not relevant to REXX, I do want to say that I'm getting sick
> & tired of opsys wars, Internet/BITnet/EARN bickering, the
> egotistical/greedy side of patenting/copyrighting, ad nauseam...)
>
> Please, ANSI-REXX committee, keep up the good work to ensure a
> decent *overall* standard!

Yes, I agree! On both points.

Someone posted earlier that CMS programmers often tend to regard the
format of data as very constant, e.g. that the interesting portion of
the output from command XXX starts in column 42, and is 8 characters
long. That is very different from the Unix approach, in which the
exact column where things start is often not constant, and where the
number of whitespace-separated words in the output is used to find the
right data. The difference can be seen from these two examples:

parse var foo 42 user +8
parse var foo . . . user .

Rexx is powerful enough to handle both approaches. But this will only
work on Unix machines if Unix Rexx interpreters are allowed to
interpret all Unix whitespace characters as blanks.
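The two templates above can be modelled outside REXX. The following Python sketch is only an illustration: `parse_positional` mimics `parse var foo 42 user +8` and `parse_fourth_word` mimics `parse var foo . . . user .`. The `blanks` parameter is a hypothetical stand-in for whatever set of characters a given interpreter treats as blanks, and the TAB-separated sample line is made up.

```python
def parse_positional(foo: str, column: int = 42, length: int = 8) -> str:
    """Model of `parse var foo 42 user +8`: `length` chars from 1-based `column`."""
    start = column - 1  # REXX columns are 1-based, Python indexes are 0-based
    return foo[start:start + length]


def parse_fourth_word(foo: str, blanks: str = " ") -> str:
    """Model of `parse var foo . . . user .`: the fourth word, or "" if absent.

    `blanks` is a hypothetical parameter: the characters treated as blanks.
    """
    words, current = [], ""
    for ch in foo:
        if ch in blanks:
            if current:  # a run of blanks ends the current word
                words.append(current)
                current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words[3] if len(words) > 3 else ""


# Hypothetical output line whose fields are separated by TABs.
line = "tty1\t09:14\tidle\tanders"
print(repr(parse_fourth_word(line)))         # '' : SPACE is the only blank
print(repr(parse_fourth_word(line, " \t")))  # 'anders' : TAB counts as a blank
```

With SPACE as the only blank, the TAB-separated line is a single word and `user` comes back empty; once TAB counts as a blank, the fourth word is found. The positional template does not depend on the blank set at all.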

No, I am not trying to start an opsys-war, and I think opsys-wars are
just a waste of time. But I do think it is important for Rexx to be
defined in such a way that it can be used under most operating systems,
including Unix.

-anders

Dave Gomberg

Aug 27, 1992, 10:22:02 AM
Unix REXX interpreters should do whatever they need to do to overcome the
shortcomings of the underlying operating system, such as a confusion on
the part of the original designers as to what a space was.

It is NOT true that this is an ASCII vs EBCDIC discussion, most PC-DOS and
OS/2 applications don't see tab as whitespace, and almost all of the ones
that do do so only optionally. And there are MANY more PCs than Unix
machines.

Multiple space characters (tab) are a typing and disk compression convenience.
Partial space characters are a way to format printed output.

Neither belong in the description of the syntax of a programming language.


Dave Gomberg GOMBERG@UCSFVM Internet node UCSFVM.UCSF.EDU (415)731-7793
Seven Gateview Court, San Francisco CA 94116-1941

Steve Bacher

Aug 27, 1992, 10:03:00 AM
In article <ANDERS.92A...@lise3.lise.unit.no>,
and...@lise3.lise.unit.no (Anders Christensen) writes:

>You ftp a file from Unix to CMS. You don't use binary mode, since the
>ASCII code would be unreadable on an EBCDIC machine anyway. So you use
>text mode, and your CMS machine translates the text to EBCDIC,
>including translating the tabs to spaces. No problem.

Not so (at least with MVS). ASCII tabs are translated to EBCDIC tabs.

[comments on parsing the output of the "who" command]

As long as we're talking Unix, you can do

"who | expand"

as a portable (from one Unix to another, anyhow) workaround.
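The effect of `expand` can be sketched in Python, whose `str.expandtabs` uses the same default of a tab stop every 8 columns. The sample line is hypothetical, not real `who` output.

```python
def expand(line: str, tabsize: int = 8) -> str:
    """Replace each TAB by spaces up to the next tab stop, like Unix `expand`."""
    return line.expandtabs(tabsize)


raw = "anders\tpts/1\tAug 27 10:03"       # hypothetical TAB-separated line
clean = expand(raw)
print(clean)                                # anders  pts/1   Aug 27 10:03
print([w for w in clean.split(" ") if w])   # ['anders', 'pts/1', 'Aug', '27', '10:03']
```

After expansion, splitting on SPACE alone recovers every field, which is why piping through `expand` sidesteps the question for this particular command.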


--
Steve Bacher (Batchman) Draper Laboratory
Internet: s...@draper.com Cambridge, MA, USA

Dave Gomberg

Aug 27, 1992, 11:39:48 AM
Anders, I really think you want to consider a bit more before you write.
Your last posting was so full of things which are wrong, and other things
with which I disagreed that it was too much work for me to bother to
reply. I think if you tended to focus your thoughts a bit more on the
real issues, and less on U**x, you might attract a wider audience. But
then U**x folks like C, so maybe I am wrong. Dave

Eric Thomas

Aug 27, 1992, 1:55:21 PM
In article <ANDERS.92A...@lise3.lise.unit.no>, and...@lise3.lise.unit.no (Anders Christensen) writes:
> When taking from an ASCII to an EBCDIC system (or vice versa), you
> _must_ translate the contents. The clue is that *you* don't parse on
> Tab, the *computer* (i.e. the rexx interpreter) parses on Tab,
> depending on the common customs of the host operating system. So
> instead of forcing you, as a user to have to take care of the Tabs,
> the computer (i.e. the Rexx interpreter) should handle it. The
> interpreter can only do that if it knows what is whitespace and what
> is not whitespace for your particular system.

Ok, time for the usual stupid question. Say I have a program that does:

Parse var line ':'tagname'.'value' :'line

Say I run that program on an ASCII system which recognizes the 20-odd types of
blanks Otto showed in his posting. Where does my 'value' variable end: when the
interpreter encounters a SPACE followed by a colon (ancient, despicable, evil
EBCDIC-type behaviour, surely that must not be the case), or also when it
encounters a TAB followed by a colon (nice, modern ASCII-type behaviour which
happens to break the program because that is not what the programmer wanted
and thought the interpreter would do)?

In either case you have a problem. If the blank in the search string only
stands for SPACE, it is very difficult to indicate that you want any of the 20+
white space characters to match. You would almost need a new WSPARSE command,
and WSPOS, and so on. If on the other hand the blank stands for any white space
character, you have no way in the language to halt on just a SPACE when you
need to do that. OPTIONS is not a solution: a given program may well need both
functions very often, and switching OPTIONS statements is at best impractical.
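A Python sketch of the template above, assuming the usual REXX pattern rule that a literal pattern which is not found matches at the end of the string. The `blanks` parameter is a hypothetical knob for which characters a blank inside a pattern may match, and the sample line is made up.

```python
def find_pattern(s: str, pattern: str, start: int, blanks: str) -> int:
    """Index of the first match of `pattern` in `s` at or after `start`, or -1.

    A blank in the pattern matches any character in `blanks`; every other
    pattern character must match exactly.
    """
    n = len(pattern)
    for i in range(start, len(s) - n + 1):
        if all(c == p or (p == " " and c in blanks)
               for p, c in zip(pattern, s[i:i + n])):
            return i
    return -1


def parse_tag(line: str, blanks: str = " "):
    """Model of  Parse var line ':'tagname'.'value' :'line

    Returns (tagname, value, rest). An unmatched pattern matches at the end
    of the string, so the variables after it are left empty.
    """
    i = find_pattern(line, ":", 0, blanks)
    if i < 0:
        return "", "", ""
    j = find_pattern(line, ".", i + 1, blanks)
    if j < 0:
        return line[i + 1:], "", ""
    k = find_pattern(line, " :", j + 1, blanks)
    if k < 0:
        return line[i + 1:j], line[j + 1:], ""
    return line[i + 1:j], line[j + 1:k], line[k + 2:]


sample = ":nick.Eric\t:name.Thomas"        # hypothetical tagged line
print(parse_tag(sample))                    # ('nick', 'Eric\t:name.Thomas', '')
print(parse_tag(sample, blanks=" \t"))      # ('nick', 'Eric', 'name.Thomas')
```

With SPACE as the only blank, the TAB does not end `value` and the rest of the line ends up in it; once TAB counts as a blank, parsing stops there. A single global choice cannot give both behaviours within one program, which is the dilemma described above.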

Eric

Paul Russell

Aug 27, 1992, 12:23:47 PM
On Thu, 27 Aug 1992 06:30:35 LCL Anders Christensen said:
... lots of stuff deleted ...

>Someone posted earlier that CMS programmers often tend to regard the
>format of data as very constant, e.g. that the interesting portion of
>the output from command XXX starts in column 42, and is 8 characters
>long. That is very different from the Unix approach, in which the
>exact column where things start is often not constant, and where the
>number of whitespace-separated words in the output is used to find the
>right data. The difference can be seen from these two examples
>
> parse var foo 42 user +8
> parse var foo . . . user .
>
>Rexx is powerful enough to handle both approaches. But this will only
>work on Unix machines, if Unix Rexx interpreters are allowed to
>interpret all Unix' whitespace characters as blanks.
... more stuff deleted ...
I've been following this discussion for several days and have found it
both interesting and enlightening. I was a little concerned about the
general nature of the statement cited at the beginning of the excerpt
above; however, I let it pass without comment, because I didn't feel
that an editorial comment would add value to the discussion. However,
since it now appears that the original statement has been accepted as
gospel, I think that it's time to add my 2 cents worth, to wit: *SOME*
CMS REXX programmers (including me) do parse *SOME* things on the basis
of hard-coded "column" numbers, *IF* the format of the data is constant.
However, it has been my experience (however limited you might perceive
that to be) that the format of the data is seldom constant, so, it seems
that *MOST* CMS REXX programmers parse data on the basis of white-space-
delimited words and/or the presence of known constants at least as often
as they parse data on the basis of hard-coded "column numbers". Jumping
to the conclusion that *ALL* CMS REXX programmers have an "80-column
card" mindset is just about as valid as jumping to the conclusion that
*ALL* U**X programmers are (insert your favorite pejorative here). 8-)
I can hardly remember the last time that I used an 80-column card, but
I'm sure that I was writing a grocery list on the back. 8-) pdr

Jim Glover

Aug 27, 1992, 3:10:36 PM
>However, it has been my experience (however limited you might perceive
>that to be) that the format of the data is seldom constant, so, it seems
>that *MOST* CMS REXX programmers parse data on the basis of white-space-
>delimited words and/or the presence of known constants at least as often
>as they parse data on the basis of hard-coded "column numbers".

I'd like to add my "Amen!" I too am a CMS REXX programmer, and I can
tell you that I *rarely* parse based on column position. I do make an
exception sometimes when data will be located in a particular position
without any delimiting white space at all, or when data will be located
in a particular position and the preceding fields may unpredictably
be blank, thus preventing my knowing how many .'s to include in something
like PARSE VAR INPUT . . . Stuff_I_Want .
I'd guess that such cases represent about 1% of what I do in REXX. The
rest of the time, I'm using white space just like everyone else.
--Jim

Otto Stolz

Aug 27, 1992, 3:36:39 PM
In article <REXXLIST%92082621432914@DEARN> I said:
> A cursory scan through TRL did not reveal any support
> for this statement.

On Thu, 27 Aug 92 00:40:39 LCL Anders Christensen said:
> "Programming in the REXX language can be considered to involve the
> use of two character sets. [...]"


> TRL, 2nd ed.
> Part 2, section 1, page 18.
> First paragraph of "Character Sets",

Sorry, my "cursory scan" started in the right place (Part 2, Section 1)
of the wrong book (1st edition, 1985).

> [...]


> Consequently, Rexx may allow a lesser set of whitespace characters for
> use in the Rexx script source code (except from comments and strings),
> while it may recognize a much broader set of whitespace characters in
> the data.

This is a reasonable demand, but it places an extra burden on the
standard committee. The REXX standard will have to give explicit (though
not system specific) rules about the mapping from the source character
set to the data character set underlying the following language
features:
- literal strings,
- SOURCELINE function,
- ADDRESS and TRACE functions,
and about the mapping from the data character set to the source character
set underlying the following features:
- INTERPRET statement,
- SYMBOL and VALUE functions,
- VALUE sub-keywords of ADDRESS, SIGNAL, and TRACE statements.
I hope I haven't missed any relevant features.

These rules should be designed to preserve the meaning of the character
strings as much as possible. Some of the points to be considered are:
- when the source character set is a proper subset of the data character
set, the rules should state that source code is transferred unaltered
into the data domain;
- when a graphic character is available in one domain S but lacking in
the other domain T, the rules could state
- either that this character may not be contained in character strings
bound to be mapped from S to T (e.g. the statement
INTERPRET "say '" || blabla || "'"
would result in a runtime error, if the blabla variable comprised
a character not allowed in the source program (not even in literal
strings), whilst the statement
INTERPRET "say blabla"
would work under the same circumstances),
- or that the offending character would be mapped to a suitable
substitute such as the SUB character of domain T (e.g. '1A'x in
ISO 646 and ISO 6429, '001A'x in the BMP of ISO 10646, or '3F'x
in EBCDIC);
- when white space is represented differently in the two domains, the
rules could state that one representation be mapped to an equivalent
one;
- if at all possible, the statement
INTERPRET sourceline(i)
should have the same effect as the REXX statement(s) contained in
line i of the respective source program (aka "round-trip integrity"),
or otherwise the standard should define all possible discrepancies.

I am not sure which rule would be most practical. Opinions? Note that I
do *not* suggest the REXX standard should state explicit character names
(let alone particular bit combinations): these were only fyi -- rather I
suggest the standard should state its rules independent of the underlying
character code (as in "a suitable substitute", above).
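The two options above (reject the string, or substitute the offending character) can be sketched as a hypothetical Python helper. It works on characters rather than bytes; the printable-ASCII target repertoire is an assumption for illustration, and SUB is shown as U+001A (the EBCDIC counterpart would be '3F'x).

```python
SUB = "\x1a"  # SUB control character (ISO 646 / ISO 6429; U+001A in ISO 10646)


def map_to_target(text: str, target: set, on_missing: str = "substitute",
                  substitute: str = SUB) -> str:
    """Map `text` into a target character repertoire (hypothetical helper).

    on_missing="error"      -> reject the string, like a runtime error
    on_missing="substitute" -> replace each unmappable character by `substitute`
    """
    out = []
    for ch in text:
        if ch in target:
            out.append(ch)  # representable: transferred unaltered
        elif on_missing == "error":
            raise ValueError(f"character {ch!r} not in target repertoire")
        else:
            out.append(substitute)
    return "".join(out)


# Hypothetical target repertoire: printable ASCII only.
PRINTABLE_ASCII = {chr(c) for c in range(0x20, 0x7F)}
print(repr(map_to_target("say 'café'", PRINTABLE_ASCII)))
```

Strings that fit the target pass through unchanged, so the subset case in the first rule falls out naturally; only the handling of the missing character differs between the two policies.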

Otto Stolz +49 7531 88 2645

Otto Stolz

Aug 27, 1992, 5:37:20 AM

I agree, wholeheartedly. I did not give that list of various spaces from
ISO 10646 to imply that all of them must be used to control output; my
point was that they are defined in the forthcoming Universal Character
Code, which imho is bound to stay.

REXX will have to cope with major existing and forthcoming character
codes, as characters are the stuff both REXX programs and REXX data are
made of. Consequently, the REXX standard must be compatible with features
these codes exhibit.

Particularly regarding space characters, the REXX standard must account
for these properties:
- there are character codes comprising several space characters,
- some space characters are meant to separate words, other characters
are meant to form part of a word even if their visual representation
consists of the absence of a graphic symbol,
- some character codes comprise tab characters (and possibly other means)
to express the notions of either white space, or word boundaries, or
both.

TRL effectively uses the terms "blank", or "blank character", as a
synonym for the notion of "word delimiter"; moreover, TRL tacitly
assumes that there is only one sort of "blank" (aka "space") character
in the underlying character code. (Cf. e.g. the sub-section on "Parsing
strings into words" of part 2 section 9, or the definition of the SPACE
function in part 2 section 8 of the 1st edition -- sorry, I haven't the
2nd edition at hand.) Now that we have seen that the tacit assumptions do
not hold, we have to find the best compromise between the author's original
intent, feasibility, usefulness, and adequacy (the notorious principle of
least astonishment).

In such a hermeneutic process, I came to the conclusion that

1. whenever the emphasis is on recognizing words (or tokens), REXX should
recognize (and treat interchangeably) all characters normally used
to delimit words

(these are the language features I listed under "white space", the
other day),

2. whenever a REXX construct is said to generate a single "blank"
character, this should be interpreted as "one single character that will
both act as a word-delimiter in any subsequent parsing operation and
appear as white space on any subsequent output" -- and analogously
for constructs generating several "blank" characters

(these are the language features I listed under "blank character", the
other day).

In the UCC (ISO 10646), the following characters will definitely serve
as word-delimiters, hence would be covered by item 1, above:
Space
Ideographic space
the following characters will never serve as word-delimiters, hence they
shall be treated as ordinary (non-space) characters:
Non-breaking Space
Figure space
I will have to read the final text of the standard to assess the other
sorts of space and tabulator characters.
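A Python sketch of the distinction between analysing data (any word delimiter ends a word) and generating data (one predictable blank), using the four characters named above. The sets and function names are hypothetical; `space` imitates REXX SPACE() only in rejoining the words with n pad characters. Python's own `str.split()` also splits on NO-BREAK SPACE, which is why the sketch uses an explicit set.

```python
# Hypothetical classification, taken from the characters named above.
WORD_DELIMITERS = {"\u0020", "\u3000"}  # SPACE, IDEOGRAPHIC SPACE
GENERATED_BLANK = "\u0020"              # the one "blank" the language emits
# NO-BREAK SPACE (U+00A0) and FIGURE SPACE (U+2007) are deliberately absent:
# they belong to the word they appear in.


def words(s: str, delimiters=WORD_DELIMITERS) -> list:
    """Analysing data: any word-delimiting character ends a word."""
    result, current = [], ""
    for ch in s:
        if ch in delimiters:
            if current:
                result.append(current)
                current = ""
        else:
            current += ch
    if current:
        result.append(current)
    return result


def space(s: str, n: int = 1, pad: str = GENERATED_BLANK) -> str:
    """Generating data: rejoin the words with n copies of the single blank."""
    return (pad * n).join(words(s))


print(words("a\u3000b c"))           # ['a', 'b', 'c']
print(words("1\u00a0000 FF"))        # ['1\xa0000', 'FF']
print(repr(space("a\u3000\u3000b")))  # 'a b'
```

Analysis is permissive (both delimiters split words) while generation always emits the same single SPACE, matching the two rules proposed above.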

I am still proposing that the REXX standard should
- distinguish between the notions of (word-delimiting) white space and
the blank character (generated by several language features),
- exploit the new terms consistently, and throughout (cf. my checklists
of yesterday).
Furthermore I suggest that the REXX standard should present, in an in-
formative annex, examples (based on popular character codes) of space
characters and non-space characters (cf. above).

On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof <op...@SERVER.UWINDSOR.CA>
said:


> But I still don't see any need for more than one character being a
> blank, ie. the character that can always be used as separator of
> words.

One crucial point is to distinguish analysing data from generating data.
When REXX has to analyse data, the question is not whether one character
*can* always be used, but rather whether it *will* always be used.
Believe me: it won't :-(

Another, less obvious, point is, whether indeed one-and-the-same
character will fit all word-delimiting situations. With universal
character codes, such as ISO 10646, one size does *not* fit all!
The reason is that ideographic scripts (Chinese, Korean, Japanese)
require a different space character from letter- or syllable-based
scripts. I have not made up my mind what the REXX (or any programming
language) standard should do regarding this intricacy.

On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof said:
> My concern is only for
> portability *regardless* of system. And on all those I know (of),
> SPACE is the only character they all recognize as word delimiter.

Precisely to bring about the desired portability, REXX must act
- as permissively as possible when analysing data, and
- as predictably as possible when generating data.
Your observation that SPACE is a (perhaps the) character all systems
recognize does not guarantee portability of programs and data, if at
least one system has additional means to delimit words! (By now we all
know that such systems exist, and even abound.)

On Thu, 27 Aug 92 06:30:35 LCL Anders Christensen said:
> Someone posted earlier that CMS programmers often tend to regard the
> format of data as very constant, e.g. that the interesting portion of
> the output from command XXX starts in column 42, and is 8 characters
> long. [...]

To guarantee that positional parsing templates, the SUBSTR function,
and the like work as expected, the REXX standard should explicitly
state:
Data passed directly from one REXX program to another REXX program will
be delivered unaltered; particularly, neither seemingly irrelevant
information (such as trailing white space) will be removed, nor any
form of data reduction (such as replacing sequences of blank characters
with tabulator characters) will take place.

This rule will apply to the following situations:
- arguments passed to external REXX routines,
- results yielded by external REXX functions,
- data exchange via the external data queue,
- data exchange via persistent data streams.

Note: any command sent to an external environment is subject to the
rules of that environment. Hence, when a REXX program sends a command
to cause the environment to invoke another REXX program with a
particular argument string, then the latter is subject to any data
transformations normally effected by that environment.
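Why this rule matters for positional parsing can be shown with a small Python sketch. `substr` models REXX SUBSTR (1-based, padded with blanks); the TAB compression and trailing-blank removal are hypothetical transports of exactly the kind the rule forbids.

```python
def substr(s: str, start: int, length: int, pad: str = " ") -> str:
    """Model of REXX SUBSTR(s, start, length): 1-based, padded with `pad`."""
    piece = s[start - 1:start - 1 + length]
    return piece + pad * (length - len(piece))


record = " " * 8 + "DATA"   # as written by the first program
compressed = "\tDATA"       # hypothetical transport: 8 blanks became one TAB

print(repr(substr(record, 9, 4)))      # 'DATA'
print(repr(substr(compressed, 9, 4)))  # '    ' : the field has moved
print(len(record), len(compressed))    # 12 5

padded = "ABC   "           # trailing blanks are part of the data
trimmed = padded.rstrip()   # hypothetical transport that drops them
print(len(padded), len(trimmed))       # 6 3
```

Column 9 no longer holds the field after compression, and LENGTH changes once trailing blanks are dropped, even though both records look identical on a screen.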

In article <920827005...@SERVER.uwindsor.ca> Scott Ophof writes:
> Hitting the TAB-key (etc.) eases the work of the typist. OK.
> But the data in the file should have the correct number of blanks.
> At the application level programmers should *not* need to concern
> themselves with the disposition of data in a (disk) file, as Dave
> Gomberg implies.

Will REXX always be used at the application level? Do you want to
preclude REXX being used as a system programming language?

Digression:

On Wed, 26 Aug 1992 08:56:20 GMT Anders Christensen said:

> [the character code in use] is sometimes hard to predict in advance,
> and it might differ from machine to machine, and from login-session to
> login-session.

On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof said:
> This is a tough byte to swallow...

Honestly, I am using a system of this sort. Even worse: the character
code will change with every edit session you start! This is our attempt
to cope with IBM's character code policy (can you say "un-policy"?):
various system parts (printers, terminals, compilers, word processors,
...) implement different character codes. No official, IBM defined,
I/O interface code matches the code expected by official, IBM supplied
compilers (you can either buy a terminal that correctly enters the braces
for the Pascal compiler, or the brackets, but not both). Hence, we
invoke a character translation routine whenever the user starts editing
a Pascal program, another one when he/she starts editing a TEX source,
and so on.

This byte is so tough that SHARE Europe, the European IBM users'
organisation, has been chewing on it for 12 years.

End-of-digression.

On Wed, 26 Aug 1992 08:56:20 GMT Anders Christensen said:
> Please, by all means, standardize what a blank is, but *please*, don't
> standardize it in such a way that it makes it impossible to use a
> true standard Rexx interpreter on some platforms.

Rather, standardize *how words are delimited*.

Dave Gomberg

Aug 27, 1992, 4:11:46 PM
Maybe the following compromise on standards handling of white space is
acceptable:

1. The standard throughout uses the word blank.

2. This text is added: "Some operating systems routinely use more than
one character as meaning the same, or similar to, a blank. In those
operating systems, implementers are encouraged to treat as a blank
character any of the set of characters which the vast majority of
users would expect to be treated as a blank in that operating system.
If there is no consensus among users as to which characters are
equivalent to blanks, then only the blank character should be so
implemented."

Anders Christensen

Aug 27, 1992, 8:50:53 PM
In article <REXXLIST%9208271...@UGA.CC.UGA.EDU> Dave Gomberg <GOM...@UCSFVM.BITNET> writes:

> Sender: REXX Programming discussion list <REXX...@UGA.BITNET>
> From: Dave Gomberg <GOM...@UCSFVM.BITNET>


>
> Anders, I really think you want to consider a bit more before you write.
> Your last posting was so full of things which are wrong, and other things
> with which I disagreed that it was too much work for me to bother to
> reply.

If you bother to state in public that I am wrong, then I think you should
bother to indicate where I am wrong.

> I think if you tended to focus your thoughts a bit more on the
> real issues, and less on U**x, you might attract a wider audience. [...]

At least to me, the Tab/Space problem *is* very real. The original
posting in this thread was a suggestion from Scott Ophof that the ANSI
standard should only have one space character. This, IMHO, might have
serious negative effects on ASCII machines. *That* was my concern.

-anders

Anders Christensen

Aug 27, 1992, 10:13:58 PM
In article <REXXLIST%9208271...@UGA.CC.UGA.EDU> Dave Gomberg <GOM...@UCSFVM.BITNET> writes:

> Maybe the following compromise on standards handling of white space is
> acceptable:
>
> 1. The standard throughout uses the word blank.
>
> 2. This text is added: "Some operating systems routinely use more than
> one character as meaning the same [...]

That is the "status quo" solution. It really does not say anything
more than the 2nd ed. of TRL, apart from the note that the concept of
'blank' is likely to differ between operating systems.

If no other solution is acceptable, then that is the best solution,
since it leaves TRL as is (most people acknowledge TRL as the
basis of the standard). The same solution should be chosen in other
situations where agreement is impossible to achieve. Many may not
think that it is the _perfect_ solution, since it is not very
specific, but I assume that it at least is the solution acceptable to
most parties.

With this flame-war on REXXLIST/comp.lang.rexx, one thing has at least
been achieved: the probability that the problem is going to be
discussed at the coming ANSI REXX meeting has increased ... :-)

> If there is no consensus among users as to which characters are
> equivalent to blanks, then only the blank character should be so
> implemented."

I am not really sure how to interpret this. As long as "users",
"consensus" and "be implemented" (implemented where? and by whom?) are
rather unspecified, I think that sentence should be left out.

-anders

Scott Ophof

Aug 27, 1992, 11:37:20 PM
On Thu, 27 Aug 1992 17:55:21 GMT Eric Thomas said:
>In article <ANDERS.92A...@lise3.lise.unit.no>, and...@lise3.lise.unit.no
>(Anders Christensen) writes:
>> When taking from an ASCII to an EBCDIC system (or vice versa), you
>> _must_ translate the contents. The clue is that *you* don't parse on
>>...

>Ok, time for the usual stupid question. Say I have a program that does:
> Parse var line ':'tagname'.'value' :'line

>Say I run that program on an ASCII system which recognizes the 20-odd types of
>blanks Otto showed in his posting. Where does my 'value' variable end: when the
>interpreter encounters a SPACE followed by a colon (ancient, despicable, evil
>EBCDIC-type behaviour, surely that must not be the case), or also when it
>encounters a TAB followed by a colon (nice, modern ASCII-type behaviour which
>happens to break the program because that is not what the programmer wanted
>and thought the interpreter would do)?

>In either case you have a problem. If the blank in the search string only
>stands for SPACE, it is very difficult to indicate that you want any of the 20+
>white space characters to match. You would almost need a new WSPARSE command,
>and WSPOS, and so on. If on the other hand the blank stands for any white space
>character, you have no way in the language to halt on just a SPACE when you
>need to do that. OPTIONS is not a solution, a given program may well need both
>functions very often and switching OPTIONS statements is at best impractical.

I just tried to set up the required subroutine for a number of
scenarios with the above PARSE statement, the two which Eric
mentions among them. What a can of worms! :-(
The only decent solution I can come up with that would work ANYWHERE
is to require each pair of "tagname"/"value" to be on a separate
line (or record, whatever) starting past column 1, with the initial
pair of an entry starting AT col 1. So the first colon on a line
triggers "tagname", period. And the first dot after it will trigger
"value". Any forms of leading whitespace on non-initial pairs will
then be ignored, EFFECTIVELY SIDE-STEPPING THE PROBLEM...

Of course, the PARSE statement would then be:

Parse var line ':'tagname'.'value

Please read the next para in cynical or serious mode, or a mixture;
both are applicable. (*grin*)

That the file now needs to be restructured isn't really important.
Same re it possibly becoming a mega-line file; on a record-based
machine more disk space will be eaten, especially if fixed-length.
With variable-length records, or on a byte-stream architecture, the
difference shouldn't prove disastrous.

BUT this is NO solution to the REAL problem. HELP! :-(


Regards.
$$/

Dave Gomberg

Aug 28, 1992, 1:32:04 AM

A standard mainly tells implementers what to implement, and to a lesser
extent, users what kind of code to write. If users don't know certain
funny characters are spaces, the implementers in that environment should
not implement them that way. Seems perfectly straightforward to me.

Dave Gomberg

Aug 28, 1992, 1:37:05 AM
Eric's problem generates no "can of worms" in one case, the case where
blanks are spaces, and nothing else is. Then value is terminated by
one or more spaces followed by a colon, or the end of line. Simple,
and right. The so-called problem comes from a design flaw in an
underlying system that synonymizes characters that are not at all the
same because the implementers did not understand history.

If it is required that 20 characters be synonymized because the designers
of a character set did not understand the distinction between a character
set and a printer control language, I for one don't want to buy a device
that fools around with that character set.

TDT...@pucc.princeton.edu

Aug 28, 1992, 9:40:19 AM
I'm lost at this point, partly because we're talking about two
distinct questions. The one is, "What character or characters can
delimit a word?", the other is, "Should a literal blank represent
more than a single kind of space?"


In the first, it seems to me that you want to pick out what you
regard as a word; what separates words should make no difference.
The CMS version of REXX could even benefit from an OPTIONS parameter
to modify this. I've been bitten once or twice trying to parse CP
output that contained a new line (x'15'). The TRANSLATE function
is useful in this case, but on other systems letting a word be a word
makes more sense.
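The TRANSLATE workaround can be sketched in Python on raw EBCDIC bytes: map '15'x to the EBCDIC blank '40'x, then split on blanks. The record content is hypothetical, and `cp037` is simply one EBCDIC code page that Python ships with.

```python
EBCDIC_BLANK = b"\x40"  # the EBCDIC space
EBCDIC_NL = b"\x15"     # the EBCDIC new-line character mentioned above


def cp_words(record: bytes) -> list:
    """Split an EBCDIC record into words after mapping NL to blank.

    The translate step plays the role of REXX TRANSLATE(record, ' ', '15'x).
    """
    cleaned = record.translate(bytes.maketrans(EBCDIC_NL, EBCDIC_BLANK))
    return [w for w in cleaned.split(EBCDIC_BLANK) if w]


# Hypothetical CP output: two user IDs separated by NL instead of a blank.
record = "USER1".encode("cp037") + EBCDIC_NL + "USER2".encode("cp037")
print([w.decode("cp037") for w in cp_words(record)])  # ['USER1', 'USER2']
print(record.split(EBCDIC_BLANK) == [record])         # True: one "word" without TRANSLATE
```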

The other question is illustrated by Eric's question. It seems to
me that in PARSE and elsewhere a literal string should only match
a single character string; literal blank should match just one
character. If you need to match on more than a single character,
then 'PARSE' needs to be extended, but not just to handle multiple
meanings for a blank.

Tom True
Princeton University
CIT - Advanced Applications
TDT...@PUCC.PRINCETON.EDU
(609) 258-6064

Dave Gomberg

Aug 28, 1992, 12:41:50 PM
There are really four questions:

1. How do we separate words?
2. How do we save disk space?
3. How do we save typist hassle?
4. How do we make our printed output look nice?

I claim we ought, on this list, to address only 1. The others belong on
the operating system, disk management, word processing, and desktop
publishing lists. From the perspective of question one alone, the answer
is obvious, with a blank. And just a blank (or two). And that is what
TRL documents. If your OS has let considerations leak from questions 2-4
into question 1, that is a fault in the design of that system. Nothing
on this planet is perfect, least of all mis-designed operating systems.

Jon Schmidt

Aug 28, 1992, 12:24:57 PM
In article <ANDERS.92A...@lise3.lise.unit.no>
and...@lise3.lise.unit.no (Anders Christensen) writes:
[stuff deleted]
>In ASCII, the following characters are often considered 'whitespace',
>listed in decreasing order of 'whitespaceness' (codes in decimal)
>
> ascii 32 - space
> ascii 9 - HT (horizontal tab)
> ascii 10 - LF (line feed)
> ascii 13 - CR (carriage return)
> ascii 12 - NP (new page, or FF - formfeed)
> ascii 11 - VT (vertical tab)
>
>There might be even more. And worse, in some modes, I think characters
>above 128 are space characters, like hard-space (a space that can not
>be divided between lines). In particular the HT is considered
>whitespace, since it is conceptually a number of compressed space
>characters (customarily 2-8).
>
[stuff deleted]
>By the way ... I really can't see the problem? Unix generates tabs
>and spaces as whitespace, the Unix rexx interpreter interprets both as
>blanks, No problem!
[stuff deleted]

I come from CMS and just recently began using a commercial REXX
interpreter under UNIX. The manual for this interpreter always
uses words "blank" and "blanks" but never suggests that a blank
might be anything other than a space character. My meager UNIX
experience had shown that there are some text files where TAB and
SPACE characters are not "equivalent", such as makefiles. I was
therefore astonished to find that the UNIX REXX expression
('09'X='20'X) yielded a "true" result. I later wrote a little
UNIX REXX program (see below) to investigate this anomalous (to me)
phenomenon. I was shocked to discover that this UNIX REXX
interpreter considered 41 of the 256 possible 8-bit codes in the
range '00'X to 'FF'X to be blanks! 41? Where did this come from?
As an exercise, readers are challenged to write a portable REXX
function that returns the position of the Nth blank within a string,
where N and the string are passed as arguments to the function.

Here's my little program, written to explore the commercial UNIX
REXX interpreter's definition and handling of blanks:

#!/usr/local/bin/rxx
blank=LEFT('',1) /* Get a REXX-defined blank */
SAY 'Test 1 result:' ('09'X='20'X) /* 1 */
SAY 'Test 2 result:' ('09'X=='20'X) /* 0 */
SAY 'Test 3 result:' (blank='09'X) /* 1 */
SAY 'Test 4 result:' (blank=='09'X) /* 0 */
SAY 'Test 5 result:' (blank='20'X) /* 1 */
SAY 'Test 6 result:' (blank=='20'X) /* 1 */
SAY 'Test 7 result:' COMPARE(blank,'09'X||'20'X) /* 1 */
SAY 'Test 8 result:' POS(blank,'09'X||'20'X) /* 2 */
SAY 'Test 9 result:' BlankCount(XRANGE('00'X,'FF'X)) /* 41 */
EXIT 0
BlankCount: PROCEDURE                  /* Count blanks in a string */
  nonblanks=0
  DO n=1 TO WORDS(ARG(1))
    nonblanks=nonblanks+WORDLENGTH(ARG(1),n)
  END
  RETURN LENGTH(ARG(1))-nonblanks

--
Jon_S...@stortek.com Storage Technology Corporation
(303) 673-3581 - voice 2270 South 88th Street
(303) 673-6039 - fax Louisville, Colorado 80028-5209

Edward T Spire

Aug 28, 1992, 1:41:21 PM
and...@lise3.lise.unit.no (Anders Christensen) writes:
:
: With this flame-war on REXXLIST/comp.lang.rexx, one thing has at least
: been achieved: the probability that the problem is going to be
: discussed at the coming ANSI REXX meeting has increased ... :-)

This discussion has been rather reserved compared to the real flame wars
one sees. No personal attacks, etc. Useful, too.

I'm sure this will all be discussed over more than one ANSI session.
Maybe not the next one, though; the agenda's probably already set.

-Ed

Edward T Spire

Aug 28, 1992, 1:28:15 PM
SEB...@MVS.draper.com (Steve Bacher) writes:
: In article <ANDERS.92A...@lise3.lise.unit.no>,
: and...@lise3.lise.unit.no (Anders Christensen) writes:
:
: [comments on parsing the output of the "who" command]
:
: As long as we're talking Unix, you can do
:
: "who | expand"
:
: as a portable (from one Unix to another, anyhow) workaround.

Parsing of Unix command output will never be portable from one machine
to another. I tried Anders' 'who' parser on AIX 3.2, and it didn't
work. Not because of the tab issue (uni-REXX **does** use isspace()
to determine what AIX classifies as whitespace), but because the AIX
'who' command output just plain doesn't look like the one Anders is
using! This crops up all the time on the system administration type
commands you want to use to manage disks, users, tasks, etc.

-Ed

Dave Gomberg

Aug 28, 1992, 4:03:23 PM
I think systems where you can have trivia contests like: guess what
this command (or program) does? are a real kick in the pants!

Unfortunately, I also have work to do. Too bad I can't spend all day
on such a fun system!

Anders Christensen

Aug 31, 1992, 9:13:35 AM

No. It is not straightforward.

1) It implicitly assumes that the only problem with whitespace is
whether to have one or many. Example: Suppose half the users think
that Space and Tab are whitespace, and the rest think that Space,
Tab and Formfeed are whitespace. Then, since there is no consensus
among users about whitespace, implementations can only use Space as
whitespace, even though all users agree that Tab is whitespace.

2) The term 'users' is unspecified. Is it _all_ the users, or is it
the users within one operating system? What about variants of
operating systems? What if all users in Europe agree on one thing,
and all users in America agree on another?

Does 'users' refer to all the users within one community, or just
the Rexx users? Maybe just Rexx programmers?

3) The term "consensus" is rather vague. Is it more than 50%? More
than 75%? Is it enough that _one_ person disagrees? Do
experienced users count as much as beginners?

Given these uncertainties, an implementor can choose to give it a
liberal interpretation. Then he may implement whatever he wants,
justifying it with something like:

"We asked three of our biggest customers, and after we explained the
situation, they agreed with our perception of 'whitespace'"

Or even just:

"Our interpreter targets the users who support our perception of
'whitespace'"

Now, do you see what I meant by 'unspecified'?

A better statement might be something like

"Exactly which characters are considered whitespace is
implementation-dependent, but implementors are strongly recommended
to define as whitespace only those characters commonly used as
whitespace for the operating system in question."

Or something like that .... The reason why I care is simply that I
understood your notes as a suggestion for the ANSI standard, and IMHO
such a vague statement should not appear in a standard.

-anders

PS: It's sad to hear that postings must look like the front page of
"The National Enquirer" to be read.

Edward T Spire

Aug 31, 1992, 10:16:58 AM
GOM...@UCSFVM.BITNET (Dave Gomberg) writes:
: There are really four questions:

I would like REXX to be useful in the environment in which it is to be
used. Hence if it is to parse OS generated output that contains tabs
where one would expect blanks, it is better to modify the REXX
definition to be something useful for that environment rather than
simply curse the darkness.

We're gonna have to modify some REXX definitions to make it useful
outside the narrow boundaries in which it grew up. The newer OS's do
not have an external data queue, they are generally much more case
sensitive than CMS, they support multi-programming, they will generally
have GUI human interfaces, they will support a heavily networked
environment, etc., etc. My goodness, if we must live with such a strict
interpretation of the original language definition as "words are
separated by God-given blanks and nothing else", I wonder how we're ever
gonna adapt REXX to the evolving computing environment.

-Ed

Jerry Campbell

Aug 31, 1992, 10:19:29 AM
In article 9208271...@UGA.CC.UGA.EDU, Paul Russell <PRUS...@IUBVM.BITNET> () writes:
>On Thu, 27 Aug 1992 06:30:35 LCL Anders Christensen said:
>.... lots of stuff deleted ...

>>Someone posted earlier that CMS programmers often tend to regard the
>>format of data as very constant, e.g. that the interesting portion of
>>the output from command XXX starts in column 42, and is 8 characters
>>long. That is very different from the Unix approach, in which the
>>exact column where things start is often not constant, and where the
>>number of whitespace-separated words in the output is used to find the
>>right data. The difference can be seen from these two examples
>>
>> parse var foo 42 user +8
>> parse var foo . . . user .
>>
>>Rexx is powerful enough to handle both approaches. But this will only
>>work on Unix machines, if Unix Rexx interpreters are allowed to
>>interpret all Unix' whitespace characters as blanks.
>.... more stuff deleted ...

>I've been following this discussion for several days and have found it
>both interesting and enlightening. I was a little concerned about the
>general nature of the statement cited at the beginning of the excerpt
>above, however, I let it pass without comment, because I didn't feel
>that an editorial comment would add value to the discussion. However,
>since it now appears that the original statement has been accepted as
>gospel, I think that it's time to add my 2 cents worth, to wit: *SOME*
>CMS REXX programmers (including me) do parse *SOME* things on the basis
>of hard-coded "column" numbers, *IF* the format of the data is constant.
>However, it has been my experience (however limited you might perceive
>that to be) that the format of the data is seldom constant, so, it seems
>that *MOST* CMS REXX programmers parse data on the basis of white-space-
>delimited words and/or the presence of known constants at least as often
>as they parse data on the basis of hard-coded "column numbers". Jumping
>to the conclusion that *ALL* CMS REXX programmers have an "80-column
>card" mindset is just about as valid as jumping to the conclusion that
>*ALL* U**X programmers are (insert your favorite pejorative here). 8-)
>I can hardly remember the last time that I used an 80-column card, but
>I'm sure that I was writing a grocery list on the back. 8-) pdr

Whoa! I recant! When I originally posted that "observation" I didn't mean
to typecast CMS programmers as inflexible cc oriented coders. I apologize
if you took it that way. However, I still hold that there *tends to be* a
difference in mindsets, paradigms, whatever you call it, between CMS
programmers and Unix programmers. I think this is right and natural, CMS and
Un*x are different after all. Some of us (me included) are still learning those
differences. If I'd meant that observation as an insult, I would have to
include myself as a target. I've programmed on CMS for a long time.

Ok, where are the marshmallows?!?!? Drat, hot dogs will have to do. Alright,
I'm ready, flame away! I promise, "I will never, ever, ever again say anything
that might incite opsys riots", I promise, I promise, I promise... :=)

---
Jerry Campbell reply to: zjl...@hou.amoco.com
Amoco Corp. ISD SSS/Graphics
Houston, Tx. 713/556-7036

Dave Gomberg

Aug 31, 1992, 4:19:54 PM
My statement is vague, but "commonly used" is not vague? Anders, you
are full of it. Many, many, many lines of it. Dave

Dave Gomberg

Aug 31, 1992, 4:28:12 PM
On Mon, 31 Aug 1992 14:16:58 GMT Edward T Spire said:
>I would like REXX to be useful in the environment in which it is to be
>used. Hence if it is to parse OS generated output that contains tabs
>where one would expect blanks, it is better to modify the REXX
>definition to be something useful for that environment rather than
>simply curse the darkness.

But Ed, the problem is not one other character, it is a raft (of undefined
size and contents) of other characters, and implementation (and even usage
paradigm) specific at that. Far better that each such environment provide
an "expand" tool, and a place in the language processor for it to fit, so
that the effect of "expand | rexx" is achieved.
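For readers who have not met the Unix expand command: it replaces each tab with enough spaces to reach the next tab stop. By default, tab stops fall every 8 columns. Below is a minimal C sketch of that transformation. The function name expand_tabs is hypothetical. The sketch assumes fixed tab stops every `tabstop` columns, and unlike the real command it ignores backspaces and other control characters.

```c
#include <stddef.h>

/* Copy `in` to `out`, replacing each TAB with spaces up to the next
   tab stop (every `tabstop` columns, counting from column 0).
   `out` must be large enough for the expanded text.  Returns the
   number of bytes written, not counting the terminating NUL.
   A newline resets the column; all other bytes are copied as-is. */
size_t expand_tabs(const char *in, char *out, int tabstop)
{
    size_t o = 0;
    int col = 0;

    for (; *in; in++) {
        if (*in == '\t') {
            do {                      /* always emit at least one space */
                out[o++] = ' ';
                col++;
            } while (col % tabstop != 0);
        } else {
            out[o++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[o] = '\0';
    return o;
}
```

Piping a command's output through a filter like this before parsing turns every tab into plain spaces. A REXX program that treats only '20'X as a blank then sees the same words a tab-aware program would.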

Eric Thomas

Aug 31, 1992, 8:21:57 PM
In article <1992Aug31.1...@wrkgrp.COM>, e...@wrkgrp.COM (Edward T Spire) writes:
> I would like REXX to be useful in the environment in which it is to be
> used. Hence if it is to parse OS generated output that contains tabs
> where one would expect blanks, it is better to modify the REXX
> definition to be something useful for that environment rather than
> simply curse the darkness.

I think everyone agrees to that, the question is not whether REXX should make
it easy to parse system-generated output with tabs, but how, and, in
particular, how much of the language needs to be changed to provide a workable
solution to this problem.

Do not forget that the strength of REXX is that it is easy to learn by people
without the kind of background one needs to learn C or perl in a few
hours/days. Because they can learn REXX quickly, these users can easily improve
upon the interface the system is offering them and make better use of the
computer. If such a simple problem as correct handling of tabs et al requires
complex changes to about half of the aspects/functions of the language (see
Otto's proposal, which by the way would be a very good piece of work if it
really WAS necessary to change all that stuff), you are going to end up with a
factory like SNA, VMSES, you name it.

> The newer OS's do not have an external data queue,

Up to this point I fully agree with you.

> they are generally much more case
> sensitive than CMS, they support multi-programming, they will generally
> have GUI human interfaces, they will support a heavily networked
> environment, etc., etc.

Can we please avoid silly arguments about modern vs stone-age systems? Tabs are
far from being a recent invention, and I don't see anything in REXX which would
cause problems on a system like unix where case matters everywhere, unless you
are proposing to make statements and variables case-sensitive, which I
sincerely hope you are not. Multiprogramming is too difficult to specify in a
system-independent way to directly affect REXX, at least now. Not even C,
FORTRAN or PL/I have standard, system independent multi-programming primitives,
and they are "real languages" (things you write 500k-lines packages in). GUI
(which I will consider "human" the day I suffer from a total nervous breakdown
and start thinking of people as computers I supply input to and receive output
from) and networking issues (RPC et al) have all been addressed in "real"
languages with external libraries, without one change to the language itself.

Many years ago, a CMS programmer wrote a REXX interface to GDDM, IBM's graphics
system (this was well before IBM wrote the GDDM/REXX product). Even though the
languages GDDM had been designed for were FORTRAN, COBOL and PL/I, the
interface worked fine and one could happily call PSF from REXX (PSF is a
high-level menu-driven data display utility, the kind of thing managers can
understand). Frankly, I can't picture REXX coming with a built-in set of GUI
statements, a built-in set of RPC statements, and so on. If that's what you
guys are aiming for, I'm glad I've decided to drift away from REXX.

Eric

Steve Bacher

Aug 31, 1992, 8:29:00 PM
In article <1992Aug27...@sejnet.sunet.se>,
er...@sejnet.sunet.se (Eric Thomas) writes:

>... you have no way in the language to halt on just a SPACE when you
>need to do that. ...

There's always

parse var something foo " " bar " " baz

Of course, this only works for data separated by single spaces.
Multiple spaces are still a problem.
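A literal " " pattern matches exactly one space. In contrast, REXX's word-oriented functions such as WORD() and WORDS() treat a run of blanks as a single separator. The open question in this thread is only which characters count as blanks. The C sketch below assumes Scott's proposal that only the space character does. Runs of spaces still collapse, but a tab becomes an ordinary character inside a word. The helper nth_word_space_only is hypothetical and is not taken from any REXX interpreter.

```c
#include <stddef.h>

/* Return a pointer to the n-th (1-based) word of s and store its
   length in *len, treating ONLY ' ' as a separator.  A run of spaces
   counts as one separator, as with REXX WORD(); a TAB is part of the
   word.  Returns NULL if s has fewer than n words. */
const char *nth_word_space_only(const char *s, int n, size_t *len)
{
    while (*s) {
        while (*s == ' ')          /* skip a run of spaces */
            s++;
        if (*s == '\0')
            break;

        const char *start = s;
        while (*s && *s != ' ')    /* scan one word */
            s++;

        if (--n == 0) {
            *len = (size_t)(s - start);
            return start;
        }
    }
    return NULL;
}
```

With this definition, "a<TAB>b c" has two words, "a<TAB>b" and "c". An isspace()-based definition would find three.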

Scott Ophof

Aug 31, 1992, 4:58:09 AM
On Thu, 27 Aug 1992 22:32:04 PDT Dave Gomberg said:
>...

>A standard mainly tells implementers what to implement, and to a lesser
>extent, users what kind of code to write. If users don't know certain
^^^^
A comment (partly relevant to the subject, but more to *REXX* and
programming in general):
REXX is for me the first *major* step in the direction of "being
able to program withOUT having to translate from a human language to
some form of non-READable 'language'". I hope to see this trend
continue in the future!

Maybe someday someone will combine a REXX subroutine written using
Chinese ideograms where we use IF/THEN/ELSE with another using
Hebrew characters for the keywords and send the result to Eric who
sees the result in plain English (because his computer does an
automatic translation from Hebrew & Chinese), and incorporates it
all into a new function for REXX, which Anders builds into Regina
(of course reading it in Norwegian) and uploads to those little
green men at the Univ of Marsopolis... Portability unimportant???


On Fri, 28 Aug 1992 18:22:23 GMT Eric Giguere said:
>Life is full of compromises. So are standards. I, for one, am disappointed
>in seeing an innocent question about what "blanks" are degenerate into an
>opinionated debate about the proper (or improper) design and mindsets of
>certain operating systems.

Bringing to the fore those design aspects and mindsets is necessary
to get an overview of all relevant aspects of the problem, so we can
work towards a solution which not only encompasses those aspects,
but also creates enough room in whatever solution is generated to
hopefully allow for many probably unexpected future problems.

>That's why I worry sometimes that CMS is perhaps having too much of an
>influence on the way REXX is being standardized. I hope that's not the case.
>...
>to succeed. But it would be sad if following a standard leads to failure.

I *hate* UN*X, manage to tolerate MS-DOS, and *love* CMS.
*REGARDLESS* of these feelings, I think REXX *can* grow to be a
"universally" usable and useful programming language, irrespective
of human language of the programmer and opsys.
Now you have my reason for bringing up the blank/REXX/portability
question in the first place.


On Fri, 28 Aug 1992 16:24:57 GMT Jon Schmidt said:
>...


>Here's my little program, written to explore the commercial UNIX
>REXX interpreter's definition and handling of blanks:

...[program deleted]...

Here's what it does on my PC:
  Test  Unix  PC
  --------------
    1     1    0
    2     0    0
    3     1    0
    4     0    0
    5     1    1
    6     1    1
    7     1    1
    8     2    2
    9    41    1

Somehow, I don't trust what the PC did... Anyone care to append a
column for other systems?


On Fri, 28 Aug 1992 09:41:50 PDT Dave Gomberg said:
>There are really four questions:
>1. How do we seperate words?
>2. How do we save disk space?
>3. How do we save typist hassle?
>4. How do we make our printed output look nice?
>I claim we ought, on this list, to address only 1. The others belong on
>the operating system, disk management, word processing, and desktop
>publishing lists.

REXX might well be(come) useful outside of point 1, so my question is
and remains:
What needs to be done to solve the whitespace/blank/space problem
within and between systems where it relates to REXX?


On Thu, 27 Aug 1992 15:09:14 GMT Jerry Campbell said:
>I think it's a BIG mistake for a particular interpreter implementation or
>especially the ANSI committee to make any assumptions about the data a
>Rexx program may need to deal with.

First a disclaimer: The following is *not* meant as a negative
reflection re the ANSI-REXX committee!
We can't help but make such assumptions; there are never enough
far-sighted people in a position with enough authority to "ram"
a decision through so that short-term thinkers don't get the chance
to screw up excellent long-term concepts.

>... If you step back from
>this issue a bit and consider the possible uses of Rexx as an interprocess,
>intersystem scripting language I think it's required that we not insist
>on "helping" the Rexx programmer port his programs with stuff like builtin
>*automatic* tab to space conversion and such.

> Now, expand the concept to other systems/hosts instead
>of local processes....

Yes, *please*! :-)

I know someone who *insists* that a SPACE be used to separate the
two words of her last name (she's not a unique case)...
Any idea how many programs are Broken As Designed in this respect?
Here's where the "unbreakable space" comes into play.


On Sat, 29 Aug 1992 23:36:31 GMT Eric Thomas said:
>In article <ANDERS.92A...@lise3.lise.unit.no>, and...@lise3.lise.unit.no
>(Anders Christensen) writes:

>>> Parse var data a b . ' A('c')' d
>> You use a Space character in the pattern to denote a whitespace.
>... If "search" functions interpret blanks as "any white
>space", how can I make a search on a binary key which happens to contain a $20?

No way you can then, with REXX the way it is... :-(
I was rather shocked to discover that in certain envirs a literal is
not always a literal, ie. that parts of it *can* be interpreted.
That the "certain envir" is Unix is *TOTALLY* IRRELEVANT!!!
In REXX single & double quotes have the same syntactical meaning; to
quote a literal string, and *nothing* inside those quotes is allowed
to be interpreted.

Question:
How much breakage would occur if the double quotes retained that
meaning, but the single quotes were to allow interpretation to some
extent? In other words:
Parse var data a b . " A("
means "ASCII '20'X followed by 'A('".
But:
Parse var data a b . ' A('
means "one whitespace followed by 'A('".


On Sat, 29 Aug 1992 17:02:32 GMT Eric Thomas said:
>Frankly, I am a bit surprised at the sheer amount of bytes of arguments,
>proposals and counter-proposals that this fairly simple "blanks and whitespace"
>problem has generated.

So am I, but then in the sense that there are *more* aspects to it
all than I had thought possible... I'm glad they were brought up.

> Personally I can't see what is so complicated about it
>that requires proposals of more than 200 lines which touch about each and every
>function in the language, and what is so controversial about making it easier
>for unixers to process tabs et al the way they are used to - without breaking
>anything, of course.

I see a problem in that portability seems to rate less consideration
than I feel it should in this day and age.

>It is a fact that most non-EBCDIC systems use tabs in source code, and won't
>change their minds just because of REXX. So REXX interpreters running on ASCII
>systems should accept tabs as valid white space delimiters in source code.

What happens when John copies this source program to a system that
does NOT recognize tab as whitespace, and gives it to Jack to test,
without telling him it comes from an ASCII system? Syntax error?
Would adding an error message stating use of illegal-whitespace-for-
that-system be a viable solution?

>A tab in a quoted string is a tab, not a blank, just like in C and other unix
>languages. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^
Are you *100%* *SURE* of this?


>My turn to be flamed now :-)

Well.. I'll oblige you if you *really* want it. But let's do it on
alt.flame with a discussion on how to induce the Internet and BITnet
worlds to work *together*... (*EVIL* grin) ;-)


On Sat, 29 Aug 1992 17:06:23 GMT Eric Thomas said:
>I apologize for the previous post where I quoted everything and said nothing -
>I must have hit the wrong key. Here is what I meant to say :-)

Was it an "s" instead of "d"?... >;->>>
Couldn't have been PF4 or PF6... (hehehe)


>In article <ANDERS.92A...@lise3.lise.unit.no>, and...@lise3.lise.unit.no
>(Anders Christensen) writes:

>> Eric Thomas <er...@sejnet.sunet.se> wrote:
...
>No, and in my opinion this portability concern is what keeps you guys off track
>all the time. Portability between different systems is totally irrelevant, the
>only thing that matters is portability between different REXX interpreters on
>the same type of system.

I claim that with a nice set of functions translating the relevant
system commands to a standardised REXX I/O, portability *is* *very*
definitely relevant. An external function package, of course.
And it's high time this matter is addressed, at least more seriously
than up to now...

>> Since EBCDIC
>> doesn't use Tabs, the Tabs in ASCII are translated into EBCDIC
>> Spaces.
>By the way, ASCII tabs are translated to EBCDIC "program tab". XEDIT does
>support tabs, but they are a software concept (controlled by a SET TABS editor
>command, rather than something you define on the setup screen of your
>terminal).

My apologies for not realizing sooner that we had a misunderstanding
here, Anders. It was an automatic assumption on my part that you
knew all about EBCDIC tabs. Sorry.


>> Regexps are very powerful, and definitively more powerful than the
>> ...
>... Regular expressions are not
>precisely intuitive for this type of people :-)

Yes and yes. Anyone interested in writing an RE(string,regexp,...)
function for the not-casual user of REXX (thus effectively replacing
most of the built-in functions...)?


On Sat, 29 Aug 1992 07:09:37 PDT Dave Gomberg said:
>I hope you don't feel excessively flamed by this, and it is not my
>intent to dispute your right to your opinions, but I was trying to
>figure out WHY I reacted so negatively to your postings. ...

You wanted to send this item privately, right? :-)


A (rather selective) summary up to now:
- The number of whitespace chars in char-sets *will* increase.
- EXPAND(.,.) would solve a lot when REXX is concerned with data.
It won't solve anything when source files are copied to a more
restrictive system (unless those interpreters are updated)...
How about it, IBM, Quercus (Commodore?)?
- A literal is a literal is a literal. Or is it?... Can there
be an extension/enhancement to allow a limited "maybe"?
- Let REXX be "open-minded" on reading data, and predictable in
output.
- Not much interest in overall portability. :-(


Sorry for the length. ;-) (but there's a lot of whitespace!)


Regards.
$$/

Eric Thomas

Sep 1, 1992, 12:52:41 PM
In article <920831085...@SERVER.uwindsor.ca>, Scott Ophof <op...@SERVER.UWINDSOR.CA> writes:
> Question:
> How much breakage would occur if the double quotes retained that
> meaning, but the single quotes were to allow interpretation to some
> extent? In other words:
> Parse var data a b . " A("
> means "ASCII '20'X followed by 'A('".
> But:
> Parse var data a b . ' A('
> means "one whitespace followed by 'A('".

A lot of breakage would occur, I know people who use double quotes all the time
(because they are used to C) while I always use single quotes, except that I
might use double quotes for a message text which contains a single quote, and I
also use double quotes for INTERPRET.

Regardless of compatibility with existing programs and of the fact that it
would require major reworks of the internals of the interpreters, it is simply
impossible to do without violating one of the cornerstones of REXX (the single
data type) and turning the language into a factory. Consider:

Pos(" A(",text) -> literal
Pos(' A(',text) -> interpreted
Pos("I'm a" 'double quote (")',text) -> ???
Pos(Pattern(a,b),text) -> ???
a = " A("; Pos(a, text) -> ???

The list is endless. REXX has no data type, so the interpreter cannot know what
kind of literal type an argument has.

>>It is a fact that most non-EBCDIC systems use tabs in source code, and won't
>>change their minds just because of REXX. So REXX interpreters running on ASCII
>>systems should accept tabs as valid white space delimiters in source code.
>
> What happens when John copies this source program to a system that
> does NOT recognize tab as whitespace, and gives it to Jack to test,
> without telling him it comes from an ASCII system?

Jack does XEDIT FOO REXX, sets tabs to his liking, EXPAND * and FILE. That's
one answer. Another answer is that it doesn't matter, since John's unix program
is extremely unlikely to run under VM or MVS anyway. I think I own 2 programs
which will work on any system, one makes calculations on dates and the other
parses mail headers. No wait, the second one requires Pull/Queue, so it won't
work everywhere.

>>A tab in a quoted string is a tab, not a blank, just like in C and other unix
>>languages. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ^^^^^^^^^
> Are you *100%* *SURE* of this?

#include <stdio.h>

int main(void)
{
    char *s;
    s = "\t";                /* a TAB character */
    printf("%X\n", *s);
    return 0;
}

Prints 9. Compiled with GCC, not a DEC compiler you could accuse of being
un-unixish. The author of GCC refers to VMS as "Vomit Making System" and can
hardly be accused of admiring DEC conventions and standards.

> I claim that with a nice set of functions translating the relevant
> system commands to a standardised REXX I/O, portability *is* *very*
> definitely relevant.

Ok Scott, say I give you a unix playstation with a copy of the source code for
LISTSERV (25-30k lines of REXX), and pay you by the hour to make it work under
unix. How much money will I save if I give you a REXX interpreter to make the
conversion easier?

The answer is I'll lose money, because it will take you about 1-2 months to
realize it is much faster to rewrite everything in C than to try to reuse the
REXX code with the interpreter.

Eric

Edward T Spire

Sep 1, 1992, 12:18:42 PM
SCH...@LSTC2VM.stortek.com (Jon Schmidt) writes:
: I come from CMS and just recently began using a commercial REXX
: interpreter under UNIX. The manual for this interpreter always
: uses words "blank" and "blanks" but never suggests that a blank
: might be anything other than a space character.

Yes, this is a documentation problem that will be resolved. Thanks.

: My meager UNIX
: experience had shown that there are some text files where TAB and
: SPACE characters are not "equivalent", such as makefiles. I was
: therefore astonished to find that the UNIX REXX expression
: ('09'X='20'X) yielded a "true" result.

This happens because the normal comparison operators trim blanks
(interpreted as isspace() true in unix) from the strings to be
compared.
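Ed's explanation can be sketched in C. In a non-strict REXX comparison of two non-numeric strings, leading and trailing blanks are ignored and the shorter string is padded with blanks. For equality, this amounts to comparing the trimmed strings. If blanks are decided by isspace(), '09'X and '20'X both trim to the empty string and compare equal. Strict == compares the bytes as they are. The helper names are hypothetical, and the sketch skips the numeric comparison REXX performs when both terms are numbers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Decides whether a character counts as a REXX "blank". */
typedef int (*blank_pred)(int c);

/* Blank = the space character only (Scott's proposal). */
int space_only(int c) { return c == ' '; }

/* Blank = anything isspace() accepts (the behaviour Ed describes). */
int any_isspace(int c) { return isspace(c) != 0; }

/* Skip leading and trailing blanks; return the start, store the length. */
static const char *trim_blanks(const char *s, blank_pred is_blank, size_t *len)
{
    size_t n = strlen(s);
    while (n > 0 && is_blank((unsigned char)*s)) {
        s++;
        n--;
    }
    while (n > 0 && is_blank((unsigned char)s[n - 1]))
        n--;
    *len = n;
    return s;
}

/* Non-strict "=" for non-numeric strings.  Padding with ' ' cannot make
   two different trimmed strings equal: the pad is itself a blank, and a
   trimmed string never ends in one.  So comparing trimmed forms suffices. */
int rexx_equal_nonstrict(const char *a, const char *b, blank_pred is_blank)
{
    size_t alen, blen;
    const char *as = trim_blanks(a, is_blank, &alen);
    const char *bs = trim_blanks(b, is_blank, &blen);
    return alen == blen && memcmp(as, bs, alen) == 0;
}

/* Strict "==": byte-for-byte equality. */
int rexx_equal_strict(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With any_isspace, Jon's Test 1 ('09'X='20'X) comes out true. With space_only, it comes out false. Test 2 (strict ==) is false either way.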

: I later wrote a little
: UNIX REXX program (see below) to investigate this anomalous (to me)
: phenomenon. I was shocked to discover that this UNIX REXX
: interpreter considered 41 of the 256 possible 8-bit codes in the
: range '00'X to 'FF'X to be blanks! 41? Where did this come from?

I ran your program on an RS/6000 and got 6 white space characters,
not 41. Then I ran it on an older Sun and got 48! On a newer Sun
I got 7, on an HP I got 10, and on SCO I got 44.

I used the following program to investigate further...

'uname'                    /* document Unix variant in use */
count=0
do i=0 to 255
  if d2c(i)=' '
  then do
    say d2x(i) 'is white space'
    count=count+1
  end
end
say 'total number of white space characters is' count

Here are the various outputs with some annotation...

AIX aix 3.2
9 IS WHITE SPACE ascii tab
A IS WHITE SPACE line feed
B IS WHITE SPACE vertical tab
C IS WHITE SPACE form feed
D IS WHITE SPACE carriage return
20 IS WHITE SPACE space
total number of white space characters is 6

SunOS SunOS 4.1
9 IS WHITE SPACE
A IS WHITE SPACE
B IS WHITE SPACE
C IS WHITE SPACE
D IS WHITE SPACE
20 IS WHITE SPACE
83 IS WHITE SPACE Older SunOS's are not well defined
84 IS WHITE SPACE with respect to national language support
85 IS WHITE SPACE and the usage of the high part of the
87 IS WHITE SPACE ASCII character set.
88 IS WHITE SPACE
89 IS WHITE SPACE
8A IS WHITE SPACE
8C IS WHITE SPACE
8D IS WHITE SPACE
8F IS WHITE SPACE
91 IS WHITE SPACE
94 IS WHITE SPACE
96 IS WHITE SPACE
97 IS WHITE SPACE
9A IS WHITE SPACE
9F IS WHITE SPACE
A9 IS WHITE SPACE
AB IS WHITE SPACE
AC IS WHITE SPACE
AD IS WHITE SPACE
B4 IS WHITE SPACE
B7 IS WHITE SPACE
BC IS WHITE SPACE
BF IS WHITE SPACE
C3 IS WHITE SPACE
C4 IS WHITE SPACE
C5 IS WHITE SPACE
C6 IS WHITE SPACE
CE IS WHITE SPACE
D2 IS WHITE SPACE
D7 IS WHITE SPACE
D9 IS WHITE SPACE
DD IS WHITE SPACE
E3 IS WHITE SPACE
E5 IS WHITE SPACE
E6 IS WHITE SPACE
E8 IS WHITE SPACE
E9 IS WHITE SPACE
EB IS WHITE SPACE
F2 IS WHITE SPACE
F5 IS WHITE SPACE
F6 IS WHITE SPACE
total number of white space characters is 48

SunOS SunOS 4.2
9 IS WHITE SPACE
A IS WHITE SPACE
B IS WHITE SPACE
C IS WHITE SPACE
D IS WHITE SPACE
20 IS WHITE SPACE
F7 IS WHITE SPACE
total number of white space characters is 7

scotty SCO 'uname' returns hostname instead of OS name (sigh)
9 IS WHITE SPACE
A IS WHITE SPACE
B IS WHITE SPACE
C IS WHITE SPACE
D IS WHITE SPACE
20 IS WHITE SPACE
84 IS WHITE SPACE Probably the same story as SunOS
88 IS WHITE SPACE
8C IS WHITE SPACE
8F IS WHITE SPACE
90 IS WHITE SPACE
94 IS WHITE SPACE
98 IS WHITE SPACE
9B IS WHITE SPACE
9F IS WHITE SPACE
A4 IS WHITE SPACE
A7 IS WHITE SPACE
AB IS WHITE SPACE
AF IS WHITE SPACE
B0 IS WHITE SPACE
B3 IS WHITE SPACE
B7 IS WHITE SPACE
BB IS WHITE SPACE
BF IS WHITE SPACE
C3 IS WHITE SPACE
C8 IS WHITE SPACE
CB IS WHITE SPACE
CF IS WHITE SPACE
D4 IS WHITE SPACE
D7 IS WHITE SPACE
D8 IS WHITE SPACE
DB IS WHITE SPACE
DD IS WHITE SPACE
E2 IS WHITE SPACE
E5 IS WHITE SPACE
E7 IS WHITE SPACE
E9 IS WHITE SPACE
EB IS WHITE SPACE
EF IS WHITE SPACE
F1 IS WHITE SPACE
F3 IS WHITE SPACE
F7 IS WHITE SPACE
F9 IS WHITE SPACE
FB IS WHITE SPACE
total number of white space characters is 44

HP-UX
9 IS WHITE SPACE
A IS WHITE SPACE
B IS WHITE SPACE
C IS WHITE SPACE
D IS WHITE SPACE
20 IS WHITE SPACE
E5 IS WHITE SPACE your guess is as good as mine...
EA IS WHITE SPACE
ED IS WHITE SPACE
EE IS WHITE SPACE
total number of white space characters is 10

Unix variants have not been well standardized in the past. The newer
versions (AIX 3.2, SunOS 4.2) seem pretty well behaved with respect
to the issue at hand.
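Ed's REXX loop effectively asks which byte values the C library's isspace() accepts, and the same question can be put directly in C. The C standard fixes the answer only for the "C" locale. There, exactly space, tab, newline, vertical tab, form feed and carriage return are white space. That matches the 6 characters reported for AIX 3.2. This is a sketch that assumes the interpreter delegates to isspace(), as Ed says uni-REXX does. The function names are hypothetical.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>

/* Count the byte values 0..UCHAR_MAX that isspace() treats as white
   space under the current LC_CTYPE locale (the C analogue of the
   REXX loop above). */
int count_isspace_bytes(void)
{
    int count = 0;
    for (int c = 0; c <= UCHAR_MAX; c++) {
        /* every c here is a valid unsigned char value for isspace() */
        if (isspace(c))
            count++;
    }
    return count;
}

/* Select the portable "C" locale first.  In that locale the C standard
   permits only ' ', '\t', '\n', '\v', '\f' and '\r', so the result is 6.
   Note that setlocale() changes process-wide state. */
int count_isspace_bytes_c_locale(void)
{
    setlocale(LC_CTYPE, "C");
    return count_isspace_bytes();
}
```

On a system whose locale tables also mark some upper-half bytes as white space, count_isspace_bytes() can return more than 6. That would be consistent with the older SunOS and SCO results above.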

-Ed

Edward T Spire

Sep 2, 1992, 12:20:52 PM
er...@sejnet.sunet.se (Eric Thomas) writes:
: Ok Scott, say I give you a unix playstation with a copy of the source code for
: LISTSERV (25-30k lines of REXX), and pay you by the hour to make it work under
: unix. How much money will I save if I give you a REXX interpreter to make the
: conversion easier?
:
: The answer is I'll lose money, because it will take you about 1-2 months to
: realize it is much faster to rewrite everything in C than to try to reuse the
: REXX code with the interpreter.

1. OS macros written in REXX (and other programs that are mostly OS
commands) are hard to port. However...

2. REXX is also used as a general purpose programming language. Such
programs are much easier to port. And...

3. REXX is used as a macro language for other applications (XEDIT
and ISPF come to mind) that have themselves been "ported". XEDIT macros
ported from CMS to Unix port very nicely indeed (since the primary
addressable environments are very similar). Finally...

4. Even if you did need to essentially re-implement a large OS macro
on Unix, doing it in REXX again may be your best choice. You cannot be
as productive in C or PERL (unless you code these languages all day
long every day, and even then I doubt it...) Hence there may well be
some rexx code that is retained.

REXX certainly adds value to new environments, and we should think
seriously about how it will best fit into those new environments.
Although we may not all agree, I believe we are all rapidly headed
towards being involved in new computing platforms.

-Ed

Paul Gilmartin

Sep 2, 1992, 5:28:25 PM
Edward T Spire (e...@wrkgrp.COM) wrote:

: 3. REXX is used as a macro language for other applications (XEDIT
: and ISPF come to mind) that have themselves been "ported". XEDIT macros
: ported from CMS to Unix port very nicely indeed (since the primary
: addressable environments are very similar). ...

I wouldn't quite say "very nicely". Here are the problems I encountered
trying to convert some of my XEDIT macros from CMS to uni-XEDIT/uni-REXX:

extract /uniqueid doesn't work

extract /size doesn't work

extract /lscreen doesn't work

extract /update doesn't work

extract /terminal doesn't work

extract /cursor doesn't return cursor.5 ... cursor.8

extract /ring doesn't work

Search for external functions is case-sensitive

command pfile and command pquit not defined

command line is not stacked on entry to PF key macro. This has been
an undocumented feature of CMS XEDIT. However, in response to my
Reader's Comment Form ES592, IBM has agreed the feature needs to be
documented. As soon as the feature appears in IBM doc, I'll report
it as a defect to TWG.

Eric Thomas

Sep 2, 1992, 8:33:50 PM
In article <1992Sep2.1...@wrkgrp.COM>, e...@wrkgrp.COM (Edward T Spire) writes:
> 2. REXX is also used as a general purpose programming language. Such
> programs are much easier to port. And...

True. I own one such program, and maybe a thousand that won't work on anything
but CMS. REXX is fine for prototyping or non-CPU-intensive calculations, but as
soon as you're talking about real math-type programs, REXX is simply out of the
question. By the time the interpreter finishes your job, you will have had time
to convert it to FORTRAN (or PASCAL, C, whatever language you know best).

> 3. REXX is used as a macro language for other applications (XEDIT
> and ISPF come to mind) that have themselves been "ported". XEDIT macros
> ported from CMS to Unix port very nicely indeed (since the primary
> addressable environments are very similar).

No question here. But wouldn't the unix version of XEDIT exhibit the same kind
of behaviour as the CMS version? Or does it, too, use tabs in the data returned
by (say) EXTRACT?

> 4. Even if you did need to essentially re-implement a large OS macro
> on Unix, doing it in REXX again may be your best choice. You cannot be
> as productive in C or PERL (unless you code these languages all day
> long every day, and even then I doubt it...)

I hope this is a joke. I don't want to make a 300-line post explaining why
REXX is unsuitable for large applications, but believe me, you will spend more
time trying to reach your business goals (especially in the area of performance)
than you could ever possibly waste rewriting all the REXX library functions in C
and then keying in the extra keystrokes C requires. Note that I despise C and
will go a LONG way to avoid having to use it.

Eric

Dave Gomberg

Sep 1, 1992, 6:01:06 PM
On Tue, 1 Sep 1992 16:18:42 GMT Edward T Spire said:
>: I later wrote a little
>: UNIX REXX program (see below) to investigate this anomalous (to me)
>: phenomenon. I was shocked to discover that this UNIX REXX
>: interpreter considered 41 of the 256 possible 8-bit codes in the
>: range '00'X to 'FF'X to be blanks! 41? Where did this come from?
>
>I ran your program on an RS/6000 and got 6 white space characters,
>not 41. Then I ran it on a older Sun and got 48! On a newer Sun
>I got 7, on an HP I got 10, and on SCO I got 44.

Now there is an operating system you can really sink your teeth into!

Turgut Kalfaoglu

Sep 3, 1992, 8:38:01 AM
Btw, there is now a Rexx->C converter being beta tested (on the users!)
for OS/2 2.0. It's called RexxTacy. From what I saw, it's a complete
library of Rexx functions, and some 'mapper' code that translates your
rexx code. I should say "it will be complete." It clearly needs more
work before large things can be ported with it.

PS: Available from ftp-os2.nmsu.edu, among other places.

-turgut

Turgut Kalfaoglu

Sep 3, 1992, 9:56:21 AM
Does anyone know if there is a profiler available for Rexx? (one that
would show how often functions are called, how much time was spent in
each one, etc.) I found such a thing for C, and I was really impressed
with it. Like Eric, I have to maintain a rather large, but similar :)
tool. -turgut

Scott Ophof

Sep 3, 1992, 4:52:48 AM
On Tue, 1 Sep 1992 16:18:42 GMT Edward T Spire <e...@WRKGRP.COM> said:
>SCH...@LSTC2VM.stortek.com (Jon Schmidt) writes:
>: ...

>: I later wrote a little
>: UNIX REXX program (see below) to investigate this anomalous (to me)
>: phenomenon. I was shocked to discover that this UNIX REXX
>: interpreter considered 41 of the 256 possible 8-bit codes in the
>: range '00'X to 'FF'X to be blanks! 41? Where did this come from?
>I ran your program on an RS/6000 and got 6 white space characters,
>not 41. Then I ran it on a older Sun and got 48! On a newer Sun
>I got 7, on an HP I got 10, and on SCO I got 44.
...[program & sub-results deleted]...

>Here are the various outputs with some annotation...
>AIX aix 3.2
>total number of white space characters is 6
>SunOS SunOS 4.1
>total number of white space characters is 48
>SunOS SunOS 4.2
>total number of white space characters is 7
>scotty SCO 'uname' returns hostname instead of OS name (sigh)
>total number of white space characters is 44
>HP-UX
>total number of white space characters is 10

Running Jon Schmidt's program on an Amiga gave the same results as
my test of it on the PC. Thanks, Bill Hogsett, for passing on the
result of your test.
Did any other systems return results as surprising as SunOS 4.1
and SCO did? For example OS/2, Cyber, VMS, or Macintosh?


Regards.
$$/

Scott Ophof

Sep 3, 1992, 4:55:26 AM
On Tue, 1 Sep 1992 16:52:41 GMT Eric Thomas <er...@SEJNET.SUNET.SE> said:
>In article <920831085...@SERVER.uwindsor.ca>, Scott Ophof
><op...@SERVER.UWINDSOR.CA> writes:
>> Question:
>> How much breakage would occur if the double quotes retained that
>>...

>> Parse var data a b . " A("
>> means "ASCII '20'X followed by 'A('".
>> Parse var data a b . ' A('
>> means "one whitespace followed by 'A('".
>A lot of breakage would occur, I know people who use double quotes all the time
>...

Yes, I was already afraid of this, without even considering the
only-one-datatype basic mindset of REXX. :-(
Maybe it's a failing on my part, but I've never seen a need for more
than one datatype, even when it leads to its own unique astonishment
factor in REXX... (wry grin)


>>>A tab in a quoted string is a tab, not a blank, *just like in C and other unix
>>>languages*.
>> Are you *100%* *SURE* of this?
>main()
>...
>Prints 9.

Your program is a nice way to emulate the C2X() function in C. :-)
What I meant is the difference between the meaning of single and
double quotes in (at least) the C-shell, which (as I have been
informed) do *not* have identical meanings... Maybe I shouldn't
have taken the C-shell syntax and conventions as a language?


>> I claim that with a nice set of functions translating the relevant
>> system commands to a standardised REXX I/O, portability *is* *very*
>> definitely relevant.

>Ok Scott, say I give you a unix playstation with a copy of the source code for
>LISTSERV (25-30k lines of REXX), and pay you by the hour to make it work under
>unix. How much money will I save if I give you a REXX interpreter to make the
>conversion easier?

Um.. If you were to pay me significantly less "under the table",
you'd have a nice tax shelter... (hey, I'm joking here!) :-)

>The answer is I'll lose money, because it will take you about 1-2 months to
>realize it is much faster to rewrite everything in C than to try to reuse the
>REXX code with the interpreter.

Agreed re existing applications which are heavily based on the
environment they were originally developed in, without real regard
to portability. Note that this is a *general* statement.

Portability in the future seems to depend on creating interfaces
(REXX function packages) that *will* allow the (casual) programmer
to write in a non-opsys-specific manner. Whitespace/blank/space is
just one of the things that need to be resolved in such a way that
the chance of portability is increased as much as possible.
And I don't think I need to add anything to Ed's 4 points on why
portability *is* important.

As to the whitespace/blank/space issue (and the like), are there
any points which haven't been mentioned, but (may) need to be
considered in the light of *portability*? Like other character
(classes)?


Regards.
$$/

Scott (nmi) Mattes

Sep 3, 1992, 10:42:52 AM

There is one for CMS. It is HISTOREX and is available on VM-UTIL or
via FTP at KSUVM in the VMTOOLS directory as file WORK53 VMARC.

--------------------------------------+----------------------------------------
Scott (nmi) Mattes | No success can compensate for failure
Work: COT...@SEA04VM.NAVSEA.Navy.Mil | in the home. David O. McKay, Prophet
Home: 73027...@CompuServe.COM | and President of the Church of Jesus
Voice: (703) 769-2917 | Christ of Latter-day Saints, 1951-70

Mike Meyer

Aug 29, 1992, 2:37:35 AM
In <920826032...@SERVER.uwindsor.ca>, op...@SERVER.UWINDSOR.CA (Scott Ophof) wrote:
> Question: When are blanks not only the space character?
> Answer: In any case in Unix.
>
> My point?
> I would hate to have to port to CMS any REXX program written for
> Unix (or PC); to have a program fail due to something like this
> would not be very easy to debug...

You're right. Of course, a failure caused by application A accepting
tabs as whitespace while Rexx doesn't isn't easy to debug either. I
speak from experience: ARexx doesn't treat tab as whitespace, but the
rest of the Amiga generally does. As a result, I universally run text
files through a detab program before letting ARexx read them.

> My suggestion?
> In the interest of increasing the chance of successful porting, to
> request the ANSI-REXX committee to define that the *only* blank/
> whitespace recognized in standard REXX is the SPACE character (ASCII
> hex-20, EBCDIC hex-40).
>
> Your comments? :-)

I disagree. What "blanks" are should be system/compiler-dependent.
Then people can choose the quality of implementation they desire.

<mike

Edward T Spire

Sep 4, 1992, 11:46:21 AM
er...@sejnet.sunet.se (Eric Thomas) writes:
: In article <1992Sep2.1...@wrkgrp.COM>, e...@wrkgrp.COM (Edward T Spire) writes:
: > 3. REXX is used as a macro language for other applications (XEDIT
: > and ISPF come to mind) that have themselves been "ported". XEDIT macros
: > ported from CMS to Unix port very nicely indeed (since the primary
: > addressable environments are very similar).
:
: No question here. But wouldn't the unix version of XEDIT exhibit the same kind
: of behaviour as the CMS version? Or does it, too, use tabs in the data returned
: by (say) EXTRACT?

Not normally, but if you were editing a makefile that had tabs as part
of the data lines and you extracted a data line...

: > 4. Even if you did need to essentially re-implement a large OS macro
: > on Unix, doing it in REXX again may be your best choice. You cannot be
: > as productive in C or PERL (unless you code these languages all day
: > long every day, and even then I doubt it...)
:
: I hope this is a joke. I don't want to make a 300-line post explaining why
: REXX is unsuitable for large applications, but believe me, you will spend more
: time trying to reach your business goals (especially in the area of performance)
: than you could ever possibly waste rewriting all the REXX library functions in C
: and then keying in the extra keystrokes C requires. Note that I despise C and
: will go a LONG way to avoid having to use it.

I am not joking, nor are the other folks who choose this route.
Hardware performance bottlenecks are becoming less important as
price/performance ratios change. Note that I'm writing this on a
25 MIP RS/6000 that is essentially a single user system.

-Ed

Edward T Spire

Sep 4, 1992, 12:24:19 PM
p...@sanitas.stortek.com (Paul Gilmartin) writes:

OK, "nicely", not "very nicely". When the "cloning" process is
completed, then it **will** be "very nicely".

Of course, some of these things don't port cleanly. Terminal, for
example. Case sensitivity in the search for external functions is another
sticky issue.

-Ed

Dave Gomberg

Sep 6, 1992, 8:56:46 PM
Could you show the code you find to be unobvious as to interpretation?

PARSE EXPAND VALUE 'ab\tcd' WITH what?

To me this means (assuming that in the particular OS 'ab\tcd' is the same
as 'ab      cd', i.e. tab to column 9)

PARSE EXPAND VALUE 'ab      cd' WITH the same thing. Dave

Charles R. Martin

Sep 6, 1992, 6:34:14 PM
In article <REXXLIST%9209051...@UGA.CC.UGA.EDU> GOM...@UCSFVM.BITNET (Dave Gomberg) writes:

> On Sat, 5 Sep 1992 17:50:20 GMT Charles R. Martin said:
>>1. I don't think any sensible interpretation of pos('a b'...) would
>>   permit it to match on either 'a<blank>b' or 'a<tab>b'. I'd certainly
>>   consider such an implementation buggy.
>
> I am glad I don't run an implementation you wrote. I would sue an
> implementer who would ever return something besides 1 from pos(x,x),
> even if x had blanks in it (unless, of course, x was the null
> string). Why would they even matter? Have you read the definition
> of pos?

Um, are you sure that you read what *I* wrote? In English, "either ...
or" is *inclusive* or. What I said is that I would be displeased by an
implementation of pos which matched "a b" with either "a<space>b" OR
"a<tab>b". In other words, I'd expect pos("a b", "a b") to return 1 and
pos("a b", "a<tab>b") (with a real tab character in place of <tab>) to
return 0.
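The byte-for-byte behaviour Charles Martin expects is exactly what a plain substring search gives. A minimal C sketch of a POS()-like function built on strstr(); the name rexx_pos() is hypothetical, and it is assumed to follow REXX in returning a 1-based position, with 0 for "not found" and for a null needle:

```c
#include <string.h>

/* POS(needle, haystack) in the REXX style: the 1-based position of the
 * first occurrence of `needle` in `haystack`, or 0 if it does not occur.
 * Comparison is byte for byte, so a blank (0x20) in the needle matches
 * only a blank, never a TAB. */
size_t rexx_pos(const char *needle, const char *haystack)
{
    if (*needle == '\0')
        return 0; /* a null needle is reported as "not found" */
    const char *hit = strstr(haystack, needle);
    return hit == NULL ? 0 : (size_t)(hit - haystack) + 1;
}
```

With this definition rexx_pos("a b", "a b") is 1 and rexx_pos("a b", "a\tb") is 0, whatever the platform considers white space.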


--
Charles R. Martin/(Charlie)/mar...@cs.unc.edu/(ne c...@cs.duke.edu)
O/Dept. of Computer Science/CB #3175 UNC-CH/Chapel Hill, NC 27599-3175
H/3611 University Dr #13M/Durham, NC 27707/(919) 419 1754
----------------------------------------------------------------------
"I am he who walks the States with a barb'd tongue, questioning every
one I meet,/Who are you that wanted only to be told what you knew
before?/ Who are you that wanted only a book to join you in your
nonsense?" _Leaves of Grass_ xxiii.4.

Dave Gomberg

Sep 5, 1992, 3:56:47 PM
On Sat, 5 Sep 1992 15:44:05 -0400 cultural elite said:
>PARSE EXPAND is not such a simple solution, as in some situations it will
>require tracking both the expanded and unexpanded copies of input lines.

No, it wouldn't. Why would it? PARSE EXPAND VALUE psrc WITH template means
the same as $temp=expand(psrc); PARSE VAR $temp template, right?

Mike Meyer

Aug 30, 1992, 2:49:29 AM
In <1992Aug26....@hou.amoco.com>, zjl...@hou.amoco.com (Jerry Campbell) wrote:
> Just an observation from someone who's done a LOT of Rexx programming
> under CMS and not a whole lot under Unix. The commonly accepted "style"
> (at least in my shop) for constructing data streams to feed to Rexx or
> for interpreting system information (msgs, cmd output, what not) many
> times leads to code such as this:
>
> Parse Var sometin 1 parm1 +4 5 parm2 +4 ....
>
> All very byte position oriented. The programmer expects to interpret things
> in terms of "card columns". Many programs are written with that kind of
> dependency on CMS. Unix seems to require a different conceptualization
> of data streams.

Quite right - unix programs use whitespace to separate output items,
and expect whitespace to separate input items. Trying to use stock
Unix tools on position-oriented data with no seperation is
frustrating, at best.

<mike
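Jerry Campbell's template 1 parm1 +4 5 parm2 +4 assigns columns 1-4 to parm1 and columns 5-8 to parm2. The C sketch below mimics that positional extraction (column_field() is a hypothetical helper) and shows why such code is fragile on Unix-style data: every byte, a TAB included, occupies exactly one column, so a tab typed in place of several blanks shifts every later field.

```c
#include <stddef.h>

/* Copy up to `len` characters starting at 1-based column `start` of
 * `src` into `dst` (which must hold len + 1 bytes), in the spirit of
 * REXX positional parsing such as "1 parm1 +4 5 parm2 +4".  Columns
 * past the end of `src` give a shorter, possibly empty, result.  Each
 * byte, including a TAB, counts as exactly one column. */
void column_field(const char *src, size_t start, size_t len, char *dst)
{
    size_t i = 0;
    /* Advance to the start column without running past the terminator. */
    while (start > 1 && *src != '\0') {
        src++;
        start--;
    }
    while (i < len && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
}
```

On "ABCDEFGH" the two fields come out as "ABCD" and "EFGH"; on "AB\tEFGH" the second field starting at column 5 is "FGH", because the tab counts as a single column.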

Eric Thomas

Sep 5, 1992, 10:26:54 PM
In article <920905194...@lns598.TN.CORNELL.EDU>, cultural elite <d...@LNS598.TN.CORNELL.EDU> writes:
> I agree, that would be ridiculous. Not even Unix (with the possible exception
> of a few broken utilities) behaves that way.

With the possible exception of, for instance, scanf(), the C formatted read
routine. Ever tried to use scanf() to read a one-word reply from the terminal,
and use a default value if the user just hits RETURN? scanf() insists that all
these carriage returns are just white space which should be happily ignored,
and the user wonders what is going on - the more he hits RETURN, the more
nothing happens. Yes, I know, there are ways out, but the least you can say is
that they are unintuitive...
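One of the ways out is to stop asking scanf() to read the terminal directly: read the whole line with fgets() and only then look for a word inside it. A minimal C sketch, in which one_word_reply() and the 63-character reply limit are hypothetical choices:

```c
#include <stdio.h>

#define REPLY_MAX 63 /* longest reply kept; `word` needs REPLY_MAX + 1 bytes */

/* Interpret one line of input, as returned by fgets(), as a one-word
 * reply, copying `deflt` into `word` when the line holds no word at
 * all (for example when the user just pressed RETURN).  Because the
 * line has already been read, a lone newline ends the reply here
 * instead of being skipped as white space, which is what
 * scanf("%s", ...) applied directly to stdin would do. */
void one_word_reply(const char *line, const char *deflt, char *word)
{
    /* sscanf() returns EOF when the line is empty or all white space. */
    if (sscanf(line, "%63s", word) != 1) {
        size_t i = 0;
        while (i < REPLY_MAX && deflt[i] != '\0') {
            word[i] = deflt[i];
            i++;
        }
        word[i] = '\0';
    }
}
```

A caller would first do something like `if (fgets(line, sizeof line, stdin) != NULL) one_word_reply(line, "yes", word);`, so a bare RETURN yields the default at once.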

> Mike's question was specifically about word separators, which
> I take to mean the behavior of parse when no explicitly quoted separators are
> given, and the behavior of word() and words() and a few other intrinsics.
> Accepting any whitespace as word separators would have no effect on your
> example, or any other involving explicitly quoted blanks. Where is the problem
> with that?

The problem is that if you want to change the language definition of all these
functions, you have to come up with a solution that is clearly defined and
works for all systems, not just unix. One such solution, which is what I
understood all the previous postings were proposing, is that positional PARSE
et al are to treat any character for which isspace() returns TRUE as white
space.

Under VM, isspace() is true for X'05' and X'15'. It is logical for isspace() to
be true for these characters, since they are the EBCDIC equivalents of TAB and
NL (newline), respectively; as far as I know, all the C compilers for VM agree on this.
I'm not really eager to find out what happens to my tens of thousands of lines
of code the day X'15' is suddenly treated as a blank. It is used as a
line-separator in functions like DIAG, and it is common programming practice to
use it as a line separator when returning more than one line of (non-binary)
data from a function. I suppose a similar situation exists for \n on unix
systems, but I've never used REXX on unix so I wouldn't know.

Eric

Charles R. Martin

Sep 5, 1992, 1:50:20 PM
I hate to throw a hint of reason into an otherwise promising flame war,
but I'm not at all clear on what the problem is.

1. I don't think any sensible interpretation of pos('a b'...) would
permit it to match on either 'a<blank>b' or 'a<tab>b'. I'd certainly
consider such an implementation buggy.

2. On the other hand, I don't quite see the problem in, say, parsing a
line with 'parse <source> foo bar' as separating the foo and bar
substrings with arbitrary whitespace. That behavior would both fit
standard conventions in widely-used interactive environments AND
seem consistent with other rexx conventions. (After all, even a new
rexx programmer like me has noticed that parse doesn't care if it is
only one blank space or several between the substrings. Nor would
one want it to be different.)

3. The use of tabs vs spaces in other systems is very generally such
that columnar counts count a TAB char as a SINGLE character, i.e.
"a\tb" is a three character string in C. It more or less must be
that way since the 'padded' interpretation of a tab is dependent on
the context -- the tab in "a\tb" isn't the same apparent width as the
tab in "ab\tc", and the problem only gets worse in a proportional
font.

So what's the problem with

parse arg foo bar

parsing with the blank being an assumed regular expression of [ \t]+,
but with columnar counts remaining as they always have been? You could
always parse with foo' 'bar if you want to insist on blanks by ghod.
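The proposal changes only which characters count as word separators. A minimal C sketch of a WORDS()-like counter that takes the separator set as a parameter (count_words() is hypothetical) makes the two readings easy to compare. Character positions are untouched either way, since "a\tb" is still three characters long.

```c
#include <string.h>

/* Count words in `s` in the manner of REXX WORDS(), where `blanks`
 * lists every character treated as a word separator: " " for the
 * strict reading, " \t" for the [ \t]+ proposal.  A run of several
 * separators counts as a single break. */
size_t count_words(const char *s, const char *blanks)
{
    size_t words = 0;
    int in_word = 0;
    for (; *s != '\0'; s++) {
        /* *s is never '\0' here, so strchr() cannot match the terminator. */
        if (strchr(blanks, *s) != NULL) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

With " " as the separator set, "foo\tbar" is one word; with " \t" it is two, while "foo bar" is two words under either rule.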

Dave Gomberg

Sep 7, 1992, 1:32:37 AM
Well, the Unix folks have rather dug themselves a hole on that one,
haven't they, Charlie. It's not just a question of parse, what about
substr, or left/right. They will have the same problem. I guess we
could say that just because c=a||b doesn't mean expand(c)=expand(a)||
expand(b). But we knew that already. And your parse example is just
this problem in another guise. In fact, the Unix folks have a worse
problem. You can have c=a||b, but pos(a,c)=0! Wow! I don't think
I want that in MY os. But there is no accounting for taste! I would
really like Anders (I assume the first letter is in fact capitalized)
and other Unix types to say if they can stand it that:

c=a||b; say pos(a,c)

would say 0? It is not what I mean by REXX. Sorry this rambles so. Dave

Dave Gomberg

Sep 7, 1992, 1:28:31 AM
OK, Charlie. I get your point. But the Unix folks claim they need to
match the tab. Some of us say the solution is to replace the tab by
n blanks where n is some number they decide is right in context. But
of course, 'a b' does not occur in 'a    b'. So I suppose how many blanks
they decide to expand the tab to is critical in the case of pos. But for
WORD/WORDS and PARSE with a variable list, it may not matter. I guess we
need a better feeling for what Unix folks think they need. Being a PC
folk myself, I don't much care, but we should try to help if we can.

cultural elite

Sep 5, 1992, 3:44:05 PM
Eric Thomas <er...@SEJNET.SUNET.SE> writes:
>In article <mwm....@contessa.palo-alto.ca.us>, m...@contessa.palo-alto.ca.us
(Mike Meyer) writes:
>> A question for all of you using EBCDIC who don't think that arbitrary
>> whitespace should be allowed to separate words: Why do you care?
>
>I have to use unix systems from time to time to get my job done. If some day
>they have a REXX interpreter, I would like to use it to make my job easier and
>for that it has to behave predictably. If I code POS('A B',string), I want it
>to locate A-blank-B, not A-tab-B. I have said that 200 times already, maybe you
>should read more carefully before you flame?

I agree, that would be ridiculous. Not even Unix (with the possible exception
of a few broken utilities) behaves that way. But is anyone actually advocating
that position? Mike's question was specifically about word separators, which
I take to mean the behavior of parse when no explicitly quoted separators are
given, and the behavior of word() and words() and a few other intrinsics.
Accepting any whitespace as word separators would have no effect on your
example, or any other involving explicitly quoted blanks. Where is the problem
with that?

>> A similar problem shows up in the ANSI C standard. It has this idiotic
>> restriction that external identifiers must be unique in the first six
>> characters, case insensitive. I don't know of _anybody_ who liked
>> this, but there was a very popular system which had a standard linker
>> with that antiquated restriction. Removing this brain-dead restriction
>> meant that you couldn't write an ANSI C compiler for that system,
>> which was considered a bad thing - so the restriction stayed in.
>
>In case I interpreted the snide remark correctly

Why were you looking for a snide remark? It was an illustrative example.
I don't remember reading any requirement that illustrative examples be
(a) snide or (b) only about EBCDIC systems.

>It will make REXX hard to use for people who, for religious reasons, refuse to
>type PARSE EXPAND instead of PARSE wherever necessary. That is their problem,
>as far as I am concerned; I have no such religious diktat.

PARSE EXPAND is not such a simple solution, as in some situations it will
require tracking both the expanded and unexpanded copies of input lines.

--
Dan Riley Internet: d...@lns598.tn.cornell.edu
Wilson Lab, Cornell University HEPNET/SPAN: lns598::dsr (44630::dsr)
"Maybe, leastways is the best way of all" -Caterwaul

Charles R. Martin

Sep 6, 1992, 6:39:49 PM
In article <REXXLIST%9209051...@UGA.CC.UGA.EDU> GOM...@UCSFVM.BITNET (Dave Gomberg) writes:

I think the problem lies in repeated applications of parse, say with
columnar fields. Say I parse the string "ab\tc" -- importing a little C
terminology to make clear where the tab lies -- with parse expand. The
tab char should come out to be some number of blanks, as I understand
it. (Probably 6 for canonical tabs.) But if I parse off the first
field, THEN parse the resulting "\tc", the tab now wants to expand to 8
characters. You can't keep the two interpretations straight without
storing information about the line as it was for later use by parse.

Eric Thomas

Sep 5, 1992, 9:47:20 AM
> In <1992Aug27...@sejnet.sunet.se>, er...@sejnet.sunet.se (Eric Thomas) wrote:
>> Ok, time for the usual stupid question. Say I have a program that does:
>>
>> Parse var line ':'tagname'.'value' :'line
>>
>> Say I run that program on an ASCII system which recognize the 20-odd types of
>> blanks Otto showed in his posting. Where does my 'value' variable end, when the
>> interpreter encounters a SPACE followed by colon (ancient, despicable, evil
>> EBCDIC-type behaviour, surely that must not be the case), or also when it
>> encounters TAB followed by colon (nice, modern ASCII-type behaviour which
>> happens to break the program because that is not what the programmer wanted
>> and thought the interpreter would do).
>
> What you want it to be is whatever whoever generated the input thought
> it would be.

Ah yes, the interpreter uses its divinatory powers to read the mind of whoever
generated the input? Seeing a literal of ' :', it thinks "Aww, this must be one
of these NAMES file IBM thingies, let's stop on a blank only". Seeing ': ', it
thinks "Ah, this is clearly someone parsing an RFC822 header, let's stop on any
white space".

> I.e. - I have to accept both space and tab as space, because the
> standards document I'm using _says_ I do.

Really? Damn. I have all this mail software in REXX on my VM machine where the
interpreter only stops on blanks, how can I possibly have made it respect the
RFC? It must have cost me thousands of lines of code. Let me check it. Oh, it
looks like all it took was a function call on ONE line out of several thousand,
the one in the loop that reads the incoming header and calls the parser! But to
think I had to key in an extra 30 characters when THE SYSTEM should have known
what I meant to do and done it for me :((((((( I will petition the ANSI REXX
committee to change the language so I never have to do that again!!!!!!

Anyway, I understand the issue much better now: we're back to the usual
religious arguments. People like Mike will never accept a solution which they
perceive to be un-unixish. Well, as Mike pointed out, I don't really give a damn
since I'm moving away from REXX, but I find this both funny and saddening. REXX
is intrinsically un-unixish: identifiers and statements are not case sensitive,
the language is verbose, very few special characters are used, strings are not
terminated by \0, the default value of a variable is its name in UPPER case,
and so on. REXX has its own personality, and it can never become unixish no
matter how hard you try. So I find it strange that people who want to unixify
REXX even bother, when there are a lot of other unix scripting languages to
choose from. On the other hand, I am not really surprised - the spirit of the
crusades will live forever, I suppose this is human nature.

Eric

Charles R. Martin

Sep 6, 1992, 10:42:09 PM
In article <REXXLIST%9209062...@UGA.CC.UGA.EDU> GOM...@UCSFVM.BITNET (Dave Gomberg) writes:

> Could you show the code you find to be unobvious as to interpretation?
>
> PARSE EXPAND VALUE 'ab\tcd' WITH what?
>
> To me this means (assuming that in the particular OS 'ab\tcd' is the same
> as 'ab      cd', i.e. tab to column 9)
>
> PARSE EXPAND VALUE 'ab      cd' WITH the same thing. Dave

The problem would come with multiple applications. Now, be kind on
technical syntax here (um, is that spelled syntaxx for rexx?) because
it's a new language for me and I don't have a manual at hand. But
consider

parse expand value 'ab\tcd' with fld_a 01 02 rest_a

where you should interpret the "fld_a 01 02" as the appropriate field
description for a two column field taking the first two characters. I
think we're agreed that the result should be

fld_a="ab"
rest_a="      cd" (modulo me counting blanks right on that line.)

so that if we did

parse expand value 'ab\tcd' with line_a

then we'd expect line_a to equal fld_a||rest_a

But what if we instead

parse value 'ab\tcd' with fld_b 01 02 rest_b

and follow that with

parse expand value rest_b with line_b

so that the expansion happens in the second parse and not the first?
How many blanks must be added to replace the TAB character? In the
first case, we added 6 characters -- but to preserve the meaning of TAB
as 'tab to column 9' we must add *8* characters in the second; suddenly
we find that line_b DOES NOT equal fld_b||rest_b, which is aberrant.
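The arithmetic can be checked with a short C sketch. The helper expand_at() is hypothetical and assumes tab stops every 8 columns, so the first stop is column 9 as in the examples above. Its `col` argument, the 0-based column at which the text starts, is one way to keep the extra information about the line that a later PARSE EXPAND would need.

```c
#include <stddef.h>

/* Expand TABs in one line of text `src` as if it began at 0-based
 * column `col`, with tab stops every 8 columns (columns 9, 17, ...
 * counting from 1).  Writes at most dstsize - 1 characters plus a
 * terminator to `dst` and returns the number of characters written. */
size_t expand_at(const char *src, size_t col, char *dst, size_t dstsize)
{
    size_t n = 0;
    for (; *src != '\0'; src++) {
        /* A TAB fills up to the next stop; anything else is one column. */
        size_t width = (*src == '\t') ? 8 - (col % 8) : 1;
        for (size_t k = 0; k < width; k++) {
            if (n + 1 < dstsize)
                dst[n++] = (*src == '\t') ? ' ' : *src;
        }
        col += width;
    }
    if (dstsize > 0)
        dst[n] = '\0';
    return n;
}
```

Expanding the whole line "ab\tcd" gives 10 characters. Expanding the tail "\tcd" from column 0 gives 10 more, so "ab" plus that tail is 12 characters; telling the function that the tail really starts at column 2 gives 8, and the two halves add back up to 10.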

Eric Thomas

Sep 5, 1992, 9:24:40 AM
> A question for all of you using EBCDIC who don't think that arbitrary
> whitespace should be allowed to separate words: Why do you care?

I have to use unix systems from time to time to get my job done. If some day
they have a REXX interpreter, I would like to use it to make my job easier and
for that it has to behave predictably. If I code POS('A B',string), I want it
to locate A-blank-B, not A-tab-B. I have said that 200 times already, maybe you
should read more carefully before you flame?

> A similar problem shows up in the ANSI C standard. It has this idiotic
> restriction that external identifiers must be unique in the first six
> characters, case insensitive. I don't know of _anybody_ who liked
> this, but there was a very popular system which had a standard linker
> with that antiquated restriction. Removing this brain-dead restriction
> meant that you couldn't write an ANSI C compiler for that system,
> which was considered a bad thing - so the restriction stayed in.

In case I interpreted the snide remark correctly, the standard IBM linker uses
EIGHT characters names, not 6. Furthermore the C compiler performs a mapping
between C names and internally generated linker names to allow C programmers to
have function names of the length they want. So I have to conclude that the
system in question is not an IBM system, and therefore not an EBCDIC system.

> While requiring that only space can be treated as a blank character
> won't prevent people from implementing "standard Rexx" on any
> particular machine, it will make Rexx hard to use, and cause people to
> look to other tools.

It will make REXX hard to use for people who, for religious reasons, refuse to
type PARSE EXPAND instead of PARSE wherever necessary. That is their problem,
as far as I am concerned; I have no such religious diktat.

Eric

Mike Meyer

Aug 30, 1992, 3:07:05 AM
A question for all of you using EBCDIC who don't think that arbitrary
whitespace should be allowed to separate words: Why do you care?

The only argument I've seen is that allowing input data with tabs
makes programs less portable. This is false. It makes programs more
portable; it makes _data_ less portable.

If you think that tabs are evil, you may be right. It doesn't matter.
TAB is an ASCII character that is perceived as a long blank, and
people use it as such on a regular basis. Wishing and language
definitions won't make the problem go away; they'll just fail to
generate horses and create headaches for people who want to use the
language.

A similar problem shows up in the ANSI C standard. It has this idiotic
restriction that external identifiers must be unique in the first six
characters, case insensitive. I don't know of _anybody_ who liked
this, but there was a very popular system which had a standard linker
with that antiquated restriction. Removing this brain-dead restriction
meant that you couldn't write an ANSI C compiler for that system,
which was considered a bad thing - so the restriction stayed in.

While requiring that only space can be treated as a blank character
won't prevent people from implementing "standard Rexx" on any
particular machine, it will make Rexx hard to use, and cause people to
look to other tools.

In <REXXLIST%92082621432914@DEARN>, Otto Stolz <RZO...@DKNKURZ1.BITNET> wrote:
> 5. that standard-conforming implementations be required to implement
> the recognition of white space in a way conforming
> - to all possible sources for REXX source programs,
> - to all possible sources for input to REXX programs
> (cf. note 1);

I like that - that means you have to consider every computer system in
the world as a possible source, including all possible character set
mappings in getting the data from there to here. Not to mention all
tapes/decks/etc. that still exist, even if there are no functioning
computers of the type that created them.

> In a nutshell: REXX language features should be as permissive as
> possible when accepting white space, and as predictable
> as possible when generating it.

"Accept permissively and generate strictly" is a good general
guideline whenever portability or interoperability becomes a
consideration.

<mike

Mike Meyer

Aug 30, 1992, 8:35:41 PM
In <1992Aug27...@sejnet.sunet.se>, er...@sejnet.sunet.se (Eric Thomas) wrote:
> Ok, time for the usual stupid question. Say I have a program that does:
>
> Parse var line ':'tagname'.'value' :'line
>
> Say I run that program on an ASCII system which recognize the 20-odd types of
> blanks Otto showed in his posting. Where does my 'value' variable end, when the
> interpreter encounters a SPACE followed by colon (ancient, despicable, evil
> EBCDIC-type behaviour, surely that must not be the case), or also when it
> encounters TAB followed by colon (nice, modern ASCII-type behaviour which
> happens to break the program because that is not what the programmer wanted
> and thought the interpreter would do).

What you want it to be is whatever whoever generated the input thought
it would be. In the case that I run into most, that's easy. I'm
dealing with data that follows Internet Request For Comment #822,
which says:

3.4.2. WHITE SPACE

Note: In structured field bodies, multiple linear space ASCII
characters (namely HTABs and SPACEs) are treated as
single spaces and may freely surround any symbol. In
all header fields, the only place in which at least one
LWSP-char is REQUIRED is at the beginning of continuation
lines in a folded field.

I.e. - I have to accept both space and tab as space, because the
standards document I'm using _says_ I do.
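A minimal Python sketch of the RFC 822 rule quoted above, assuming a header field body that may contain folded continuation lines: first unfold (a CRLF immediately followed by SPACE or HTAB is dropped, keeping the whitespace character), then treat every run of SPACE/HTAB as one space. `normalize_lwsp` is a hypothetical helper, and quoted-strings and comments, where whitespace matters, are deliberately not handled.

```python
import re

# RFC 822 LWSP-char: SPACE or HTAB, nothing else.
LWSP = "[ \t]"

def normalize_lwsp(field_body):
    """Unfold a folded header body, then collapse LWSP runs to one space.

    Sketch only: quoted-strings and comments get no special handling.
    """
    # Unfolding: CRLF immediately followed by an LWSP-char is dropped,
    # leaving the LWSP-char itself in place.
    unfolded = re.sub("\r\n(?=" + LWSP + ")", "", field_body)
    # Runs of SPACE/HTAB behave like a single space; trim the ends.
    return re.sub(LWSP + "+", " ", unfolded).strip(" ")

print(normalize_lwsp("Joe\t Smith\r\n\t<joe@example.com>"))  # Joe Smith <joe@example.com>
```

Tabs and spaces end up indistinguishable, which is exactly the "accept both" behaviour Mike says the standard forces on him.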

> In either case you have a problem. If the blank in the search string only
> stands for SPACE, it is very difficult to indicate that you want any of the 20+
> white space characters to match. You would almost need a new WSPARSE command,
> and WSPOS, and so on. If on the other hand the blank stands for any white space
> character, you have no way in the language to halt on just a SPACE when you
> need to do that. OPTIONS is not a solution, a given program may well need both
> functions very often and switching OPTIONS statements is at best impractical.

How many cases have you run into that need both behaviors (nuts, how
many have you run into other than the one you normally deal with)? The
only ones I've met were that way because the input data stream allowed
TABs as whitespace, but the output streams didn't. That case
doesn't require changing modes in midstream, it just means you can't
generate TABs. No problem.

<mike

Mike Meyer

Sep 1, 1992, 1:25:50 AM
In <REXXLIST%9208280...@UGA.CC.UGA.EDU>, Dave Gomberg <GOM...@UCSFVM.BITNET> wrote:
> If it is required that 20 characters be synonymized because the designers
> of a character set did not understand the distinction between a character
> set and a printer control language, I for one don't want to buy a device
> that fools around with that character set.

Yeah, the designer of any character set that includes things like HT
(EBCDIC 05) or FF (0C) should be taken out and shot. And you'd
certainly never want to buy a device that used any such character
set.

You've consistently taken an antagonistic attitude about this problem,
apparently trying to smear things that aren't done the "true blue"
way, and usually incorrectly. The problem isn't that some OS does
things "wrong", or that some character set is "wrong." The problem is
that _people_ have become used to using more than one character to
mean space.

Those same people, when faced with an EBCDIC system, will try and use
the "TAB" key exactly like they have before. They'll get upset and
complain about the system if it doesn't do what they expect. They will
complain that the system is doing things wrong. From their point of
view, they'll be right.

Trying to place blame isn't going to solve any problems; it's just
going to antagonize people. What is needed is a solution that everyone
can accept. If it can be arrived at here, that's great - someone can
take it to the ANSI meeting. If it can't be arrived at there, then the
Rexx community as a whole will lose.

<mike

Charles R. Martin

Sep 6, 1992, 10:29:25 PM
In article <REXXLIST%9209062...@UGA.CC.UGA.EDU> GOM...@UCSFVM.BITNET (Dave Gomberg) writes:

> I certainly did not understand what you meant correctly. I thot that
> a shouldn't match b or c to mean a shouldn't match b and a shouldn't
> match c. If I understand what you meant, you meant a shouldn't
> match b or else a shouldn't match c. If the latter is what you mean,
> I just don't understand your point, so I cannot agree or disagree with
> it. Perhaps you could restate it?? Dave

sure: the position operator should match the true characters: pos("a b"...)
should hit on the substring "a<blank>b" in the target, not on the
substring "a<tab>b".

Eric Thomas

Sep 4, 1992, 9:00:55 PM
In article <1992Sep4.1...@wrkgrp.COM>, e...@wrkgrp.COM (Edward T Spire) writes:

> er...@sejnet.sunet.se (Eric Thomas) writes:
> : No question here. But wouldn't the unix version of XEDIT exhibit the same kind
> : of behaviour as the CMS version? Or does it, too, use tabs in the data returned
> : by (say) EXTRACT?
>
> Not normally, but if you were editing a makefile that had tabs as part
> of the data lines and you extracted a data line...

*sigh* Very intelligent remark I must say. Extracting raw data from the file
can obviously return any possible value from 00 to FF, since the file can
contain bytes of any value. When editing a makefile, you furthermore want to
make very sure that tabs are NOT automatically treated as blanks, since they
have a totally different syntactical meaning.

> I am not joking, nor are the other folks who choose this route.
> Hardware performance bottlenecks are becoming less important as
> price/performance ratios change.

If performance is the only problem you have noted, you have missed the whole
picture. One example among many: how can I write a routine that will sort a
stem the caller passes as argument? Given that I can't, how do I share the
routine in question between several REXX source files? Choice 1, I write a huge
REXX file with all 30,000 lines of code in it (a technique commonly called
"modular programming" which improves robustness and programming efficiency).
Choice 2, I use the editor to copy the routine wherever I use it, and have 200
places to fix if it turns out to be buggy (of course the stem I need to sort
will have a different name every time). Ah well, there is no deafer man than
him who does not wish to hear.

Eric

Dave Gomberg

Sep 5, 1992, 3:48:38 PM
On Sat, 5 Sep 1992 17:50:20 GMT Charles R. Martin said:
>1. I don't think any sensible interpretation of pos('a b'...) would
> permit it to match on either 'a<blank>b' or 'a<tab>b'. I'd certainly
> consider such an implementation buggy.

I am glad I don't run an implementation you wrote. I would sue an implementer

Dave Gomberg

Sep 6, 1992, 8:53:16 PM

Dave Gomberg

Sep 6, 1992, 8:47:49 PM
I have been deeply involved with ASCII since many of you were in diapers
(1960). And I suspect I have read at least the earlier standards
much oftener than most of you have. And the purpose of HT or FF is not
printer driving (the printers for which ASCII was defined originally
supported neither of those characters). So let's not confuse the issue.
Two is not twenty. Anyone who thinks the function of a character set
is to drive composition devices needs to go back to school and learn
what a character set is and how to drive a composition device. And G*d
help us all if we are forced to use a character set designed with that
little knowledge and intelligence. Dave

Eric Thomas

Sep 7, 1992, 12:30:41 PM
In article <MARTINC.92...@grover.cs.unc.edu>, mar...@grover.cs.unc.edu (Charles R. Martin) writes:
> I think the problem lies in repeated applications of parse, say with
> columnar fields. Say I parse the string "ab\tc" -- importing a little C
> terminology to make clear where the tab lies -- with parse expand. The
> tab char should come out to be some number of blanks, as I understand
> it. (Probably 6 for canonical tabs.) But if I parse off the first
> field, THEN parse the resulting "\tc", the tab now wants to expand to 8
> characters.

Good point, but this is a hasty conclusion. IF you are worried about exact
column positions and have to cope with data containing tabs and blanks, you
will have to think carefully in any case. As you pointed out,

Parse var line a +2 b
Parse expand var b <whatever>

will not work, but

Parse expand var line a +2 b
Parse var b <whatever>

does what you want. If you expand, you have to expand at the source. Now, with
a REXX that has built-in white space recognition rather than expand, the code
fragment would have to be:

Parse var line a +2 b
Parse var b <whatever>

The first PARSE gives you a variable 'b' that starts with a tab, the second
PARSE throws it away (since it is white space) unless you use columnar or
pattern based parsing. Losing the tab will surely screw up your column
positioning, since it will yield exactly the same result as if the tab had been
a blank. So the only way to handle this situation properly is to use a form of
parsing which does NOT treat tabs as white space. You are not making use of the
interpreter's automatic handling of tabs.

In other words, the situation is not better with a REXX that automatically
treats tabs as white space; in fact, it is worse.
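Eric's "expand at the source" rule can be checked with a small Python model. Assumptions: tab stops every 8 columns, `Parse var line a +2 b` modelled as a slice after two characters, and Python's built-in `str.expandtabs` standing in for the proposed EXPAND function. `split_at` is a hypothetical helper, not REXX.

```python
def split_at(line, width):
    """Model of REXX `Parse var line a +width b`: a gets the first width chars."""
    return line[:width], line[width:]

line = "ab\tc"  # Charles's example string

# Expand after splitting: the tab now starts b, so it fills a full 8 columns.
_, b = split_at(line, 2)
late = b.expandtabs(8)

# Expand at the source: the tab follows two characters, so it fills only 6.
_, early = split_at(line.expandtabs(8), 2)

print(repr(late), len(late))    # '        c' 9
print(repr(early), len(early))  # '      c' 7
```

Only the early expansion keeps `c` in the column it occupied in the original line, which is Eric's point that expansion must happen before the string is cut up.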

Eric

Charles R. Martin

Sep 7, 1992, 10:42:22 AM
Is the point that they need to match the tab in POS? Or was that a base
canard introduced in the flaming? I myself would be *real* surprised at
a substring position operation that matched both blank and tab in this
context.

But I think the point is that in the parse operation, with something
like

line='foo bar bletch' /* that's <space><tab> */
parse var line a b c

it makes sense to have a='foo', b='bar', and c='bletch', and to have
parse eat both whitespace characters just as it already eats any string
of blanks between non-blank maximals. I can't think of much of a
context in which this wouldn't be a (the?) sensible interpretation, and
you can easily write a parse specification that will get blanks
specifically if that's what you want.
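The archived copy of the example above has lost its tab characters; per the comment, each gap is <space><tab>. Below is a minimal Python model of REXX word parsing with the set of blank characters as a parameter. It assumes the rule Eric and Ian Collier describe later in the thread: every variable except the last receives one word with surrounding blanks removed, and the last receives the remainder after exactly one delimiting blank. `parse_words` is a hypothetical helper, not code from any interpreter.

```python
def parse_words(s, nvars, ws=" "):
    """Model of `parse var s v1 ... vN` using word parsing only.

    ws is the set of characters treated as blanks: " " for classic REXX,
    " \t" for the proposal under discussion.
    """
    values = []
    rest = s
    for _ in range(nvars - 1):
        rest = rest.lstrip(ws)          # skip blanks before the word
        end = 0
        while end < len(rest) and rest[end] not in ws:
            end += 1
        values.append(rest[:end])       # the word itself
        rest = rest[end + 1:]           # drop exactly one delimiting blank
    values.append(rest)                 # last variable: the remainder, as-is
    return values

line = "foo \tbar \tbletch"            # <space><tab> between words
print(parse_words(line, 3))            # SPACE only: ['foo', '\tbar', '\tbletch']
print(parse_words(line, 3, " \t"))     # SPACE and TAB: ['foo', 'bar', '\tbletch']
```

In this model `c` keeps a leading tab under either definition. With tabs counted as blanks, a trailing `.` placeholder (modelled as a fourth variable) would give `c='bletch'`; with SPACE as the only blank it would not.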

I do rather like the idea of having a syntax for general regular
expression matching in parse, though. (And please don't tell me about
how rexx is supposed to be a simple end-user language: people are
writing complicated number-theoretic programs and lisp interpreters in
rexx.)

Charles R. Martin

Sep 7, 1992, 11:07:59 AM

c=a||b; say pos(a,c)

I guess it's my turn to be confused; this reads as if we've gone back to
the idea that pos should "hit" on both tab and blank as a special case.

Just to jump on my formal-methods high horse for a second, this is what
really comes of trying to do programming-language semantics without a
mathematical basis: we keep hacking back and forth on what we mean by so
and so. (Don't take this as a personal criticism, by the way -- this is
a standard screwup in programming languages, in my opinion. See e.g.
the Ada specification.)

So here's an idea: I'll introduce an idea of regular expression for Rexx
and describe what *I* think this all means, and maybe that'll clear
things up. (By the way, this isn't a serious proposal for rexx syntax
here, just a notation for discussion.)

Let's use something like Emacs regular expressions, since they can be
typed easily: regular expressions are contained within <>'s and use
backslash as an escape character. So,

<a> is a regular expression matching exactly one 'a'
<a*> is a reg exp matching 0 or more 'a's
<a<a*>> is a reg exp matching 1 or more 'a's (which is sufficiently
useful that it's abbreviated using the + operator, i.e. <a+>)
<.> matches any one character
<.+> matches one or more of any character
<\.> matches only '.' itself.
<a|b> matches either 'a' or 'b'
<^a> matches anything except 'a'
<^<a|b>> matches anything except 'a' or 'b'

Let's also alias certain characters with escapes, e.g., \t is notation
for the TAB character in whatever coding set is in use.

Now, POS is a simple regular expression matcher that doesn't allow any
of the regular set operations; that is, the search string is just a
regular expression with no *,+,|,^, or . operators. It's clear then
that if we think of POS("a b", string) as POS(<a b>, string) we have
exactly the expected behavior as I think we've agreed. That is, since
<a b> matches only the substring "a<space>b" it ignores the substring
"a<tab>b".

I think the contention is that

parse arg a b c

ought to be a synonym for

parse upper arg with a<<\b|\t>+>b<<\b|\t>+>c
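Charles's notation maps fairly directly onto Python's `re` module, with `<\b|\t>+` written as the character class `[ \t]+`. The sketch below only illustrates his two claims (POS is a regular expression with no operators, and the proposed word parsing is "words separated by runs of SPACE/TAB"). It is not a proposed REXX syntax, it ignores the UPPER in his template, and `pos_regex` is a hypothetical helper.

```python
import re

def pos_regex(pattern, haystack):
    """POS-style result: 1-based start of the first match, or 0 if none."""
    m = re.search(pattern, haystack)
    return m.start() + 1 if m else 0

# Charles's <a b>: a regular expression with no operators, i.e. a literal.
literal = re.escape("a b")
# The a<<\b|\t>+>b idea: one or more SPACE/TAB characters between a and b.
flexible = r"a[ \t]+b"

for text in ("xa b", "xa\tb"):
    print(repr(text), pos_regex(literal, text), pos_regex(flexible, text))

# His proposed reading of `parse arg a b c`, ignoring UPPER; unlike REXX,
# this returns None when there are fewer than three words.
WORD_TEMPLATE = re.compile(r"[ \t]*([^ \t]+)[ \t]+([^ \t]+)[ \t]+(.*)", re.DOTALL)
m = WORD_TEMPLATE.fullmatch("foo \tbar \tbletch")
print(m.groups() if m else None)
```

Note that this template consumes all whitespace before `c`, so it gives `c='bletch'`; that is a property of his regular expression, not of existing PARSE.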

Now if we had that, I don't see immediately what the need for parse
expand would be; in fact, given that the simple identity doesn't hold
after parse expand, maybe the whole idea is broken. Ah, well, such is
the risk of informal language definition. I suppose it must be left in
at this late date.

But the idea of embedded general regular expressions in parse is such a
nice one, and seems like such a natural extension to the existing parse,
that I think it ought to be seriously considered.

Eric Thomas

Sep 7, 1992, 3:00:18 PM
In article <MARTINC.92...@grover.cs.unc.edu>, mar...@grover.cs.unc.edu (Charles R. Martin) writes:
> I do rather like the idea of having a syntax for general regular
> expression matching in parse, though. (And please don't tell me about
> how rexx is supposed to be a simple end-user language: people are
> writing complicated number-theoretic programs and lisp interpreters in
> rexx.)

People have a constitutional right to stupidity. In my family, screwdrivers
generally have a 75% probability of being bent because I have never managed to
get my relatives to understand that they are not designed to be used as levers.
It would be clearly more convenient for them if screwdrivers were made much
thicker, with just the tip being the size of the screw. I'm afraid I don't know
any hardware company that sells this type of screwdrivers, though.

Eric

Eric Thomas

Sep 7, 1992, 3:12:17 PM
In article <MARTINC.92...@grover.cs.unc.edu>, mar...@grover.cs.unc.edu (Charles R. Martin) writes:
> I think the contention is that
>
> parse arg a b c
>
> ought to be a synonym for
>
> parse upper arg with a<<\b|\t>+>b<<\b|\t>+>c
>
> Now if we had that, I don't see immediately what the need for parse
> expand would be;

Conversely, given PARSE EXPAND I don't see what the need for changing the
definition of all functions which deal with words, leading/trailing blanks and
so on would be. Compatibility with prior applications is guaranteed, and the
language only needs to be changed in one place.

> in fact, given that the simple identity doesn't hold
> after parse expand, maybe the whole idea is broken.

PARSE does not respect identity when using word parsing (ie PARSE something A B
as opposed to column or literal string parsing), because it quietly removes an
arbitrary amount of blanks. The only case where identity is preserved is when
you do not use word parsing. This is true regardless of the definition you use
for white space.

> But the idea of embedded general regular expressions in parse is such a
> nice one, and seems like such a natural extension to the existing parse,
> that I think it ought to be seriously considered.

> parse upper arg with a<<\b|\t>+>b<<\b|\t>+>c

Nice? Natural? I see.

Eric

Steve Bacher

Sep 7, 1992, 2:17:00 PM
In article <1992Sep6...@sejnet.sunet.se>,
er...@sejnet.sunet.se (Eric Thomas) writes:

>Under VM, isspace() is true for X'05' and X'15'. It is logical for isspace() to
>be true for these characters, since they are the EBCDIC equivalents of TAB and
>CRLF, respectively; as far as I know, all the C compilers for VM agree on this.
>I'm not really eager to find out what happens to my tens of thousands of lines
>of code the day X'15' is suddenly treated as a blank. It is used as a
>line-separator in functions like DIAG, and it is common programming practice to
>use it as a line separator when returning more than one line of (non-binary)
>data from a function. I suppose a similar situation exists for \n on unix
>systems, but I've never used REXX on unix so I wouldn't know.

At least with the more reasonable proposals,

PARSE VALUE DIAG(something) WITH var1 var2 '15'x var3 var4 '15'x var5

should continue to work just fine, regardless of whether '15'x is
considered whitespace or not. A correct implementation of REXX will
break up the line at the '15'xes and only then perform the whitespace
analysis on the resulting pieces. (Ed, you fixed this in UniREXX,
didn't you?)
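Steve's point, that the literal '15'x patterns in the template split the string before any word parsing happens, can be sketched in Python. Assumptions: the character `"\x15"` stands in for EBCDIC X'15', the sample reply string is invented for illustration, and `words` is a hypothetical helper that treats a caller-supplied set of characters as blanks.

```python
NL = "\x15"  # stand-in for EBCDIC X'15' in this sketch

def words(s, ws=" "):
    """Blank-delimited words of s, treating every character in ws as a blank."""
    for ch in ws:
        s = s.replace(ch, " ")
    return [w for w in s.split(" ") if w]

reply = "A1 B1" + NL + "A2 B2" + NL + "C3"  # hypothetical multi-line reply

# Split on the literal separator first, then analyse words inside each piece.
pieces = reply.split(NL)
for ws in (" ", " " + NL):  # X'15' not a blank / X'15' treated as a blank
    print([words(p, ws) for p in pieces])
```

Both iterations print the same result, because by the time words are examined no X'15' is left in any piece.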

Disclaimer: I'm not a VMer, so don't take my sample syntax literally.

--
Steve Bacher (Batchman) Draper Laboratory
Internet: s...@draper.com Cambridge, MA, USA

Steve Bacher

Sep 7, 1992, 2:18:00 PM

>Those same people, when faced with an EBCDIC system, will try and use
>the "TAB" key exactly like they have before. They'll get upset and
>complain about the system if it doesn't do what they expect. They will
>complain that the system is doing things wrong. From their point of
>view, they'll be right.

Yeah, especially because on a 3270 (real or simulated), the TAB key
isn't a character at all, but a terminal control operation! So when
they complain that the system isn't behaving right, this isn't
something that changing the definition of REXX can fix.

Charles R. Martin

Sep 7, 1992, 7:41:07 PM
In article <1992Sep7...@sejnet.sunet.se> er...@sejnet.sunet.se (Eric Thomas) writes:

> In article <MARTINC.92...@grover.cs.unc.edu>, mar...@grover.cs.unc.edu (Charles R. Martin) writes:
>> I do rather like the idea of having a syntax for general regular
>> expression matching in parse, though. (And please don't tell me about
>> how rexx is supposed to be a simple end-user language: people are
>> writing complicated number-theoretic programs and lisp interpreters in
>> rexx.)

> People have a constitutional right to stupidity.

I'm afraid that a lot of this whole discussion comes out sounding like a
demonstration of this.

> In my family, screwdrivers generally have a 75% probability of being
> bent because I have never managed to get my relatives to understand
> that they are not designed to be used as levers. It would be clearly
> more convenient for them if screwdrivers were made much thicker, with
> just the tip being the size of the screw. I'm afraid I don't know any
> hardware company that sells this type of screwdrivers, though.

I don't think this (rather dippy) analogy holds very well. If parse
were extended with a regular expression syntax that was a strict
extension -- that is, any program which didn't use the reg exp syntax to
get results isn't broken by the addition -- then we'd know that (a) you
can do anything you did before, and (b) that you could do neat new
things. It's like if someone started making screwdrivers out of a
material with a very high yield point, so that even if you were prying
up a trap door with an elephant on it the shaft would return to shape:
no one would notice that they were any different, except when they tried
to use them as prybars.

Charles R. Martin

Sep 7, 1992, 7:48:17 PM
In article <1992Sep7...@sejnet.sunet.se> er...@sejnet.sunet.se (Eric Thomas) writes:

I think I should direct you to the note where I went over precisely what
I meant using regular expressions. But I think your example is flawed:
once again I don't have a rexx interpreter I trust at hand -- by the
way, OS/2 2.0 rexx doesn't recognize the expand clause at all; where does
it come from? -- but would you expect the clause

parse var b c d

to put leading blanks onto the value that shows up in c? That is, given
b='\b\bX\bY' would you expect c to be "\b\bX"? That doesn't seem to be
the behavior I have seen so far.

Charles R. Martin

Sep 7, 1992, 7:58:33 PM
In article <1992Sep7...@sejnet.sunet.se> er...@sejnet.sunet.se (Eric Thomas) writes:

> In article <MARTINC.92...@grover.cs.unc.edu>, mar...@grover.cs.unc.edu (Charles R. Martin) writes:
>> I think the contention is that
>>
>> parse arg a b c
>>
>> ought to be a synonym for
>>
>> parse upper arg with a<<\b|\t>+>b<<\b|\t>+>c
>>
>> Now if we had that, I don't see immediately what the need for parse
>> expand would be;

> Conversely, given PARSE EXPAND I don't see what the need for changing
> the definition of all functions which deal with words,
> leading/trailing blanks and so on would be. Compatibility with prior
> applications is guaranteed, and the language only needs to be changed
> in one place.

But as I noted before, if the reg exp syntax is a strict extension, then
there shouldn't be any effect on existing functions. (In fact, having a
reg exp extension might be the answer to the non-IBM world's problems
with parse expand.)

But, tell me -- is parse expand actually part of rexx already? I can't
seem to find it in the documentation I have. Are we in a battle of the
competing extensions, or is it already part of usual rexx?

>> in fact, given that the simple identity doesn't hold
>> after parse expand, maybe the whole idea is broken.

> PARSE does not respect identity when using word parsing (ie PARSE
> something A B as opposed to column or literal string parsing),
> because it quietly removes an arbitrary amount of blanks. The only
> case where identity is preserved is when you do not use word parsing.
> This is true regardless of the definition you use for white space.

Then what's the objection to making whitespace remove an arbitrary
number of blanks and tabs? What existing programs does it break?

>> But the idea of embedded general regular expressions in parse is such a
>> nice one, and seems like such a natural extension to the existing parse,
>> that I think it ought to be seriously considered.

>> parse upper arg with a<<\b|\t>+>b<<\b|\t>+>c

> Nice? Natural? I see.

If you don't like the syntax, well, I never claimed that this was good
syntax. (Of course, you've elided the part where I said that.) As far
as its niceness and naturalness go, it is at least nice and natural to
anyone with a competent education in computer science; since (as I said
and you elided again) the point of the notation was to make precise a
concept that had up to then been imprecise, I don't see an objection;
and if the notation used for reg exps is nicely done, those who don't
want to have to deal with such a complicated idea as regular sets can
just ignore them.

Sam Drake

Sep 8, 1992, 1:53:17 AM
Excuse me, I'm late to this debate, and frankly the discussion here is
seriously in the ditch. I've read 42 items in this thread, and it's
pretty clear that y'all aren't quite listening to each other anymore.

Can someone, calmly, explain for my own edification what's wrong with:

In article <MARTINC.92...@grover.cs.unc.edu> mar...@grover.cs.unc.edu (Charles R. Martin) writes:
>But I think the point is that in the parse operation, with something
>like
>
>line='foo bar bletch' /* that's <space><tab> */
>parse var line a b c
>
>it makes sense to have a='foo', b='bar', and c='bletch', and to have
>parse eat both whitespace characters just as it already eats any string
>of blanks between non-blank maximals.

I'm just a simple guy, but I can't think of anything wrong with this.
What's the argument on this point, if any? Several folks have suggested
this, and no one has disagreed that I can see, but the issue doesn't
seem to be closed.


Sam Drake / IBM Almaden Research Center
Internet: dr...@almaden.ibm.com BITNET: DRAKE at ALMADEN

Anders Christensen

Sep 8, 1992, 10:38:20 AM
In article <REXXLIST%9209070...@UGA.CC.UGA.EDU> Dave Gomberg <GOM...@UCSFVM.BITNET> writes:

> I would really like Anders
> and other Unix types to say if they can stand it that:
>
> c=a||b; say pos(a,c)
>
> would say 0?

Why are you focusing on POS()? The description of POS() in TRL (2nd
ed) doesn't even mention 'blank' or 'blanks'. Thus, POS() has nothing
to do with whitespace, and is only suited to obscure this discussion.

The WORDPOS() built-in function is much more interesting, relevant and
illustrative. I think it should return (using "\t" to denote Tabs, but
note that this applies to other whitespace characters as well):

wordpos( "a b", "a b" ) --> 1
wordpos( "a b", "a  b" ) --> 1
wordpos( "a\tb", "a b" ) --> 1
wordpos( "a  b", "a b" ) --> 1
wordpos( "a b", "a\tb" ) --> 1

Calling POS() instead, only the first of these returns 1, the rest
return 0. (Unless you introduce regular expressions which, although a
very interesting extension, is a totally different thread.)
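Anders's examples can be reproduced with a Python model of the two functions. Assumptions: SPACE and TAB both count as blanks for WORDPOS (his proposal), POS stays a literal substring search, the optional start argument of both functions is omitted, and the lines that look identical in the archive originally differed by runs of two blanks. `wordpos` and `pos` here are hypothetical models, not the built-ins.

```python
def split_words(s, ws=" \t"):
    """Blank-delimited words, where every character in ws counts as a blank."""
    for ch in ws:
        s = s.replace(ch, " ")
    return [w for w in s.split(" ") if w]

def wordpos(phrase, string, ws=" \t"):
    """WORDPOS model: word number where phrase starts in string, else 0."""
    target, words = split_words(phrase, ws), split_words(string, ws)
    if not target:
        return 0
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i + 1
    return 0

def pos(needle, haystack):
    """POS model: 1-based position of a literal substring, else 0."""
    if not needle:
        return 0  # POS of the null string is 0
    return haystack.find(needle) + 1

cases = [("a b", "a b"), ("a b", "a  b"), ("a\tb", "a b"),
         ("a  b", "a b"), ("a b", "a\tb")]
for phrase, string in cases:
    print(repr(phrase), repr(string), wordpos(phrase, string), pos(phrase, string))
```

Every case gives 1 for `wordpos` and only the first gives 1 for `pos`. The null-needle rule is also why the example in his PS prints 0 only when `a` is the null string.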

-anders

PS: Under normal circumstances, your example says "0" iff 'a' is a
variable set to the nullstring.

Eric Thomas

Sep 8, 1992, 11:26:10 AM
In article <MARTINC.92...@grover.cs.unc.edu>, mar...@grover.cs.unc.edu (Charles R. Martin) writes:
> I don't think this (rather dippy) analogy holds very well. If parse
> were extended with a regular expression syntax that was a strict
> extension -- that is, any program which didn't use the reg exp syntax to
> get results isn't broken by the addition -- then we'd know that (a) you
> can do anything you did before, and (b) that you could do neat new
> things. It's like if someone started making screwdrivers out of a
> material with a very high yield point, so that even if you were prying
> up a trap door with an elephant on it the shaft would return to shape:
> no one would notice that they were any different, except when they tried
> to use them as prybars.

And when they saw the price on the cash register. Your analogy doesn't hold,
it's not like if someone started making these super-screwdrivers, it's like if
a new law required that all screwdrivers be made this way. If you want to make
your own 'REXXrg' language for your private use, I really have no objection.
You *were* talking about changing the standard, weren't you?

Eric

Eric Thomas

Sep 8, 1992, 11:40:48 AM
In article <1992090713...@MVS.draper.com>, SEB...@MVS.draper.com (Steve Bacher) writes:
> At least with the more reasonable proposals,
>
> PARSE VALUE DIAG(something) WITH var1 var2 '15'x var3 var4 '15'x var5
>
> should continue to work just fine, regardless of whether '15'x is
> considered whitespace or not. A correct implementation of REXX will
> break up the line at the '15'xes and only then perform the whitespace
> analysis on the resulting pieces.

Yes, that much will work, and it is indeed what people generally code to
process the output of commands which produce a fixed-size answer (like when
extracting the time zone name from QUERY TIME). But there are many commands
which produce variable-size output that one processes in a loop (example: QUERY
NAMES). What bothers me is that we now have '15'x = ' ', a situation most
programmers must NOT have quite expected. My own programs would not be affected
since I always use ==, but the amount of spontaneous religious flames I get
about this usage from people who have read my code for the first time suggests
that most programmers do use = and thus their programs would break.
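Eric's '15'x = ' ' concern follows from how REXX's non-strict `=` treats strings: leading and trailing blanks are ignored, while `==` compares exactly. A minimal Python model, assuming non-numeric operands (REXX compares numbers numerically, which this sketch ignores), a configurable set of blank characters, and `"\x15"` as a stand-in for EBCDIC X'15'. The function names are hypothetical.

```python
NL = "\x15"  # stand-in for EBCDIC X'15' in this sketch

def nonstrict_eq(a, b, blanks=" "):
    """REXX `a = b` for non-numeric strings: leading/trailing blanks ignored."""
    return a.strip(blanks) == b.strip(blanks)

def strict_eq(a, b):
    """REXX `a == b`: exact, character-by-character comparison."""
    return a == b

print(nonstrict_eq(NL, " "))            # X'15' is not a blank: False
print(nonstrict_eq(NL, " ", " " + NL))  # X'15' counted as a blank: True
print(strict_eq(NL, " "))               # == is unaffected: False
```

Once X'15' joins the blank set, both sides reduce to the null string, so a loop testing `line = ''` would start treating a lone separator as an empty line; code using `==`, as Eric does, is unaffected.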

Eric

Eric Thomas

Sep 8, 1992, 11:54:36 AM

Did you read your own note? I am getting the impression you are replying to
the wrong message by mistake.

> But I think your example is flawed:

My examples are not flawed. I have written enough REXX code to know what a
simple PARSE clause like the ones I quoted above does, if you don't believe me
feel free to run them through your favourite interpreter. As you said (and if I
ever said the opposite I'd like you to show me where), PARSE does indeed strip
both leading and trailing blanks from the first N-1 variables in a word parsing
clause. That is precisely why you will lose column positioning if you use this
form of parsing, as arbitrary amounts of blanks (or whitespace in your
proposal) are silently eaten up. The only way you can remember what columns
things were at is if you don't use this form of the PARSE statement, in which
case it is totally immaterial whether or not it treats tabs as whitespace when
parsing by word, since you won't be parsing by word, and again the conclusion
is that having this behaviour does not help you in cases where you have:

> > repeated applications of parse, say with columnar fields.

Now if you would please run further examples through an interpreter so that you
can come up with concrete situations where there is indeed a problem, we would
all save a lot of time.

Eric

Eric Thomas

Sep 8, 1992, 12:25:42 PM
In article <19...@coyote.UUCP>, dr...@drake.almaden.ibm.com (Sam Drake) writes:
>>line='foo bar bletch' /* that's <space><tab> */
>>parse var line a b c
>>
>>it makes sense to have a='foo', b='bar', and c='bletch', and to have
>>parse eat both whitespace characters just as it already eats any string
>>of blanks between non-blank maximals.
>
> I'm just a simple guy, but I can't think of anything wrong with this.
> What's the argument on this point, if any?

Taking the REXX language as it is today (tab et al not treated as whitespace
characters), you cannot implement JUST this one change without breaking
everything. You have to change a bunch of other functions to keep a homogeneous
behaviour. For instance, it is common to code:

Do Words(list)
Parse var list elm list
<...>
End

WORDS and PARSE had better agree on how many times to run the loop. Similarly,
WORD must produce the same result in case you read the elements with
WORD(list,i). After doing 'list = Space(list,0)', it is reasonable for the
programmer to assume that Words(list) = 1 (or possibly 0, but surely not 2).
Thus SPACE must be changed as well, and the same goes for STRIP, and so on.
Now, once you have changed STRIP, it is necessary to have (I'll use ASCII)
'09'x = ' ', '0D'x = ' ', etc. That might be a surprise to existing programs.
That's a lot of places to change, too. And still there are things which are not
too easy to do, for instance translating all white space to a blank or some
other character, or expanding tabs. It can be done, but the language won't help
you do this.
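Eric's loop argument is that WORDS and word-by-word PARSE must agree on what a blank is. A Python model with a separate blank set for each (the helper names are hypothetical, and the rule that PARSE drops exactly one delimiter after a word is the one described elsewhere in this thread) shows what happens if only PARSE is changed to accept tabs:

```python
def words(s, ws):
    """Model of WORDS(s): number of words, where every char in ws is a blank."""
    for ch in ws:
        s = s.replace(ch, " ")
    return len([w for w in s.split(" ") if w])

def parse_first(s, ws):
    """Model of `Parse var s elm s`: first word, and the rest after one blank."""
    s = s.lstrip(ws)
    end = 0
    while end < len(s) and s[end] not in ws:
        end += 1
    return s[:end], s[end + 1:]

def loop(lst, words_ws, parse_ws):
    """Model of: Do Words(list); Parse var list elm list; ...; End"""
    seen = []
    for _ in range(words(lst, words_ws)):
        elm, lst = parse_first(lst, parse_ws)
        seen.append(elm)
    return seen, lst

lst = "a\tb c"
print(loop(lst, " \t", " \t"))  # consistent: (['a', 'b', 'c'], '')
print(loop(lst, " ", " \t"))    # only PARSE changed: (['a', 'b'], 'c')
```

In the second call WORDS sees two words while PARSE finds three, so the loop exits with `'c'` still unprocessed.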

The alternative is to add an EXPAND function and enhance PARSE so that

PARSE EXPAND something pattern

is equivalent to

PARSE VALUE EXPAND(something) WITH pattern

much like PARSE UPPER is equivalent to PARSE VALUE TRANSLATE() WITH. One new
function, which is useful on its own, and one place to change.
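A minimal sketch of what the proposed EXPAND function might do, written in Python for illustration only. Several details are assumptions, since the posts do not fix them:

- the name `expand`;
- tab stops every 8 columns;
- handling of a single line with no embedded newlines.

```python
def expand(line, tabsize=8):
    """Sketch of an EXPAND function: replace each tab with enough spaces to
    reach the next tab stop. Assumes a single line (no newlines) and tab
    stops every `tabsize` columns."""
    out = []
    col = 0  # 0-based display column of the next character
    for ch in line:
        if ch == "\t":
            pad = tabsize - (col % tabsize)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


if __name__ == "__main__":
    # 'foo' plus a space reaches column 4, so the tab becomes 4 more spaces.
    print(repr(expand("foo \tbar")))  # 'foo     bar'
```

After expansion only spaces remain. Under this design, `PARSE VALUE EXPAND(line) WITH a b c` would need no change to word parsing, and the new behaviour lives in one function rather than being spread across WORDS, SPACE, STRIP and the rest.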

Eric

Ian Collier

Sep 8, 1992, 11:49:09 AM
In article <19...@coyote.UUCP>, dr...@drake.almaden.ibm.com (Sam Drake) wrote:
>Excuse me, I'm late to this debate, and frankly the discussion here is
>seriously in the ditch.

Yes, I've joined this argument halfway through as well, and have already
read rather a lot of repetitive articles on the subject. However, here I go
as well...

I have an interest in this subject (and several other threads too), since -
as you may know - I just released a Unix Rexx version in which the only
whitespace character is '20'x (except that unquoted tabs count as whitespace
when reading in the Rexx source). While I can see that having PARSE and
WORDPOS treat spaces and tabs identically can have its advantages, I would
much rather leave the language unchanged except for a new builtin function
expand() which expands tabs into spaces (however if there really is demand
for a "PARSE EXPAND <source> <pattern>", having identical meaning to
"PARSE VALUE expand(<source>) WITH <pattern>" then I would reluctantly put
it in).

The following question raises a worry about having PARSE treat tabs as
spaces by default:

>Can someone, calmly, explain for my own edification what's wrong with:

[the following by mar...@grover.cs.unc.edu (Charles R. Martin)]


>>line='foo bar bletch' /* that's <space><tab> */
>>parse var line a b c
>>
>>it makes sense to have a='foo', b='bar', and c='bletch', and to have
>>parse eat both whitespace characters just as it already eats any string
>>of blanks between non-blank maximals.

Firstly, PARSE eats only one whitespace character from the last token (i.e.
that assigned to c in the example) - so c would contain at least one
whitespace character after the above parse.

Secondly, if PARSE is supposed to eat exactly one space, what happens if a
tab separates the penultimate token from the last one? Does it eat the
whole tab, or eat one space from the tab and leave seven still there?
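Ian's first point can be checked with a rough Python model of REXX word parsing. This is an approximation written for illustration, not a real interpreter:

- every variable but the last gets one word, with leading blanks skipped;
- the last variable gets the remainder, minus only the single delimiting character.

`parse_words` and its `blanks` parameter are hypothetical.

```python
def parse_words(source, nvars, blanks=" "):
    """Rough model of 'PARSE VAR source v1 v2 ... vN' with a word template.
    Every variable but the last gets one word (leading blanks skipped);
    the last gets the rest, minus only the single delimiting blank."""
    values = []
    rest = source
    for _ in range(nvars - 1):
        rest = rest.lstrip(blanks)
        end = 0
        while end < len(rest) and rest[end] not in blanks:
            end += 1
        values.append(rest[:end])
        rest = rest[end + 1:]  # drop exactly one delimiter character
    values.append(rest)
    return values


if __name__ == "__main__":
    line = "foo \tbar \tbletch"  # <space><tab> between the words
    print(parse_words(line, 3, blanks=" \t"))  # ['foo', 'bar', '\tbletch']
    print(parse_words(line, 3, blanks=" "))  # ['foo', '\tbar', '\tbletch']
```

Even when tab counts as a blank, `c` keeps a leading tab in this model. Under the classic blank-only rule, `b` also picks up the tab.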

A different point is this: the meaning of "parse value expand(a) with c 15 d"
is clear, but what would "parse var a c 15 d" do if a contains tabs, and
tabs are special to the parse instruction? Something that starts in column
15 will not necessarily start with the 15th character: in "\t12345678", the
string "78" starts in column 15, but "7" is only the eighth character. Is
parse allowed to split up a tab character if it spans columns 9-16?
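The column-versus-character example can be sketched as a mapping from display column to character position. This assumes tab stops every 8 columns; that width and the helper name `char_index_at_column` are both assumptions, and the code is Python for illustration only.

```python
def char_index_at_column(line, column, tabsize=8):
    """Return the 1-based position of the character occupying the 1-based
    display `column` of `line`, or None if the line is too short.
    A tab covers every column up to the next tab stop."""
    col = 1  # display column where the current character starts
    for pos, ch in enumerate(line, start=1):
        width = (tabsize - (col - 1) % tabsize) if ch == "\t" else 1
        if col <= column < col + width:
            return pos
        col += width
    return None


if __name__ == "__main__":
    s = "\t12345678"
    print(char_index_at_column(s, 15))  # 8, the '7'
    print(char_index_at_column(s, 5))  # 1, a column inside the tab
```

Column 15 of `'\t12345678'` maps to character 8 (the `7`). Every column from 1 to 8 maps to character 1, the tab itself. That is exactly the case where a column-based PARSE would have to decide whether to split the tab.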

Ian Collier
Ian.C...@prg.ox.ac.uk | i...@ecs.ox.ac.uk

Charles R. Martin

Sep 8, 1992, 1:20:26 PM
In article <1992Sep8...@sejnet.sunet.se> er...@sejnet.sunet.se (Eric Thomas) writes:

> I think I should direct you to the note where I went over precisely what
> I meant using regular expressions.

Did you read your own note? I am getting the impression you are replying to
the wrong message by mistake.

> > But I think your example is flawed:
>
> My examples are not flawed. I have written enough REXX code to know
> what a simple PARSE clause like the ones I quoted above does, if you
> don't believe me feel free to run them through your favourite
> interpreter.

Then you should know enough about rexx code to answer my questions,
shouldn't you?

> As you said (and if I ever said the opposite I'd like
> you to show me where), PARSE does indeed strip both leading and
> trailing blanks from the first N-1 variables in a word parsing
> clause. That is precisely why you will lose column positioning if you
> use this form of parsing, as arbitrary amounts of blanks (or
> whitespace in your proposal) are silently eaten up. The only way you
> can remember what columns things were at is if you don't use this
> form of the PARSE statement, in which case it is totally immaterial
> whether or not it treats tabs as whitespace when parsing by word,
> since you won't be parsing by word, and again the conclusion is that
> having this behaviour does not help you in cases where you have:
>
> > > repeated applications of parse, say with columnar fields.

Let me see -- parse for words *does* eat up all the intervening blanks
(pretty much what I expect), does lose column positioning, and blanks
are word separators in a words clause. It really appears as if you
either

(a) really think that "a<tab>b" ought to be a "word" in the
words-clause sense -- which seems massively counter-intuitive -- or

(b) that you agree sensible behavior is that any whitespace character
NOT OTHERWISE INTERPRETED (like newline) ought to be a word separator.

If (a), I'd really like to see it defended. It seems a massive flaw.

Charles R. Martin

Sep 8, 1992, 1:29:54 PM
In article <1992Sep8...@sejnet.sunet.se> er...@sejnet.sunet.se (Eric Thomas) writes:

Hmmm. It seems as if there are multiple topics in this, all
confabulated together.

(1) Should newline be considered a whitespace character, or more
generally should the rexx definition of whitespace be equivalent to
anything for which unix isspace() returns 1?

I think the answer is "no". Even other UNIX languages that are line-
oriented (like awk) don't do this. It's clear that for any language
serving similar purposes to rexx and awk, newline must be distinguished
from blank, tab, formfeed, VT, etc.

(2) Should parse (and WORDS, WORDPOS, etc) be extended or modified to
make any arbitrary whitespace a word separator?

I've already argued that it is more orthogonal and more uniform to
answer "yes".

(3) Should parse be extended with some kind of general regular
expression mechanism?

I think it's worth seriously considering -- before I fell asleep last
night I got thinking about some of the things one could do with an
appropriate regexp parse, and it's pretty exciting.

Steve Bacher

Sep 8, 1992, 6:21:00 PM
In article <1992Sep8...@sejnet.sunet.se>,
er...@sejnet.sunet.se (Eric Thomas) writes:

>In article <1992090713...@MVS.draper.com>, SEB...@MVS.draper.com (Steve Bacher) writes:
>> At least with the more reasonable proposals,
>>
>> PARSE VALUE DIAG(something) WITH var1 var2 '15'x var3 var4 '15'x var5
>>
>> should continue to work just fine, regardless of whether '15'x is
>> considered whitespace or not. A correct implementation of REXX will
>> break up the line at the '15'xes and only then perform the whitespace
>> analysis on the resulting pieces.
>
>Yes, that much will work, and it is indeed what people generally code to
>process the output of commands which produce a fixed-size answer (like when
>extracting the time zone name from QUERY TIME). But there are many commands
>which produce variable-size output that one processes in a loop (example: QUERY
>NAMES).

Still not that much of a problem, if you do:

temp = diag(something)
or
parse value diag(something) with temp

do while words(temp) > 0 /* or temp ^= "", or whatever */
parse var temp stuff '15'x temp
call do_something_with stuff
end

The same rules should apply.

I do see your point about all components of the language that deal with
whitespace needing to behave consistently. I take that as a given.

Eric Thomas

Sep 8, 1992, 8:08:26 PM
In article <MARTINC.92...@grover.cs.unc.edu>, mar...@grover.cs.unc.edu (Charles R. Martin) writes:
> Then you should know enough about rexx code to answer my questions,
> shouldn't you?

Don't you have documentation with your interpreter? If not, can't you buy the
REXX book? In any case, if you are unsure of the interaction between columnar,
literal and word parsing, why don't you just ask explicitly before making
assumptions in your examples and making people waste their time showing the
flaw in a point they thought you were trying to make but which turned out not
to be what you meant?

> Let me see -- parse for words *does* eat up all the intervening blanks
> (pretty much what I expect), does lose column positioning, and blanks
> are word separators in a words clause.

Right on all counts.

> It really appears as if you either
>
> (a) really think that "a<tab>b" ought to be a "word" in the
> words-clause sense -- which seems massively counter-intuitive -- or

Ah, it only took you 10 postings to get my point. Good that we're now on the
same wavelength.

> If (a), I'd really like to see it defended. It seems a massive flaw.

I believe I have posted more than enough on that topic, so I refer you to my
other messages.

Eric

cultural elite

Sep 8, 1992, 12:26:23 PM
Eric Thomas <er...@SEJNET.SUNET.SE> writes:
>In article <920905194...@lns598.TN.CORNELL.EDU>, cultural elite
><d...@LNS598.TN.CORNELL.EDU> writes:
>> I agree, that would be ridiculous. Not even Unix (with the possible
>> exception of a few broken utilities) behaves that way.
>
>With the possible exception of, for instance, scanf(), the C formatted read
>routine.

Ok, score one for Eric--I have always thought that scanf() is more than
a bit brain-damaged.

>Ever tried to use scanf() to read a one-word reply from the terminal,
>and use a default value if the user just hits RETURN?

Nope, I avoid scanf(). (See above...) If I have to, I'll use fgets()
and sscanf().

>The problem is that if you want to change the language definition of all these
>functions, you have to come up with a solution that is clearly defined and
>works for all systems, not just unix. One such solution, which is what I
>understood all the previous postings were proposing, is that positional PARSE
>et al are to treat any character for which isspace() returns TRUE as white
>space.

Using isspace() on all platforms would *not* be a clearly defined, well
thought out solution. isspace() is not well standardized on Unix platforms,
and isspace() on some non-Unix machines is seriously compromised by the
peculiar requirements of the C standard library, and attempts to make the
system Unix-like. This makes it a terrible choice for Rexx, which should
do its best to cater to the native environment (which is what this discussion
is all about, right?), not impose some Unix/C biased preconception on every
platform.

So I consider

>Under VM, isspace() is true for X'05' and X'15'. It is logical for isspace() to
>be true for these characters, since they are the EBCDIC equivalents of TAB and
>CRLF, respectively; as far as I know, all the C compilers for VM agree on this.

to be a non-issue. It is also an implementation detail, not a fundamental
objection.
--
Dan Riley Internet: d...@lns598.tn.cornell.edu
Wilson Lab, Cornell University HEPNET/SPAN: lns598::dsr (44630::dsr)
"Maybe, leastways is the best way of all" -Caterwaul

Jerry Campbell

Sep 8, 1992, 2:15:52 PM
An observation and a what-if. Observation: since there is so much controversy
regarding this issue, I'd think that is a clear sign that it should be left
alone. The programmer should worry about this white space vs. "real" blanks
thing. What if I really, really did want to parse blanks differently from
tabs? Take a Rexx program that wanted to convert tabs to real blanks
or vice versa. In this case you would REQUIRE control over the precise value
of the data when parsing. Please, don't build "intelligent" second-guessing
Rexx interpreters.....! Or standards.

---
Jerry Campbell reply to: zjl...@hou.amoco.com
Amoco Corp. ISD SSS/Graphics
Houston, Tx. 713/556-7036

cultural elite

Sep 8, 1992, 12:32:48 PM
Dave Gomberg <GOM...@UCSFVM.BITNET> writes:
>Well, the Unix folks have rather dug themselves a hole on that one,
>haven't they, Charlie. It's not just a question of parse, what about
>substr, or left/right.

What about them? I would think substr(), left() and right() would be
unaffected.

>They will have the same problem. I guess we
>could say that just because c=a||b doesn't mean expand(c)=expand(a)||
>expand(b). But we knew that already. And your parse example is just
>this problem in another guise. In fact, the Unix folks have a worse
>problem. You can have c=a||b, but pos(a,c)=0! Wow! I don't think
>I want that in MY os. But there is no accounting for taste! I would
>really like Anders (I assume the first letter is in fact capitalized)
>and other Unix types to say if they can stand it that:
>
> c=a||b; say pos(a,c)
>
>would say 0? It is not what I mean by REXX. Sorry this rambles so. Dave

Ok, call me blind, but I can't see this hole I seem to be in. You'll
have to explain how you get c=a||b; pos(a,c)=0 out of any sensible
proposal.

Dave Gomberg

Sep 8, 1992, 1:21:06 PM
On Tue, 8 Sep 1992 12:32:48 -0400 cultural elite said:
>Dave Gomberg <GOM...@UCSFVM.BITNET> writes:
>>I want that in MY os. But there is no accounting for taste! I would
>>really like Anders (I assume the first letter is in fact capitalized)
>>and other Unix types to say if they can stand it that:
>>
>> c=a||b; say pos(a,c)
>>
>>would say 0? It is not what I mean by REXX. Sorry this rambles so. Dave
>
>Ok, call me blind, but I can't see this hole I seem to be in. You'll
>have to explain how you get c=a||b; pos(a,c)=0 out of any sensible
>proposal.

If the synonymy of tabs to a particular number of blanks were accepted, and
if assignment did normalization (as it often does, both in standards and in
practice) then c might well have a tab when a ended and b began with strings
of blanks. Dave
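Dave's scenario needs a normalizing assignment, which no REXX performs. The Python sketch below invents two hypothetical helpers to show how `c = a||b` could then fail to contain `a`:

- `normalize`: turns a run of two or more spaces ending on an 8-column tab stop into one tab, assuming the input has no tabs already;
- `rexx_pos`: a 1-based index modeled on POS.

```python
def normalize(s, tabsize=8):
    """Hypothetical normalizing assignment (no real REXX does this): spaces
    running up to a tab stop become one tab if there are two or more of
    them. Assumes s contains no tabs to begin with."""
    out = []
    col = 0  # 0-based display column
    pending = 0  # spaces read but not yet written
    for ch in s:
        if ch == " ":
            pending += 1
            col += 1
            if col % tabsize == 0:  # the run reaches a tab stop
                out.append("\t" if pending > 1 else " ")
                pending = 0
        else:
            out.append(" " * pending + ch)
            pending = 0
            col += 1
    out.append(" " * pending)
    return "".join(out)


def rexx_pos(needle, haystack):
    """Model of POS(needle, haystack): 1-based index, 0 if not found."""
    if not needle:
        return 0
    return haystack.find(needle) + 1


if __name__ == "__main__":
    a = "abcd    "  # ends with four blanks
    b = "    efgh"  # starts with four blanks
    c = normalize(a + b)
    print(repr(c))  # 'abcd\t    efgh'
    print(rexx_pos(a, c))  # 0
```

Here `a`'s four trailing blanks end exactly on the tab stop at column 8, so they collapse into a tab. `pos(a, c)` then returns 0 even though `c` was built as `a||b`.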

cultural elite

Sep 8, 1992, 1:50:26 PM

Wild. What languages have you used that do normalization on assignment?
It doesn't sound to me like a reasonable thing to do.

But anyway, I'm not advocating that tabs and blanks be made generally
synonymous--all I want is for them to both be accepted as word separators
in parse, word(), words(), and a few other obvious places, which does not
imply general synonymy. However, Eric may have come up with a serious
objection with his argument that this implies '\t' = ' ' be true, which
I'm not entirely happy about.
