In REXX, is it better (more efficient, fewer cycles) to use multiple
SUBSTRing function calls or a single PARSE? For example (assuming
variable "line" has a long string of data to be parsed out):
cat = substr(line,1,6)
status = substr(line,7,1)
type = substr(line,8,1)
typedesc = substr(line,9,15)
baseno = substr(line,24,5)
suffix = substr(line,29,20)
use = substr(line,49,1)
usedesc = substr(line,51,24)
order = substr(line,75,2)
orderdesc = substr(line,77,25)
auth = substr(line,102,2)
authdesc = substr(line,104,8)
or:
parse var line 1 cat 7 status 8 type 9 typedesc 24,
baseno 29 suffix 49 use 51 usedesc 75 order 77,
orderdesc 102 auth 104 authdesc 111
Some of the end columns in my parse example may not correspond exactly
to the lengths specified on the respective SUBSTR lines, but you know
what I mean.
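(For anyone rusty on PARSE templates: the numbers are absolute column positions, and each variable receives the characters from its own column up to, but not including, the next number. A minimal sketch with a made-up string:)

```rexx
/* PARSE with absolute column positions: each variable gets the  */
/* characters from its column up to the next column number.      */
s = 'abcdefghi'
parse var s 1 a 4 b 7 c
say a   /* abc  (columns 1-3) */
say b   /* def  (columns 4-6) */
say c   /* ghi  (column 7 to end) */
```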
Sometime long ago I was told that using PARSE was more efficient, which
seems logical--one operation vs. many. But a colleague recently
suggested that we don't really know (*we* don't) how many operations the
parse is doing behind the scenes.
I know somebody out there does know. Care to help out?
TIA
Pat Futato
>Date: Fri, 24 Jul 1998 16:38:14 -0400
>From: "Patricia A. Futato" <fut...@publish.no.irs.gov>
>Subject: REXX : PARSE or SUBSTR?
>To: VME...@UAFSYSB.UARK.EDU
I once attended a very interesting presentation at SHARE in which
the presenter (sorry, I've forgotten his name) asked us to tell him
REXX efficiency tricks, then proceeded to demolish them every one.
The real answer is "it depends". You get different answers depending
on whether you are running compiled or uncompiled, on VM or MVS or
OS/2, classic REXX or object REXX (or NetRexx?), etc.
A few observations:
1. A good algorithm will make FAR more difference than any collection
of tricks.
2. Minimizing the number of IOs will make far more difference
than trying to shave off a little CPU time.
3. Is the code you are interested in run once in your program,
or is it in the innermost nested loop of your program? If it's
not in the innermost loop, it probably doesn't matter.
If it does run in the innermost loop, it may be worth your
while to code it both ways and measure the CPU time used.
4. Is your program run once per second? Once per day? Once per year?
A one-time program? Unless the program is run frequently,
it probably doesn't matter.
5. If your program IS run often enough to make a difference,
then it probably shouldn't be written in REXX. At least compile
it, or re-write it in a compiled language, or rewrite it in
assembler.
6. Ease of maintenance, ease of reuse, and ease of developing
a reliable program are usually far more important than a few
CPU cycles. The price of CPU cycles keeps dropping, the price
of people's time does not.
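(Point 3 above is easy to do in plain REXX. A minimal sketch using the built-in elapsed-time clock; the iteration count and field layout here are made up:)

```rexx
/* Time both approaches with REXX's elapsed-time clock:          */
/* time('R') resets the clock, time('E') reads it in seconds.    */
line = copies('x', 120)
n = 100000                    /* enough iterations to measure    */

call time 'R'
do n
  cat    = substr(line, 1, 6)
  status = substr(line, 7, 1)
end
say 'SUBSTR:' time('E') 'seconds'

call time 'R'
do n
  parse var line 1 cat 7 status 8 .
end
say 'PARSE: ' time('E') 'seconds'
```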
I have heard that the PARSE example above runs a little faster than the
SUBSTR code, at least when interpreted on VM. This may not remain true
if you compile the code, or run it on another platform.
Since PARSE is a basic idiom in REXX, I see no problem with using a
straightforward PARSE like the one above, and I probably would. On the
other hand, the full definition of PARSE is complex and non-obvious --
I have seen many examples of complex PARSE statements that did not work
as the author of the program expected. I would vote to write the
clearest code and not worry about which is faster.
Alan Ackerman, Bank of America, 925-675-4358, usbo...@ibmmail.com
*** Note new area code -------> ***
I made your example into an exec and profiled it 10 times under TRACEXEC.
Here are the results (column 1 is line number, column 2 is number of times
the line was executed, and column 3 is total CPU for the line). It's
pretty clear that parse wins!
Init          0.018978
 1  10  0.000559  /* */
 2  10  0.005371  t = time()
 3  10  0.000675  line = 'cat STTypeDesc basenSuffix U' ,
 4   0  0.000000         'usedesc OROrderdesc AUauthdesc'
 5  10  0.000447
 6  10  0.000916  cat = substr(line,1,6)
 7  10  0.000828  status = substr(line,7,1)
 8  10  0.000790  type = substr(line,8,1)
 9  10  0.000770  typedesc = substr(line,9,15)
10  10  0.000772  baseno = substr(line,24,5)
11  10  0.000769  suffix = substr(line,29,20)
12  10  0.000806  use = substr(line,49,1)
13  10  0.000880  usedesc = substr(line,51,24)
14  10  0.000800  order = substr(line,75,2)
15  10  0.000803  orderdesc = substr(line,77,25)
16  10  0.000774  auth = substr(line,102,2)
17  10  0.000786  authdesc = substr(line,104,8)
18  10  0.000460
19  10  0.001495  parse var line 1 cat 7 status 8 type 9 typedesc 24,
20   0  0.000000    baseno 29 suffix 49 use 51 usedesc 75 order 77,
21   0  0.000000    orderdesc 102 auth 104 authdesc 111
22  10  0.000477
Term          0.002803
------------------------------------------------------------------------
Kent Fiala <sas...@vm.sas.com> R4238
SAS Institute Inc., Cary NC 27513 USA 919-677-8000 x6646
Good luck...
The Bit Bucket, where good things can be found.
In article <35B8F0...@publish.no.irs.gov>,
fut...@publish.no.irs.gov wrote:
> [original PARSE-vs-SUBSTR question quoted in full; snipped]
It is more efficient (fewer CPU cycles) to use one statement, i.e. PARSE,
than to use multiple SUBSTR statements.
Not only is less REXX code executed, but fewer function calls are made.
Put your test cases in a loop and compare the times.
In general, the less REXX code, the better.
--
o It almost always doesn't matter.
Execution time will be dominated by something else.
o For the compiled rexx case, I'll guess that it probably doesn't matter.
o For uncompiled rexx, parse wins the race.
But maybe not for the reasons you might think. In some testing I did
a few years ago, (OK several years ago), execution time for
straight-line rexx code was relatively independent of the code, but
highly dependent on the number of characters that needed to be
processed in interpretation. A bigger program segment runs longer.
o For more than one string extraction, I always teach and use parse.
Using parse will impress your friends. Multiple assignment statements
where a single statement could be used just isn't cool. :-)
o It really doesn't matter.
Now, is it better to use individual variables in a program, or use a
stem with the indices being value names? (G.inputFM, for example) :-)
Extra credit: Should "procedure" be used on internal functions?
cheers, wayne
Wayne T. Smith Systems Group -- UNET Technology Services
w...@Maine.edu University of Maine System
> ...
> Now, is it better to use individual variables in a program, or use a
> stem with the indices being value names? (G.inputFM, for example) :-)
>
My experience was that using stems was the only way to create data
structures. Performance dropped dramatically once I went to four or
five levels (large programs, but if they weren't, I wouldn't have
needed structures that deep).
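(For what it's worth, the stem-with-value-name-indices style Wayne mentions looks something like this; the field names are made up:)

```rexx
/* A stem used as a record structure: the tail is a field name,  */
/* not a numeric index.                                          */
g. = ''                        /* default every field to ''      */
g.inputFM = 'MEGABYTE DAT A'   /* file id, G.inputFM style       */
g.recCount = 0

g.recCount = g.recCount + 1    /* update a field like any var    */
say g.inputFM g.recCount
```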
> Extra credit: Should "procedure" be used on internal functions?
Can't recall any particular performance penalty/benefit, but I ALWAYS
used it in large progs to localize variables.
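(A minimal sketch of what that buys you; the names are made up. PROCEDURE makes the routine's variables local, and EXPOSE names the few that should still be shared:)

```rexx
g.total = 0
call tally 5
say g.total          /* 5 -- g. was exposed, so the update shows */
say symbol('work')   /* LIT -- "work" stayed local to tally      */
exit

tally: procedure expose g.
  arg n
  work = n           /* local; vanishes when the routine returns */
  g.total = g.total + work
  return
```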
<Delete now if not interested in the opinions of apostates>
Rexx really could use an overhaul to implement the OO paradigm.
I tried writing the OO model with Rexx; the results were truly ugly.
Admittedly I was very new to OO at the time, but experience has only
reinforced my perception that OO and Rexx are a bad fit.
A year or so ago I got into a perl vs. Rexx discussion (on this
list?). At the time I knew nothing about perl, but pointed out that
Rexx is so simple that it makes writing a prog (well, one for which
Rexx is fit at all) very little effort. I still stand by that, but
I have to say (now that I'm beginning to get comfortable with perl)
that perl can do things fairly simply that Rexx simply can't -- OO,
references, structures, pass by reference, and more -- so that as a
scripting language it's a hands-down win for perl. Yes, perl is an
ugly language (Rexx is syntactically simple, perl is syntactically
quite complex -- watch the bison messages when you compile it), and
that makes it harder to learn and use, but that is more than made up
for by its raw power.
Now maybe to keep Rexx simple, it doesn't have to be as powerful
as perl, but it really, really needs some kind of makeover to support
modern programming style if there is to be any hope of future interest
in it. (Sorry, but IMHO NetRexx doesn't cut it - you just convince
people to use Java; [no objections from me, but it's really not a
"scripting" language].)
-- TWZ
+-------------------------------------------------------------------------+
| Copyright 1998 by Terrence W. Zellers. All rights explicitly reserved. |
|-------------------------------------------------------------------------|
| terrence....@pobox.com | Warning: The Surgeon General has found |
| | that smoking may cause some individuals |
| | to ignore the Surgeon General. |
+-------------------------------------------------------------------------+
This is not necessarily bad. If you must process each record with Rexx
(and not in a PIPE), then EXECIO diskread (withOUT FINIS) is one of the
fastest ways to read the file.
Thanks Alan, and everyone else for the very complete information. I see
that my original question may have been a bit simplistic, having only a
subset of the original code.
In fact,
- there are actually 86 SUBSTR operations per line, not the 12 I showed,
- they happen in a loop over ~25K records,
- using non-compiled REXX (but the compiler IS on order),
- daily.
To add insult to injury, each record is acquired via an individual
EXECIO diskread.
(In my own defense, I now must note that I didn't write this code!)
Looks like my work is cut out for me!
Thanks again! --pat
>This is not necessarily bad. If you must process each record with Rexx
>(and not in a PIPE), then EXECIO diskread (withOUT FINIS) is one of the
>fastest ways to read the file.
I've also found that upping the record count for EXECIO/DISKR significantly
improves performance. For example....
/* */
infile = 'MEGABYTE DAT *' /* 13,108 record F 80 file */
do forever
'EXECIO 1 DISKR' infile '(VAR REC'
if rc ^= 0 then leave
/* process rec */
end
'FINIS' infile
On my system the above takes 8.14 seconds real time and 7.43 seconds CPU
time. Whereas this...
/* */
infile = 'MEGABYTE DAT *'
rRC = 0
do while rRC = 0
'EXECIO 1000 DISKR' infile '(STEM REC.'
rRC = rc
do i = 1 to rec.0
/* process rec.i */
end
end
'FINIS' infile
...only takes 1.44 seconds real time and 0.97 seconds of CPU time. Oddly,
increasing the 1000 record count value tended to use more CPU time
and take longer. This would likely change depending on how much
processing is done with the individual records. But, it's clear that
reading the data in bunches improves things in a big way.
-- Rich