Another efficiencies question

Bob Bridges

May 23, 2013, 12:02:32 PM

It's a small thing, but I do it pretty often so I'm curious about which way is more efficient: I often have to read a dataset with between .5M and 1.5M records (say 100 cylinders), and multiple record types. Usually what I want is only a small subset of the records, so my logic looks like this:

/* For each record: */
parse var record 1 type +4 6 key +8 75 name +20 /* ...etc */
if type<>"0200" then iterate
if pos("XYZ",name)=0 then iterate
/* ...and so on */

We've all heard how efficient the PARSE statement is, and I use it quite a bit when reading this type of file. But the PARSE statement above is executed on every record, whether or not it's one of the few 0200 records. So would this be more efficient?

/* For each record: */
parse var record 1 type +4
if type<>"0200" then iterate
parse var record 6 key +8 75 name +20 /* ...etc */
if pos("XYZ",name)=0 then iterate
/* ...and so on */

The first statement operates on all the records, say a million of them, and everything except the 0200 records is eliminated; then the second statement operates on just the 0200s, say 20K records. Do I lose more by doing the full parse on all the records in the dataset, or by initiating an additional PARSE statement on just a few of them?

I can do my own test, if I want to know badly enough; I'm not asking anyone to do benchmark testing for me. But if someone ALREADY knows the answer, I'm interested in hearing it. Or even in some discussion and guessing, if you're interested.

---
Bob Bridges
rhb...@attglobal.net, cell 336 382-7313
rbri...@InfoSecInc.com

/* I've gone into hundreds of [fortune-tellers' parlors], and have been told thousands of things, but nobody ever told me I was a policewoman getting ready to arrest her. -a New York City detective */

----------------------------------------------------------------------
For TSO-REXX subscribe / signoff / archive access instructions,
send email to LIST...@VM.MARIST.EDU with the message: INFO TSO-REXX

Andreas Fischer

May 23, 2013, 12:11:09 PM

hi,

i would guess that number 2 is faster, but what i've
learned about rexx when it comes to performance issues is simple: test both
versions and measure. i've had more than one surprise when i tried
out different methods.

guessing from your description that you're going through an unloaded
racf database, i recommend using pgm=sort instead of rexx. it's so
much faster that it's worth the effort, and you can replace simple rexx
programs completely with pgm=sort.

regards,
andi


Dave Salt

May 23, 2013, 12:29:57 PM

If I were to guess I'd think the second method is faster. But why guess when it's easy enough to do this:

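cpu = sysvar("SYSCPU")   /* CPU seconds used so far in this session */
srv = sysvar("SYSSRV")   /* service units used so far */
x = time("ELAPSED")      /* first call starts the elapsed-time clock */

/* logic goes here... */

say "CPU = "sysvar("SYSCPU") - cpu
say "SRV = "sysvar("SYSSRV") - srv
say "Elapsed time = "time("ELAPSED")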
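
For anyone who wants to try the original question without a real dataset, here is a minimal sketch that times both PARSE variants on synthetic records. The record count, the record layout filler and the 2% share of 0200 records are assumptions made up for the test, not figures from the thread, and it uses only TIME('R') so it should run under any REXX interpreter:

```rexx
/* REXX - time the one-step and the two-step PARSE on made-up records */
n = 100000                            /* hypothetical record count     */
do i = 1 to n
  if i // 50 = 0 then type = '0200'   /* roughly 2% are type 0200      */
  else type = '0100'
  /* cols 1-4 type, 6-13 key, 75-94 name, as in the original template  */
  rec.i = left(type, 5) || left('KEY' || i, 69) || left('NAME XYZ', 20)
end

call time 'R'                         /* reset the elapsed-time clock  */
hits = 0
do i = 1 to n                         /* variant 1: full parse always  */
  parse var rec.i 1 type +4 6 key +8 75 name +20
  if type <> '0200' then iterate
  if pos('XYZ', name) = 0 then iterate
  hits = hits + 1
end
say 'One-step PARSE:' time('R') 'seconds,' hits 'hits'

hits = 0
do i = 1 to n                         /* variant 2: type first         */
  parse var rec.i 1 type +4
  if type <> '0200' then iterate
  parse var rec.i 6 key +8 75 name +20
  if pos('XYZ', name) = 0 then iterate
  hits = hits + 1
end
say 'Two-step PARSE:' time('R') 'seconds,' hits 'hits'
```

Both loops should report the same number of hits; only the timings differ. Under TSO the SYSVAR("SYSCPU") deltas shown above are probably a better measure than elapsed time, which depends on what else the system is doing.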

Dave Salt

SimpList(tm) - try it; you'll get it!

http://www.mackinney.com/products/program-development/simplist.html


Paul Gilmartin

May 23, 2013, 12:36:17 PM

On 2013-05-23, at 10:02, Bob Bridges wrote:
>
> ---

> /* I've gone into hundreds of [fortune-tellers' parlors], and have been told thousands of things, but nobody ever told me I was a policewoman getting ready to arrest her. -a New York City detective */
>

Of course not. That would invoke the Epimenides Paradox.

-- gil

Thomas Berg

May 23, 2013, 12:37:10 PM

Perhaps not relevant - as your point may be about parse - but if the subset of "0200" records is that small compared to all records, I would begin with:

If Substr(record,1,4) /== '0200' Then Iterate
parse var record . 6 key +8 75 name +20 /* ...etc */

if pos("XYZ",name)=0 then iterate
/* ...and so on */

Regards
Thomas Berg
____________________________________________________________________
Thomas Berg Specialist z/OS\RQM\IT Delivery SWEDBANK AB (Publ)

Paul Gilmartin

May 23, 2013, 12:42:16 PM

On 2013-05-23, at 10:35, Thomas Berg wrote:

> Perhaps not relevant - as your point may be about parse - but if the subset of "0200" records is that small compared to all records, I would begin with:
>
> If Substr(record,1,4) /== '0200' Then Iterate
>

Ummm. No. My experience is that a function call is more
expensive than PARSE.

-- gil

Thomas Berg

May 23, 2013, 12:48:54 PM

> -----Original Message-----
> From: TSO REXX Discussion List [mailto:TSO-...@VM.MARIST.EDU] On
> Behalf Of Paul Gilmartin
> Sent: Thursday, May 23, 2013 6:42 PM
> To: TSO-...@VM.MARIST.EDU
> Subject: Re: [TSO-REXX] Another efficiencies question
>
> On 2013-05-23, at 10:35, Thomas Berg wrote:
>
> > Perhaps not relevant - as your point may be about parse - but if the
> > subset of "0200" records is that small compared to all records, I
> > would begin with:
> >
> > If Substr(record,1,4) /== '0200' Then Iterate
> >
> Ummm. No. My experience is that a function call is more expensive
> than PARSE.

Really? I would expect otherwise, as Substr() is relatively simpler than PARSE.
(With Substr() I would expect the parser to just store the string '0200' and the position (+length) for a direct compare.
With PARSE it has to create and store the variables involved, plus do the actual compare.)

I think I will check it.

Regards
Thomas Berg
____________________________________________________________________
Thomas Berg Specialist z/OS\RQM\IT Delivery SWEDBANK AB (Publ)

Paul Gilmartin

May 23, 2013, 12:59:33 PM

On 2013-05-23, at 10:48, Thomas Berg wrote:
>
> Really? I would expect otherwise, as Substr() is relatively simpler than PARSE.
> (With Substr() I would expect the parser to just store the string '0200' and the position (+length) for a direct compare.
> With PARSE it has to create and store the variables involved, plus do the actual compare.)
>

In fairness, I compared:

X = left( S, 4 )

with:

parse var S X 5 .

... so I didn't have the cost of assignment in one case but not
in the other. I concluded that PARSE generates inline code;
LEFT() calls a function, involving entry/exit overhead.
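
A minimal sketch of that comparison, for anyone who wants to repeat it. The test string and the repetition count are arbitrary assumptions; both loops assign to X, so the assignment cost is the same on each side:

```rexx
/* REXX - LEFT() versus an equivalent PARSE, assignment included in both */
s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'      /* hypothetical test string       */
reps = 1000000                        /* hypothetical repetition count  */

call time 'R'                         /* reset the elapsed-time clock   */
do reps
  x = left(s, 4)                      /* built-in function call         */
end
say 'LEFT() :' time('R') 'seconds'

do reps
  parse var s x 5 .                   /* first four characters into X   */
end
say 'PARSE  :' time('R') 'seconds'
```

The result may well differ between interpreted and compiled REXX and between platforms, so it is worth running in the environment that matters.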

> I think I will check it.
>

Good idea.

-- gil

Thomas Berg

May 23, 2013, 1:19:41 PM

I ran the REXX below interpreted, with this result:

SUBSTR CPU = 5.98
SUBSTR SRV = 49509.98
SUBSTR Elapsed = 10.337013

PARSE CPU = 7.00
PARSE SRV = 58075.00
PARSE Elapsed = 11.160960

When I compiled the rexx I got:

SUBSTR CPU = 3.09
SUBSTR SRV = 25590.09
SUBSTR Elapsed = 6.447525

PARSE CPU = 3.47
PARSE SRV = 28928.47
PARSE Elapsed = 7.572765

Not much but significant, I think.

/* REXX */
loop1 = 5                               /* outer repetitions */
loop2 = 10                              /* inner repetitions per measurement */
scpu = 0
ssrv = 0
sela = 0
pcpu = 0
psrv = 0
pela = 0
x = time('R')                           /* start the elapsed-time clock */
'ALLOC F(PERFT) DA(EXEC.REXX(STEPDATO)) SHR REUSE'

Do loop1

  /* SUBSTR variant */
  cpu = sysvar("SYSCPU")
  srv = sysvar("SYSSRV")

  Do loop2
    'EXECIO * DISKR PERFT (STEM R. FINIS' /* This reads about 30000 records */

    Do j = 1 To r.0
      If Substr(r.j,1,4) /== ' AS' Then Iterate j /* This skips all but about 110 */
      x1 = 'sghdty356565d dghgdhgh' r.j
    End

  End

  scpu = scpu + sysvar("SYSCPU") - cpu
  /* The posted run had scpu here, so its SRV figures are not true totals */
  ssrv = ssrv + sysvar("SYSSRV") - srv
  sela = sela + time('R')

  /* PARSE variant */
  cpu = sysvar("SYSCPU")
  srv = sysvar("SYSSRV")

  Do loop2
    'EXECIO * DISKR PERFT (STEM R. FINIS'

    Do j = 1 To r.0
      Parse Var r.j 1 type +4 6 key +8 75 name +20
      If type /== ' AS' Then Iterate j
      x1 = 'sghdty356565d dghgdhgh' r.j
    End

  End

  pcpu = pcpu + sysvar("SYSCPU") - cpu
  /* Same correction as above: the posted run had pcpu here */
  psrv = psrv + sysvar("SYSSRV") - srv
  pela = pela + time('R')
End

say "SUBSTR CPU = "scpu
say "SUBSTR SRV = "ssrv
say "SUBSTR Elapsed = "sela
say ''
say "PARSE CPU = "pcpu
say "PARSE SRV = "psrv
say "PARSE Elapsed = "pela

Exit 0

Regards
Thomas Berg
____________________________________________________________________
Thomas Berg Specialist z/OS\RQM\IT Delivery SWEDBANK AB (Publ)

Thomas Berg

May 23, 2013, 1:27:36 PM

When I removed the repeated EXECIOs (they were there because I was trying to avoid optimizations that weren't relevant here), leaving a single EXECIO at the start, and set loop1 = 10 and loop2 = 15, I got:

SUBSTR CPU = 7.61
SUBSTR SRV = 31610.61
SUBSTR Elapsed = 8.208575

PARSE CPU = 10.73
PARSE SRV = 44178.73
PARSE Elapsed = 11.442840

(This is interpreted.)


Adrian Stern

May 24, 2013, 9:38:07 AM

Use a sort routine in batch to produce a subset of the records. That'll be the most effective method of selecting the records of the right type.


Bob Bridges

May 24, 2013, 10:26:36 AM

I didn't mention it before, but Adrian's suggestion echoes some of the earlier assumptions that I'm working in batch. I happen to like REXX and tend to use it a lot because it's supremely flexible and RELATIVELY fast - very fast for an interpreted language - but I should be more careful not to limit myself to just one tool. The problem with SORT is I do a lot of my work in the foreground.

There's nothing to stop me from writing a REXX call to SORT in the foreground. In fact, I have an external REXX routine named SORTQ that, as the name suggests, sorts the contents of the queue. The problem is that at about half the installations I've worked at, the routine fails; I haven't figured out why, but it seems either to fail to see the SYSIN record(s) or to see them as garbage. I've tried a bunch of different combinations, but finally got discouraged and haven't tried recently. Anyone have any clues about this? If anyone here has tried that sort of thing, maybe I'll take it up again and give symptoms in hopes that you can identify the problem.

---
Bob Bridges
rhb...@attglobal.net, cell 336 382-7313
rbri...@InfoSecInc.com

/* Politicians used to understand, without being told, that they didn't necessarily have whatever it takes to fill our lives with meaning. Their job was to fill potholes. -Joseph Sobran */


Steve Thompson

May 24, 2013, 10:53:32 AM

From: Bob Bridges <rhb...@ATTGLOBAL.NET>
Date: 05/24/2013 10:27 AM

I didn't mention it before, but Adrian's suggestion echoes some of the
earlier assumptions that I'm working in batch. I happen to like REXX and
tend to use it a lot because it's supremely flexible and RELATIVELY fast -
very fast for an interpreted language - but I should be more careful not
to limit myself to just one tool. The problem with SORT is I do a lot of
my work in the foreground.

There's nothing to stop me from writing a REXX call to SORT in the
foreground. In fact, I have an external REXX routine named SORTQ that, as
the name suggests, sorts the contents of the queue. The problem is that
at about half the installations I've worked at, the routine fails; I
haven't figured out why, but it seems to fail to see the SYSIN record(s),
or it sees them as garbage. I've tried a bunch of different combinations, but
finally got discouraged and haven't tried recently. Anyone have any clues
about this? If anyone here has tried that sort of thing, maybe I'll take
it up again and give symptoms in hopes that you can identify the problem.

<SNIPPAGE>

I have written REXX that invokes SORT internally. I have made sure that I
do not use the DDnames that SORT expects and I allocate and load temp
files with what I need to tell sort and then invoke it. It works and it is
rather fast whether DFSORT or SYNCSORT (the two I have needed to do this
with).

You may find that you will have to determine which sort is available
(dynamically) and invoke accordingly.

Once you do this, you may find that you can read the entire sortout into a
stem variable and not have to do any more I/O once so loaded.
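
A rough sketch of the general idea, for illustration only. Unlike the approach described above it uses the standard SORT DD names, because passing alternate DD names needs a parameter list that is not covered in this thread. The dataset name, space figures and the INCLUDE condition (record type 0200 in columns 1-4, from the original question) are assumptions, and `CALL *(SORT)` assumes the sort product is reachable under the name SORT through the standard search order:

```rexx
/* REXX - call SORT in the foreground to copy out one record type   */
/* A sketch under assumptions: dataset name and sizes are made up.  */
address TSO
"ALLOC F(SORTIN) DA('HLQ.UNLOAD.DATA') SHR REUSE"
"ALLOC F(SORTOUT) NEW REUSE UNIT(SYSDA) SPACE(10,10) CYLINDERS"
"ALLOC F(SYSOUT) DUMMY REUSE"            /* discard SORT messages    */
"ALLOC F(SYSIN) NEW REUSE UNIT(SYSDA) SPACE(1,1) TRACKS",
  "DSORG(PS) RECFM(F B) LRECL(80)"

/* Control statements must leave column 1 blank */
ctl.1 = " OPTION COPY"
ctl.2 = " INCLUDE COND=(1,4,CH,EQ,C'0200')"
"EXECIO 2 DISKW SYSIN (STEM ctl. FINIS"

"CALL *(SORT)"
sortrc = rc                              /* keep SORT's return code  */
if sortrc = 0 then do
  "EXECIO * DISKR SORTOUT (STEM out. FINIS"
  say out.0 "records selected"
end
else say "SORT ended with return code" sortrc

"FREE F(SORTIN SORTOUT SYSIN SYSOUT)"
```

Note that REUSE on SYSIN and SYSOUT takes over any allocation the session already has under those names, and the final FREE drops it, which is one reason to prefer private DD names where the sort product allows them.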

Regards,
Steve Thompson

Thomas Berg

May 24, 2013, 11:19:08 AM

I have a SORTS() function that sorts the stack using *only rexx code* and *no files*, and another function that calls (DF)SORT.
Most of the time I have been satisfied with the SORTS() function - but then I have that one compiled.

Regards
Thomas Berg
____________________________________________________________________
Thomas Berg Specialist z/OS\RQM\IT Delivery SWEDBANK AB (Publ)


Paul Gilmartin

May 24, 2013, 11:38:09 AM

On 2013-05-24, at 08:51, Steve Thompson wrote:

> From: Bob Bridges
> Date: 05/24/2013 10:27 AM
>

> ...The problem is that
> at about half the installations I've worked at, the routine fails; I
> haven't figured out why, but it seems to fail to see the SYSIN record(s),
> or it sees them as garbage. ...
>
Some sites preallocate SYSIN in a TSO LOGON proc. Might
that be conflicting with your usage?

> <SNIPPAGE>
>
> . I have made sure that I do not use the DDnames that SORT expects ....
>
BPXWDYN( 'rtddn(NAME) ...' ) is very useful for this purpose. Also,
does SORT support an alternate DDNAME list?
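
A minimal sketch of the RTDDN idea, assuming BPXWDYN is available in the environment. The variable name `ctldd`, the space figures and the single control statement are made up for the example. How the generated name is then handed to SORT is left open here, since that depends on the answer to the alternate-DDNAME question:

```rexx
/* REXX - let BPXWDYN choose a DD name that cannot clash with SYSIN  */
parms = "alloc rtddn(ctldd) new delete unit(sysda) tracks space(1,1)"
parms = parms "lrecl(80) recfm(f,b) dsorg(ps)"
alloc_rc = bpxwdyn(parms)               /* DD name comes back in ctldd */
if alloc_rc <> 0 then do
  say "BPXWDYN allocation failed, rc =" alloc_rc
  exit 8
end
say "System-assigned DD name:" ctldd

ctl.1 = " OPTION COPY"                  /* hypothetical control record */
"EXECIO 1 DISKW" ctldd "(STEM ctl. FINIS"

/* ... pass ctldd to whatever program should read the statements ... */

call bpxwdyn "free fi("ctldd")"         /* release the temporary file  */
```

Because the system picks the name, this cannot collide with a SYSIN that a LOGON proc has already allocated.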

-- gil