I didn't mention it before, but Adrian's suggestion echoes some of the earlier assumptions that I'm working in batch. I happen to like REXX and tend to use it a lot because it's supremely flexible and RELATIVELY fast - very fast for an interpreted language - but I should be more careful not to limit myself to just one tool. The problem with SORT is I do a lot of my work in the foreground.
There's nothing to stop me from writing a REXX call to SORT in the foreground. In fact, I have an external REXX routine named SORTQ that, as the name suggests, sorts the contents of the queue. The problem is that at about half the installations I've worked at, the routine fails; I haven't figured out why, but it seems either to fail to see the SYSIN record(s) or to see them as garbage. I've tried a bunch of different combinations, but finally got discouraged and haven't tried recently. Does anyone have any clues about this? If anyone here has tried that sort of thing, maybe I'll take it up again and give symptoms in hopes that you can identify the problem.
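For readers who haven't tried it, a minimal sketch of a foreground SORT call from REXX might look like the following. This is not the SORTQ routine described above. The DD attributes, the 80-byte record length, the sort field in columns 1-8 and the use of ADDRESS LINKMVS are all assumptions for illustration. Two things that may be worth checking when SYSIN seems to be ignored or misread: DFSORT expects SYSIN to hold fixed 80-byte records, and an unlabeled control statement has to begin in column 2 or later.

```rexx
/* Hypothetical sketch of a foreground queue sort; NOT the SORTQ routine   */
/* mentioned above. Assumes TSO/E, queue lines of at most 80 bytes, and an */
/* ascending character sort on columns 1-8 (all illustrative choices).     */
address tso
"ALLOC F(SORTIN)  NEW REUSE SPACE(5,5) TRACKS RECFM(F B) LRECL(80) DSORG(PS)"
"ALLOC F(SORTOUT) NEW REUSE SPACE(5,5) TRACKS RECFM(F B) LRECL(80) DSORG(PS)"
"ALLOC F(SYSIN)   NEW REUSE SPACE(1,1) TRACKS RECFM(F B) LRECL(80) DSORG(PS)"
"ALLOC F(SYSOUT)  DA(*) REUSE"           /* sort messages go to the terminal */

/* Copy the current queue contents into SORTIN. */
"EXECIO" queued() "DISKW SORTIN (FINIS"

/* Write the control statement from a stem so it never touches the queue.  */
ctl.1 = " SORT FIELDS=(1,8,CH,A)"        /* leading blank: starts in col 2   */
"EXECIO 1 DISKW SYSIN (STEM ctl. FINIS"

/* Invoke the sort program; SORT is assumed to be the installed alias.     */
address linkmvs "SORT"
sortrc = rc

/* On success, read the sorted records back onto the queue. */
if sortrc = 0 then
  "EXECIO * DISKR SORTOUT (FINIS"

"FREE F(SORTIN SORTOUT SYSIN SYSOUT)"
exit sortrc
```

Allocating SYSOUT to the terminal means the sort's own messages show up in the session, which is usually the quickest way to see how it interpreted the control statement.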
/* Politicians used to understand, without being told, that they didn't necessarily have whatever it takes to fill our lives with meaning. Their job was to fill potholes. -Joseph Sobran */
-----Original Message-----
From: TSO REXX Discussion List [mailto:TSO-...@VM.MARIST.EDU] On Behalf Of Adrian Stern
Sent: Friday, May 24, 2013 09:38
Use a sort routine in batch to produce a subset of the records. That'll be the most effective method of selecting the records of the right type.
-----Original Message-----
From: TSO REXX Discussion List [mailto:TSO-...@VM.MARIST.EDU] On Behalf Of Bob Bridges
Sent: May 23, 2013 17:02
It's a small thing, but I do it pretty often so I'm curious about which way is more efficient: I often have to read a dataset with between .5M and 1.5M records (say 100 cylinders), and multiple record types. Usually what I want is only a small subset of the records, so my logic looks like this:
/* For each record: */
parse var record 1 type +4 6 key +8 75 name +20 /* ...etc */
if type<>"0200" then iterate
if pos("XYZ",name)=0 then iterate
/* ...and so on */
We've all heard how efficient the PARSE statement is, and I use it quite a bit when reading this type of file. But the PARSE statement above is executed on every record, whether or not it's one of the few 0200 records. So would this be more efficient?
/* For each record: */
parse var record 1 type +4
if type<>"0200" then iterate
parse var record 6 key +8 75 name +20 /* ...etc */
if pos("XYZ",name)=0 then iterate
/* ...and so on */
The first statement operates on all the records, say a million of them, and everything except the 0200 records is eliminated; then the second statement operates on just the 0200s, say 20K records. Do I lose more by doing the full parse on all the records in the dataset, or by initiating an additional PARSE statement on just a few of them?
I can do my own test, if I want to know badly enough; I'm not asking anyone to do benchmark testing for me. But if someone ALREADY knows the answer, I'm interested in hearing it. Or even in some discussion and guessing, if you're interested.
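For anyone who does want to run that test, a rough timing sketch might look like the one below. It builds records in memory using the layout from the example above and times each variant with the TIME function's elapsed-time clock. The record count, the share of 0200 records and the generated field values are assumptions for illustration, and in-memory timings leave out the I/O cost that would dominate a real run.

```rexx
/* Rough timing sketch: one full PARSE per record vs. a two-stage PARSE.   */
/* Layout follows the example above (type 1-4, key 6-13, name 75-94).      */
/* The record count and the 2% share of 0200 records are assumptions.      */
total = 100000
recs.0 = total
do i = 1 to total
  if i // 50 = 0 then type = "0200"
  else type = "0100"
  name = "NAME" || i
  if i // 100 = 0 then name = "XYZ" || i
  /* cols 1-5 type, cols 6-74 key area, cols 75-94 name */
  recs.i = left(type, 5) || left("KEY" || i, 69) || left(name, 20)
end

/* Variant 1: full parse on every record. */
call time 'R'                      /* reset the elapsed-time clock */
kept1 = 0
do i = 1 to recs.0
  record = recs.i
  parse var record 1 type +4 6 key +8 75 name +20
  if type <> "0200" then iterate
  if pos("XYZ", name) = 0 then iterate
  kept1 = kept1 + 1
end
t1 = time('R')                     /* elapsed seconds, then reset again */

/* Variant 2: parse the type first, the rest only for 0200 records. */
kept2 = 0
do i = 1 to recs.0
  record = recs.i
  parse var record 1 type +4
  if type <> "0200" then iterate
  parse var record 6 key +8 75 name +20
  if pos("XYZ", name) = 0 then iterate
  kept2 = kept2 + 1
end
t2 = time('R')

say "Full parse:      " t1 "seconds, kept" kept1
say "Two-stage parse: " t2 "seconds, kept" kept2
```

The two kept counts should match, which confirms that both loops select the same records. Running it several times and with different proportions of 0200 records would give a better picture than a single pass.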