better parsing algorithm?

John McKown

Apr 28, 2013, 3:56:37 PM

I need to process some data which is generated on UNIX. The data is
in tab-separated format; that is, each element is separated by a tab
(x'05' in EBCDIC). What I'm doing is similar to:

tab='05'x
/* assume data is in non-stem variable data */
/* I will put each value in data.n where n starts at 1 */
data.0=0
i=0
do while length(data) > 0
  i=i+1
  parse var data value (tab) data
  data.i=value
  data.0=i
end

OK, I can tighten it up a bit by using "data.i" in the parse instead of the
temporary variable "value". And I don't need to assign i to data.0 on
each iteration; I could do it after the "end" of the loop. But I do this just
in case a less knowledgeable person needs to modify the code. I,
personally, consider it easier to understand this way. But I could be
wrong, of course.

What I am hoping for is a "better" or "more understandable" algorithm.
No, I can't translate the tabs to blanks, or likely do any other type of
translate. The values may legitimately contain blanks, or indeed any
other character except a tab.
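
A minimal sketch of why splitting on the tab alone keeps embedded blanks
intact. The sample record is made up for illustration, and the tab is
assumed to be '05'x as above (it would be '09'x on an ASCII system):

```rexx
/* REXX - split one tab-separated record; values may contain blanks */
tab = '05'x                 /* EBCDIC tab; use '09'x on ASCII systems */
/* made-up sample record with blanks inside the values */
data = 'New York' || tab || 'a b  c' || tab || 'x'

data.0 = 0
i = 0
do while length(data) > 0
  i = i + 1
  /* value gets the text before the first tab, data gets the rest */
  parse var data value (tab) data
  data.i = value
  data.0 = i
end

/* show each value in quotes so the blanks are visible */
do n = 1 to data.0
  say n': "'data.n'"'
end
```

Because "value" is the only variable in front of the (tab) pattern, PARSE
assigns it the whole substring, blanks included.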

--
This is a test of the Emergency Broadcast System. If this had been an
actual emergency, do you really think we'd stick around to tell you?

Maranatha! <><
John McKown

----------------------------------------------------------------------
For TSO-REXX subscribe / signoff / archive access instructions,
send email to LIST...@VM.MARIST.EDU with the message: INFO TSO-REXX

Walter Pachl

Apr 28, 2013, 4:47:34 PM

How about:

data.=0
do i=1 by 1 while length(data) > 0
  parse var data data.i (tab) data
end
data.0=i-1
--
Walter Pachl

---- John McKown <john.arch...@GMAIL.COM> wrote:

John McKown

Apr 28, 2013, 9:04:48 PM

Nice. A bit shorter, but the same general algorithm. Definitely will run a
tad faster. Thanks.
On Apr 28, 2013 3:47 PM, "Walter Pachl" <christel....@chello.at>
wrote:

Paul Gilmartin

Apr 28, 2013, 10:08:03 PM

On 2013-04-28, at 19:04, John McKown wrote:

> Nice. A bit shorter, but the same general algorithm. Definitely will run a
> tad faster. Thanks.
>

A while back there was a tedious thread on "Performance" here.
I performed an experiment but didn't report the result. PARSE
seems usually to be the best choice. For example,

parse var S . X .

is measurably faster than

X = word( S, 2 )

The string operations would be similar; I attribute the difference
to function call overhead.
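
A rough sketch of how such a comparison might be run. The sample string and
the repetition count are arbitrary assumptions, and the timings will vary
by interpreter and system, so this is not a definitive benchmark:

```rexx
/* REXX - compare PARSE with WORD() for picking the second word */
S = 'alpha beta gamma'      /* made-up sample string */
reps = 100000               /* arbitrary repetition count */

call time 'R'               /* reset the elapsed-time clock */
do reps
  parse var S . X1 .
end
t_parse = time('E')         /* elapsed seconds since the reset */

call time 'R'
do reps
  X2 = word(S, 2)
end
t_word = time('E')

say 'parse:' t_parse 's, result' X1
say 'word :' t_word 's, result' X2
```

Both forms set the variable to the second blank-delimited word; only the
elapsed times should differ.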

>> ---- John McKown <john.arch...@GMAIL.COM> wrote:
>>>

>>> No, I can't
>>> translate the tabs to blanks, or likely do any other type of translate. The values
>>> may legitimately have blanks as part of the value. Or even other characters except for tabs.
>>>

Thanks. I cringe whenever someone proposes a solution to any such
problem that begins: "Translate every occurrence of 'X' to some
character that doesn't otherwise occur in the input. ..." I can't
always control what's not in the input.

It sounds as if you're dealing with exported spreadsheet data. And
you might yet have trouble with tabs in quoted data strings:

Field 1<TAB>"Field<TAB>2"<TAB>Field 3
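
One way a record like that could be split with PARSE alone, as a sketch
only. It assumes a quoted field starts with a double quote, ends at the
next double quote, and never contains a quote itself; the names rec and
fld. are made up for illustration:

```rexx
/* REXX - split a tab-separated record whose fields may be quoted */
tab = '05'x                 /* EBCDIC tab; use '09'x on ASCII systems */
rec = 'Field 1' || tab || '"Field' || tab || '2"' || tab || 'Field 3'

fld.0 = 0
i = 0
do while length(rec) > 0
  i = i + 1
  if left(rec, 1) == '"' then do
    /* quoted field: take everything up to the closing quote */
    parse var rec '"' fld.i '"' rec
    /* drop the tab that separates this field from the next one */
    if left(rec, 1) == tab then rec = substr(rec, 2)
  end
  else
    parse var rec fld.i (tab) rec   /* plain field: up to the next tab */
  fld.0 = i
end

say 'fields found:' fld.0
```

Here the second field keeps its embedded tab, and the surrounding quotes
are dropped.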

Faced with such a problem lately, I exported the spreadsheet from
Star/Oracle/Open/Libre/NeoOffice as HTML. (I parsed it with awk.)

-- gil

John McKown

Apr 28, 2013, 10:14:17 PM

A good thought, but I generate the data myself, so I know there are no tabs
as data, only as a separator. But I will need to remember that, in the
general case. Thanks. I'm going with the parse in the better loop, I guess.
Honestly, this should be a low use application and extreme performance is
not a requirement. I am a fool for readability and performance.