On 2013-04-05 23:29, glen herrmannsfeldt wrote:
> Robert AH Prins <
spam...@prino.org> wrote:
>
>> I recently had a look at the code generated by the INDEX builtin. In this
>> particular program that uses it, I'm using it to add (or not) new
>> 4-character+space items (items do not have a leading space) to a CHAR VAR, i.e.
>
>> if index(set, new || ' ') = 0 then
>> set ||= new || ' ';
>
> Hmm. Why not an array of CHAR(4) instead?
Funny...
When I joined my first employer in 1985 as a trainee, everyone of our group
actually used an array, until our instructor showed the string method, and ran
the compiler with the "FLOW" & "COUNT" options, showing that the INDEX method
only resulted in the execution of a single statement, rather than the average
n/2 that an array would use and telling us that using a "long" CHAR VAR would
sort of future-proof the code, with no need to change the bounds of an array.
>> The complete sets are later used to index arrays, i.e.
>
>> i = (index(set, item || ' ') + 4) / 5;
>
> And a hash table to look them up?
The overhead for small sets is probably too high. If items would be a single
character, it would be a no-brainer.
>> and although the above is short, sweet and perfectly readable, it turns out to
>> be far from optimal...
>
>> Adding a bit of code to the program to output all 52,128 possible invocations of
>> the above, with sets varying from 1 to 80 items and item *always* present in the
>> set, and then using this output in another program to process all 52,128
>> possible invocations in a loop bracketed with the datetime() builtin showed that
>> using a simple
>
>> do i = 1 to length(set) by 5 until(item = substr(set, i, 4));
>> end;
>> i = (i + 4) / 5;
>
>> is about 25% faster (datetime()'ed over 100 loops of
>> the 52,128 actual variations) than
>
> So not a huge increase in speed.
Not huge, but definitely significant.
>> i = (index(set, item || ' ') + 4) / 5;
>
>> which makes me wonder, why isn't there an INDEXN or (POSN) builtin that would
>> increment the position of the next character in "set" to scan for "item" by the
>> length of "item", in other words,
>
> Even better, how about a Boyer-Moore version of INDEX. That would
> probably be pretty fast in this case. Though generating the tables
> takes some time.
Don't tell me, I've used BM in the days of Turbo Pascal and the gains were
pretty amazing.
>> if set = 'A1 B2 C3 D4 E5 F6 G7 ' and item = 'B2 ', this theoretical
>> "INDEXN" would do the compare of "item" to "set" only on positions
>> 1, 4, 7 etc of "set", creating an efficient way to scan these kind
>> of strings-(ab)used-as-unsorted-table, which in a way resembles
>> the REXX "wordpos" function.
>
> Seems to me that people who need to do that will usually use arrays.
Actually, at every employer and client I've worked I've seen this kind of code
being used, and until recently I had never given it a second thought, even
though my experience with Turbo Pascal should have given me cause to reconsider.
My INDEXN was partly suggested by the last item on page 57 of
http://proceedings.share.org/client_files/SHARE_in_San_Francisco/Session_12335_handout_4165_0.pdf
i.e.
<quote>
HANDLE operations
.
.
The sensitivity to the underlying type makes the behavior like that in C. So,
for example, adding 1 to a handle increases the associated pointer value by
the number of bytes in the underlying structure type
</quote>
And if there was an INDEXN-like builtin, there would be no need to add a dummy
character to each item to prevent mismatches on partial strings.