Paul
I like the way you were able to stick an umlaut on the "u" of "uber"
(sadly, sans umlaut).
For grins, try you search using with your '0010524' in a format and if
statement. I seem to recall that SAS uses a binary tree search for looking
up format values.
Nat Wooding
the unter geek
"Choate,
Paul@DDS"
<pcho...@DDS.CA.G To
OV> SA...@LISTSERV.UGA.EDU
Sent by: "SAS(r) cc
Discussion"
<SA...@LISTSERV.U Subject
GA.EDU> OT: Binary Tree Lookups for Dummies
04/14/2006 12:32
PM
Please respond to
"Choate,
Paul@DDS"
<pcho...@DDS.CA.G
OV>
One goals for the last SUGI was to learn about and start applying more
optimal lookup and merging methods. Of course Paul Dorfman's most
excellent papers on hashing offering a most interesting place to start.
A couple weeks later, still not nearly caught up at work, but noodling a
little about something heard - binary tree searching. Hadn't thought about
it much before, but the light in my brain flashed on for a brief second..
So here it is - my home-brewed binary tree lookup. Sorry if this is too
pre-school to any über geeks out there, please have mercy.
Any comments? Grossly missed efficiencies? How this is worse (or better?)
than datastep lookups out there in practice?
Does this always work? Does the ceil function keeps me out of trouble,
hmmmmm? It worked on a few dozen tests.
The distance between my target and sample could be interpolated rather than
simply halved with each iteration. Attempting interpolation seemed
inelegant, and this had already had become uglier than anticipated.
Tests show that in a set of .6 million records it takes fewer than 20
lookups to find the target.
The data are sorted on UCI, a seven digit character string.
data found_it;
find='0010524';
*using Zeno's arrow to hit a target - iterate up & down by nobs/2**n;
tries+1;
direction+0;
sample+sum(direction=0,direction)*(ceil(nobs/(2**tries))<>1);
sample=sample><nobs; *if outside 1<->nobs reset to min or max;
sample=sample<>1;
set big_data(keep=uci) point=sample nobs=nobs;
direction= sum(((uci>find)*(-1)),((uci<find)*1));
if uci=find or ceil(nobs/(2**tries))=1 then do;
record=i;
if uci=find then output;
else put / / 'Not Found:' (_all_)(=);
stop;
end;
run;
TIA
Paul Choate
DDS Data Extraction
(916) 654-2160
-----------------------------------------
CONFIDENTIALITY NOTICE: This electronic message contains
information which may be legally confidential and/or privileged and
does not in any case represent a firm ENERGY COMMODITY bid or offer
relating thereto which binds the sender without an additional
express written confirmation to that effect. The information is
intended solely for the individual or entity named above and access
by anyone else is unauthorized. If you are not the intended
recipient, any disclosure, copying, distribution, or use of the
contents of this information is prohibited and may be unlawful. If
you have received this electronic transmission in error, please
reply immediately to the sender that you have received the message
in error, and delete it. Thank you.