I found strchr() to be the fastest search available on the X86. This
became the front-end to my new search routine. The result is strchrstr(),
which uses a fast strchr() to find potential matches and then does a
strncmp() on the remainder of the target string.
The new function is a direct replacement for strstr(). Since this
approach made quite a difference, I made another change, and did a
second character compare before the strncmp(). This is strchrstr2(),
which is NOT an exact replacement for strstr(), because the target
string must be at least 2 characters long.
TESTING:
The search space is a 9 Mb text file, searching for a nine character
target string and looped 255 times. The tests were run on RHL 6.2
and 7.0. The kernel builds were 2.2.14 and 2.2.16 with glibc versions
2.1.3 and 2.1.92 respectively.
 system   | strstr | strchrstr | strchrstr2 |
----------+--------+-----------+------------+ (timed with gettimeofday())
 5x86-160 | 195.5  |     141.4 |      121.9 | RHL 6.2
 P-166    |  97.7  |      58.9 |       52.3 | RHL 7.0
 Ath-840  |  18.5  |      16.8 |       15.5 | RHL 6.2
Links to the source code for all three test programs and the alternate
search routine are at: http://www.SPIDynamics.com/speed_search.html
/* Fast, compatible strstr() replacement. */
char *strchrstr(char *haystack, char *needle)
{
    size_t sl;

    if (needle && haystack)
    {
        sl = strlen(needle);
        if (!sl)
            return haystack;        /* empty needle matches immediately */
        sl--;                       /* length of the tail after the first char */
        while (1)
        {
            haystack = strchr(haystack, *needle);   /* fast first-char scan */
            if (haystack)
            {
                if (!sl)
                    return haystack;                /* one-char needle */
                if (!strncmp(haystack + 1, needle + 1, sl))
                    return haystack;                /* rest of needle matched */
                haystack++;
            }
            else
                break;
        }
    }
    return NULL;
}
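The strchrstr2() source is only linked above, not posted. The following is a hypothetical reconstruction from the description (strchr() on the first character, a direct compare of the second character, then strncmp() on the rest; the needle must be at least 2 characters long), not the author's code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reconstruction of strchrstr2() from its description:
 * the needle MUST be at least 2 characters long. */
char *strchrstr2(char *haystack, const char *needle)
{
    size_t rest = strlen(needle) - 2;   /* chars after the first two */

    while ((haystack = strchr(haystack, needle[0])) != NULL)
    {
        /* cheap second-character check before the full compare */
        if (haystack[1] == needle[1] &&
            !strncmp(haystack + 2, needle + 2, rest))
            return haystack;
        haystack++;
    }
    return NULL;
}
```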
DB
dbar...@SPIDynamics.com
--
comp.lang.c.moderated - moderation address: cl...@plethora.net
[snip - strstr not fast enough]
> I found strchr() to be the fastest search available on the X86. This
> became the front-end to my new search routine. The result is strchrstr(),
> that uses a fast strchr() to find potential matches, and then does a
> strncmp() on the remainder of the target string.
...or you can try a regex package.
david
--
If 91 were prime, it would be a counterexample to your conjecture.
-- Bruce Wheeler
>TESTING:
>The search space is a 9 Mb text file, searching for a nine character
>target string and looped 255 times. The tests were run on RHL 6.2
>and 7.0. The kernel builds were 2.2.14 and 2.2.16 with glibc versions
>2.1.3 and 2.1.92 respectively.
This looks like a very suitable application for the Boyer-Moore string
searching method. That is a "well known" algorithm which searches a
large string for a fixed short string. It uses the properties of the
short string to optimise the searching of the longer string. It consists
of fewer than 50 lines of C code and is published. You will find it
fairly easily by searching for "Boyer-Moore". You can whet your appetite
with http://www.cs.utexas.edu/users/best-ideas/string-searching
I have no experience with using this algorithm. It is reported to be
good. From my reading of it, I would expect you to be able to double the
speed of your tests to date (and then some!)
--
Walter Briscoe
> I found strchr() to be the fastest search available on the X86. This
> became the front-end to my new search routine. The result is strchrstr(),
> that uses a fast strchr() to find potential matches, and then does a
> strncmp() on the remainder of the target string.
Your "Subject:" line is at least partly misleading. You do not have a
function faster than strstr() --- you have one faster than the
particular implementation of strstr() you're using, on one particular
architecture, for one size range of "needle", and a rather large
"haystack".
Actually, there are algorithms significantly faster than the
standardized strstr() for your particular application profile, and
also faster than yours. Do a literature search on Boyer-Moore
searching, and you'll see.
--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
There are far more efficient ways to search for a target string in
a body of text. For a review of some of them, see an article in the
journal "Software Practice and Experience" from some considerable
number of years ago (can't recall, except I know it must be pre-1985.)
My recollection was that the best general algorithm was not dissimilar
to yours (use strchr, then do some matching) but with some critical
differences. Primarily, it depended on knowledge of likely letter frequencies
in the searched text. A full explanation is off-topic for almost all the
groups you posted to (IMHO), but you basically search first for the letter
in the target which is least likely to occur. You also construct a jump
table which tells you, having examined a particular letter, how many tests
you can skip. E.g. you are searching for "easy". You start by looking
at the 4th letter of the sample text and see if it is "y". If it isn't,
but is (say) an "m", you can skip the next three letters and examine
position 8 to see if it is "y". And so on.
Better algorithms exist for more tightly-defined problem spaces, but this
is easy to implement and provably fast. It has the advantage that search
speed increases as the length of the target string increases. (OK, that's
sometimes an advantage and sometimes not so good.)
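The description above is essentially the Boyer-Moore family; the simplest member is Horspool's variant, which keeps only the jump table (without the rarest-letter heuristic). A minimal sketch, with the assumed name horspool():

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Horspool's simplification of Boyer-Moore: a sketch of the jump-table
 * idea, not tuned code. Returns a pointer to the first match or NULL. */
const char *horspool(const char *text, size_t n, const char *pat, size_t m)
{
    size_t skip[UCHAR_MAX + 1];
    size_t i, k;

    if (m == 0)
        return text;            /* empty pattern matches at the start */
    if (m > n)
        return NULL;

    /* a character absent from the pattern lets us skip m positions */
    for (k = 0; k <= UCHAR_MAX; k++)
        skip[k] = m;
    /* characters in the pattern (except its last) skip less */
    for (k = 0; k < m - 1; k++)
        skip[(unsigned char)pat[k]] = m - 1 - k;

    /* examine the window's LAST character first, as in the "easy" example */
    for (i = 0; i + m <= n; i += skip[(unsigned char)text[i + m - 1]])
        if (memcmp(text + i, pat, m) == 0)
            return text + i;
    return NULL;
}
```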
Kevin Ashley
That link's dead. Try
http://www-igm.univ-mlv.fr/~lecroq/string/node14.html
--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91351-4454 96.37% of all statistics are made up
Per the FCA, this email address may not be added to any commercial mail list
> DB wrote:
>
> [snip - strstr not fast enough]
>> I found strchr() to be the fastest search available on the X86. This
>> became the front-end to my new search routine. The result is
>> strchrstr(), that uses a fast strchr() to find potential matches, and
>> then does a strncmp() on the remainder of the target string.
>
> ...or you can try a regex package.
Surely regex, which is more complex than strstr, will be slower?
-Ed
--
You can't go wrong with psycho-rats.
u 9 8 e j r (at) e c s . o x . a c . u k
This URL 404's. Do you have another?
cheers
-Ed
Yes, Boyer-Moore tends to be best for the "average" string search
application; double the naive approach's speed is about right.
If one is searching for multiple occurrences of the same string,
the pattern set-up phase can be amortized over several searches.
>In article <clcm-2001...@plethora.net>, "David Rubin"
><dlr...@hotmail.com> wrote:
>
>> DB wrote:
>>
>> [snip - strstr not fast enough]
>>> I found strchr() to be the fastest search available on the X86. This
>>> became the front-end to my new search routine. The result is
>>> strchrstr(), that uses a fast strchr() to find potential matches, and
>>> then does a strncmp() on the remainder of the target string.
>>
>> ...or you can try a regex package.
>
>Surely, regex, which is more complex than strstr will be slower?
It would depend on the algorithm used, wouldn't it? I can imagine a
brute-force string matching algorithm going slower than a regexp
package which uses a better algorithm.
-Chris
> Surely, regex, which is more complex than strstr will be slower?
A Pentium III is more complex than an 8086, but which is slower?
--
"When in doubt, treat ``feature'' as a pejorative.
(Think of a hundred-bladed Swiss army knife.)"
--Kernighan and Plauger, _Software Tools_
>My current project for SPI Dynamics made extensive use of the strstr()
>function to detect malicious activity in web server log files. The
>performance of that search routine was quite a bottleneck while LogAlert
>was detecting exploit strings. The strstr() function seemed to be part
>of the problem, I had to find a way to speed things up.
>
>I found strchr() to be the fastest search available on the X86. This
>became the front-end to my new search routine. The result is strchrstr(),
>that uses a fast strchr() to find potential matches, and then does a
>strncmp() on the remainder of the target string.
>
Are you sure you were not just accidentally linking in a non-optimized
version of some kind? People have been fiddling with those routines
for a long time. They should be pretty tight.
Others have pointed out that an alternative search algorithm might
be faster for your application, but reading between the lines I'm
guessing you are calling strstr once for each of a list of "exploit
strings." If so, I suspect you could gain a LOT by generating a new
single-pass search program each time your "exploit string" list
changes, and running that.
Once you set up a make file and a simple script or two, it could be
very convenient. Make could even run the search for you.
The make would (1) run a script to translate the exploit string table
into input for flex, (2) run flex to take that and generate source for
a parsing routine that would find your strings and call whatever
action routines you define (to interface to your current app, or
maybe just print warnings), (3) compile and link (with your app if
you aren't just printing warnings etc.), and (4) run it.
You could then just type make when you wanted to do a scan, and be
sure your search included the latest "exploit strings."
A detail re your results: How many times in the 9 Mb did the 9-char
target string occur? And how many partial matches? If only a few,
the results are mostly showing first-char search loop times. Of
course, that may be the normal (hopefully) case in your application.
A single-pass total search will make a lot of difference if you have
a hundred strings to search for, and probably even if you only have 5,
if you are searching through 9 Mb.
How fast do you need to be able to do the whole job?
BTW, flex is available from gnu.org, and you can compile it for
windows as well as the usual.
HTH.
>The new function is a direct replacement for strstr(). Since this
>approach made quite a difference, I made another change, and did a
>second character compare before the strncmp(). This is strchrstr2(),
>which is NOT an exact replacement for strstr(), because the target
>string must be at least 2 characters long.
>
>TESTING:
>The search space is a 9 Mb text file, searching for a nine character
>target string and looped 255 times. The tests were run on RHL 6.2
>and 7.0. The kernel builds were 2.2.14 and 2.2.16 with glibc versions
>2.1.3 and 2.1.92 respectively.
>
> system | strstr| strchrstr | strchrstr2 |
>------------------------------------------| (timed with gettimeofday())
>5x86-160 | 195.5 | 141.4 | 121.9 | RHL 6.2
>P-166 | 97.7 | 58.9 | 52.3 | RHL 7.0
>Ath-840 | 18.5 | 16.8 | 15.5 | RHL 6.2
>
>Links to the source code for all three test programs and the alternate
>search routine are at: http://www.SPIDynamics.com/speed_search.html
>
[...]
1) slow down your Pentium III to the frequency of your 8086,
2) use only instructions from the 8086's set,
3) compare the speed.
You may be surprised by the result...
BTW I agree with you :-)
a+, ld.
--
[ Laurent Deniau -- Data Analysis and Signal Processing ]
[ CERN -- The European Laboratory for Particle Physics ]
[ Laurent...@cern.ch - http://home.cern.ch/ldeniau ]
[ -- One becomes old when dreams become regrets -- ]
[ -- Il faut avoir une haute opinion sur ]
[ ce qu'on va faire, pas sur ce qu'on a fait -- ]
What I'm saying is that the regex has to do far more than a simple
search just to do a simple search. Surely you could get a fixed string
search algorithm to be faster than a much more general-purpose regex
algorithm.
The extra complexity doesn't achieve anything in this case.
Which is faster at integers, a 200MHz Pentium, or a 200MHz StrongARM?
(In all the benchmarks I've seen the StrongARM is faster, and it is much
simpler.)
-Ed
Good point. But the fastest fixed string algorithms have to be at least
as fast as regex ones (and likely faster by quite a bit).
-Ed
Even more so, it will be a function of the problem size. If I have
ten-character strings, then strstr() will probably do better than regexp().
If I have thousand-megabyte strings, I doubt if there is any contest.
Then again, why not try a web search to find stuff like this:
http://www.cs.purdue.edu/homes/stelo/pattern.html
--
C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
"The C-FAQ Book" ISBN 0-201-84519-9
C.A.P. FAQ: ftp://cap.connx.com/pub/Chess%20Analysis%20Project%20FAQ.htm
One would think so, but it's not always the case. When I ran timing
tests for my chapter on fast string searching in "Software Solutions
in C", I was surprised to discover that the native C library on SunOS
had an implementation of strstr() that performed *horribly*, 70 times
slower than a straightforward, unoptimized naive implementation of
the obvious brute-force solution. How could this be? It turned out
that somebody had tried to "improve" the brute-force method to not
invoke strcmp() when the text being searched was shorter than the
pattern (this precaution is *not necessary*); they had changed the
loop termination condition from (*tx == '\0') to (strlen(tx) < pat_len).
It was the evaluation of strlen() each iteration that was the problem.
A carefully-tuned implementation of brute-force searching ran 4.3
times faster than the straightforward naive implementation, i.e. 300
times faster than the native SunOS C library! Boyer-Moore improved
upon that by a factor of 2, for the particular tests I ran.
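The strlen() pitfall above is easy to demonstrate. In this sketch (my own illustration, not the SunOS source) both brute-force variants return the same result, but bf_bad() re-evaluates strlen() on every iteration, rescanning the remaining text each time and making the search quadratic overall:

```c
#include <stddef.h>
#include <string.h>

/* bf_good(): the cheap termination test; strncmp() stops safely at
 * the text's '\0', so no length precaution is needed. */
const char *bf_good(const char *tx, const char *pat)
{
    size_t m = strlen(pat);
    for (; *tx != '\0'; tx++)
        if (strncmp(tx, pat, m) == 0)
            return tx;
    return NULL;
}

/* bf_bad(): the "improved" version with the quadratic trap;
 * strlen() rescans the remaining text on every single iteration. */
const char *bf_bad(const char *tx, const char *pat)
{
    size_t m = strlen(pat);
    for (; strlen(tx) >= m; tx++)
        if (strncmp(tx, pat, m) == 0)
            return tx;
    return NULL;
}
```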
I ran into that one too, maybe 10 years ago. I happened to be searching
for fixed strings in big hunks of data (multiple kilobytes at a time),
and what ran fast on HP-UX didn't finish on SunOS. I used profiling to
find that strlen was being called as often as strstr, and my code wasn't
calling strlen anywhere, so I knew the culprit. And by replacing the
SunOS strstr() with a crude version, I got back the high performance
found on HP-UX.
--
Yours,
Jonathan Leffler (Jonathan...@Informix.com) #include <disclaimer.h>
Guardian of DBD::Informix v1.00.PC1 -- http://www.perl.com/CPAN
"I don't suffer from insanity; I enjoy every minute of it!"
The best article that I've seen is "Fast String Searching" found in
"Software Practice and Experience", volume 21, no. 11, November 1991, pg
1221-1248. I don't know if that's really the article you mean, but
nonetheless it is a very good article comparing performance of various
algorithms, and it includes C source code.
--
Rex Barzee
Hewlett-Packard Unix Systems Enablement Lab
Have you tested an implementation of fgrep (fixed string match)
for comparison, or tried pulling the string search code from one?
Recent implementations should contain Boyer-Moore-Pratt searches
which use end first matching and a couple of arrays containing
skip offsets when unmatched and matched characters are seen to
run an order of magnitude faster than straight [split intended]
forward searches on single/multiple targets.
You can get some of the benefit in your routine by searching for
the last character in your search string and matching preceding
characters backwards to the start of the input and search
strings.
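That last-character anchoring might be sketched as follows; strchrstr_rev() is an assumed name, not code from the thread:

```c
#include <stddef.h>
#include <string.h>

/* Sketch: anchor strchr() on the needle's LAST character, then match
 * the preceding characters backwards. Illustration only. */
char *strchrstr_rev(char *haystack, const char *needle)
{
    size_t m = strlen(needle);
    char *p;

    if (m == 0)
        return haystack;
    if (strlen(haystack) < m)       /* keeps the sketch safe; a real
                                       version would avoid this scan */
        return NULL;

    /* earliest place the needle's last character can occur */
    for (p = haystack + m - 1; (p = strchr(p, needle[m - 1])) != NULL; p++)
    {
        size_t k = m - 1;
        char *q = p;
        while (k > 0 && *--q == needle[--k])
            ;                       /* walk backwards while it matches */
        if (k == 0 && *q == needle[0])
            return q;
    }
    return NULL;
}
```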
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
--
Brian....@CSi.com (Brian dot Inglis at SystematicSw dot ab dot ca)
fake address use address above to reply
tos...@aol.com ab...@aol.com ab...@yahoo.com ab...@hotmail.com ab...@msn.com ab...@sprint.com ab...@earthlink.com ab...@cadvision.com ab...@ibsystems.com u...@ftc.gov
spam traps
fgrep and egrep are obsolete commands, roughly equivalent to grep -F and
grep -E. The grep program should be able to figure out whether or not a
pattern is a fixed string regardless of the command line options; the purpose
of -F is to disable the recognition of regular expression operators in the
search pattern, not to indicate that the search should be optimized in
some way.
[ ... ]
> Recent implementations should contain Boyer-Moore-Pratt searches
> which use end first matching and a couple of arrays containing
> skip offsets when unmatched and matched characters are seen to
> run an order of magnitude faster than straight [split intended]
> forward searches on single/multiple targets.
Just FWIW, I know of:
Knuth-Morris-Pratt
Boyer-Moore
Boyer-Moore-Horspool
Sunday's variant of Boyer-Moore-Horspool
but TTBOMK, Pratt was not involved in any variant on the Boyer-Moore
search algorithm. I suppose to be complete, I could mention that
I've invented another variant of Boyer-Moore that I believe is
original, but isn't on the list above. I'm pretty sure Pratt didn't
have anything to do with it either though. <G>
--
Later,
Jerry.
The Universe is a figment of its own imagination.
It's not the article I mean, but I expect it is a better review
of the issue than the one I was referring to. Until 1985 I worked
somewhere that had a subscription to SP+E, and I read each month's
issue as it came in, as well as consuming all the back issues when
I started the job. I remember it as an excellent and readable journal.
That's what enabled me to put a (vague) date on the article.
At the time of the 1980's article, the author(s) were interested
in comparing software algorithms with hardware implementations
of string search that were often found on CISC machines of the time.
I think they were pleasantly surprised that their best algorithm
beat the hardware hands down. The 'hardware' implementation
would almost certainly have used the naive implementation of strstr,
albeit in microcode and hence saving a number of fetch/decode
cycles. I doubt such comparisons are of interest any more.
Kevin Ashley.
>In article <clcm-2001...@plethora.net>,
>Brian.do...@Compuserve.com says...
>
>[ ... ]
>
>> Recent implementations should contain Boyer-Moore-Pratt searches
>> which use end first matching and a couple of arrays containing
>> skip offsets when unmatched and matched characters are seen to
>> run an order of magnitude faster than straight [split intended]
>> forward searches on single/multiple targets.
>
>Just FWIW, I know of:
>
>Knuth-Morris-Pratt
>Boyer-Moore
>Boyer-Moore-Horspool
>Sunday's variant of Boyer-Moore-Horspool
>
>but TTBOMK, Pratt was not involved in any variant on the Boyer-Moore
>search algorithm. I suppose to be complete, I could mention that
>I've invented another variant of Boyer-Moore that I believe is
>original, but isn't on the list above. I'm pretty sure Pratt didn't
>have anything to do with it either though. <G>
Looks like my memory hashed the initials into the same bucket and
munged the keys instead of chaining them.
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
>In article <clcm-2001...@plethora.net>,
>Brian.do...@Compuserve.com says...
>
>[ ... ]
>
>> Recent implementations should contain Boyer-Moore-Pratt searches
>> which use end first matching and a couple of arrays containing
>> skip offsets when unmatched and matched characters are seen to
>> run an order of magnitude faster than straight [split intended]
>> forward searches on single/multiple targets.
>
>Just FWIW, I know of:
>
>Knuth-Morris-Pratt
>Boyer-Moore
>Boyer-Moore-Horspool
>Sunday's variant of Boyer-Moore-Horspool
>
>but TTBOMK, Pratt was not involved in any variant on the Boyer-Moore
>search algorithm. I suppose to be complete, I could mention that
>I've invented another variant of Boyer-Moore that I believe is
>original, but isn't on the list above. I'm pretty sure Pratt didn't
>have anything to do with it either though. <G>
Just saw Boyer-Moore-Gosper mentioned recently -- will have to
look that one up!
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
I couldn't find the site you list, but this site has a LOT of info on
string searching: http://citeseer.nj.nec.com/context/2702/0
--Jeff
>My current project for SPI Dynamics made extensive use of the strstr()
>function to detect malicious activity in web server log files. The
>performance of that search routine was quite a bottleneck while LogAlert
>was detecting exploit strings. The strstr() function seemed to be part
>of the problem, I had to find a way to speed things up.
why not move laterally? instead of a faster way to repeatedly scan text,
why not find a way to scan the text once?
there's something called a finite state machine that can find any number
of strings in one pass through the text.
a deterministic fsm examines each character exactly once, but it may have
an exponentially large number of states.
a nondeterministic fsm can have fewer states but may need to examine a
character multiple times.
you can use tools like lex to make fsms, and you can also do it by hand.
this is a deterministic fsm that finds all occurrences of the substrings
plover and plugh:
int character, state = 0, lp = 1, cp = 1;

while (state >= 0) {
    character = fgetc(inputfile);
    if (character == EOF) break;    /* was missing: without it the loop never ends */
    if (character == '\n') lp++, cp = 0; else cp++;
    switch (state) {
    case 0: switch (character) {
            case 'p': state = 'p'; continue;
            default:  state = 0; continue;
            }
    case 'p': switch (character) {
            case 'p': state = 'p'; continue;
            case 'l': state = 'pl'; continue;
            default:  state = 0; continue;
            }
    case 'pl': switch (character) {
            case 'p': state = 'p'; continue;
            case 'o': state = 'plo'; continue;
            case 'u': state = 'plu'; continue;
            default:  state = 0; continue;
            }
    case 'plo': switch (character) {
            case 'p': state = 'p'; continue;
            case 'v': state = 'plov'; continue;
            default:  state = 0; continue;
            }
    case 'plov': switch (character) {
            case 'p': state = 'p'; continue;
            case 'e': state = 'plve'; continue;
            default:  state = 0; continue;
            }
    case 'plve': switch (character) {
            case 'p': state = 'p'; continue;
            case 'r':
                fprintf(outputfile,
                        "found plover line %d "
                        "column %d through %d\n",
                        lp, cp - 6, cp - 1);
                state = 0; continue;
            default:  state = 0; continue;
            }
    case 'plu': switch (character) {
            case 'p': state = 'p'; continue;
            case 'g': state = 'plug'; continue;
            default:  state = 0; continue;
            }
    case 'plug': switch (character) {
            case 'p': state = 'p'; continue;
            case 'h':   /* was "state = 'plo'" with an unreachable fprintf after the continue */
                fprintf(outputfile,
                        "found plugh line %d "
                        "column %d through %d\n",
                        lp, cp - 5, cp - 1);
                state = 0; continue;
            default:  state = 0; continue;
            }
    }
}
Multi-character constants like 'plu' are non-portable. Since all you
really care about in this case is uniqueness, it makes more sense to
use an enum.
--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Cxiuj via bazo apartenas ni.
A technique used in ID2ID (to replace id strings in multiple
source files in one pass) inputs the candidate strings and forms a
binary tree. It uses an AVL tree to avoid imbalance, but could
also use red-black trees. Having formed that, a single pass
through the source extracts each possible identifier and does a
rapid binary search. If the identifier is not found, it is output
unchanged; otherwise the replacement is output.
There are complexities in ID2ID to avoid replacing strings with
identifiers that are already in the source and other anomalies,
none of which applies to a simple search.
Look up Ben Pfaff's libraries for AVL and red-black trees in
standard C.
In your case things should be especially simple because the input
set of strings is relatively unchanging.
--
Chuck F (cbfal...@my-deja.com) (cbfal...@XXXXworldnet.att.net)
http://www.qwikpages.com/backstreets/cbfalconer :=(down for now)
(Remove "NOSPAM." from reply address. my-deja works unmodified)
mailto:u...@ftc.gov (for spambots to harvest)
> mmf wrote:
> >
> > In article <clcm-2001...@plethora.net>, DB <drba...@inetnow.net> wrote:
> >
> > >My current project for SPI Dynamics made extensive use of the strstr()
> > >function to detect malicious activity in web server log files. The
> > >performance of that search routine was quite a bottleneck while LogAlert
> > >was detecting exploit strings. The strstr() function seemed to be part
> > >of the problem, I had to find a way to speed things up.
> >
> > why not move laterally? instead of a faster way to repeatedly scan text
> > why not find a way to scan text once
>
> A technique used in ID2ID (to replace id strings in multiple
> source files in one pass) inputs the candidate strings and forms a
> binary tree. It uses an AVL tree to avoid imbalance, but could
> also use red-black trees. [...]
If the table of candidate strings does not change during a
program run, then there's a good chance that a balanced tree is a
waste of time and too complicated to boot. I would probably use
qsort() followed by bsearch() for simplicity or a hand-coded
binary search for speed, or maybe a hash table or trie, myself.
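The qsort()/bsearch() route for a static table can be sketched like so; table_init() and table_lookup() are assumed names for illustration:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* compare two (const char *) table entries through their void * handles */
static int cmpstr(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* sort the table once, up front */
void table_init(const char **table, size_t n)
{
    qsort(table, n, sizeof *table, cmpstr);
}

/* each lookup in the sorted table is then O(log n) */
int table_lookup(const char **table, size_t n, const char *word)
{
    return bsearch(&word, table, n, sizeof *table, cmpstr) != NULL;
}
```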
Though AVL and red-black trees are pretty, there's an even
prettier algorithm for giving an ordinary binary tree perfect
balance in O(n) time and O(1) space. If you really want to use a
binary tree, then, for a tree with static content, you can't beat
this. Just build a tree that's unbalanced any which way and run
it through the balancer.
This algorithm is implemented in the latest ALPHA release of
libavl, FWIW. As always, libavl is available from
<URL:http://www.msu.edu/~pfaffben>.
--
"What is appropriate for the master is not appropriate for the novice.
You must understand the Tao before transcending structure."
--The Tao of Programming
That runs the risk that the initial build can run into evil
conditions, such as a linked list on sorted input. The tree
builder can then use excessive space for its own recursion. An
intrinsically balanced entry mechanism avoids all this, and is
relatively impervious to peculiar input sets.
#include <stdio.h>

/* Bit-parallel suffix search (BNDM-style): scan each window backwards
 * with a nondeterministic suffix automaton held in machine words.
 * t is the text of length n, p the pattern of length m (m must not
 * exceed the bits in an unsigned); returns the number of matches. */
int bpsuffix( unsigned char * t, int n, unsigned char * p, int m )
{
    unsigned b[ 256 ];
    unsigned d;
    const unsigned mm = 1 << ( m - 1 );
    int i, j, shift, matches = 0;
    unsigned char c;

    /* stupid preprocessing, rewrite it.... (reverse p, build the
       per-character bit masks, reverse p back) */
    for( i = 0; i < m / 2; i++ )
    {
        c = p[ i ];
        p[ i ] = p[ m - i - 1 ];
        p[ m - i - 1 ] = c;
    }
    for( i = 0; i < 256; i++ ) b[ i ] = 0;
    for( i = 0; i < m; i++ ) b[ p[ i ]] |= 1 << i;
    for( i = 0; i < m / 2; i++ )
    {
        c = p[ i ];
        p[ i ] = p[ m - i - 1 ];
        p[ m - i - 1 ] = c;
    }

    i = 0;
    do { j = m; shift = m; d = ~0;
        do { d &= b[ t[ i + j - 1 ]];
            --j;
            if( d & mm )
            {
                if( j > 0 )
                    shift = j;
                else
                {
                    ++ matches;
                    printf("Match at position %d\n", i );
                }
            }
        } while( d <<= 1 );
        /* the following test is unnecessary, if t is padded
           appropriately...*/
        if(( i += shift ) >= n ) break;
        /* } while( t[ ( i += shift ) ] ); */
    } while( t[ i ] );
    return matches;
}