I have posted this on the CUDA forum:
http://forums.nvidia.com/index.php?showtopic=193974
and also submitted a news item to GPGPU.org.
[code]
PFAC stands for the Parallel Failureless Aho-Corasick algorithm.
It is a variant of the well-known Aho-Corasick (AC) algorithm with all
its failure transitions removed, as well as the self-loop transition
of the initial state.
The purpose of PFAC is to find, in a given input stream, all longest
matches of patterns pre-defined by users.
Because PFAC is data-parallel by nature, it performs very well on GPUs,
especially Fermi cards.
The PFAC library provides a C-level API and is easy to use. Users do
not even need to know CUDA programming: just follow the simple example
in the user guide, and content search or virus detection can be done
on the GPU.
The PFAC library does not use multiple GPUs by itself, but users can
combine it with OpenMP or a Pthreads library to perform string matching
on multiple GPUs. OpenMP and Pthreads examples are included in the
release.
Download and further information:
http://code.google.com/p/pfac/
[/code]
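To make the "failureless" idea concrete, here is a minimal CPU-only sketch in plain C. It is not the PFAC library's code or API: the names `Trie`, `trie_init`, `trie_add` and `failureless_match`, the fixed trie capacity, and the convention of one result per input position (0 meaning no match) are all assumptions made for illustration. Each start position walks the trie from the initial state and stops at the first missing transition, keeping the longest pattern seen. Since no start position depends on another, the outer loop is what would map to one GPU thread per input byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 256   /* capacity of this illustrative trie (assumption) */
#define ALPHABET  256   /* one possible transition per byte value */

typedef struct {
    int next[MAX_NODES][ALPHABET]; /* -1 means "no transition" */
    int match[MAX_NODES];          /* id of pattern ending here, 0 if none */
    int count;                     /* nodes in use; node 0 is the initial state */
} Trie;

void trie_init(Trie *t) {
    memset(t->next, -1, sizeof t->next);   /* all bytes 0xFF gives int -1 */
    memset(t->match, 0, sizeof t->match);
    t->count = 1;
}

/* Insert one pattern with a non-zero id. No failure links are built. */
void trie_add(Trie *t, const char *pattern, int id) {
    int state = 0;
    for (const unsigned char *p = (const unsigned char *)pattern; *p; ++p) {
        if (t->next[state][*p] < 0) {
            assert(t->count < MAX_NODES);  /* guard the fixed capacity */
            t->next[state][*p] = t->count++;
        }
        state = t->next[state][*p];
    }
    t->match[state] = id;
}

/* For every start position, record the id of the longest pattern that
 * begins there (0 if none). Iterations of the outer loop are independent,
 * so each one could run as its own thread. */
void failureless_match(const Trie *t, const char *input, size_t n, int *result) {
    for (size_t start = 0; start < n; ++start) {
        int state = 0;
        result[start] = 0;
        for (size_t pos = start; pos < n; ++pos) {
            state = t->next[state][(unsigned char)input[pos]];
            if (state < 0)
                break;                     /* no transition: this walk ends */
            if (t->match[state])
                result[start] = t->match[state]; /* longer match overwrites */
        }
    }
}
```

For example, with patterns "ab" (id 1), "abc" (id 2) and "bc" (id 3) and the input "xabcab", position 1 reports id 2 because "abc" is the longest pattern starting there, position 2 reports id 3, and position 4 reports id 1.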
I think there is room for further optimization of the PFAC library.
Every suggestion is welcome.
Lung-Sheng