Hello, and well done on your excellent work! For my thesis, I would like to take some measurements of my own using PFAC, in an evaluation similar to the one described in "Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPU", using Snort rules and DEFCON packets. This is the output of my test run:
Using Device 0: "GeForce GTX 760"
major = 3, minor = 0
----- time-driven version
../test/pattern/snort_ruleset4
@@@@ profile PFAC_matchFromHost + texture ON
The elapsed time is 450.410004 ms
The input size is 245235537 bytes
The throughput is 4.355774 Gbps
The number of matched is 71828400
I tried reducing the number of rules or the size of the input, but I get similar results.
The throughput I achieve is relatively low compared to the roughly 10 Gbps reported in the paper. Is there anything obvious I am missing?
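For reference, this is how I understand the throughput figure in the log to be derived: input size in bits divided by elapsed time in seconds, expressed in units of 10^9 bits per second. Below is a minimal C sketch of that arithmetic. The helper name throughput_gbps is my own and is not part of PFAC, and I am assuming Gbps means 10^9 bits per second.

```c
/* Hypothetical helper (not part of PFAC): converts an input size in bytes
   and an elapsed time in milliseconds into gigabits per second,
   assuming 1 Gbps = 10^9 bits per second. */
double throughput_gbps(double bytes, double elapsed_ms)
{
    double bits = bytes * 8.0;            /* 8 bits per byte */
    double seconds = elapsed_ms / 1000.0; /* milliseconds to seconds */
    return bits / seconds / 1e9;
}
```

Calling throughput_gbps(245235537.0, 450.410004) with the values from the log gives roughly 4.36, which agrees with the reported figure, so the number itself looks consistent with the measured time.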
I use the CUDA 5.5 toolkit (I had to change some parameters of the function involving the texture reference, since it has changed a bit).
Furthermore, I read in an earlier post that you intended to add CUDA streams support. Is there a particular reason streams were not supported from the start (such as the thread-safety issue with the texture binding)?
If not, I may attempt to add them myself and, if the results are good, contribute the change to the project.
Takis