Lung-Sheng Chien
Apr 29, 2011, 3:13:54 AM
to pfacForum
I want to discuss two potential bugs; we could fix them and include
the fixes in PFAC v1.2.
bug 1: the reduce version cannot report the right position of a
matched substring if users split the input string into two or more
smaller ones.
I suggest adding a parameter baseOfPosition (base of position) as
follows
[code]
PFAC_status_t PFAC_matchFromDeviceReduce( PFAC_handle_t handle,
    char *d_inputString, size_t size,
    const int baseOfPosition,  // added to every reported position
    int *d_matched_result, int *d_pos, int *h_num_matched ) ;
[/code]
Example: suppose the input string has 1000 characters and is split
into two smaller strings of 500 bytes each,
and the maximum pattern length is 50. Then we can do
[code]
size_t maxPatternLen = 50 ;
int h_num_matched1, h_num_matched2 ;
// process first string; read maxPatternLen extra bytes so that a
// match crossing the split point is not cut off
PFAC_matchFromDeviceReduce( handle, d_inputString, 500 + maxPatternLen,
    0,
    d_matched_result, d_pos, &h_num_matched1 ) ;
// process second string; positions are reported relative to base 500
PFAC_matchFromDeviceReduce( handle, d_inputString + 500, 500,
    500,
    d_matched_result + h_num_matched1, d_pos + h_num_matched1,
    &h_num_matched2 ) ;
printf("total number of matches = %d\n",
    h_num_matched1 + h_num_matched2 ) ;
[/code]
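To make the role of baseOfPosition concrete, here is a minimal
CPU-only sketch in plain C. match_chunk is a hypothetical stand-in and
not part of PFAC: it searches one chunk for a single pattern with a
naive scan and reports each hit as baseOfPosition plus the offset
inside the chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU stand-in for one matching call (not a PFAC function).
 * Scans `size` bytes of `chunk` for `pattern` and stores, for each hit,
 * baseOfPosition + (offset inside the chunk) into pos[].
 * Returns the number of hits stored, at most maxOut. */
int match_chunk(const char *chunk, size_t size, const char *pattern,
                int baseOfPosition, int *pos, int maxOut)
{
    assert(chunk != NULL && pattern != NULL && pos != NULL);

    size_t plen = strlen(pattern);
    int n = 0;

    if (plen == 0 || plen > size) {
        return 0;  /* nothing can match */
    }
    for (size_t i = 0; i + plen <= size && n < maxOut; ++i) {
        if (memcmp(chunk + i, pattern, plen) == 0) {
            pos[n++] = baseOfPosition + (int)i;  /* global position */
        }
    }
    return n;
}
```

With the 12-byte text "xxabxxxxabxx" split into two 6-byte chunks,
calling it with base 0 and then with base 6 reports positions 2 and 8,
the same as a scan of the unsplit text. Without the base, the second
call would report 2 again.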
bug 2: this is a performance bug.
So far, PFAC_matchFromDeviceReduce() requires a huge amount of memory
even when the number of matches is very small.
Suppose the input string is 100MB; then users must allocate 400MB for
d_matched_result and
400MB for d_pos (one int per input character). Even with a GTX480,
only an input string of 100MB+
can be processed.
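The arithmetic behind these numbers can be checked with a small
helper. This is only a sketch: the function names are made up for
illustration, and it assumes 4-byte int and that each int array is
sized like the input (one int per character).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed per input character: 1 byte of input plus one int for
 * each int array that is sized like the input. */
size_t bytes_per_char(int numIntArrays)
{
    assert(numIntArrays >= 0);
    return sizeof(char) + (size_t)numIntArrays * sizeof(int);
}

/* Total device bytes for an input of inputBytes characters. */
size_t reduce_bytes_needed(size_t inputBytes, int numIntArrays)
{
    size_t perChar = bytes_per_char(numIntArrays);
    assert(inputBytes <= SIZE_MAX / perChar);  /* guard against overflow */
    return inputBytes * perChar;
}

/* Largest input that fits into deviceBytes of device memory. */
size_t max_input_bytes(size_t deviceBytes, int numIntArrays)
{
    return deviceBytes / bytes_per_char(numIntArrays);
}
```

With two int arrays (d_matched_result and d_pos), a 100MB input needs
100MB + 400MB + 400MB = 900MB. Assuming a card with about 1.5GB of
memory, the largest input is roughly 170MB, and less in practice
because other allocations also take space.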
I plan to remove the requirement that users allocate d_pos. The
following is my proposal.
Assume the number of patterns is < 4 Mega (not the size of the pattern
file, but the number of patterns);
then I can encode d_pos into d_matched_result, so users only allocate
the 400MB d_matched_result,
and the new variant, PFAC_matchFromDeviceReduce2(), will allocate
d_pos for the user.
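A note on the 4 Mega bound: 4 Mega is 2^22, so a pattern ID below that
bound fits in 22 bits and leaves 10 bits free in a 32-bit word. The
exact layout of the encoding is not spelled out here, so the sketch
below is only an assumption about how such packing could look. The
names pack_match, unpack_id and unpack_aux are hypothetical, and what
the spare 10 bits would actually hold is left open.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not confirmed by PFAC): low 22 bits hold the pattern
 * ID, high 10 bits hold an auxiliary value. */
enum { ID_BITS = 22, AUX_BITS = 10 };
#define ID_MASK ((UINT32_C(1) << ID_BITS) - 1u)

/* Pack a pattern ID (< 2^22) and an auxiliary value (< 2^10) into one
 * 32-bit word. */
uint32_t pack_match(uint32_t patternId, uint32_t aux)
{
    assert(patternId <= ID_MASK);
    assert(aux < (UINT32_C(1) << AUX_BITS));
    return (aux << ID_BITS) | patternId;
}

/* Recover the pattern ID from a packed word. */
uint32_t unpack_id(uint32_t word)
{
    return word & ID_MASK;
}

/* Recover the auxiliary value from a packed word. */
uint32_t unpack_aux(uint32_t word)
{
    return word >> ID_BITS;
}
```

The point of the sketch is only that the bound on the number of
patterns is what frees bits inside each d_matched_result entry.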
The prototype is
[code]
// *d_pos contains an address in device memory allocated by callee
PFAC_status_t PFAC_matchFromDeviceReduce2( PFAC_handle_t handle,
    char *d_inputString, size_t size,
    const int baseOfPosition,
    int *d_matched_result, int **d_pos, int *h_num_matched ) ;
[/code]
Users must free d_pos themselves.
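The ownership rule (callee allocates, caller frees) can be sketched on
the host in plain C. This is not the PFAC implementation:
collect_positions is a hypothetical helper, malloc/free stand in for
device allocation, and it assumes that a nonzero entry in the result
array marks a match at that index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host-side analogue of the proposed interface.
 * Counts nonzero entries of matched[0..size), allocates *pos with
 * exactly that many ints, and fills it with baseOfPosition + index.
 * Returns 0 on success, -1 if allocation fails.
 * The caller owns *pos and must free() it. */
int collect_positions(const int *matched, size_t size, int baseOfPosition,
                      int **pos, int *numMatched)
{
    assert(matched != NULL && pos != NULL && numMatched != NULL);

    size_t count = 0;
    for (size_t i = 0; i < size; ++i) {
        if (matched[i] != 0) {
            ++count;
        }
    }

    *numMatched = (int)count;
    *pos = NULL;
    if (count == 0) {
        return 0;  /* nothing to allocate */
    }

    int *out = malloc(count * sizeof *out);
    if (out == NULL) {
        return -1;
    }

    size_t k = 0;
    for (size_t i = 0; i < size; ++i) {
        if (matched[i] != 0) {
            out[k++] = baseOfPosition + (int)i;
        }
    }
    *pos = out;
    return 0;
}
```

The position array is sized by the number of matches rather than by
the input length, which is where the memory saving comes from. The
cost is that the caller has to remember to release it after every
call.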
Any ideas?
Lung-Sheng Chien