new features in PFAC v1.2


Lung-Sheng Chien

Apr 29, 2011, 3:13:54 AM
to pfacForum
I want to discuss two potential bugs that we could fix in PFAC v1.2.

Bug 1: the reduce version cannot report the correct position of a matched
substring if users split the input string into two or more smaller ones.

I suggest adding a parameter baseOfPosition (the base of the reported
positions), as follows:
[code]
PFAC_status_t PFAC_matchFromDeviceReduce( PFAC_handle_t handle,
    char *d_inputString, size_t size,
    const int baseOfPosition,
    int *d_matched_result, int *d_pos, int *h_num_matched ) ;
[/code]

Example: suppose the input string has 1000 characters and is split into
two smaller strings of 500 bytes each, and the maximum pattern length is
50. Then we can do:
[code]
size_t maxPatternLen = 50 ;
int h_num_matched1, h_num_matched2 ;

// process first string; the extra maxPatternLen bytes let matches
// that straddle the split point be seen
PFAC_matchFromDeviceReduce( handle, d_inputString, 500 + maxPatternLen,
    0,
    d_matched_result, d_pos, &h_num_matched1 ) ;

// process second string; results are appended after those of the first call
PFAC_matchFromDeviceReduce( handle, d_inputString + 500, 500,
    500,
    d_matched_result + h_num_matched1, d_pos + h_num_matched1,
    &h_num_matched2 ) ;

printf("total number of matches = %d\n", h_num_matched1 + h_num_matched2 ) ;
[/code]
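
To illustrate what baseOfPosition is meant to achieve, here is a minimal CPU-only sketch. It is not PFAC code: match_chunk is a hypothetical stand-in that scans for a single pattern with memcmp, and the only point is that adding the base to each chunk-local index yields positions in the original, unsplit string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU stand-in for the proposed behaviour (not part of PFAC).
 * Scans `size` bytes of `chunk` for `pattern` and stores each match
 * position, shifted by baseOfPosition, into `pos`.
 * Returns the number of matches stored (at most max_out). */
static int match_chunk(const char *chunk, size_t size,
                       const char *pattern, int baseOfPosition,
                       int *pos, int max_out)
{
    size_t plen = strlen(pattern);
    int n = 0;

    if (plen == 0 || plen > size) {
        return 0;
    }
    for (size_t i = 0; i + plen <= size && n < max_out; ++i) {
        if (memcmp(chunk + i, pattern, plen) == 0) {
            /* chunk-local index i becomes a global position */
            pos[n++] = baseOfPosition + (int)i;
        }
    }
    return n;
}
```

Calling it once with base 0 on the first half and once with base 500 on the second half, appending to the same pos array, mirrors the two-call example above.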


Bug 2: this is a performance bug.
So far, PFAC_matchFromDeviceReduce() requires a huge amount of memory even
when the number of matches is very small.
Suppose the input string is 100MB; then users must allocate 400MB for
d_matched_result and 400MB for d_pos. Even on a GTX480, only a little over
100MB of input can be processed.
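
The arithmetic behind those figures can be sketched as follows. This is only an estimate under stated assumptions: one int in d_matched_result and one int in d_pos per input byte, sizeof(int) == 4, and nothing else on the device counted (pattern tables and other allocations are ignored). The function names are hypothetical helpers, not part of PFAC.

```c
#include <stddef.h>

/* Device bytes for the three buffers named in the post:
 * the input itself, plus one int per input byte in each of
 * d_matched_result and d_pos. Other allocations are ignored. */
static size_t reduce_bytes_needed(size_t input_bytes)
{
    return input_bytes + 2 * input_bytes * sizeof(int);
}

/* Largest input (in bytes) whose three buffers fit in device_bytes. */
static size_t max_input_bytes(size_t device_bytes)
{
    return device_bytes / (1 + 2 * sizeof(int));
}
```

With 4-byte ints, a 100MB input needs 900MB in total. Assuming a card with 1536MB of device memory, the upper bound from this formula is about 170MB of input, before subtracting whatever else lives on the device.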

I plan to remove the memory requirement for d_pos. The following is my
proposal.

Assume the number of patterns is less than 4 Mega (this is the number of
patterns, not the size of the pattern file).
Then I can encode d_pos into d_matched_result, so users only allocate
the 400MB d_matched_result,
and the new function will allocate d_pos for them.
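
The post does not spell out the encoding. What the 4 Mega limit does imply is that a pattern ID fits in 22 bits (4 Mega = 2^22), which leaves 10 bits of a 32-bit word free. The sketch below is purely illustrative of that kind of bit packing; the field layout, the meaning of the 10-bit field, and all names are assumptions, not the actual PFAC design.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical): low 22 bits = pattern ID,
 * high 10 bits = some small auxiliary value. */
#define PATTERN_BITS 22u
#define PATTERN_MASK ((1u << PATTERN_BITS) - 1u)          /* 0x3FFFFF */
#define AUX_MAX      ((1u << (32u - PATTERN_BITS)) - 1u)  /* 1023 */

/* Pack a pattern ID (< 4 Mega) and a 10-bit value into one word. */
static uint32_t pack_match(uint32_t pattern_id, uint32_t aux)
{
    assert(pattern_id <= PATTERN_MASK);
    assert(aux <= AUX_MAX);
    return (aux << PATTERN_BITS) | pattern_id;
}

/* Recover the pattern ID from a packed word. */
static uint32_t unpack_pattern(uint32_t word)
{
    return word & PATTERN_MASK;
}

/* Recover the 10-bit auxiliary value from a packed word. */
static uint32_t unpack_aux(uint32_t word)
{
    return word >> PATTERN_BITS;
}
```

Note that 10 bits cannot hold a full position in a 100MB input, so any real scheme would need more than this, for example a later compaction step that expands the packed words into the callee-allocated d_pos.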

The prototype is:
[code]
// on return, *d_pos holds an address in device memory allocated by the callee
PFAC_status_t PFAC_matchFromDeviceReduce2( PFAC_handle_t handle,
    char *d_inputString, size_t size,
    const int baseOfPosition,
    int *d_matched_result, int **d_pos, int *h_num_matched ) ;
[/code]

Users must free d_pos themselves.

Any ideas?
Lung-Sheng Chien

Lung-Sheng Chien

Apr 30, 2011, 12:19:21 AM
to pfacForum
I also want to add support for streams, say

PFAC_setKernelStream(PFAC_handle_t handle, cudaStream_t streamId )

just like CUBLAS and CUSPARSE, so that you can benefit from concurrent
kernel launches and
overlap data transfer with computation.