another form of exact string match

3 views
Skip to first unread message

Lung-Sheng Chien

unread,
Mar 4, 2011, 1:29:52 AM3/4/11
to pfacForum
PFAC can match "ABC" but not work on "ABC[A-Z]".
class representation [A-Z] is not a character but a subset of
characters.
We can re-write state machine to achieve this goal.
Two approaches are possible
1) expend r1[A-Z]r2 to
r1Ar2
r1Br2
....
r1Zr2

pros:
easy to implement
cons:
transition table becomes large.
2) use bit map to represent [A-Z], this is common technique in
hardware (FPGA solution)
pros:
state transition table is the same but we need two extra
tables, one is for bitmap, the other is for state transition of class.
cons:
performance is down due to extra table lookup.

For DNA analysis, legal set is {A,T,C,G}, we must have a space-
efficient version of PFAC.
if we know this is an app of DNA analysis, we can translate
AT.G
as
ATAG
ATTG
ATCG
ATGG

I think two approaches have different goal, they may exist in PFAC
library.
For example, DNA analysis is better on 1).

Can we find a killing app on 2)?

Cheng-Hung Lin

unread,
Mar 9, 2011, 11:18:02 AM3/9/11
to pfacForum
For regular expression in (1)
we may not need to expend r1[A-Z]r2 to 26 rules. We can modify the
create_table to accommodate the character class.

For DNA in (2)
The number of column is only 4 for A, T, C, G, respectively. The
memory requirement is less than the PFAC.

Lung-Sheng Chien

unread,
Mar 9, 2011, 8:08:42 PM3/9/11
to pfacForum

you need to expand r1[A-Z]r2 to handle conflict, for example
r1[A-D]r2
r1[C-E]r3

As for transition table, you can merge them back.
> > Can we find a killing app on 2)?- 隱藏被引用文字 -
>
> - 顯示被引用文字 -
Reply all
Reply to author
Forward
0 new messages