Programmatic Optimization of Regular Expressions

11 views
Skip to first unread message

Tim Zehta

unread,
Nov 20, 2013, 11:52:53 AM11/20/13
to pym...@googlegroups.com
Has anyone seen anything like this for Python?

> Regex::PreSuf - create regular expressions from word lists
http://search.cpan.org/~jhi/Regex-PreSuf-1.17/PreSuf.pm


Using Regex::PreSuf results in a significant performance gain compared
to simply combining the words with pipes.

The following times are for regexes built from a 534 line word list:

1,000 lines evaluated
PreSuf: 0m0.234s
Simple: 0m0.826s

10,000 lines evaluated
PreSuf: 0m1.216s
Simple: 0m8.179s

100,000 lines evaluated
PreSuf: 0m11.818s
Simple: 1m27.690s


Gregg Lind

unread,
Nov 20, 2013, 12:03:28 PM11/20/13
to pymntos
Something like a nicely formatted version of internal data structure of a Trie  (http://pythonhosted.org/PyTrie/) should get most of the way there. 


Jason Furtney

unread,
Nov 20, 2013, 12:24:13 PM11/20/13
to pym...@googlegroups.com
Off topic, but Emacs lisp has this function:

regexp-opt: Return a regexp to match a string in the list STRINGS.
Each string should be unique in STRINGS and should not contain any regexps,
quoted or not.

(regexp-opt
(split-string "def define loop command if
case_of caseof section end end_loop endloop end_command
endcommand end_if endif end_case endcase end_section endsection
case else array local global argument while null then
while_stepping whilestepping exit" " "))

Which gives output like this

"\\(?:ar\\(?:gument\\|ray\\)\\|c\\(?:aseof\\|ommand\\)\\|def\\(?:ine\\)?\\|e\\(?:lse\\|nd\\(?:_\\(?:c\\(?:ase\\|ommand
endcommand\\)\\|if\\|loop\\|section\\)\\|case\\|if\\|loop\\|section
case\\)?\\|xit\\)\\|global\\|if
case_of\\|lo\\(?:cal\\|op\\)\\|null\\|section\\|then
while_stepping\\|while\\(?:stepping\\)?\\)"
> --
> Meetings Schedule / RVSP on our Meetup at http://python.mn
> ---
> You received this message because you are subscribed to the Google Groups
> "PyMNtos" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pymntos+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.



--
--
Jason K. Furtney
Itasca Consulting Group
111 3rd Ave. South, Suite 450
Minneapolis, MN 55401 USA
(612) 371-4711
Reply all
Reply to author
Forward
0 new messages