Anyone have experience with the brics.automaton regex engine?


bOR_

Dec 9, 2008, 1:33:26 PM
to Clojure
Hi all.

I ported my research project to Clojure and am now benchmarking parts
of it. About 5/6 of all simulation time is currently spent on regular
expressions, so I have been looking into alternative regex engines,
optimizing the regular expression itself, and trying to find out
whether Clojure's re-pattern also compiles the regex.
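
On that last point: Clojure's re-pattern returns a java.util.regex.Pattern, so the regex is compiled at the moment re-pattern is called. What matters for speed is compiling once and reusing the result rather than recompiling inside a hot loop. A minimal Java sketch of that idea follows; the class name, the three-position pattern, and the sample strings are made up for illustration and are not taken from the model.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CompileOnce {
    // Compiled a single time; Pattern objects are immutable and can be reused.
    static final Pattern WINDOW = Pattern.compile("[BD][ABCDF][FG]");

    // Counts how many of the given strings contain the pattern somewhere.
    static int countContaining(List<String> inputs) {
        int hits = 0;
        for (String s : inputs) {
            // Only a new Matcher is created per string; the compile step is not repeated.
            if (WINDOW.matcher(s).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "GEDFF" contains DFF, "ABFG" contains BFG, "CCC" has no match.
        System.out.println(countContaining(List.of("GEDFF", "ABFG", "CCC"))); // prints 2
    }
}
```

In Clojure the equivalent is to bind the result of re-pattern (or a #"..." literal) once outside the loop and pass it to re-find or re-seq.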

This old website lists the performance of several regex engines (the
fastest ones are also 'incomplete' compared to the others):
http://www.tusker.org/regex/regex_benchmark.html , and I am trying to
get one of them to run in Clojure. As I have no experience with Java,
I am struggling a bit.

(add-classpath "file:///home/boris/projects/chapter4/automaton.jar")
(import '(brics.automaton.RegExp))
(import '(brics.automaton.Automaton))
(import '(brics.automaton.RunAutomaton))

The above all loads fine, but the examples on the brics website
(http://www.brics.dk/~amoeller/automaton/faq.html ) are quite
minimal, and my guesses at how to call it from Clojure have not
worked [I tried (.. new Regex ("ana")), for example]. Any help with
translating the syntax in this example to the Clojure version would
be appreciated!

[java example:]
RegExp r = new RegExp("ab(c|d)*");
Automaton a = r.toAutomaton();
String s = "abcccdc";
System.out.println("Match: " + a.run(s)); // prints: true
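
For context on why an automaton engine can be fast: once the regex is turned into a deterministic automaton, running it is one state transition per input character, with no backtracking. The sketch below hand-codes a DFA for ab(c|d)* to show the idea. It is only an illustration with a made-up class name, not how the brics library is implemented; like a.run(s) in the example above, it accepts only when the whole string matches.

```java
public class AbCdStarDfa {
    // Hand-built DFA for ab(c|d)*:
    // state 0 = start, state 1 = seen "a", state 2 = seen "ab" plus any c/d (accepting).
    static boolean run(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (state == 0 && c == 'a') {
                state = 1;
            } else if (state == 1 && c == 'b') {
                state = 2;
            } else if (state == 2 && (c == 'c' || c == 'd')) {
                state = 2;
            } else {
                return false; // no transition for this character: reject
            }
        }
        return state == 2; // accept only if we end in the accepting state
    }

    public static void main(String[] args) {
        System.out.println("Match: " + run("abcccdc")); // prints Match: true
        System.out.println("Match: " + run("abx"));     // prints Match: false
    }
}
```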


[example of a typical string and regex in the model]
"GEDFFEDBGBFEBADACACAFGCBGDDEGGEFDFFFGDCGFGAEAGCFFBCDDCDEBCDAAFDCECCGABCGAAABBBCAFGACAABGFEBDACDFEAAGFGCGDFDGDAAEBFBGBCBDAFDFGFBCFBEABECBBAAEBABGAGFBBAAFFGGBDABFGFAFAEBBBACGEACCEBBCAFDGCADEBGCGFEEEFEADBCGCFBCEFGGGECEGEDCFCCBADBEABCCFGEADDDBEDBBFDBFDCBGDAEFECDEBFGBCBCCBEDEFGGEGCEABAFGECGCACFEGDDAEBAACDBFCGCEAEFEBBABAACFEECFDEAFFGAFEFBDCFCABEEBACBFDCEEAFFBCEDAFFDACEAABBEGFCGDCBFFBBFDDDEEBFCEGCFEFCAAGGEBBGDBCEEGFCFDDFBBFECEGGDBEFBGABFBGEACGAADAFBBDEAGDBADEECAEAAGEGEFEDCABBGFGBEFEFACEBEFFGFCFFFFFFDCB"

((?=([ BD][ ABCDF][ FG][ ABDEF][ ABCDFG][ ABCDEFG][ BCEG][ ABEG]
[ BCDEFG]))
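
The (?=(...)) wrapper in the regex above is a lookahead, which in java.util.regex is a common way to find matches that overlap: the lookahead itself consumes no characters, so the search advances one position at a time and the captured group holds the window found at each position. A small Java sketch of the difference, using a made-up two-position pattern and a hypothetical class name rather than the model's real regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapCount {
    // Counts how many times find() succeeds for the given pattern on s.
    static int countMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile("[AB][BC]");
        Pattern lookahead = Pattern.compile("(?=([AB][BC]))");

        // Plain matching consumes "AB", so the overlapping "BC" is skipped.
        System.out.println(countMatches(plain, "ABC"));     // prints 1
        // The zero-width lookahead matches at index 0 ("AB") and index 1 ("BC").
        System.out.println(countMatches(lookahead, "ABC")); // prints 2
    }
}
```

Whether an automaton-based engine reports overlapping matches in the same way is worth checking separately, since it has no lookahead construct to lean on.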

bOR_

Dec 10, 2008, 3:27:20 AM
to Clojure
Hired a monkey to hit random keys on my keyboard, and eventually
figured out how to get the automaton working.

RegExp r = new RegExp("ab(c|d)*");
Automaton a = r.toAutomaton();
String s = "abcccdc";
System.out.println("Match: " + a.run(s)); // prints: true


(add-classpath "file:///home/boris/projects/chapter4/automaton.jar")
; The package is dk.brics.automaton; import takes the package name
; followed by the class names.
(import '(dk.brics.automaton RegExp Automaton RunAutomaton))
; Haven't yet figured out how to call RunAutomaton.

(def r (.toAutomaton (dk.brics.automaton.RegExp. "[ab][c][abc][d]")))
(def s "acbd")
(.run r s)

Next part of the exploration is to figure out how to get it to find
substrings in a large string.

bOR_

Jan 13, 2009, 3:21:57 AM
to Clojure
For posterity: got it working! Here is how (just ignore words like
gene, amino and window size; they are specific to my code).

(add-classpath "file:///linuxhome/tmp/boris/automaton.jar")
(import '(dk.brics.automaton RegExp Automaton RunAutomaton AutomatonMatcher))

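(defn genes-to-single-regexp
  "Makes the possible recombinations from the genes Ag pathway,
  and returns a single regexp (using brics automaton) to detect
  epitopes."
  [genes]
  ;; NOTE: merge-genes and third are not in clojure.core; presumably
  ;; they are helpers from the rest of the project (not shown here).
  (let [make-regexp (fn [gene]
                      (.toAutomaton
                       (dk.brics.automaton.RegExp.
                        (reduce str (map #(str "[ " (reduce str %) "]") gene)))))]
    ;; Union all per-recombination automata, then wrap the result in a
    ;; RunAutomaton (second constructor argument passed as true).
    (dk.brics.automaton.RunAutomaton.
     (reduce (fn [rx1 rx2] (.union rx1 rx2))
             (map (fn [n] (make-regexp n))
                  (set (map #(merge-genes %)
                            (for [x (first genes)
                                  y (second genes)
                                  z (third genes)]
                              (list x y z))))))
     true)))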


(defn find-all-epi
  "Turns the rx and string into a matcher.
  Don't know if it is a good idea to keep matchers or not.
  Now just forgetting about them after I made them."
  [rx string]
  (let [matcher (.newMatcher rx string)]
    (count (take-while #(= true %)
                       (repeatedly (fn [] (.find matcher)))))))