regular expression idea

7 views
Skip to first unread message

kyle smith

unread,
Aug 20, 2008, 12:02:25 AM8/20/08
to Clojure
I think it would be nice to allow functions inside regexs. Lets
define \fn to escape a function name. This would allow us to define
recursive regexs*, for example parenthesis matching.

(defn match-parens [x]
(match "\(([^()]+|\fnmatch-parens)*\)" x))

The function arguments could be the last matched group, some user-
defined group, or just the current position in the string/sequence. A
match would be indicated by any non-nil return value, which you could
capture by enclosing the function call in a group. You could also
match on recursion depth (by storing it in a variable and incrementing
it in the function).

I have a few applications for this. First, I'm making a dsl that
involves set operations (intersect, union, etc). I would like to do
syntax checking for user-defined scripts. (They would have a very
hard time decyphering stack traces.) I would also like to check the
logfile of a program to see if the floating point numbers it generates
are reasonable.

In general, this might be useful for writing parsers, validating html/
xml, etc. This seems like a powerful idea, so I'm sure it would be
nice for all sorts of things. What are everyone's thoughts?

*Yes, I know that's not proper terminology.

Chouser

unread,
Aug 20, 2008, 8:44:44 AM8/20/08
to clo...@googlegroups.com
On Wed, Aug 20, 2008 at 12:02 AM, kyle smith <the1ph...@gmail.com> wrote:
>
> I think it would be nice to allow functions inside regexs. Lets
> define \fn to escape a function name. This would allow us to define
> recursive regexs*, for example parenthesis matching.

I agree this would be useful. However, Clojure doesn't provide its
own regex engine currently; it just provides some nicer syntax for
Java's regex engine. And unfortunately, Java's regex doesn't support
functions (as far as I can tell), so we'd have to write a new regex
engine from scratch. Even if that were done, I wouldn't hold my
breath waiting for it to be included in Clojure itself -- I don't
think Rich is that big a fan of regex.

--Chouser

kyle smith

unread,
Aug 20, 2008, 9:55:49 AM8/20/08
to Clojure
Having it as a library would be fine. People build regex libraries in
lisp all the time, so it shouldn't be too hard to modify one. I've
thought of more uses:

If in match-parens you had a huge cond block with an entry for every
clojure form, you could do static syntax checking for the entire
language. This might be generalizable to non-lisps: match on curly
braces (c, java, etc), whitespace (python), and have cond blocks as
before.

In addition to syntax checking, you could possibly do static type
checking. For each operation (. x (something args)) or (a-clojure-
form args), find all types with that method/all types valid for the
given form and add them to the object (perhaps via metadata whenever
it supports that, although its immutability may cause problems).
Similarly, add info implied by the method signature to the args.
Apply iteratively if necessary. When type info is deduced, a cond
statement with instance? and cast pairs could be inserted
automatically to avoid reflection. I don't think there should be any
type conflicts (intersection of possible types of something is nil),
but I could be wrong.

This could also be used to match grammars in general. The wikipedia
example of a context-free language is S -> aSb | ab which can be
loosely translated to clojure:

(defun aSb [x]
(cond ((eq x nil) nil)
((eq x "ab") t)
(t (match "a\fnaSbb" (. s (subString 1 (- (. s length) 2))) ))
)

Rastislav Kassak

unread,
Aug 20, 2008, 10:11:26 AM8/20/08
to clo...@googlegroups.com
On 8/20/08, kyle smith <the1ph...@gmail.com> wrote:
>
> I think it would be nice to allow functions inside regexs. Lets
> define \fn to escape a function name. This would allow us to define
> recursive regexs*, for example parenthesis matching.
>
> (defn match-parens [x]
> (match "\(([^()]+|\fnmatch-parens)*\)" x))
Isn't it task better suited for parser combinator?
In fact, you are escaping from regular grammars I guess, so it's not
regexp anymore. :)
But maybe it's interesting way to build grammars in more succint way.

Matt Revelle

unread,
Aug 20, 2008, 9:10:39 AM8/20/08
to clo...@googlegroups.com
Parser combinators came up on this list before and would probably be a
preferred solution to the given example.
Reply all
Reply to author
Forward
0 new messages