Why doesn't regex implement IFn?


Sean Devlin

Aug 26, 2009, 9:17:57 PM
to Clojure
Okay, I'm sure this has come up before. I was just wondering if
anyone knew why the regex literal doesn't implement IFn?

At first glance it seems like the following would be useful:

user=>(#"\d{3}" "123")
true

This is defined as...
user=>(not (nil? (re-matches #"\d{3}" "123")))
true

What am I missing?
Sean

Chas Emerick

Aug 26, 2009, 9:21:26 PM
to clo...@googlegroups.com

The #" form produces Java Pattern objects:

user=> (class #"foo")
java.util.regex.Pattern
user=>

...which of course aren't IFns.
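
A minimal sketch of what that means in practice, assuming a stock JVM Clojure REPL. The names three-digits and call-as-fn are made up for illustration:

```clojure
;; A regex literal evaluates to a plain java.util.regex.Pattern.
(def three-digits #"\d{3}")

(instance? java.util.regex.Pattern three-digits) ; => true
(ifn? three-digits)                              ; => false

;; Hypothetical helper: try to call x as a function and report failure
;; instead of throwing.
(defn call-as-fn [x s]
  (try
    (x s)
    (catch ClassCastException _ :not-callable)))

(call-as-fn three-digits "123")                  ; => :not-callable
```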

- Chas

Timothy Pratley

Aug 26, 2009, 9:46:22 PM
to Clojure
> java.util.regex.Pattern

I imagine a wrapper class could be returned instead that implemented
IFn and Pattern,
but which function would it call?
(re-find m)
(re-find re s)
(re-groups m)
(re-matcher re s)
(re-matches re s)
(re-pattern s)
(re-seq re s)

I don't think there is a clear implicit choice - so explicitly
choosing seems necessary here.
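
To make the ambiguity concrete, here is a small sketch of how the main candidates behave on the same input. The names digit-pair and input are illustrative only:

```clojure
(def digit-pair #"(\d)(\d)")
(def input "12 34")

;; re-find: first match anywhere in the string, with its groups.
(re-find digit-pair input)     ; => ["12" "1" "2"]

;; re-matches: the whole string must match.
(re-matches digit-pair input)  ; => nil
(re-matches digit-pair "12")   ; => ["12" "1" "2"]

;; re-seq: every successive match.
(re-seq digit-pair input)      ; => (["12" "1" "2"] ["34" "3" "4"])

;; re-matcher: a stateful java.util.regex.Matcher for re-find / re-groups.
(class (re-matcher digit-pair input)) ; => java.util.regex.Matcher
```

Each of these is a defensible meaning for "call the regex", which is the point being made above.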

Rich Hickey

Aug 26, 2009, 10:01:41 PM
to Clojure


On Aug 26, 9:46 pm, Timothy Pratley <timothyprat...@gmail.com> wrote:
> > java.util.regex.Pattern
>
> I imagine a wrapper class could be returned instead that implemented
> IFn and Pattern,

Unfortunately, Pattern is a final class.

Rich

Chas Emerick

Aug 26, 2009, 10:04:21 PM
to clo...@googlegroups.com

On Aug 26, 2009, at 9:46 PM, Timothy Pratley wrote:

>> java.util.regex.Pattern
>
> I imagine a wrapper class could be returned instead that implemented
> IFn and Pattern,

Pattern is a final concrete class, so that's not possible.
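
This is easy to confirm from the REPL with plain Java reflection. A small sketch; pattern-final? is just an illustrative name:

```clojure
(import 'java.lang.reflect.Modifier)

;; Pattern is declared `public final class Pattern`, so it cannot be
;; subclassed or proxied.
(def pattern-final?
  (Modifier/isFinal (.getModifiers java.util.regex.Pattern)))

pattern-final? ; => true
```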

...I'm counting down until I see an all-clojure regex implementation
announcement. ;-)

- Chas

Sean Devlin

Aug 26, 2009, 10:31:32 PM
to Clojure
Well, with a statically typed language this would be a problem.

What if we used duck-typing to get around this?

Granted, this wouldn't work for anything that gets passed to Java, but
the following gist would be a start.

http://gist.github.com/176032

Now, we'd still have to address Mr. Pratley's issue of which fn to
implement, but now we have an option. Thoughts?
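
For readers without access to the gist, here is one way such a wrapper could look. This is a sketch only, not the gist's contents: it assumes a Clojure version with deftype (which postdates this thread), the name CallableRegex is made up, and the choice of re-matches for the call behavior is arbitrary.

```clojure
;; Hypothetical wrapper: holds a Pattern and is callable like a function.
(deftype CallableRegex [^java.util.regex.Pattern pattern]
  clojure.lang.IFn
  ;; One-argument call behaves like re-matches (an arbitrary choice here).
  (invoke [_ s] (re-matches pattern s))
  ;; Lets (apply r [s]) work as well.
  (applyTo [this args] (clojure.lang.AFn/applyToHelper this args)))

(def three-digits (CallableRegex. #"\d{3}"))

(three-digits "123")                       ; => "123"
(three-digits "12a")                       ; => nil
(filter three-digits ["123" "abc" "456"])  ; => ("123" "456")

;; Anything that needs a real Pattern (re-find, Java interop) has to
;; unwrap the field first.
(re-find (.-pattern three-digits) "x123y") ; => "123"
```

The unwrapping step in the last line is the cost mentioned above: the wrapper is not a Pattern, so it cannot be handed to Java directly.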

Sean

Timothy Pratley

Aug 27, 2009, 1:06:17 AM
to Clojure
> Granted, this wouldn't work for anything that gets passed to Java, but
> the following gist would be a start.
> http://gist.github.com/176032

You already have a getPattern method for those cases. Which suggests
another solution:

(defn re-fn
  "Uses ss to construct a java.util.regex.Pattern.
  Returns a function which returns the Pattern if called with no
  arguments, and calls re-seq if called with a string argument."
  [ss]
  (let [pp (re-pattern ss)]
    (fn re
      ([] pp)
      ([s] (re-seq pp s)))))

user=> ((re-fn "2.") "12324251")
("23" "24" "25")
user=> ((re-fn "2."))
#"2."

If #"X" created a (re-fn "X") then all the re functions could accept
a function and call it in order to avoid having to do (re-find (pp)
s). The previous signature could be retained so that they would work
with either type of arguments if that were desirable. The downside is
trying to explain that in the docs might be confusing - so a wrapper
seems more obvious for that.
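
A rough sketch of that idea, with re-fn repeated so the snippet stands alone. The helper as-pattern and the starred name re-find* are hypothetical and exist only for this illustration:

```clojure
(defn re-fn
  "Returns a fn: no args gives the Pattern, one string arg calls re-seq."
  [ss]
  (let [pp (re-pattern ss)]
    (fn re
      ([] pp)
      ([s] (re-seq pp s)))))

;; Hypothetical helper: unwrap a re-fn, pass a plain Pattern through.
(defn as-pattern [re]
  (if (fn? re) (re) re))

;; A re-find that accepts either kind of argument.
(defn re-find* [re s]
  (re-find (as-pattern re) s))

(re-find* #"2." "12324251")        ; => "23"
(re-find* (re-fn "2.") "12324251") ; => "23"
```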

Oh - it seems like re-seq does the most work so perhaps that is the
best candidate?


Regards,
Tim.

Sean Devlin

Aug 27, 2009, 11:41:36 AM
to Clojure
The only feature I want is the ability to use a regex as a predicate.
So, I'd prefer something like re-matches. Maybe this isn't the
biggest use case, though.

Sean


Chouser

Aug 27, 2009, 1:34:42 PM
to clo...@googlegroups.com
On Thu, Aug 27, 2009 at 11:41 AM, Sean Devlin<francoi...@gmail.com> wrote:
>
> On Aug 27, 1:06 am, Timothy Pratley <timothyprat...@gmail.com> wrote:
>>
>>  If #"X" created a (re-fn "X") then all the re functions could accept
>> a function and call it in order to avoid having to do (re-find (pp)
>> s). The previous signature could be retained so that they would work
>> with either type of arguments if that were desirable. The downside is
>> trying to explain that in the docs might be confusing - so a wrapper
>> seems more obvious for that.

The benefits of #"" producing a real java.util.regex.Pattern
object instead of some Clojury wrapper will decrease as it
becomes more common to write Clojure code that can run on
non-JVM platforms. So although this idea has come up and
then been abandoned several times before, I think it's worth
bringing up again periodically to see what makes sense.

>> Oh - it seems like re-seq does the most work so perhaps that is the
>> best candidate?
>
> The only feature I want is the ability to use a regex as a predicate.
> So, I'd prefer something like re-matches.  Maybe this isn't the
> biggest use case, though.

Pretty much anything that can be concluded by using
re-matches can also be found using re-seq, so I think I'd
prefer the latter. One proviso being that currently re-seq
returns an empty list, not nil, when there are no matches.
This does reduce its utility as a predicate. Would
automatically forcing the first step to get a nice 'nil' be
unacceptable?
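
One way to read "forcing the first step" is simply calling seq on the result. A minimal sketch, with re-seq-pred as a made-up name; what a bare re-seq returns for no matches may differ between Clojure versions, which is why seq is used to normalize it:

```clojure
;; Hypothetical helper: all matches of re in s, or nil when there are none.
(defn re-seq-pred [re s]
  (seq (re-seq re s)))

(re-seq-pred #"7." "12324251") ; => nil
(re-seq-pred #"2." "12324251") ; => ("23" "24" "25")

;; nil is falsey, so the result works directly as a predicate.
(filter #(re-seq-pred #"\d" %) ["a1" "bc" "d2"]) ; => ("a1" "d2")
```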

--Chouser

eyeris

Aug 27, 2009, 1:37:46 PM
to Clojure
I have the same urge, to want to use regexps as predicates. However I
definitely would not like to read such code. I can only imagine having
to try to read such code if I didn't understand regexps. E.g. (filter
#"\d+" maybe-numbers) is clear enough to someone who understands
regexps. However (filter is-number-regex maybe-numbers) is clear even
to programmers who don't understand regexps.

Sean Devlin

Aug 27, 2009, 1:50:43 PM
to Clojure
Hmmm... I think you're confusing the issue. Your complaint seems to
be more directed at "magic numbers" than regexes. If I understand
your argument (which I agree with):

;bad code
(filter #"\d+" maybe-numbers)

;good code
(let [is-number-regex #"\d+"]
  (filter is-number-regex maybe-numbers))

Granted, some understanding of regexes is required. However, if
you're hacking Lisp, I think you've moved beyond beginner status, and
expecting a basic familiarity with regexes is fair.

My $.02

Sean

eyeris

Aug 27, 2009, 5:21:16 PM
to Clojure
I named is-number-regex poorly. I meant it to be a function that calls
re-matches. Here's a more complete snippet, with better names:

; bad code
(filter #"^\d+$" maybe-numbers)

; good code
(defn re-match? [re s] (not (nil? (re-matches re s))))
(defn number-text? [s] (re-match? #"^\d+$" s))
(filter number-text? maybe-numbers)

My point is that, since magic numbers are bad, you should be giving
the regex a meaningful name, so implementing IFn for regexps isn't a
big savings because it pretty much just transforms:

(defn number-text? [s] (re-match? #"^\d+$" s))

into this:

(defn number-text? [s] (#"^\d+$" s))

Sean Devlin

Aug 27, 2009, 5:38:27 PM
to Clojure
Or this:

(def number-text? #"^\d+$")

Which is a savings :)
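
For comparison, since Pattern is not an IFn, something close to that brevity is available with partial. A small sketch using only core functions:

```clojure
;; A named predicate built from a regex, without a wrapper type.
(def number-text? (partial re-matches #"^\d+$"))

(number-text? "123")                 ; => "123" (truthy)
(number-text? "12a")                 ; => nil
(filter number-text? ["1" "x" "42"]) ; => ("1" "42")
```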

Chas Emerick

Aug 27, 2009, 9:05:26 PM
to clo...@googlegroups.com

On Aug 27, 2009, at 1:34 PM, Chouser wrote:

> The benefits of #"" producing a real java.util.regex.Pattern
> object instead of some Clojury wrapper will decrease as it
> becomes more common to write Clojure code that can run on
> non-JVM platforms. So although this idea has come up and
> then been abandoned several times before, I think it's worth
> bringing up again periodically to see what makes sense.

Why wouldn't #"" produce whatever the corollary regex object is on
each host platform?

Tangentially, if I think ahead a couple of 'moves', I'd think that
perhaps there's a desire to have clojure code that is thoroughly
portable between, say, Java and .NET host platforms.

- Chas

Timothy Pratley

Aug 27, 2009, 9:57:36 PM
to Clojure
> > The only feature I want is the ability to use a regex as a predicate.
> Would automatically forcing the first step to get a nice 'nil' be unacceptable?

Sounds good to me!
This can be quite easily accommodated:

(defn re-fn
  "Construct a regular expression from string.
  Calling a regular expression with no arguments returns a Pattern.
  Calling a regular expression with a string argument
  returns nil if no matches, otherwise the equivalent of
  (re-seq re string)."
  [string]
  (let [pp (re-pattern string)]
    (fn re
      ([] pp)
      ([s] (let [groups (re-seq pp s)]
             (if (first groups)
               groups
               nil))))))


user=> ((re-fn "7.") "12324251")
nil
user=> ((re-fn "2.") "12324251")
("23" "24" "25")
user=> (if ((re-fn "1") "12") :great :bad)
:great

(And of course a wrapper implementation version could do something
similar).


Regards,
Tim.

Sean Devlin

Aug 27, 2009, 10:30:13 PM
to Clojure
Awesome.

+1

Rich Hickey

Aug 28, 2009, 9:00:59 AM
to clo...@googlegroups.com

I don't think moving specific applications between JVM/CLR/JS is a
target, nor should it be. People need to move their expertise in the
language, and core libraries, to different applications, which might
have different target platforms.

So, UI/web/DB libs IMO should not be portable. Life is too short.
Regex is interesting - is there a lot of core library code that uses
it? Are there portable alternatives? It is quite likely that the
native regex support on any platform will be best for that platform,
so I'm inclined to think that #"" should follow "when in Rome"
principles, and the burden should be on those who want portable regex
to use something portable and eschew #"", but as I've said before, I'm
not much of a regex user.

Other thoughts?

Rich

Shawn Hoover

Aug 28, 2009, 10:01:24 AM
to clo...@googlegroups.com
On Thu, Aug 27, 2009 at 9:05 PM, Chas Emerick <ceme...@snowtide.com> wrote:


> On Aug 27, 2009, at 1:34 PM, Chouser wrote:
>
>> The benefits of #"" producing a real java.util.regex.Pattern
>> object instead of some Clojury wrapper will decrease as it
>> becomes more common to write Clojure code that can run on
>> non-JVM platforms.  So although this idea has come up and
>> then been abandoned several times before, I think it's worth
>> bringing up again periodically to see what makes sense.
>
> Why wouldn't #"" produce whatever the corollary regex object is on
> each host platform?

I had a couple of suggestions on clojure-dev for ClojureCLR that line up with the "produce the corollary" idea: http://groups.google.com/group/clojure-dev/browse_thread/thread/d4286dac9f1cf8ba/7e05daa7b782c075.
 
> Tangentially, if I think ahead a couple of 'moves', I'd think that
> perhaps there's a desire to have clojure code that is thoroughly
> portable between, say, Java and .NET host platforms.

Essentially, as Chouser noted, #"" and re-seq as currently defined in Clojure get you pretty far as a portable API. However, unless the platforms agree on literal regex syntax (they don't, beyond the basic "asdf|[0-9]+" features), the differences will prevent true portability of the literals.

Shawn

Chouser

Aug 28, 2009, 10:37:06 AM
to clo...@googlegroups.com
On Thu, Aug 27, 2009 at 9:05 PM, Chas Emerick<ceme...@snowtide.com> wrote:
>
>
> On Aug 27, 2009, at 1:34 PM, Chouser wrote:
>
>> The benefits of #"" producing a real java.util.regex.Pattern
>> object instead of some Clojury wrapper will decrease as it
>> becomes more common to write Clojure code that can run on
>> non-JVM platforms.  So although this idea has come up and
>> then been abandoned several times before, I think it's worth
>> bringing up again periodically to see what makes sense.
>
> Why wouldn't #"" produce whatever the corollary regex object is on
> each host platform?

What methods could you call on such a thing? The answer
would differ depending on your platform, making the direct
calling of methods on such a thing undesirable as more
platforms are supported.

> Tangentially, if I think ahead a couple of 'moves', I'd think that
> perhaps there's a desire to have clojure code that is thoroughly
> portable between, say, Java and .NET host platforms.

Yes, exactly, at least for some subset of Clojure
functionality. On the one hand, it's probably not worth
making Java binary serialization work transparently on
a JavaScript host, for example. On the other hand, it'd be
nice (and not terribly difficult) to make
(re-seq #"x." "x1y2x3") return ("x1" "x3") on nearly every
platform being considered. So it'd be nice if re-seq and
Clojure's other re-* functions always worked on whatever #""
produced, but it's less important for #"" to have any
particular set of native methods.

Of course you'd still want to provide a way to get to the
underlying platform-specific pattern object for cases where
you want to take advantage of a platform feature in code that
doesn't have to be portable.
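
A short sketch of the two layers described here, as they look on the JVM today. The name x-then-any is illustrative; the second half uses real java.util.regex.Pattern methods and would not carry over to another host:

```clojure
(def x-then-any #"x.")

;; Portable-looking layer: only Clojure's re-* functions touch the pattern.
(re-seq x-then-any "x1y2x3")  ; => ("x1" "x3")
(re-find x-then-any "x1y2x3") ; => "x1"

;; Platform-specific layer: direct method calls on the JVM Pattern object.
(.pattern x-then-any)               ; => "x."
(vec (.split x-then-any "ax1bx2c")) ; => ["a" "b" "c"]
```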

--Chouser

Chouser

Aug 28, 2009, 12:13:21 PM
to clo...@googlegroups.com
On Fri, Aug 28, 2009 at 10:01 AM, Shawn Hoover<shawn....@gmail.com> wrote:
>
> However, unless the platforms agree on literal regex
> syntax (they don't, beyond the basic "asdf|[0-9]+"
> features), the differences will prevent true portability
> of the literals.

This is an interesting and crucial assertion. If the regex
syntaxes do not have a useful overlap, only libraries that
allow regexes to "pass through" from the app to the platform
(creating no new regex objects of their own) will be
portable, at which point wrapping the pattern in
a clojure-something becomes rather less useful (except
I suppose for the original IFn point).

But is it true? The amount of overlap between, for example,
JVM and JavaScript is quite substantial, both having
borrowed features and syntax quite heavily from Perl.

http://www.regular-expressions.info/refflavors.html

I think that as long as we're not trying to support ancient
engines (such as sed, awk, emacs...) the subset that
overlaps would be quite useful.

--Chouser

Chas Emerick

Aug 28, 2009, 12:38:00 PM
to clo...@googlegroups.com
On Aug 28, 2009, at 10:01 AM, Shawn Hoover wrote:

>> Why wouldn't #"" produce whatever the corollary regex object is on
>> each host platform?

> I had a couple of suggestions on clojure-dev for ClojureCLR that line up with the "produce the corollary" idea: http://groups.google.com/group/clojure-dev/browse_thread/thread/d4286dac9f1cf8ba/7e05daa7b782c075.

Yeah, I wanted to find that thread yesterday, but (surprise!) couldn't.

>> Tangentially, if I think ahead a couple of 'moves', I'd think that
>> perhaps there's a desire to have clojure code that is thoroughly
>> portable between, say, Java and .NET host platforms.

> Essentially, as Chouser noted, #"" and re-seq as currently defined in Clojure get you pretty far as a portable API. However, unless the platforms agree on literal regex syntax (they don't, beyond the basic "asdf|[0-9]+" features), the differences will prevent true portability of the literals.

Indeed.  FWIW, I was *definitely not* suggesting that soup-to-nuts portability across host platforms is a good idea -- I just see things that make me think that others would lean in that direction.

On Aug 28, 2009, at 9:00 AM, Rich Hickey wrote:

> I don't think moving specific applications between JVM/CLR/JS is a
> target, nor should it be. People need to move their expertise in the
> language, and core libraries, to different applications, which might
> have different target platforms.
>
> So, UI/web/DB libs IMO should not be portable. Life is too short.

I agree completely.  You can (and many have) spend a lifetime shimming APIs (or reimplementing them, for that matter).  People will do what they want of course, but there's a huge opportunity cost attached.

> Regex is interesting - is there a lot of core library code that uses
> it? Are there portable alternatives? It is quite likely that the
> native regex support on any platform will be best for that platform,
> so I'm inclined to think that #"" should follow "when in Rome"
> principles, and the burden should be on those who want portable regex
> to use something portable and eschew #"", but as I've said before, I'm
> not much of a regex user.

AFAIC, regex engines are platform facilities, like threading is.  There are always going to be off-by-one differences (like those related to .NET's regex impl, never mind the vastly different JavaScript impl), and any reimplementation will not be as heavily used, tested, or be as fast or feature-complete as the host platform's facility.  The same goes for concurrency approaches, graphics contexts, networking, etc. etc.

- Chas

Chas Emerick

Aug 28, 2009, 5:04:47 PM
to clo...@googlegroups.com

Any kind of substantial difference in the implementation (and not just
in the supported feature set) will lead to lots of confusion when
those differences become apparent. The key thing is that what's
substantial to one is unimportant to another.

e.g.: A quick scan of that page shows two really big differences
between .NET and Java -- only the latter has possessive quantifiers
(which I've come to love for certain tight jams), and only the former
has named groups (a feature I dearly miss from my Python days when in
the Java world). I don't think we do anyone any favors trying to come
up with a supported regex variant that is only the intersection of the
host platforms that are of interest (which, of course, will change).
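
To illustrate the two features mentioned, here is a sketch run against a current JVM. Note the assumption: named groups were added to java.util.regex in Java 7, after this thread, so the second half reflects today's JVM rather than the situation described above. All names are illustrative.

```clojure
;; Greedy quantifier: a* backs off one character so the final a can match.
(def greedy-result (re-matches #"a*a" "aaa"))      ; => "aaa"

;; Possessive quantifier: a*+ keeps everything it consumed, so the
;; trailing a has nothing left to match.
(def possessive-result (re-matches #"a*+a" "aaa")) ; => nil

;; Named groups (Java 7 and later), read through Matcher.group(String).
(defn year-and-month [s]
  (let [m (re-matcher #"(?<year>\d{4})-(?<month>\d{2})" s)]
    (when (.matches m)
      [(.group m "year") (.group m "month")])))

(year-and-month "2009-08") ; => ["2009" "08"]
```

Code leaning on either feature is exactly the kind that would not survive a move to a host whose engine lacks it.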

I guess I'm just missing the boat on the motivation here. Will the
same principle apply for networking and graphics and concurrency,
too? I can't imagine so...

- Chas
