The #" form produces Java Pattern objects:
user=> (class #"foo")
java.util.regex.Pattern
user=>
...which of course aren't IFns.
- Chas
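A minimal check of the point above, assuming a JVM Clojure REPL and using only clojure.core predicates:

```clojure
;; A regex literal is a plain java.util.regex.Pattern...
(instance? java.util.regex.Pattern #"foo") ; true

;; ...and Pattern does not implement clojure.lang.IFn,
;; so it cannot be called like a function.
(ifn? #"foo") ; false

;; Keywords, by contrast, are IFns.
(ifn? :foo) ; true
```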
>> java.util.regex.Pattern
>
> I imagine a wrapper class could be returned instead that implemented
> IFn and Pattern,
Pattern is a final concrete class, so that's not possible.
...I'm counting down until I see an all-clojure regex implementation
announcement. ;-)
- Chas
The benefits of #"" producing a real java.util.regex.Pattern
object instead of some Clojury wrapper will decrease as it
becomes more common to write Clojure code that can run on
non-JVM platforms. So although this idea has come up and
then been abandoned several times before, I think it's worth
bringing up again periodically to see what makes sense.
>> Oh - it seems like re-seq does the most work so perhaps that is the
>> best candidate?
>
> The only feature I want is the ability to use a regex as a predicate.
> So, I'd prefer something like re-matches. Maybe this isn't the
> biggest use case, though.
Pretty much anything that can be concluded by using
re-matches can also be found using re-seq, so I think I'd
prefer the latter. One proviso being that currently re-seq
returns an empty list, not nil, when there are no matches.
This does reduce its utility as a predicate. Would
automatically forcing the first step to get a nice 'nil' be
unacceptable?
--Chouser
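A rough sketch of the two candidates used as predicates, assuming JVM Clojure. Whether a no-match re-seq comes back as an empty seq or as nil may depend on the Clojure version, so the sketch wraps it in seq, which yields nil either way:

```clojure
;; re-matches succeeds only if the whole string matches.
(re-matches #"\d+" "123") ; => "123"
(re-matches #"\d+" "12a") ; => nil

;; So it can be used directly as a predicate.
(filter #(re-matches #"\d+" %) ["12" "ab" "7"]) ; => ("12" "7")

;; re-seq finds every match anywhere in the string.
(re-seq #"x." "x1y2x3") ; => ("x1" "x3")

;; Forcing the first step with seq gives a clean nil on no match.
(seq (re-seq #"z" "abc")) ; => nil
```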
> The benefits of #"" producing a real java.util.regex.Pattern
> object instead of some Clojury wrapper will decrease as it
> becomes more common to write Clojure code that can run on
> non-JVM platforms. So although this idea has come up and
> then been abandoned several times before, I think it's worth
> bringing up again periodically to see what makes sense.
Why wouldn't #"" produce whatever the corollary regex object is on
each host platform?
Tangentially, if I think ahead a couple of 'moves', I'd think that
perhaps there's a desire to have clojure code that is thoroughly
portable between, say, Java and .NET host platforms.
- Chas
I don't think moving specific applications between JVM/CLR/JS is a
target, nor should it be. People need to move their expertise in the
language, and core libraries, to different applications, which might
have different target platforms.
So, UI/web/DB libs IMO should not be portable. Life is too short.
Regex is interesting - is there a lot of core library code that uses
it? Are there portable alternatives? It is quite likely that the
native regex support on any platform will be best for that platform,
so I'm inclined to think that #"" should follow "when in Rome"
principles, and the burden should be on those who want portable regex
to use something portable and eschew #"", but as I've said before, I'm
not much of a regex user.
Other thoughts?
Rich
> On Aug 27, 2009, at 1:34 PM, Chouser wrote:
>> The benefits of #"" producing a real java.util.regex.Pattern
>> object instead of some Clojury wrapper will decrease as it
>> becomes more common to write Clojure code that can run on
>> non-JVM platforms. So although this idea has come up and
>> then been abandoned several times before, I think it's worth
>> bringing up again periodically to see what makes sense.
> Why wouldn't #"" produce whatever the corollary regex object is on
> each host platform?
What methods could you call on such a thing? The answer
would differ depending on your platform, making the direct
calling of methods on such a thing undesirable as more
platforms are supported.
> Tangentially, if I think ahead a couple of 'moves', I'd think that
> perhaps there's a desire to have clojure code that is thoroughly
> portable between, say, Java and .NET host platforms.
Yes, exactly, at least for some subset of Clojure
functionality. On the one hand, it's probably not worth
making Java binary serialization work transparently on
a JavaScript host, for example. On the other hand, it'd be
nice (and not terribly difficult) to make
(re-seq #"x." "x1y2x3") return ("x1" "x3") on nearly every
platform being considered. So it'd be nice if re-seq and
Clojure's other re-* functions always worked on whatever #""
produced, but it's less important for #"" to have any
particular set of native methods.
Of course you'd still want to provide a way to get to the
underlying platform-specific pattern object for cases where
you want to take advantage of a platform feature in code that
doesn't have to be portable.
--Chouser
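One way such a wrapper might look on the JVM, purely as a sketch: the names Regex, regex, and native-pattern are hypothetical and not part of Clojure. Since Pattern is final, the wrapper holds a Pattern rather than extending it, implements IFn so it can act as a predicate, and still exposes the underlying platform object:

```clojure
;; Hypothetical wrapper type: holds the host Pattern, callable as a fn.
(deftype Regex [^java.util.regex.Pattern pattern]
  clojure.lang.IFn
  ;; Calling the wrapper with a string returns the first match or nil.
  (invoke [_ s] (re-find pattern s)))

;; Hypothetical constructor from a pattern string.
(defn regex [s]
  (Regex. (re-pattern s)))

;; Hypothetical escape hatch to the platform-specific object.
(defn native-pattern [^Regex r]
  (.-pattern r))

((regex "x.") "x1y2x3")                ; => "x1"
(filter (regex "^x") ["x1" "y2" "x3"]) ; => ("x1" "x3")
(re-seq (native-pattern (regex "x.")) "x1y2x3") ; => ("x1" "x3")
```

In this sketch the re-* functions still need the native Pattern; making them accept the wrapper directly would be a further step.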
This is an interesting and crucial assertion. If the regex
syntaxes do not have a useful overlap, only libraries that
allow regexes to "pass through" from the app to the platform
(creating no new regex objects of their own) will be
portable, at which point wrapping the pattern in
a clojure-something becomes rather less useful (except
I suppose for the original IFn point).
But is it true? The amount of overlap between, for example,
JVM and JavaScript is quite substantial, both having
borrowed features and syntax quite heavily from Perl.
http://www.regular-expressions.info/refflavors.html
I think that as long as we're not trying to support ancient
engines (such as sed, awk, emacs...) the subset that
overlaps would be quite useful.
--Chouser
> Why wouldn't #"" produce whatever the corollary regex object is on
> each host platform?
I had a couple of suggestions on clojure-dev for ClojureCLR that line up
with the "produce the corollary" idea:
http://groups.google.com/group/clojure-dev/browse_thread/thread/d4286dac9f1cf8ba/7e05daa7b782c075
> Tangentially, if I think ahead a couple of 'moves', I'd think that
> perhaps there's a desire to have clojure code that is thoroughly
> portable between, say, Java and .NET host platforms.
Essentially, as Chouser noted, #"" and re-seq as currently defined in
Clojure get you pretty far as a portable API. However, the platforms
don't agree on literal regex syntax beyond the basic "asdf|[0-9]+"
features, and that will prevent true portability of the literals.
> I don't think moving specific applications between JVM/CLR/JS is a
> target, nor should it be. People need to move their expertise in the
> language, and core libraries, to different applications, which might
> have different target platforms.
> So, UI/web/DB libs IMO should not be portable. Life is too short.
> Regex is interesting - is there a lot of core library code that uses
> it? Are there portable alternatives? It is quite likely that the
> native regex support on any platform will be best for that platform,
> so I'm inclined to think that #"" should follow "when in Rome"
> principles, and the burden should be on those who want portable regex
> to use something portable and eschew #"", but as I've said before, I'm
> not much of a regex user.
Any kind of substantial difference in the implementation (and not just
in the supported feature set) will lead to lots of confusion when
those differences become apparent. The key thing is that what's
substantial to one is unimportant to another.
e.g.: A quick scan of that page shows two really big differences
between .NET and Java -- only the latter has possessive quantifiers
(which I've come to love for certain tight jams), and only the former
has named groups (a feature I dearly miss from my python days when in
the Java world). I don't think we do anyone any favors trying to come
up with a supported regex variant that is only the intersection of the
host platforms that are of interest (which, of course, will change).
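For anyone who hasn't met possessive quantifiers, a small illustration on the JVM, assuming the standard java.util.regex behaviour behind #"":

```clojure
;; Greedy a+ backtracks: it gives one "a" back so the trailing a can match.
(re-matches #"a+a" "aaa")  ; => "aaa"

;; Possessive a++ keeps all three and never backtracks, so the match fails.
(re-matches #"a++a" "aaa") ; => nil
```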
I guess I'm just missing the boat on the motivation here. Will the
same principle apply for networking and graphics and concurrency,
too? I can't imagine so...
- Chas