So here is my attempt to formalize a simple proposal.
The reader should take the literal contents of #"..." and pass them to Pattern.compile as a raw string, making no changes to the contents. That means all backslashes (\) and double quotes (") would be passed right in. The only other thing the reader need concern itself with is that when it sees a \" it should not treat that double-quote as the end of the pattern, but rather keep on going until it sees a double-quote that is not escaped by a preceding backslash. Nevertheless it would pass both the quoting \ and the following " to Pattern.compile.
That's it. Simple. It works because Java's Pattern itself understands backslash quoting, including literal chars like backslash and double quote, hex and octal patterns, as well as other regex patterns.
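The scanning rule can be sketched in a few lines. This is only an illustration, not the attached LispReader patch: read-regex-body is a hypothetical name, and it works on a string holding the source text that follows the opening #" rather than on a real reader. A backslash always carries the next character along with it, so \" does not end the pattern, while the quote that follows \\ does.

```clojure
(defn read-regex-body
  "Hypothetical sketch: src is the source text right after the opening #\".
  Returns the raw pattern text up to, but not including, the closing quote."
  [^String src]
  (let [n (count src)]
    (loop [i 0, acc []]
      (when (>= i n)
        (throw (ex-info "EOF while reading regex" {:read acc})))
      (let [c (.charAt src i)]
        (cond
          ;; an unescaped double-quote ends the literal
          (= c \") (apply str acc)
          ;; a backslash and the char after it are both passed through untouched
          (= c \\) (if (< (inc i) n)
                     (recur (+ i 2) (conj acc c (.charAt src (inc i))))
                     (throw (ex-info "EOF while reading regex" {:read acc})))
          :else (recur (inc i) (conj acc c)))))))

;; The Clojure strings below spell out source text, hence the extra escaping here.
(read-regex-body "\\w*\" rest")   ; pattern text: \w*
(read-regex-body "\\\"\" rest")   ; pattern text: \"
(read-regex-body "\\\\\" rest")   ; pattern text: \\
```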
Some examples:
1. Simple text
   (re-find #"foo" "foo") --> "foo"
2. Pre-defined character class
   (re-find #"\w*" "foo!@#$") --> "foo"
3. Special character (regex and string)
   (re-find #"\t" "\t") --> "\t"
4. Scary special character (regex only)
   Note that the escape sequences available inside #"" are Java Pattern escape sequences, and therefore by definition different from Clojure String escape sequences. Of course this is what you need for \w and such to work:
   (re-find #"\a" "\u0007") --> beep ""
5. Special character (string only)
   The reverse of the previous example -- Clojure strings understand "\b" as (str \backspace), but Java patterns do not, so this example uses hex instead:
   (re-find #"\x08" "\b") --> "\b"
6. Hex
   (re-find #"\x31" "1") --> "1"
7. Octal
   (re-find #"\061" "1") --> "1"
8. Word boundary:
   (re-find #"\bfoo" "foo") --> "foo"
9. Quoting fun -- double quote, a single character:
   (re-find #"\"" "\"") --> "\""
10. Quoting fun -- backslash, a single character:
   (re-find #"\\" "\\") --> "\\"
11. Open paren
   (re-find #"\(" "(") --> "("
I think this demonstrates you can create any pattern you might need. For reference, here are the above patterns expressed in the current (not the proposed) reader syntax:
1. #"foo"
2. #"\\w*"
3. #"\t" or #"\\t"
4. #"\\a" (but #"\a" makes the reader blow up)
5. #"\\x08"
6. #"\\x31"
7. #"\061" or #"\\061"
8. #"\\bfoo" (note #"\bfoo" is legal, but doesn't do what you want)
9. #"\"" or #"\\\"" (but #"\\"" blows up the reader)
10. #"\\\\" (but #"\\" is illegal)
11. #"\\(" (but #"\(" is illegal)
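One way to see the correspondence between the two lists is to compare a proposed-style literal with the same pattern built from an ordinary string, which still needs doubled backslashes. A small sketch, assuming a Clojure build in which the proposed reader behaviour is in effect (from-literal and from-string are just illustrative names):

```clojure
;; Assumes the proposed #"..." reading rules are in effect.
;; re-pattern takes an ordinary string, so backslashes must be doubled there.
(def from-literal #"\w*")
(def from-string  (re-pattern "\\w*"))

;; Both hold exactly the same pattern text: backslash, w, star.
(= (.pattern from-literal) (.pattern from-string))   ; => true
(count (.pattern from-literal))                      ; => 3
```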
Somehow I'm not sure that communicates how much I dislike the current syntax. Oh well, maybe others can chime in on that point. I implemented this to provide the examples above, not because I think this is a done deal or anything. Please comment!
Here is a new print method to match the attached patch to LispReader:
(defmethod print-method java.util.regex.Pattern [p w]
  (.write w "#\"")
  (.write w (.pattern p))
  (.write w "\""))
That print method will take a bit more work to properly quote some Patterns that could be created by means other than the Clojure literal.
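To make the remaining gap concrete, here is a sketch of what the simple approach produces for a Pattern built from a string rather than from a literal. naive-pattern-str is a hypothetical helper that returns the text instead of writing it to a writer.

```clojure
(defn naive-pattern-str
  "Hypothetical helper: wraps the pattern text in #\"...\" with no quoting at all."
  [^java.util.regex.Pattern p]
  (str "#\"" (.pattern p) "\""))

;; A Pattern whose text is a single bare double-quote character.
(def bare-quote (re-pattern "\""))

;; The result is the four characters #""" : the reader would see an
;; empty pattern followed by a stray double-quote.
(naive-pattern-str bare-quote)
```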
Is it bad etiquette to reply to myself? I thought it might be useful to compare the proposed syntax with that of other languages with good regex support.
I tried all the examples from my previous message in Perl, Python, Ruby, and JavaScript. All but Python have literal regex syntax, while Python has a raw string format that is generally used for regular expressions. All but JavaScript have multiple quote characters which allowed me to use double quotes just like Clojure:
Clojure: #"foo"
Perl:    m"foo" (although m/foo/ or just /foo/ is more common)
Python:  r"foo" (you can also use r'foo' or r"""foo""")
Ruby:    %r"foo" (or %r/foo/ or just /foo/)
JS:      /foo/
All the examples for the proposed new Clojure syntax work the same in all these languages (with the exception of example 4 in JavaScript, where \a means a plain letter a instead of ASCII 7). If instead you escape things the way you currently have to in Clojure, many of the expressions don't work or mean something different in the other languages.
In other words, under the proposed syntax Clojure regex literals would be less surprising for people used to any of these other languages.
Will existing Clojure regex (consumer) code need to change, i.e. will
people need to modify their existing #"" literals and if so in what
way?
On Wed, Oct 8, 2008 at 8:08 AM, Rich Hickey <richhic...@gmail.com> wrote:
> Will existing Clojure regex (consumer) code need to change, i.e. will people need to modify their existing #"" literals and if so in what way?
Yes, many existing #"" literals would have new meaning or become invalid under this proposal.
Some patterns wouldn't have to be changed. Here are a few examples:
  #"foo"
  #"(one) *(two)"
  #"/this.*/that"
By far the most common change people would have to make would be to remove doubled back-slashes:
  old: #"\\w"         new: #"\w"
  old: #"\\("         new: #"\("
  old: #"\\bword\\b"  new: #"\bword\b"
Reading through Clojure's string reader and Java's Pattern docs, the only other anomaly I've spotted is if someone was using \b to mean \backspace. With the proposed change, #"\b" means word boundary, so:
  old: #"\b"          new: #"\x08"
Most of the time, failing to update your regex literal will result in a valid regex that means something different. Put another way, things that used to match like you wanted will just stop matching. In a few cases (such as #"\\(") what used to be a valid regex will throw an exception at read time, with a detailed error message pointing out the position of the illegal paren.
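A sketch of that silent change in meaning, assuming a Clojure build in which the proposed rules are in effect. An old-style literal still compiles; it just stops matching what it used to:

```clojure
;; Under the proposed rules, #"\\w+" is a literal backslash followed by one or more w.
(re-find #"\\w+" "foo")      ; => nil, no longer matches a word
(re-find #"\\w+" "\\www")    ; => "\\www", i.e. backslash + www

;; The updated literal behaves as the old one was meant to.
(re-find #"\w+" "foo")       ; => "foo"
```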
Of course if this change is unacceptable, these proposed rules could be applied to a new dispatch macro. One option would be something like #r/foo/ that would allow your choice of delimiters to further reduce the need for back-slash quoting within the regex.
On Wed, Oct 8, 2008 at 11:03 AM, Chouser <chou...@gmail.com> wrote:
> Of course if this change is unacceptable, these proposed rules could be applied to a new dispatch macro. One option would be something like #r/foo/ that would allow your choice of delimiters to further reduce the need for back-slash quoting within the regex.
Personally, I'd vote for this. Allowing for a choice of delimiters is a very useful feature of Ruby. That said, I'd be happy with the new #"..." syntax proposed in this thread ... Anything to avoid the double-escaping.
Yes, dealing with multiple escaped escape sequences is a weak point of
handling regexes. To really integrate them into the language, and that
integration is essential to success, they must be typed straight into
the code literally without baggage.
As much as flexibility and simplicity, the easy application of
powerful first class regular expressions to text processing was the
killer app for Perl. No programmer wants to accept second best
anymore.
-Brian
On Oct 8, 4:24 pm, Bob <bobnnamt...@gmail.com> wrote:
> Yes, I like this. The double backslashing is really a pain and
> confused me at first. Get off that road now while clojure is still
> pretty new.
> Bob
> Of course if this change is unacceptable, these proposed rules could be applied to a new dispatch macro. One option would be something like #r/foo/ that would allow your choice of delimiters to further reduce the need for back-slash quoting within the regex.
I like both proposed changes: new escape rules for #", and #r with arbitrary delimiters. Thanks for moving the issue along.
Comparing the two, I think the arbitrary delimiter allowed by #r is very attractive. The only potential downside I see with that is that it requires a smarter "clojure mode" in an editor to know how to find the end in source code. (I use emacs, so I trust I'll be all set. I suspect "clojure mode" in most other editors can be adapted to handle it.)
With #r in place, I would be in favor of leaving #" as it is now, and possibly deprecating and removing it over time.
What about having #"pattern" work like it does now, and then having
#/pattern/ work similarly to Ruby, Python, Perl, etc. regular expressions,
in that they do not require double escaping of characters like '\'? So
in other words:
#/<em>(.*?)<\/em>/
instead of:
#"<em>(.*?)<\\/em>"
The advantage to this is that it is backwards compatible. I don't
think the arbitrary delimiter is as necessary as not having to include
extra escaping characters because it is a string.
On Oct 9, 1:14 pm, "Stephen C. Gilardi" <squee...@mac.com> wrote:
> > Of course if this change is unacceptable, these proposed rules could
> > be applied to a new dispatch macro. One option would be something
> > like #r/foo/ that would allow your choice of delimiters to further
> > reduce the need for back-slash quoting within the regex.
> I like both proposed changes: new escape rules for #" , and #r with
> arbitrary delimiters. Thanks for moving the issue along.
> Comparing the two, I think the arbitrary delimiter allowed by #r is
> very attractive. The only potential downside I see with that is that
> it requires a smarter "clojure mode" in an editor to know how to find
> the end in source code. (I use emacs, so I trust I'll be all set. I
> suspect "clojure mode" most other editors can be adapted to handle it.)
> With #r in place, I would be in favor of leaving #" as it is now, and
> possibly deprecating and removing it over time.
On Thu, Oct 9, 2008 at 2:03 PM, Paul Barry <pauljbar...@gmail.com> wrote:
> What about having #"pattern" work like it does now, and then having #/pattern/ work similarly to Ruby, Python, Perl, etc. regular expressions, in that they do not require double escaping of characters like '\'? So in other words:
I would vote against any new syntax that doesn't allow for a user-chosen delimiter. If we introduce something new, it should solve the problem more completely than that.
> The advantage to this is that it is backwards compatible.
That's true and good, and if Rich is open to it, I think #r/foo/ or #~/foo/ or something would be a great choice, allowing for / or " or perhaps even () for delimiting the regex.
> I don't think the arbitrary delimiter is as necessary as not having to include extra escaping characters because it is a string.
I actually prefer " over / as the only allowed delimiter. Matching file paths with / as the delimiter is not uncommon, and rather painful:
  #/\/usr\/lib\/*.so/
The contexts where " has to be quoted often don't seem quite as bad:
  #"<img src=\"file://foo/bar\">"
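For comparison, both kinds of pattern written with the proposed #"..." rules (a sketch, assuming those rules are in effect; the path pattern is an illustrative variation, not a literal translation of the one above). With " as the only delimiter, slashes need no escaping at all:

```clojure
;; No escaping needed for / when " is the delimiter.
(re-find #"/usr/lib/.*\.so" "/usr/lib/libfoo.so")
;; => "/usr/lib/libfoo.so"

;; Double quotes inside the pattern are escaped, and \" still matches a plain ".
(re-find #"<img src=\"file://foo/bar\">"
         "<p><img src=\"file://foo/bar\"></p>")
;; => "<img src=\"file://foo/bar\">"
```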
Let's stay on track with your first simple proposal - eliminating
escaping of \ and \" being non-terminating.
Arbitrary delimiters beg the question of why not in strings too, and
I think the editor/tools issues are real, as well as the general
inability to grok code with user-defined character interpretation.
The question is, are people willing to deal with the breakage in the
short term?
Could there be a reader warning, if \\ is seen in patterns?
Is there a possible conversion or safe interpretation of existing
patterns (esp. \\\" and \\\\)?
Clojure is still beta-ish, no? I say either go with the breakage, or use the #r"" without the user-chosen delimiters.
The extra backslash quoting was admittedly a surprise for me at first. It seemed awkward and unexpected, so if some breakage occurs, then I think it's warranted. If breakage is a problem, then I'd vote for going with a new #r"" and deprecating the old syntax.
On Thu, Oct 9, 2008 at 3:09 PM, Rich Hickey <richhic...@gmail.com> wrote:
> The question is, are people willing to deal with the breakage in the short term?
I certainly am, but that may not mean much. :-) Nobody has spoken out against it yet -- that's got to be a good sign.
> Could there be a reader warning, if \\ is seen in patterns?
Not really, in general such patterns are valid. For example, #"\\w*" currently means match zero or more word characters. Proposed, #"\\w*" would mean match a backslash followed by any number of "w"s.
I suppose we could have a warning that flags "likely" errors, and then a way to turn it off, but that doesn't sound like a nice solution.
> Is there a possible conversion or safe interpretation of existing patterns (esp. \\\" and \\\\)?
I'm not quite sure what you mean. Is this part of the previous question?
#"\\\"" Currently matches the single char double-quote. Proposed would match a backslash followed by a double-quote.
#"\\\\" Currently matches the single char backslash. Proposed would match a backslash followed by another backslash.
If you mean could old patterns be programmatically converted to new ones, the answer is yes. Somehow I don't think you're asking for a code walker that spits out what it reads, except with old regex literals replaced with new ones, but I suppose that could be done.
The only pattern I've found that is currently valid that wouldn't be under the proposal was mentioned in my earlier email: #"\\(" currently matches a literal open paren. Proposed would be interpreted as a backslash and the beginning of a group, but without the closing paren Pattern/compile will throw an exception.
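The read-time failure can be reproduced by handing the old-style text to the reader as a string. A sketch, assuming the proposed rules are in effect; readable? is a hypothetical helper, and since the exact exception type that surfaces may vary it simply catches Exception:

```clojure
(defn readable?
  "Hypothetical helper: true if the reader accepts the given source text."
  [^String source-text]
  (try
    (read-string source-text)
    true
    (catch Exception _ false)))

;; Source text #"\\(" : a literal backslash, then an unclosed group.
(readable? "#\"\\\\(\"")   ; => false
;; Source text #"\(" : an escaped open paren.
(readable? "#\"\\(\"")     ; => true
```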
> How tricky is the print side?
Simple for Patterns created using the proposed literal form, as the Pattern hangs on to the original string -- just slap #"" around it.
But I suppose that's not quite good enough, as you can build Patterns other ways:
  (Pattern/compile "\"")
I think the only cause of trouble is the " char, so the Pattern's string would have to be scanned looking for " and determining if it's already quoted. If quoted, leave it alone; if unquoted, insert a backslash. I'll code it up, unless this proposal is already dead on one of the other points.
On Thursday 09 October 2008 12:09, Rich Hickey wrote:
> ...
> The question is, are people willing to deal with the breakage in the short term?
I have no skin in this game, so what I say is to be taken with a grain of salt, but...
If you make this change, millions of Clojure users for all eternity _will_ thank you.
On the other hand, if you make this change, a couple of dozen users _may_ mutter something under their breath at you once or twice and then acknowledge that the change was a good thing.
On Thu, Oct 9, 2008 at 4:57 PM, Rich Hickey <richhic...@gmail.com> wrote:
> Go for it.
(defmethod print-method java.util.regex.Pattern [p #^Writer w]
  (.write w "#\"")
  (loop [[#^Character c & r :as s] (seq (.pattern #^java.util.regex.Pattern p))]
    (when s
      (cond
        (= c \\) (do (.append w \\)
                     (.append w #^Character (first r))
                     (recur (rest r)))
        (= c \") (do (.write w "\\\"")
                     (recur r))
        :else (do (.append w c)
                  (recur r)))))
  (.append w \"))
> --Chouser
Hello Chouser, (btw, nice work you are doing with ClojureScript)
Do you plan to support printing of all Pattern instances or only those created using the new literal syntax? In the first case, the story gets more complex with \Q, \E and flags. In the second case, that's fine.
On Fri, Oct 10, 2008 at 4:07 AM, Christophe Grand <christo...@cgrand.net> wrote:
> Hello Chouser, (btw, nice work you are doing with ClojureScript)
Thanks -- good to see you around again!
> Do you plan to support printing of all Pattern instances or only those created using the new literal syntax? In the first case, the story gets more complex with \Q, \E and flags. In the second case, that's fine.
I think I need to support all Patterns, and you're right I haven't thought sufficiently about \Q and \E.
Interestingly, I don't think there's any way to get a plain double-quote into a regex after a \Q under the proposal. I don't *think* that's a problem, as you're most likely to use \Q when building a pattern programmatically, and if you're not you can drop out of \Q mode to get your " in there.
If anyone disagrees with this conclusion, speak up!
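A small sketch of both points, assuming the proposed rules are in effect: inside \Q...\E the sequence \" is taken literally as two characters, and a plain " can still be matched by stepping out of quote mode.

```clojure
;; Inside \Q...\E, \" means a literal backslash followed by a double-quote.
(re-find #"\Q\"\E" "\"")       ; => nil
(re-find #"\Q\"\E" "\\\"")     ; => "\\\""

;; Leaving quote mode for the double-quote, then re-entering it.
(re-find #"\Qa.b\E\"\Qc\E" "a.b\"c")   ; => "a.b\"c"
```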
> On Fri, Oct 10, 2008 at 7:48 AM, Chouser <chou...@gmail.com> wrote:
> > Of course that means I need to do this for you when printing...
> I've attached an updated patch with a new print method, against the
> latest SVN 1058.
> If you really want to dig into the ugliness, here's a test I've been
> using. Apply the attached patch, rebuild clojure, and then load up
> this code:
> If my print method is correct, prn1 and prn2 should always be the
> same, and thus PASS. Also what you're seeing there is the "new
> format". In this case, both the old and new formats are the same.
> If you pass in an optional second string, test-re will try to match it
> with both patterns:
> user=> (test-re "a \\\"\\w*\\\" please" "a \"word\" please")
> raw str: a \"\w*\" please
> prn1: #"a \"\w*\" please"
> prn2: #"a \"\w*\" please"
> PASS
> match1: a "word" please
> match2: a "word" please
> PASS
> Again, if my patch is correct, match1 and match2 should be
> the same, and thus another PASS. Here you can see how the new format
> (prn1 and prn2) is simpler than the old format, as well as being
> identical to what Pattern actually operates on (raw str).
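The test-re code itself did not survive in this copy of the thread. As a rough stand-in, a round-trip check along the same lines might look like the sketch below; round-trip is a hypothetical name, and it relies on whatever print method for Pattern is installed (it assumes a build with the patch applied).

```clojure
(defn round-trip
  "Hypothetical stand-in for the prn1/prn2 comparison: print a Pattern,
  read the printed form back, print again, and compare the two texts."
  [^String raw]
  (let [p1   (re-pattern raw)
        prn1 (pr-str p1)
        p2   (read-string prn1)
        prn2 (pr-str p2)]
    {:raw raw, :prn1 prn1, :prn2 prn2, :pass? (= prn1 prn2)}))

;; Same raw string as in the transcript: a \"\w*\" please
(round-trip "a \\\"\\w*\\\" please")
```

Patterns do not compare equal with =, which is why the sketch compares the printed text rather than the Pattern objects themselves.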
> Here's a truly nasty example prompted by Christophe Grand's comment:
On Wed, Oct 15, 2008 at 7:12 PM, Rich Hickey <richhic...@gmail.com> wrote:
> Patch applied - SVN rev 1070 - thanks!
Possible reader docs:
Regex patterns (#"pattern")
A regex pattern is read and compiled at read time. The pattern is passed directly to the java.util.regex.Pattern compile method, so its quoting rules apply instead of the string literal quoting rules. This means that unlike string literals, backslashes in regex patterns do not need to be escaped with another backslash. For example, #"\d*" matches zero or more digits.
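A brief illustration of that description (a sketch, assuming a build with the patch applied):

```clojure
;; The literal is already a compiled Pattern by the time the code runs.
(class #"\d*")                           ; => java.util.regex.Pattern

;; A single backslash is enough: \d is handed to Pattern as-is.
(re-find #"\d*" "123abc")                ; => "123"

;; The same pattern built from a string literal needs the backslash doubled.
(re-find (re-pattern "\\d*") "123abc")   ; => "123"
```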