On Fri, Sep 7, 2012 at 6:53 PM, penduin <owen...@gmail.com> wrote:
> I wouldn't miss them personally, but I can see useful cases for both
> "pattern" and "patternProperties", and I think there is a need for some type
> of pattern-matching, though regex (specifically ECMA 262 regex) is probably
> overkill. In WJElement (written in C) this stuff gets handled by the
> standard GNU regex library, with the ability to plug in a different regex
> handler should the need arise. In the meantime, we just don't care about
> the differences; if we're doing a big fancy implementation-specific regex in
> schema, we're doing something wrong. ;^)
>
> If what we're after is a spec that's easy to strictly implement in any
> language (that seems a worthy goal) then ECMA 262 probably should not be the
> pattern-matching method of choice. Regular expressions could be ditched
> altogether, or perhaps the spec can MAY and SHOULD its way around this
> issue; a validator looking to be widely-used can document its regex-handling
> details or maybe be configurable. (I know that approach rubs some people
> the wrong way, but that's my non-OCD pragmatist tinkerer side talking)
>
> A lowest-common-denominator regex subset (or even something as basic as a
> handful of wildcards) would be just fine as far as I'm concerned. I haven't
> had any real-world cases come up for either "pattern" or
> "patternProperties", though we did consider "patternProperties" for one of
> our schemas. (I forget what we worked out instead, but it simplified our
> lives a bit.)
>
That is a nice summary, thank you! And I agree that a lowest common
denominator subset would be nice too. Defining it can be tricky,
though. From what I see, regex constructs which can safely be used
are:
* character classes ("[a-z]" etc);
* the "+", "*" and "?" quantifiers, along with their "lazy" versions
("+?", "*?", "??") -- even though I positively loathe the latter :p
* alternation ("|"), grouping ("( ... )") -- BUT NOT non-capturing
grouping like "(?: ... )";
* backreferences ("\1", etc.).
Disallowed: anything else! No language-specific character classes (not
even "\d" and "\w" -- those differ between regex dialects: for
instance "\w" matches only ASCII word characters in Java and
JavaScript but any Unicode word character in .NET languages, and
similarly "\d" in .NET languages matches any Unicode digit and not
only "[0-9]"), no possessive quantifiers ("*+", "++", "?+"), no named
captures (the syntax of which differs among languages anyway), etc.
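
To make the idea concrete, here is a rough sketch in Python of how a
validator might flag patterns that stray outside such a subset. It only
looks for the constructs explicitly named above as disallowed; the names
find_disallowed and SHORTHAND_LETTERS are made up for illustration, and
treating "\s" and the negated classes as disallowed is my own assumption.
It is not a full regex parser (for instance it does not handle a literal
"]" placed first in a character class).

```python
# Shorthand classes to reject. Only \d and \w are named explicitly as
# problematic; including their negations and \s/\S is an assumption.
SHORTHAND_LETTERS = set("dDwWsS")


def find_disallowed(pattern):
    """Return a list of (index, description) for constructs outside the subset."""
    problems = []
    in_class = False  # True while scanning inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "\\":
            # An escape consumes two characters; only shorthand classes are flagged,
            # so backreferences and escaped metacharacters pass through.
            if nxt in SHORTHAND_LETTERS:
                problems.append((i, "shorthand class \\" + nxt))
            i += 2
        elif in_class:
            # Inside a character class, quantifier and group characters are literals.
            if ch == "]":
                in_class = False
            i += 1
        elif ch == "[":
            in_class = True
            i += 1
        elif ch == "(" and nxt == "?":
            # Covers (?:...), named captures, lookarounds and inline flags.
            problems.append((i, "group extension (?"))
            i += 2
        elif ch in "*+?" and nxt == "+":
            problems.append((i, "possessive quantifier " + ch + "+"))
            i += 2
        else:
            i += 1
    return problems


print(find_disallowed(r"[a-z]+(foo|bar)*?\1"))  # prints []
print(find_disallowed(r"(?:ab)+"))              # prints [(0, 'group extension (?')]
```

A schema validator could run a check like this on "pattern" and
"patternProperties" values and either reject the schema or emit a
portability warning, depending on how strict it wants to be.
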
--
Francis Galiegue,
fgal...@gmail.com
JSON Schema:
https://github.com/json-schema
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)