Wrap up -- I was wrong (was Re: [json-schema] Re: Open question: should regex support be removed from the JSON Schema specification altogether?)
Date: Sat, 8 Sep 2012 12:12:29 +0200
From: Francis Galiegue <fgalie...@gmail.com>
To: json-schema@googlegroups.com
On Fri, Sep 7, 2012 at 6:53 PM, penduin <owensw...@gmail.com> wrote:
> I wouldn't miss them personally, but I can see useful cases for both
> "pattern" and "patternProperties", and I think there is a need for some type
> of pattern-matching, though regex (specifically ECMA 262 regex) is probably
> overkill. In WJElement (written in C) this stuff gets handled by the
> standard GNU regex library, with the ability to plug in a different regex
> handler should the need arise. In the meantime, we just don't care about
> the differences; if we're doing a big fancy implementation-specific regex in
> schema, we're doing something wrong. ;^)
>
> If what we're after is a spec that's easy to strictly implement in any
> language (that seems a worthy goal) then ECMA 262 probably should not be the
> pattern-matching method of choice. Regular expressions could be ditched
> altogether, or perhaps the spec can MAY and SHOULD its way around this
> issue; a validator looking to be widely-used can document its regex-handling
> details or maybe be configurable. (I know that approach rubs some people
> the wrong way, but that's my non-OCD pragmatist tinkerer side talking)
>
> A lowest-common-denominator regex subset (or even something as basic as a
> handful of wildcards) would be just fine as far as I'm concerned. I haven't
> had any real-world cases come up for either "pattern" or
> "patternProperties", though we did consider "patternProperties" for one of
> our schemas. (I forget what we worked out instead, but it simplified our
> lives a bit.)
>
That is a nice summary, thank you! And I agree that a lowest common
denominator subset would be nice too. Defining it can be tricky,
though. From what I see, regex constructs which can safely be used
are:
* character classes ("[a-z]" etc);
* the "+", "*" and "?" quantifiers, along with their "lazy" versions
("+?", "*?", "??") -- even though I positively loathe the latter :p
* alternation ("|"), grouping ("( ... )") -- BUT NOT non-capturing
grouping like "(?: ... )";
* backreferences ("\1", etc)
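A rough sketch of how a validator could check a pattern against a list
like this one. It is written in Python purely for illustration; the
function name subset_violations is hypothetical, and the scanner only
flags constructs that clearly go beyond the list (letter escapes, "(?"
group syntax, possessive quantifiers). It takes no position on things
the list does not mention, such as anchors or "{m,n}".

```python
def subset_violations(pattern):
    """Return human-readable reasons why `pattern` leaves the proposed subset.

    Hypothetical helper for illustration only; not part of any JSON Schema
    draft or validator.
    """
    problems = []
    in_class = False  # True while scanning inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "\\":
            # Letter escapes (\d, \w, \s, \b, \p, \k, ...) vary between dialects.
            # Numeric backreferences and escaped punctuation are let through.
            if nxt.isalpha():
                problems.append(f"dialect-specific escape \\{nxt} at offset {i}")
            i += 2  # skip the escaped character, whatever it is
        elif in_class:
            # Inside a character class, "(", "?", "+" and "*" are plain literals.
            if ch == "]":
                in_class = False
            i += 1
        elif ch == "[":
            in_class = True
            i += 1
        elif ch == "(" and nxt == "?":
            # Covers (?:...), named captures, lookaround and inline flags.
            problems.append(f"special group syntax (?...) at offset {i}")
            i += 2
        elif ch in "*+?" and nxt == "+":
            problems.append(f"possessive quantifier {ch}+ at offset {i}")
            i += 2
        elif ch in "*+?" and nxt == "?":
            i += 2  # lazy quantifier: allowed by the list above
        else:
            i += 1
    return problems
```

For example, subset_violations(r"[a-z]+?(foo|bar)\1") returns an empty
list, while r"(?:ab)*+" yields two complaints: one for the "(?" group
and one for the possessive "*+".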
Disallowed: anything else! No language-specific character classes (not
even "\d" and "\w" -- those differ between regex dialects, for
instance "\w" only matches ASCII in Java and JavaScript but the full
Unicode charset in .NET, and similarly "\d" in .NET languages matches
any Unicode digit and not only "[0-9]"), no possessive quantifiers
("*+", "++", "?+"), no named captures (the syntax of which differs
among languages anyway) etc.
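To make the "\d" and "\w" point concrete, here is a small Python
sketch. Python's re module is not one of the engines mentioned above;
it is used only because a single engine can show both behaviours:
Unicode-aware by default for str patterns, ASCII-only with re.ASCII.
The helper name matches_whole is made up for the example.

```python
import re

def matches_whole(pattern, text, flags=0):
    """True if `pattern` matches all of `text` under the given flags."""
    return re.fullmatch(pattern, text, flags) is not None

# U+0663 is ARABIC-INDIC DIGIT THREE, U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
for pattern, text in [(r"\d", "7"), (r"\d", "\u0663"), (r"\w", "\u00e9")]:
    unicode_mode = matches_whole(pattern, text)          # default for str patterns
    ascii_mode = matches_whole(pattern, text, re.ASCII)  # \d is [0-9], \w is [A-Za-z0-9_]
    print(pattern, ascii(text), unicode_mode, ascii_mode)

# An explicit class means the same thing in both modes.
print(matches_whole("[0-9]", "\u0663"), matches_whole("[0-9]", "\u0663", re.ASCII))
```

The non-ASCII digit and letter match only in the default mode, whereas
the explicit "[0-9]" class rejects the non-ASCII digit in both, which
is why spelling classes out is the portable choice.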
-- 
Francis Galiegue, fgalie...@gmail.com
JSON Schema: https://github.com/json-schema
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)